Class DNGPolicy<N,A>

  • Type Parameters:
    N - the type of the nodes (states)
    A - the type of the actions (arc labels)
    All Implemented Interfaces:
    IPathUpdatablePolicy<N,A,java.lang.Double>, IPolicy<N,A>, org.api4.java.common.control.ILoggingCustomizable, org.api4.java.common.event.IEventEmitter<java.lang.Object>, org.api4.java.common.event.IRelaxedEventEmitter

    public class DNGPolicy<N,A>
    extends java.lang.Object
    implements IPathUpdatablePolicy<N,A,java.lang.Double>, org.api4.java.common.control.ILoggingCustomizable, org.api4.java.common.event.IRelaxedEventEmitter
    This is the implementation of the DNG-algorithm (for MDPs) presented in
    • Constructor Summary

      Constructors 
      Constructor Description
      DNGPolicy(double gammaMDP, java.util.function.Predicate<N> terminalStatePredicate, double varianceFactor, double lambda)
    • Method Summary

      Modifier and Type Method Description
      A getAction(N node, java.util.Collection<A> actionsWithSuccessors)
      java.lang.String getLoggerName()
      double getQValue(N state, A action)
      In the deterministic case (with known transitions), without discounting, and with all inner rewards equal to 0, the QValue function of the paper degenerates to the value of the successor state that the given action leads to from the given state.
      double getValue(N state)
      The Value procedure of the paper
      boolean isSampling()
      void registerListener(java.lang.Object listener)
      ai.libs.jaicore.basic.sets.Pair<java.lang.Double,java.lang.Double> sampleWithNormalGamma(N state)
      A sampleWithThompson(N state, java.util.Collection<A> actions)
      The ThompsonSampling procedure of the paper
      void setLoggerName(java.lang.String name)
      void setSampling(boolean sampling)
      void updatePath(org.api4.java.datastructure.graph.ILabeledPath<N,A> path, java.util.List<java.lang.Double> scores)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • DNGPolicy

        public DNGPolicy(double gammaMDP,
                         java.util.function.Predicate<N> terminalStatePredicate,
                         double varianceFactor,
                         double lambda)
    • Method Detail

      • isSampling

        public boolean isSampling()
      • setSampling

        public void setSampling(boolean sampling)
      • updatePath

        public void updatePath(org.api4.java.datastructure.graph.ILabeledPath<N,A> path,
                               java.util.List<java.lang.Double> scores)
        Specified by:
        updatePath in interface IPathUpdatablePolicy<N,A,java.lang.Double>
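        One plausible reading of updatePath is a backup that updates a value posterior for every node on the path. As a minimal sketch, not the jaicore implementation, assume each node keeps a Normal-Gamma posterior (mu0, lambda, alpha, beta) over its value, that scores.get(i) is the reward observed on the edge leaving the i-th node, and that each node is updated with the discounted return collected from it onwards (gamma playing the role of gammaMDP). NormalGammaStats and PathBackupSketch are hypothetical names; the update itself is the standard conjugate Normal-Gamma update for one observation.

```java
import java.util.List;

/** Hypothetical Normal-Gamma posterior over one node's value (not the jaicore class). */
final class NormalGammaStats {
    double mu0;    // posterior mean of the value
    double lambda; // pseudo-count of observations behind mu0
    double alpha;  // shape of the Gamma distribution on the precision
    double beta;   // rate of the Gamma distribution on the precision

    NormalGammaStats(double mu0, double lambda, double alpha, double beta) {
        this.mu0 = mu0;
        this.lambda = lambda;
        this.alpha = alpha;
        this.beta = beta;
    }

    /** Standard conjugate update with a single observation x; beta uses the old mu0 and lambda. */
    void update(double x) {
        double diff = x - mu0;
        beta += lambda * diff * diff / (2 * (lambda + 1));
        mu0 = (lambda * mu0 + x) / (lambda + 1);
        lambda += 1;
        alpha += 0.5;
    }
}

/** Hypothetical backup; assumes scores.get(i) is the reward on the edge leaving nodes.get(i). */
final class PathBackupSketch {
    static void backup(List<NormalGammaStats> nodes, List<Double> scores, double gamma) {
        double discountedReturn = 0.0;
        // Walk the path backwards so each node sees the discounted return from itself onwards.
        for (int i = scores.size() - 1; i >= 0; i--) {
            discountedReturn = scores.get(i) + gamma * discountedReturn;
            nodes.get(i).update(discountedReturn);
        }
    }
}
```

        Walking the path backwards accumulates every node's return in a single pass; note that beta must be updated before mu0 and lambda, since its increment uses their old values.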
      • sampleWithThompson

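        public A sampleWithThompson(N state,
                                    java.util.Collection<A> actions)
                             throws java.lang.InterruptedException
        The ThompsonSampling procedure of the paper
        Parameters:
        state - the state in which an action is selected
        actions - the candidate actions available in state
        Returns:
        the selected action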
        Throws:
        java.lang.InterruptedException
        org.api4.java.common.attributedobjects.ObjectEvaluationFailedException
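        Thompson sampling selects an action by drawing one sample from each action's posterior and returning the action with the largest sample. The sketch below shows only that selection step, assuming a hypothetical sampledQ function that draws a fresh sample of Q(state, a) on every call; how sampleWithThompson actually obtains its samples is not documented here, and ThompsonSelectionSketch is an illustrative name.

```java
import java.util.Collection;
import java.util.function.ToDoubleFunction;

final class ThompsonSelectionSketch {
    /** Returns the action whose posterior sample is largest; sampledQ is hypothetical. */
    static <A> A select(Collection<A> actions, ToDoubleFunction<A> sampledQ) throws InterruptedException {
        A best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (A action : actions) {
            // Honour interruption, mirroring the InterruptedException in the signature above.
            if (Thread.interrupted()) {
                throw new InterruptedException();
            }
            double q = sampledQ.applyAsDouble(action); // one posterior draw per action
            if (best == null || q > bestValue) {
                best = action;
                bestValue = q;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no actions to choose from");
        }
        return best;
    }
}
```

        With independent Normal posteriors, for example, sampledQ could be a -> mean(a) + sd(a) * rng.nextGaussian() for a java.util.Random rng, where mean and sd are hypothetical accessors. Because a fresh sample is drawn on every call, actions with uncertain estimates are occasionally chosen even when their mean is lower, which is what drives exploration.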
      • getQValue

        public double getQValue(N state,
                                A action)
                         throws java.lang.InterruptedException
        In the deterministic case (with known transitions), without discounting, and with all inner rewards equal to 0, the QValue function of the paper degenerates to the value of the successor state that the given action leads to from the given state.
        Parameters:
        state - not needed and may be null; it is included for documentation purposes only.
        action - the action whose successor state's value is returned
        Returns:
        the value of the successor state reached via action
        Throws:
        java.lang.InterruptedException
        org.api4.java.common.attributedobjects.ObjectEvaluationFailedException
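        To make the degenerate case concrete, assume the paper's QValue has the usual one-step form, with w_{s,a}(s') the (sampled or expected) transition probabilities, R the inner reward, and gamma the discount (gammaMDP). This notation is ours, not the paper's. With a single known successor s'_a, all inner rewards equal to 0 and gamma = 1, the sum collapses to one term:

```latex
Q(s,a) \;=\; \sum_{s'} w_{s,a}(s')\,\bigl[R(s,a,s') + \gamma\,V(s')\bigr]
\;\;\xrightarrow{\;w_{s,a}(s'_a) = 1,\;\; R \equiv 0,\;\; \gamma = 1\;}\;\;
Q(s,a) \;=\; V(s'_a)
```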
      • sampleWithNormalGamma

        public ai.libs.jaicore.basic.sets.Pair<java.lang.Double,java.lang.Double> sampleWithNormalGamma(N state)
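        A Normal-Gamma draw takes two steps: the precision tau from Gamma(alpha, rate beta), then the mean mu from a Normal with mean mu0 and variance 1/(lambda * tau). The sketch below assumes that the returned Pair holds such a (mu, tau) draw from the state's posterior; the signature does not confirm this. NormalGammaSampler is hypothetical, returns a double[] instead of jaicore's Pair, and uses the Marsaglia and Tsang method for the Gamma draw because java.util.Random has no Gamma sampler.

```java
import java.util.Random;

/** Hypothetical Normal-Gamma sampler; not part of jaicore. */
final class NormalGammaSampler {
    private final Random rng;

    NormalGammaSampler(long seed) {
        this.rng = new Random(seed);
    }

    /** Gamma(shape, rate) via Marsaglia and Tsang (2000); requires shape > 0 and rate > 0. */
    double sampleGamma(double shape, double rate) {
        if (shape < 1) {
            // Boost: a Gamma(k) draw equals a Gamma(k + 1) draw times U^(1/k), with U in (0, 1].
            double u = 1.0 - rng.nextDouble();
            return sampleGamma(shape + 1, rate) * Math.pow(u, 1.0 / shape);
        }
        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double x = rng.nextGaussian();
            double v = 1.0 + c * x;
            if (v <= 0) {
                continue;
            }
            v = v * v * v;
            double u = rng.nextDouble();
            // Cheap squeeze test first, then the exact acceptance test.
            if (u < 1 - 0.0331 * x * x * x * x
                    || Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) {
                return d * v / rate;
            }
        }
    }

    /** Draws (mu, tau) ~ NormalGamma(mu0, lambda, alpha, beta) and returns {mu, tau}. */
    double[] sample(double mu0, double lambda, double alpha, double beta) {
        double tau = sampleGamma(alpha, beta);
        double mu = mu0 + rng.nextGaussian() / Math.sqrt(lambda * tau);
        return new double[] {mu, tau};
    }
}
```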
      • getValue

        public double getValue​(N state)
                        throws java.lang.InterruptedException
        The Value procedure of the paper
        Returns:
        Throws:
        java.lang.InterruptedException
        org.api4.java.common.attributedobjects.ObjectEvaluationFailedException
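        Assuming the Value procedure follows the DNG scheme, in which a state's value is either a sample of its posterior mean (while sampling) or the posterior mean itself, and assuming terminal states contribute 0, it can be sketched as below. ValueSketch is hypothetical: the posterior lookup and the mean sampler are passed in as functions, isTerminal stands in for the constructor's terminalStatePredicate, and the sampling flag mirrors isSampling/setSampling.

```java
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of a DNG-style Value procedure; not the jaicore implementation. */
final class ValueSketch<S> {
    private final Predicate<S> isTerminal;           // stands in for terminalStatePredicate
    private final ToDoubleFunction<S> posteriorMean; // mu0 of the state's Normal-Gamma posterior
    private final ToDoubleFunction<S> sampleMean;    // draws mu from that posterior
    private boolean sampling = true;                 // mirrors isSampling()/setSampling()

    ValueSketch(Predicate<S> isTerminal, ToDoubleFunction<S> posteriorMean, ToDoubleFunction<S> sampleMean) {
        this.isTerminal = isTerminal;
        this.posteriorMean = posteriorMean;
        this.sampleMean = sampleMean;
    }

    void setSampling(boolean sampling) {
        this.sampling = sampling;
    }

    double value(S state) {
        if (isTerminal.test(state)) {
            return 0.0; // assumption: no further reward after a terminal state
        }
        // While sampling, return a posterior draw (Thompson-style); otherwise the posterior mean.
        return sampling ? sampleMean.applyAsDouble(state) : posteriorMean.applyAsDouble(state);
    }
}
```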
      • getLoggerName

        public java.lang.String getLoggerName()
        Specified by:
        getLoggerName in interface org.api4.java.common.control.ILoggingCustomizable
      • setLoggerName

        public void setLoggerName(java.lang.String name)
        Specified by:
        setLoggerName in interface org.api4.java.common.control.ILoggingCustomizable
      • registerListener

        public void registerListener(java.lang.Object listener)
        Specified by:
        registerListener in interface org.api4.java.common.event.IEventEmitter<java.lang.Object>