Class DNGPolicy<N,A>
- java.lang.Object
-
- ai.libs.jaicore.search.algorithms.mdp.mcts.thompson.DNGPolicy<N,A>
-
- Type Parameters:
N - the node (state) type
A - the action type
- All Implemented Interfaces:
IPathUpdatablePolicy<N,A,java.lang.Double>, IPolicy<N,A>, org.api4.java.common.control.ILoggingCustomizable, org.api4.java.common.event.IEventEmitter<java.lang.Object>, org.api4.java.common.event.IRelaxedEventEmitter
public class DNGPolicy<N,A> extends java.lang.Object implements IPathUpdatablePolicy<N,A,java.lang.Double>, org.api4.java.common.control.ILoggingCustomizable, org.api4.java.common.event.IRelaxedEventEmitter
This is the implementation of the DNG-algorithm (for MDPs) presented in
-
-
Method Summary
- A getAction(N node, java.util.Collection<A> actionsWithSuccessors)
- java.lang.String getLoggerName()
- double getQValue(N state, A action): In the deterministic case (and when transitions are unambiguous), without discounting and with all inner rewards equal to 0, the QValue function of the paper degenerates to the value of the successor state of the given state.
- double getValue(N state): The Value procedure of the paper.
- boolean isSampling()
- void registerListener(java.lang.Object listener)
- ai.libs.jaicore.basic.sets.Pair<java.lang.Double,java.lang.Double> sampleWithNormalGamma(N state)
- A sampleWithThompson(N state, java.util.Collection<A> actions): The ThompsonSampling procedure of the paper.
- void setLoggerName(java.lang.String name)
- void setSampling(boolean sampling)
- void updatePath(org.api4.java.datastructure.graph.ILabeledPath<N,A> path, java.util.List<java.lang.Double> scores)
-
-
-
Constructor Detail
-
DNGPolicy
public DNGPolicy(double gammaMDP, java.util.function.Predicate<N> terminalStatePredicate, double varianceFactor, double lambda)
-
-
Method Detail
-
isSampling
public boolean isSampling()
-
setSampling
public void setSampling(boolean sampling)
-
getAction
public A getAction(N node, java.util.Collection<A> actionsWithSuccessors) throws ActionPredictionFailedException, java.lang.InterruptedException
- Specified by:
getAction in interface IPolicy<N,A>
- Throws:
ActionPredictionFailedException
java.lang.InterruptedException
-
updatePath
public void updatePath(org.api4.java.datastructure.graph.ILabeledPath<N,A> path, java.util.List<java.lang.Double> scores)
- Specified by:
updatePath in interface IPathUpdatablePolicy<N,A,java.lang.Double>
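The page does not describe what updatePath does with the scores. As a rough illustration only: a policy that keeps Normal-Gamma hyperparameters per state could apply the standard conjugate update once per observed score. The class name, field names, and the idea that this is how DNGPolicy updates its statistics are all assumptions, not taken from the library.

```java
/** Hypothetical holder for Normal-Gamma hyperparameters; not part of the documented API. */
final class NormalGammaPosterior {
    double mu;     // current estimate of the mean
    double lambda; // pseudo-count backing the mean estimate
    double alpha;  // shape of the Gamma distribution over the precision
    double beta;   // rate of the Gamma distribution over the precision

    NormalGammaPosterior(double mu, double lambda, double alpha, double beta) {
        this.mu = mu;
        this.lambda = lambda;
        this.alpha = alpha;
        this.beta = beta;
    }

    /** Standard Normal-Gamma conjugate update for one observation x. */
    void observe(double x) {
        double diff = x - mu;
        // alpha and beta must be updated with the OLD mu and lambda
        alpha += 0.5;
        beta += lambda * diff * diff / (2.0 * (lambda + 1.0));
        mu = (lambda * mu + x) / (lambda + 1.0);
        lambda += 1.0;
    }
}
```

A path update would then call observe once per state on the path, with the score associated with that state.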
-
sampleWithThompson
public A sampleWithThompson(N state, java.util.Collection<A> actions) throws java.lang.InterruptedException
The ThompsonSampling procedure of the paper.
- Parameters:
state
actions
- Returns:
- Throws:
java.lang.InterruptedException
org.api4.java.common.attributedobjects.ObjectEvaluationFailedException
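For orientation, Thompson sampling in general draws one random Q-value per action from the current posterior and then acts greedily on those draws. The sketch below shows only that selection step. The class ThompsonSelection, its select method, and the choice to maximize are assumptions made for illustration; the library may use a different optimization direction or tie-breaking.

```java
import java.util.Collection;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper illustrating the selection step of Thompson sampling. */
final class ThompsonSelection {

    /**
     * Returns the action with the largest sampled Q-value, or null if there are no actions.
     * sampledQValue is expected to draw a fresh posterior sample on each call.
     */
    static <A> A select(Collection<A> actions, ToDoubleFunction<A> sampledQValue) {
        A best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (A action : actions) {
            double score = sampledQValue.applyAsDouble(action);
            if (best == null || score > bestScore) {
                best = action;
                bestScore = score;
            }
        }
        return best;
    }
}
```

For a cost-minimizing search, the comparison would be flipped.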
-
getQValue
public double getQValue(N state, A action) throws java.lang.InterruptedException
In the deterministic case (and when transitions are unambiguous), without discounting and with all inner rewards equal to 0, the QValue function of the paper degenerates to the value of the successor state of the given state.
- Parameters:
state - Not needed and may be null; it is present for documentation purposes only.
successorState
- Returns:
- Throws:
java.lang.InterruptedException
org.api4.java.common.attributedobjects.ObjectEvaluationFailedException
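The degeneration described above can be written out using the generic Bellman form of a Q-value. The notation here (R for the immediate reward, P for the transition probabilities, T for the deterministic transition function, γ for the discount factor) is assumed for illustration and need not match the paper's symbols.

```latex
% General form: immediate reward plus discounted expected successor value
Q(s,a) = R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s')

% Deterministic transitions: P(s' \mid s,a) = 1 for s' = T(s,a), 0 otherwise
Q(s,a) = R(s,a) + \gamma\, V\bigl(T(s,a)\bigr)

% No discounting (\gamma = 1) and zero inner rewards (R(s,a) = 0)
Q(s,a) = V\bigl(T(s,a)\bigr)
```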
-
sampleWithNormalGamma
public ai.libs.jaicore.basic.sets.Pair<java.lang.Double,java.lang.Double> sampleWithNormalGamma(N state)
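The page gives no description for this method. As background, a draw from a Normal-Gamma distribution with hyperparameters (mu, lambda, alpha, beta) is usually produced in two steps: first a precision tau from Gamma(alpha, rate beta), then a mean from a normal distribution centred at mu with variance 1 / (lambda * tau). The following is a minimal stand-alone sketch of that procedure using only java.util.Random. The class name, the {mean, precision} return order, and the hand-written Marsaglia-Tsang Gamma sampler are assumptions and do not reflect how the library implements the method.

```java
import java.util.Random;

/** Hypothetical stand-alone sampler; not the library's implementation. */
final class NormalGammaSampler {

    /** Draws from Gamma(shape, scale = 1) with the Marsaglia-Tsang method; shape must be > 0. */
    static double sampleGamma(double shape, Random rng) {
        if (shape < 1.0) {
            // Boosting step: Gamma(a) = Gamma(a + 1) * U^(1/a), with U in (0, 1]
            double u = 1.0 - rng.nextDouble();
            return sampleGamma(shape + 1.0, rng) * Math.pow(u, 1.0 / shape);
        }
        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double x = rng.nextGaussian();
            double v = 1.0 + c * x;
            if (v <= 0.0) {
                continue; // proposal outside the support, draw again
            }
            v = v * v * v;
            double u = 1.0 - rng.nextDouble();
            if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) {
                return d * v;
            }
        }
    }

    /** Returns {mean, precision} drawn from NormalGamma(mu, lambda, alpha, beta), with beta as a rate. */
    static double[] sample(double mu, double lambda, double alpha, double beta, Random rng) {
        double tau = sampleGamma(alpha, rng) / beta;                      // precision
        double mean = mu + rng.nextGaussian() / Math.sqrt(lambda * tau); // mean given precision
        return new double[] {mean, tau};
    }
}
```

Passing a seeded Random makes the draws reproducible, which is convenient when testing a policy built on top of such a sampler.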
-
getValue
public double getValue(N state) throws java.lang.InterruptedException
The Value procedure of the paper.
- Returns:
- Throws:
java.lang.InterruptedException
org.api4.java.common.attributedobjects.ObjectEvaluationFailedException
-
getLoggerName
public java.lang.String getLoggerName()
- Specified by:
getLoggerName in interface org.api4.java.common.control.ILoggingCustomizable
-
setLoggerName
public void setLoggerName(java.lang.String name)
- Specified by:
setLoggerName in interface org.api4.java.common.control.ILoggingCustomizable
-
registerListener
public void registerListener(java.lang.Object listener)
- Specified by:
registerListener in interface org.api4.java.common.event.IEventEmitter<N>
-
-