class PGCriterion[T] extends TensorCriterion[T]
A criterion that computes the negative policy gradient given a multinomial distribution, the sampled action, and its reward.
The input to this criterion should be a 2-D tensor representing a batch of multinomial distributions. The target should also be a 2-D tensor of the same size as the input, encoding the sampled action and its reward/advantage: in each row, the index of the non-zero element is the sampled action, and the value of that element is the reward. If the action space is large, consider using a SparseTensor for the target.
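The following minimal plain-Scala sketch makes the target encoding concrete. It builds dense target rows from one sampled action and one reward per sample. The helper `PGTarget.dense` is hypothetical and not part of BigDL. It uses 0-based `Array` indices, whereas BigDL tensors are indexed from 1.

```scala
// Hypothetical helper (not a BigDL API): builds the dense target rows
// described above from one sampled action and one reward per sample.
object PGTarget {
  /** actions(i): 0-based index of the action sampled for row i.
    * rewards(i): reward (or advantage) observed for that action. */
  def dense(actions: Array[Int], rewards: Array[Double], numActions: Int): Array[Array[Double]] = {
    require(actions.length == rewards.length, "need exactly one reward per sampled action")
    actions.zip(rewards).map { case (a, r) =>
      require(a >= 0 && a < numActions, s"action index $a is out of range")
      val row = Array.fill(numActions)(0.0) // non-sampled actions stay 0
      row(a) = r // the single non-zero entry encodes both the action and its reward
      row
    }
  }
}

// Two samples over three actions: sample 0 took action 1 (reward 2.0),
// sample 1 took action 2 (reward -1.0).
val target = PGTarget.dense(Array(1, 2), Array(2.0, -1.0), numActions = 3)
target.foreach(row => println(row.mkString("[", ", ", "]")))
```

The action is identified only by the position of the non-zero entry, so a reward of exactly 0 produces an all-zero row. Such a sample contributes nothing to the loss, which is consistent with the formula.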
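The loss computed is simply the standard policy gradient:
$$\mathrm{loss} = -\frac{1}{n}\sum_{i=1}^{n} R_i \cdot \log(P_i)$$
where n is the number of samples in the batch, R_i is the reward vector (row i of the target), P_i is the input distribution (row i of the input), log is applied element-wise, and · denotes the dot product.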
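The plain-Scala sketch below evaluates the formula on dense arrays, so the result can be checked by hand. It mirrors the documented formula only and is not BigDL's implementation. `PGLoss.loss` is a hypothetical name. Tying the 1/n factor to a `sizeAverage` flag is an assumption based on the constructor's `sizeAverage` parameter.

```scala
// Plain-Scala sketch of the documented loss, -(1/n) * sum_i R_i . log(P_i).
// Not BigDL's implementation. Assumption: the 1/n factor is applied only
// when sizeAverage is true; otherwise the per-batch sum is returned.
object PGLoss {
  def loss(probs: Array[Array[Double]], target: Array[Array[Double]], sizeAverage: Boolean): Double = {
    require(probs.length == target.length, "input and target must have the same number of rows")
    var total = 0.0
    for (i <- probs.indices) {
      require(probs(i).length == target(i).length, s"row $i: input and target widths differ")
      // Only non-zero rewards contribute to R_i . log(P_i); skipping the zeros
      // also avoids evaluating 0 * log(0) for unsampled actions with probability 0.
      for (j <- probs(i).indices if target(i)(j) != 0.0)
        total += target(i)(j) * math.log(probs(i)(j))
    }
    if (sizeAverage) -total / probs.length else -total
  }
}

val probs  = Array(Array(0.2, 0.5, 0.3), Array(0.1, 0.1, 0.8))
val target = Array(Array(0.0, 2.0, 0.0), Array(0.0, 0.0, -1.0))
println(PGLoss.loss(probs, target, sizeAverage = true))
```

With these inputs, only the two sampled entries contribute, so the averaged loss is -(2 log 0.5 - log 0.8) / 2, roughly 0.58.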
- Annotations
- @SerialVersionUID()
- Linear Supertypes
- TensorCriterion
- AbstractCriterion
- Serializable
- Serializable
- AnyRef
- Any
Instance Constructors
- new PGCriterion(sizeAverage: Boolean = false)(implicit arg0: ClassTag[T], ev: TensorNumeric[T])
- sizeAverage
whether to average the loss over the observations in the batch (default false).
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def backward(input: Tensor[T], target: Tensor[T]): Tensor[T]
Performs a back-propagation step through the criterion with respect to the given input.
- input
input data
- target
target
- returns
gradient corresponding to input data
- Definition Classes
- AbstractCriterion
- def canEqual(other: Any): Boolean
- Definition Classes
- AbstractCriterion
- def clone(): AnyRef
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )
- def cloneCriterion(): AbstractCriterion[Tensor[T], Tensor[T], T]
Returns a deep copy of this criterion.
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(other: Any): Boolean
- Definition Classes
- AbstractCriterion → AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- def forward(input: Tensor[T], target: Tensor[T]): T
Takes an input object and computes the corresponding loss of the criterion, compared with the target.
- input
input data
- target
target
- returns
the loss of the criterion
- Definition Classes
- AbstractCriterion
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- var gradInput: Tensor[T]
- Definition Classes
- AbstractCriterion
- def hashCode(): Int
- Definition Classes
- AbstractCriterion → AnyRef → Any
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- var output: T
- Definition Classes
- AbstractCriterion
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- def updateGradInput(input: Tensor[T], target: Tensor[T]): Tensor[T]
Computes the gradient of the criterion with respect to its own input. The result is returned and is also stored in the gradInput state variable.
- input
input data
- target
target data / labels
- returns
gradient of input
- Definition Classes
- PGCriterion → AbstractCriterion
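Differentiating the documented loss element-wise gives the gradient it implies: ∂loss/∂P_ij = -R_ij / P_ij, scaled by 1/n when averaging. The plain-Scala sketch below evaluates that expression. It is derived from the formula and is not BigDL's code. `PGGrad.gradInput` is a hypothetical name, and tying the 1/n scaling to `sizeAverage` is an assumption.

```scala
// Plain-Scala sketch of the gradient implied by the documented loss:
//   d loss / d P(i)(j) = -scale * R(i)(j) / P(i)(j),
// with scale = 1/n when sizeAverage is true and 1 otherwise (assumption).
// Not BigDL's implementation.
object PGGrad {
  def gradInput(probs: Array[Array[Double]], target: Array[Array[Double]], sizeAverage: Boolean): Array[Array[Double]] = {
    require(probs.length == target.length, "input and target must have the same number of rows")
    val scale = if (sizeAverage) 1.0 / probs.length else 1.0
    probs.zip(target).map { case (p, r) =>
      require(p.length == r.length, "input and target rows must have the same width")
      p.zip(r).map { case (pj, rj) =>
        // Unsampled actions (reward 0) receive zero gradient.
        if (rj == 0.0) 0.0 else -scale * rj / pj
      }
    }
  }
}

val probs  = Array(Array(0.2, 0.5, 0.3), Array(0.1, 0.1, 0.8))
val target = Array(Array(0.0, 2.0, 0.0), Array(0.0, 0.0, -1.0))
val grad = PGGrad.gradInput(probs, target, sizeAverage = true)
grad.foreach(row => println(row.mkString("[", ", ", "]")))
```

For a sampled action with a positive reward, the gradient is negative, so a gradient-descent step raises that action's probability. A negative reward lowers it.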
- def updateOutput(input: Tensor[T], target: Tensor[T]): T
Computes the loss using the input and the objective function. The result is returned and is also stored in the output field.
- input
input of the criterion
- target
target or labels
- returns
the loss of the criterion
- Definition Classes
- PGCriterion → AbstractCriterion
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )