random generator seed, default=2018
------------------------------------------------------------------------------------------------------------ REGULARIZATION
l2 regularization coefficient for weights, default=0.0
l1 regularization coefficient for weights, default=0.0
l2 regularization coefficient for biases, default=0.0
l1 regularization coefficient for biases, default=0.0
whether to use weight noise (drop-connect), default=false
weight retain probability for the weight noise (drop-connect), default=1 (no drop-connect)
whether to apply the weight noise (drop-connect) to biases as well, default=false
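The regularization options above correspond to Deeplearning4j's configuration API (the .java files referenced in this document are DL4J/ND4J classes). A minimal sketch, assuming the stock DL4J NeuralNetConfiguration builder; the class name and the coefficient values are illustrative only, and the wrapper builders listed at the end of this section may expose different method names:

    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.weightnoise.DropConnect;

    public class RegularizationConfigSketch {
        public static void main(String[] args) {
            NeuralNetConfiguration.Builder conf = new NeuralNetConfiguration.Builder()
                    .seed(2018)                         // random generator seed
                    .l2(1e-4)                           // L2 coefficient for weights (illustrative value)
                    .l1(0.0)                            // L1 coefficient for weights
                    .l2Bias(0.0)                        // L2 coefficient for biases
                    .l1Bias(0.0)                        // L1 coefficient for biases
                    .weightNoise(new DropConnect(0.9)); // drop-connect with weight retain probability 0.9
        }
    }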
------------------------------------------------------------------------------------------------------------------ OPTIMIZATION
optimization algorithm, default=STOCHASTIC_GRADIENT_DESCENT. Options:
  STOCHASTIC_GRADIENT_DESCENT  //StochasticGradientDescent.java
  LINE_GRADIENT_DESCENT        //LineGradientDescent.java
  CONJUGATE_GRADIENT           //ConjugateGradient.java
  LBFGS                        //LBFGS.java
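A minimal sketch of selecting the optimization algorithm, assuming DL4J's OptimizationAlgorithm enum (which defines exactly these four constants); the class name is hypothetical:

    import org.deeplearning4j.nn.api.OptimizationAlgorithm;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;

    public class OptimizationAlgoSketch {
        public static void main(String[] args) {
            NeuralNetConfiguration.Builder conf = new NeuralNetConfiguration.Builder()
                    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT) // the default
                    .miniBatch(true); // mini-batch flag, described below
        }
    }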
whether to use mini-batch, default=true
learning rate, default=0.1
gradient moving average decay rate (beta1), default=0.9
squared-gradient decay rate (beta2), default=0.999
epsilon (numerical stability term), default=1E-8
NESTEROVS momentum, default=0.9
RMSPROP decay rate, default=0.95
ADADELTA decay rate, default=0.95
weights updater (default = NESTEROVS). Options:
  SGD  //Sgd.java
    learningRate: learning rate, default = 1e-3
  ADAM  //Adam.java  //Adam: A Method for Stochastic Optimization
    learningRate: learning rate, DEFAULT_ADAM_LEARNING_RATE = 1e-3
    beta1: gradient moving average decay rate, DEFAULT_ADAM_BETA1_MEAN_DECAY = 0.9
    beta2: squared-gradient decay rate, DEFAULT_ADAM_BETA2_VAR_DECAY = 0.999
    epsilon: numerical stability term, DEFAULT_ADAM_EPSILON = 1e-8
  ADAMAX  //AdaMax.java  //Adam: A Method for Stochastic Optimization
    learningRate: learning rate, DEFAULT_ADAMAX_LEARNING_RATE = 1e-3
    beta1: gradient moving average decay rate, DEFAULT_ADAMAX_BETA1_MEAN_DECAY = 0.9
    beta2: squared-gradient decay rate, DEFAULT_ADAMAX_BETA2_VAR_DECAY = 0.999
    epsilon: numerical stability term, DEFAULT_ADAMAX_EPSILON = 1e-8
  NADAM  //Nadam.java  //An overview of gradient descent optimization algorithms
    learningRate: learning rate, DEFAULT_NADAM_LEARNING_RATE = 1e-3
    beta1: gradient moving average decay rate, DEFAULT_NADAM_BETA1_MEAN_DECAY = 0.9
    beta2: squared-gradient decay rate, DEFAULT_NADAM_BETA2_VAR_DECAY = 0.999
    epsilon: numerical stability term, DEFAULT_NADAM_EPSILON = 1e-8
  AMSGRAD  //AMSGrad.java
    learningRate: learning rate, DEFAULT_AMSGRAD_LEARNING_RATE = 1e-3
    beta1: gradient moving average decay rate, DEFAULT_AMSGRAD_BETA1_MEAN_DECAY = 0.9
    beta2: squared-gradient decay rate, DEFAULT_AMSGRAD_BETA2_VAR_DECAY = 0.999
    epsilon: numerical stability term, DEFAULT_AMSGRAD_EPSILON = 1e-8
  ADAGRAD: vectorized learning rate, maintained per connection weight  //AdaGrad.java
    learningRate: learning rate, DEFAULT_ADAGRAD_LEARNING_RATE = 1e-1
    epsilon: numerical stability term, DEFAULT_ADAGRAD_EPSILON = 1e-6
    //Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
    //Adagrad - eliminating learning rates in stochastic gradient descent
  NESTEROVS: tracks the previous gradient and uses it as momentum for the current update  //Nesterovs.java
    learningRate: learning rate, DEFAULT_NESTEROV_LEARNING_RATE = 0.1
    momentum: momentum, DEFAULT_NESTEROV_MOMENTUM = 0.9
  RMSPROP  //RmsProp.java  //Neural Networks for Machine Learning
    learningRate: learning rate, DEFAULT_RMSPROP_LEARNING_RATE = 1e-1
    rmsDecay: decay rate, DEFAULT_RMSPROP_RMSDECAY = 0.95
    epsilon: numerical stability term, DEFAULT_RMSPROP_EPSILON = 1e-8
  ADADELTA  //AdaDelta.java  //ADADELTA: An Adaptive Learning Rate Method
    rho: decay rate controlling the decay of the previous parameter updates, DEFAULT_ADADELTA_RHO = 0.95
    epsilon: numerical stability term, DEFAULT_ADADELTA_EPSILON = 1e-6
    (no need to set the learning rate manually)
  NONE: no updates  //NoOp.java
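The updater names and DEFAULT_* constants above come from ND4J's updater classes. A minimal sketch of plugging an updater into the configuration, assuming the org.nd4j.linalg.learning.config updaters; the class name is hypothetical and constructor argument order is noted in the comments:

    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.nd4j.linalg.learning.config.Adam;
    import org.nd4j.linalg.learning.config.Nesterovs;

    public class UpdaterSketch {
        public static void main(String[] args) {
            // Default updater: Nesterov momentum (learningRate, momentum)
            NeuralNetConfiguration.Builder conf = new NeuralNetConfiguration.Builder()
                    .updater(new Nesterovs(0.1, 0.9));

            // Or Adam with explicit hyperparameters (learningRate, beta1, beta2, epsilon)
            conf.updater(new Adam(1e-3, 0.9, 0.999, 1e-8));
        }
    }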
gradient normalization, default=None. Options (values of the GradientNormalization enum):
  ClipElementWiseAbsoluteValue: clip each gradient element-wise: g <- sign(g) * min(maxAllowedValue, |g|)
  ClipL2PerLayer: conditional renormalization: GOut = G if l2Norm(G) < threshold (i.e., no change), otherwise GOut = threshold * G / l2Norm(G)
  ClipL2PerParamType: conditional renormalization, very similar to ClipL2PerLayer, but clips each parameter type separately instead of per layer
  None: no gradient normalization
  RenormalizeL2PerLayer: rescale gradients by dividing by the L2 norm of all gradients for the layer
  RenormalizeL2PerParamType: rescale gradients by dividing by the L2 norm of the gradients, separately per parameter type: GOut_weight = G_weight / l2(G_weight), GOut_bias = G_bias / l2(G_bias)
gradient normalization threshold (used by the Clip* options above), default=0.5
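A minimal sketch of enabling gradient clipping, assuming DL4J's GradientNormalization enum (whose constants match the list above); the class name is hypothetical:

    import org.deeplearning4j.nn.conf.GradientNormalization;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;

    public class GradientNormalizationSketch {
        public static void main(String[] args) {
            NeuralNetConfiguration.Builder conf = new NeuralNetConfiguration.Builder()
                    .gradientNormalization(GradientNormalization.ClipL2PerLayer)
                    .gradientNormalizationThreshold(0.5); // default threshold
        }
    }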
-------------------------------------------------------------------------------------------------------------------------------------------
Base Configuration Builder
Optimization Configuration Builder
Regularization Configuration Builder
Neural Network Base
------------------------------------------------------------- BASE