Class SDNN


  • public class SDNN
    extends SDOps
    • Constructor Detail

      • SDNN

        public SDNN​(SameDiff sameDiff)
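        In practice, an SDNN instance is usually obtained from an existing SameDiff graph rather than constructed directly. A minimal sketch (assuming an ND4J version where SameDiff exposes the nn() namespace accessor; shapes are illustrative):

        import org.nd4j.autodiff.samediff.SDVariable;
        import org.nd4j.autodiff.samediff.SameDiff;
        import org.nd4j.autodiff.samediff.ops.SDNN;
        import org.nd4j.linalg.api.buffer.DataType;

        SameDiff sd = SameDiff.create();
        SDNN nn = sd.nn();                                            // namespace object holding the ops below
        SDVariable x = sd.placeHolder("x", DataType.FLOAT, -1, 10);   // [minibatch, 10]
        SDVariable y = nn.relu(x, 0.0);                               // e.g. the relu op documented below

        The later sketches on this page assume the same imports (plus org.nd4j.linalg.factory.Nd4j where arrays are created) and an sd built as above.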
    • Method Detail

      • cReLU

        public SDVariable cReLU​(SDVariable x)
        Concatenates a ReLU which selects only the positive part of the activation with a ReLU which selects only the negative part of the activation. Note that as a result this non-linearity doubles the depth of the activations.
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • cReLU

        public SDVariable cReLU​(String name,
                                SDVariable x)
        Concatenates a ReLU which selects only the positive part of the activation with a ReLU which selects only the negative part of the activation. Note that as a result this non-linearity doubles the depth of the activations.
        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
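        For illustration, a minimal sketch (shape values are assumptions): because the op concatenates the positive-part and negative-part ReLUs, the activation depth doubles, e.g. a feature dimension of 10 becomes 20.

        SDVariable x = sd.placeHolder("x", DataType.FLOAT, -1, 10);   // [minibatch, 10]
        SDVariable out = sd.nn().cReLU("crelu", x);                   // activation depth doubles, per the note above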
      • batchNorm

        public SDVariable batchNorm​(SDVariable input,
                                    SDVariable mean,
                                    SDVariable variance,
                                    SDVariable gamma,
                                    SDVariable beta,
                                    double epsilon,
                                    int... axis)
        Neural network batch normalization operation.
        For details, see https://arxiv.org/abs/1502.03167
        Parameters:
        input - Input variable. (NUMERIC type)
        mean - Mean value. For 1d axis, this should match input.size(axis) (NUMERIC type)
        variance - Variance value. For 1d axis, this should match input.size(axis) (NUMERIC type)
        gamma - Gamma value. For 1d axis, this should match input.size(axis) (NUMERIC type)
        beta - Beta value. For 1d axis, this should match input.size(axis) (NUMERIC type)
        epsilon - Epsilon constant for numerical stability (to avoid division by 0)
        axis - For 2d CNN activations: 1 for NCHW format activations, or 3 for NHWC format activations. For 3d CNN activations: 1 for NCDHW format, 4 for NDHWC. For 1d/RNN activations: 1 for NCW format, 2 for NWC (Size: AtLeast(min=1))
        Returns:
        output variable for batch normalization (NUMERIC type)
      • batchNorm

        public SDVariable batchNorm​(String name,
                                    SDVariable input,
                                    SDVariable mean,
                                    SDVariable variance,
                                    SDVariable gamma,
                                    SDVariable beta,
                                    double epsilon,
                                    int... axis)
        Neural network batch normalization operation.
        For details, see https://arxiv.org/abs/1502.03167
        Parameters:
        name - name May be null. Name for the output variable
        input - Input variable. (NUMERIC type)
        mean - Mean value. For 1d axis, this should match input.size(axis) (NUMERIC type)
        variance - Variance value. For 1d axis, this should match input.size(axis) (NUMERIC type)
        gamma - Gamma value. For 1d axis, this should match input.size(axis) (NUMERIC type)
        beta - Beta value. For 1d axis, this should match input.size(axis) (NUMERIC type)
        epsilon - Epsilon constant for numerical stability (to avoid division by 0)
        axis - For 2d CNN activations: 1 for NCHW format activations, or 3 for NHWC format activations. For 3d CNN activations: 1 for NCDHW format, 4 for NDHWC. For 1d/RNN activations: 1 for NCW format, 2 for NWC (Size: AtLeast(min=1))
        Returns:
        output variable for batch normalization (NUMERIC type)
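        A minimal sketch for NCHW activations (channel count and spatial sizes are assumptions): with axis = 1, the mean, variance, gamma and beta arrays each have shape [channels], matching input.size(1).

        SDVariable in       = sd.placeHolder("in", DataType.FLOAT, -1, 16, 32, 32);  // NCHW: [minibatch, channels, h, w]
        SDVariable mean     = sd.var("mean",  Nd4j.zeros(DataType.FLOAT, 16));
        SDVariable variance = sd.var("var",   Nd4j.ones(DataType.FLOAT, 16));
        SDVariable gamma    = sd.var("gamma", Nd4j.ones(DataType.FLOAT, 16));
        SDVariable beta     = sd.var("beta",  Nd4j.zeros(DataType.FLOAT, 16));
        SDVariable bn = sd.nn().batchNorm(in, mean, variance, gamma, beta, 1e-5, 1); // axis = 1 for NCHW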
      • biasAdd

        public SDVariable biasAdd​(SDVariable input,
                                  SDVariable bias,
                                  boolean nchw)
        Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector
        Parameters:
        input - 4d input variable (NUMERIC type)
        bias - 1d bias (NUMERIC type)
        nchw - The format: nchw=true means [minibatch, channels, height, width]; nchw=false means [minibatch, height, width, channels]. Unused for 2d inputs
        Returns:
        output Output variable, after applying bias add operation (NUMERIC type)
      • biasAdd

        public SDVariable biasAdd​(String name,
                                  SDVariable input,
                                  SDVariable bias,
                                  boolean nchw)
        Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector
        Parameters:
        name - name May be null. Name for the output variable
        input - 4d input variable (NUMERIC type)
        bias - 1d bias (NUMERIC type)
        nchw - The format: nchw=true means [minibatch, channels, height, width]; nchw=false means [minibatch, height, width, channels]. Unused for 2d inputs
        Returns:
        output Output variable, after applying bias add operation (NUMERIC type)
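        A minimal sketch (shapes are assumptions): for NCHW activations the 1d bias has one entry per channel.

        SDVariable act  = sd.placeHolder("act", DataType.FLOAT, -1, 16, 32, 32); // NCHW activations
        SDVariable bias = sd.var("b", Nd4j.zeros(DataType.FLOAT, 16));           // one bias value per channel
        SDVariable out  = sd.nn().biasAdd(act, bias, true);                      // nchw = true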
      • dotProductAttention

        public SDVariable dotProductAttention​(SDVariable queries,
                                              SDVariable keys,
                                              SDVariable values,
                                              SDVariable mask,
                                              boolean scaled)
        This operation performs dot product attention on the given timeseries input with the given queries
        out = sum(similarity(k_i, q) * v_i)

        similarity(k, q) = softmax(k * q) where k * q is the dot product of k and q

        Optionally with a normalization step:
        similarity(k, q) = softmax(k * q / sqrt(size(q)))

        See also "Attention is all you need" (https://arxiv.org/abs/1706.03762, p. 4, eq. 1)

        Note: This supports multiple queries at once; if only one query is available, the queries array still has to
        be 3D but can have queryCount = 1

        Note: keys and values are usually the same array. If you want to use the same array for both, simply pass it for
        both.

        Note: Queries, keys and values must either be all rank 3 or all rank 4 arrays. Mixing them doesn't work. The
        output rank will depend on the input rank.
        Parameters:
        queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount] or 4D array of shape [batchSize, numHeads, featureKeys, queryCount] (NUMERIC type)
        keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps] or 4D array of shape [batchSize, numHeads, featureKeys, timesteps] (NUMERIC type)
        values - input 3D array "values" of shape [batchSize, featureValues, timesteps] or 4D array of shape [batchSize, numHeads, featureValues, timesteps] (NUMERIC type)
        mask - OPTIONAL; array that defines which values should be skipped of shape [batchSize, timesteps] (NUMERIC type)
        scaled - normalization, false -> do not apply normalization, true -> apply normalization
        Returns:
        output Attention result arrays of shape [batchSize, featureValues, queryCount] or [batchSize, numHeads, featureValues, queryCount], (optionally) Attention Weights of shape [batchSize, timesteps, queryCount] or [batchSize, numHeads, timesteps, queryCount] (NUMERIC type)
      • dotProductAttention

        public SDVariable dotProductAttention​(String name,
                                              SDVariable queries,
                                              SDVariable keys,
                                              SDVariable values,
                                              SDVariable mask,
                                              boolean scaled)
        This operation performs dot product attention on the given timeseries input with the given queries
        out = sum(similarity(k_i, q) * v_i)

        similarity(k, q) = softmax(k * q) where k * q is the dot product of k and q

        Optionally with a normalization step:
        similarity(k, q) = softmax(k * q / sqrt(size(q)))

        See also "Attention is all you need" (https://arxiv.org/abs/1706.03762, p. 4, eq. 1)

        Note: This supports multiple queries at once; if only one query is available, the queries array still has to
        be 3D but can have queryCount = 1

        Note: keys and values are usually the same array. If you want to use the same array for both, simply pass it for
        both.

        Note: Queries, keys and values must either be all rank 3 or all rank 4 arrays. Mixing them doesn't work. The
        output rank will depend on the input rank.
        Parameters:
        name - name May be null. Name for the output variable
        queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount] or 4D array of shape [batchSize, numHeads, featureKeys, queryCount] (NUMERIC type)
        keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps] or 4D array of shape [batchSize, numHeads, featureKeys, timesteps] (NUMERIC type)
        values - input 3D array "values" of shape [batchSize, featureValues, timesteps] or 4D array of shape [batchSize, numHeads, featureValues, timesteps] (NUMERIC type)
        mask - OPTIONAL; array that defines which values should be skipped of shape [batchSize, timesteps] (NUMERIC type)
        scaled - normalization, false -> do not apply normalization, true -> apply normalization
        Returns:
        output Attention result arrays of shape [batchSize, featureValues, queryCount] or [batchSize, numHeads, featureValues, queryCount], (optionally) Attention Weights of shape [batchSize, timesteps, queryCount] or [batchSize, numHeads, timesteps, queryCount] (NUMERIC type)
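        A minimal sketch using the rank 3 shapes from the parameter docs (sizes are assumptions). Since the mask is documented as optional, it is passed as null here; scaled = true applies the 1/sqrt(size(q)) normalization.

        SDVariable q = sd.placeHolder("q", DataType.FLOAT, -1, 64, 1);     // [batchSize, featureKeys, queryCount]
        SDVariable k = sd.placeHolder("k", DataType.FLOAT, -1, 64, 20);    // [batchSize, featureKeys, timesteps]
        SDVariable v = sd.placeHolder("v", DataType.FLOAT, -1, 64, 20);    // [batchSize, featureValues, timesteps]
        SDVariable att = sd.nn().dotProductAttention(q, k, v, null, true); // -> [batchSize, featureValues, queryCount]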
      • dropout

        public SDVariable dropout​(SDVariable input,
                                  double inputRetainProbability)
        Dropout operation
        Parameters:
        input - Input array (NUMERIC type)
        inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)
        Returns:
        output Output (NUMERIC type)
      • dropout

        public SDVariable dropout​(String name,
                                  SDVariable input,
                                  double inputRetainProbability)
        Dropout operation
        Parameters:
        name - name May be null. Name for the output variable
        input - Input array (NUMERIC type)
        inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)
        Returns:
        output Output (NUMERIC type)
      • dropoutInverted

        public SDVariable dropoutInverted​(SDVariable input,
                                          double p)
        Dropout inverted operation. The dropout probability p is the probability of dropping an input.
        Parameters:
        input - Input array (NUMERIC type)
        p - Probability of dropping an input (set to 0 with probability p)
        Returns:
        output Output (NUMERIC type)
      • dropoutInverted

        public SDVariable dropoutInverted​(String name,
                                          SDVariable input,
                                          double p)
        Dropout inverted operation. The dropout probability p is the probability of dropping an input.
        Parameters:
        name - name May be null. Name for the output variable
        input - Input array (NUMERIC type)
        p - Probability of dropping an input (set to 0 with probability p)
        Returns:
        output Output (NUMERIC type)
      • elu

        public SDVariable elu​(SDVariable x)
        Element-wise exponential linear unit (ELU) function:
        out = x if x > 0
        out = a * (exp(x) - 1) if x <= 0
        with constant a = 1.0


        See: https://arxiv.org/abs/1511.07289

        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • elu

        public SDVariable elu​(String name,
                              SDVariable x)
        Element-wise exponential linear unit (ELU) function:
        out = x if x > 0
        out = a * (exp(x) - 1) if x <= 0
        with constant a = 1.0


        See: https://arxiv.org/abs/1511.07289

        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • gelu

        public SDVariable gelu​(SDVariable x)
        GELU activation function - Gaussian Error Linear Units
        For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415
        This method uses the sigmoid approximation
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • gelu

        public SDVariable gelu​(String name,
                               SDVariable x)
        GELU activation function - Gaussian Error Linear Units
        For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415
        This method uses the sigmoid approximation
        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • hardSigmoid

        public SDVariable hardSigmoid​(SDVariable x)
        Element-wise hard sigmoid function:
        out[i] = 0 if in[i] <= -2.5
        out[i] = 0.2*in[i]+0.5 if -2.5 < in[i] < 2.5
        out[i] = 1 if in[i] >= 2.5
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • hardSigmoid

        public SDVariable hardSigmoid​(String name,
                                      SDVariable x)
        Element-wise hard sigmoid function:
        out[i] = 0 if in[i] <= -2.5
        out[i] = 0.2*in[i]+0.5 if -2.5 < in[i] < 2.5
        out[i] = 1 if in[i] >= 2.5
        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • hardTanh

        public SDVariable hardTanh​(SDVariable x)
        Element-wise hard tanh function:
        out[i] = -1 if in[i] <= -1
        out[i] = in[i] if -1 < in[i] < 1
        out[i] = 1 if in[i] >= 1
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • hardTanh

        public SDVariable hardTanh​(String name,
                                   SDVariable x)
        Element-wise hard tanh function:
        out[i] = -1 if in[i] <= -1
        out[i] = in[i] if -1 < in[i] < 1
        out[i] = 1 if in[i] >= 1
        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • hardTanhDerivative

        public SDVariable hardTanhDerivative​(SDVariable x)
        Derivative (dOut/dIn) of the element-wise hard Tanh function - hardTanh(INDArray)
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • hardTanhDerivative

        public SDVariable hardTanhDerivative​(String name,
                                             SDVariable x)
        Derivative (dOut/dIn) of the element-wise hard Tanh function - hardTanh(INDArray)
        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • layerNorm

        public SDVariable layerNorm​(SDVariable input,
                                    SDVariable gain,
                                    SDVariable bias,
                                    boolean channelsFirst,
                                    int... dimensions)
        Apply Layer Normalization

        y = gain * standardize(x) + bias
        Parameters:
        input - Input variable (NUMERIC type)
        gain - Gain (NUMERIC type)
        bias - Bias (NUMERIC type)
        channelsFirst - For 2D input - unused. True for NCHW (minibatch, channels, height, width), false for NHWC data
        dimensions - Dimensions to perform layer norm over - dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))
        Returns:
        output Output variable (NUMERIC type)
      • layerNorm

        public SDVariable layerNorm​(String name,
                                    SDVariable input,
                                    SDVariable gain,
                                    SDVariable bias,
                                    boolean channelsFirst,
                                    int... dimensions)
        Apply Layer Normalization

        y = gain * standardize(x) + bias
        Parameters:
        name - name May be null. Name for the output variable
        input - Input variable (NUMERIC type)
        gain - Gain (NUMERIC type)
        bias - Bias (NUMERIC type)
        channelsFirst - For 2D input - unused. True for NCHW (minibatch, channels, height, width), false for NHWC data
        dimensions - Dimensions to perform layer norm over - dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))
        Returns:
        output Output variable (NUMERIC type)
      • layerNorm

        public SDVariable layerNorm​(SDVariable input,
                                    SDVariable gain,
                                    boolean channelsFirst,
                                    int... dimensions)
        Apply Layer Normalization

        y = gain * standardize(x) + bias
        Parameters:
        input - Input variable (NUMERIC type)
        gain - Gain (NUMERIC type)
        channelsFirst - For 2D input - unused. True for NCHW (minibatch, channels, height, width), false for NHWC data
        dimensions - Dimensions to perform layer norm over - dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))
        Returns:
        output Output variable (NUMERIC type)
      • layerNorm

        public SDVariable layerNorm​(String name,
                                    SDVariable input,
                                    SDVariable gain,
                                    boolean channelsFirst,
                                    int... dimensions)
        Apply Layer Normalization

        y = gain * standardize(x) + bias
        Parameters:
        name - name May be null. Name for the output variable
        input - Input variable (NUMERIC type)
        gain - Gain (NUMERIC type)
        channelsFirst - For 2D input - unused. True for NCHW (minibatch, channels, height, width), false for NHWC data
        dimensions - Dimensions to perform layer norm over - dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))
        Returns:
        output Output variable (NUMERIC type)
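        A minimal sketch for 2d/MLP activations (feature size is an assumption): normalization is performed over dimension 1, with gain and bias of shape [featureSize]; channelsFirst is unused for 2D input.

        SDVariable in   = sd.placeHolder("in", DataType.FLOAT, -1, 128);   // [minibatch, featureSize]
        SDVariable gain = sd.var("g", Nd4j.ones(DataType.FLOAT, 128));
        SDVariable bias = sd.var("b", Nd4j.zeros(DataType.FLOAT, 128));
        SDVariable out  = sd.nn().layerNorm(in, gain, bias, true, 1);      // dimension = 1 for 2d data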
      • leakyRelu

        public SDVariable leakyRelu​(SDVariable x,
                                    double alpha)
        Element-wise leaky ReLU function:
        out = x if x >= 0.0
        out = alpha * x if x < 0.0
        The alpha value is most commonly set to 0.01
        Parameters:
        x - Input variable (NUMERIC type)
        alpha - Alpha value (slope for negative inputs) - commonly 0.01
        Returns:
        output Output variable (NUMERIC type)
      • leakyRelu

        public SDVariable leakyRelu​(String name,
                                    SDVariable x,
                                    double alpha)
        Element-wise leaky ReLU function:
        out = x if x >= 0.0
        out = alpha * x if x < 0.0
        The alpha value is most commonly set to 0.01
        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        alpha - Alpha value (slope for negative inputs) - commonly 0.01
        Returns:
        output Output variable (NUMERIC type)
      • leakyReluDerivative

        public SDVariable leakyReluDerivative​(SDVariable x,
                                              double alpha)
        Leaky ReLU derivative: dOut/dIn given input.
        Parameters:
        x - Input variable (NUMERIC type)
        alpha - Alpha value (slope for negative inputs) - commonly 0.01
        Returns:
        output Output variable (NUMERIC type)
      • leakyReluDerivative

        public SDVariable leakyReluDerivative​(String name,
                                              SDVariable x,
                                              double alpha)
        Leaky ReLU derivative: dOut/dIn given input.
        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        alpha - Alpha value (slope for negative inputs) - commonly 0.01
        Returns:
        output Output variable (NUMERIC type)
      • linear

        public SDVariable linear​(SDVariable input,
                                 SDVariable weights,
                                 SDVariable bias)
        Linear layer operation: out = mmul(in,w) + bias
        Note that bias array is optional
        Parameters:
        input - Input data (NUMERIC type)
        weights - Weights variable, shape [nIn, nOut] (NUMERIC type)
        bias - Optional bias variable (may be null) (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • linear

        public SDVariable linear​(String name,
                                 SDVariable input,
                                 SDVariable weights,
                                 SDVariable bias)
        Linear layer operation: out = mmul(in,w) + bias
        Note that bias array is optional
        Parameters:
        name - name May be null. Name for the output variable
        input - Input data (NUMERIC type)
        weights - Weights variable, shape [nIn, nOut] (NUMERIC type)
        bias - Optional bias variable (may be null) (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
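        A minimal sketch (nIn = 10, nOut = 4 are assumptions): the bias may be passed as null to omit it.

        SDVariable in  = sd.placeHolder("in", DataType.FLOAT, -1, 10);   // [minibatch, nIn]
        SDVariable w   = sd.var("w", Nd4j.rand(DataType.FLOAT, 10, 4));  // [nIn, nOut]
        SDVariable b   = sd.var("b", Nd4j.zeros(DataType.FLOAT, 4));     // [nOut], optional
        SDVariable out = sd.nn().linear(in, w, b);                       // out = mmul(in, w) + b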
      • logSigmoid

        public SDVariable logSigmoid​(SDVariable x)
        Element-wise log sigmoid function: out[i] = log(sigmoid(in[i]))
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • logSigmoid

        public SDVariable logSigmoid​(String name,
                                     SDVariable x)
        Element-wise log sigmoid function: out[i] = log(sigmoid(in[i]))
        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • logSoftmax

        public SDVariable logSoftmax​(SDVariable x)
        Log softmax activation
        Parameters:
        x - (NUMERIC type)
        Returns:
        output (NUMERIC type)
      • logSoftmax

        public SDVariable logSoftmax​(String name,
                                     SDVariable x)
        Log softmax activation
        Parameters:
        name - name May be null. Name for the output variable
        x - (NUMERIC type)
        Returns:
        output (NUMERIC type)
      • logSoftmax

        public SDVariable logSoftmax​(SDVariable x,
                                     int dimension)
        Log softmax activation
        Parameters:
        x - Input (NUMERIC type)
        dimension - Dimension along which to apply log softmax
        Returns:
        output Output - log(softmax(input)) (NUMERIC type)
      • logSoftmax

        public SDVariable logSoftmax​(String name,
                                     SDVariable x,
                                     int dimension)
        Log softmax activation
        Parameters:
        name - name May be null. Name for the output variable
        x - Input (NUMERIC type)
        dimension - Dimension along which to apply log softmax
        Returns:
        output Output - log(softmax(input)) (NUMERIC type)
      • multiHeadDotProductAttention

        public SDVariable multiHeadDotProductAttention​(SDVariable queries,
                                                       SDVariable keys,
                                                       SDVariable values,
                                                       SDVariable Wq,
                                                       SDVariable Wk,
                                                       SDVariable Wv,
                                                       SDVariable Wo,
                                                       SDVariable mask,
                                                       boolean scaled)
        This performs multi-headed dot product attention on the given timeseries input
        out = concat(head_1, head_2, ..., head_n) * Wo
        head_i = dot_product_attention(Wq_i*q, Wk_i*k, Wv_i*v)

        Optionally with normalization when calculating the attention for each head.

        See also "Attention is all you need" (https://arxiv.org/abs/1706.03762, pp. 4,5, "3.2.2 Multi-Head Attention")

        This makes use of dot_product_attention OP support for rank 4 inputs.
        see dotProductAttention(INDArray, INDArray, INDArray, INDArray, boolean, boolean)
        Parameters:
        queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount] (NUMERIC type)
        keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps] (NUMERIC type)
        values - input 3D array "values" of shape [batchSize, featureValues, timesteps] (NUMERIC type)
        Wq - input query projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
        Wk - input key projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
        Wv - input value projection weights of shape [numHeads, projectedValues, featureValues] (NUMERIC type)
        Wo - output projection weights of shape [numHeads * projectedValues, outSize] (NUMERIC type)
        mask - OPTIONAL; array that defines which values should be skipped of shape [batchSize, timesteps] (NUMERIC type)
        scaled - normalization, false -> do not apply normalization, true -> apply normalization
        Returns:
        output Attention result arrays of shape [batchSize, outSize, queryCount] (optionally) Attention Weights of shape [batchSize, numHeads, timesteps, queryCount] (NUMERIC type)
      • multiHeadDotProductAttention

        public SDVariable multiHeadDotProductAttention​(String name,
                                                       SDVariable queries,
                                                       SDVariable keys,
                                                       SDVariable values,
                                                       SDVariable Wq,
                                                       SDVariable Wk,
                                                       SDVariable Wv,
                                                       SDVariable Wo,
                                                       SDVariable mask,
                                                       boolean scaled)
        This performs multi-headed dot product attention on the given timeseries input
        out = concat(head_1, head_2, ..., head_n) * Wo
        head_i = dot_product_attention(Wq_i*q, Wk_i*k, Wv_i*v)

        Optionally with normalization when calculating the attention for each head.

        See also "Attention is all you need" (https://arxiv.org/abs/1706.03762, pp. 4,5, "3.2.2 Multi-Head Attention")

        This makes use of dot_product_attention OP support for rank 4 inputs.
        see dotProductAttention(INDArray, INDArray, INDArray, INDArray, boolean, boolean)
        Parameters:
        name - name May be null. Name for the output variable
        queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount] (NUMERIC type)
        keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps] (NUMERIC type)
        values - input 3D array "values" of shape [batchSize, featureValues, timesteps] (NUMERIC type)
        Wq - input query projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
        Wk - input key projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
        Wv - input value projection weights of shape [numHeads, projectedValues, featureValues] (NUMERIC type)
        Wo - output projection weights of shape [numHeads * projectedValues, outSize] (NUMERIC type)
        mask - OPTIONAL; array that defines which values should be skipped of shape [batchSize, timesteps] (NUMERIC type)
        scaled - normalization, false -> do not apply normalization, true -> apply normalization
        Returns:
        output Attention result arrays of shape [batchSize, outSize, queryCount] (optionally) Attention Weights of shape [batchSize, numHeads, timesteps, queryCount] (NUMERIC type)
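        A minimal sketch with assumed sizes (numHeads = 4, featureKeys = featureValues = 64, projectedKeys = projectedValues = 16, outSize = 64); the optional mask is passed as null.

        SDVariable q  = sd.placeHolder("q", DataType.FLOAT, -1, 64, 1);      // [batchSize, featureKeys, queryCount]
        SDVariable k  = sd.placeHolder("k", DataType.FLOAT, -1, 64, 20);     // [batchSize, featureKeys, timesteps]
        SDVariable v  = sd.placeHolder("v", DataType.FLOAT, -1, 64, 20);     // [batchSize, featureValues, timesteps]
        SDVariable wq = sd.var("Wq", Nd4j.rand(DataType.FLOAT, 4, 16, 64));  // [numHeads, projectedKeys, featureKeys]
        SDVariable wk = sd.var("Wk", Nd4j.rand(DataType.FLOAT, 4, 16, 64));  // [numHeads, projectedKeys, featureKeys]
        SDVariable wv = sd.var("Wv", Nd4j.rand(DataType.FLOAT, 4, 16, 64));  // [numHeads, projectedValues, featureValues]
        SDVariable wo = sd.var("Wo", Nd4j.rand(DataType.FLOAT, 4 * 16, 64)); // [numHeads * projectedValues, outSize]
        SDVariable out = sd.nn().multiHeadDotProductAttention(q, k, v, wq, wk, wv, wo, null, true);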
      • pad

        public SDVariable pad​(SDVariable input,
                              SDVariable padding,
                              PadMode PadMode,
                              double constant)
        Padding operation
        Parameters:
        input - Input tensor (NUMERIC type)
        padding - Padding value (NUMERIC type)
        PadMode - Padding format
        constant - Padding constant
        Returns:
        output Padded input (NUMERIC type)
      • pad

        public SDVariable pad​(String name,
                              SDVariable input,
                              SDVariable padding,
                              PadMode PadMode,
                              double constant)
        Padding operation
        Parameters:
        name - name May be null. Name for the output variable
        input - Input tensor (NUMERIC type)
        padding - Padding value (NUMERIC type)
        PadMode - Padding format
        constant - Padding constant
        Returns:
        output Padded input (NUMERIC type)
      • pad

        public SDVariable pad​(SDVariable input,
                              SDVariable padding,
                              double constant)
        Padding operation
        Parameters:
        input - Input tensor (NUMERIC type)
        padding - Padding value (NUMERIC type)
        constant - Padding constant
        Returns:
        output Padded input (NUMERIC type)
      • pad

        public SDVariable pad​(String name,
                              SDVariable input,
                              SDVariable padding,
                              double constant)
        Padding operation
        Parameters:
        name - name May be null. Name for the output variable
        input - Input tensor (NUMERIC type)
        padding - Padding value (NUMERIC type)
        constant - Padding constant
        Returns:
        output Padded input (NUMERIC type)
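        A minimal sketch of constant padding (the [rank, 2] before/after layout of the padding array is an assumption based on the underlying pad op; sizes are illustrative):

        SDVariable in  = sd.placeHolder("in", DataType.FLOAT, 3, 3);
        SDVariable pad = sd.constant(Nd4j.createFromArray(new int[][]{{1, 1}, {2, 2}})); // pad 1 row before/after, 2 columns before/after
        SDVariable out = sd.nn().pad(in, pad, 0.0);                                      // fill the new cells with 0.0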
      • preciseGelu

        public SDVariable preciseGelu​(SDVariable x)
        GELU activation function - Gaussian Error Linear Units
        For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415
        This method uses the precise method
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • preciseGelu

        public SDVariable preciseGelu​(String name,
                                      SDVariable x)
        GELU activation function - Gaussian Error Linear Units
        For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415
        This method uses the precise method
        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • prelu

        public SDVariable prelu​(SDVariable input,
                                SDVariable alpha,
                                int... sharedAxes)
        PReLU (Parameterized Rectified Linear Unit) operation. Like LeakyReLU with a learnable alpha:
        out[i] = in[i] if in[i] >= 0
        out[i] = in[i] * alpha[i] otherwise

        sharedAxes allows you to share learnable parameters along axes.
        For example, if the input has shape [batchSize, channels, height, width]
        and you want each channel to have its own cutoff, use sharedAxes = [2, 3] and an
        alpha with shape [channels].
        Parameters:
        input - Input data (NUMERIC type)
        alpha - The cutoff (alpha) variable. Note that the batch dimension (the 0th dimension, whether or not it actually represents a batch) should not be part of alpha. (NUMERIC type)
        sharedAxes - Which axes to share cutoff parameters along. (Size: AtLeast(min=1))
        Returns:
        output Output (NUMERIC type)
      • prelu

        public SDVariable prelu​(String name,
                                SDVariable input,
                                SDVariable alpha,
                                int... sharedAxes)
        PReLU (Parameterized Rectified Linear Unit) operation. Like LeakyReLU with a learnable alpha:
        out[i] = in[i] if in[i] >= 0
        out[i] = in[i] * alpha[i] otherwise

        sharedAxes allows you to share learnable parameters along axes.
        For example, if the input has shape [batchSize, channels, height, width]
        and you want each channel to have its own cutoff, use sharedAxes = [2, 3] and an
        alpha with shape [channels].
        Parameters:
        name - name May be null. Name for the output variable
        input - Input data (NUMERIC type)
        alpha - The cutoff (alpha) variable. Note that the batch dimension (the 0th dimension, whether or not it actually represents a batch) should not be part of alpha. (NUMERIC type)
        sharedAxes - Which axes to share cutoff parameters along. (Size: AtLeast(min=1))
        Returns:
        output Output (NUMERIC type)
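        A minimal sketch following the example in the description (channel count is an assumption): for [batchSize, channels, height, width] input with one learnable alpha per channel, share the parameters over the height/width axes.

        SDVariable in    = sd.placeHolder("in", DataType.FLOAT, -1, 16, 32, 32);     // [batchSize, channels, height, width]
        SDVariable alpha = sd.var("alpha", Nd4j.ones(DataType.FLOAT, 16).mul(0.01)); // one alpha per channel
        SDVariable out   = sd.nn().prelu(in, alpha, 2, 3);                           // sharedAxes = [2, 3]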
      • relu

        public SDVariable relu​(SDVariable x,
                               double cutoff)
        Element-wise rectified linear function with specified cutoff:
        out[i] = in[i] if in[i] >= cutoff
        out[i] = 0 otherwise
        Parameters:
        x - Input (NUMERIC type)
        cutoff - Cutoff value for ReLU operation - x > cutoff ? x : 0. Usually 0
        Returns:
        output Output (NUMERIC type)
      • relu

        public SDVariable relu​(String name,
                               SDVariable x,
                               double cutoff)
        Element-wise rectified linear function with specified cutoff:
        out[i] = in[i] if in[i] >= cutoff
        out[i] = 0 otherwise
        Parameters:
        name - name May be null. Name for the output variable
        x - Input (NUMERIC type)
        cutoff - Cutoff value for ReLU operation - x > cutoff ? x : 0. Usually 0
        Returns:
        output Output (NUMERIC type)
      • relu6

        public SDVariable relu6​(SDVariable x,
                                double cutoff)
        Element-wise "rectified linear 6" function with specified cutoff:
        out[i] = min(max(in, cutoff), 6)
        Parameters:
        x - Input (NUMERIC type)
        cutoff - Cutoff value for ReLU operation. Usually 0
        Returns:
        output Output (NUMERIC type)
      • relu6

        public SDVariable relu6​(String name,
                                SDVariable x,
                                double cutoff)
        Element-wise "rectified linear 6" function with specified cutoff:
        out[i] = min(max(in, cutoff), 6)
        Parameters:
        name - name May be null. Name for the output variable
        x - Input (NUMERIC type)
        cutoff - Cutoff value for ReLU operation. Usually 0
        Returns:
        output Output (NUMERIC type)
      • reluLayer

        public SDVariable reluLayer​(SDVariable input,
                                    SDVariable weights,
                                    SDVariable bias)
        ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in,w) + bias)
        Note that bias array is optional
        Parameters:
        input - Input data (NUMERIC type)
        weights - Weights variable (NUMERIC type)
        bias - Optional bias variable (may be null) (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • reluLayer

        public SDVariable reluLayer​(String name,
                                    SDVariable input,
                                    SDVariable weights,
                                    SDVariable bias)
        ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in,w) + bias)
        Note that bias array is optional
        Parameters:
        name - name May be null. Name for the output variable
        input - Input data (NUMERIC type)
        weights - Weights variable (NUMERIC type)
        bias - Optional bias variable (may be null) (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • selu

        public SDVariable selu​(SDVariable x)
        Element-wise SELU function - Scaled Exponential Linear Unit: see Self-Normalizing Neural Networks (https://arxiv.org/abs/1706.02515)

        out[i] = scale * in[i] if in[i] > 0, or scale * alpha * (exp(in[i]) - 1) if in[i] <= 0
        Uses the default scale and alpha values.
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • selu

        public SDVariable selu​(String name,
                               SDVariable x)
        Element-wise SELU function - Scaled Exponential Linear Unit: see Self-Normalizing Neural Networks (https://arxiv.org/abs/1706.02515)

        out[i] = scale * in[i] if in[i] > 0, or scale * alpha * (exp(in[i]) - 1) if in[i] <= 0
        Uses the default scale and alpha values.
        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • sigmoid

        public SDVariable sigmoid​(SDVariable x)
        Element-wise sigmoid function: out[i] = 1.0/(1+exp(-in[i]))
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • sigmoid

        public SDVariable sigmoid​(String name,
                                  SDVariable x)
        Element-wise sigmoid function: out[i] = 1.0/(1+exp(-in[i]))
        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • sigmoidDerivative

        public SDVariable sigmoidDerivative​(SDVariable x,
                                            SDVariable wrt)
        Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut
        Parameters:
        x - Input Variable (NUMERIC type)
        wrt - Gradient at the output - dL/dOut. Must have same shape as the input (NUMERIC type)
        Returns:
        output Output (gradient at input of sigmoid) (NUMERIC type)
      • sigmoidDerivative

        public SDVariable sigmoidDerivative​(String name,
                                            SDVariable x,
                                            SDVariable wrt)
        Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut
        Parameters:
        name - name May be null. Name for the output variable
        x - Input Variable (NUMERIC type)
        wrt - Gradient at the output - dL/dOut. Must have same shape as the input (NUMERIC type)
        Returns:
        output Output (gradient at input of sigmoid) (NUMERIC type)
      • softmax

        public SDVariable softmax​(SDVariable x,
                                  int dimension)
        Softmax activation, along the specified dimension
        Parameters:
        x - Input (NUMERIC type)
        dimension - Dimension along which to apply softmax
        Returns:
        output Output variable (NUMERIC type)
      • softmax

        public SDVariable softmax​(String name,
                                  SDVariable x,
                                  int dimension)
        Softmax activation, along the specified dimension
        Parameters:
        name - name May be null. Name for the output variable
        x - Input (NUMERIC type)
        dimension - Dimension along which to apply softmax
        Returns:
        output Output variable (NUMERIC type)
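        A minimal sketch (class count is an assumption): for [minibatch, numClasses] activations, apply softmax along dimension 1 so each row sums to 1.

        SDVariable logits = sd.placeHolder("logits", DataType.FLOAT, -1, 10); // [minibatch, numClasses]
        SDVariable probs  = sd.nn().softmax(logits, 1);                       // softmax over the class dimension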
      • softmax

        public SDVariable softmax​(SDVariable x)
        Softmax activation, along the default dimension
        Parameters:
        x - Input (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • softmax

        public SDVariable softmax​(String name,
                                  SDVariable x)
        Softmax activation, along the default dimension
        Parameters:
        name - name May be null. Name for the output variable
        x - Input (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • softmaxDerivative

        public SDVariable softmaxDerivative​(SDVariable x,
                                            SDVariable wrt,
                                            int dimension)
        Softmax derivative function
        Parameters:
        x - Softmax input (NUMERIC type)
        wrt - Gradient at output, dL/dOut (NUMERIC type)
        dimension - Softmax dimension
        Returns:
        output (NUMERIC type)
      • softmaxDerivative

        public SDVariable softmaxDerivative​(String name,
                                            SDVariable x,
                                            SDVariable wrt,
                                            int dimension)
        Softmax derivative function
        Parameters:
        name - name May be null. Name for the output variable
        x - Softmax input (NUMERIC type)
        wrt - Gradient at output, dL/dOut (NUMERIC type)
        dimension - Softmax dimension
        Returns:
        output (NUMERIC type)
      • softplus

        public SDVariable softplus​(SDVariable x)
        Element-wise softplus function: out = log(exp(x) + 1)
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • softplus

        public SDVariable softplus​(String name,
                                   SDVariable x)
        Element-wise softplus function: out = log(exp(x) + 1)
        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • softsign

        public SDVariable softsign​(SDVariable x)
        Element-wise softsign function: out = x / (abs(x) + 1)
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • softsign

        public SDVariable softsign​(String name,
                                   SDVariable x)
        Element-wise softsign function: out = x / (abs(x) + 1)
        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • softsignDerivative

        public SDVariable softsignDerivative​(SDVariable x)
        Element-wise derivative (dOut/dIn) of the softsign function softsign(INDArray)
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output (NUMERIC type)
      • softsignDerivative

        public SDVariable softsignDerivative​(String name,
                                             SDVariable x)
        Element-wise derivative (dOut/dIn) of the softsign function softsign(INDArray)
        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        Returns:
        output Output (NUMERIC type)
      • swish

        public SDVariable swish​(String name,
                                SDVariable x)
        Element-wise "swish" function: out = x * sigmoid(b*x) with b=1.0
        See: https://arxiv.org/abs/1710.05941
        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • tanh

        public SDVariable tanh​(SDVariable x)
        Elementwise tanh (hyperbolic tangent) operation: out = tanh(x)
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • tanh

        public SDVariable tanh​(String name,
                               SDVariable x)
        Elementwise tanh (hyperbolic tangent) operation: out = tanh(x)
        Parameters:
        name - name May be null. Name for the output variable
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • topK

        public SDVariable[] topK​(SDVariable input,
                                 double k,
                                 boolean sorted)
        Find values and indices for the largest k entries along the last dimension.
        Parameters:
        input - Input data (NUMERIC type)
        k - The number of values to return
        sorted - Whether to return the values sorted or not
      • topK

        public SDVariable[] topK​(String[] names,
                                 SDVariable input,
                                 double k,
                                 boolean sorted)
        Find values and indices for the largest k entries along the last dimension.
        Parameters:
        names - names May be null. Arrays of names for the output variables.
        input - Input data (NUMERIC type)
        k - The number of values to return
        sorted - Whether to return the values sorted or not
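        A minimal sketch (sizes are assumptions): the op returns two variables; per the description they are the top values and their indices (the output ordering assumed here is values first, then indices).

        SDVariable in      = sd.placeHolder("in", DataType.FLOAT, -1, 100);
        SDVariable[] topk  = sd.nn().topK(in, 5, true); // largest 5 entries along the last dimension, sorted
        SDVariable values  = topk[0];                   // the top values (assumed output order)
        SDVariable indices = topk[1];                   // their indices (assumed output order)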