package dataset

Type Members

  1. trait AbstractDataSet[D, DataSequence] extends AnyRef

    A set of data which is used in the model optimization process. The dataset can be accessed as a sequence of randomly sampled data. In the training process, the data sequence is an endless looped sequence, while in the validation process it is a sequence of limited length. Users can call the data() method to get the data sequence.

    The order of the data is not fixed. It can be changed by the shuffle() method.

    Users can create a dataset from an RDD, an array, a folder, etc. The DataSet object provides many factory methods.

    D

    Data type

    DataSequence

    Represent a sequence of data
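
    The difference between the looped training sequence and the finite validation sequence can be illustrated with a small self-contained sketch. ToyLocalDataSet, its seed parameter and the train flag on data() are assumptions made for illustration only; this is not the library's implementation.

```scala
import scala.util.Random

// Toy stand-in for a local dataset; all names here are hypothetical.
object DataSequenceSketch {
  final class ToyLocalDataSet[T](private var buffer: Vector[T], seed: Long) {
    private val rng = new Random(seed)

    // Reorder the underlying data; later passes over the data see the new order.
    def shuffle(): Unit = { buffer = rng.shuffle(buffer) }

    // train = true: endless looped sequence (assumes the buffer is non-empty).
    // train = false: a single finite pass over the data.
    def data(train: Boolean): Iterator[T] =
      if (train) Iterator.continually(buffer).flatMap(_.iterator)
      else buffer.iterator

    def size: Int = buffer.length
  }
}
```

    In this sketch a training iterator never runs out, so the caller decides when to stop (for example with take), whereas a validation iterator ends after one pass.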

  2. class ArraySample[T] extends Sample[T]

    A kind of sample that uses only one array.

  3. case class ByteRecord(data: Array[Byte], label: Float) extends Product with Serializable

    A byte array and a label. It can contain anything.

  4. class CachedDistriDataSet[T] extends DistributedDataSet[T]

    Wrap an RDD as a DataSet.

  5. class ChainedTransformer[A, B, C] extends Transformer[A, C]

    A transformer that chains two transformers together. The output type of the first transformer must be the same as the input type of the second transformer.

    A

    input type of the first transformer

    B

    output type of the first transformer, as well as the input type of the last transformer

    C

    output type of the last transformer

  6. class DefaultPadding extends PaddingStrategy
  7. trait DistributedDataSet[T] extends AbstractDataSet[T, RDD[T]]

    Represent distributed data. Use the RDD to go through all the data.

  8. case class FixedLength(fixedLength: Array[Int]) extends PaddingStrategy with Product with Serializable

    Set the first dimension's length to a fixed length.

    fixedLength

    fixed length

  9. class Identity[A] extends Transformer[A, A]

    Pass the input through to the output unchanged.

  10. abstract class Image extends Serializable

    Represent an image

  11. trait Label[T] extends AnyRef

    Represent a label

  12. class LocalArrayDataSet[T] extends LocalDataSet[T]

    Wrap an array as a DataSet.

  13. trait LocalDataSet[T] extends AbstractDataSet[T, Iterator[T]]

    Manage some 'local' data, e.g. data in files or memory. An iterator is used to go through the data.

  14. class LocalImagePath extends AnyRef

    Represent a local file path of an image file

  15. case class LocalSeqFilePath(path: Path) extends Product with Serializable

    Represent a local file path of a hadoop sequence file

  16. trait MiniBatch[T] extends Serializable

    An interface for MiniBatch. A MiniBatch contains a few samples.

    T

    Numeric type

  17. case class PaddingLongest(paddingLength: Array[Int]) extends PaddingStrategy with Product with Serializable

    Add a constant length to the longest feature in the first dimension.

  18. case class PaddingParam[T](paddingTensor: Option[Array[Tensor[T]]] = None, paddingStrategy: PaddingStrategy = new DefaultPadding)(implicit evidence$14: ClassTag[T]) extends Serializable with Product

    Feature padding parameters for MiniBatch.

    To construct a mini-batch, all samples' features and labels in the mini-batch must have the same size. If the sizes differ, they are padded to the same size.

    By default, the first dimension is padded with zeros to the longest size in the MiniBatch. To specify the padding values, set paddingTensor; to specify the padding length, use PaddingLongest or FixedLength.

    For example, if your feature size is n*m*k, the padding tensor you provide should be a 2D tensor of size m*k. If your feature is 1D, you can provide a one-element 1D tensor.

    For example, suppose we have 3 Samples and convert them into a MiniBatch:

    Sample1's feature is a 2*3 tensor {1, 2, 3, 4, 5, 6}

    Sample2's feature is a 1*3 tensor {7, 8, 9}

    Sample3's feature is a 3*3 tensor {10, 11, 12, 13, 14, 15, 16, 17, 18}

    If the paddingTensor is {-1, -2, -3} and the strategy is FixedLength(Array(4)), the MiniBatch will be a 3*4*3 tensor:

    {1, 2, 3, 4, 5, 6, -1, -2, -3, -1, -2, -3

    7, 8, 9, -1, -2, -3, -1, -2, -3, -1, -2, -3

    10, 11, 12, 13, 14, 15, 16, 17, 18, -1, -2, -3}

    T

    numeric type

    paddingTensor

    padding tensors for the first dimension (None by default, meaning zero padding).

    paddingStrategy

    See PaddingLongest, FixedLength
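
    The padding rule described for PaddingParam can be sketched on plain Scala collections. The names PaddingSketch, PadToLongest, AddToLongest, Fixed and padBatch are hypothetical stand-ins for the default strategy, PaddingLongest and FixedLength; the sketch pads only the first dimension of 2D features, and rejecting a fixed length shorter than the longest feature is an assumption, not documented library behavior.

```scala
// Illustrative only: features are modeled as rows of Ints, not as Tensors.
object PaddingSketch {
  type Feature = Vector[Vector[Int]] // first dimension = number of rows

  sealed trait Strategy
  case object PadToLongest extends Strategy                  // role of the default padding
  final case class AddToLongest(extra: Int) extends Strategy // role of PaddingLongest
  final case class Fixed(length: Int) extends Strategy       // role of FixedLength

  // Length every feature's first dimension is padded to.
  def targetLength(features: Seq[Feature], strategy: Strategy): Int = {
    val longest = features.map(_.length).max
    strategy match {
      case PadToLongest        => longest
      case AddToLongest(extra) => longest + extra
      case Fixed(length) =>
        // Assumption of this sketch: a fixed length must fit the longest feature.
        require(length >= longest, s"fixed length $length < longest feature $longest")
        length
    }
  }

  // Append copies of paddingRow until each feature reaches the target length.
  def padBatch(features: Seq[Feature], paddingRow: Vector[Int], strategy: Strategy): Vector[Feature] = {
    val target = targetLength(features, strategy)
    features.map(f => f ++ Vector.fill(target - f.length)(paddingRow)).toVector
  }
}
```

    Calling padBatch with the three sample features from the example above, the padding row Vector(-1, -2, -3) and Fixed(4) gives three features of four rows each, matching the 3*4*3 layout described there.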

  19. abstract class PaddingStrategy extends Serializable
  20. abstract class Sample[T] extends Serializable

    Class that represents the features and labels of a data sample.

    T

    numeric type

  21. class SampleToMiniBatch[T] extends Transformer[Sample[T], MiniBatch[T]]

    Convert a sequence of Sample to a sequence of MiniBatch through function toMiniBatch.
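
    As a rough illustration of turning a sequence of samples into a sequence of mini-batches, the sketch below groups an iterator of samples into fixed-size chunks and merges each chunk. ToySample, ToyMiniBatch and the batchSize parameter are hypothetical; the library's own classes work on tensors and are not reproduced here.

```scala
// Illustrative only: simplified sample and mini-batch types.
object BatchingSketch {
  final case class ToySample(feature: Vector[Float], label: Float)

  final case class ToyMiniBatch(features: Vector[Vector[Float]], labels: Vector[Float]) {
    def size: Int = labels.length
  }

  // Merge a group of samples into one mini-batch.
  def toMiniBatch(samples: Seq[ToySample]): ToyMiniBatch =
    ToyMiniBatch(samples.map(_.feature).toVector, samples.map(_.label).toVector)

  // Lazily convert a stream of samples into a stream of mini-batches.
  // The last batch may be smaller when the input does not divide evenly.
  def sampleToMiniBatch(samples: Iterator[ToySample], batchSize: Int): Iterator[ToyMiniBatch] = {
    require(batchSize > 0, "batchSize must be positive")
    samples.grouped(batchSize).map(toMiniBatch)
  }
}
```

    This sketch assumes all features already have the same size; when they do not, a padding step such as the one described under PaddingParam would be needed before merging.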

  22. abstract class Sentence[T] extends Serializable

    Represent a sentence

  23. class SparseMiniBatch[T] extends ArrayTensorMiniBatch[T]

    SparseMiniBatch is a MiniBatch type for TensorSample. It can handle the SparseTensors in a TensorSample.

    T

    Numeric type

  24. class TensorSample[T] extends Sample[T]

    A kind of Sample that holds both DenseTensor and SparseTensor as features.

    T

    numeric type

  25. trait Transformer[A, B] extends Serializable

    Transform a data stream of type A to type B. It is usually used in the data pre-processing stage. Different transformers can be composed into a pipeline. For example, if there are transformer1 from A to B, transformer2 from B to C, and transformer3 from C to D, you can compose them into a bigger transformer from A to D by transformer1 -> transformer2 -> transformer3.

    The purpose of a transformer is code reuse. Many deep learning applications share common data pre-processing steps. Users needn't write them every time, but can reuse others' work.

    Transformer can be used with RDD (rdd.mapPartitions), iterator and DataSet.
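
    The composition described above can be sketched with a minimal, self-contained model in which a transformer is a function from Iterator[A] to Iterator[B]. The trait and class below are simplified re-declarations written for this example, and lift and pipeline are hypothetical helpers; none of this is the library source.

```scala
// Illustrative only: a simplified model of chaining transformers with ->.
object TransformerSketch {
  trait Transformer[A, B] extends Serializable { self =>
    def apply(prev: Iterator[A]): Iterator[B]

    // Chain this transformer with the next one; the compiler checks that
    // this transformer's output type matches the next one's input type.
    def ->[C](next: Transformer[B, C]): Transformer[A, C] =
      new ChainedTransformer(self, next)
  }

  final class ChainedTransformer[A, B, C](first: Transformer[A, B], last: Transformer[B, C])
      extends Transformer[A, C] {
    override def apply(prev: Iterator[A]): Iterator[C] = last(first(prev))
  }

  // Hypothetical helper: turn an element-wise function into a transformer.
  def lift[A, B](f: A => B): Transformer[A, B] = new Transformer[A, B] {
    override def apply(prev: Iterator[A]): Iterator[B] = prev.map(f)
  }

  // Three small steps composed into one transformer from String to String.
  val pipeline: Transformer[String, String] =
    lift((s: String) => s.trim.toInt) ->
      lift((n: Int) => n * 2) ->
      lift((n: Int) => s"value=$n")
}
```

    Because a transformer in this sketch has the shape Iterator[A] => Iterator[B], it can be applied directly to an iterator, and it has the function shape that Spark's RDD.mapPartitions accepts.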

  26. class SampleToBatch[T] extends Transformer[Sample[T], MiniBatch[T]]

    Convert a sequence of single-feature and single-label Sample to a sequence of MiniBatch, optionally padding all the features (or labels) in the mini-batch to the same length.

    Annotations
    @deprecated
    Deprecated

    (Since version 0.2.0) Use SampleToMiniBatch instead

Value Members

  1. object ArraySample extends Serializable
  2. object DataSet

    Commonly used DataSet builders.

  3. object Identity extends Serializable
  4. object MiniBatch extends Serializable
  5. object Sample extends Serializable
  6. object SampleToBatch extends Serializable

    Convert a sequence of Sample to a sequence of MiniBatch, optionally padding all the features (or labels) in the mini-batch to the same length

  7. object SampleToMiniBatch extends Serializable
  8. object SparseMiniBatch extends Serializable
  9. object TensorSample extends Serializable
  10. object Utils