public class DataSet extends Object implements DataSet
| Constructor and Description |
|---|
| DataSet() |
| DataSet(INDArray first, INDArray second): Creates a dataset with the specified input matrix and labels |
| DataSet(INDArray features, INDArray labels, INDArray featuresMask, INDArray labelsMask): Create a dataset with the specified input INDArray and labels (output) INDArray, plus (optionally) mask arrays for the features and labels |
| Modifier and Type | Method and Description |
|---|---|
| void | addFeatureVector(INDArray toAdd): Adds a feature for each example on to the current feature vector |
| void | addFeatureVector(INDArray feature, int example): Appends the feature vector to the example (row) with the given number |
| void | addRow(DataSet d, int i) |
| List<DataSet> | asList(): Extract each example in the DataSet into its own DataSet object, and return all of them as a list |
| List<DataSet> | batchBy(int num): Partitions the dataset into mini-batches, each containing the specified number of examples |
| List<DataSet> | batchByNumLabels() |
| void | binarize(): Same as calling binarize(0) |
| void | binarize(double cutoff): Binarizes the dataset so that any value greater than cutoff becomes 1 and every other value becomes 0 |
| DataSet | copy(): Clone the dataset |
| List<DataSet> | dataSetBatches(int num): Partitions the data set by the specified number. |
| void | detach(): Detaches this DataSet from the current Workspace (if any) |
| void | divideBy(int num): Divide the features by a scalar |
| static DataSet | empty(): Returns a single dataset (all fields are null) |
| boolean | equals(Object o) |
| INDArray | exampleMaxs() |
| INDArray | exampleMeans() |
| INDArray | exampleSums() |
| void | filterAndStrip(int[] labels): Strips the dataset down to the specified labels and remaps them |
| DataSet | filterBy(int[] labels): Strips the data set of all but the passed-in labels |
| DataSet | get(int i): Gets a copy of example i |
| DataSet | get(int[] i): Gets a copy of the examples at the given indices |
| List<String> | getColumnNames(): Deprecated. |
| List<Serializable> | getExampleMetaData(): Get the example metadata, or null if no metadata has been set |
| <T extends Serializable> List<T> | getExampleMetaData(Class<T> metaDataType): Get the example metadata, or null if no metadata has been set. Note: this method results in an unchecked cast; care should be taken when using it. |
| INDArray | getFeatures(): Returns the features array for the DataSet |
| INDArray | getFeaturesMaskArray(): Input mask array: a mask array for the input, where each value is in {0,1} to specify whether an input is actually present or not. |
| String | getLabelName(int idx) |
| List<String> | getLabelNames(): Deprecated. |
| List<String> | getLabelNames(INDArray idxs) |
| List<String> | getLabelNamesList(): Gets the optional label names |
| INDArray | getLabels(): Returns the labels for the dataset |
| INDArray | getLabelsMaskArray(): Labels (output) mask array: a mask array for the output, where each value is in {0,1} to specify whether an output is actually present or not. |
| long | getMemoryFootprint(): Returns the memory used by this DataSet |
| DataSet | getRange(int from, int to) |
| int | hashCode() |
| boolean | hasMaskArrays(): Whether the labels or input (features) mask arrays are present for this DataSet |
| String | id() |
| boolean | isEmpty() |
| boolean | isPreProcessed() |
| DataSetIterator | iterateWithMiniBatches() |
| Iterator<DataSet> | iterator() |
| Map<Integer,Double> | labelCounts(): Calculate and return a count of each label, by index. |
| void | load(File from): Load the contents of the DataSet from the specified File. |
| void | load(InputStream from): Load the contents of the DataSet from the specified InputStream. |
| void | markAsPreProcessed() |
| static DataSet | merge(List<? extends DataSet> data): Merge the list of datasets into one DataSet. |
| void | migrate(): Migrates this DataSet into the current Workspace (if any) |
| void | multiplyBy(double num): Multiply the features by a scalar |
| void | normalize(): Normalize this DataSet to mean 0, stdev 1 per input. |
| void | normalizeZeroMeanZeroUnitVariance(): Deprecated. |
| int | numExamples(): Number of examples in the DataSet |
| int | numInputs(): The number of inputs in the feature matrix |
| int | numOutcomes(): Returns the number of outcomes (size of the labels array for each example) |
| int | outcome() |
| DataSet | reshape(int rows, int cols): Reshapes the input into the given rows and columns |
| void | roundToTheNearest(int roundTo) |
| DataSet | sample(int numSamples): Sample without replacement, using a random RNG |
| DataSet | sample(int numSamples, boolean withReplacement): Sample a dataset numSamples times |
| DataSet | sample(int numSamples, Random rng): Sample without replacement |
| DataSet | sample(int numSamples, Random rng, boolean withReplacement): Sample a dataset |
| void | save(File to): Save this DataSet to a file. |
| void | save(OutputStream to): Write the contents of this DataSet to the specified OutputStream |
| void | scale(): Divides the input data set by the max number in each row |
| void | scaleMinAndMax(double min, double max) |
| void | setColumnNames(List<String> columnNames): Deprecated. |
| void | setExampleMetaData(List<? extends Serializable> exampleMetaData): Set the metadata for this DataSet. By convention, the metadata can be any serializable object, one per example in the DataSet |
| void | setFeatures(INDArray features): Set the features array for the DataSet |
| void | setFeaturesMaskArray(INDArray featuresMask): Set the features mask array in this DataSet |
| void | setLabelNames(List<String> labelNames): Sets the label names; throws an exception if the number of label names passed in does not equal the number of outcomes |
| void | setLabels(INDArray labels) |
| void | setLabelsMaskArray(INDArray labelsMask): Set the labels mask array in this DataSet |
| void | setNewNumberOfLabels(int labels): Clears the outcome matrix, setting a new number of labels |
| void | setOutcome(int example, int label): Sets the outcome of a particular example |
| void | shuffle(): Shuffle the order of the rows in the DataSet. |
| void | shuffle(long seed): Shuffles the dataset in place, given a seed for a random number generator. |
| List<DataSet> | sortAndBatchByNumLabels(): Sorts the dataset by label: splits the data set such that examples are sorted by their labels. |
| void | sortByLabel(): Organizes the dataset to minimize sampling error while still allowing efficient batching. |
| SplitTestAndTrain | splitTestAndTrain(double fractionTrain): Splits the DataSet into two DataSets randomly |
| SplitTestAndTrain | splitTestAndTrain(int numHoldout): Splits a dataset into test and train |
| SplitTestAndTrain | splitTestAndTrain(int numHoldout, Random rng): Splits a dataset into test and train randomly. |
| void | squishToRange(double min, double max): Squeezes input data to a max and a min |
| MultiDataSet | toMultiDataSet() |
| String | toString() |
| void | validate() |
Methods inherited from class java.lang.Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable: forEach, spliterator

public DataSet()

public DataSet(INDArray first, INDArray second)
Parameters:
first - the feature matrix
second - the labels (these should be binarized label matrices, with a value of 1 in the column corresponding to each example's label)

public DataSet(INDArray features, INDArray labels, INDArray featuresMask, INDArray labelsMask)
Parameters:
features - Features (input)
labels - Labels (output)
featuresMask - Mask array for features, may be null
labelsMask - Mask array for labels, may be null

public List<Serializable> getExampleMetaData()
Get the example metadata, or null if no metadata has been set
Specified by: getExampleMetaData in interface DataSet
See Also: getExampleMetaData(Class), a convenience method for typed metadata

public <T extends Serializable> List<T> getExampleMetaData(Class<T> metaDataType)
Get the example metadata, or null if no metadata has been set. Note: this method results in an unchecked cast; care should be taken when using it.
Specified by: getExampleMetaData in interface DataSet
Type Parameters:
T - Type of metadata
Parameters:
metaDataType - Class of the metadata (used for type information)

public void setExampleMetaData(List<? extends Serializable> exampleMetaData)
Set the metadata for this DataSet. By convention, the metadata can be any serializable object, one per example in the DataSet
Specified by: setExampleMetaData in interface DataSet
Parameters:
exampleMetaData - Example metadata to set

public boolean isPreProcessed()
public void markAsPreProcessed()
public static DataSet empty()
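public static DataSet merge(List<? extends DataSet> data)
Parameters:
data - the data to merge

public void load(InputStream from)
Load the contents of the DataSet from the specified InputStream.
See Also: DataSet.save(OutputStream)

public void load(File from)
Load the contents of the DataSet from the specified File.
See Also: DataSet.save(File)

public void save(OutputStream to)
Write the contents of this DataSet to the specified OutputStream

public void save(File to)
Save this DataSet to a file.

public DataSetIterator iterateWithMiniBatches()
Specified by: iterateWithMiniBatches in interface DataSet

public INDArray getFeatures()
Returns the features array for the DataSet
Specified by: getFeatures in interface DataSet

public void setFeatures(INDArray features)
Set the features array for the DataSet
Specified by: setFeatures in interface DataSet
Parameters:
features - Features to set

public Map<Integer,Double> labelCounts()
Calculate and return a count of each label, by index.
Specified by: labelCounts in interface DataSet

public DataSet copy()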
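
As a rough illustration of what labelCounts() is documented to report ("a count of each label, by index"), the following plain-Java sketch counts examples per label index in a binarized label matrix. It does not use ND4J: the class name LabelCountsSketch, the double[][] representation, and the choice of the largest column as the label index are assumptions made for illustration, not the library's implementation.

```java
import java.util.Map;
import java.util.TreeMap;

final class LabelCountsSketch {

    /**
     * Counts how many rows (examples) carry each label index.
     * Assumption: each row is a binarized label vector, so the label
     * index is the column holding the largest value.
     */
    static Map<Integer, Double> labelCounts(double[][] labels) {
        Map<Integer, Double> counts = new TreeMap<>();
        for (double[] row : labels) {
            int best = 0;
            for (int c = 1; c < row.length; c++) {
                if (row[c] > row[best]) {
                    best = c;
                }
            }
            // Add 1.0 to the running count for this label index.
            counts.merge(best, 1.0, Double::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        double[][] labels = {{1, 0, 0}, {0, 0, 1}, {0, 0, 1}};
        System.out.println(labelCounts(labels)); // prints {0=1.0, 2=2.0}
    }
}
```

Label indices that never occur are simply absent from the returned map in this sketch; whether the library reports them with a zero count is not stated in this page.
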
public DataSet reshape(int rows, int cols)
public void multiplyBy(double num)
Multiply the features by a scalar
Specified by: multiplyBy in interface DataSet

public void divideBy(int num)
Divide the features by a scalar

public void shuffle()
Shuffle the order of the rows in the DataSet.

public void shuffle(long seed)
Parameters:
seed - Seed to use for the random number generator

public void squishToRange(double min, double max)
Specified by: squishToRange in interface DataSet
Parameters:
min - the min value to occur in the dataset
max - the max value to occur in the dataset

public void scaleMinAndMax(double min, double max)
Specified by: scaleMinAndMax in interface DataSet

public void scale()

public void addFeatureVector(INDArray toAdd)
Specified by: addFeatureVector in interface DataSet
Parameters:
toAdd - the feature vector to add

public void addFeatureVector(INDArray feature, int example)
Specified by: addFeatureVector in interface DataSet
Parameters:
feature - the feature vector to add
example - the number of the example to append to

public void normalize()
Normalize this DataSet to mean 0, stdev 1 per input.
See Also: NormalizerStandardize

public void binarize()
public void binarize(double cutoff)
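
The thresholding rule documented for binarize(double cutoff) (values greater than the cutoff become 1, everything else 0) can be sketched in plain Java as follows. This is not ND4J code: BinarizeSketch and its double[][] input are hypothetical, and unlike the void library method, which presumably modifies the DataSet in place, this sketch returns a new array.

```java
import java.util.Arrays;

final class BinarizeSketch {

    /** Returns a copy in which every value greater than cutoff is 1.0 and every other value is 0.0. */
    static double[][] binarize(double[][] features, double cutoff) {
        double[][] out = new double[features.length][];
        for (int r = 0; r < features.length; r++) {
            out[r] = new double[features[r].length];
            for (int c = 0; c < features[r].length; c++) {
                // Strictly greater than: a value equal to the cutoff maps to 0.
                out[r][c] = features[r][c] > cutoff ? 1.0 : 0.0;
            }
        }
        return out;
    }

    /** Mirrors the documented shortcut: same as calling binarize with a cutoff of 0. */
    static double[][] binarize(double[][] features) {
        return binarize(features, 0.0);
    }

    public static void main(String[] args) {
        double[][] x = {{-1.0, 0.0, 0.5}, {2.0, 0.49, 0.51}};
        System.out.println(Arrays.deepToString(binarize(x, 0.5)));
        // prints [[0.0, 0.0, 0.0], [1.0, 0.0, 1.0]]
    }
}
```
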
@Deprecated public void normalizeZeroMeanZeroUnitVariance()
Specified by: normalizeZeroMeanZeroUnitVariance in interface DataSet

public int numInputs()
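
normalize() is described above as normalizing the DataSet "to mean 0, stdev 1 per input". A minimal plain-Java sketch of that per-column standardization is shown below. It is not the library's code: StandardizeSketch is a hypothetical name, the use of the population standard deviation (dividing by the number of rows) is an assumption, and so is the handling of constant columns, which are mapped to 0 to avoid dividing by zero.

```java
final class StandardizeSketch {

    /** Returns a copy of x in which each column has mean 0 and (population) standard deviation 1. */
    static double[][] standardize(double[][] x) {
        int rows = x.length;
        int cols = rows == 0 ? 0 : x[0].length;
        double[][] out = new double[rows][cols];
        for (int c = 0; c < cols; c++) {
            double mean = 0.0;
            for (int r = 0; r < rows; r++) {
                mean += x[r][c];
            }
            mean /= rows;

            double sumSq = 0.0;
            for (int r = 0; r < rows; r++) {
                double d = x[r][c] - mean;
                sumSq += d * d;
            }
            double std = Math.sqrt(sumSq / rows); // assumption: population stdev

            for (int r = 0; r < rows; r++) {
                // Assumption: a constant column (std == 0) is mapped to 0.
                out[r][c] = std == 0.0 ? 0.0 : (x[r][c] - mean) / std;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] z = standardize(new double[][]{{1, 10}, {3, 10}});
        System.out.println(java.util.Arrays.deepToString(z));
    }
}
```
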
public void setNewNumberOfLabels(int labels)
Specified by: setNewNumberOfLabels in interface DataSet
Parameters:
labels - the number of labels/columns in the outcome matrix. Note that this clears the labels for each example

public void setOutcome(int example, int label)
Specified by: setOutcome in interface DataSet
Parameters:
example - the example to set
label - the label of the outcome

public DataSet get(int i)
public DataSet get(int[] i)
public List<DataSet> batchBy(int num)
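
batchBy(int num) is documented as partitioning a dataset into mini-batches of the specified number of examples. The plain-Java sketch below shows one way such a partition could work on a generic list of examples. BatchBySketch is a hypothetical helper, and letting the final batch be smaller when the size is not a multiple of num is an assumption; this page does not say how the library treats the remainder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BatchBySketch {

    /** Splits examples into consecutive batches of at most num elements each. */
    static <T> List<List<T>> batchBy(List<T> examples, int num) {
        if (num <= 0) {
            throw new IllegalArgumentException("num must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < examples.size(); start += num) {
            int end = Math.min(start + num, examples.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(examples.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(batchBy(Arrays.asList(1, 2, 3, 4, 5), 2));
        // prints [[1, 2], [3, 4], [5]]
    }
}
```
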
public DataSet filterBy(int[] labels)
public void filterAndStrip(int[] labels)
Specified by: filterAndStrip in interface DataSet
Parameters:
labels - the labels to strip down to

public List<DataSet> dataSetBatches(int num)
Specified by: dataSetBatches in interface DataSet
Parameters:
num - the number to split by

public List<DataSet> sortAndBatchByNumLabels()
Specified by: sortAndBatchByNumLabels in interface DataSet

public List<DataSet> batchByNumLabels()
Specified by: batchByNumLabels in interface DataSet

public List<DataSet> asList()
Extract each example in the DataSet into its own DataSet object, and return all of them as a list

public SplitTestAndTrain splitTestAndTrain(int numHoldout, Random rng)
Specified by: splitTestAndTrain in interface DataSet
Parameters:
numHoldout - the number to hold out for training
rng - Random Number Generator to use to shuffle the dataset

public SplitTestAndTrain splitTestAndTrain(int numHoldout)
Specified by: splitTestAndTrain in interface DataSet
Parameters:
numHoldout - the number to hold out for training

public INDArray getLabels()
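
The parameter notes for splitTestAndTrain(int numHoldout, Random rng) above suggest a shuffle followed by a cut at numHoldout examples. The plain-Java sketch below follows that reading. It is only an illustration under stated assumptions: SplitSketch and its Split holder are hypothetical stand-ins for the library's SplitTestAndTrain type, and placing the first numHoldout shuffled examples in the training set is inferred from the "hold out for training" wording rather than confirmed by the library source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class SplitSketch {

    /** Hypothetical holder for the two halves of a split. */
    static final class Split<T> {
        final List<T> train;
        final List<T> test;

        Split(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    /** Shuffles a copy of examples with rng, then puts the first numHoldout in train and the rest in test. */
    static <T> Split<T> splitTestAndTrain(List<T> examples, int numHoldout, Random rng) {
        if (numHoldout < 0 || numHoldout > examples.size()) {
            throw new IllegalArgumentException("numHoldout must be between 0 and the number of examples");
        }
        List<T> shuffled = new ArrayList<>(examples);
        Collections.shuffle(shuffled, rng);
        List<T> train = new ArrayList<>(shuffled.subList(0, numHoldout));
        List<T> test = new ArrayList<>(shuffled.subList(numHoldout, shuffled.size()));
        return new Split<>(train, test);
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add(i);
        }
        Split<Integer> split = splitTestAndTrain(data, 7, new Random(42));
        System.out.println("train=" + split.train + " test=" + split.test);
    }
}
```

Passing a seeded Random makes the split reproducible, which is the practical reason to prefer the overload that accepts an rng when results need to be repeatable.
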
public String getLabelName(int idx)
Specified by: getLabelName in interface DataSet
Parameters:
idx - the index of the string label value to retrieve from the list, if it exists

public List<String> getLabelNames(INDArray idxs)
Specified by: getLabelNames in interface DataSet
Parameters:
idxs - the indices of the string label values to retrieve from the list, if they exist

public void sortByLabel()
Specified by: sortByLabel in interface DataSet

public INDArray exampleSums()
Specified by: exampleSums in interface DataSet

public INDArray exampleMaxs()
Specified by: exampleMaxs in interface DataSet

public INDArray exampleMeans()
Specified by: exampleMeans in interface DataSet

public DataSet sample(int numSamples)
public DataSet sample(int numSamples, boolean withReplacement)
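
The sample overloads above differ only in whether an rng is supplied and whether sampling is with replacement. The plain-Java sketch below illustrates that distinction by drawing example indices. SampleSketch and sampleIndices are hypothetical names, and the partial Fisher-Yates shuffle used for the without-replacement case is a standard technique chosen for this sketch, not necessarily what the library does.

```java
import java.util.Arrays;
import java.util.Random;

final class SampleSketch {

    /** Draws numSamples example indices from the range [0, numExamples). */
    static int[] sampleIndices(int numExamples, int numSamples, Random rng, boolean withReplacement) {
        if (numExamples <= 0 || numSamples < 0) {
            throw new IllegalArgumentException("numExamples must be positive and numSamples non-negative");
        }
        if (!withReplacement && numSamples > numExamples) {
            throw new IllegalArgumentException("cannot draw more distinct samples than there are examples");
        }
        int[] picked = new int[numSamples];
        if (withReplacement) {
            // Each draw is independent, so the same index may repeat.
            for (int i = 0; i < numSamples; i++) {
                picked[i] = rng.nextInt(numExamples);
            }
            return picked;
        }
        // Without replacement: partial Fisher-Yates shuffle over the index pool.
        int[] pool = new int[numExamples];
        for (int i = 0; i < numExamples; i++) {
            pool[i] = i;
        }
        for (int i = 0; i < numSamples; i++) {
            int j = i + rng.nextInt(numExamples - i);
            int tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
            picked[i] = pool[i];
        }
        return picked;
    }

    public static void main(String[] args) {
        Random rng = new Random(7);
        System.out.println(Arrays.toString(sampleIndices(5, 3, rng, false)));
        System.out.println(Arrays.toString(sampleIndices(3, 6, rng, true)));
    }
}
```
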
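public void roundToTheNearest(int roundTo)
Specified by: roundToTheNearest in interface DataSet

public int numOutcomes()
Returns the number of outcomes (size of the labels array for each example)
Specified by: numOutcomes in interface DataSet

public int numExamples()
Number of examples in the DataSet
Specified by: numExamples in interface DataSet

@Deprecated public List<String> getLabelNames()
Specified by: getLabelNames in interface DataSet

public List<String> getLabelNamesList()
Specified by: getLabelNamesList in interface DataSet

public void setLabelNames(List<String> labelNames)
Specified by: setLabelNames in interface DataSet
Parameters:
labelNames - the label names to use

@Deprecated public List<String> getColumnNames()
Specified by: getColumnNames in interface DataSet

@Deprecated public void setColumnNames(List<String> columnNames)
Specified by: setColumnNames in interface DataSet
Parameters:
columnNames -

public SplitTestAndTrain splitTestAndTrain(double fractionTrain)
Splits the DataSet into two DataSets randomly
Specified by: splitTestAndTrain in interface DataSet
Parameters:
fractionTrain - Fraction (in range 0 to 1) of examples to be returned in the training DataSet object

public INDArray getFeaturesMaskArray()
Input mask array: a mask array for the input, where each value is in {0,1} to specify whether an input is actually present or not.
Specified by: getFeaturesMaskArray in interface DataSet

public void setFeaturesMaskArray(INDArray featuresMask)
Set the features mask array in this DataSet
Specified by: setFeaturesMaskArray in interface DataSet

public INDArray getLabelsMaskArray()
Labels (output) mask array: a mask array for the output, where each value is in {0,1} to specify whether an output is actually present or not.
Specified by: getLabelsMaskArray in interface DataSet

public void setLabelsMaskArray(INDArray labelsMask)
Set the labels mask array in this DataSet
Specified by: setLabelsMaskArray in interface DataSet

public boolean hasMaskArrays()
Whether the labels or input (features) mask arrays are present for this DataSet
Specified by: hasMaskArrays in interface DataSet

public long getMemoryFootprint()
Specified by: getMemoryFootprint in interface DataSet

public void migrate()
Migrates this DataSet into the current Workspace (if any)

public void detach()
Detaches this DataSet from the current Workspace (if any)

public boolean isEmpty()

public MultiDataSet toMultiDataSet()
Specified by: toMultiDataSet in interface DataSet

Copyright © 2021. All rights reserved.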