Package ai.djl.training.dataset
Class RandomAccessDataset
- java.lang.Object
-
- ai.djl.training.dataset.RandomAccessDataset
-
- All Implemented Interfaces:
Dataset
- Direct Known Subclasses:
ArrayDataset
public abstract class RandomAccessDataset extends java.lang.Object implements Dataset
RandomAccessDataset represents a dataset that supports random-access reads, i.e. it can access a specific data item given its index. Almost all datasets in DJL extend RandomAccessDataset, either directly or indirectly.
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | RandomAccessDataset.BaseBuilder<T extends RandomAccessDataset.BaseBuilder<T>> | The Builder to construct a RandomAccessDataset.
-
Nested classes/interfaces inherited from interface ai.djl.training.dataset.Dataset
Dataset.Usage
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected Batchifier | dataBatchifier |
protected Device | device |
protected Batchifier | labelBatchifier |
protected long | limit |
protected Pipeline | pipeline |
protected int | prefetchNumber |
protected Sampler | sampler |
protected Pipeline | targetPipeline |
-
Constructor Summary
Constructors
Constructor | Description
RandomAccessDataset(RandomAccessDataset.BaseBuilder<?> builder) | Creates a new instance of RandomAccessDataset with the given necessary configurations.
-
Method Summary
Modifier and Type | Method | Description
protected abstract long | availableSize() | Returns the number of records available to be read in this Dataset.
abstract Record | get(NDManager manager, long index) | Gets the Record for the given index from the dataset.
java.lang.Iterable<Batch> | getData(NDManager manager) | Fetches an iterator that can iterate through the Dataset.
java.lang.Iterable<Batch> | getData(NDManager manager, Sampler sampler) | Fetches an iterator that can iterate through the Dataset with a custom sampler.
java.lang.Iterable<Batch> | getData(NDManager manager, Sampler sampler, java.util.concurrent.ExecutorService executorService) | Fetches an iterator that can iterate through the Dataset with a custom sampler and multiple threads.
java.lang.Iterable<Batch> | getData(NDManager manager, java.util.concurrent.ExecutorService executorService) | Fetches an iterator that can iterate through the Dataset with multiple threads.
protected RandomAccessDataset | newSubDataset(int[] indices, int from, int to) |
protected RandomAccessDataset | newSubDataset(java.util.List<java.lang.Long> subIndices) |
RandomAccessDataset[] | randomSplit(int... ratio) | Splits the dataset into multiple portions.
long | size() | Returns the size of this Dataset.
RandomAccessDataset | subDataset(int fromIndex, int toIndex) | Returns a view of the portion of this data between the specified fromIndex, inclusive, and toIndex, exclusive.
RandomAccessDataset | subDataset(java.util.List<java.lang.Long> subIndices) | Returns a view of the portion of this data for the specified subIndices.
<K> RandomAccessDataset | subDataset(java.util.List<K> recordKeys, java.util.List<K> subRecordKeys) | Returns a view of the portion of this data for the specified record keys.
<K> RandomAccessDataset | subDataset(java.util.Map<K,java.lang.Long> indicesOfRecordKeys, java.util.List<K> subRecordKeys) | Returns a view of the portion of this data for the specified record keys.
ai.djl.util.Pair<java.lang.Number[][],java.lang.Number[][]> | toArray(NDManager manager) | Returns the dataset contents as a Java array.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface ai.djl.training.dataset.Dataset
matchingTranslatorOptions, prepare, prepare
-
-
-
-
Field Detail
-
sampler
protected Sampler sampler
-
dataBatchifier
protected Batchifier dataBatchifier
-
labelBatchifier
protected Batchifier labelBatchifier
-
pipeline
protected Pipeline pipeline
-
targetPipeline
protected Pipeline targetPipeline
-
prefetchNumber
protected int prefetchNumber
-
limit
protected long limit
-
device
protected Device device
-
-
Constructor Detail
-
RandomAccessDataset
public RandomAccessDataset(RandomAccessDataset.BaseBuilder<?> builder)
Creates a new instance of RandomAccessDataset with the given necessary configurations.
- Parameters:
builder - a builder with the necessary configurations
-
-
Method Detail
-
get
public abstract Record get(NDManager manager, long index) throws java.io.IOException
Gets the Record for the given index from the dataset.
- Parameters:
manager - the manager used to create the arrays
index - the index of the requested data item
- Returns:
a Record that contains the data and label of the requested data item
- Throws:
java.io.IOException - if an I/O error occurs
-
getData
public java.lang.Iterable<Batch> getData(NDManager manager) throws java.io.IOException, TranslateException
Fetches an iterator that can iterate through the Dataset.
- Specified by:
getData in interface Dataset
- Parameters:
manager - the manager used to create the arrays
- Returns:
an Iterable of Batch that contains batches of data from the dataset
- Throws:
java.io.IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
-
getData
public java.lang.Iterable<Batch> getData(NDManager manager, java.util.concurrent.ExecutorService executorService) throws java.io.IOException, TranslateException
Fetches an iterator that can iterate through the Dataset with multiple threads.
- Specified by:
getData in interface Dataset
- Parameters:
manager - the manager used to create the arrays
executorService - the ExecutorService to use for multi-threading
- Returns:
an Iterable of Batch that contains batches of data from the dataset
- Throws:
java.io.IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
-
getData
public java.lang.Iterable<Batch> getData(NDManager manager, Sampler sampler) throws java.io.IOException, TranslateException
Fetches an iterator that can iterate through the Dataset with a custom sampler.
- Parameters:
manager - the manager to create the arrays
sampler - the sampler to use to iterate through the dataset
- Returns:
an Iterable of Batch that contains batches of data from the dataset
- Throws:
java.io.IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
-
getData
public java.lang.Iterable<Batch> getData(NDManager manager, Sampler sampler, java.util.concurrent.ExecutorService executorService) throws java.io.IOException, TranslateException
Fetches an iterator that can iterate through the Dataset with a custom sampler and multiple threads.
- Parameters:
manager - the manager to create the arrays
sampler - the sampler to use to iterate through the dataset
executorService - the ExecutorService to use for multi-threading
- Returns:
an Iterable of Batch that contains batches of data from the dataset
- Throws:
java.io.IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
-
size
public long size()
Returns the size of this Dataset.
- Returns:
the size of this Dataset
-
availableSize
protected abstract long availableSize()
Returns the number of records available to be read in this Dataset.
- Returns:
the number of records available to be read in this Dataset
-
randomSplit
public RandomAccessDataset[] randomSplit(int... ratio) throws java.io.IOException, TranslateException
Splits the dataset into multiple portions.
- Parameters:
ratio - the ratio of each sub-dataset
- Returns:
an array of the sub-datasets
- Throws:
java.io.IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
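To make the ratio parameter concrete, the following is a minimal, library-independent sketch of how integer ratios could be turned into portions of a shuffled index list. The class RatioSplitSketch and its split method are hypothetical and not part of DJL. The rounding rule used here (every portion except the last is rounded down, and the last portion takes the remainder) is an assumption of this sketch; randomSplit itself may round differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of DJL: splits the indices 0..size-1 by integer ratios. */
final class RatioSplitSketch {

    private RatioSplitSketch() {}

    /**
     * Shuffles the indices 0..size-1 and cuts them into ratio.length portions.
     * Assumption of this sketch: each portion except the last receives
     * floor(size * ratio[i] / sum) indices, and the last portion takes the rest.
     */
    static List<List<Long>> split(long size, Random random, int... ratio) {
        if (ratio.length < 2) {
            throw new IllegalArgumentException("At least two portions are required");
        }
        List<Long> indices = new ArrayList<>();
        for (long i = 0; i < size; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, random);

        long sum = Arrays.stream(ratio).asLongStream().sum();
        List<List<Long>> portions = new ArrayList<>();
        int from = 0;
        for (int i = 0; i < ratio.length - 1; i++) {
            // Integer division rounds down for non-negative operands.
            int to = from + (int) (size * ratio[i] / sum);
            portions.add(new ArrayList<>(indices.subList(from, to)));
            from = to;
        }
        portions.add(new ArrayList<>(indices.subList(from, indices.size())));
        return portions;
    }

    public static void main(String[] args) {
        // An 8:2 split of ten records yields portions of 8 and 2 indices.
        List<List<Long>> parts = split(10, new Random(42), 8, 2);
        System.out.println(parts.get(0).size() + " / " + parts.get(1).size());
    }
}
```

Each portion in this sketch is a plain list of indices, which is the same shape of argument that subDataset(java.util.List<java.lang.Long> subIndices) accepts.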
-
subDataset
public RandomAccessDataset subDataset(int fromIndex, int toIndex)
Returns a view of the portion of this data between the specified fromIndex, inclusive, and toIndex, exclusive.
- Parameters:
fromIndex - low endpoint (inclusive) of the subDataset
toIndex - high endpoint (exclusive) of the subDataset
- Returns:
a view of the specified range within this dataset
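The half-open range convention (fromIndex inclusive, toIndex exclusive) can be illustrated with a small stand-alone sketch. The class RangeViewSketch is hypothetical and not part of DJL; it only shows how positions inside such a view would relate to positions in the parent dataset, under the assumption that the view is a contiguous window.

```java
/** Hypothetical illustration, not part of DJL: a half-open index range [fromIndex, toIndex). */
final class RangeViewSketch {

    private final long fromIndex;
    private final long toIndex;

    RangeViewSketch(long fromIndex, long toIndex) {
        if (fromIndex < 0 || toIndex < fromIndex) {
            throw new IllegalArgumentException("Invalid range: " + fromIndex + ".." + toIndex);
        }
        this.fromIndex = fromIndex;
        this.toIndex = toIndex;
    }

    /** Number of records in the view; toIndex itself is not included. */
    long size() {
        return toIndex - fromIndex;
    }

    /** Translates a position inside the view to the position in the parent dataset. */
    long parentIndex(long index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("Index " + index + " is outside the view");
        }
        return fromIndex + index;
    }

    public static void main(String[] args) {
        RangeViewSketch view = new RangeViewSketch(2, 5);
        // The view covers parent indices 2, 3 and 4.
        System.out.println(view.size() + " records, first maps to " + view.parentIndex(0));
    }
}
```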
-
subDataset
public RandomAccessDataset subDataset(java.util.List<java.lang.Long> subIndices)
Returns a view of the portion of this data for the specified subIndices.
- Parameters:
subIndices - sub-set of indices of this dataset
- Returns:
a view of the specified indices within this dataset
-
subDataset
public <K> RandomAccessDataset subDataset(java.util.List<K> recordKeys, java.util.List<K> subRecordKeys)
Returns a view of the portion of this data for the specified record keys. Assuming that the records of this dataset are represented by the keys in recordKeys, then subRecordKeys defines the view on the corresponding records of the dataset.
- Type Parameters:
K - the record key type.
- Parameters:
recordKeys - unique keys for all records of this dataset.
subRecordKeys - keys to define the view on the dataset. All keys in subRecordKeys must be contained in recordKeys but may occur more than once.
- Returns:
a view of the specified records within this dataset
-
subDataset
public <K> RandomAccessDataset subDataset(java.util.Map<K,java.lang.Long> indicesOfRecordKeys, java.util.List<K> subRecordKeys)
Returns a view of the portion of this data for the specified record keys. Assuming that the records of this dataset are represented by the keys in indicesOfRecordKeys, then subRecordKeys defines the view on the corresponding records of the dataset.
- Type Parameters:
K - the record key type.
- Parameters:
indicesOfRecordKeys - Map for keys of the records in this dataset to their index position within this dataset. While this map typically maps all records, technically it just needs to map the ones occurring in subRecordKeys.
subRecordKeys - Keys to define the view on the dataset. All keys in subRecordKeys must be contained in indicesOfRecordKeys but may occur more than once.
- Returns:
a view of the records identified by the specified keys of this dataset
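The relationship between the two key-based subDataset overloads can be sketched without the library: a list of unique record keys is first turned into a key-to-index map, and the sub-record keys are then resolved to index positions. The class RecordKeySketch and its methods are hypothetical and not part of DJL, and the sample file names are made up for illustration. How DJL handles duplicate or unknown keys is not stated above, so the exceptions thrown here are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of DJL: resolves record keys to index positions. */
final class RecordKeySketch {

    private RecordKeySketch() {}

    /** Maps each key in recordKeys to its position in the list (the first record is index 0). */
    static <K> Map<K, Long> indexKeys(List<K> recordKeys) {
        Map<K, Long> indicesOfRecordKeys = new HashMap<>();
        for (int i = 0; i < recordKeys.size(); i++) {
            // Assumption of this sketch: duplicate keys are rejected.
            if (indicesOfRecordKeys.put(recordKeys.get(i), (long) i) != null) {
                throw new IllegalArgumentException("Duplicate record key: " + recordKeys.get(i));
            }
        }
        return indicesOfRecordKeys;
    }

    /** Looks up the index of every key in subRecordKeys; repeated keys yield repeated indices. */
    static <K> List<Long> resolve(Map<K, Long> indicesOfRecordKeys, List<K> subRecordKeys) {
        List<Long> subIndices = new ArrayList<>();
        for (K key : subRecordKeys) {
            Long index = indicesOfRecordKeys.get(key);
            // Assumption of this sketch: unknown keys are rejected.
            if (index == null) {
                throw new IllegalArgumentException("Unknown record key: " + key);
            }
            subIndices.add(index);
        }
        return subIndices;
    }

    public static void main(String[] args) {
        List<String> recordKeys = Arrays.asList("cat.png", "dog.png", "owl.png");
        List<String> subRecordKeys = Arrays.asList("owl.png", "cat.png", "owl.png");
        System.out.println(resolve(indexKeys(recordKeys), subRecordKeys)); // prints [2, 0, 2]
    }
}
```

Because the map only needs entries for the keys that actually occur in subRecordKeys, a caller holding a large dataset could build a partial map instead of indexing every record.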
-
newSubDataset
protected RandomAccessDataset newSubDataset(int[] indices, int from, int to)
-
newSubDataset
protected RandomAccessDataset newSubDataset(java.util.List<java.lang.Long> subIndices)
-
toArray
public ai.djl.util.Pair<java.lang.Number[][],java.lang.Number[][]> toArray(NDManager manager) throws java.io.IOException, TranslateException
Returns the dataset contents as a Java array. Each Number[] is a flattened dataset record and the Number[][] is the array of all records.
- Parameters:
manager - the manager to create the arrays
- Returns:
the dataset contents as a Java array
- Throws:
java.io.IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
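The shape of the returned arrays can be pictured with a stand-alone sketch: each multi-dimensional record collapses into one Number[], and the records are collected into a Number[][]. The class FlattenSketch is hypothetical and not part of DJL. It assumes two-dimensional float records and row-major ordering, neither of which is stated by the description above.

```java
/** Hypothetical illustration, not part of DJL: flattens records into Number arrays. */
final class FlattenSketch {

    private FlattenSketch() {}

    /** Flattens one two-dimensional record into a single Number[] (row-major order assumed). */
    static Number[] flatten(float[][] record) {
        int length = 0;
        for (float[] row : record) {
            length += row.length;
        }
        Number[] flat = new Number[length];
        int pos = 0;
        for (float[] row : record) {
            for (float value : row) {
                flat[pos++] = value; // boxed to Float, which is a Number
            }
        }
        return flat;
    }

    /** Flattens every record, producing one Number[] per record. */
    static Number[][] flattenAll(float[][][] records) {
        Number[][] all = new Number[records.length][];
        for (int i = 0; i < records.length; i++) {
            all[i] = flatten(records[i]);
        }
        return all;
    }

    public static void main(String[] args) {
        float[][][] records = {
            {{1f, 2f}, {3f, 4f}},
            {{5f, 6f}, {7f, 8f}}
        };
        Number[][] all = flattenAll(records);
        // Two records, each flattened to four values.
        System.out.println(all.length + " records of length " + all[0].length);
    }
}
```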
-
-