Package ai.djl.basicdataset.nlp
Class PennTreebankText
java.lang.Object
- ai.djl.training.dataset.RandomAccessDataset
  - ai.djl.basicdataset.nlp.TextDataset
    - ai.djl.basicdataset.nlp.PennTreebankText
All Implemented Interfaces:
ai.djl.training.dataset.Dataset
public class PennTreebankText extends TextDataset
The Penn Treebank (PTB) project selected 2,499 stories from a three-year Wall Street Journal (WSJ) collection of 98,732 stories for syntactic annotation.
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | PennTreebankText.Builder | A builder to construct a PennTreebankText.
Nested classes/interfaces inherited from class ai.djl.basicdataset.nlp.TextDataset
TextDataset.Sample
Field Summary
Fields inherited from class ai.djl.basicdataset.nlp.TextDataset
manager, mrl, prepared, samples, sourceTextData, targetTextData, usage
Method Summary
Modifier and Type | Method | Description
protected long | availableSize() |
static PennTreebankText.Builder | builder() | Creates a builder to build a PennTreebankText.
ai.djl.training.dataset.Record | get(ai.djl.ndarray.NDManager manager, long index) |
void | prepare(ai.djl.util.Progress progress) | Prepares the dataset for use with tracked progress.
Methods inherited from class ai.djl.basicdataset.nlp.TextDataset
getProcessedText, getRawText, getSamples, getTextEmbedding, getVocabulary, preprocess
Methods inherited from class ai.djl.training.dataset.RandomAccessDataset
getData, getData, getData, getData, newSubDataset, newSubDataset, randomSplit, size, subDataset, subDataset, subDataset, subDataset, toArray
Method Detail
builder
public static PennTreebankText.Builder builder()
Creates a builder to build a PennTreebankText.
- Returns: a new PennTreebankText.Builder object
get
public ai.djl.training.dataset.Record get(ai.djl.ndarray.NDManager manager, long index) throws java.io.IOException
- Specified by: get in class ai.djl.training.dataset.RandomAccessDataset
- Throws: java.io.IOException
availableSize
protected long availableSize()
- Specified by: availableSize in class ai.djl.training.dataset.RandomAccessDataset
prepare
public void prepare(ai.djl.util.Progress progress) throws java.io.IOException, ai.djl.modality.nlp.embedding.EmbeddingException
Prepares the dataset for use with tracked progress.
- Parameters: progress - the progress tracker
- Throws: java.io.IOException - for various exceptions depending on the dataset
- Throws: ai.djl.modality.nlp.embedding.EmbeddingException
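Taken together, the methods above suggest a lifecycle: obtain a builder, build the dataset, call prepare, then read records by index up to the available size. The following plain-Java sketch illustrates that flow without depending on DJL. ToyTextDataset, its addLine and optLimit methods, and the sample lines are hypothetical stand-ins, not part of the library. The sketch also assumes, without confirmation from this page, that prepare must run before get.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in (NOT a DJL class) mirroring builder() -> prepare() -> get(index). */
public class ToyTextDataset {
    private final List<String> rawLines;                       // text supplied through the builder
    private final long limit;                                  // maximum number of samples to expose
    private final List<String[]> samples = new ArrayList<>();  // tokenized lines, filled by prepare()
    private boolean prepared;                                  // loosely analogous to the inherited `prepared` field

    private ToyTextDataset(Builder builder) {
        this.rawLines = new ArrayList<>(builder.lines);        // defensive copy so the builder can be reused
        this.limit = builder.limit;
    }

    /** Creates a builder, like the static builder() documented above. */
    public static Builder builder() {
        return new Builder();
    }

    /** Tokenizes every non-blank line once; repeated calls do nothing. */
    public void prepare() {
        if (prepared) {
            return;
        }
        for (String line : rawLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                samples.add(trimmed.split("\\s+"));
            }
        }
        prepared = true;
    }

    /** Number of samples that get(index) can serve; 0 until prepare() has run. */
    public long availableSize() {
        return Math.min(limit, samples.size());
    }

    /** Random access by index, assumed here to be valid only after prepare(). */
    public String[] get(long index) {
        if (!prepared) {
            throw new IllegalStateException("call prepare() before get()");
        }
        if (index < 0 || index >= availableSize()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return samples.get((int) index);
    }

    /** Fluent builder; every setter returns this so calls can be chained. */
    public static final class Builder {
        private final List<String> lines = new ArrayList<>();
        private long limit = Long.MAX_VALUE;

        public Builder addLine(String line) {
            lines.add(line);
            return this;
        }

        public Builder optLimit(long limit) {
            this.limit = limit;
            return this;
        }

        public ToyTextDataset build() {
            return new ToyTextDataset(this);
        }
    }

    public static void main(String[] args) {
        ToyTextDataset dataset = ToyTextDataset.builder()
                .addLine("the first sample line")
                .addLine("a second line")
                .optLimit(1)
                .build();
        dataset.prepare();
        System.out.println(dataset.availableSize()); // prints 1 (two lines, limit of 1)
        System.out.println(dataset.get(0).length);   // prints 4 (tokens in the first line)
    }
}
```

With the real class the same order would presumably apply: PennTreebankText.builder(), then build on the returned builder, then prepare(progress), then get(manager, index) for indices below the dataset size.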