Package ai.djl.basicdataset.nlp
Class UniversalDependenciesEnglishEWT
- java.lang.Object
  - ai.djl.training.dataset.RandomAccessDataset
    - ai.djl.basicdataset.nlp.TextDataset
      - ai.djl.basicdataset.nlp.UniversalDependenciesEnglishEWT
All Implemented Interfaces:
ai.djl.training.dataset.Dataset
public class UniversalDependenciesEnglishEWT extends TextDataset
A Gold Standard Universal Dependencies Corpus for English, built over the source material of the English Web Treebank LDC2012T13.
See Also:
- English Web Treebank LDC2012T13
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | UniversalDependenciesEnglishEWT.Builder | A builder for a UniversalDependenciesEnglishEWT.
Nested classes/interfaces inherited from class ai.djl.basicdataset.nlp.TextDataset
TextDataset.Sample
Field Summary
Fields inherited from class ai.djl.basicdataset.nlp.TextDataset
manager, mrl, prepared, samples, sourceTextData, targetTextData, usage
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | UniversalDependenciesEnglishEWT(UniversalDependenciesEnglishEWT.Builder builder) | Creates a new instance of UniversalDependenciesEnglishEWT.
Method Summary
Modifier and Type | Method | Description
protected long | availableSize() | Returns the number of records available to be read in this Dataset.
static UniversalDependenciesEnglishEWT.Builder | builder() | Creates a new builder to build a UniversalDependenciesEnglishEWT.
ai.djl.training.dataset.Record | get(ai.djl.ndarray.NDManager manager, long index) | Gets the Record for the given index from the dataset.
void | prepare(ai.djl.util.Progress progress) | Prepares the dataset for use with tracked progress.
Methods inherited from class ai.djl.basicdataset.nlp.TextDataset
getProcessedText, getRawText, getSamples, getTextEmbedding, getVocabulary, preprocess
Methods inherited from class ai.djl.training.dataset.RandomAccessDataset
getData, getData, getData, getData, newSubDataset, newSubDataset, randomSplit, size, subDataset, subDataset, subDataset, subDataset, toArray
Constructor Detail
UniversalDependenciesEnglishEWT
protected UniversalDependenciesEnglishEWT(UniversalDependenciesEnglishEWT.Builder builder)
Creates a new instance of UniversalDependenciesEnglishEWT.
Parameters:
builder - the builder object to build from
Method Detail
builder
public static UniversalDependenciesEnglishEWT.Builder builder()
Creates a new builder to build a UniversalDependenciesEnglishEWT.
Returns:
a new builder
prepare
public void prepare(ai.djl.util.Progress progress) throws java.io.IOException, ai.djl.modality.nlp.embedding.EmbeddingException
Prepares the dataset for use with tracked progress. This method parses the TXT file: the texts are added to sourceTextData and the Universal POS tags are added to universalPosTags. Only sourceTextData is then preprocessed.
Parameters:
progress - the progress tracker
Throws:
java.io.IOException - for various exceptions depending on the dataset
ai.djl.modality.nlp.embedding.EmbeddingException - if there are exceptions during the embedding process
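The parsing step described for prepare can be illustrated with a short, library-independent sketch. It assumes the corpus file follows the CoNLL-U layout used by Universal Dependencies treebanks (tab-separated columns, with the word form in column 2 and the UPOS tag in column 4). The source only calls it a TXT file, so this layout is an assumption. The class ConlluSketch and its members are hypothetical names for illustration and are not part of DJL; the actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of DJL: splits CoNLL-U style lines into
// per-sentence token and UPOS tag lists (assumed file layout, see lead-in).
class ConlluSketch {

    /** Tokens and Universal POS tags of one sentence, kept in parallel lists. */
    static final class Sentence {
        final List<String> tokens = new ArrayList<>();
        final List<String> upos = new ArrayList<>();
    }

    static List<Sentence> parse(List<String> lines) {
        List<Sentence> sentences = new ArrayList<>();
        Sentence current = new Sentence();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // A blank line ends the current sentence.
                if (!current.tokens.isEmpty()) {
                    sentences.add(current);
                    current = new Sentence();
                }
                continue;
            }
            if (line.startsWith("#")) {
                continue; // comment or metadata line
            }
            String[] fields = line.split("\t");
            if (fields.length < 4) {
                continue; // not a token line
            }
            // Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
            if (fields[0].contains("-") || fields[0].contains(".")) {
                continue;
            }
            current.tokens.add(fields[1]); // FORM column
            current.upos.add(fields[3]);   // UPOS column
        }
        if (!current.tokens.isEmpty()) {
            sentences.add(current); // file may not end with a blank line
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "# text = Dogs bark.",
                "1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_",
                "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\t_",
                "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_",
                "");
        for (Sentence s : parse(lines)) {
            System.out.println(s.tokens + " -> " + s.upos);
        }
    }
}
```

In this sketch the token list plays the role of the text that ends up in sourceTextData, and the tag list plays the role of universalPosTags; only the former would go on to preprocessing and embedding.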
get
public ai.djl.training.dataset.Record get(ai.djl.ndarray.NDManager manager, long index)
Gets the Record for the given index from the dataset.
Specified by:
get in class ai.djl.training.dataset.RandomAccessDataset
Parameters:
manager - the manager used to create the arrays
index - the index of the requested data item
Returns:
a Record that contains the data and label of the requested data item. The data NDList contains one NDArray representing the text embedding. The label NDList contains one NDArray holding the indices of the Universal POS tags of each token. For the index of each Universal POS tag, see the enum class UniversalDependenciesEnglishEWT.UniversalPosTag.
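The label described for get is an array of tag indices, one per token. A minimal sketch of that mapping follows, assuming each index is simply the ordinal of an enum constant. The enum Upos below is a hypothetical stand-in that lists the 17 Universal POS tags alphabetically; the constant order of the real UniversalDependenciesEnglishEWT.UniversalPosTag enum may differ, so consult that class for the actual indices.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of DJL: converts UPOS tag names to indices.
class UposIndexSketch {

    // Stand-in enum with the 17 Universal POS tags in alphabetical order.
    // The ordering in DJL's own UniversalPosTag enum is not confirmed here.
    enum Upos {
        ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM,
        PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X
    }

    /** Maps each tag name to the ordinal of the matching enum constant. */
    static long[] toIndices(List<String> tags) {
        long[] indices = new long[tags.size()];
        for (int i = 0; i < tags.size(); i++) {
            // valueOf throws IllegalArgumentException for an unknown tag name.
            indices[i] = Upos.valueOf(tags.get(i)).ordinal();
        }
        return indices;
    }

    public static void main(String[] args) {
        long[] label = toIndices(Arrays.asList("DET", "NOUN", "VERB", "PUNCT"));
        System.out.println(Arrays.toString(label));
    }
}
```

An array like this is the kind of content that would be wrapped in the single label NDArray, with one entry per token of the sentence.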
availableSize
protected long availableSize()
Returns the number of records available to be read in this Dataset.
Specified by:
availableSize in class ai.djl.training.dataset.RandomAccessDataset
Returns:
the number of records available to be read in this Dataset