Package ai.djl.basicdataset.utils
Class TextData
- java.lang.Object
-
- ai.djl.basicdataset.utils.TextData
-
public class TextData extends java.lang.Object
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
- static class TextData.Configuration
  The configuration for creating a TextData value in a Dataset.
-
Constructor Summary
Constructors (Constructor, Description)
- TextData(TextData.Configuration config)
  Constructs a new TextData.
-
Method Summary
Methods (Modifier and Type, Method, Description)
- static TextData.Configuration getDefaultConfiguration()
  Returns a good default TextData.Configuration to use for the constructor.
- ai.djl.ndarray.NDArray getEmbedding(ai.djl.ndarray.NDManager manager, long index)
  Gets the text embedding for the given index of the text input.
- java.util.List<java.lang.String> getProcessedText(long index)
  Gets the textual input after preprocessing.
- java.lang.String getRawText(long index)
  Gets the raw textual input.
- int getSize()
  Returns the size of the data.
- ai.djl.modality.nlp.embedding.TextEmbedding getTextEmbedding()
  Gets the TextEmbedding used to embed the data.
- ai.djl.modality.nlp.Vocabulary getVocabulary()
  Gets the DefaultVocabulary built while preprocessing the text data.
- void preprocess(ai.djl.ndarray.NDManager manager, java.util.List<java.lang.String> newTextData)
  Preprocesses the text data into NDArrays using the data provided from the dataset.
- void setEmbeddingSize(int embeddingSize)
  Sets the embedding size.
- void setTextEmbedding(ai.djl.modality.nlp.embedding.TextEmbedding textEmbedding)
  Sets the TextEmbedding used to embed the data.
- void setTextProcessors(java.util.List<ai.djl.modality.nlp.preprocess.TextProcessor> textProcessors)
  Sets the text processors.
-
-
-
Constructor Detail
-
TextData
public TextData(TextData.Configuration config)
Constructs a new TextData.
Parameters:
config - the configuration for the TextData
-
-
Method Detail
-
getDefaultConfiguration
public static TextData.Configuration getDefaultConfiguration()
Returns a good default TextData.Configuration to use for the constructor.
Returns:
a good default TextData.Configuration to use for the constructor
-
preprocess
public void preprocess(ai.djl.ndarray.NDManager manager, java.util.List<java.lang.String> newTextData) throws ai.djl.modality.nlp.embedding.EmbeddingException
Preprocesses the text data into NDArrays using the data provided from the dataset.
Parameters:
manager - the NDManager
newTextData - the data from the dataset
Throws:
ai.djl.modality.nlp.embedding.EmbeddingException - if there is an error while embedding input
-
setTextProcessors
public void setTextProcessors(java.util.List<ai.djl.modality.nlp.preprocess.TextProcessor> textProcessors)
Sets the text processors.
Parameters:
textProcessors - the new text processors
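To illustrate what a list of text processors does to a single input, here is a minimal plain-Java sketch. It does not use DJL: the `ProcessorChainSketch` class, the `Function<List<String>, List<String>>` stand-in for a text processor, and the two example processors are assumptions made for illustration, not the library's implementation or its defaults.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

/** Plain-Java illustration of applying processors in list order (not DJL code). */
public class ProcessorChainSketch {

    /** Hypothetical processor: splits every token on whitespace. */
    static List<String> splitOnWhitespace(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            for (String piece : token.trim().split("\\s+")) {
                if (!piece.isEmpty()) {
                    out.add(piece);
                }
            }
        }
        return out;
    }

    /** Hypothetical processor: lower-cases every token. */
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ENGLISH));
        }
        return out;
    }

    /** Builds the example chain: split first, then lower-case. */
    static List<Function<List<String>, List<String>>> exampleChain() {
        List<Function<List<String>, List<String>>> chain = new ArrayList<>();
        chain.add(ProcessorChainSketch::splitOnWhitespace);
        chain.add(ProcessorChainSketch::lowerCase);
        return chain;
    }

    /** Starts from the raw text as a single token and feeds it through each processor. */
    static List<String> process(
            String rawText, List<Function<List<String>, List<String>>> processors) {
        List<String> tokens = Collections.singletonList(rawText);
        for (Function<List<String>, List<String>> processor : processors) {
            tokens = processor.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [deep, java, library]
        System.out.println(process("Deep Java Library", exampleChain()));
    }
}
```

Order matters in such a chain: each processor receives the output of the previous one, so replacing the list changes every later step.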
-
setTextEmbedding
public void setTextEmbedding(ai.djl.modality.nlp.embedding.TextEmbedding textEmbedding)
Sets the TextEmbedding used to embed the data.
Parameters:
textEmbedding - the TextEmbedding
-
getTextEmbedding
public ai.djl.modality.nlp.embedding.TextEmbedding getTextEmbedding()
Gets the TextEmbedding used to embed the data.
Returns:
the TextEmbedding
-
setEmbeddingSize
public void setEmbeddingSize(int embeddingSize)
Sets the embedding size.
Parameters:
embeddingSize - the embedding size
-
getVocabulary
public ai.djl.modality.nlp.Vocabulary getVocabulary()
Gets the DefaultVocabulary built while preprocessing the text data.
Returns:
the DefaultVocabulary
-
getEmbedding
public ai.djl.ndarray.NDArray getEmbedding(ai.djl.ndarray.NDManager manager, long index)
Gets the text embedding for the given index of the text input.
Parameters:
manager - the manager for the embedding array
index - the index of the text input
Returns:
the NDArray containing the text embedding
-
getRawText
public java.lang.String getRawText(long index)
Gets the raw textual input.
Parameters:
index - the index of the text input
Returns:
the raw text
-
getProcessedText
public java.util.List<java.lang.String> getProcessedText(long index)
Gets the textual input after preprocessing.
Parameters:
index - the index of the text input
Returns:
the list of processed tokens
-
getSize
public int getSize()
Returns the size of the data.
Returns:
the size of the data
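The index-based accessors above can be pictured with a small plain-Java stand-in. This is a sketch under stated assumptions, not DJL's implementation: the `IndexedTextSketch` class, the whitespace-and-lower-case tokenization, the first-appearance vocabulary indexing, and the reading of "size" as the number of text inputs are all chosen here for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Plain-Java stand-in for index-based text storage (not DJL code). */
public class IndexedTextSketch {
    private final List<String> rawText = new ArrayList<>();
    private final List<List<String>> processedText = new ArrayList<>();
    // Token -> index, assigned in order of first appearance (a simplification).
    private final Map<String, Integer> vocabulary = new LinkedHashMap<>();

    /** Stores each input, tokenizes it, and records tokens not seen before. */
    public void preprocess(List<String> inputs) {
        for (String input : inputs) {
            List<String> tokens = new ArrayList<>();
            for (String piece : input.toLowerCase(Locale.ENGLISH).trim().split("\\s+")) {
                if (!piece.isEmpty()) {
                    tokens.add(piece);
                    vocabulary.putIfAbsent(piece, vocabulary.size());
                }
            }
            rawText.add(input);
            processedText.add(tokens);
        }
    }

    /** Returns the input exactly as it was provided. */
    public String getRawText(long index) {
        return rawText.get(Math.toIntExact(index));
    }

    /** Returns the tokens produced for the input at the given index. */
    public List<String> getProcessedText(long index) {
        return processedText.get(Math.toIntExact(index));
    }

    /** Assumed here to be the number of stored text inputs. */
    public int getSize() {
        return rawText.size();
    }

    public Map<String, Integer> getVocabulary() {
        return vocabulary;
    }

    public static void main(String[] args) {
        IndexedTextSketch data = new IndexedTextSketch();
        data.preprocess(Arrays.asList("Hello World", "hello DJL"));
        System.out.println(data.getSize());           // prints 2
        System.out.println(data.getRawText(1));       // prints hello DJL
        System.out.println(data.getProcessedText(1)); // prints [hello, djl]
        System.out.println(data.getVocabulary());     // prints {hello=0, world=1, djl=2}
    }
}
```

The same index selects the raw string and its processed tokens, which is the contract the `getRawText(long)` and `getProcessedText(long)` signatures suggest.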
-
-