Package ai.djl.basicdataset.nlp
Class GoEmotions
- java.lang.Object
- ai.djl.training.dataset.RandomAccessDataset
- ai.djl.basicdataset.nlp.TextDataset
- ai.djl.basicdataset.nlp.GoEmotions
- All Implemented Interfaces:
ai.djl.training.dataset.Dataset
public class GoEmotions extends TextDataset
GoEmotions is a corpus of 58k carefully curated comments extracted from Reddit, annotated by human raters with 27 emotion categories or Neutral. This version of the data is filtered by rater agreement on top of the raw data and contains a train/test/validation split. The emotion categories are: admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise.
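As an illustration of the label set described above, the following minimal sketch maps numeric label ids to category names. It assumes that ids follow the order in which the categories are listed, with Neutral last (id 27); that ordering and the class name GoEmotionsLabels are assumptions made for this example and are not part of the DJL API.

```java
import java.util.List;

/** Hypothetical helper, not part of DJL: maps GoEmotions label ids to names. */
public class GoEmotionsLabels {

    // The 27 emotion categories in the order listed in the class description,
    // followed by Neutral. Assumption: label ids follow this order (0..27).
    public static final List<String> NAMES = List.of(
            "admiration", "amusement", "anger", "annoyance", "approval",
            "caring", "confusion", "curiosity", "desire", "disappointment",
            "disapproval", "disgust", "embarrassment", "excitement", "fear",
            "gratitude", "grief", "joy", "love", "nervousness", "optimism",
            "pride", "realization", "relief", "remorse", "sadness", "surprise",
            "neutral");

    /** Returns the category name for a label id, rejecting ids out of range. */
    public static String nameOf(int id) {
        if (id < 0 || id >= NAMES.size()) {
            throw new IllegalArgumentException("Unknown label id: " + id);
        }
        return NAMES.get(id);
    }

    public static void main(String[] args) {
        System.out.println(NAMES.size() + " labels; id 17 is " + nameOf(17));
    }
}
```

Keeping the names in a single immutable list makes the id-to-name mapping explicit and easy to check against the dataset's own label file before relying on it.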
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | GoEmotions.Builder | A builder to construct a GoEmotions.
-
Nested classes/interfaces inherited from class ai.djl.basicdataset.nlp.TextDataset
TextDataset.Sample
-
Field Summary
-
Fields inherited from class ai.djl.basicdataset.nlp.TextDataset
manager, mrl, prepared, samples, sourceTextData, targetTextData, usage
-
Method Summary
Modifier and Type | Method | Description
protected long | availableSize() | Returns the number of records available to be read in this Dataset.
static GoEmotions.Builder | builder() | Creates a builder to build a GoEmotions.
ai.djl.training.dataset.Record | get(ai.djl.ndarray.NDManager manager, long index) | Gets the Record for the given index from the dataset.
void | prepare(ai.djl.util.Progress progress) | Prepares the dataset for use with tracked progress.
-
Methods inherited from class ai.djl.basicdataset.nlp.TextDataset
getProcessedText, getRawText, getSamples, getTextEmbedding, getVocabulary, preprocess
-
Methods inherited from class ai.djl.training.dataset.RandomAccessDataset
getData, getData, getData, getData, newSubDataset, newSubDataset, randomSplit, size, subDataset, subDataset, subDataset, subDataset, toArray
-
Method Detail
-
prepare
public void prepare(ai.djl.util.Progress progress) throws java.io.IOException, ai.djl.modality.nlp.embedding.EmbeddingException
Prepares the dataset for use with tracked progress. In this method the TSV file is parsed and all datasets are preprocessed.
- Parameters:
progress - the progress tracker
- Throws:
java.io.IOException - for various exceptions depending on the dataset
ai.djl.modality.nlp.embedding.EmbeddingException
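To make the TSV parsing step more concrete, here is a minimal standalone sketch of reading one record. It assumes each line has three tab-separated columns (comment text, comma-separated label ids, comment id); that layout, the class name GoEmotionsTsvLine, and the sample line are assumptions for illustration and do not reproduce DJL's actual parsing code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical value class, not part of DJL: one parsed line of a GoEmotions TSV file. */
public class GoEmotionsTsvLine {

    public final String text;            // the comment text
    public final List<Integer> labelIds; // one or more emotion label ids
    public final String commentId;       // identifier of the comment

    private GoEmotionsTsvLine(String text, List<Integer> labelIds, String commentId) {
        this.text = text;
        this.labelIds = labelIds;
        this.commentId = commentId;
    }

    /** Parses a line, assuming the layout: text TAB id[,id...] TAB commentId. */
    public static GoEmotionsTsvLine parse(String line) {
        // A limit of -1 keeps trailing empty columns so the count check is reliable.
        String[] cols = line.split("\t", -1);
        if (cols.length != 3) {
            throw new IllegalArgumentException(
                    "Expected 3 tab-separated columns, got " + cols.length);
        }
        List<Integer> ids = new ArrayList<>();
        for (String id : cols[1].split(",")) {
            ids.add(Integer.parseInt(id.trim()));
        }
        return new GoEmotionsTsvLine(cols[0], ids, cols[2]);
    }

    public static void main(String[] args) {
        // Made-up sample line, not taken from the dataset.
        GoEmotionsTsvLine row = parse("Thanks, that made my day!\t15,17\tabc123");
        System.out.println(row.text + " -> " + row.labelIds);
    }
}
```

Because a comment can carry several labels, the sketch keeps the label column as a list rather than a single id.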
-
get
public ai.djl.training.dataset.Record get(ai.djl.ndarray.NDManager manager, long index) throws java.io.IOException
Gets the Record for the given index from the dataset.
- Specified by:
get in class ai.djl.training.dataset.RandomAccessDataset
- Parameters:
manager - the manager used to create the arrays
index - the index of the requested data item
- Returns:
a Record that contains the data and label of the requested data item. The data NDList contains three NDArrays representing the embedded title, context and question, which are named accordingly. The label NDList contains multiple NDArrays corresponding to each embedded answer.
- Throws:
java.io.IOException
-
availableSize
protected long availableSize()
Returns the number of records available to be read in this Dataset. In this implementation, the number of available records is the size of questionInfoList.
- Specified by:
availableSize in class ai.djl.training.dataset.RandomAccessDataset
- Returns:
the number of records available to be read in this Dataset
-
builder
public static GoEmotions.Builder builder()
Creates a builder to build a GoEmotions.
- Returns:
a new builder
-