Package ai.djl.basicdataset.nlp
Contains a library of built-in datasets for Application.NLP.
Class Summary

Class    Description
AmazonReview    The AmazonReview dataset contains an Application.NLP.SENTIMENT_ANALYSIS set of reviews and their sentiment ratings.
AmazonReview.Builder    A builder to construct an AmazonReview.
CookingStackExchange    A text classification dataset that contains questions from cooking.stackexchange.com and their associated tags on the site.
CookingStackExchange.Builder    A builder to construct a CookingStackExchange.
PennTreebankText    The Penn Treebank (PTB) project selected 2,499 stories from a three-year Wall Street Journal (WSJ) collection of 98,732 stories for syntactic annotation.
PennTreebankText.Builder    A builder to construct a PennTreebankText.
StanfordMovieReview    The StanfordMovieReview dataset contains an Application.NLP.SENTIMENT_ANALYSIS set of movie reviews and their sentiment ratings.
StanfordMovieReview.Builder    A builder for a StanfordMovieReview.
StanfordQuestionAnsweringDataset    The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.
StanfordQuestionAnsweringDataset.Builder    A builder for a StanfordQuestionAnsweringDataset.
TatoebaEnglishFrenchDataset    TatoebaEnglishFrenchDataset is an English-French machine translation dataset from The Tatoeba Project (http://www.manythings.org/anki/).
TatoebaEnglishFrenchDataset.Builder    A builder for a TatoebaEnglishFrenchDataset.
TextDataset    TextDataset is an abstract dataset for natural language processing tasks where the source, the target, or both are text-based data.
TextDataset.Builder<T extends TextDataset.Builder<T>>    Abstract builder that helps build a TextDataset.
TextDataset.Sample    A class that stores TextDataset sample information.
WikiText2    The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia.
WikiText2.Builder    A builder to construct a WikiText2.
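
The signature TextDataset.Builder<T extends TextDataset.Builder<T>> listed above is a self-bounded generic. The following is a minimal, self-contained sketch of why that shape is useful for builder hierarchies: setters inherited from the abstract builder return the concrete builder type, so subclass-specific options can still be chained afterwards. Every name in it (SelfTypedBuilderSketch, SketchDataset, BaseBuilder, ReviewDataset, ReviewBuilder, setName, optLimit, optMinRating) is hypothetical and chosen for illustration; it does not reproduce the DJL classes or their actual methods.

```java
public final class SelfTypedBuilderSketch {

    /** Hypothetical stand-in for an abstract dataset; NOT DJL's TextDataset. */
    abstract static class SketchDataset {
        final String name;
        final long limit;

        SketchDataset(BaseBuilder<?> builder) {
            this.name = builder.name;
            this.limit = builder.limit;
        }

        abstract String describe();
    }

    /** Self-bounded builder: T is the concrete builder subclass. */
    abstract static class BaseBuilder<T extends BaseBuilder<T>> {
        String name = "unnamed";
        long limit = Long.MAX_VALUE;

        /** Returns this builder typed as the concrete subclass. */
        abstract T self();

        T setName(String name) {
            this.name = name;
            return self();
        }

        T optLimit(long limit) {
            this.limit = limit;
            return self();
        }
    }

    /** Hypothetical concrete dataset with one extra option. */
    static final class ReviewDataset extends SketchDataset {
        final int minRating;

        ReviewDataset(ReviewBuilder builder) {
            super(builder);
            this.minRating = builder.minRating;
        }

        static ReviewBuilder builder() {
            return new ReviewBuilder();
        }

        @Override
        String describe() {
            return name + "(limit=" + limit + ", minRating=" + minRating + ")";
        }
    }

    /** Concrete builder: binds T to itself and adds its own option. */
    static final class ReviewBuilder extends BaseBuilder<ReviewBuilder> {
        int minRating = 1;

        @Override
        ReviewBuilder self() {
            return this;
        }

        ReviewBuilder optMinRating(int minRating) {
            this.minRating = minRating;
            return this;
        }

        ReviewDataset build() {
            return new ReviewDataset(this);
        }
    }

    public static void main(String[] args) {
        // setName and optLimit come from BaseBuilder but return ReviewBuilder,
        // so optMinRating can still be called later in the chain.
        ReviewDataset dataset = ReviewDataset.builder()
                .setName("reviews")
                .optLimit(10)
                .optMinRating(3)
                .build();
        System.out.println(dataset.describe());
    }
}
```

If BaseBuilder were not generic and its setters returned BaseBuilder, the call to optMinRating after optLimit would not compile without a cast, which is the problem the self-bounded type parameter avoids.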