Package ai.djl.basicdataset.nlp
Class WikiText2
java.lang.Object
    ai.djl.basicdataset.nlp.WikiText2
All Implemented Interfaces:
ai.djl.training.dataset.Dataset, ai.djl.training.dataset.RawDataset<java.nio.file.Path>
public class WikiText2 extends java.lang.Object implements ai.djl.training.dataset.RawDataset<java.nio.file.Path>
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia.
Nested Class Summary
Modifier and Type    Class                Description
static class         WikiText2.Builder    A builder to construct a WikiText2.
Method Summary
Modifier and Type                                    Method                                     Description
static WikiText2.Builder                             builder()                                  Creates a builder to build a WikiText2.
java.nio.file.Path                                   getData()                                  Get data from the WikiText2 dataset.
java.lang.Iterable<ai.djl.training.dataset.Batch>    getData(ai.djl.ndarray.NDManager manager)  Fetches an iterator that can iterate through the Dataset.
void                                                 prepare(ai.djl.util.Progress progress)     Prepares the dataset for use with tracked progress.
Method Detail
builder
public static WikiText2.Builder builder()
Creates a builder to build a WikiText2.
Returns:
a new WikiText2.Builder object
prepare
public void prepare(ai.djl.util.Progress progress) throws java.io.IOException
Prepares the dataset for use with tracked progress.
Specified by:
prepare in interface ai.djl.training.dataset.Dataset
Parameters:
progress - the progress tracker
Throws:
java.io.IOException - for various exceptions depending on the dataset
getData
public java.lang.Iterable<ai.djl.training.dataset.Batch> getData(ai.djl.ndarray.NDManager manager) throws java.io.IOException, ai.djl.translate.TranslateException
Fetches an iterator that can iterate through the Dataset. This method is not implemented for the WikiText2 dataset because the dataset is not suitable for iteration. If called, it directly returns null.
Specified by:
getData in interface ai.djl.training.dataset.Dataset
Parameters:
manager - the NDManager passed by the caller (unused here, since this method directly returns null)
Returns:
an Iterable of Batch that contains batches of data from the dataset (always null for WikiText2)
Throws:
java.io.IOException
ai.djl.translate.TranslateException
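Because getData(NDManager) returns null for this dataset, a caller that loops over the result without a check would hit a NullPointerException. The following is a minimal sketch of one way to guard against that, not part of the DJL API: the class NullSafeIterable and its method orEmpty are hypothetical names, and a plain String element type stands in for Batch so the example compiles without DJL on the classpath.

```java
import java.util.Collections;

public class NullSafeIterable {

    /** Returns the given iterable, or an empty one when it is null (hypothetical helper). */
    public static <T> Iterable<T> orEmpty(Iterable<T> iterable) {
        if (iterable == null) {
            return Collections.<T>emptyList();
        }
        return iterable;
    }

    public static void main(String[] args) {
        // Stand-in for the null that getData(NDManager) returns on WikiText2.
        Iterable<String> batches = null;

        int seen = 0;
        for (String batch : orEmpty(batches)) {
            seen++;
        }
        System.out.println(seen); // prints 0
    }
}
```

For WikiText2 specifically, the no-argument getData() is the method that yields usable data, so a guard like this mainly matters in generic code that accepts any Dataset.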
getData
public java.nio.file.Path getData() throws java.io.IOException
Get data from the WikiText2 dataset. This method will directly return the whole dataset.
Specified by:
getData in interface ai.djl.training.dataset.RawDataset<java.nio.file.Path>
Returns:
a Path object locating the WikiText2 dataset file
Throws:
java.io.IOException
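Since getData() hands back only a Path, reading the text is left to the caller. The sketch below shows one way that might look using the standard java.nio.file API alone. It assumes the file is UTF-8 text with whitespace-separated tokens; the class WikiTextTokenCount and its methods are hypothetical names for illustration, and the Path is taken from the command line rather than from a real WikiText2 instance.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class WikiTextTokenCount {

    /** Counts whitespace-separated tokens in one line; blank lines count as zero. */
    public static long countLineTokens(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return 0L;
        }
        return trimmed.split("\\s+").length;
    }

    /** Streams the file line by line so the whole corpus is never held in memory at once. */
    public static long countFileTokens(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.mapToLong(WikiTextTokenCount::countLineTokens).sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the location that getData() returned, passed as a string.
        if (args.length > 0) {
            System.out.println(countFileTokens(Paths.get(args[0])));
        }
    }
}
```

Streaming with Files.lines inside try-with-resources closes the underlying file handle even if counting fails partway through.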