public class KerasTokenizer extends Object
| Constructor and Description |
|---|
| `KerasTokenizer()`: Default Keras tokenizer constructor |
| `KerasTokenizer(Integer numWords)`: Tokenizer constructor with only numWords specified |
| `KerasTokenizer(Integer numWords, String filters, boolean lower, String split, boolean charLevel, String outOfVocabularyToken)`: Create a Keras Tokenizer instance with full set of properties. |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `fitOnSequences(Integer[][] sequences)`: Fit this tokenizer on a corpus of word indices |
| `void` | `fitOnTexts(String[] texts)`: Fit this tokenizer on a corpus of texts. |
| `static KerasTokenizer` | `fromJson(String jsonFileName)`: Import Keras Tokenizer from JSON file created with `tokenizer.to_json()` in Python. |
| `INDArray` | `sequencesToMatrix(Integer[][] sequences, TokenizerMode mode)`: Turns an array of index sequences into an ND4J matrix of shape (number of texts, number of words in vocabulary) |
| `String[]` | `sequencesToTexts(Integer[][] sequences)`: Turns index sequences back into texts |
| `INDArray` | `textsToMatrix(String[] texts, TokenizerMode mode)`: Turns an array of texts into an ND4J matrix of shape (number of texts, number of words in vocabulary) |
| `Integer[][]` | `textsToSequences(String[] texts)`: Transforms a bunch of texts into their index representations. |
| `static String[]` | `textToWordSequence(String text, String filters, boolean lower, String split)`: Turns a String text into a sequence of tokens. |

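The methods summarized above form a fit, encode, decode cycle. The following is a minimal, dependency-free sketch of that cycle, not the library's implementation. The class `TokenizerWorkflowSketch` is hypothetical, and its rules are assumptions borrowed from the Python Keras tokenizer: indices start at 1, the most frequent word gets the lowest index, and words never seen during fitting are skipped. The real `KerasTokenizer` may differ in details.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in written for illustration; this is NOT the KerasTokenizer source.
final class TokenizerWorkflowSketch {
    // word -> index and index -> word; indices are 1-based (assumed Keras convention).
    final Map<String, Integer> wordIndex = new LinkedHashMap<>();
    final Map<Integer, String> indexWord = new LinkedHashMap<>();
    final String filters;
    final boolean lower;
    final String split;

    TokenizerWorkflowSketch(String filters, boolean lower, String split) {
        this.filters = filters;
        this.lower = lower;
        this.split = split;
    }

    // Analogue of textToWordSequence: filter characters become separators, then the text is split.
    static String[] textToWordSequence(String text, String filters, boolean lower, String split) {
        String input = lower ? text.toLowerCase() : text;
        StringBuilder cleaned = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (filters.indexOf(c) >= 0) {
                cleaned.append(split);
            } else {
                cleaned.append(c);
            }
        }
        List<String> tokens = new ArrayList<>();
        for (String piece : cleaned.toString().split(Pattern.quote(split))) {
            if (!piece.isEmpty()) {
                tokens.add(piece); // drop empty pieces left by repeated separators
            }
        }
        return tokens.toArray(new String[0]);
    }

    // Analogue of fitOnTexts: count words, then assign indices by descending frequency.
    void fitOnTexts(String[] texts) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String text : texts) {
            for (String word : textToWordSequence(text, filters, lower, split)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so equally frequent words keep first-seen order.
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        wordIndex.clear();
        indexWord.clear();
        int next = 1;
        for (Map.Entry<String, Integer> entry : entries) {
            wordIndex.put(entry.getKey(), next);
            indexWord.put(next, entry.getKey());
            next++;
        }
    }

    // Analogue of textsToSequences: words missing from the vocabulary are skipped.
    Integer[][] textsToSequences(String[] texts) {
        Integer[][] result = new Integer[texts.length][];
        for (int i = 0; i < texts.length; i++) {
            List<Integer> indices = new ArrayList<>();
            for (String word : textToWordSequence(texts[i], filters, lower, split)) {
                Integer index = wordIndex.get(word);
                if (index != null) {
                    indices.add(index);
                }
            }
            result[i] = indices.toArray(new Integer[0]);
        }
        return result;
    }

    // Analogue of sequencesToTexts: map indices back to words and join them with the split string.
    String[] sequencesToTexts(Integer[][] sequences) {
        String[] result = new String[sequences.length];
        for (int i = 0; i < sequences.length; i++) {
            List<String> words = new ArrayList<>();
            for (Integer index : sequences[i]) {
                String word = indexWord.get(index);
                if (word != null) {
                    words.add(word);
                }
            }
            result[i] = String.join(split, words);
        }
        return result;
    }

    public static void main(String[] args) {
        TokenizerWorkflowSketch tokenizer = new TokenizerWorkflowSketch(".,!", true, " ");
        tokenizer.fitOnTexts(new String[] {"The cat sat.", "The cat ran!", "the dog"});
        Integer[][] sequences = tokenizer.textsToSequences(new String[] {"The dog sat", "a cat"});
        System.out.println(Arrays.deepToString(sequences));                       // [[1, 5, 3], [2]]
        System.out.println(Arrays.toString(tokenizer.sequencesToTexts(sequences))); // [the dog sat, cat]
    }
}
```

In this sketch the round trip loses the unknown word "a", because no out-of-vocabulary token is configured.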
`public KerasTokenizer(Integer numWords, String filters, boolean lower, String split, boolean charLevel, String outOfVocabularyToken)`

Parameters:
- `numWords` - The maximum vocabulary size, can be null
- `filters` - Characters to filter
- `lower` - whether to lowercase input or not
- `split` - by which string to split words (usually single space)
- `charLevel` - whether to operate on character- or word-level
- `outOfVocabularyToken` - replace items outside the vocabulary by this token

`public KerasTokenizer(Integer numWords)`

Parameters:
- `numWords` - The maximum vocabulary size, can be null

`public KerasTokenizer()`

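To make the interaction between `numWords` and `outOfVocabularyToken` concrete, here is a small stand-alone sketch. It is an illustration under stated assumptions, not the library's code. The class `VocabularyLimitSketch` and its `encode` helper are hypothetical, and the rule follows the Python Keras convention: a word whose index is at or above `numWords`, or that is unknown, is replaced by the out-of-vocabulary index when a token is configured and dropped otherwise.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; the real KerasTokenizer may behave differently in details.
final class VocabularyLimitSketch {

    // Encodes tokens against a fixed word index, honoring an optional vocabulary limit and OOV token.
    static List<Integer> encode(String[] tokens, Map<String, Integer> wordIndex,
                                Integer numWords, String outOfVocabularyToken) {
        // Index of the OOV token, or null when no token is configured.
        Integer oovIndex = outOfVocabularyToken == null ? null : wordIndex.get(outOfVocabularyToken);
        List<Integer> encoded = new ArrayList<>();
        for (String token : tokens) {
            Integer index = wordIndex.get(token);
            // Assumed rule: only indices strictly below numWords are kept as-is.
            boolean withinLimit = index != null && (numWords == null || index < numWords);
            if (withinLimit) {
                encoded.add(index);
            } else if (oovIndex != null) {
                encoded.add(oovIndex); // replace rare or unknown words
            }
            // With no OOV token, rare or unknown words are silently dropped.
        }
        return encoded;
    }

    public static void main(String[] args) {
        String[] tokens = {"the", "dog", "sat", "bird"};

        // Vocabulary with an OOV token at index 1 (assumed Keras convention).
        Map<String, Integer> withOov = new LinkedHashMap<>();
        withOov.put("<unk>", 1);
        withOov.put("the", 2);
        withOov.put("cat", 3);
        withOov.put("sat", 4);
        withOov.put("dog", 5);
        System.out.println(encode(tokens, withOov, 4, "<unk>")); // [2, 1, 1, 1]

        // Vocabulary without an OOV token.
        Map<String, Integer> plain = new LinkedHashMap<>();
        plain.put("the", 1);
        plain.put("cat", 2);
        plain.put("sat", 3);
        plain.put("dog", 4);
        System.out.println(encode(tokens, plain, 3, null));    // [1]
        System.out.println(encode(tokens, plain, null, null)); // [1, 4, 3]
    }
}
```

Passing `null` for the limit keeps every fitted word, which matches the "can be null" note on `numWords`.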
`public static KerasTokenizer fromJson(String jsonFileName) throws IOException, InvalidKerasConfigurationException`

Parameters:
- `jsonFileName` - Full path of the JSON file to load

Throws:
- `IOException` - I/O exception
- `InvalidKerasConfigurationException` - Invalid Keras configuration

`public static String[] textToWordSequence(String text, String filters, boolean lower, String split)`

Parameters:
- `text` - input text
- `filters` - characters to filter
- `lower` - whether to lowercase input or not
- `split` - by which string to split words (usually single space)

`public void fitOnTexts(String[] texts)`

Parameters:
- `texts` - array of strings to fit tokenizer on.

`public void fitOnSequences(Integer[][] sequences)`

Parameters:
- `sequences` - array of indices derived from a text.

`public Integer[][] textsToSequences(String[] texts)`

Parameters:
- `texts` - input texts

`public String[] sequencesToTexts(Integer[][] sequences)`

Parameters:
- `sequences` - index sequences

`public INDArray textsToMatrix(String[] texts, TokenizerMode mode)`

Parameters:
- `texts` - input texts
- `mode` - TokenizerMode that controls how to vectorize data

`public INDArray sequencesToMatrix(Integer[][] sequences, TokenizerMode mode)`

Parameters:
- `sequences` - input sequences
- `mode` - TokenizerMode that controls how to vectorize data

Copyright © 2021. All rights reserved.