public static final class DefaultVocabulary.Builder
extends java.lang.Object
Builder class that is used to build the DefaultVocabulary.

| Modifier and Type | Method and Description |
|---|---|
| DefaultVocabulary.Builder | `add(java.util.List<java.lang.String> sentence)` Adds the given sentence to the DefaultVocabulary. |
| DefaultVocabulary.Builder | `addAll(java.util.List<java.util.List<java.lang.String>> sentences)` Adds the given list of sentences to the DefaultVocabulary. |
| DefaultVocabulary.Builder | `addFromCustomizedFile(java.net.URL url, java.util.function.Function<java.net.URL,java.util.List<java.lang.String>> lambda)` Adds a customized vocabulary to the DefaultVocabulary. |
| DefaultVocabulary.Builder | `addFromTextFile(java.nio.file.Path path)` Adds a text vocabulary to the DefaultVocabulary. |
| DefaultVocabulary.Builder | `addFromTextFile(java.net.URL url)` Adds a text vocabulary to the DefaultVocabulary. |
| DefaultVocabulary | `build()` Builds the DefaultVocabulary object with the set arguments. |
| DefaultVocabulary.Builder | `optMaxTokens(int maxTokens)` Sets the optional limit on the size of the vocabulary. |
| DefaultVocabulary.Builder | `optMinFrequency(int minFrequency)` Sets the optional parameter that specifies the minimum frequency to consider a token to be part of the DefaultVocabulary. |
| DefaultVocabulary.Builder | `optReservedTokens(java.util.Collection<java.lang.String> reservedTokens)` Sets the optional parameter that sets the list of reserved tokens. |
| DefaultVocabulary.Builder | `optUnknownToken()` Sets the optional parameter that specifies the unknown token's string value to `"<unk>"`. |
| DefaultVocabulary.Builder | `optUnknownToken(java.lang.String unknownToken)` Sets the optional parameter that specifies the unknown token's string value. |
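
As a rough illustration of what the add, addAll, and optMinFrequency entries describe, the plain-Java sketch below counts tokens across sentences and keeps those that meet a minimum frequency. It does not call or reproduce the DefaultVocabulary implementation: the class `MinFrequencySketch` and its methods are hypothetical, and treating the threshold as inclusive is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the DefaultVocabulary implementation.
final class MinFrequencySketch {

    // Counts every token across all sentences, preserving first-seen order.
    static Map<String, Integer> countTokens(List<List<String>> sentences) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> sentence : sentences) {
            for (String token : sentence) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Keeps tokens seen at least minFrequency times (assumed inclusive);
    // a negative value such as -1 means no minimum.
    static List<String> filter(Map<String, Integer> counts, int minFrequency) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (minFrequency < 0 || entry.getValue() >= minFrequency) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = List.of(
                List.of("the", "cat", "sat"),
                List.of("the", "dog", "sat"));
        Map<String, Integer> counts = countTokens(sentences);
        System.out.println(filter(counts, 2));  // [the, sat]
        System.out.println(filter(counts, -1)); // [the, cat, sat, dog]
    }
}
```

With a minimum of 2, only "the" and "sat" survive because each appears in both sentences; with -1 every token is kept.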
`public DefaultVocabulary.Builder optMinFrequency(int minFrequency)`
Sets the optional parameter that specifies the minimum frequency to consider a token to be part of the DefaultVocabulary. Defaults to no minimum.
Parameters:
minFrequency - the minimum frequency to consider a token to be part of the DefaultVocabulary, or -1 for no minimum
Returns:
this VocabularyBuilder

`public DefaultVocabulary.Builder optMaxTokens(int maxTokens)`
Sets the optional limit on the size of the vocabulary. The size includes the reservedTokens. If the number of added tokens exceeds the maxTokens limit, the most frequent tokens are kept.
Parameters:
maxTokens - the maximum number of tokens, or -1 for no maximum
Returns:
this DefaultVocabulary.Builder

`public DefaultVocabulary.Builder optUnknownToken()`
Sets the optional parameter that specifies the unknown token's string value to `"<unk>"`.
Returns:
this VocabularyBuilder

`public DefaultVocabulary.Builder optUnknownToken(java.lang.String unknownToken)`
Sets the optional parameter that specifies the unknown token's string value.
Parameters:
unknownToken - the string value of the unknown token
Returns:
this VocabularyBuilder

`public DefaultVocabulary.Builder optReservedTokens(java.util.Collection<java.lang.String> reservedTokens)`
Sets the optional parameter that sets the list of reserved tokens.
Parameters:
reservedTokens - the list of reserved tokens
Returns:
this VocabularyBuilder

`public DefaultVocabulary.Builder add(java.util.List<java.lang.String> sentence)`
Adds the given sentence to the DefaultVocabulary.
Parameters:
sentence - the sentence to be added
Returns:
this VocabularyBuilder

`public DefaultVocabulary.Builder addAll(java.util.List<java.util.List<java.lang.String>> sentences)`
Adds the given list of sentences to the DefaultVocabulary.
Parameters:
sentences - the list of sentences to be added
Returns:
this VocabularyBuilder

`public DefaultVocabulary.Builder addFromTextFile(java.nio.file.Path path) throws java.io.IOException`
Adds a text vocabulary to the DefaultVocabulary.
Example text file (vocab.txt):
token1
token2
token3
will be mapped to indices 0, 1, 2.
Parameters:
path - the path to the text file
Returns:
this VocabularyBuilder
Throws:
java.io.IOException - if the vocabulary file cannot be read

`public DefaultVocabulary.Builder addFromTextFile(java.net.URL url) throws java.io.IOException`
Adds a text vocabulary to the DefaultVocabulary.
Parameters:
url - the URL of the text file
Returns:
this VocabularyBuilder
Throws:
java.io.IOException - if the vocabulary file cannot be read

`public DefaultVocabulary.Builder addFromCustomizedFile(java.net.URL url, java.util.function.Function<java.net.URL,java.util.List<java.lang.String>> lambda)`
Adds a customized vocabulary to the DefaultVocabulary.
Parameters:
url - the URL of the text file
lambda - the function to parse the vocabulary file
Returns:
this VocabularyBuilder

`public DefaultVocabulary build()`
Builds the DefaultVocabulary object with the set arguments.
Returns:
the DefaultVocabulary object built
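
The vocab.txt example in the addFromTextFile description maps token1, token2, token3 to indices 0, 1, 2. The plain-Java sketch below shows one way such a file-order mapping could be computed. It is not the DefaultVocabulary implementation: the class `TextVocabSketch` and its method are hypothetical, and it assumes one token per line, with blank lines and repeated tokens skipped.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the DefaultVocabulary implementation.
final class TextVocabSketch {

    // Assigns indices in order of first appearance.
    // Assumption: one token per line; blank lines and repeats are skipped.
    static Map<String, Integer> indexTokens(List<String> lines) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String line : lines) {
            String token = line.trim();
            if (!token.isEmpty()) {
                index.putIfAbsent(token, index.size());
            }
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary vocabulary file, then read it back.
        Path vocab = Files.createTempFile("vocab", ".txt");
        Files.write(vocab, List.of("token1", "token2", "token3"), StandardCharsets.UTF_8);
        List<String> lines = Files.readAllLines(vocab, StandardCharsets.UTF_8);
        System.out.println(indexTokens(lines)); // {token1=0, token2=1, token3=2}
        Files.delete(vocab);
    }
}
```

Because the map preserves insertion order, the index of a token equals the number of distinct tokens that precede it in the file.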