Package ai.djl.modality.nlp
Class DefaultVocabulary.Builder
- java.lang.Object
-
- ai.djl.modality.nlp.DefaultVocabulary.Builder
-
- Enclosing class:
- DefaultVocabulary
public static final class DefaultVocabulary.Builder extends java.lang.Object
Builder class that is used to build the DefaultVocabulary.
-
-
Method Summary
- DefaultVocabulary.Builder add(java.util.List<java.lang.String> sentence)
  Adds the given sentence to the DefaultVocabulary.
- DefaultVocabulary.Builder addAll(java.util.List<java.util.List<java.lang.String>> sentences)
  Adds the given list of sentences to the DefaultVocabulary.
- DefaultVocabulary.Builder addFromCustomizedFile(java.net.URL url, java.util.function.Function<java.net.URL,java.util.List<java.lang.String>> lambda)
  Adds a customized vocabulary to the DefaultVocabulary.
- DefaultVocabulary.Builder addFromTextFile(java.net.URL url)
  Adds a text vocabulary to the DefaultVocabulary.
- DefaultVocabulary.Builder addFromTextFile(java.nio.file.Path path)
  Adds a text vocabulary to the DefaultVocabulary.
- DefaultVocabulary build()
  Builds the DefaultVocabulary object with the set arguments.
- DefaultVocabulary.Builder optMaxTokens(int maxTokens)
  Sets the optional limit on the size of the vocabulary.
- DefaultVocabulary.Builder optMinFrequency(int minFrequency)
  Sets the optional parameter that specifies the minimum frequency to consider a token to be part of the DefaultVocabulary.
- DefaultVocabulary.Builder optReservedTokens(java.util.Collection<java.lang.String> reservedTokens)
  Sets the optional parameter that sets the list of reserved tokens.
- DefaultVocabulary.Builder optUnknownToken()
  Sets the optional parameter that specifies the unknown token's string value with "<unk>".
- DefaultVocabulary.Builder optUnknownToken(java.lang.String unknownToken)
  Sets the optional parameter that specifies the unknown token's string value.
-
-
-
Method Detail
-
optMinFrequency
public DefaultVocabulary.Builder optMinFrequency(int minFrequency)
Sets the optional parameter that specifies the minimum frequency to consider a token to be part of the DefaultVocabulary. Defaults to no minimum.
- Parameters:
minFrequency - the minimum frequency to consider a token to be part of the DefaultVocabulary, or -1 for no minimum
- Returns:
this DefaultVocabulary.Builder
-
optMaxTokens
public DefaultVocabulary.Builder optMaxTokens(int maxTokens)
Sets the optional limit on the size of the vocabulary. The size includes the reserved tokens. If the number of added tokens exceeds the maxTokens limit, the most frequent tokens are kept.
- Parameters:
maxTokens - the maximum number of tokens, or -1 for no maximum
- Returns:
this DefaultVocabulary.Builder
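How the minimum frequency and the size limit interact can be illustrated with plain JDK collections. The following is only a sketch under stated assumptions, not DJL's implementation: the class VocabularyLimitsSketch and the method selectTokens are hypothetical names, and the tie-breaking rule (the token seen first wins) is an assumption that the documentation above does not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of DJL.
final class VocabularyLimitsSketch {

    /**
     * Picks the tokens to keep. A value of -1 disables the corresponding limit.
     * Reserved tokens come first and count toward maxTokens.
     */
    static List<String> selectTokens(
            List<List<String>> sentences,
            List<String> reservedTokens,
            int minFrequency,
            int maxTokens) {
        // Count occurrences; LinkedHashMap remembers first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> sentence : sentences) {
            for (String token : sentence) {
                counts.merge(token, 1, Integer::sum);
            }
        }

        // Most frequent first. List.sort is stable, so ties keep
        // first-seen order (an assumption of this sketch).
        List<Map.Entry<String, Integer>> candidates = new ArrayList<>(counts.entrySet());
        candidates.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));

        List<String> result = new ArrayList<>(reservedTokens);
        for (Map.Entry<String, Integer> entry : candidates) {
            if (maxTokens >= 0 && result.size() >= maxTokens) {
                break; // size limit reached
            }
            if (minFrequency >= 0 && entry.getValue() < minFrequency) {
                continue; // too rare
            }
            if (!result.contains(entry.getKey())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("the", "cat", "sat"),
                Arrays.asList("the", "cat", "ran"),
                Arrays.asList("the", "dog"));
        // Minimum frequency 2, at most 3 entries including the reserved token.
        System.out.println(selectTokens(sentences, Arrays.asList("<pad>"), 2, 3));
        // prints [<pad>, the, cat]
    }
}
```

In this sketch the reserved token occupies one of the three slots, which mirrors the statement that the size includes the reserved tokens.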
-
optUnknownToken
public DefaultVocabulary.Builder optUnknownToken()
Sets the optional parameter that specifies the unknown token's string value with "<unk>".
- Returns:
this DefaultVocabulary.Builder
-
optUnknownToken
public DefaultVocabulary.Builder optUnknownToken(java.lang.String unknownToken)
Sets the optional parameter that specifies the unknown token's string value.
- Parameters:
unknownToken - the string value of the unknown token
- Returns:
this DefaultVocabulary.Builder
-
optReservedTokens
public DefaultVocabulary.Builder optReservedTokens(java.util.Collection<java.lang.String> reservedTokens)
Sets the optional parameter that sets the list of reserved tokens.
- Parameters:
reservedTokens - the list of reserved tokens
- Returns:
this DefaultVocabulary.Builder
-
add
public DefaultVocabulary.Builder add(java.util.List<java.lang.String> sentence)
Adds the given sentence to the DefaultVocabulary.
- Parameters:
sentence - the sentence to be added
- Returns:
this DefaultVocabulary.Builder
-
addAll
public DefaultVocabulary.Builder addAll(java.util.List<java.util.List<java.lang.String>> sentences)
Adds the given list of sentences to the DefaultVocabulary.
- Parameters:
sentences - the list of sentences to be added
- Returns:
this DefaultVocabulary.Builder
-
addFromTextFile
public DefaultVocabulary.Builder addFromTextFile(java.nio.file.Path path) throws java.io.IOException
Adds a text vocabulary to the DefaultVocabulary.
Example text file (vocab.txt):
token1
token2
token3
will be mapped to the indices 0, 1, 2.
- Parameters:
path - the path to the text file
- Returns:
this DefaultVocabulary.Builder
- Throws:
java.io.IOException - if the vocabulary file cannot be read
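The mapping from a text vocabulary file to indices can be sketched with the JDK alone. This is an illustration rather than DJL's code: it assumes one token per line, and the handling of blank lines and duplicate tokens shown here is an assumption. TextVocabularySketch and indexTokens are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of DJL.
final class TextVocabularySketch {

    /** Maps each token to the position at which it first appears. */
    static Map<String, Integer> indexTokens(List<String> lines) {
        Map<String, Integer> indices = new LinkedHashMap<>();
        for (String line : lines) {
            String token = line.trim();
            if (token.isEmpty()) {
                continue; // assumption: blank lines are skipped
            }
            // assumption: a repeated token keeps its first index
            indices.putIfAbsent(token, indices.size());
        }
        return indices;
    }

    public static void main(String[] args) throws IOException {
        // Write a small vocab file, then read it back line by line.
        Path file = Files.createTempFile("vocab", ".txt");
        Files.write(file, Arrays.asList("token1", "token2", "token3"));
        System.out.println(indexTokens(Files.readAllLines(file)));
        // prints {token1=0, token2=1, token3=2}
        Files.delete(file);
    }
}
```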
-
addFromTextFile
public DefaultVocabulary.Builder addFromTextFile(java.net.URL url) throws java.io.IOException
Adds a text vocabulary to the DefaultVocabulary.
- Parameters:
url - the URL of the text file
- Returns:
this DefaultVocabulary.Builder
- Throws:
java.io.IOException - if the vocabulary file cannot be read
-
addFromCustomizedFile
public DefaultVocabulary.Builder addFromCustomizedFile(java.net.URL url, java.util.function.Function<java.net.URL,java.util.List<java.lang.String>> lambda)
Adds a customized vocabulary to the DefaultVocabulary.
- Parameters:
url - the URL of the vocabulary file
lambda - the function to parse the vocabulary file
- Returns:
this DefaultVocabulary.Builder
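A parser of the type expected by the lambda parameter, Function<URL, List<String>>, might look like the sketch below. The file format (one "token<TAB>count" pair per line) is invented for illustration, and CustomVocabularyParserSketch, firstColumn, and TSV_PARSER are hypothetical names. Because Function.apply cannot throw checked exceptions, the sketch wraps IOException in UncheckedIOException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration; not part of DJL.
final class CustomVocabularyParserSketch {

    /** Keeps the text before the first tab on every non-blank line. */
    static List<String> firstColumn(List<String> lines) {
        List<String> tokens = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines carry no token
            }
            tokens.add(line.split("\t", 2)[0]);
        }
        return tokens;
    }

    /** Reads the URL as UTF-8 text and extracts the token column. */
    static final Function<URL, List<String>> TSV_PARSER = url -> {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return firstColumn(lines);
        } catch (IOException e) {
            // Function.apply declares no checked exceptions.
            throw new UncheckedIOException(e);
        }
    };
}
```

A function like TSV_PARSER could then be passed as the second argument of addFromCustomizedFile, with the URL of the vocabulary file as the first.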
-
build
public DefaultVocabulary build()
Builds the DefaultVocabulary object with the set arguments.
- Returns:
the DefaultVocabulary object built
-
-