Class DefaultVocabulary.Builder

  • Enclosing class:
    DefaultVocabulary

    public static final class DefaultVocabulary.Builder
    extends java.lang.Object
    Builder class that is used to build the DefaultVocabulary.
    • Method Detail

      • optMinFrequency

        public DefaultVocabulary.Builder optMinFrequency(int minFrequency)
        Sets the optional minimum frequency a token must have to be included in the DefaultVocabulary. Defaults to no minimum.
        Parameters:
        minFrequency - the minimum frequency a token must have to be included in the DefaultVocabulary, or -1 for no minimum
        Returns:
        this DefaultVocabulary.Builder
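
        The effect of a minimum frequency can be illustrated with a standalone sketch. This is not the DefaultVocabulary implementation: the class MinFrequencySketch and its method are hypothetical, and the sketch assumes the threshold is inclusive and that kept tokens stay in first-seen order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the library: illustrates frequency filtering only.
final class MinFrequencySketch {

    /** Returns tokens seen at least minFrequency times, in first-seen order; a negative value disables the filter. */
    static List<String> filterByMinFrequency(List<List<String>> sentences, int minFrequency) {
        // Count occurrences; LinkedHashMap remembers the order in which tokens first appear.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> sentence : sentences) {
            for (String token : sentence) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            // Assumption: the threshold is inclusive.
            if (minFrequency < 0 || entry.getValue() >= minFrequency) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = List.of(
                List.of("the", "cat", "sat"),
                List.of("the", "dog", "sat"));
        System.out.println(filterByMinFrequency(sentences, 2));  // [the, sat]
        System.out.println(filterByMinFrequency(sentences, -1)); // [the, cat, sat, dog]
    }
}
```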
      • optMaxTokens

        public DefaultVocabulary.Builder optMaxTokens(int maxTokens)
        Sets the optional limit on the size of the vocabulary.

        The size includes the reservedTokens. If the number of added tokens exceeds the maxTokens limit, only the most frequent tokens are kept.

        Parameters:
        maxTokens - the maximum number of tokens or -1 for no maximum
        Returns:
        this DefaultVocabulary.Builder
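
        The size limit described above can be sketched in plain Java. The class MaxTokensSketch is hypothetical and does not reproduce the library's code. It assumes reserved tokens are always kept and occupy the first slots, and that ties in frequency are broken by first-seen order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the library: illustrates a size cap that counts reserved tokens.
final class MaxTokensSketch {

    /** Builds a token list capped at maxTokens entries (reserved tokens included); a negative value means no cap. */
    static List<String> limit(List<String> reserved, List<String> tokens, int maxTokens) {
        // Count the non-reserved tokens in first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            if (!reserved.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        // Rank by descending frequency; List.sort is stable, so ties keep first-seen order.
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));

        // Assumption: reserved tokens are always kept, even if they alone exceed the cap.
        List<String> vocabulary = new ArrayList<>(reserved);
        for (Map.Entry<String, Integer> entry : ranked) {
            if (maxTokens >= 0 && vocabulary.size() >= maxTokens) {
                break;
            }
            vocabulary.add(entry.getKey());
        }
        return vocabulary;
    }

    public static void main(String[] args) {
        List<String> reserved = List.of("<pad>");
        List<String> tokens = List.of("a", "b", "a", "c", "b", "a");
        System.out.println(limit(reserved, tokens, 3));  // [<pad>, a, b]
        System.out.println(limit(reserved, tokens, -1)); // [<pad>, a, b, c]
    }
}
```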
      • optUnknownToken

        public DefaultVocabulary.Builder optUnknownToken()
        Sets the optional parameter that specifies the unknown token's string value to the default, "<unk>".
        Returns:
        this DefaultVocabulary.Builder
      • optUnknownToken

        public DefaultVocabulary.Builder optUnknownToken(java.lang.String unknownToken)
        Sets the optional parameter that specifies the unknown token's string value.
        Parameters:
        unknownToken - the string value of the unknown token
        Returns:
        this DefaultVocabulary.Builder
      • optReservedTokens

        public DefaultVocabulary.Builder optReservedTokens(java.util.Collection<java.lang.String> reservedTokens)
        Sets the optional collection of reserved tokens.
        Parameters:
        reservedTokens - the collection of reserved tokens
        Returns:
        this DefaultVocabulary.Builder
      • add

        public DefaultVocabulary.Builder add(java.util.List<java.lang.String> sentence)
        Adds the given sentence to the DefaultVocabulary.
        Parameters:
        sentence - the sentence to be added
        Returns:
        this DefaultVocabulary.Builder
      • addAll

        public DefaultVocabulary.Builder addAll(java.util.List<java.util.List<java.lang.String>> sentences)
        Adds the given list of sentences to the DefaultVocabulary.
        Parameters:
        sentences - the list of sentences to be added
        Returns:
        this DefaultVocabulary.Builder
      • addFromTextFile

        public DefaultVocabulary.Builder addFromTextFile(java.nio.file.Path path)
                                                  throws java.io.IOException
        Adds a text vocabulary to the DefaultVocabulary.
           Example text file (vocab.txt):
           token1
           token2
           token3
           These tokens are mapped to indices 0, 1, and 2, respectively.
         
        Parameters:
        path - the path to the text file
        Returns:
        this DefaultVocabulary.Builder
        Throws:
        java.io.IOException - if the vocabulary file cannot be read
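
        The line-to-index mapping from the vocab.txt example can be sketched with the standard library alone. The class TextVocabSketch is hypothetical and is not the library's reader. Trimming whitespace, skipping blank lines, and ignoring duplicate tokens are assumptions made for the sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the library: maps each line of a text file to its index.
final class TextVocabSketch {

    /** Reads one token per line and assigns indices 0, 1, 2, ... in file order. */
    static Map<String, Long> readIndices(Path path) throws IOException {
        Map<String, Long> indices = new LinkedHashMap<>();
        for (String line : Files.readAllLines(path, StandardCharsets.UTF_8)) {
            String token = line.trim();
            // Assumption: blank lines are skipped and a repeated token keeps its first index.
            if (!token.isEmpty()) {
                indices.putIfAbsent(token, (long) indices.size());
            }
        }
        return indices;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("vocab", ".txt");
        Files.write(file, List.of("token1", "token2", "token3"));
        System.out.println(readIndices(file)); // {token1=0, token2=1, token3=2}
        Files.delete(file);
    }
}
```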
      • addFromTextFile

        public DefaultVocabulary.Builder addFromTextFile(java.net.URL url)
                                                  throws java.io.IOException
        Adds a text vocabulary to the DefaultVocabulary.
        Parameters:
        url - the URL of the text file
        Returns:
        this DefaultVocabulary.Builder
        Throws:
        java.io.IOException - if the vocabulary file cannot be read
      • addFromCustomizedFile

        public DefaultVocabulary.Builder addFromCustomizedFile(java.net.URL url,
                                                               java.util.function.Function<java.net.URL, java.util.List<java.lang.String>> lambda)
        Adds a customized vocabulary to the DefaultVocabulary.
        Parameters:
        url - the URL of the vocabulary file
        lambda - the function that parses the vocabulary file into a list of tokens
        Returns:
        this DefaultVocabulary.Builder
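
        A parser of the shape this method accepts, Function<URL, List<String>>, might look like the sketch below. The file layout (one "token<TAB>count" pair per line) and the name FIRST_COLUMN are hypothetical, chosen only for illustration. Because Function.apply cannot throw checked exceptions, the sketch wraps IOException in UncheckedIOException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical parser, not part of the library: reads the first column of a tab-separated file.
final class CustomParserSketch {

    /** Returns the text before the first tab on every non-empty line. */
    static final Function<URL, List<String>> FIRST_COLUMN = url -> {
        List<String> tokens = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    // Split at most once, so any further tabs stay in the discarded remainder.
                    tokens.add(line.split("\t", 2)[0]);
                }
            }
        } catch (IOException e) {
            // Function.apply has no throws clause, so rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return tokens;
    };
}
```

        Such a function would be passed as the second argument, for example addFromCustomizedFile(url, CustomParserSketch.FIRST_COLUMN).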