Class WhitespaceTokenizer

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public final class WhitespaceTokenizer
    extends CharTokenizer
    A tokenizer that divides text at whitespace characters, as defined by Character.isWhitespace(int). Note that this definition explicitly excludes the non-breaking space. Each maximal run of adjacent non-whitespace characters forms one token.
    See Also:
    UnicodeWhitespaceTokenizer
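
    The splitting rule described above can be illustrated with plain JDK code. The following is a minimal sketch, not Lucene's implementation: the class WhitespaceSplitSketch and its tokenize method are hypothetical names introduced only for this example. It assumes nothing beyond Character.isWhitespace(int) and shows why a non-breaking space (U+00A0) does not separate tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it is not part of Lucene.
final class WhitespaceSplitSketch {

    // Splits text into runs of code points for which
    // Character.isWhitespace(int) returns false.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                // A whitespace code point ends the current token, if any.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
        }
        // Flush the trailing token when the text does not end in whitespace.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

    In this sketch, tokenize("foo  bar\tbaz\n") yields the three tokens foo, bar and baz, while a string such as "a\u00A0b c" yields two tokens, because Character.isWhitespace(0x00A0) is false and the non-breaking space stays inside the first token.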
    • Constructor Detail

      • WhitespaceTokenizer

        public WhitespaceTokenizer()
        Construct a new WhitespaceTokenizer.
      • WhitespaceTokenizer

        public WhitespaceTokenizer(AttributeFactory factory)
        Construct a new WhitespaceTokenizer using a given AttributeFactory.
        Parameters:
        factory - the attribute factory to use for this Tokenizer
      • WhitespaceTokenizer

        public WhitespaceTokenizer(AttributeFactory factory,
                                   int maxTokenLen)
        Construct a new WhitespaceTokenizer using a given AttributeFactory and maximum token length.
        Parameters:
        factory - the attribute factory to use for this Tokenizer
        maxTokenLen - the maximum token length the tokenizer will emit. Must be greater than 0 and less than MAX_TOKEN_LENGTH_LIMIT (1024*1024).
        Throws:
        IllegalArgumentException - if maxTokenLen is invalid.
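
        The range check on maxTokenLen can be pictured with a small standalone sketch. This is not Lucene's source: MaxTokenLenCheckSketch and requireValidMaxTokenLen are hypothetical names, and the constant is redefined locally with the documented value 1024*1024. Whether a value exactly equal to the limit is accepted is not settled by the wording above; the sketch accepts it, which is an assumption.

```java
// Hypothetical helper for illustration only; it is not part of Lucene.
final class MaxTokenLenCheckSketch {

    // Local copy of the documented limit (1024*1024).
    static final int MAX_TOKEN_LENGTH_LIMIT = 1024 * 1024;

    // Returns maxTokenLen unchanged when it is in range, otherwise throws.
    // Assumption: a value equal to the limit is treated as valid here.
    static int requireValidMaxTokenLen(int maxTokenLen) {
        if (maxTokenLen <= 0 || maxTokenLen > MAX_TOKEN_LENGTH_LIMIT) {
            throw new IllegalArgumentException(
                "maxTokenLen out of range: " + maxTokenLen);
        }
        return maxTokenLen;
    }
}
```

        With this sketch, a typical value such as 255 passes through unchanged, while 0, a negative value, or a value above 1024*1024 raises IllegalArgumentException.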