Class UAX29URLEmailAnalyzer

All Implemented Interfaces:
Closeable, AutoCloseable

public final class UAX29URLEmailAnalyzer extends StopwordAnalyzerBase
Filters UAX29URLEmailTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.

You must specify the required Version compatibility when creating a UAX29URLEmailAnalyzer.
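As a rough illustration of the chain described above (tokenize, lowercase, drop stop words), the following dependency-free sketch approximates it in plain Java. It is not Lucene code. The class ToyUrlEmailChain, its regular expression, and the stop words passed in are hypothetical. The regex is only a crude stand-in for UAX #29 word segmentation plus URL and e-mail recognition, and the StandardFilter step is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Toy approximation of a tokenize -> lowercase -> stop-filter chain.
 * Hypothetical and simplified: this is NOT UAX #29 segmentation and NOT
 * Lucene's UAX29URLEmailTokenizer.
 */
public class ToyUrlEmailChain {
    // Hypothetical token pattern, tried in order at each position:
    // crude http(s) URL | crude e-mail address | run of letters/digits.
    private static final Pattern TOKEN = Pattern.compile(
            "https?://\\S+"
            + "|[\\p{L}\\p{N}._%+-]+@[\\p{L}\\p{N}.-]+\\.[\\p{L}]{2,}"
            + "|[\\p{L}\\p{N}]+");

    public static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Lowercasing step (plays the role of LowerCaseFilter).
            String token = m.group().toLowerCase(Locale.ROOT);
            // Stop-word removal step (plays the role of StopFilter).
            if (!stopWords.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = Set.of("the", "at", "or", "see");
        System.out.println(analyze(
                "Mail John.Doe@Example.com or see the docs at https://example.org/a?b=1", stop));
    }
}
```

With the stop set {the, at, or, see}, main prints [mail, john.doe@example.com, docs, https://example.org/a?b=1]. The e-mail address and the URL each survive as a single token, which is the behavior a URL- and e-mail-aware tokenizer is meant to provide; a purely word-based split would break them apart.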

  • Field Details

    • DEFAULT_MAX_TOKEN_LENGTH

      public static final int DEFAULT_MAX_TOKEN_LENGTH
      Default maximum allowed token length
    • STOP_WORDS_SET

      public static final CharArraySet STOP_WORDS_SET
      An unmodifiable set containing some common English words that are usually not useful for searching.
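To illustrate what "unmodifiable" means for STOP_WORDS_SET, the sketch below uses java.util.Set.of as a stand-in for Lucene's CharArraySet. In the sketch, lookups work while any attempt to add a word throws UnsupportedOperationException. The class name and the five words are hypothetical and are not the actual contents of STOP_WORDS_SET.

```java
import java.util.Set;

public class UnmodifiableStopSetDemo {
    // Hypothetical subset of English stop words, for illustration only;
    // not the real contents of STOP_WORDS_SET.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of");

    /** Returns true if the word could be added, false if the set rejected the mutation. */
    public static boolean tryAdd(String word) {
        try {
            STOP_WORDS.add(word);
            return true;
        } catch (UnsupportedOperationException e) {
            // Set.of returns an unmodifiable set, so add always throws here.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(STOP_WORDS.contains("the")); // lookup is allowed
        System.out.println(tryAdd("foo"));              // mutation is rejected
    }
}
```

To use additional stop words with the analyzer, build a separate set and pass it to the UAX29URLEmailAnalyzer(Version, CharArraySet) constructor rather than trying to modify STOP_WORDS_SET.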
  • Constructor Details

    • UAX29URLEmailAnalyzer

      public UAX29URLEmailAnalyzer(Version matchVersion, CharArraySet stopWords)
      Builds an analyzer with the given stop words.
      Parameters:
      matchVersion - Lucene version to match; see the Version compatibility note above
      stopWords - stop words
    • UAX29URLEmailAnalyzer

      public UAX29URLEmailAnalyzer(Version matchVersion)
      Builds an analyzer with the default stop words (STOP_WORDS_SET).
      Parameters:
      matchVersion - Lucene version to match; see the Version compatibility note above
    • UAX29URLEmailAnalyzer

      public UAX29URLEmailAnalyzer(Version matchVersion, Reader stopwords) throws IOException
      Builds an analyzer with the stop words from the given reader.
      Parameters:
      matchVersion - Lucene version to match; see the Version compatibility note above
      stopwords - Reader to read stop words from
      Throws:
      IOException - if the stop words cannot be read from the reader
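The Reader-based constructor takes its stop words from a character stream. The sketch below shows one plausible way such a list could be parsed. The class ReaderStopWords and its format rules (one word per line, surrounding whitespace trimmed, blank lines skipped, reader closed when done) are assumptions and not a description of Lucene's actual loader, which may, for example, handle comments or letter case differently. Like the constructor, it declares IOException for read failures.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

public class ReaderStopWords {
    /**
     * Hypothetical loader: one stop word per line, whitespace trimmed,
     * blank lines skipped. Closes the reader when done (an assumption
     * about who owns the stream).
     */
    public static Set<String> load(Reader reader) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        Set<String> stop = load(new StringReader("foo\n  bar \n\nbaz\n"));
        System.out.println(stop.size()); // three distinct, trimmed words
    }
}
```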
  • Method Details

    • setMaxTokenLength

      public void setMaxTokenLength(int length)
      Sets the maximum allowed token length. Any token longer than this length is discarded. The new setting takes effect only the next time a tokenStream method is called.
    • getMaxTokenLength

      public int getMaxTokenLength()
      Returns the current maximum allowed token length.
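The setMaxTokenLength/getMaxTokenLength pair can be mimicked with a small dependency-free sketch. MaxLengthFilterSketch is hypothetical: it drops (rather than truncates) tokens longer than the limit, mirroring the documented "discarded" behavior, and it measures length in Java chars (UTF-16 code units), which is an assumption. Unlike the analyzer, where a new limit applies only the next time tokenStream is called, the sketch applies the current limit on every call to filter.

```java
import java.util.ArrayList;
import java.util.List;

public class MaxLengthFilterSketch {
    private int maxTokenLength;

    public MaxLengthFilterSketch(int maxTokenLength) {
        this.maxTokenLength = maxTokenLength;
    }

    /** Sets the limit used by subsequent filter calls. */
    public void setMaxTokenLength(int length) {
        this.maxTokenLength = length;
    }

    public int getMaxTokenLength() {
        return maxTokenLength;
    }

    /** Keeps tokens whose char length is within the limit; longer tokens are dropped, not truncated. */
    public List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() <= maxTokenLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        MaxLengthFilterSketch f = new MaxLengthFilterSketch(5);
        System.out.println(f.filter(List.of("short", "toolongtoken", "ok"))); // [short, ok]
    }
}
```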