Class StandardAnalyzer

All Implemented Interfaces:
Closeable, AutoCloseable

public final class StandardAnalyzer extends StopwordAnalyzerBase
Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
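The chain described above (tokenize, lowercase, drop stop words) can be pictured with a small plain-Java sketch. It uses no Lucene classes and is not how StandardAnalyzer is implemented. The class name, the whitespace-and-punctuation splitting rule, and the tiny stop word set are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Conceptual sketch only: plain Java, no Lucene classes involved.
class AnalysisChainSketch {

    // Hypothetical stand-in for STOP_WORDS_SET; the real set's contents
    // are not listed in this document.
    static final Set<String> SAMPLE_STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "and", "the", "of"));

    // Stage 1: split on anything that is not a letter or a decimal digit.
    // This is a simplification; StandardTokenizer follows Unicode text segmentation.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Stages 2 and 3: lowercase each token, then drop it if it is a stop word.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [quick, dead]
        System.out.println(analyze("The Quick and the Dead", SAMPLE_STOP_WORDS));
    }
}
```

Lowercasing before the stop word check is what lets "The" and "the" both match a lowercase stop word entry.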

You must specify the required Version compatibility when creating StandardAnalyzer:

  • As of 3.4, Hiragana and Han characters are no longer wrongly split from their combining characters. If you use a previous version number, you get the exact broken behavior for backwards compatibility.
  • As of 3.1, StandardTokenizer implements Unicode text segmentation, and StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords. ClassicTokenizer and ClassicAnalyzer are the pre-3.1 implementations of StandardTokenizer and StandardAnalyzer.
  • As of 2.9, StopFilter preserves position increments.
  • As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1068).
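To make the 2.9 note on position increments concrete, here is a minimal plain-Java sketch of the idea: when stop words are removed, the next surviving token records how many positions were skipped. The Token class and removeStopWords method are hypothetical names for illustration and are not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Conceptual sketch only: shows what "preserving position increments" means.
class PositionIncrementSketch {

    // Hypothetical token holder: a term plus its distance from the previous kept token.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    // Removes stop words; each kept token's increment is 1 plus the number
    // of stop words dropped immediately before it.
    static List<Token> removeStopWords(List<String> terms, Set<String> stopWords) {
        List<Token> out = new ArrayList<>();
        int skipped = 0;
        for (String term : terms) {
            if (stopWords.contains(term)) {
                skipped++;
                continue;
            }
            out.add(new Token(term, skipped + 1));
            skipped = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("and", "the"));
        // prints [quick(+1), dead(+3)]
        System.out.println(removeStopWords(Arrays.asList("quick", "and", "the", "dead"), stopWords));
    }
}
```

Keeping the gap lets position-sensitive queries, such as phrase queries, tell that "quick" and "dead" were not adjacent in the original text.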
  • Field Details

    • DEFAULT_MAX_TOKEN_LENGTH

      public static final int DEFAULT_MAX_TOKEN_LENGTH
      Default maximum allowed token length
    • STOP_WORDS_SET

      public static final CharArraySet STOP_WORDS_SET
      An unmodifiable set containing some common English words that are usually not useful for searching.
  • Constructor Details

    • StandardAnalyzer

      public StandardAnalyzer(Version matchVersion, CharArraySet stopWords)
      Builds an analyzer with the given stop words.
      Parameters:
      matchVersion - Lucene version to match; see the version compatibility notes above
      stopWords - the set of stop words to filter out
    • StandardAnalyzer

      public StandardAnalyzer(Version matchVersion)
      Builds an analyzer with the default stop words (STOP_WORDS_SET).
      Parameters:
      matchVersion - Lucene version to match; see the version compatibility notes above
    • StandardAnalyzer

      public StandardAnalyzer(Version matchVersion, Reader stopwords) throws IOException
      Builds an analyzer with the stop words from the given reader.
      Parameters:
      matchVersion - Lucene version to match; see the version compatibility notes above
      stopwords - Reader to read stop words from
      Throws:
      IOException
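As a rough picture of what building a stop word set from a Reader might involve, the plain-Java sketch below reads words into a set. The one-word-per-line format, the trimming of whitespace, and the readStopWords helper are assumptions for illustration. This document does not specify the format the constructor expects.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

// Conceptual sketch only: plain Java, no Lucene classes involved.
class StopWordReaderSketch {

    // Hypothetical helper. Assumed format: one stop word per line,
    // surrounding whitespace trimmed, blank lines ignored.
    static Set<String> readStopWords(Reader reader) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader buffered = new BufferedReader(reader)) {
            String line;
            while ((line = buffered.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // prints [the, and, of]
        System.out.println(readStopWords(new StringReader("the\nand\n\n of \n")));
    }
}
```

Like the constructor, the helper declares IOException, since reading from an arbitrary Reader can fail.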
  • Method Details

    • setMaxTokenLength

      public void setMaxTokenLength(int length)
      Sets the maximum allowed token length. Any token that exceeds this length is discarded. The setting only takes effect the next time tokenStream is called.
    • getMaxTokenLength

      public int getMaxTokenLength()
      Returns the maximum allowed token length.
      See Also: setMaxTokenLength(int)
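The discard rule described for setMaxTokenLength can be sketched in plain Java as follows. This is an illustration of the documented behavior, not Lucene code. The dropOverlong helper and the limit of 4 are hypothetical, and the real default is whatever DEFAULT_MAX_TOKEN_LENGTH holds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only: plain Java, no Lucene classes involved.
class MaxTokenLengthSketch {

    // Keeps tokens whose length does not exceed maxTokenLength;
    // longer tokens are discarded entirely rather than truncated.
    static List<String> dropOverlong(List<String> tokens, int maxTokenLength) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() <= maxTokenLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [is, fast]
        System.out.println(dropOverlong(Arrays.asList("lucene", "is", "fast"), 4));
    }
}
```

A token whose length equals the limit is kept, because only tokens that exceed the limit are dropped.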