Class StopFilter

All Implemented Interfaces:
Closeable, AutoCloseable

public final class StopFilter extends FilteringTokenFilter
Removes stop words from a token stream.

You must specify the required Version compatibility when creating StopFilter:

  • As of 3.1, StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords, and position increments are preserved.
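The following plain-Java sketch shows what the supplementary-character fix means when stop words are matched case-insensitively. It is not Lucene code, and the class SupplementaryLowercase and its methods are hypothetical. It assumes that the Version switch controls roughly this difference: lowercasing one UTF-16 char at a time leaves a surrogate pair such as U+10400 untouched, while lowercasing by code point maps it to U+10428.

```java
// Hypothetical demo class; not part of Lucene.
final class SupplementaryLowercase {

    // Assumed pre-3.1 style: lowercase each UTF-16 char independently.
    // Surrogate halves have no case mapping, so supplementary letters pass through unchanged.
    static String lowerPerChar(String s) {
        char[] out = s.toCharArray();
        for (int i = 0; i < out.length; i++) {
            out[i] = Character.toLowerCase(out[i]);
        }
        return new String(out);
    }

    // Code-point-aware lowercasing: decode a surrogate pair first, then map the whole code point.
    static String lowerPerCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10400 DESERET CAPITAL LETTER LONG I, encoded as a surrogate pair.
        String upper = "\uD801\uDC00";
        String lower = "\uD801\uDC28"; // U+10428, its lowercase form
        System.out.println(lowerPerChar(upper).equals(lower));      // false
        System.out.println(lowerPerCodePoint(upper).equals(lower)); // true
    }
}
```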
  • Constructor Details

    • StopFilter

      public StopFilter(Version matchVersion, TokenStream in, CharArraySet stopWords)
      Constructs a filter which removes words from the input TokenStream that are named in the Set.
      Parameters:
      matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the stop set if Version > 3.0. See above for details.
      in - Input stream
      stopWords - A CharArraySet representing the stopwords.
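The constructor wraps an upstream stream and drops every token that appears in the stop set. The plain-Java sketch below models this wrapping with an Iterator in place of a TokenStream. StopFilterSketch and Token are hypothetical names, not Lucene API. The sketch also shows one way to preserve position increments: the increments of removed tokens are added to the next token that is kept. As a result, queries that depend on positions, such as phrase queries, can still see the gap.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Plain-Java sketch of a stop-word filter; names are hypothetical, not Lucene's API.
final class StopFilterSketch implements Iterator<StopFilterSketch.Token> {

    // A term plus its position increment (1 = directly follows the previous token).
    static final class Token {
        final String term;
        final int posInc;
        Token(String term, int posInc) { this.term = term; this.posInc = posInc; }
        @Override public String toString() { return term + "(+" + posInc + ")"; }
    }

    private final Iterator<Token> in;    // upstream token source
    private final Set<String> stopWords; // terms to drop
    private Token next;                  // lookahead filled by hasNext()

    StopFilterSketch(Iterator<Token> in, Set<String> stopWords) {
        this.in = in;
        this.stopWords = stopWords;
    }

    @Override
    public boolean hasNext() {
        if (next != null) return true;
        int skipped = 0; // summed increments of the stop words removed so far
        while (in.hasNext()) {
            Token t = in.next();
            if (stopWords.contains(t.term)) {
                skipped += t.posInc; // remember the gap left by the removed word
            } else {
                // Fold the removed tokens' increments into the next kept token.
                next = new Token(t.term, t.posInc + skipped);
                return true;
            }
        }
        return false;
    }

    @Override
    public Token next() {
        if (!hasNext()) throw new NoSuchElementException();
        Token t = next;
        next = null;
        return t;
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        for (String w : "the quick fox of the woods".split(" ")) tokens.add(new Token(w, 1));
        StopFilterSketch f = new StopFilterSketch(tokens.iterator(), Set.of("the", "of"));
        while (f.hasNext()) System.out.print(f.next() + " ");
        // prints: quick(+2) fox(+1) woods(+3)
    }
}
```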
  • Method Details

    • makeStopSet

      public static CharArraySet makeStopSet(Version matchVersion, String... stopWords)
      Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor. This allows the set to be built once and cached when an Analyzer is constructed.
      Parameters:
      matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
      stopWords - An array of stopwords
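In Lucene, this method is meant to be called once, as StopFilter.makeStopSet(matchVersion, ...), in the Analyzer's constructor, with the resulting set passed to every new StopFilter. The sketch below shows the same build-once idea in plain Java. MyAnalyzerSketch and its makeStopSet helper are hypothetical stand-ins that return a HashSet rather than a CharArraySet.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical analyzer stand-in; not a Lucene class.
final class MyAnalyzerSketch {

    // Built once per analyzer instance, then reused for every token stream.
    private final Set<String> stopWords;

    MyAnalyzerSketch(String... words) {
        this.stopWords = makeStopSet(words);
    }

    // Varargs helper with the same shape as makeStopSet(Version, String...),
    // minus the Version parameter; returns an unmodifiable view of a HashSet.
    static Set<String> makeStopSet(String... words) {
        return Collections.unmodifiableSet(new HashSet<>(Arrays.asList(words)));
    }

    // Called per token; reads the cached set instead of rebuilding it.
    boolean isStopWord(String term) {
        return stopWords.contains(term);
    }

    public static void main(String[] args) {
        MyAnalyzerSketch analyzer = new MyAnalyzerSketch("a", "an", "and", "the");
        System.out.println(analyzer.isStopWord("the")); // true
        System.out.println(analyzer.isStopWord("fox")); // false
    }
}
```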
    • makeStopSet

      public static CharArraySet makeStopSet(Version matchVersion, List<?> stopWords)
      Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor. This allows the set to be built once and cached when an Analyzer is constructed.
      Parameters:
      matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
      stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
      Returns:
      A Set (CharArraySet) containing the words
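The List<?> overload accepts Strings, char[] arrays, or any object whose toString() gives the word. The plain-Java sketch below follows that contract. ListStopSetSketch is hypothetical and returns a HashSet<String>. The sketch shows that char[] elements need special handling, because an array's toString() returns an identity string rather than its characters.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper following the documented contract of makeStopSet(Version, List<?>).
final class ListStopSetSketch {

    static Set<String> makeStopSet(List<?> stopWords) {
        Set<String> set = new HashSet<>();
        for (Object o : stopWords) {
            if (o instanceof char[]) {
                // char[].toString() would yield something like "[C@1b6d3586",
                // so arrays are converted to a String explicitly.
                set.add(new String((char[]) o));
            } else {
                // Strings, StringBuilders, etc.: rely on toString().
                set.add(o.toString());
            }
        }
        return set;
    }

    public static void main(String[] args) {
        List<Object> words = List.of("the", new char[] {'o', 'f'}, new StringBuilder("and"));
        System.out.println(makeStopSet(words).contains("of")); // true
    }
}
```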
    • makeStopSet

      public static CharArraySet makeStopSet(Version matchVersion, String[] stopWords, boolean ignoreCase)
      Creates a stopword set from the given stopword array.
      Parameters:
      matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
      stopWords - An array of stopwords
      ignoreCase - If true, all words are lowercased first.
      Returns:
      A Set (CharArraySet) containing the words
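When ignoreCase is true, words are lowercased as they are added. The sketch below shows the resulting behavior. It assumes that lookups are lowercased as well, which is how Lucene's CharArraySet behaves, so "The" and "THE" both match. CaseInsensitiveStopSet is hypothetical. For brevity it uses String.toLowerCase(Locale.ROOT), which is not exactly the per-code-point lowercasing that Lucene applies.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical ignoreCase stop set: words are lowercased when added,
// and lookups lowercase the query too.
final class CaseInsensitiveStopSet {
    private final Set<String> words = new HashSet<>();
    private final boolean ignoreCase;

    CaseInsensitiveStopSet(String[] stopWords, boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
        for (String w : stopWords) words.add(normalize(w));
    }

    private String normalize(String s) {
        // Locale.ROOT avoids locale-specific mappings such as the Turkish dotless i.
        return ignoreCase ? s.toLowerCase(Locale.ROOT) : s;
    }

    boolean contains(String term) {
        return words.contains(normalize(term));
    }

    public static void main(String[] args) {
        String[] stop = {"The", "AND"};
        System.out.println(new CaseInsensitiveStopSet(stop, true).contains("the"));  // true
        System.out.println(new CaseInsensitiveStopSet(stop, false).contains("the")); // false
    }
}
```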
    • makeStopSet

      public static CharArraySet makeStopSet(Version matchVersion, List<?> stopWords, boolean ignoreCase)
      Creates a stopword set from the given stopword list.
      Parameters:
      matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
      stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
      ignoreCase - If true, all words are lowercased first.
      Returns:
      A Set (CharArraySet) containing the words