Package org.apache.lucene.analysis.core
Class StopFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.util.FilteringTokenFilter
org.apache.lucene.analysis.core.StopFilter
All Implemented Interfaces:
Closeable, AutoCloseable
Removes stop words from a token stream.
You must specify the required Version compatibility when creating StopFilter:
- As of 3.1, StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords, and position increments are preserved.
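To make "position increments are preserved" concrete, here is a minimal stdlib-only sketch. It is a model of the behavior, not Lucene's implementation: the names `StopFilterSketch`, `Token`, and `filter` are hypothetical, and it assumes each surviving token carries the count of removed words before it plus one, so that downstream consumers can still see the gaps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Stdlib-only model of stop word removal; hypothetical, not Lucene code. */
final class StopFilterSketch {

    /** A surviving term plus its distance, in positions, from the previously emitted term. */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    /** Drops stop words and records how many positions were skipped before each kept term. */
    static List<Token> filter(List<String> terms, Set<String> stopWords) {
        List<Token> kept = new ArrayList<>();
        int skipped = 0;
        for (String term : terms) {
            if (stopWords.contains(term)) {
                skipped++; // remember the gap instead of silently closing it
            } else {
                kept.add(new Token(term, skipped + 1));
                skipped = 0;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "of"));
        List<String> terms = Arrays.asList("the", "lord", "of", "the", "rings");
        System.out.println(filter(terms, stopWords)); // prints [lord(+2), rings(+3)]
    }
}
```

Keeping the gaps matters for position-sensitive matching such as phrase queries, where "lord rings" should not be treated as two adjacent words.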
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
Constructor Summary
Constructors
Constructor    Description
StopFilter(Version matchVersion, TokenStream in, CharArraySet stopWords)
    Constructs a filter which removes words from the input TokenStream that are named in the Set.
Method Summary
Modifier and Type    Method    Description
static CharArraySet  makeStopSet(Version matchVersion, String... stopWords)
    Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor.
static CharArraySet  makeStopSet(Version matchVersion, String[] stopWords, boolean ignoreCase)
    Creates a stopword set from the given stopword array.
static CharArraySet  makeStopSet(Version matchVersion, List<?> stopWords)
    Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor.
static CharArraySet  makeStopSet(Version matchVersion, List<?> stopWords, boolean ignoreCase)
    Creates a stopword set from the given stopword list.
Methods inherited from class org.apache.lucene.analysis.util.FilteringTokenFilter
end, getEnablePositionIncrements, incrementToken, reset, setEnablePositionIncrements
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
Constructor Details
StopFilter
public StopFilter(Version matchVersion, TokenStream in, CharArraySet stopWords)
Constructs a filter that removes from the input TokenStream any words contained in the given set.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the stop set if Version > 3.0. See above for details.
in - Input stream
stopWords - A CharArraySet representing the stopwords.
See Also:
Method Details
makeStopSet
public static CharArraySet makeStopSet(Version matchVersion, String... stopWords)
Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor. This allows the stop word set to be built once and cached when an Analyzer is constructed.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - An array of stopwords
See Also:
makeStopSet
public static CharArraySet makeStopSet(Version matchVersion, List<?> stopWords)
Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor. This allows the stop word set to be built once and cached when an Analyzer is constructed.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
Returns:
A Set (CharArraySet) containing the words
See Also:
makeStopSet
public static CharArraySet makeStopSet(Version matchVersion, String[] stopWords, boolean ignoreCase)
Creates a stopword set from the given stopword array.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - An array of stopwords
ignoreCase - If true, all words are lower cased first.
Returns:
a Set containing the words
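The effect of ignoreCase can be illustrated with a small stdlib-only sketch. This is a hypothetical model (`StopSetSketch`, `lowerCase`, and `isStopWord` are not Lucene names), assuming that entries are lowercased on insertion and that lookups are lowercased the same way. Lowercasing walks the string by code point rather than by char, which is one plausible way to avoid splitting supplementary characters.

```java
import java.util.HashSet;
import java.util.Set;

/** Stdlib-only model of a case-insensitive stop set; hypothetical, not Lucene code. */
final class StopSetSketch {

    /** Builds the set, lowercasing every entry first when ignoreCase is true. */
    static Set<String> makeStopSet(String[] stopWords, boolean ignoreCase) {
        Set<String> set = new HashSet<>();
        for (String word : stopWords) {
            set.add(ignoreCase ? lowerCase(word) : word);
        }
        return set;
    }

    /** Lowercases code point by code point so surrogate pairs stay intact. */
    static String lowerCase(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    /** Looks a term up with the same normalization that was used to build the set. */
    static boolean isStopWord(Set<String> stopSet, String term, boolean ignoreCase) {
        return stopSet.contains(ignoreCase ? lowerCase(term) : term);
    }

    public static void main(String[] args) {
        Set<String> folded = makeStopSet(new String[] {"The", "AND"}, true);
        System.out.println(isStopWord(folded, "the", true));  // prints true
        System.out.println(isStopWord(folded, "And", true));  // prints true

        Set<String> exact = makeStopSet(new String[] {"The"}, false);
        System.out.println(isStopWord(exact, "the", false));  // prints false
    }
}
```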
makeStopSet
public static CharArraySet makeStopSet(Version matchVersion, List<?> stopWords, boolean ignoreCase)
Creates a stopword set from the given stopword list.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
ignoreCase - If true, all words are lower cased first.
Returns:
A Set (CharArraySet) containing the words
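The phrase "List of Strings or char[] or any other toString()-able list" suggests that entries of mixed types are accepted. The sketch below shows one way such a list could be normalized, using only the standard library. It is an assumption-laden model, not Lucene's code: `ListStopSetSketch` is a hypothetical name, char[] entries are assumed to be read as their characters, and everything else is assumed to go through toString().

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Stdlib-only model of building a stop set from a List<?>; hypothetical, not Lucene code. */
final class ListStopSetSketch {

    static Set<String> makeStopSet(List<?> stopWords, boolean ignoreCase) {
        Set<String> set = new LinkedHashSet<>(); // insertion order kept only to make the demo output stable
        for (Object entry : stopWords) {
            // char[].toString() yields an identity string, not the text, so handle it explicitly.
            String word = (entry instanceof char[])
                    ? new String((char[]) entry)
                    : entry.toString();
            set.add(ignoreCase ? word.toLowerCase(Locale.ROOT) : word);
        }
        return set;
    }

    public static void main(String[] args) {
        List<Object> words = Arrays.<Object>asList(
                "The", new char[] {'o', 'f'}, new StringBuilder("AND"));
        System.out.println(makeStopSet(words, true));  // prints [the, of, and]
        System.out.println(makeStopSet(words, false)); // prints [The, of, AND]
    }
}
```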