Class StandardAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.util.StopwordAnalyzerBase
org.apache.lucene.analysis.standard.StandardAnalyzer
- All Implemented Interfaces:
Closeable, AutoCloseable
Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
You must specify the required Version compatibility when creating StandardAnalyzer:
- As of 3.4, Hiragana and Han characters are no longer wrongly split from their combining characters. If you use a previous version number, you get the exact broken behavior for backwards compatibility.
- As of 3.1, StandardTokenizer implements Unicode text segmentation, and StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords. ClassicTokenizer and ClassicAnalyzer are the pre-3.1 implementations of StandardTokenizer and StandardAnalyzer.
- As of 2.9, StopFilter preserves position increments.
- As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1068).
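The filter chain described above can be pictured with a small plain-Java sketch. It does not use Lucene: the class name, the tiny stop list, and the crude split on non-alphanumeric characters are all assumptions made for illustration (the real StandardTokenizer implements Unicode text segmentation). Each surviving term is printed with its position, so the gaps left by removed stop words, in the spirit of "StopFilter preserves position increments", stay visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Conceptual stand-in for the tokenizer, lowercase and stop filter chain; not Lucene code. */
class AnalyzerChainSketch {

    // Small illustrative stop list (an assumption); not the actual STOP_WORDS_SET.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the"));

    /** Returns the surviving tokens as "term@position" strings. */
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        int position = -1;
        // Crude tokenizer: split on anything that is not a letter or digit.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            position++; // every token consumes a position, even if it is removed below
            String term = raw.toLowerCase(Locale.ROOT);
            if (STOP_WORDS.contains(term)) {
                continue; // stop word dropped; its position is left as a gap
            }
            out.add(term + "@" + position);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [quick@1, brown@2, fox@3, fox@6]
        System.out.println(analyze("The Quick Brown Fox is a Fox"));
    }
}
```

The jump from position 3 to position 6 corresponds to the two removed stop words ("is", "a").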
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.GlobalReuseStrategy, Analyzer.PerFieldReuseStrategy, Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
-
Field Summary
Fields
Modifier and Type | Field | Description
static final int | DEFAULT_MAX_TOKEN_LENGTH | Default maximum allowed token length
static final CharArraySet | STOP_WORDS_SET | An unmodifiable set containing some common English words that are usually not useful for searching.
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
-
Constructor Summary
Constructors
Constructor | Description
StandardAnalyzer(Version matchVersion) | Builds an analyzer with the default stop words (STOP_WORDS_SET).
StandardAnalyzer(Version matchVersion, Reader stopwords) | Builds an analyzer with the stop words from the given reader.
StandardAnalyzer(Version matchVersion, CharArraySet stopWords) | Builds an analyzer with the given stop words.
-
Method Summary
Modifier and Type | Method | Description
int | getMaxTokenLength() |
void | setMaxTokenLength(int length) | Set maximum allowed token length.
Methods inherited from class org.apache.lucene.analysis.util.StopwordAnalyzerBase
getStopwordSet
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, tokenStream, tokenStream
-
Field Details
-
DEFAULT_MAX_TOKEN_LENGTH
public static final int DEFAULT_MAX_TOKEN_LENGTH
Default maximum allowed token length
-
STOP_WORDS_SET
An unmodifiable set containing some common English words that are usually not useful for searching.
-
-
Constructor Details
-
StandardAnalyzer
public StandardAnalyzer(Version matchVersion, CharArraySet stopWords)
Builds an analyzer with the given stop words.
- Parameters:
matchVersion - Lucene version to match; see the version notes above
stopWords - stop words
-
StandardAnalyzer
public StandardAnalyzer(Version matchVersion)
Builds an analyzer with the default stop words (STOP_WORDS_SET).
- Parameters:
matchVersion - Lucene version to match; see the version notes above
-
StandardAnalyzer
public StandardAnalyzer(Version matchVersion, Reader stopwords) throws IOException
Builds an analyzer with the stop words from the given reader.
- Parameters:
matchVersion - Lucene version to match; see the version notes above
stopwords - Reader to read stop words from
- Throws:
IOException
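As a rough picture of what "stop words from the given reader" can involve, the following plain-Java sketch loads a word set from a Reader. It is not Lucene's loader: the class and method names are hypothetical, and the input format (one word per line, surrounding whitespace trimmed, blank lines skipped) is an assumption rather than something this page specifies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper showing how a stop word set might be read; not Lucene code. */
class StopwordReaderSketch {

    /** Reads one stop word per line (assumed format), trimming whitespace and skipping blank lines. */
    static Set<String> readStopwords(Reader reader) throws IOException {
        Set<String> words = new LinkedHashSet<>(); // keeps insertion order for readable output
        try (BufferedReader lines = new BufferedReader(reader)) {
            String line;
            while ((line = lines.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Prints [the, and, of]
        System.out.println(readStopwords(new StringReader("the\n  and \n\nof\n")));
    }
}
```

Reading from a Reader is what makes the IOException in the constructor signature possible: any failure of the underlying stream surfaces while the stop words are being loaded.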
-
-
Method Details
-
setMaxTokenLength
public void setMaxTokenLength(int length)
Sets the maximum allowed token length. A token that exceeds this length is discarded. The setting takes effect only the next time tokenStream (either overload) is called.
-
getMaxTokenLength
public int getMaxTokenLength()
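The discard rule for over-long tokens can be modelled in a few lines of plain Java. This is a conceptual sketch only: the class is hypothetical, it splits on whitespace instead of using StandardTokenizer, and its tokenStream method returns a list, not a Lucene TokenStream. It does mirror two points from the description: tokens longer than the limit are dropped, not truncated, and a changed limit is seen by the next tokenStream call.

```java
import java.util.ArrayList;
import java.util.List;

/** Conceptual model of the max-token-length setting; not Lucene code. */
class MaxTokenLengthSketch {
    private int maxTokenLength;

    MaxTokenLengthSketch(int maxTokenLength) {
        this.maxTokenLength = maxTokenLength;
    }

    void setMaxTokenLength(int length) {
        this.maxTokenLength = length;
    }

    int getMaxTokenLength() {
        return maxTokenLength;
    }

    /** Splits on whitespace and discards every token longer than the current limit. */
    List<String> tokenStream(String text) {
        final int limit = maxTokenLength; // read once per call, so later changes affect later calls only
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && token.length() <= limit) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        MaxTokenLengthSketch analyzer = new MaxTokenLengthSketch(5);
        System.out.println(analyzer.tokenStream("short loooong ok")); // [short, ok]
        analyzer.setMaxTokenLength(10);
        System.out.println(analyzer.tokenStream("short loooong ok")); // [short, loooong, ok]
    }
}
```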