org.elasticsearch.index.analysis
Class ICUNormalizer2Filter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.elasticsearch.index.analysis.ICUNormalizer2Filter
- All Implemented Interfaces:
- java.io.Closeable
- Direct Known Subclasses:
- ICUFoldingFilter
public class ICUNormalizer2Filter
- extends org.apache.lucene.analysis.TokenFilter
Normalizes token text with ICU's Normalizer2.
With this filter, you can normalize text in the following ways:
- NFKC Normalization, Case Folding, and removing Ignorables (the default)
- Using a standard Normalization mode (NFC, NFD, NFKC, NFKD)
- Based on rules from a custom normalization mapping.
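The four standard modes listed above differ in whether they compose or decompose characters and in whether they apply compatibility mappings. The sketch below illustrates that difference using the JDK's java.text.Normalizer as a stand-in. This is an assumption made so the example runs without ICU4J or Lucene on the classpath; it is not the code path this filter uses. The class name NormalizationForms and the helper apply are hypothetical.

```java
import java.text.Normalizer;

public class NormalizationForms {
    // Apply one of the four standard Unicode normalization forms to the text.
    // Uses the JDK normalizer, not ICU's Normalizer2 (illustrative stand-in).
    static String apply(String text, Normalizer.Form form) {
        return Normalizer.normalize(text, form);
    }

    public static void main(String[] args) {
        String ligature = "\uFB01"; // LATIN SMALL LIGATURE FI
        String eAcute = "\u00E9";   // LATIN SMALL LETTER E WITH ACUTE

        // Print the resulting length in UTF-16 code units for each form.
        for (Normalizer.Form form : Normalizer.Form.values()) {
            System.out.println(form
                    + ": ligature -> " + apply(ligature, form).length()
                    + ", e-acute -> " + apply(eAcute, form).length());
        }
    }
}
```

Only the compatibility forms (NFKC, NFKD) expand the ligature to "fi", while only the decomposing forms (NFD, NFKD) split the accented letter into a base letter plus a combining accent.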
If you use the defaults, this filter is a simple way to standardize Unicode text
in a language-independent way for search:
- Its case folding can serve as a replacement for
LowerCaseFilter.
- Ignorables such as Zero-Width Joiner and Variation Selectors are removed.
These are typically modifier characters that affect display.
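As a rough illustration of what the default behaviour means for a single token, the sketch below approximates it with JDK classes only. It is not ICU's NFKC_Casefold: it assumes that lowercasing is an acceptable stand-in for case folding (full case folding differs for some characters, such as the German sharp s), and it strips only a small hand-picked subset of ignorables. The class DefaultFoldingSketch and the method fold are hypothetical names.

```java
import java.text.Normalizer;
import java.util.Locale;

public class DefaultFoldingSketch {
    // Approximates "NFKC + case folding + remove ignorables" for one token.
    // Assumption: lowercasing stands in for case folding, and only ZWNJ, ZWJ
    // and the variation selectors U+FE00..U+FE0F are treated as ignorable.
    static String fold(String token) {
        // Remove the chosen subset of ignorable code points.
        String stripped = token.replaceAll("[\\u200C\\u200D\\uFE00-\\uFE0F]", "");
        // Apply compatibility composition, e.g. fullwidth letters become ASCII.
        String composed = Normalizer.normalize(stripped, Normalizer.Form.NFKC);
        // Lowercase in a locale-independent way.
        String lowered = composed.toLowerCase(Locale.ROOT);
        // Normalize again in case lowercasing produced a non-normalized form.
        return Normalizer.normalize(lowered, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        System.out.println(fold("\uFF26\uFF35\uFF2C\uFF2C")); // fullwidth "FULL"
        System.out.println(fold("Zero\u200DWidth"));          // contains a Zero-Width Joiner
    }
}
```

With the real filter, the same effect is applied to each token in the stream, so visually equivalent spellings of a word can match at search time.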
- See Also:
Normalizer2,
FilteredNormalizer2
| Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource |
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State |
| Fields inherited from class org.apache.lucene.analysis.TokenFilter |
input |
Constructor Summary |
ICUNormalizer2Filter(org.apache.lucene.analysis.TokenStream input)
Creates a new ICUNormalizer2Filter that applies NFKC normalization, case
folding, and removal of default ignorables (NFKC_Casefold). |
ICUNormalizer2Filter(org.apache.lucene.analysis.TokenStream input,
com.ibm.icu.text.Normalizer2 normalizer)
Creates a new ICUNormalizer2Filter with the specified Normalizer2. |
| Methods inherited from class org.apache.lucene.analysis.TokenFilter |
close, end, reset |
| Methods inherited from class org.apache.lucene.util.AttributeSource |
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString |
| Methods inherited from class java.lang.Object |
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
ICUNormalizer2Filter
public ICUNormalizer2Filter(org.apache.lucene.analysis.TokenStream input)
- Creates a new ICUNormalizer2Filter that applies NFKC normalization, case
folding, and removal of default ignorables (NFKC_Casefold).
ICUNormalizer2Filter
public ICUNormalizer2Filter(org.apache.lucene.analysis.TokenStream input,
com.ibm.icu.text.Normalizer2 normalizer)
- Creates a new ICUNormalizer2Filter with the specified Normalizer2.
- Parameters:
input - stream
normalizer - normalizer to use
incrementToken
public final boolean incrementToken()
throws java.io.IOException
- Specified by:
incrementToken in class org.apache.lucene.analysis.TokenStream
- Throws:
java.io.IOException