Package org.apache.lucene.analysis.ar
Class ArabicLetterTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.util.CharTokenizer
org.apache.lucene.analysis.core.LetterTokenizer
org.apache.lucene.analysis.ar.ArabicLetterTokenizer
- All Implemented Interfaces:
Closeable, AutoCloseable
Deprecated.
Tokenizer that breaks text into runs of letters and diacritics.
The problem with the standard Letter tokenizer is that it fails on diacritics. Handling similar to this is necessary for Indic Scripts, Hebrew, Thaana, etc.
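The difference described above can be illustrated without Lucene. The following is a minimal sketch, not the class's actual source: it assumes the token predicate is "letter or Unicode non-spacing mark", and the class name LetterRunSketch and its methods are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterRunSketch {

    // Letter-only predicate, roughly what a plain letter tokenizer checks.
    public static boolean isLetterOnly(int c) {
        return Character.isLetter(c);
    }

    // Assumed predicate for this sketch: letters plus non-spacing marks (diacritics).
    public static boolean isLetterOrMark(int c) {
        return Character.isLetter(c)
                || Character.getType(c) == Character.NON_SPACING_MARK;
    }

    // Splits text into maximal runs of characters accepted by the chosen predicate.
    public static List<String> tokenize(String text, boolean keepMarks) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int c = text.codePointAt(i);
            boolean tokenChar = keepMarks ? isLetterOrMark(c) : isLetterOnly(c);
            if (tokenChar) {
                current.appendCodePoint(c);
            } else if (current.length() > 0) {
                // A rejected character ends the current run.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(c);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // KAF, KASRA, TEH, FATHA, ALEF, BEH: one Arabic word carrying two diacritics.
        String word = "\u0643\u0650\u062A\u064E\u0627\u0628";
        System.out.println("letters only: " + tokenize(word, false).size() + " tokens");
        System.out.println("letters and marks: " + tokenize(word, true).size() + " token");
    }
}
```

With the letter-only predicate the word is cut at each diacritic, while the predicate that also accepts non-spacing marks keeps it whole.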
You must specify the required Version compatibility when creating
ArabicLetterTokenizer:
- As of 3.1, CharTokenizer uses an int-based API to normalize and detect token characters. See isTokenChar(int) and CharTokenizer.normalize(int) for details.
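To show why an int-based API matters, here is a small stand-alone sketch using only the JDK. The class CodePointSketch and its isTokenChar and normalize methods are hypothetical stand-ins written for this example, not Lucene's implementations; the lowercasing in normalize is likewise an assumption made only so the normalization step is visible.

```java
public class CodePointSketch {

    // Hypothetical stand-in for an int-based token predicate.
    public static boolean isTokenChar(int codePoint) {
        return Character.isLetter(codePoint);
    }

    // Hypothetical stand-in for an int-based normalizer (lowercases here).
    public static int normalize(int codePoint) {
        return Character.toLowerCase(codePoint);
    }

    // Walks the text by code point, so a surrogate pair is tested and
    // normalized as a single character rather than as two separate chars.
    public static String firstToken(String text) {
        StringBuilder token = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isTokenChar(cp)) {
                token.appendCodePoint(normalize(cp));
            } else if (token.length() > 0) {
                break; // the first run of token characters has ended
            }
            i += Character.charCount(cp);
        }
        return token.toString();
    }

    public static void main(String[] args) {
        // U+10400 lies outside the Basic Multilingual Plane, so it needs two chars.
        String supplementary = new String(Character.toChars(0x10400));
        // Seen as a char, the leading surrogate is not a letter.
        System.out.println(Character.isLetter(supplementary.charAt(0)));
        // Seen as an int code point, the character is a letter.
        System.out.println(Character.isLetter(supplementary.codePointAt(0)));
        System.out.println(firstToken(supplementary + "AB cd"));
    }
}
```

A char-based predicate would see the two surrogate halves separately and reject both, which is the failure an int-based predicate avoids.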
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
Constructor Summary
Constructors:
- ArabicLetterTokenizer(Version matchVersion, Reader in): Deprecated. Construct a new ArabicLetterTokenizer.
- ArabicLetterTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in): Deprecated. Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory.
Method Summary
Methods inherited from class org.apache.lucene.analysis.util.CharTokenizer
end, incrementToken, reset
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
Constructor Details
ArabicLetterTokenizer
public ArabicLetterTokenizer(Version matchVersion, Reader in)
Deprecated. Construct a new ArabicLetterTokenizer.
- Parameters:
matchVersion - Lucene version to match; see the Version compatibility note above
in - the input to split up into tokens
ArabicLetterTokenizer
public ArabicLetterTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in)
Deprecated. Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory.
- Parameters:
matchVersion - Lucene version to match; see the Version compatibility note above
factory - the attribute factory to use for this Tokenizer
in - the input to split up into tokens
Deprecated. Use StandardTokenizer instead.