org.exoplatform.services.jcr.analyzer
Class WhitespaceTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.CharTokenizer
org.exoplatform.services.jcr.analyzer.WhitespaceTokenizer
- All Implemented Interfaces:
- Closeable
public class WhitespaceTokenizer
- extends org.apache.lucene.analysis.CharTokenizer
Created by The eXo Platform SAS
Author: eXoPlatform
exo@exoplatform.com
Apr 9, 2013
A WhitespaceTokenizer is a tokenizer that divides text at whitespace.
Adjacent sequences of non-whitespace characters form tokens.
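As a rough illustration of the rule above, the following JDK-only sketch splits a String into maximal runs of code points for which Character.isWhitespace(int) is false, which is the predicate documented for isTokenChar(int) below. The class name WhitespaceSplitSketch is hypothetical; this is not the eXo or Lucene implementation, which streams from a Reader instead of working on a String.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, JDK-only sketch of the whitespace splitting rule.
// Not the Lucene/eXo implementation: it works on a String, not a Reader.
final class WhitespaceSplitSketch {

    // Returns the maximal runs of non-whitespace code points in text.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (Character.isWhitespace(cp)) {
                // Whitespace ends the current token (if any) and is discarded.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                // Any other code point extends the current token.
                current.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [eXo, JCR, search, index]
        System.out.println(split("  eXo\tJCR\nsearch  index "));
    }
}
```

Punctuation is not stripped, so "Hello, world!" yields "Hello," and "world!". Also note that Character.isWhitespace(int) returns false for the no-break space U+00A0, so "a\u00A0b" stays a single token.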
You must specify the required Version compatibility when creating a
WhitespaceTokenizer:
- As of Lucene 3.1,
CharTokenizer uses an int-based (Unicode code point) API to normalize and
detect token characters. See CharTokenizer.isTokenChar(int) and
CharTokenizer.normalize(int) for details.
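To show why an int-based API matters, the JDK-only sketch below lower-cases U+10400 (DESERET CAPITAL LETTER LONG I, which Java stores as a surrogate pair) once per UTF-16 char and once per code point. WhitespaceTokenizer itself overrides only isTokenChar(int) and inherits normalize(int) from CharTokenizer; the lower-casing here is just an illustrative normalize step, and the class and method names are hypothetical.

```java
// Hypothetical, JDK-only sketch contrasting char-based and code-point-based
// normalization. Not Lucene code; lower-casing stands in for "normalize".
final class CodePointNormalizeSketch {

    // Char-based: each UTF-16 unit is mapped on its own.
    static String lowerByChar(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            // A lone surrogate half has no case mapping, so it passes through unchanged.
            out.append(Character.toLowerCase(s.charAt(i)));
        }
        return out.toString();
    }

    // Code-point-based: decode the full code point first, then map it.
    static String lowerByCodePoint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            out.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+10400 lies outside the Basic Multilingual Plane.
        String deseret = new String(Character.toChars(0x10400));
        System.out.println(Integer.toHexString(lowerByChar(deseret).codePointAt(0)));      // 10400
        System.out.println(Integer.toHexString(lowerByCodePoint(deseret).codePointAt(0))); // 10428
    }
}
```

The char-based version silently leaves the supplementary character unchanged, while the code-point version maps it to its lowercase form U+10428.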
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource:
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
Fields inherited from class org.apache.lucene.analysis.Tokenizer:
input
Constructor Summary
WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion, org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader in)
- Construct a new WhitespaceTokenizer using a given AttributeSource.AttributeFactory.
WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion, org.apache.lucene.util.AttributeSource source, Reader in)
- Construct a new WhitespaceTokenizer using a given AttributeSource.
WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion, Reader in)
- Construct a new WhitespaceTokenizer.
Methods inherited from class org.apache.lucene.analysis.CharTokenizer:
end, incrementToken, isTokenChar, normalize, normalize, reset
Methods inherited from class org.apache.lucene.analysis.Tokenizer:
close, correctOffset
Methods inherited from class org.apache.lucene.analysis.TokenStream:
reset
Methods inherited from class org.apache.lucene.util.AttributeSource:
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
WhitespaceTokenizer
public WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion,
Reader in)
- Construct a new WhitespaceTokenizer.
- Parameters:
matchVersion - Lucene version to match
in - the input to split up into tokens
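Each constructor takes the text as a java.io.Reader. The hypothetical MiniWhitespaceTokenizer below is a JDK-only sketch of what consuming such a Reader involves: joining surrogate pairs into code points, skipping whitespace, and recording start and end offsets counted in UTF-16 chars. The method name incrementToken echoes Lucene's TokenStream contract only for readability; as we understand it, the real CharTokenizer reads through an internal buffer and publishes the term and offsets through attributes rather than getters, and its incrementToken throws IOException instead of wrapping it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical, JDK-only analogue of a whitespace tokenizer over a Reader.
// Not the Lucene class: terms and offsets are exposed through plain getters.
final class MiniWhitespaceTokenizer {
    private static final int EMPTY = -2;  // marker for an empty lookahead slot

    private final Reader in;
    private int offset = 0;       // UTF-16 chars consumed so far
    private int pending = EMPTY;  // one char of lookahead (or -1 for end of input)
    private String term;
    private int start;
    private int end;

    MiniWhitespaceTokenizer(Reader in) { this.in = in; }

    // Reads the next UTF-16 char (or -1), using the lookahead slot first.
    private int readChar() throws IOException {
        if (pending != EMPTY) { int c = pending; pending = EMPTY; return c; }
        return in.read();
    }

    // Reads one code point, joining a surrogate pair when present; -1 at end of input.
    private int readCodePoint() throws IOException {
        int c = readChar();
        if (c < 0) return -1;
        offset++;
        if (Character.isHighSurrogate((char) c)) {
            int d = readChar();
            if (d >= 0 && Character.isLowSurrogate((char) d)) {
                offset++;
                return Character.toCodePoint((char) c, (char) d);
            }
            pending = d;  // not a pair: hand this char to the next call
        }
        return c;
    }

    // Advances to the next token; returns false once the input is exhausted.
    // IOException is wrapped only to keep the sketch short.
    boolean incrementToken() {
        try {
            StringBuilder sb = new StringBuilder();
            while (true) {
                int before = offset;
                int cp = readCodePoint();
                if (cp < 0) break;
                if (Character.isWhitespace(cp)) {
                    if (sb.length() > 0) break;  // whitespace closes the current token
                    continue;                    // skip whitespace before a token
                }
                if (sb.length() == 0) start = before;  // offset of the token's first char
                sb.appendCodePoint(cp);
            }
            if (sb.length() == 0) return false;
            term = sb.toString();
            end = start + term.length();
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String term() { return term; }
    int startOffset() { return start; }
    int endOffset() { return end; }

    public static void main(String[] args) {
        MiniWhitespaceTokenizer t =
            new MiniWhitespaceTokenizer(new StringReader("  jcr:content  nt:file"));
        while (t.incrementToken()) {
            System.out.println(t.term() + " [" + t.startOffset() + ", " + t.endOffset() + ")");
        }
    }
}
```

Running main prints `jcr:content [2, 13)` and then `nt:file [15, 22)`: offsets point back into the original text, which is what a highlighter would need.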
WhitespaceTokenizer
public WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion,
org.apache.lucene.util.AttributeSource source,
Reader in)
- Construct a new WhitespaceTokenizer using a given
AttributeSource.
- Parameters:
matchVersion - Lucene version to match
source - the attribute source to use for this Tokenizer
in - the input to split up into tokens
WhitespaceTokenizer
public WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion,
org.apache.lucene.util.AttributeSource.AttributeFactory factory,
Reader in)
- Construct a new WhitespaceTokenizer using a given
AttributeSource.AttributeFactory.
- Parameters:
matchVersion - Lucene version to match; see the Version compatibility note in the class description above
factory - the attribute factory to use for this Tokenizer
in - the input to split up into tokens
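The AttributeSource and AttributeFactory constructors differ only in where the Tokenizer's attribute instances come from. As we understand Lucene's design, an AttributeSource keeps one instance per attribute type, a source constructed from another source shares those instances, and an AttributeFactory decides how new instances are created. The hypothetical MiniAttributeSource and AttributeSharingDemo classes below model just those two ideas with the JDK; they are not the Lucene API, and TermAttr is an invented stand-in for a term attribute.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, heavily simplified model of attribute sharing and attribute
// factories. Not the Lucene AttributeSource/AttributeFactory API.
final class MiniAttributeSource {
    // Creates an attribute instance on first request (the "factory" role).
    private final Function<Class<?>, Object> factory;
    // One instance per attribute type; shared when built from another source.
    private final Map<Class<?>, Object> attributes;

    MiniAttributeSource(Function<Class<?>, Object> factory) {
        this.factory = factory;
        this.attributes = new HashMap<>();
    }

    // Sharing constructor: reuses the other source's instances and factory.
    MiniAttributeSource(MiniAttributeSource other) {
        this.factory = other.factory;
        this.attributes = other.attributes;
    }

    // Returns the existing instance for type, creating it via the factory if absent.
    <T> T addAttribute(Class<T> type) {
        return type.cast(attributes.computeIfAbsent(type, factory));
    }
}

final class AttributeSharingDemo {
    // Hypothetical stand-in for a term attribute.
    static final class TermAttr {
        final StringBuilder text = new StringBuilder();
    }

    // A factory that knows how to build the one attribute type used here.
    static MiniAttributeSource newSource() {
        return new MiniAttributeSource(type -> {
            if (type == TermAttr.class) return new TermAttr();
            throw new IllegalArgumentException("unknown attribute " + type);
        });
    }

    public static void main(String[] args) {
        MiniAttributeSource tokenizerSide = newSource();
        MiniAttributeSource consumerSide = new MiniAttributeSource(tokenizerSide);
        tokenizerSide.addAttribute(TermAttr.class).text.append("eXo");
        // Same instance, so the consumer sees what the tokenizer wrote: prints eXo
        System.out.println(consumerSide.addAttribute(TermAttr.class).text);
    }
}
```

Sharing the instances, rather than copying values, is what lets several stages of an analysis chain read and write the same per-token state without extra copying.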
isTokenChar
protected boolean isTokenChar(int c)
- Collects only characters which do not satisfy
Character.isWhitespace(int).
- Overrides:
isTokenChar in class org.apache.lucene.analysis.CharTokenizer
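This override is the whole of the class's token rule: CharTokenizer runs the tokenization loop and asks isTokenChar(int) about each code point. The sketch below imitates that hook structure with hypothetical JDK-only classes (MiniCharTokenizer and its subclasses are not Lucene types). MiniWhitespaceRule mirrors the documented predicate, and MiniLowerCaseLetterRule is an invented second rule that also overrides normalize(int).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the hook structure: the base class drives the loop,
// subclasses decide which code points belong to tokens. Not Lucene classes.
abstract class MiniCharTokenizer {
    // True if code point c belongs inside a token.
    protected abstract boolean isTokenChar(int c);

    // Per-code-point normalization; identity unless a subclass overrides it.
    protected int normalize(int c) {
        return c;
    }

    final List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isTokenChar(cp)) {
                current.appendCodePoint(normalize(cp));
            } else if (current.length() > 0) {
                // A non-token code point ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}

// Mirrors the documented rule: collect code points that are not whitespace.
final class MiniWhitespaceRule extends MiniCharTokenizer {
    @Override
    protected boolean isTokenChar(int c) {
        return !Character.isWhitespace(c);
    }
}

// Hypothetical variant: letters only, lower-cased, to show both hooks at work.
final class MiniLowerCaseLetterRule extends MiniCharTokenizer {
    @Override
    protected boolean isTokenChar(int c) {
        return Character.isLetter(c);
    }

    @Override
    protected int normalize(int c) {
        return Character.toLowerCase(c);
    }
}
```

For example, `new MiniWhitespaceRule().tokenize("Hello, JCR world!")` keeps the punctuation (`Hello,`, `JCR`, `world!`), while the letter rule yields `hello`, `jcr`, `world`. Swapping the predicate instead of rewriting the loop is the same pattern Lucene's LetterTokenizer and LowerCaseTokenizer use on top of CharTokenizer.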
Copyright © 2016 eXo Platform SAS. All Rights Reserved.