Class UAX29URLEmailTokenizerImpl36
java.lang.Object
org.apache.lucene.analysis.standard.std36.UAX29URLEmailTokenizerImpl36
- All Implemented Interfaces:
StandardTokenizerInterface
@Deprecated
public final class UAX29URLEmailTokenizerImpl36
extends Object
implements StandardTokenizerInterface
Deprecated.
This class is only for exact backwards compatibility.
This class implements UAX29URLEmailTokenizer using Unicode 6.0.0.
-
Field Summary
Fields
Modifier and Type    Field                    Description
static final int     EMAIL_TYPE               Deprecated.
static final int     HANGUL_TYPE              Deprecated.
static final int     HIRAGANA_TYPE            Deprecated.
static final int     IDEOGRAPHIC_TYPE         Deprecated.
static final int     KATAKANA_TYPE            Deprecated.
static final int     NUMERIC_TYPE             Deprecated. Numbers
static final int     SOUTH_EAST_ASIAN_TYPE    Deprecated. Chars in class \p{Line_Break = Complex_Context} are from South East Asian scripts (Thai, Lao, Myanmar, Khmer, etc.).
static final int     URL_TYPE                 Deprecated.
static final int     WORD_TYPE                Deprecated. Alphanumeric sequences
static final int     YYEOF                    Deprecated. This character denotes the end of file
static final int     YYINITIAL                Deprecated. lexical states
-
Constructor Summary
Constructors
Constructor                               Description
UAX29URLEmailTokenizerImpl36(Reader in)   Deprecated. Creates a new scanner
-
Method Summary
Modifier and Type    Method                          Description
int                  getNextToken()                  Deprecated. Resumes scanning until the next regular expression is matched, the end of input is encountered or an I/O-Error occurs.
final void           getText(CharTermAttribute t)    Deprecated. Fills CharTermAttribute with the current token text.
final void           yybegin(int newState)           Deprecated. Enters a new lexical state
final int            yychar()                        Deprecated. Returns the current position.
final char           yycharat(int pos)               Deprecated. Returns the character at position pos from the matched text.
final void           yyclose()                       Deprecated. Closes the input stream.
final int            yylength()                      Deprecated. Returns the length of the matched text region.
void                 yypushback(int number)          Deprecated. Pushes the specified amount of characters back into the input stream.
final void           yyreset(Reader reader)          Deprecated. Resets the scanner to read from a new input stream.
final int            yystate()                       Deprecated. Returns the current lexical state.
final String         yytext()                        Deprecated. Returns the text matched by the current regular expression.
-
Field Details
-
YYEOF
public static final int YYEOF
Deprecated.
This character denotes the end of file.
-
YYINITIAL
public static final int YYINITIAL
Deprecated.
Lexical states.
-
WORD_TYPE
public static final int WORD_TYPE
Deprecated.
Alphanumeric sequences.
-
NUMERIC_TYPE
public static final int NUMERIC_TYPE
Deprecated.
Numbers.
-
SOUTH_EAST_ASIAN_TYPE
public static final int SOUTH_EAST_ASIAN_TYPE
Deprecated.
Chars in class \p{Line_Break = Complex_Context} are from South East Asian scripts (Thai, Lao, Myanmar, Khmer, etc.). Sequences of these are kept together as a single token rather than broken up, because the logic required to break them at word boundaries is too complex for UAX#29.
See Unicode Line Breaking Algorithm: http://www.unicode.org/reports/tr14/#SA
-
IDEOGRAPHIC_TYPE
public static final int IDEOGRAPHIC_TYPE
Deprecated.
-
HIRAGANA_TYPE
public static final int HIRAGANA_TYPE
Deprecated.
-
KATAKANA_TYPE
public static final int KATAKANA_TYPE
Deprecated.
-
HANGUL_TYPE
public static final int HANGUL_TYPE
Deprecated.
-
EMAIL_TYPE
public static final int EMAIL_TYPE
Deprecated.
-
URL_TYPE
public static final int URL_TYPE
Deprecated.
-
-
Constructor Details
-
UAX29URLEmailTokenizerImpl36
public UAX29URLEmailTokenizerImpl36(Reader in)
Deprecated.
Creates a new scanner.
- Parameters:
in - the java.io.Reader to read input from.
-
-
Method Details
-
yychar
public final int yychar()
Deprecated.
Description copied from interface: StandardTokenizerInterface
Returns the current position.
- Specified by:
yychar in interface StandardTokenizerInterface
-
getText
public final void getText(CharTermAttribute t)
Deprecated.
Fills CharTermAttribute with the current token text.
- Specified by:
getText in interface StandardTokenizerInterface
-
yyclose
public final void yyclose() throws IOException
Deprecated.
Closes the input stream.
- Throws:
IOException
-
yyreset
public final void yyreset(Reader reader)
Deprecated.
Resets the scanner to read from a new input stream. Does not close the old reader. All internal variables are reset, and the old input stream cannot be reused (the internal buffer is discarded and lost). The lexical state is set to YYINITIAL. The internal scan buffer is resized down to its initial length if it has grown.
- Specified by:
yyreset in interface StandardTokenizerInterface
- Parameters:
reader - the new input stream
-
yystate
public final int yystate()
Deprecated.
Returns the current lexical state.
-
yybegin
public final void yybegin(int newState)
Deprecated.
Enters a new lexical state.
- Parameters:
newState - the new lexical state
-
yytext
public final String yytext()
Deprecated.
Returns the text matched by the current regular expression.
-
yycharat
public final char yycharat(int pos)
Deprecated.
Returns the character at position pos from the matched text. It is equivalent to yytext().charAt(pos), but faster.
- Parameters:
pos - the position of the character to fetch. A value from 0 to yylength()-1.
- Returns:
the character at position pos
-
yylength
public final int yylength()
Deprecated.
Returns the length of the matched text region.
- Specified by:
yylength in interface StandardTokenizerInterface
-
yypushback
public void yypushback(int number)
Deprecated.
Pushes the specified number of characters back into the input stream. They will be read again by the next call of the scanning method.
- Parameters:
number - the number of characters to be read again. This number must not be greater than yylength().
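The relationship between yytext(), yycharat(), yylength() and yypushback() can be illustrated with a minimal sketch. PushbackSketch, its match(int) helper and the IllegalArgumentException it throws are hypothetical and exist only for this illustration; they are not part of this class, and the way the real scanner reports an oversized pushback is not described on this page.

```java
/**
 * Hypothetical stand-in for the matched-text bookkeeping of a scanner.
 * It only mirrors the documented relationships between the yy* methods.
 */
final class PushbackSketch {
    private final char[] buffer; // assumption: the whole input is held in memory
    private int startRead;       // start of the current match (inclusive)
    private int markedPos;       // end of the current match (exclusive)

    PushbackSketch(String input) {
        this.buffer = input.toCharArray();
    }

    /** Hypothetical helper: pretends a rule matched the next {@code length} characters. */
    void match(int length) {
        if (length < 0 || markedPos + length > buffer.length) {
            throw new IllegalArgumentException("match runs past the end of input");
        }
        startRead = markedPos;
        markedPos = startRead + length;
    }

    int yylength() {
        return markedPos - startRead;
    }

    String yytext() {
        return new String(buffer, startRead, yylength());
    }

    /** Same result as yytext().charAt(pos), without building a String. */
    char yycharat(int pos) {
        return buffer[startRead + pos];
    }

    /** Shrinks the match; the returned characters are seen again by the next match. */
    void yypushback(int number) {
        if (number < 0 || number > yylength()) {
            // Sketch-only failure mode; the documented rule is just number <= yylength().
            throw new IllegalArgumentException("pushback exceeds yylength()");
        }
        markedPos -= number;
    }

    public static void main(String[] args) {
        PushbackSketch s = new PushbackSketch("foo.bar");
        s.match(4);                       // current match is "foo."
        s.yypushback(1);                  // hand the "." back to the input
        System.out.println(s.yytext());   // prints "foo"
        s.match(4);                       // the "." is read again
        System.out.println(s.yytext());   // prints ".bar"
    }
}
```

After the pushback, yylength() drops from 4 to 3, which is why the argument can never exceed the current match length.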
-
getNextToken
public int getNextToken() throws IOException
Deprecated.
Resumes scanning until the next regular expression is matched, the end of input is encountered, or an I/O error occurs.
- Specified by:
getNextToken in interface StandardTokenizerInterface
- Returns:
the next token
- Throws:
IOException - if any I/O error occurs
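A caller typically drives such a scanner in a loop: call getNextToken() until it returns YYEOF, then read yytext(), yychar() and yylength() for each match. The sketch below shows that loop against a hypothetical MiniScanner that only splits on whitespace; it does not implement the UAX#29 rules, the constant values are invented for the example, and treating yychar() as the start offset of the current match is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical whitespace scanner that imitates the method contract described above. */
final class MiniScanner {
    static final int YYEOF = -1;       // assumed value, for this sketch only
    static final int WORD_TYPE = 0;    // assumed value, for this sketch only
    static final int NUMERIC_TYPE = 1; // assumed value, for this sketch only

    private final Reader in;
    private final StringBuilder text = new StringBuilder();
    private int consumed;   // characters read from the reader so far
    private int tokenStart; // offset of the current match

    MiniScanner(Reader in) {
        this.in = in;
    }

    int yychar() { return tokenStart; }
    int yylength() { return text.length(); }
    String yytext() { return text.toString(); }

    private int read() throws IOException {
        int c = in.read();
        if (c >= 0) consumed++;
        return c;
    }

    /** Returns the type of the next token, or YYEOF at end of input. */
    int getNextToken() throws IOException {
        text.setLength(0);
        int c;
        while ((c = read()) >= 0 && Character.isWhitespace(c)) {
            // skip leading whitespace
        }
        if (c < 0) return YYEOF;
        tokenStart = consumed - 1;
        boolean allDigits = true;
        do {
            text.append((char) c);
            allDigits &= Character.isDigit(c);
        } while ((c = read()) >= 0 && !Character.isWhitespace(c));
        return allDigits ? NUMERIC_TYPE : WORD_TYPE;
    }
}

final class ScanLoopDemo {
    /** Collects "TYPE:text@start-end" for every token in the input. */
    static List<String> scan(String input) {
        MiniScanner scanner = new MiniScanner(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            int type;
            while ((type = scanner.getNextToken()) != MiniScanner.YYEOF) {
                int start = scanner.yychar();
                int end = start + scanner.yylength();
                String label = type == MiniScanner.NUMERIC_TYPE ? "NUM" : "WORD";
                out.add(label + ":" + scanner.yytext() + "@" + start + "-" + end);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [WORD:call@0-4, NUM:911@5-8, WORD:now@9-12]
        System.out.println(scan("call 911 now"));
    }
}
```

In the real class the returned int would be compared against the *_TYPE constants listed under Field Details rather than the invented values used here.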
-