Class HyphenationTree
java.lang.Object
org.apache.lucene.analysis.compound.hyphenation.TernaryTree
org.apache.lucene.analysis.compound.hyphenation.HyphenationTree
All Implemented Interfaces:
Cloneable, PatternConsumer
This tree structure stores the hyphenation patterns in an efficient way for
fast lookup. It provides the method to hyphenate a word.
This class has been taken from the Apache FOP project (http://xmlgraphics.apache.org/fop/) and slightly modified.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.compound.hyphenation.TernaryTree
TernaryTree.Iterator
Constructor Summary
Constructors
HyphenationTree()
Method Summary
Modifier and Type | Method | Description
void | addClass(String chargroup) | Add a character class to the tree.
void | addException(String word, ArrayList<Object> hyphenatedword) | Add an exception to the tree.
void | addPattern(String pattern, String ivalue) | Add a pattern to the tree.
String | findPattern(String pat) |
Hyphenation | hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount) | Hyphenate word and return an array of hyphenation points.
Hyphenation | hyphenate(String word, int remainCharCount, int pushCharCount) | Hyphenate word and return a Hyphenation object.
void | loadPatterns(File f) | Read hyphenation patterns from an XML file.
void | loadPatterns(InputSource source) | Read hyphenation patterns from an XML file.
void | printStats(PrintStream out) |
-
Constructor Details
-
HyphenationTree
public HyphenationTree()
-
-
Method Details
-
loadPatterns
Read hyphenation patterns from an XML file.
Parameters:
f - the filename
Throws:
IOException - In case the parsing fails
-
loadPatterns
Read hyphenation patterns from an XML file.
Parameters:
source - the InputSource for the file
Throws:
IOException - In case the parsing fails
-
findPattern
-
hyphenate
Hyphenate word and return a Hyphenation object.
Parameters:
word - the word to be hyphenated
remainCharCount - Minimum number of characters allowed before the hyphenation point.
pushCharCount - Minimum number of characters allowed after the hyphenation point.
Returns:
a Hyphenation object representing the hyphenated word or null if word is not hyphenated.
-
hyphenate
Hyphenate word and return an array of hyphenation points.
Parameters:
w - char array that contains the word
offset - Offset to first character in word
len - Length of word
remainCharCount - Minimum number of characters allowed before the hyphenation point.
pushCharCount - Minimum number of characters allowed after the hyphenation point.
Returns:
a Hyphenation object representing the hyphenated word or null if word is not hyphenated.
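The effect of remainCharCount and pushCharCount can be pictured as a filter over candidate break positions. The following is a minimal sketch in plain Java, not this class's implementation: the class name MinCharsSketch, the method filterPoints, and the convention that a point p means "break after the first p characters" are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: illustrates the two minimum-length limits.
final class MinCharsSketch {

    // Keeps only the points that leave at least remainCharCount characters
    // before the break and at least pushCharCount characters after it.
    // A point p is assumed to mean "break after the first p characters".
    static int[] filterPoints(int[] points, int wordLength,
                              int remainCharCount, int pushCharCount) {
        return Arrays.stream(points)
                .filter(p -> p >= remainCharCount && wordLength - p >= pushCharCount)
                .toArray();
    }

    public static void main(String[] args) {
        // An 11-letter word with candidate breaks after 1, 2, 5 and 9 characters.
        int[] kept = filterPoints(new int[] {1, 2, 5, 9}, 11, 2, 3);
        System.out.println(Arrays.toString(kept));
    }
}
```

With remainCharCount = 2 and pushCharCount = 3, the break after 1 character is dropped because too little remains before it, and the break after 9 characters is dropped because only 2 characters would follow it.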
-
addClass
Add a character class to the tree. It is used by PatternParser as a callback to add character classes. Character classes define the valid word characters for hyphenation. If a word contains a character not defined in any of the classes, it is not hyphenated. A character class also defines how to normalize characters so that they can be compared with the stored patterns. Pattern files usually use only lower case characters. In that case a class for the letter 'a', for example, should be defined as "aA", the first character being the normalization char.
Specified by:
addClass in interface PatternConsumer
Parameters:
chargroup - character group
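The normalization rule described for character classes can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the storage this class actually uses: the class name CharClassSketch, the HashMap, and the normalize helper are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not part of Lucene: models character classes as a map.
final class CharClassSketch {

    // Maps every character of a class to that class's normalization character.
    private final Map<Character, Character> classes = new HashMap<>();

    // Registers a character group such as "aA"; the first character is the
    // normalized form of every character in the group.
    void addClass(String chargroup) {
        if (chargroup.isEmpty()) {
            return;
        }
        char norm = chargroup.charAt(0);
        for (int i = 0; i < chargroup.length(); i++) {
            classes.put(chargroup.charAt(i), norm);
        }
    }

    // Returns the normalized word, or null if some character belongs to no
    // class, in which case the word would not be hyphenated.
    String normalize(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            Character norm = classes.get(word.charAt(i));
            if (norm == null) {
                return null;
            }
            sb.append(norm.charValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CharClassSketch sketch = new CharClassSketch();
        sketch.addClass("aA");
        sketch.addClass("bB");
        System.out.println(sketch.normalize("AbBa")); // prints "abba"
        System.out.println(sketch.normalize("abc"));  // prints "null": 'c' is in no class
    }
}
```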
-
addException
Add an exception to the tree. It is used by the PatternParser class as a callback to store the hyphenation exceptions.
Specified by:
addException in interface PatternConsumer
Parameters:
word - normalized word
hyphenatedword - a vector of alternating strings and hyphen objects.
-
addPattern
Add a pattern to the tree. It is mainly intended to be used by the PatternParser class as a callback to add a pattern to the tree.
Specified by:
addPattern in interface PatternConsumer
Parameters:
pattern - the hyphenation pattern
ivalue - interletter weight values indicating the desirability and priority of hyphenating at a given point within the pattern. It should contain only digit characters (i.e. '0' to '9').
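To make the role of the interletter values concrete, here is a small sketch of Liang-style pattern matching, the technique behind TeX-style hyphenation patterns. It is not this class's implementation, which stores patterns in a ternary tree. The class InterletterSketch, the HashMap storage, the toy patterns, and the assumption that ivalue holds one digit per gap (pattern length plus one) are all illustrative. Under Liang's rule, the highest value seen at each gap wins, and an odd value permits a break.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of Lucene: Liang-style scoring with a plain map.
final class InterletterSketch {

    private final Map<String, String> patterns = new HashMap<>();

    // Assumes ivalue has one digit for each gap around the pattern letters,
    // i.e. pattern.length() + 1 digits.
    void addPattern(String pattern, String ivalue) {
        if (ivalue.length() != pattern.length() + 1) {
            throw new IllegalArgumentException("expected one digit per gap");
        }
        for (int i = 0; i < ivalue.length(); i++) {
            if (!Character.isDigit(ivalue.charAt(i))) {
                throw new IllegalArgumentException("ivalue must contain only digits");
            }
        }
        patterns.put(pattern, ivalue);
    }

    // Returns the positions p such that a break after the first p characters
    // of the word is allowed.
    List<Integer> points(String word) {
        String w = "." + word + ".";            // '.' marks the word boundaries
        int[] score = new int[w.length() + 1];  // score[j] is the gap before w[j]
        for (int start = 0; start < w.length(); start++) {
            for (int end = start + 1; end <= w.length(); end++) {
                String iv = patterns.get(w.substring(start, end));
                if (iv == null) {
                    continue;
                }
                // Keep the highest weight seen at each gap.
                for (int k = 0; k < iv.length(); k++) {
                    score[start + k] = Math.max(score[start + k], iv.charAt(k) - '0');
                }
            }
        }
        List<Integer> result = new ArrayList<>();
        for (int p = 1; p < word.length(); p++) {
            if (score[p + 1] % 2 == 1) {        // odd weight: break allowed
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        InterletterSketch sketch = new InterletterSketch();
        sketch.addPattern("bc", "010");           // toy pattern "b1c"
        System.out.println(sketch.points("abcd")); // prints "[2]"
        sketch.addPattern("abc", "0020");         // toy pattern "ab2c" outranks the 1
        System.out.println(sketch.points("abcd")); // prints "[]"
    }
}
```

The second toy pattern shows why the values express priority as well as desirability: its even 2 overrides the odd 1 at the same gap and suppresses the break.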
-
printStats
Overrides:
printStats in class TernaryTree
-