Class HyphenationCompoundWordTokenFilterFactory

java.lang.Object
org.apache.lucene.analysis.util.AbstractAnalysisFactory
org.apache.lucene.analysis.util.TokenFilterFactory
org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilterFactory
All Implemented Interfaces:
ResourceLoaderAware

public class HyphenationCompoundWordTokenFilterFactory extends TokenFilterFactory implements ResourceLoaderAware
Factory for HyphenationCompoundWordTokenFilter.

This factory accepts the following parameters:

  • hyphenator (mandatory): path to the FOP xml hyphenation pattern. See http://offo.sourceforge.net/hyphenation/.
  • encoding (optional): encoding of the xml hyphenation file. defaults to UTF-8.
  • dictionary (optional): dictionary of words. defaults to no dictionary.
  • minWordSize (optional): minimal word length that gets decomposed. defaults to 5.
  • minSubwordSize (optional): minimum length of subwords. defaults to 2.
  • maxSubwordSize (optional): maximum length of subwords. defaults to 15.
  • onlyLongestMatch (optional): if true, adds only the longest matching subword to the stream. defaults to false.

 <fieldType name="text_hyphncomp" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.HyphenationCompoundWordTokenFilterFactory" hyphenator="hyphenator.xml" encoding="UTF-8"
         dictionary="dictionary.txt" minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="false"/>
   </analyzer>
 </fieldType>
See Also: