Class CustomUnifiedHighlighter
- java.lang.Object
-
- org.apache.lucene.search.uhighlight.UnifiedHighlighter
-
- org.apache.lucene.search.uhighlight.CustomUnifiedHighlighter
-
public class CustomUnifiedHighlighter extends UnifiedHighlighter
Subclass of the UnifiedHighlighter that works for a single field in a single document. Uses a custom PassageFormatter. Accepts the field content as a constructor argument, since the field value can be loaded by reading the _source field. Supports using a different BreakIterator to break the text into fragments. Considers every distinct field value as a discrete passage for highlighting (unless the whole content needs to be highlighted). Supports returning either empty snippets or non-highlighted snippets when no highlighting can be performed.
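The fallback to a non-highlighted snippet could be pictured roughly as follows. This is only a sketch under assumptions: the class name NoMatchFallback and its method are hypothetical, and trimming the text at a word boundary is an illustrative choice, not behaviour confirmed by this page. It uses only java.text.BreakIterator from the JDK.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper, not part of the documented class.
final class NoMatchFallback {

    // Returns at most noMatchSize leading characters of text, cut at a word boundary.
    // A non-positive noMatchSize yields an empty snippet.
    static String fallback(String text, int noMatchSize, Locale locale) {
        if (noMatchSize <= 0 || text.isEmpty()) {
            return "";
        }
        if (noMatchSize >= text.length()) {
            return text;
        }
        BreakIterator words = BreakIterator.getWordInstance(locale == null ? Locale.ROOT : locale);
        words.setText(text);
        // preceding(n + 1) gives the last boundary at or before n.
        int end = words.preceding(noMatchSize + 1);
        if (end <= 0) {
            end = noMatchSize; // no usable boundary: fall back to a hard cut
        }
        return text.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(fallback("The quick brown fox", 9, Locale.ROOT));
    }
}
```

A size of zero models the "empty snippet" case, and any positive size models the "non-highlighted snippet" case.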
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.search.uhighlight.UnifiedHighlighter
UnifiedHighlighter.HighlightFlag, UnifiedHighlighter.LimitedStoredFieldVisitor, UnifiedHighlighter.OffsetSource
-
-
Field Summary
Fields
- static char MULTIVAL_SEP_CHAR
Fields inherited from class org.apache.lucene.search.uhighlight.UnifiedHighlighter
DEFAULT_CACHE_CHARS_THRESHOLD, DEFAULT_MAX_LENGTH, fieldInfos, indexAnalyzer, searcher, ZERO_LEN_AUTOMATA_ARRAY
-
-
Constructor Summary
Constructors
- CustomUnifiedHighlighter(IndexSearcher searcher, Analyzer analyzer, UnifiedHighlighter.OffsetSource offsetSource, PassageFormatter passageFormatter, Locale breakIteratorLocale, BreakIterator breakIterator, String fieldValue, int noMatchSize): Creates a new instance of CustomUnifiedHighlighter.
-
Method Summary
Methods
- protected BreakIterator getBreakIterator(String field): Returns the BreakIterator to use for dividing text into passages.
- protected FieldHighlighter getFieldHighlighter(String field, Query query, Set<Term> allTerms, int maxPassages)
- protected PassageFormatter getFormatter(String field): Returns the PassageFormatter to use for formatting passages into highlighted snippets.
- protected UnifiedHighlighter.OffsetSource getOffsetSource(String field): Forces the offset source for this highlighter.
- Snippet[] highlightField(String field, Query query, int docId, int maxPassages): Highlights terms extracted from the provided query within the content of the provided field name.
- protected List<CharSequence[]> loadFieldValues(String[] fields, DocIdSetIterator docIter, int cacheCharsThreshold): Loads the String values for each docId by field to be highlighted.
- protected Collection<Query> preSpanQueryRewrite(Query query): When highlighting phrases accurately, we may need to handle custom queries that aren't supported in the WeightedSpanTermExtractor as called by the PhraseHelper.
Methods inherited from class org.apache.lucene.search.uhighlight.UnifiedHighlighter
extractTerms, filterExtractedTerms, getAutomata, getCacheFieldValCharsThreshold, getFieldInfo, getFieldMatcher, getFlags, getHighlightComponents, getIndexAnalyzer, getIndexSearcher, getMaxLength, getMaxNoHighlightPassages, getOffsetStrategy, getOptimizedOffsetSource, getPhraseHelper, getScorer, hasUnrecognizedQuery, highlight, highlight, highlightFields, highlightFields, highlightFields, highlightFieldsAsObjects, highlightWithoutSearcher, newLimitedStoredFieldsVisitor, requiresRewrite, setBreakIterator, setCacheFieldValCharsThreshold, setFieldMatcher, setFormatter, setHandleMultiTermQuery, setHighlightPhrasesStrictly, setMaxLength, setMaxNoHighlightPassages, setScorer, shouldHandleMultiTermQuery, shouldHighlightPhrasesStrictly, shouldPreferPassageRelevancyOverSpeed
-
-
-
-
Field Detail
-
MULTIVAL_SEP_CHAR
public static final char MULTIVAL_SEP_CHAR
- See Also:
- Constant Field Values
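As an illustration of how several field values can travel in one String delimited by a separator character, here is a minimal sketch. The class MultiValueJoin is hypothetical, and the separator value used below is a stand-in assumption; the actual constant is the one listed under Constant Field Values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper showing separator-delimited multi-valued content.
final class MultiValueJoin {

    // Assumed stand-in for MULTIVAL_SEP_CHAR; the real value is not stated on this page.
    static final char SEP = '\u0000';

    // Joins all values into a single String, one separator between neighbours.
    static String join(List<String> values) {
        return String.join(String.valueOf(SEP), values);
    }

    // Splits the joined String back into the original values.
    static List<String> split(String joined) {
        List<String> values = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < joined.length(); i++) {
            if (joined.charAt(i) == SEP) {
                values.add(joined.substring(start, i));
                start = i + 1;
            }
        }
        values.add(joined.substring(start));
        return values;
    }

    public static void main(String[] args) {
        String joined = join(Arrays.asList("first value", "second value"));
        System.out.println(split(joined));
    }
}
```

Because the separator is a single character, offsets into the joined String stay easy to map back to the individual values.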
-
-
Constructor Detail
-
CustomUnifiedHighlighter
public CustomUnifiedHighlighter(IndexSearcher searcher, Analyzer analyzer, UnifiedHighlighter.OffsetSource offsetSource, PassageFormatter passageFormatter, @Nullable Locale breakIteratorLocale, @Nullable BreakIterator breakIterator, String fieldValue, int noMatchSize)
Creates a new instance of CustomUnifiedHighlighter
- Parameters:
- analyzer - the analyzer used for the field at index time, used for multi term queries internally.
- passageFormatter - our own CustomPassageFormatter which generates snippets in forms of Snippet objects.
- offsetSource - the UnifiedHighlighter.OffsetSource to use for offsets retrieval.
- breakIteratorLocale - the Locale to use for dividing text into passages. If null, Locale.ROOT is used.
- breakIterator - the BreakIterator to use for dividing text into passages. If null, BreakIterator.getSentenceInstance(Locale) is used.
- fieldValue - the original field values delimited by MULTIVAL_SEP_CHAR.
- noMatchSize - the size of the text that should be returned when no highlighting can be performed.
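The null handling described for breakIteratorLocale and breakIterator can be sketched with JDK classes alone. The class DefaultBreakIterator and its methods are hypothetical names used for illustration; this is not the constructor's actual code.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper mirroring the documented defaults.
final class DefaultBreakIterator {

    // Falls back to Locale.ROOT and to a sentence iterator when the arguments are null.
    static BreakIterator resolve(Locale breakIteratorLocale, BreakIterator breakIterator) {
        Locale locale = breakIteratorLocale == null ? Locale.ROOT : breakIteratorLocale;
        return breakIterator == null ? BreakIterator.getSentenceInstance(locale) : breakIterator;
    }

    // Divides text into consecutive passages at the iterator's boundaries.
    static List<String> passages(String text, BreakIterator it) {
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        int end = it.next();
        while (end != BreakIterator.DONE) {
            result.add(text.substring(start, end));
            start = end;
            end = it.next();
        }
        return result;
    }

    public static void main(String[] args) {
        BreakIterator it = resolve(null, null);
        for (String passage : passages("Hello world. Goodbye world.", it)) {
            System.out.println("[" + passage + "]");
        }
    }
}
```

Passing a word or line iterator instead of null would produce finer or coarser passages from the same text.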
-
-
Method Detail
-
highlightField
public Snippet[] highlightField(String field, Query query, int docId, int maxPassages) throws IOException
Highlights terms extracted from the provided query within the content of the provided field name.
- Throws:
- IOException
-
loadFieldValues
protected List<CharSequence[]> loadFieldValues(String[] fields, DocIdSetIterator docIter, int cacheCharsThreshold) throws IOException
Description copied from class: UnifiedHighlighter
Loads the String values for each docId by field to be highlighted. By default this loads from stored fields by the same name as given, but a subclass can change the source. The returned Strings must be identical to what was indexed (at least for postings or term-vectors offset sources). This method must load fields for at least one document from the given DocIdSetIterator but need not return all of them; by default the character lengths are summed and this method will return early when cacheCharsThreshold is exceeded. Specifically if that number is 0, then only one document is fetched no matter what. Values in the array of CharSequence will be null if no value was found.
- Overrides:
- loadFieldValues in class UnifiedHighlighter
- Throws:
- IOException
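The early-return contract around cacheCharsThreshold can be modelled without any Lucene types. The sketch below is an assumption-laden illustration: the class ThresholdBatch is hypothetical, a plain Iterator of Strings stands in for the DocIdSetIterator, and each document is reduced to a single value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the "sum character lengths, stop once over the threshold" rule.
final class ThresholdBatch {

    // Loads at least one value, then stops once the summed length exceeds the threshold.
    // A threshold of 0 loads exactly one value, as the description requires.
    static List<String> loadBatch(Iterator<String> docs, int cacheCharsThreshold) {
        List<String> batch = new ArrayList<>();
        int sumChars = 0;
        while (docs.hasNext()) {
            String value = docs.next(); // may be null when no value was found
            batch.add(value);
            sumChars += value == null ? 0 : value.length();
            if (cacheCharsThreshold == 0 || sumChars > cacheCharsThreshold) {
                break;
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("aaaa", "bbbb", "cccc");
        System.out.println(loadBatch(docs.iterator(), 5).size());
    }
}
```

The caller would invoke such a method repeatedly on the same iterator until it is exhausted, which keeps memory bounded when documents are large.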
-
getBreakIterator
protected BreakIterator getBreakIterator(String field)
Description copied from class: UnifiedHighlighter
Returns the BreakIterator to use for dividing text into passages. This returns BreakIterator.getSentenceInstance(Locale) by default; subclasses can override to customize.
Note: this highlighter will call BreakIterator.preceding(int) and BreakIterator.next() many times on it. The default generic JDK implementation of preceding performs poorly.
- Overrides:
- getBreakIterator in class UnifiedHighlighter
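To make the preceding(int) and next() access pattern concrete, the sketch below locates the passage that contains a given character offset. It illustrates the calling pattern only and is not the highlighter's own code; the class PassageBounds and its method are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper showing one preceding(int) call followed by one next() call.
final class PassageBounds {

    // Returns {start, end} of the passage containing offset, where 0 <= offset < text.length().
    static int[] around(BreakIterator it, String text, int offset) {
        it.setText(text);
        // preceding(offset + 1) is the last boundary at or before offset; DONE (-1) maps to 0.
        int start = Math.max(it.preceding(offset + 1), 0);
        // preceding() leaves the iterator at start, so next() is the following boundary.
        int end = it.next();
        if (end == BreakIterator.DONE) {
            end = text.length();
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        String text = "Hello world. Goodbye world.";
        int[] bounds = around(BreakIterator.getSentenceInstance(Locale.ROOT), text, 15);
        System.out.println(text.substring(bounds[0], bounds[1]));
    }
}
```

Since a pair of calls like this may run once per match, an iterator with a slow preceding implementation has a visible cost on long field values.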
-
getFormatter
protected PassageFormatter getFormatter(String field)
Description copied from class: UnifiedHighlighter
Returns the PassageFormatter to use for formatting passages into highlighted snippets. This returns a new PassageFormatter by default; subclasses can override to customize.
- Overrides:
- getFormatter in class UnifiedHighlighter
-
getFieldHighlighter
protected FieldHighlighter getFieldHighlighter(String field, Query query, Set<Term> allTerms, int maxPassages)
- Overrides:
- getFieldHighlighter in class UnifiedHighlighter
-
preSpanQueryRewrite
protected Collection<Query> preSpanQueryRewrite(Query query)
Description copied from class: UnifiedHighlighter
When highlighting phrases accurately, we may need to handle custom queries that aren't supported in the WeightedSpanTermExtractor as called by the PhraseHelper. Should custom query types be needed, this method should be overridden to return a collection of queries if appropriate, or null if there is nothing to do. If the query is not custom, simply returning null will allow the default rules to apply.
- Overrides:
- preSpanQueryRewrite in class UnifiedHighlighter
- Parameters:
- query - Query to be highlighted
- Returns:
- A Collection of Query object(s) if the query needs to be rewritten, otherwise null.
-
getOffsetSource
protected UnifiedHighlighter.OffsetSource getOffsetSource(String field)
Forces the offset source for this highlighter.
- Overrides:
- getOffsetSource in class UnifiedHighlighter
-
-