public abstract class SimilarityBase extends Similarity
Similarity that provides a simplified API for its
descendants. Subclasses are only required to implement the score(org.apache.lucene.search.similarities.BasicStats, float, float)
and toString() methods. Implementing
explain(Explanation, BasicStats, int, float, float) is optional,
inasmuch as SimilarityBase already provides a basic explanation of the score
and the term frequency. However, implementers of a subclass are encouraged to
include as much detail about the scoring method as possible.
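The contract above follows a template-method design. The base class owns the plumbing, and subclasses only supply the per-term formula and a name. The standalone Java sketch below mirrors that shape without depending on Lucene. ToyStats, ToySimilarityBase and ToyTfIdf are hypothetical stand-ins for BasicStats, SimilarityBase and a concrete subclass. The scoring formula is illustrative only and is not one of Lucene's models.

```java
// Hypothetical stand-in for BasicStats: just the statistics this sketch uses.
final class ToyStats {
    final long numberOfDocuments; // documents in the collection
    final long docFreq;           // documents that contain the term
    final float avgFieldLength;   // average field length, in tokens

    ToyStats(long numberOfDocuments, long docFreq, float avgFieldLength) {
        this.numberOfDocuments = numberOfDocuments;
        this.docFreq = docFreq;
        this.avgFieldLength = avgFieldLength;
    }
}

// Hypothetical mirror of the SimilarityBase contract (not Lucene code).
abstract class ToySimilarityBase {
    /** Per-term score for one document, from the term frequency and document length. */
    protected abstract float score(ToyStats stats, float freq, float docLen);

    /** Re-declared abstract so that every subclass must name itself. */
    @Override
    public abstract String toString();

    /** Optional hook: a basic default explanation that subclasses may override. */
    protected String explain(ToyStats stats, float freq, float docLen) {
        return toString() + " score=" + score(stats, freq, docLen) + " (freq=" + freq + ")";
    }
}

// Example subclass: length-normalized TF times a smoothed IDF (illustrative formula).
final class ToyTfIdf extends ToySimilarityBase {
    @Override
    protected float score(ToyStats stats, float freq, float docLen) {
        double idf = Math.log(1.0 + (double) stats.numberOfDocuments / stats.docFreq);
        double tf = freq / (freq + docLen / stats.avgFieldLength);
        return (float) (tf * idf);
    }

    @Override
    public String toString() {
        return "ToyTfIdf";
    }
}
```

In real code, the equivalent is a class that extends SimilarityBase and overrides score(BasicStats, float, float) and toString(). The weight and scorer machinery is then inherited from the base class.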
Note: multi-word queries such as phrase queries are scored differently than under Lucene's default ranking algorithm. The default algorithm "fakes" an IDF value for the phrase as a whole, since it does not know the phrase's actual IDF. This class instead scores a phrase as the sum of the individual term scores.
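The following minimal standalone sketch makes the note concrete by showing summation-based phrase scoring. Each term of the phrase is scored with its own statistics (here only its document frequency) at the phrase frequency, and the per-term scores are added. The class PhraseSumSketch and its saturating TF-times-IDF formula are hypothetical and were chosen only for illustration.

```java
final class PhraseSumSketch {
    /** Illustrative per-term score: saturating TF times a smoothed IDF (hypothetical formula). */
    static double termScore(double freq, long numDocs, long docFreq) {
        double idf = Math.log(1.0 + (double) numDocs / docFreq);
        return freq / (freq + 1.0) * idf;
    }

    /** Phrase score as the sum of per-term scores, each computed with that term's own docFreq. */
    static double phraseScore(double phraseFreq, long numDocs, long[] termDocFreqs) {
        double sum = 0.0;
        for (long docFreq : termDocFreqs) {
            sum += termScore(phraseFreq, numDocs, docFreq);
        }
        return sum;
    }
}
```

Each term keeps its own IDF, so a rare term contributes more to the phrase score than a common one. Its contribution is not blended into a single phrase-level value.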
Nested classes/interfaces inherited from class Similarity: Similarity.SimScorer, Similarity.SimWeight

| Constructor and Description |
|---|
| SimilarityBase(): Sole constructor. |
| Modifier and Type | Method and Description |
|---|---|
| long | computeNorm(FieldInvertState state): Encodes the document length in the same way as TFIDFSimilarity. |
| Similarity.SimWeight | computeWeight(float queryBoost, CollectionStatistics collectionStats, TermStatistics... termStats): Compute any collection-level weight (e.g. IDF, average document length, etc.) needed for scoring a query. |
| boolean | getDiscountOverlaps(): Returns true if overlap tokens are discounted from the document's length. |
| static double | log2(double x): Returns the base two logarithm of x. |
| void | setDiscountOverlaps(boolean v): Determines whether overlap tokens (tokens with a position increment of 0) are ignored when computing the norm. |
| Similarity.SimScorer | simScorer(Similarity.SimWeight stats, AtomicReaderContext context): Creates a new Similarity.SimScorer to score matching documents from a segment of the inverted index. |
| abstract String | toString(): Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well. |
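The static log2(double x) helper listed above is a convenience for scoring formulas that are naturally written in base 2. The following minimal sketch shows the change-of-base computation in plain Java. Log2Sketch is a hypothetical name, and this sketch should not be read as Lucene's exact source.

```java
final class Log2Sketch {
    // Natural logarithm of 2, computed once and reused.
    private static final double LN_2 = Math.log(2.0);

    /** Base-two logarithm via the change-of-base identity log2(x) = ln(x) / ln(2). */
    static double log2(double x) {
        return Math.log(x) / LN_2;
    }
}
```

Edge cases follow Math.log: log2(0) is negative infinity and a negative argument yields NaN, so callers should keep arguments strictly positive.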
Methods inherited from class Similarity: coord, queryNorm

public SimilarityBase()
Sole constructor.
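public void setDiscountOverlaps(boolean v)
Determines whether overlap tokens (tokens with a position increment of 0) are ignored when computing the norm.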
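The overlap setting feeds into computeNorm, which encodes the document length in the same way as TFIDFSimilarity. The standalone sketch below shows that pipeline. It assumes that, as in Lucene 4.x, the norm is boost / sqrt(length) packed into a single byte with SmallFloat.floatToByte315 (3 mantissa bits, zero exponent 15), and that SimilarityBase decodes the byte back to an approximate length. NormSketch and its methods are hypothetical names, and the byte-packing code is a re-implementation for illustration, so check it against the Lucene source for your version. normLength shows the effect of discounting: with it on, tokens stacked at the same position (for example, injected synonyms) do not lengthen the document.

```java
final class NormSketch {
    private static final int ZERO_EXP = 15;     // zero-exponent parameter of the 3/15 encoding
    private static final int MANTISSA_BITS = 3; // mantissa bits kept in the byte

    /** Field length used for the norm; overlap tokens (position increment 0) may be discounted. */
    static int normLength(int length, int numOverlap, boolean discountOverlaps) {
        return discountOverlaps ? length - numOverlap : length;
    }

    /** Packs a float into one byte, keeping 3 mantissa bits (lossy). */
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallFloat = bits >> (24 - MANTISSA_BITS);
        int fzero = (63 - ZERO_EXP) << MANTISSA_BITS;
        if (smallFloat <= fzero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero or negative -> 0, tiny positive -> 1
        }
        if (smallFloat >= fzero + 0x100) {
            return (byte) -1; // overflow: clamp to the largest representable value
        }
        return (byte) (smallFloat - fzero);
    }

    /** Inverse of floatToByte315, up to the precision lost when encoding. */
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - MANTISSA_BITS);
        bits += (63 - ZERO_EXP) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** TFIDFSimilarity-style norm: boost / sqrt(length), stored in one byte. */
    static byte encodeNorm(float boost, int length) {
        return floatToByte315(boost / (float) Math.sqrt(length));
    }

    /** Approximate field length recovered from a norm byte, assuming a boost of 1. */
    static float decodeLength(byte norm) {
        float f = byte315ToFloat(norm);
        return 1.0f / (f * f);
    }
}
```

With this encoding, lengths 1 and 4 round-trip exactly, while a length of 100 decodes to roughly 114. This illustrates why length normalization based on one-byte norms is approximate.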
public boolean getDiscountOverlaps()
Returns true if overlap tokens are discounted from the document's length.
See Also: setDiscountOverlaps(boolean)

public final Similarity.SimWeight computeWeight(float queryBoost, CollectionStatistics collectionStats, TermStatistics... termStats)
Description copied from class: Similarity
Compute any collection-level weight (e.g. IDF, average document length, etc.) needed for scoring a query.
Specified by: computeWeight in class Similarity
Parameters:
queryBoost - the query-time boost.
collectionStats - collection-level statistics, such as the number of tokens in the collection.
termStats - term-level statistics, such as the document frequency of a term across the collection.

public Similarity.SimScorer simScorer(Similarity.SimWeight stats, AtomicReaderContext context) throws IOException
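Description copied from class: Similarity
Creates a new Similarity.SimScorer to score matching documents from a segment of the inverted index.
Specified by: simScorer in class Similarity
Parameters:
stats - collection information from Similarity.computeWeight(float, CollectionStatistics, TermStatistics...).
context - segment of the inverted index to be scored.
Returns: SimScorer for scoring documents across context.
Throws:
IOException - if there is a low-level I/O error

public abstract String toString()
Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.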
public long computeNorm(FieldInvertState state)
Encodes the document length in the same way as TFIDFSimilarity.
Specified by: computeNorm in class Similarity
Parameters:
state - current processing state for this field

public static double log2(double x)
Returns the base two logarithm of x.

Copyright © 2010 - 2020 Adobe. All Rights Reserved