Class IBSimilarity
java.lang.Object
org.apache.lucene.search.similarities.Similarity
org.apache.lucene.search.similarities.SimilarityBase
org.apache.lucene.search.similarities.IBSimilarity
Provides a framework for the family of information-based models, as described
in Stéphane Clinchant and Eric Gaussier. 2010. Information-based
models for ad hoc IR. In Proceeding of the 33rd international ACM SIGIR
conference on Research and development in information retrieval (SIGIR '10).
ACM, New York, NY, USA, 234-241.
The retrieval function is of the form RSV(q, d) = ∑ -x_w^q log Prob(X_w ≥ t_w^d | λ_w), where
- x_w^q is the query boost;
- X_w is a random variable that counts the occurrences of word w;
- t_w^d is the normalized term frequency;
- λ_w is a parameter.
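The formula above can be traced with a few lines of plain Java. This is a minimal sketch, not Lucene's implementation: it assumes the log-logistic case, in which the cited paper gives Prob(X ≥ t | λ) = λ / (t + λ), and it uses the natural logarithm. The class and method names (IbScoreSketch, termScore, rsv) are hypothetical and exist only for illustration.

```java
/** Illustrative sketch of the RSV formula; names are hypothetical, not Lucene API. */
final class IbScoreSketch {

    /** Assumed log-logistic survival function: Prob(X >= tfn | lambda) = lambda / (tfn + lambda). */
    static double logLogisticSurvival(double tfn, double lambda) {
        return lambda / (tfn + lambda);
    }

    /** Contribution of one query term: -boost * log Prob(X >= tfn | lambda). */
    static double termScore(double queryBoost, double tfn, double lambda) {
        return -queryBoost * Math.log(logLogisticSurvival(tfn, lambda));
    }

    /** Sum of the per-term contributions; the three arrays are indexed by query term. */
    static double rsv(double[] queryBoosts, double[] tfns, double[] lambdas) {
        double sum = 0.0;
        for (int i = 0; i < queryBoosts.length; i++) {
            sum += termScore(queryBoosts[i], tfns[i], lambdas[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two query terms with made-up boosts, normalized frequencies, and lambdas.
        double score = rsv(new double[] {1.0, 2.0},
                           new double[] {1.0, 3.0},
                           new double[] {1.0, 1.0});
        System.out.println("RSV = " + score);
    }
}
```

A term that does not occur (normalized frequency 0) contributes nothing, because the survival probability is 1 and its logarithm is 0. The contribution grows as the normalized frequency rises relative to λ.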
The framework described in the paper has many similarities to the DFR
framework (see DFRSimilarity). It is possible that the two
Similarities will be merged at some point.
To construct an IBSimilarity, you must specify the implementations for all three components of the Information-Based model.
- Distribution: Probabilistic distribution used to model term occurrence
  - DistributionLL: Log-logistic
  - DistributionSPL: Smoothed power-law
- Lambda: λ_w parameter of the probability distribution
- Normalization: Term frequency normalization
  - Any supported DFR normalization (listed in DFRSimilarity)
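As a hedged illustration of the three-component construction, the sketch below builds one possible combination. It assumes Lucene is on the classpath. LambdaDF and NormalizationH2 are not named in this section; they are real classes in the same package, picked here only as examples. The wrapper class IbSimilarityExample is hypothetical.

```java
import org.apache.lucene.search.similarities.DistributionLL;
import org.apache.lucene.search.similarities.IBSimilarity;
import org.apache.lucene.search.similarities.LambdaDF;
import org.apache.lucene.search.similarities.NormalizationH2;

/** Hypothetical wrapper class showing one way to assemble an IBSimilarity. */
final class IbSimilarityExample {

    /** Log-logistic distribution, document-frequency lambda, H2 normalization (example choices). */
    static IBSimilarity build() {
        return new IBSimilarity(
            new DistributionLL(),    // distribution: must not be null
            new LambdaDF(),          // lambda: must not be null
            new NormalizationH2());  // normalization: must not be null
    }

    public static void main(String[] args) {
        IBSimilarity similarity = build();
        // toString() reports the configured components by name.
        System.out.println(similarity);
    }
}
```

The same instance would typically be passed to both IndexWriterConfig.setSimilarity and IndexSearcher.setSimilarity so that indexing and searching agree on how norms and scores are computed.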
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer, Similarity.SimWeight
Constructor Summary
- IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization): Creates IBSimilarity from the three components.
Method Summary
- Distribution getDistribution(): Returns the distribution.
- Lambda getLambda(): Returns the distribution's lambda parameter.
- Normalization getNormalization(): Returns the term frequency normalization.
- String toString(): The names of IB methods follow the pattern IB <distribution> <lambda><normalization>.
Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase
computeNorm, computeWeight, getDiscountOverlaps, log2, setDiscountOverlaps, simScorer
Methods inherited from class org.apache.lucene.search.similarities.Similarity
coord, queryNorm
Constructor Details
IBSimilarity
Creates IBSimilarity from the three components.
Note that null values are not allowed: if you want no normalization, pass Normalization.NoNormalization instead.
Parameters:
distribution - probabilistic distribution modeling term occurrence
lambda - distribution's λ_w parameter
normalization - term frequency normalization
Method Details
toString
The names of IB methods follow the pattern IB <distribution> <lambda><normalization>. The name of the distribution is the same as in the original paper; for the names of lambda parameters, refer to the javadoc of the Lambda classes.
Specified by:
toString in class SimilarityBase
getDistribution
Returns the distribution.
getLambda
Returns the distribution's lambda parameter.
getNormalization
Returns the term frequency normalization.