Class SimilarityBase

java.lang.Object
org.apache.lucene.search.similarities.Similarity
org.apache.lucene.search.similarities.SimilarityBase
Direct Known Subclasses:
DFRSimilarity, IBSimilarity, LMSimilarity

public abstract class SimilarityBase extends Similarity
A subclass of Similarity that provides a simplified API for its descendants. Subclasses need only implement the score(BasicStats, float, float) and toString() methods. Implementing explain(Explanation, BasicStats, int, float, float) is optional, because SimilarityBase already provides a basic explanation of the score and the term frequency. Implementers of a subclass are nevertheless encouraged to include as much detail about the scoring method as possible.

Note: multi-word queries such as phrase queries are scored differently from Lucene's default ranking algorithm. The default algorithm "fakes" an IDF value for the phrase as a whole (since it does not know the phrase's actual IDF), whereas this class scores a phrase as the sum of the individual term scores.
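The phrase-scoring rule in the note above can be illustrated with a small, self-contained sketch. Nothing here is a Lucene type: `TermStats`, `TermScorer`, and `scorePhrase` are hypothetical stand-ins chosen for the illustration, and the sketch assumes every term of the phrase is scored with the same frequency and document length.

```java
/** Illustrative only: TermStats and TermScorer are hypothetical stand-ins, not Lucene types. */
final class PhraseSumSketch {

    /** Hypothetical per-term statistics holder (plays the role that BasicStats plays for a subclass). */
    static final class TermStats {
        final long docFreq;            // number of documents containing the term
        final long numberOfDocuments;  // number of documents in the collection

        TermStats(long docFreq, long numberOfDocuments) {
            this.docFreq = docFreq;
            this.numberOfDocuments = numberOfDocuments;
        }
    }

    /** Hypothetical single-term scoring function with the argument shape (stats, freq, docLen). */
    interface TermScorer {
        float score(TermStats stats, float freq, float docLen);
    }

    /** Scores a phrase as the sum of its per-term scores; no phrase-level IDF is estimated. */
    static float scorePhrase(TermScorer scorer, TermStats[] terms, float phraseFreq, float docLen) {
        float sum = 0f;
        for (TermStats term : terms) {
            sum += scorer.score(term, phraseFreq, docLen);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Toy scorer (an assumption): length-normalized frequency times a plain logarithmic IDF.
        TermScorer toy = (stats, freq, docLen) ->
                (freq / docLen) * (float) Math.log((double) stats.numberOfDocuments / stats.docFreq);
        TermStats[] phrase = { new TermStats(10, 1000), new TermStats(50, 1000) };
        System.out.println(scorePhrase(toy, phrase, 2f, 100f));
    }
}
```

Because the total is a plain sum, a rare term in the phrase contributes more than a common one under any scorer that rewards rarity.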

  • Constructor Details

    • SimilarityBase

      public SimilarityBase()
      Sole constructor. (For invocation by subclass constructors, typically implicit.)
  • Method Details

    • setDiscountOverlaps

      public void setDiscountOverlaps(boolean v)
      Determines whether overlap tokens (tokens with a position increment of 0) are ignored when computing the norm. The default is true, meaning overlap tokens do not count toward the norm.
    • getDiscountOverlaps

      public boolean getDiscountOverlaps()
      Returns true if overlap tokens are discounted from the document's length.
    • computeWeight

      public final Similarity.SimWeight computeWeight(float queryBoost, CollectionStatistics collectionStats, TermStatistics... termStats)
      Description copied from class: Similarity
      Computes any collection-level weight (e.g. IDF or average document length) needed for scoring a query.
      Specified by:
      computeWeight in class Similarity
      Parameters:
      queryBoost - the query-time boost.
      collectionStats - collection-level statistics, such as the number of tokens in the collection.
      termStats - term-level statistics, such as the document frequency of a term across the collection.
      Returns:
      SimWeight object with the information this Similarity needs to score a query.
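As a rough illustration of what "collection-level weight" can mean, the following self-contained sketch derives two such quantities from raw counts. The class and method names are hypothetical, and the IDF formula shown (natural log of N / df) is just one common variant, not necessarily the one any particular Similarity uses.

```java
/** Illustrative only: hypothetical helpers, not part of Lucene. */
final class CollectionWeightSketch {

    /** Average field length: total tokens in the field divided by the number of documents. */
    static float averageFieldLength(long numberOfFieldTokens, long numberOfDocuments) {
        if (numberOfDocuments <= 0) {
            throw new IllegalArgumentException("numberOfDocuments must be positive");
        }
        return (float) numberOfFieldTokens / numberOfDocuments;
    }

    /** One common IDF variant (an assumption): ln(N / df), larger for rarer terms. */
    static double idf(long docFreq, long numberOfDocuments) {
        if (docFreq <= 0 || docFreq > numberOfDocuments) {
            throw new IllegalArgumentException("require 0 < docFreq <= numberOfDocuments");
        }
        return Math.log((double) numberOfDocuments / docFreq);
    }

    public static void main(String[] args) {
        System.out.println(averageFieldLength(120_000L, 1_000L));
        System.out.println(idf(10L, 1_000L));
    }
}
```

Both values depend only on the collection and the query terms, not on any single document, which is why they can be computed once per query rather than once per match.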
    • simScorer

      public Similarity.SimScorer simScorer(Similarity.SimWeight stats, AtomicReaderContext context) throws IOException
      Description copied from class: Similarity
      Creates a new Similarity.SimScorer to score matching documents from a segment of the inverted index.
      Specified by:
      simScorer in class Similarity
      Parameters:
      stats - collection information from Similarity.computeWeight(float, CollectionStatistics, TermStatistics...)
      context - segment of the inverted index to be scored.
      Returns:
      SimScorer for scoring documents across context
      Throws:
      IOException - if there is a low-level I/O error
    • toString

      public abstract String toString()
      Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
      Overrides:
      toString in class Object
    • computeNorm

      public long computeNorm(FieldInvertState state)
      Encodes the document length in the same way as TFIDFSimilarity.
      Specified by:
      computeNorm in class Similarity
      Parameters:
      state - current processing state for this field
      Returns:
      computed norm value
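The effect of setDiscountOverlaps on the document length that feeds the norm can be sketched as follows. This is a simplified, self-contained illustration: `fieldLength` and its `positionIncrements` array are hypothetical, and the step that encodes the length into a norm value is deliberately left out.

```java
/** Illustrative only: a hypothetical helper, not part of Lucene. */
final class FieldLengthSketch {

    /**
     * Counts the tokens that contribute to a field's length.
     * positionIncrements[i] is the position increment of the i-th token;
     * an increment of 0 marks an overlap token.
     */
    static int fieldLength(int[] positionIncrements, boolean discountOverlaps) {
        int length = 0;
        for (int increment : positionIncrements) {
            boolean overlap = (increment == 0);
            if (!(discountOverlaps && overlap)) {
                length++;
            }
        }
        return length;
    }

    public static void main(String[] args) {
        // Four tokens; the third shares a position with the second (e.g. an injected synonym).
        int[] increments = { 1, 1, 0, 1 };
        System.out.println(fieldLength(increments, true));
        System.out.println(fieldLength(increments, false));
    }
}
```

With overlaps discounted, injecting synonyms at index time does not make a document look longer to the scoring function.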
    • log2

      public static double log2(double x)
      Returns the base two logarithm of x.
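A base-two logarithm can be obtained from the natural logarithm by a change of base. The sketch below is a minimal standalone version under that assumption; the class name `Log2Sketch` is hypothetical and the code is not claimed to match the library's implementation line for line.

```java
/** Illustrative only: a standalone change-of-base computation. */
final class Log2Sketch {

    /** Returns log base 2 of x, computed as ln(x) / ln(2). */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    public static void main(String[] args) {
        System.out.println(log2(1024.0));
    }
}
```

Since the result comes from floating-point division, exact powers of two may differ from the integer answer by a rounding error, so comparisons should use a small tolerance.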