Package io.trino.util

Class HashCollisionsEstimator

java.lang.Object
io.trino.util.HashCollisionsEstimator

public final class HashCollisionsEstimator extends Object
Estimates the number of collisions when inserting values into a hash table. The number of hash collisions is estimated by extrapolating precalculated values. The precalculated values are the results of simulations run by HashCollisionsSimulator, which mimics a hash table with a linear conflict resolution strategy (linear probing: after a collision, the next position is tried).
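
A minimal sketch of such a simulation follows. It is not the HashCollisionsSimulator source. The class and method names are hypothetical, and the sketch assumes that every inserted value is distinct, that it hashes to a uniformly random position, and that each probe landing on an occupied position counts as one collision.

```java
import java.util.Random;

// Hypothetical sketch, not the HashCollisionsSimulator source: counts collisions
// in an open-addressing table that resolves conflicts by linear probing.
public final class LinearProbingCollisionSimulation
{
    private LinearProbingCollisionSimulation() {}

    // Inserts numberOfValues values into a table with hashSize positions and
    // returns the total number of probes that hit an occupied position.
    public static long simulateCollisions(int numberOfValues, int hashSize, Random random)
    {
        if (numberOfValues < 0 || numberOfValues > hashSize) {
            throw new IllegalArgumentException("expected 0 <= numberOfValues <= hashSize");
        }
        boolean[] occupied = new boolean[hashSize];
        long collisions = 0;
        for (int i = 0; i < numberOfValues; i++) {
            // Assumption: each value is distinct and hashes to a uniformly random position
            int position = random.nextInt(hashSize);
            // Linear conflict resolution: on a collision, try the next position (wrapping around)
            while (occupied[position]) {
                collisions++;
                position = (position + 1) % hashSize;
            }
            occupied[position] = true;
        }
        return collisions;
    }

    public static void main(String[] args)
    {
        long collisions = simulateCollisions(5000, 10000, new Random(42));
        System.out.println("collisions: " + collisions);
    }
}
```

Before the i-th insertion at most i positions are occupied, so the loop always finds a free position as long as numberOfValues does not exceed hashSize.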

Extrapolation of estimates works well for large hash tables. For instance, assume c is the number of collisions when inserting x entries into a hash table of size y. Then the number of collisions when inserting x*scale entries into a hash table of size y*scale is approximately c*scale (see TestHashCollisionsEstimator#hashEstimatesShouldApproximateSimulations).
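
The scaling rule can be illustrated with the following sketch. It is a hypothetical illustration, not the implementation behind estimateNumberOfHashCollisions. It assumes a base table of 1024 positions, averages 20 simulation runs to build the precalculated values, interpolates linearly between neighbouring precalculated entries, and multiplies the result by scale = hashSize / 1024. All names and constants are illustrative.

```java
import java.util.Random;

// Hypothetical sketch: estimate collisions by scaling values precalculated on a small table.
public final class ExtrapolatingCollisionEstimator
{
    // Assumption: size of the table used for the precalculated simulations
    private static final int BASE_HASH_SIZE = 1024;
    // Assumption: number of simulation runs averaged into each precalculated value
    private static final int RUNS = 20;

    // precalculated[k]: average collisions after inserting k values into BASE_HASH_SIZE positions
    private final double[] precalculated = new double[BASE_HASH_SIZE + 1];

    public ExtrapolatingCollisionEstimator(long seed)
    {
        Random random = new Random(seed);
        for (int run = 0; run < RUNS; run++) {
            boolean[] occupied = new boolean[BASE_HASH_SIZE];
            long collisions = 0;
            for (int k = 1; k <= BASE_HASH_SIZE; k++) {
                // Assumption: each value is distinct and hashes to a uniformly random position
                int position = random.nextInt(BASE_HASH_SIZE);
                // Linear conflict resolution: on a collision, try the next position
                while (occupied[position]) {
                    collisions++;
                    position = (position + 1) % BASE_HASH_SIZE;
                }
                occupied[position] = true;
                // Cumulative count after k insertions, averaged over all runs
                precalculated[k] += (double) collisions / RUNS;
            }
        }
    }

    public double estimate(int numberOfValues, int hashSize)
    {
        if (hashSize <= 0 || numberOfValues < 0 || numberOfValues > hashSize) {
            throw new IllegalArgumentException("expected 0 <= numberOfValues <= hashSize");
        }
        double scale = (double) hashSize / BASE_HASH_SIZE;
        // Number of values giving the same fill ratio in the base table (may be fractional)
        double baseValues = (double) numberOfValues * BASE_HASH_SIZE / hashSize;
        int lower = Math.min((int) Math.floor(baseValues), BASE_HASH_SIZE);
        int upper = Math.min(lower + 1, BASE_HASH_SIZE);
        double fraction = baseValues - lower;
        // Linear interpolation between neighbouring precalculated values
        double baseCollisions = precalculated[lower] + fraction * (precalculated[upper] - precalculated[lower]);
        // Extrapolation: c collisions for (x, y) become roughly c * scale for (x * scale, y * scale)
        return baseCollisions * scale;
    }

    public static void main(String[] args)
    {
        ExtrapolatingCollisionEstimator estimator = new ExtrapolatingCollisionEstimator(42);
        System.out.println("estimated collisions: " + estimator.estimate(5000, 10000));
    }
}
```

Recording the cumulative collision count after every insertion lets a single simulation run produce the precalculated value for every fill level of the base table at once.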

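It is not possible to derive a closed-form formula for the number of collisions in a hash table with a linear conflict resolution strategy. This is because one cannot assume that occupied positions are uniformly distributed when a new value is inserted: with linear probing, occupied positions tend to form clusters.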

  • Method Details

    • estimateNumberOfHashCollisions

      public static double estimateNumberOfHashCollisions(int numberOfValues, int hashSize)