Class BloomFilter

java.lang.Object
io.trino.orc.metadata.statistics.BloomFilter
All Implemented Interfaces:
StatisticsHasher.Hashable

public class BloomFilter extends Object implements StatisticsHasher.Hashable
BloomFilter is a probabilistic data structure for set membership check. BloomFilters are highly space efficient when compared to using a HashSet. Because of the probabilistic nature of bloom filter false positive (element not present in bloom filter but test() says true) are possible but false negatives are not possible (if element is present then test() will never say false). The false positive probability is configurable (default: 5%) depending on which storage requirement may increase or decrease. Lower the false positive probability greater is the space requirement. Bloom filters are sensitive to number of elements that will be inserted in the bloom filter. During the creation of bloom filter expected number of entries must be specified. If the number of insertions exceed the specified initial number of entries then false positive probability will increase accordingly.

Internally, this implementation of bloom filter uses Murmur3 fast non-cryptographic hash algorithm. Although Murmur2 is slightly faster than Murmur3 in Java, it suffers from hash collisions for specific sequence of repeating bytes. Check the following link for more info https://code.google.com/p/smhasher/wiki/MurmurHash2Flaw

This class was forked from org.apache.orc.util.BloomFilter.

  • Constructor Details

    • BloomFilter

      public BloomFilter(long expectedEntries, double fpp)
    • BloomFilter

      public BloomFilter(long[] bits, int numFuncs)
      A constructor to support rebuilding the BloomFilter from a serialized representation.
      Parameters:
      bits - the serialized bits
      numFuncs - the number of functions used
  • Method Details

    • getRetainedSizeInBytes

      public long getRetainedSizeInBytes()
    • addHash

      public void addHash(StatisticsHasher hasher)
      Specified by:
      addHash in interface StatisticsHasher.Hashable
    • equals

      public boolean equals(Object other)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • add

      public void add(byte[] val)
    • add

      public void add(io.airlift.slice.Slice val)
    • addLong

      public void addLong(long val)
    • addDouble

      public void addDouble(double val)
    • addFloat

      public void addFloat(float val)
    • test

      public boolean test(byte[] val)
    • testSlice

      public boolean testSlice(io.airlift.slice.Slice val)
    • testLong

      public boolean testLong(long val)
    • testDouble

      public boolean testDouble(double val)
    • testFloat

      public boolean testFloat(float val)
    • getNumBits

      public int getNumBits()
    • getNumHashFunctions

      public int getNumHashFunctions()
    • getBitSet

      public long[] getBitSet()
    • toString

      public String toString()
      Overrides:
      toString in class Object