Class BloomFilter
java.lang.Object
io.trino.orc.metadata.statistics.BloomFilter
- All Implemented Interfaces:
StatisticsHasher.Hashable
BloomFilter is a probabilistic data structure for set membership checks. Bloom filters are
highly space efficient compared to a HashSet. Because of their probabilistic nature,
false positives (the element is not present in the bloom filter, but test() returns true) are
possible, but false negatives are not (if the element is present, test() never
returns false). The false positive probability is configurable (default: 5%), and the
storage requirement increases or decreases with it: the lower the false positive probability,
the greater the space requirement.
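The trade-off between false positive probability and space can be sketched with the textbook bloom filter sizing formulas. This is a minimal, self-contained illustration; the class and method names (BloomSizing, optimalNumOfBits, optimalNumOfHashFunctions) are hypothetical, and whether this class sizes its bit set with exactly these formulas or rounding is an assumption.

```java
public class BloomSizing {
    // Textbook estimate of the bits needed for n entries at false positive
    // probability p: m = -n * ln(p) / (ln 2)^2. Assumed, not taken from this class.
    static long optimalNumOfBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Textbook estimate of the number of hash functions: k = (m / n) * ln 2,
    // rounded to the nearest integer and never less than 1.
    static int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long expectedEntries = 10_000;
        // A lower false positive probability requires more bits per entry.
        for (double fpp : new double[] {0.05, 0.01, 0.001}) {
            long bits = optimalNumOfBits(expectedEntries, fpp);
            int hashFunctions = optimalNumOfHashFunctions(expectedEntries, bits);
            System.out.printf("fpp=%.3f bits=%d (%.1f bits/entry) hashFunctions=%d%n",
                    fpp, bits, (double) bits / expectedEntries, hashFunctions);
        }
    }
}
```

Under these formulas, a 5% filter needs roughly 6 bits per expected entry, and each tenfold reduction in false positive probability adds roughly 5 more bits per entry.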
Bloom filters are sensitive to the number of elements inserted into them.
The expected number of entries must be specified when the bloom filter is created. If the number
of insertions exceeds the specified number of entries, the false positive probability
increases accordingly.
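The effect of overfilling can be estimated with the standard approximation for the false positive probability after n insertions into m bits with k hash functions, (1 - e^(-kn/m))^k. The sketch below is illustrative only: the class name BloomOverfill is hypothetical, and the values m = 62,353 and k = 4 are assumed to come from textbook sizing for 10,000 entries at 5%, not from this class.

```java
public class BloomOverfill {
    // Standard approximation of the false positive probability after
    // n insertions into m bits using k hash functions.
    static double expectedFpp(long n, long m, int k) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long numBits = 62_353;   // assumed size for 10,000 entries at 5%
        int numHashFunctions = 4; // assumed hash function count for that size

        // Insert the expected number of entries, then 2x and 4x that number.
        for (long inserted : new long[] {10_000, 20_000, 40_000}) {
            System.out.printf("inserted=%d estimated fpp=%.3f%n",
                    inserted, expectedFpp(inserted, numBits, numHashFunctions));
        }
    }
}
```

With these assumed values, the estimate stays near 5% at the expected entry count and rises well above 20% once twice as many entries are inserted.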
Internally, this implementation uses Murmur3, a fast non-cryptographic hash algorithm. Although Murmur2 is slightly faster than Murmur3 in Java, it suffers from hash collisions for specific sequences of repeating bytes. For more information, see https://code.google.com/p/smhasher/wiki/MurmurHash2Flaw
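Bloom filters commonly derive all of their bit positions from a single 64-bit hash by combining its two 32-bit halves (often called double hashing). The sketch below illustrates that technique under stated assumptions: the class DoubleHashingSketch and its methods are hypothetical, mix64 is a SplitMix64-style stand-in and is not Murmur3, and whether this class combines its hash in exactly this way is an assumption.

```java
public class DoubleHashingSketch {
    // Stand-in 64-bit mixer (SplitMix64 finalizer). This is NOT Murmur3;
    // it only supplies a well-mixed 64-bit value for the illustration.
    static long mix64(long z) {
        z += 0x9e3779b97f4a7c15L;
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Derive numHashFunctions bit positions from one 64-bit hash.
    static int[] bitPositions(long hash64, int numHashFunctions, int numBits) {
        int hash1 = (int) hash64;          // low 32 bits
        int hash2 = (int) (hash64 >>> 32); // high 32 bits
        int[] positions = new int[numHashFunctions];
        for (int i = 1; i <= numHashFunctions; i++) {
            int combined = hash1 + i * hash2;
            if (combined < 0) {
                combined = ~combined; // flip the bits so the value is non-negative
            }
            positions[i - 1] = combined % numBits;
        }
        return positions;
    }

    static void add(long[] bits, int numBits, int numHashFunctions, long value) {
        for (int pos : bitPositions(mix64(value), numHashFunctions, numBits)) {
            bits[pos >>> 6] |= 1L << pos; // the shift uses only the low 6 bits of pos
        }
    }

    static boolean mightContain(long[] bits, int numBits, int numHashFunctions, long value) {
        for (int pos : bitPositions(mix64(value), numHashFunctions, numBits)) {
            if ((bits[pos >>> 6] & (1L << pos)) == 0) {
                return false; // a cleared bit proves the value was never added
            }
        }
        return true; // all bits set: present, or a false positive
    }

    public static void main(String[] args) {
        int numBits = 1024;
        int numHashFunctions = 4;
        long[] bits = new long[numBits / 64];
        add(bits, numBits, numHashFunctions, 42L);
        System.out.println(mightContain(bits, numBits, numHashFunctions, 42L));
    }
}
```

Because every added value sets all of its bit positions, a later lookup for the same value always finds them set, which is why false negatives cannot occur.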
This class was forked from org.apache.orc.util.BloomFilter.
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | | Bare metal bit set implementation.
static final class | | This class was forked from org.apache.orc.util.Murmur3.
-
Constructor Summary
Constructors
Constructor | Description
BloomFilter(long[] bits, int numFuncs) | A constructor to support rebuilding the BloomFilter from a serialized representation.
BloomFilter(long expectedEntries, double fpp) |
-
Method Summary
Modifier and Type | Method | Description
void | add(byte[] val) |
void | add(io.airlift.slice.Slice val) |
void | addDouble(double val) |
void | addFloat(float val) |
void | addHash(StatisticsHasher hasher) |
void | addLong(long val) |
boolean | equals(Object) |
long[] | getBitSet() |
int | getNumBits() |
int | getNumHashFunctions() |
long | getRetainedSizeInBytes() |
int | hashCode() |
boolean | test(byte[] val) |
boolean | testDouble(double val) |
boolean | testFloat(float val) |
boolean | testLong(long val) |
boolean | testSlice(io.airlift.slice.Slice val) |
String | toString() |
-
Constructor Details
-
BloomFilter
public BloomFilter(long expectedEntries, double fpp) -
BloomFilter
public BloomFilter(long[] bits, int numFuncs)
A constructor to support rebuilding the BloomFilter from a serialized representation.
- Parameters:
bits - the serialized bits
numFuncs - the number of functions used
-
-
Method Details
-
getRetainedSizeInBytes
public long getRetainedSizeInBytes() -
addHash
- Specified by:
addHash in interface StatisticsHasher.Hashable
-
equals
-
hashCode
-
add
public void add(byte[] val) -
add
public void add(io.airlift.slice.Slice val) -
addLong
public void addLong(long val) -
addDouble
public void addDouble(double val) -
addFloat
public void addFloat(float val) -
test
public boolean test(byte[] val) -
testSlice
public boolean testSlice(io.airlift.slice.Slice val) -
testLong
public boolean testLong(long val) -
testDouble
public boolean testDouble(double val) -
testFloat
public boolean testFloat(float val) -
getNumBits
public int getNumBits() -
getNumHashFunctions
public int getNumHashFunctions() -
getBitSet
public long[] getBitSet() -
toString
-