public class BloomKFilter extends Object
BloomKFilter is a variation of BloomFilter. Unlike BloomFilter, BloomKFilter spreads
the 'k' hash bits within the same cache line for better L1 cache performance. It works as follows:
The first hash code, computed from the key, locates the block offset (n longs in the bit set constitute a block).
The subsequent 'k' hash codes spread the hash bits within that block. The default block size is 8 longs,
chosen to match the cache line size (8 longs = 64 bytes = cache line size).
Refer to addBytes(byte[]) for more info.
This implementation incurs far fewer L1 data cache misses than BloomFilter.

| Modifier and Type | Class and Description |
|---|---|
| static class | BloomKFilter.BitSet: Bare metal bit set implementation. |
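The block scheme described in the class overview can be illustrated with a small, self-contained sketch. It is not the library's code: the class name `BlockBloomSketch`, the `mix` hash (a SplitMix64-style finalizer standing in for whatever hash the real class applies to bytes), and the way the two 32-bit halves are combined are all assumptions made for illustration. The one property it demonstrates is that every bit set for a key lands inside a single block of 8 longs.

```java
/** Illustrative sketch only; not the library's actual implementation. */
final class BlockBloomSketch {
    static final int BLOCK_SIZE = 8;                     // longs per block: 8 longs = 64 bytes
    static final int BLOCK_SIZE_BITS = 3;                // log2(BLOCK_SIZE)
    static final int BLOCK_OFFSET_MASK = BLOCK_SIZE - 1; // picks a long within the block
    static final int BIT_OFFSET_MASK = Long.SIZE - 1;    // picks a bit within a long

    final long[] bits;
    final int blockCount;
    final int k;

    BlockBloomSketch(int blockCount, int k) {
        this.blockCount = blockCount;
        this.k = k;
        this.bits = new long[blockCount * BLOCK_SIZE];
    }

    /** Stand-in 64-bit mixer (SplitMix64 finalizer); an assumption, not the real hash. */
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Maps a possibly negative int to a non-negative one. */
    static int nonNegative(int h) {
        return h < 0 ? ~h : h;
    }

    /** Index of the first long of the block selected by the first hash code. */
    int blockBase(long hash64) {
        int h1 = (int) hash64;
        int h2 = (int) (hash64 >>> 32);
        return (nonNegative(h1 + h2) % blockCount) << BLOCK_SIZE_BITS;
    }

    /** Visits the k bit positions for a value; sets them, or only checks them. */
    private boolean visit(long val, boolean set) {
        long hash64 = mix(val);
        int h1 = (int) hash64;
        int h2 = (int) (hash64 >>> 32);
        int base = blockBase(hash64);
        for (int i = 1; i <= k; i++) {
            int combined = nonNegative(h1 + (i + 1) * h2);
            int wordIdx = base + (combined & BLOCK_OFFSET_MASK);
            long mask = 1L << ((combined >>> BLOCK_SIZE_BITS) & BIT_OFFSET_MASK);
            if (set) {
                bits[wordIdx] |= mask;
            } else if ((bits[wordIdx] & mask) == 0) {
                return false;
            }
        }
        return true;
    }

    void addLong(long val) {
        visit(val, true);
    }

    boolean testLong(long val) {
        return visit(val, false);
    }
}
```

Because all k probes for a key touch the same 64-byte block, a lookup needs at most one cache line from the bit set, which is the motivation the overview gives for the design.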
| Modifier and Type | Field and Description |
|---|---|
| static float | DEFAULT_FPP |
| static int | START_OF_SERIALIZED_LONGS |
| Constructor and Description |
|---|
| BloomKFilter(long maxNumEntries) |
| BloomKFilter(long[] bits, int numFuncs): A constructor to support rebuilding the BloomFilter from a serialized representation. |
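The single-argument constructor takes only an expected entry count, so the bit-set size and number of hash functions presumably follow from that count and a false positive probability such as DEFAULT_FPP. The sketch below applies the standard Bloom filter sizing formulas. Whether BloomKFilter uses exactly these formulas, the 0.05 probability used in `main`, and the helper names are all assumptions for illustration. The padding step rounds the number of longs up to a whole number of 8-long blocks, which the block layout requires in some form.

```java
/** Standard Bloom filter sizing formulas; a sketch, not the library's code. */
final class BloomSizingSketch {
    static final int BLOCK_SIZE = 8; // longs per block, as in the class overview

    /** m = -n * ln(p) / (ln 2)^2, truncated to a whole number of bits. */
    static long optimalNumOfBits(long n, double p) {
        return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /** k = (m / n) * ln 2, rounded, and at least 1. */
    static int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    /** Number of longs needed for m bits, rounded up to a multiple of BLOCK_SIZE. */
    static int numLongsPaddedToBlocks(long m) {
        int nLongs = (int) Math.ceil((double) m / Long.SIZE);
        int remainder = nLongs % BLOCK_SIZE;
        return remainder == 0 ? nLongs : nLongs + (BLOCK_SIZE - remainder);
    }

    public static void main(String[] args) {
        long n = 1_000_000L; // expected entries
        double p = 0.05;     // illustrative false positive probability (assumed)
        long m = optimalNumOfBits(n, p);
        int k = optimalNumOfHashFunctions(n, m);
        System.out.println("bits=" + m + ", hashFunctions=" + k
            + ", longs=" + numLongsPaddedToBlocks(m));
    }
}
```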
| Modifier and Type | Method and Description |
|---|---|
| void | add(byte[] val) |
| void | addByte(byte val) |
| void | addBytes(byte[] val) |
| void | addBytes(byte[] val, int offset, int length) |
| void | addDouble(double val) |
| void | addFloat(float val) |
| void | addInt(int val) |
| void | addLong(long val) |
| void | addString(String val) |
| static BloomKFilter | deserialize(InputStream in): Deserialize a bloom filter. Reads a byte stream, which was written by serialize(OutputStream, BloomKFilter), into a BloomKFilter. |
| long[] | getBitSet() |
| int | getBitSize() |
| int | getNumBits() |
| int | getNumHashFunctions() |
| void | merge(BloomKFilter that): Merge the specified bloom filter with current bloom filter. |
| static void | mergeBloomFilterBytes(byte[] bf1Bytes, int bf1Start, int bf1Length, byte[] bf2Bytes, int bf2Start, int bf2Length): Merges BloomKFilter bf2 into bf1. |
| void | reset() |
| static void | serialize(OutputStream out, BloomKFilter bloomFilter): Serialize a bloom filter. |
| long | sizeInBytes() |
| boolean | test(byte[] val) |
| boolean | testByte(byte val) |
| boolean | testBytes(byte[] val) |
| boolean | testBytes(byte[] val, int offset, int length) |
| boolean | testDouble(double val) |
| boolean | testFloat(float val) |
| boolean | testInt(int val) |
| boolean | testLong(long val) |
| boolean | testString(String val) |
| String | toString() |
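The summary lists merge(BloomKFilter) and mergeBloomFilterBytes(...) without saying how two filters are combined. For Bloom filters built with the same bit-set size and the same hash functions, the union is the bitwise OR of the two bit sets, so a merge can be sketched on raw long[] arrays such as those returned by getBitSet(). The helper name `mergeInto` and its compatibility check are hypothetical; the real methods may validate or fail differently.

```java
/** Sketch of merging two compatible Bloom filter bit sets with bitwise OR. */
final class BloomMergeSketch {
    /**
     * ORs every word of {@code other} into {@code target}.
     * Both filters must have the same length and hash function count,
     * otherwise their bit positions do not mean the same thing.
     */
    static void mergeInto(long[] target, int targetNumFuncs, long[] other, int otherNumFuncs) {
        if (target.length != other.length || targetNumFuncs != otherNumFuncs) {
            throw new IllegalArgumentException("filters are not compatible for merging");
        }
        for (int i = 0; i < target.length; i++) {
            target[i] |= other[i];
        }
    }
}
```

After the merge, any key that tested positive in either input tests positive in the target, and the argument filter is left unchanged.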
public static final float DEFAULT_FPP
public static final int START_OF_SERIALIZED_LONGS
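The name START_OF_SERIALIZED_LONGS suggests that the serialized form begins with a fixed-size header, followed by the longs of the bit set. This page does not describe the layout, so the sketch below assumes one purely for illustration: one byte for the number of hash functions, a 4-byte big-endian count of longs, then the longs themselves in big-endian order. The class and method names are hypothetical; check the constant and the serialize(OutputStream, BloomKFilter) source before relying on any of this.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Sketch of a header-plus-longs layout; the layout itself is an assumption. */
final class BloomSerdeSketch {
    /** Assumed header: 1 byte (hash function count) + 4 bytes (number of longs). */
    static final int HEADER_BYTES = 1 + Integer.BYTES;

    static byte[] write(int numHashFunctions, long[] bits) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeByte(numHashFunctions); // 1 byte
            out.writeInt(bits.length);       // 4 bytes, big-endian
            for (long word : bits) {
                out.writeLong(word);         // 8 bytes each, big-endian
            }
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long[] readBits(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            in.readByte();                   // skip the hash function count
            long[] bits = new long[in.readInt()];
            for (int i = 0; i < bits.length; i++) {
                bits[i] = in.readLong();
            }
            return bits;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a fixed header like this, code that works directly on serialized bytes can locate the first long at a constant offset without deserializing the whole filter.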
public BloomKFilter(long maxNumEntries)
public BloomKFilter(long[] bits, int numFuncs)
Parameters:
bits -
numFuncs -
public void add(byte[] val)
public void addBytes(byte[] val,
int offset,
int length)
public void addBytes(byte[] val)
public void addString(String val)
public void addByte(byte val)
public void addInt(int val)
public void addLong(long val)
public void addFloat(float val)
public void addDouble(double val)
public boolean test(byte[] val)
public boolean testBytes(byte[] val)
public boolean testBytes(byte[] val,
int offset,
int length)
public boolean testString(String val)
public boolean testByte(byte val)
public boolean testInt(int val)
public boolean testLong(long val)
public boolean testFloat(float val)
public boolean testDouble(double val)
public long sizeInBytes()
public int getBitSize()
public int getNumHashFunctions()
public int getNumBits()
public long[] getBitSet()
public void merge(BloomKFilter that)
Parameters:
that - bloom filter to merge
public void reset()
public static void serialize(OutputStream out, BloomKFilter bloomFilter) throws IOException
Parameters:
out - output stream to write to
bloomFilter - BloomKFilter that needs to be serialized
Throws:
IOException
public static BloomKFilter deserialize(InputStream in) throws IOException
Parameters:
in - input byte stream
Returns:
BloomKFilter
Throws:
IOException
public static void mergeBloomFilterBytes(byte[] bf1Bytes, int bf1Start, int bf1Length, byte[] bf2Bytes, int bf2Start, int bf2Length)
Parameters:
bf1Bytes -
bf1Start -
bf1Length -
bf2Bytes -
bf2Start -
bf2Length -

Copyright © 2020 The Apache Software Foundation. All rights reserved.