Package org.apache.druid.hll
Class HyperLogLogCollector
- java.lang.Object
  - org.apache.druid.hll.HyperLogLogCollector
- All Implemented Interfaces:
Comparable<HyperLogLogCollector>
- Direct Known Subclasses:
VersionOneHyperLogLogCollector, VersionZeroHyperLogLogCollector
public abstract class HyperLogLogCollector extends Object implements Comparable<HyperLogLogCollector>
Implements the HyperLogLog cardinality estimator described in: http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf

Run this code to see a simple indication of expected errors based on different m values:

    for (int i = 1; i < 20; ++i) {
      System.out.printf("i[%,d], val[%,d] => error[%f%%]%n", i, 2 << i, 104 / Math.sqrt(2 << i));
    }

This class is *not* thread-safe. It can be passed among threads, but it is written with the assumption that only one thread is ever calling methods on it. If multiple threads call methods on it concurrently, correct behavior is not guaranteed.

Note that despite the non-thread-safety of this class, it is currently used by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and "get" methods can be called simultaneously by OnheapIncrementalIndex, since its "doAggregate" and "getMetricObjectValue" methods are not synchronized. So, watch out for that.
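One way to follow the single-thread assumption above is to route every call through a lock. The sketch below is a hypothetical helper, not part of Druid: the `Guarded` class and its `update`/`read` methods are invented for illustration, and a plain `int[]` counter stands in for the collector so the example stays self-contained.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical helper (not a Druid class): serializes all access to a non-thread-safe object. */
final class Guarded<T> {
    private final T target;
    private final ReentrantLock lock = new ReentrantLock();

    Guarded(T target) {
        this.target = target;
    }

    /** Runs a mutating action while holding the lock. */
    void update(Consumer<? super T> action) {
        lock.lock();
        try {
            action.accept(target);
        } finally {
            lock.unlock();
        }
    }

    /** Runs a read-only query while holding the same lock. */
    <R> R read(Function<? super T, R> query) {
        lock.lock();
        try {
            return query.apply(target);
        } finally {
            lock.unlock();
        }
    }

    /** Two threads increment a shared, unsynchronized counter through the guard. */
    static int concurrentIncrements(int perThread) {
        Guarded<int[]> guarded = new Guarded<>(new int[1]);
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                guarded.update(c -> c[0]++);
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        Integer total = guarded.read(c -> c[0]);
        return total;
    }
}
```

With a collector, the same wrapper could presumably hold the result of `makeLatestCollector()`, with `add` calls going through `update` and `estimateCardinality` through `read`. That wiring is an assumption and says nothing about how Druid's aggregators are actually structured.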
-
-
Field Summary
Fields
Modifier and Type    Field    Description
static int    BITS_FOR_BUCKETS
static double    CORRECTION_PARAMETER
static int    DENSE_THRESHOLD
static double    HIGH_CORRECTION_THRESHOLD
static double    LOW_CORRECTION_THRESHOLD
static int    NUM_BUCKETS
static int    NUM_BYTES_FOR_BUCKETS
-
Constructor Summary
Constructors
Constructor    Description
HyperLogLogCollector(ByteBuffer byteBuffer)
-
Method Summary
Modifier and Type    Method    Description
void    add(byte[] hashedValue)
void    add(short bucket, byte positionOf1)
static double    applyCorrection(double e, int zeroCount)
int    compareTo(HyperLogLogCollector other)
boolean    equals(Object o)
static double    estimateByteBuffer(ByteBuffer buf)
double    estimateCardinality()
long    estimateCardinalityRound()
HyperLogLogCollector    fold(ByteBuffer buffer)
HyperLogLogCollector    fold(HyperLogLogCollector other)
protected int    getInitPosition()
static int    getLatestNumBytesForDenseStorage()
abstract short    getMaxOverflowRegister()
abstract byte    getMaxOverflowValue()
abstract int    getNumBytesForDenseStorage()
abstract int    getNumHeaderBytes()
abstract short    getNumNonZeroRegisters()
abstract int    getPayloadBytePosition()
abstract int    getPayloadBytePosition(ByteBuffer buffer)
abstract byte    getRegisterOffset()
protected ByteBuffer    getStorageBuffer()
abstract byte    getVersion()
int    hashCode()
static HyperLogLogCollector    makeCollector(ByteBuffer buffer)    Create a wrapper object around an HLL sketch contained within a buffer.
static HyperLogLogCollector    makeCollectorSharingStorage(HyperLogLogCollector otherCollector)    Creates a new collector that shares the other collector's buffer (by using ByteBuffer.duplicate()).
static byte[]    makeEmptyVersionedByteArray()
static HyperLogLogCollector    makeLatestCollector()
abstract void    setMaxOverflowRegister(short register)
abstract void    setMaxOverflowRegister(ByteBuffer buffer, short register)
abstract void    setMaxOverflowValue(byte value)
abstract void    setMaxOverflowValue(ByteBuffer buffer, byte value)
abstract void    setNumNonZeroRegisters(short numNonZeroRegisters)
abstract void    setNumNonZeroRegisters(ByteBuffer buffer, short numNonZeroRegisters)
abstract void    setRegisterOffset(byte registerOffset)
abstract void    setRegisterOffset(ByteBuffer buffer, byte registerOffset)
abstract void    setVersion(ByteBuffer buffer)
byte[]    toByteArray()
ByteBuffer    toByteBuffer()
String    toString()
-
-
-
Field Detail
-
DENSE_THRESHOLD
public static final int DENSE_THRESHOLD
- See Also:
- Constant Field Values
-
BITS_FOR_BUCKETS
public static final int BITS_FOR_BUCKETS
- See Also:
- Constant Field Values
-
NUM_BUCKETS
public static final int NUM_BUCKETS
- See Also:
- Constant Field Values
-
NUM_BYTES_FOR_BUCKETS
public static final int NUM_BYTES_FOR_BUCKETS
- See Also:
- Constant Field Values
-
LOW_CORRECTION_THRESHOLD
public static final double LOW_CORRECTION_THRESHOLD
- See Also:
- Constant Field Values
-
HIGH_CORRECTION_THRESHOLD
public static final double HIGH_CORRECTION_THRESHOLD
-
CORRECTION_PARAMETER
public static final double CORRECTION_PARAMETER
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
HyperLogLogCollector
public HyperLogLogCollector(ByteBuffer byteBuffer)
-
-
Method Detail
-
makeLatestCollector
public static HyperLogLogCollector makeLatestCollector()
-
makeCollector
public static HyperLogLogCollector makeCollector(ByteBuffer buffer)
Create a wrapper object around an HLL sketch contained within a buffer. The position and limit of the buffer may be changed; if you do not want this to happen, you can duplicate the buffer before passing it in. The mark and byte order of the buffer will not be modified.
- Parameters:
buffer - buffer containing an HLL sketch starting at its position and ending at its limit
- Returns:
- HLLC wrapper object
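The advice to duplicate the buffer can be illustrated with plain `java.nio`, without any Druid classes. In this sketch, `consume` is a hypothetical stand-in for any callee that moves a buffer's position and limit (as `makeCollector` may do); the class and method names are invented for the example.

```java
import java.nio.ByteBuffer;

final class DuplicateBeforeWrap {
    /** Hypothetical stand-in for a callee that changes the position and limit it is given. */
    static void consume(ByteBuffer buf) {
        buf.limit(buf.capacity());
        buf.position(buf.limit());
    }

    /** Returns the caller's {position, limit} after handing a duplicate to the callee. */
    static int[] positionsAfterConsume() {
        ByteBuffer original = ByteBuffer.allocate(16);
        original.position(4);
        original.limit(12);

        // duplicate() shares the bytes but has its own position, limit, and mark,
        // so whatever the callee does to those values stays in the duplicate.
        consume(original.duplicate());

        return new int[] {original.position(), original.limit()};
    }
}
```

Applied to this class, the equivalent call would be `HyperLogLogCollector.makeCollector(buffer.duplicate())`, which leaves the caller's `buffer` position and limit as they were.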
-
makeCollectorSharingStorage
public static HyperLogLogCollector makeCollectorSharingStorage(HyperLogLogCollector otherCollector)
Creates a new collector that shares the other collector's buffer (by using ByteBuffer.duplicate()).
- Parameters:
otherCollector - collector whose buffer will be shared
- Returns:
- collector
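Because the sharing is done with `ByteBuffer.duplicate()`, the two collectors would see the same underlying bytes while keeping separate positions and limits. The following sketch shows only that `java.nio` behavior; the class and method names are hypothetical and no Druid code is involved.

```java
import java.nio.ByteBuffer;

final class SharedStorageDemo {
    /** Writes through a duplicate and reads the byte back through the original buffer. */
    static byte writeThroughDuplicate() {
        ByteBuffer first = ByteBuffer.allocate(8);
        ByteBuffer second = first.duplicate();

        // Absolute put on the duplicate: the content is shared with 'first'.
        second.put(3, (byte) 42);

        return first.get(3);
    }

    /** Moves the duplicate's position and reports the original's position. */
    static int originalPositionAfterDuplicateMoves() {
        ByteBuffer first = ByteBuffer.allocate(8);
        ByteBuffer second = first.duplicate();
        second.position(5);
        return first.position();
    }
}
```

The practical consequence is that a mutation made through either collector is visible through the other, so the single-thread assumption stated for this class presumably applies to the pair as a whole.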
-
getLatestNumBytesForDenseStorage
public static int getLatestNumBytesForDenseStorage()
-
makeEmptyVersionedByteArray
public static byte[] makeEmptyVersionedByteArray()
-
applyCorrection
public static double applyCorrection(double e, int zeroCount)
-
estimateByteBuffer
public static double estimateByteBuffer(ByteBuffer buf)
-
getVersion
public abstract byte getVersion()
-
setVersion
public abstract void setVersion(ByteBuffer buffer)
-
getRegisterOffset
public abstract byte getRegisterOffset()
-
setRegisterOffset
public abstract void setRegisterOffset(byte registerOffset)
-
setRegisterOffset
public abstract void setRegisterOffset(ByteBuffer buffer, byte registerOffset)
-
getNumNonZeroRegisters
public abstract short getNumNonZeroRegisters()
-
setNumNonZeroRegisters
public abstract void setNumNonZeroRegisters(short numNonZeroRegisters)
-
setNumNonZeroRegisters
public abstract void setNumNonZeroRegisters(ByteBuffer buffer, short numNonZeroRegisters)
-
getMaxOverflowValue
public abstract byte getMaxOverflowValue()
-
setMaxOverflowValue
public abstract void setMaxOverflowValue(byte value)
-
setMaxOverflowValue
public abstract void setMaxOverflowValue(ByteBuffer buffer, byte value)
-
getMaxOverflowRegister
public abstract short getMaxOverflowRegister()
-
setMaxOverflowRegister
public abstract void setMaxOverflowRegister(short register)
-
setMaxOverflowRegister
public abstract void setMaxOverflowRegister(ByteBuffer buffer, short register)
-
getNumHeaderBytes
public abstract int getNumHeaderBytes()
-
getNumBytesForDenseStorage
public abstract int getNumBytesForDenseStorage()
-
getPayloadBytePosition
public abstract int getPayloadBytePosition()
-
getPayloadBytePosition
public abstract int getPayloadBytePosition(ByteBuffer buffer)
-
getInitPosition
protected int getInitPosition()
-
getStorageBuffer
protected ByteBuffer getStorageBuffer()
-
add
public void add(byte[] hashedValue)
-
add
public void add(short bucket, byte positionOf1)
-
fold
public HyperLogLogCollector fold(@Nullable HyperLogLogCollector other)
-
fold
public HyperLogLogCollector fold(ByteBuffer buffer)
-
toByteBuffer
public ByteBuffer toByteBuffer()
-
toByteArray
public byte[] toByteArray()
-
estimateCardinalityRound
public long estimateCardinalityRound()
-
estimateCardinality
public double estimateCardinality()
-
compareTo
public int compareTo(HyperLogLogCollector other)
- Specified by:
compareTo in interface Comparable<HyperLogLogCollector>
-
-