Class FrontCodedIndexed
- java.lang.Object
-
- org.apache.druid.segment.data.FrontCodedIndexed
-
- All Implemented Interfaces:
Iterable<ByteBuffer>, HotLoopCallee, Indexed<ByteBuffer>
- Direct Known Subclasses:
FrontCodedIndexed.FrontCodedV0, FrontCodedIndexed.FrontCodedV1
public abstract class FrontCodedIndexed extends Object implements Indexed<ByteBuffer>
Indexed specialized for storing variable-width binary values (such as UTF-8 encoded strings), which must be sorted and unique, using 'front coding'. Front coding is a type of delta encoding for byte arrays, where sorted values are grouped into buckets. The first value of the bucket is written entirely, and each remaining value is stored as a pair: an integer indicating how much of the first byte array of the bucket to use as a prefix, followed by the remaining bytes after the prefix that complete the value. If using 'incremental' buckets, the prefix is instead computed against the immediately preceding value in the bucket rather than against the first value of the bucket.

front coded indexed layout:
| version | bucket size | has null? | number of values | size of "offsets" + "buckets" | "offsets" | "buckets" |
| ------- | ----------- | --------- | ---------------- | ----------------------------- | --------- | --------- |
| byte    | byte        | byte      | vbyte int        | vbyte int                     | int[]     | bucket[]  |
"offsets" are the ending offsets of each bucket stored in order, stored as plain integers for easy random access.
bucket layout:
| first value | prefix length | fragment | ... | prefix length | fragment |
| ----------- | ------------- | -------- | --- | ------------- | -------- |
| blob        | vbyte int     | blob     | ... | vbyte int     | blob     |

blob layout:
| blob length | blob bytes |
| ----------- | ---------- |
| vbyte int   | byte[]     |
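The front coding scheme described above can be illustrated with a minimal in-memory sketch. It is not the Druid implementation: the class `FrontCodingSketch`, its `Entry` type, and all method names are hypothetical, and the vbyte and blob serialization is left out so that only the prefix/fragment idea remains. The `incremental` flag switches between taking the prefix from the first value of the bucket and taking it from the immediately preceding value.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Druid.
final class FrontCodingSketch {
  /** One non-first bucket entry: shared prefix length plus the bytes after the prefix. */
  static final class Entry {
    final int prefixLength;
    final byte[] fragment;

    Entry(int prefixLength, byte[] fragment) {
      this.prefixLength = prefixLength;
      this.fragment = fragment;
    }
  }

  static int commonPrefixLength(byte[] a, byte[] b) {
    int limit = Math.min(a.length, b.length);
    int i = 0;
    while (i < limit && a[i] == b[i]) {
      i++;
    }
    return i;
  }

  /** Encodes every value after the first as (prefix length, fragment). */
  static List<Entry> encodeBucket(List<byte[]> sortedValues, boolean incremental) {
    List<Entry> entries = new ArrayList<>();
    for (int i = 1; i < sortedValues.size(); i++) {
      // Reference is the previous value for incremental buckets, else the first value.
      byte[] reference = incremental ? sortedValues.get(i - 1) : sortedValues.get(0);
      byte[] value = sortedValues.get(i);
      int prefix = commonPrefixLength(reference, value);
      entries.add(new Entry(prefix, Arrays.copyOfRange(value, prefix, value.length)));
    }
    return entries;
  }

  /** Rebuilds the value at the given position within the bucket (0 is the first value). */
  static byte[] decode(byte[] first, List<Entry> entries, int position, boolean incremental) {
    byte[] current = first;
    // Incremental buckets must replay every entry up to the position;
    // otherwise only the entry at the position is needed.
    int start = incremental ? 1 : Math.max(position, 1);
    for (int i = start; i <= position; i++) {
      Entry e = entries.get(i - 1);
      byte[] reference = incremental ? current : first;
      byte[] value = Arrays.copyOf(reference, e.prefixLength + e.fragment.length);
      System.arraycopy(e.fragment, 0, value, e.prefixLength, e.fragment.length);
      current = value;
    }
    return current;
  }

  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    List<byte[]> bucket = Arrays.asList(utf8("car"), utf8("card"), utf8("cards"));
    for (boolean incremental : new boolean[]{false, true}) {
      for (Entry e : encodeBucket(bucket, incremental)) {
        System.out.println("incremental=" + incremental + " prefix=" + e.prefixLength
            + " fragment=" + new String(e.fragment, StandardCharsets.UTF_8));
      }
    }
  }
}
```

For the bucket `car`, `card`, `cards`, the last value is stored with prefix length 3 and fragment `ds` when coded against the first value, and with prefix length 4 and fragment `s` when coded incrementally against `card`.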
Getting a value first picks the appropriate bucket, finds its offset in the underlying buffer, then scans the bucket values to seek to the correct position of the value within the bucket in order to reconstruct it using the prefix length.
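The lookup steps above can be sketched roughly as follows, under simplifying assumptions. `BucketLookupSketch` and its fields are hypothetical names, values are plain ASCII strings, and buckets are held in Java arrays, so the sequential scan over variable-width entries that the real buffer layout requires is replaced by direct array indexing. Null handling (the `hasNull` flag) is ignored.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical in-memory model of bucketed, front-coded values; not part of Druid.
final class BucketLookupSketch {
  private final int bucketSize;
  private final byte[][] firstValues;   // first value of each bucket, stored whole
  private final int[][] prefixLengths;  // per bucket: prefix length of each later value
  private final byte[][][] fragments;   // per bucket: bytes after the prefix

  BucketLookupSketch(String[] sortedUnique, int bucketSize) {
    this.bucketSize = bucketSize;
    int numBuckets = (sortedUnique.length + bucketSize - 1) / bucketSize;
    firstValues = new byte[numBuckets][];
    prefixLengths = new int[numBuckets][];
    fragments = new byte[numBuckets][][];
    for (int b = 0; b < numBuckets; b++) {
      int start = b * bucketSize;
      int count = Math.min(bucketSize, sortedUnique.length - start);
      byte[] first = sortedUnique[start].getBytes(StandardCharsets.UTF_8);
      firstValues[b] = first;
      prefixLengths[b] = new int[count - 1];
      fragments[b] = new byte[count - 1][];
      for (int i = 1; i < count; i++) {
        byte[] value = sortedUnique[start + i].getBytes(StandardCharsets.UTF_8);
        int p = 0;
        while (p < first.length && p < value.length && first[p] == value[p]) {
          p++;
        }
        prefixLengths[b][i - 1] = p;
        fragments[b][i - 1] = Arrays.copyOfRange(value, p, value.length);
      }
    }
  }

  String get(int index) {
    int bucket = index / bucketSize;            // pick the bucket
    int positionInBucket = index % bucketSize;  // position of the value inside it
    byte[] first = firstValues[bucket];
    if (positionInBucket == 0) {
      return new String(first, StandardCharsets.UTF_8);
    }
    // Rebuild the value: prefix bytes of the first value, then the stored fragment.
    int prefix = prefixLengths[bucket][positionInBucket - 1];
    byte[] fragment = fragments[bucket][positionInBucket - 1];
    byte[] value = Arrays.copyOf(first, prefix + fragment.length);
    System.arraycopy(fragment, 0, value, prefix, fragment.length);
    return new String(value, StandardCharsets.UTF_8);
  }
}
```

With the values `car`, `card`, `cards`, `care`, `cat`, `dog`, `door` and a bucket size of 4, `get(2)` lands in bucket 0 at position 2 and rebuilds `cards` from the 3-byte prefix of `car` plus the fragment `ds`.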
Finding the index of a value involves binary searching the first values of each bucket to find the correct bucket, then a linear scan within the bucket to find the matching value (or (-(insertion point) - 1) for values that are not present).
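A rough sketch of this two-phase search, assuming a simplified model: the values live in a sorted `String[]`, the "first value" of bucket `b` is the element at `b * bucketSize`, and comparison uses `String.compareTo` instead of the byte-wise comparison a `ByteBuffer`-based implementation would use. `BucketSearchSketch` is a hypothetical name.

```java
// Hypothetical illustration class; not part of Druid.
final class BucketSearchSketch {
  static int indexOf(String[] sortedUnique, int bucketSize, String target) {
    int numBuckets = (sortedUnique.length + bucketSize - 1) / bucketSize;

    // Phase 1: binary search the first value of each bucket for the last
    // bucket whose first value is <= target.
    int lo = 0;
    int hi = numBuckets - 1;
    int bucket = -1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int cmp = sortedUnique[mid * bucketSize].compareTo(target);
      if (cmp == 0) {
        return mid * bucketSize;  // target is the first value of a bucket
      }
      if (cmp < 0) {
        bucket = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    if (bucket < 0) {
      return -1;  // sorts before every value: insertion point 0, so -(0) - 1
    }

    // Phase 2: linear scan of the remaining values in the chosen bucket.
    int start = bucket * bucketSize;
    int end = Math.min(start + bucketSize, sortedUnique.length);
    for (int i = start + 1; i < end; i++) {
      int cmp = sortedUnique[i].compareTo(target);
      if (cmp == 0) {
        return i;
      }
      if (cmp > 0) {
        return -i - 1;  // first larger value marks the insertion point
      }
    }
    return -end - 1;  // larger than everything in this bucket
  }
}
```

On a sorted array, this returns the same results as `Arrays.binarySearch`, including the `(-(insertion point) - 1)` encoding for absent values.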
The value iterator reads an entire bucket at a time, reconstructing the values into an array to iterate within the bucket before moving on to the next bucket as the iterator is consumed.
This class is not thread-safe, since it modifies the position of a shared buffer during operation.
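The underlying hazard can be seen with plain `java.nio.ByteBuffer`, independent of this class: reads through a buffer move its position, so two readers sharing one buffer object interfere with each other, while a view created with `duplicate()` shares the bytes but tracks its own position. The sketch below only demonstrates that standard `ByteBuffer` behavior; `SharedPositionSketch` is a hypothetical name, and how `FrontCodedIndexed` manages its buffer internally is not shown here.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class; not part of Druid.
final class SharedPositionSketch {
  /** Returns {position of the original, position of the view, byte read via the view}. */
  static int[] demo() {
    ByteBuffer shared = ByteBuffer.wrap(new byte[]{10, 20, 30, 40});
    ByteBuffer view = shared.duplicate();  // same content, independent position and limit
    view.position(2);
    byte fromView = view.get();            // relative read advances only the view
    return new int[]{shared.position(), view.position(), fromView};
  }

  public static void main(String[] args) {
    int[] result = demo();
    System.out.println("shared position = " + result[0]);
    System.out.println("view position = " + result[1]);
    System.out.println("byte read = " + result[2]);
  }
}
```

A reader confined to its own view, or to its own instance on a single thread, never observes another reader's position changes.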
-
-
Nested Class Summary
Nested Classes
| Modifier and Type | Class                          | Description |
| ----------------- | ------------------------------ | ----------- |
| static class      | FrontCodedIndexed.FrontCodedV0 |             |
| static class      | FrontCodedIndexed.FrontCodedV1 |             |
-
Field Summary
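Fields
| Modifier and Type    | Field               | Description |
| -------------------- | ------------------- | ----------- |
| protected int        | adjustedNumValues   |             |
| protected int        | adjustIndex         |             |
| protected int        | bucketSize          |             |
| protected int        | bucketsPosition     |             |
| protected ByteBuffer | buffer              |             |
| static int           | DEFAULT_BUCKET_SIZE |             |
| static byte          | DEFAULT_VERSION     |             |
| protected int        | div                 |             |
| protected boolean    | hasNull             |             |
| protected int        | lastBucketNumValues |             |
| protected int        | numBuckets          |             |
| protected int        | offsetsPosition     |             |
| protected int        | rem                 |             |
| static byte          | V0                  |             |
| static byte          | V1                  |             |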
-
Method Summary
| Modifier and Type | Method | Description |
| ----------------- | ------ | ----------- |
| ByteBuffer | get(int index) | Get the value at specified position |
| int | indexOf(ByteBuffer value) | Returns the index of "value" in this Indexed object, or a negative number if the value is not present. |
| void | inspectRuntimeShape(RuntimeShapeInspector inspector) | Implementations of this method should call inspector.visit() with all fields of this class that meet two conditions; see the method detail. |
| boolean | isSorted() | Indicates if this value set is sorted, the implication being that the contract of Indexed.indexOf(T) is strengthened to return a negative number equal to (-(insertion point) - 1) when the value is not present in the set. |
| Iterator<ByteBuffer> | iterator() | |
| static com.google.common.base.Supplier<FrontCodedIndexed> | read(ByteBuffer buffer, ByteOrder ordering) | |
| int | size() | Number of elements in the value set |
| static byte | validateVersion(byte version) | |
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
-
-
-
Field Detail
-
V0
public static final byte V0
- See Also:
- Constant Field Values
-
V1
public static final byte V1
- See Also:
- Constant Field Values
-
DEFAULT_VERSION
public static final byte DEFAULT_VERSION
- See Also:
- Constant Field Values
-
DEFAULT_BUCKET_SIZE
public static final int DEFAULT_BUCKET_SIZE
- See Also:
- Constant Field Values
-
buffer
protected final ByteBuffer buffer
-
adjustedNumValues
protected final int adjustedNumValues
-
adjustIndex
protected final int adjustIndex
-
bucketSize
protected final int bucketSize
-
numBuckets
protected final int numBuckets
-
div
protected final int div
-
rem
protected final int rem
-
offsetsPosition
protected final int offsetsPosition
-
bucketsPosition
protected final int bucketsPosition
-
hasNull
protected final boolean hasNull
-
lastBucketNumValues
protected final int lastBucketNumValues
-
-
Method Detail
-
validateVersion
public static byte validateVersion(byte version)
-
read
public static com.google.common.base.Supplier<FrontCodedIndexed> read(ByteBuffer buffer, ByteOrder ordering)
-
size
public int size()
Description copied from interface: Indexed
Number of elements in the value set
- Specified by:
size in interface Indexed<ByteBuffer>
-
get
@Nullable public ByteBuffer get(int index)
Description copied from interface: Indexed
Get the value at specified position
- Specified by:
get in interface Indexed<ByteBuffer>
-
indexOf
public int indexOf(@Nullable ByteBuffer value)
Description copied from interface: Indexed
Returns the index of "value" in this Indexed object, or a negative number if the value is not present. The negative number is not guaranteed to be any particular number unless Indexed.isSorted() returns true, in which case it will be a negative number equal to (-(insertion point) - 1), in the manner of Arrays.binarySearch.
- Specified by:
indexOf in interface Indexed<ByteBuffer>
- Parameters:
value - value to search for
- Returns:
- index of value, or a negative number (equal to (-(insertion point) - 1) if Indexed.isSorted())
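As a small illustration of the return convention, the sketch below uses `Arrays.binarySearch`, which the description names as the model, and shows how a caller might turn a negative result back into an insertion point. `InsertionPointSketch` and `insertionPoint` are hypothetical names introduced for this example.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Druid.
final class InsertionPointSketch {
  /** Converts a negative "not present" result back into the insertion point. */
  static int insertionPoint(int notFoundResult) {
    return -(notFoundResult + 1);
  }

  public static void main(String[] args) {
    String[] sorted = {"a", "c", "e"};
    int found = Arrays.binarySearch(sorted, "c");    // present at index 1
    int missing = Arrays.binarySearch(sorted, "d");  // absent, would be inserted at index 2
    System.out.println(found);                       // 1
    System.out.println(missing);                     // -3
    System.out.println(insertionPoint(missing));     // 2
  }
}
```

A non-negative result is an index of a present value; any negative result means the value is absent, and for a sorted set the insertion point can be recovered as shown.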
-
isSorted
public boolean isSorted()
Description copied from interface: Indexed
Indicates if this value set is sorted, the implication being that the contract of Indexed.indexOf(T) is strengthened to return a negative number equal to (-(insertion point) - 1) when the value is not present in the set.
- Specified by:
isSorted in interface Indexed<ByteBuffer>
-
iterator
public Iterator<ByteBuffer> iterator()
- Specified by:
iterator in interface Iterable<ByteBuffer>
-
inspectRuntimeShape
public void inspectRuntimeShape(RuntimeShapeInspector inspector)
Description copied from interface: HotLoopCallee
Implementations of this method should call inspector.visit() with all fields of this class, which meet two conditions:
1. They are used in methods of this class, annotated with CalledFromHotLoop
2. They are either:
   a. Nullable objects
   b. Instances of HotLoopCallee
   c. Objects, which don't always have a specific class in runtime. For example, a field of type Set could be HashSet or TreeSet in runtime, depending on how this instance (the instance on which inspectRuntimeShape() is called) is configured.
   d. ByteBuffer or similar objects, where byte order matters
   e. boolean flags, affecting branch taking
   f. Arrays of objects, meeting any of conditions a-e.
- Specified by:
inspectRuntimeShape in interface HotLoopCallee
-
-