public final class FrontCodedIndexed extends Object implements Indexed<ByteBuffer>
Indexed specialized for storing variable-width binary values (such as utf8 encoded strings), which must be
sorted and unique, using 'front coding'. Front coding is a type of delta encoding for byte arrays, where sorted
values are grouped into buckets. The first value of the bucket is written entirely, and remaining values are stored
as a pair of an integer which indicates how much of the first byte array of the bucket to use as a prefix, followed
by the remaining bytes after the prefix to complete the value.
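
The encoding described above can be sketched in a few lines of Java. This is only an illustration of the idea, not the code used by this class: the names `FrontCodingSketch`, `Entry`, `encodeRest` and `decode` are hypothetical, and values are held in memory rather than written to a buffer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class FrontCodingSketch {
  /** A non-first bucket entry: how many bytes to reuse from the first value, plus the rest. */
  static final class Entry {
    final int prefixLength;
    final byte[] fragment;

    Entry(int prefixLength, byte[] fragment) {
      this.prefixLength = prefixLength;
      this.fragment = fragment;
    }
  }

  /** Counts the leading bytes that value shares with the first value of the bucket. */
  static int commonPrefixLength(byte[] first, byte[] value) {
    int limit = Math.min(first.length, value.length);
    int i = 0;
    while (i < limit && first[i] == value[i]) {
      i++;
    }
    return i;
  }

  /** Encodes every value after the first as (prefix length, fragment). */
  static List<Entry> encodeRest(List<byte[]> sortedBucketValues) {
    byte[] first = sortedBucketValues.get(0);
    List<Entry> entries = new ArrayList<>();
    for (int i = 1; i < sortedBucketValues.size(); i++) {
      byte[] value = sortedBucketValues.get(i);
      int prefix = commonPrefixLength(first, value);
      byte[] fragment = new byte[value.length - prefix];
      System.arraycopy(value, prefix, fragment, 0, fragment.length);
      entries.add(new Entry(prefix, fragment));
    }
    return entries;
  }

  /** Rebuilds a value from the bucket's first value and one entry. */
  static byte[] decode(byte[] first, Entry entry) {
    byte[] value = new byte[entry.prefixLength + entry.fragment.length];
    System.arraycopy(first, 0, value, 0, entry.prefixLength);
    System.arraycopy(entry.fragment, 0, value, entry.prefixLength, entry.fragment.length);
    return value;
  }

  public static void main(String[] args) {
    List<byte[]> bucket = new ArrayList<>();
    for (String s : new String[] {"hello", "help", "helper"}) {
      bucket.add(s.getBytes(StandardCharsets.UTF_8));
    }
    for (Entry e : encodeRest(bucket)) {
      String fragment = new String(e.fragment, StandardCharsets.UTF_8);
      String decoded = new String(decode(bucket.get(0), e), StandardCharsets.UTF_8);
      System.out.println(e.prefixLength + " + \"" + fragment + "\" -> " + decoded);
    }
  }
}
```

In this sketch, "help" in a bucket that starts with "hello" is stored as prefix length 3 plus the fragment "p", and "helper" as prefix length 3 plus "per".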
front coded indexed layout:
| version | bucket size | has null? | number of values | size of "offsets" + "buckets" | "offsets" | "buckets" |
| ------- | ----------- | --------- | ---------------- | ----------------------------- | --------- | --------- |
| byte | byte | byte | vbyte int | vbyte int | int[] | bucket[] |
"offsets" are the ending offsets of each bucket stored in order, stored as plain integers for easy random access.
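
Because only ending offsets are stored, the start of a bucket has to be derived from the previous entry. The sketch below shows that arithmetic; it assumes the offsets are relative to the start of the "buckets" section, which the layout above does not state explicitly, and the class and method names are hypothetical.

```java
final class BucketOffsetsSketch {
  /** Start of a bucket: the end of the previous bucket, or 0 for the first bucket. */
  static int bucketStart(int[] endOffsets, int bucket) {
    return bucket == 0 ? 0 : endOffsets[bucket - 1];
  }

  /** Length in bytes of a bucket, derived from two adjacent ending offsets. */
  static int bucketLength(int[] endOffsets, int bucket) {
    return endOffsets[bucket] - bucketStart(endOffsets, bucket);
  }

  public static void main(String[] args) {
    int[] endOffsets = {40, 95, 120}; // made-up ending offsets for three buckets
    for (int b = 0; b < endOffsets.length; b++) {
      System.out.println("bucket " + b + ": start=" + bucketStart(endOffsets, b)
          + " length=" + bucketLength(endOffsets, b));
    }
  }
}
```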
bucket layout:
| first value | prefix length | fragment | ... | prefix length | fragment |
| ----------- | ------------- | -------- | --- | ------------- | -------- |
| blob | vbyte int | blob | ... | vbyte int | blob |
blob layout:
| blob length | blob bytes |
| ----------- | ---------- |
| vbyte int | byte[] |
Getting a value first picks the appropriate bucket, finds its offset in the underlying buffer, then scans the bucket
values to seek to the correct position of the value within the bucket in order to reconstruct it using the prefix
length.
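
The lookup steps above can be sketched as follows. This is a simplified model rather than the class's own code. It uses fixed 4-byte ints where the real layout uses vbyte ints, keeps each bucket in its own buffer instead of using the "offsets" table, ignores the "has null?" flag, and assumes every bucket except the last holds exactly `bucketSize` values. All names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class BucketGetSketch {
  // Simplified bucket encoding, with 4-byte ints standing in for the vbyte ints:
  // [length][first value bytes], then per remaining value
  // [prefix length][fragment length][fragment bytes].
  static ByteBuffer encodeBucket(String... sortedValues) {
    ByteBuffer out = ByteBuffer.allocate(1024);
    byte[] first = sortedValues[0].getBytes(StandardCharsets.UTF_8);
    out.putInt(first.length);
    out.put(first);
    for (int i = 1; i < sortedValues.length; i++) {
      byte[] value = sortedValues[i].getBytes(StandardCharsets.UTF_8);
      int prefix = 0;
      while (prefix < first.length && prefix < value.length && first[prefix] == value[prefix]) {
        prefix++;
      }
      out.putInt(prefix);
      out.putInt(value.length - prefix);
      out.put(value, prefix, value.length - prefix);
    }
    out.flip();
    return out;
  }

  /** Scans one bucket to the requested position and reconstructs the value there. */
  static byte[] getFromBucket(ByteBuffer bucket, int offsetInBucket) {
    ByteBuffer buf = bucket.duplicate(); // independent position for this read
    byte[] first = new byte[buf.getInt()];
    buf.get(first);
    if (offsetInBucket == 0) {
      return first;
    }
    // Skip the entries that precede the target; they are variable-width, so each must be read.
    for (int i = 1; i < offsetInBucket; i++) {
      buf.getInt(); // prefix length, not needed while skipping
      int fragmentLength = buf.getInt();
      buf.position(buf.position() + fragmentLength);
    }
    int prefix = buf.getInt();
    int fragmentLength = buf.getInt();
    byte[] value = new byte[prefix + fragmentLength];
    System.arraycopy(first, 0, value, 0, prefix);
    buf.get(value, prefix, fragmentLength);
    return value;
  }

  /** Picks the bucket from the index, then delegates to the in-bucket scan. */
  static String get(ByteBuffer[] buckets, int bucketSize, int index) {
    byte[] value = getFromBucket(buckets[index / bucketSize], index % bucketSize);
    return new String(value, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    ByteBuffer[] buckets = {
        encodeBucket("hello", "help", "helper", "helping"),
        encodeBucket("world", "worm")
    };
    System.out.println(get(buckets, 4, 2));
    System.out.println(get(buckets, 4, 5));
  }
}
```

The sketch reads through `duplicate()` so that each call has its own position, which keeps the example free of shared state.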
Finding the index of a value involves binary searching the first values of each bucket to find the correct bucket,
then a linear scan within the bucket to find the matching value (or the negative number (-(insertion point) - 1) for values that
are not present).
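
A minimal sketch of this two-phase search is shown below. To keep the control flow visible it works on already-decoded values held as strings and compares them with `String.compareTo`, whereas the class itself works on encoded bytes. It also assumes every bucket except the last is full. The class and method names are hypothetical.

```java
final class FrontCodedSearchSketch {
  /**
   * Returns the index of value, or (-(insertion point) - 1) if it is absent.
   * buckets[i][0] is the first value of bucket i.
   */
  static int indexOf(String[][] buckets, int bucketSize, String value) {
    // Phase 1: binary search the first values for the last bucket whose first value <= value.
    int lo = 0;
    int hi = buckets.length - 1;
    int bucket = -1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int cmp = buckets[mid][0].compareTo(value);
      if (cmp == 0) {
        return mid * bucketSize; // value is the first value of this bucket
      }
      if (cmp < 0) {
        bucket = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    if (bucket < 0) {
      return -1; // smaller than every stored value: insertion point 0
    }
    // Phase 2: linear scan of the remaining values in the chosen bucket.
    String[] values = buckets[bucket];
    for (int i = 1; i < values.length; i++) {
      int cmp = values[i].compareTo(value);
      if (cmp == 0) {
        return bucket * bucketSize + i;
      }
      if (cmp > 0) {
        return -(bucket * bucketSize + i) - 1;
      }
    }
    return -(bucket * bucketSize + values.length) - 1;
  }

  public static void main(String[] args) {
    String[][] buckets = {
        {"apple", "apply", "apricot"},
        {"banana", "band", "bandit"},
        {"cat"}
    };
    int found = indexOf(buckets, 3, "band");
    int missing = indexOf(buckets, 3, "bat");
    System.out.println("band -> " + found);
    System.out.println("bat -> " + missing + ", insertion point " + (-(missing + 1)));
  }
}
```

A caller can recover the insertion point from a negative result `r` as `-(r + 1)`, the same convention used by `Arrays.binarySearch`.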
The value iterator reads an entire bucket at a time, reconstructing the values into an array to iterate within the
bucket before moving onto the next bucket as the iterator is consumed.
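
The bucket-at-a-time iteration can be sketched like this. The example is an in-memory model, not the iterator this class returns: buckets are represented by a hypothetical `Bucket` holder with string values, and prefix lengths count characters rather than bytes.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class BucketIteratorSketch implements Iterator<String> {
  /** Hypothetical in-memory bucket: first value plus (prefix length, fragment) pairs. */
  static final class Bucket {
    final String first;
    final int[] prefixLengths;
    final String[] fragments;

    Bucket(String first, int[] prefixLengths, String[] fragments) {
      this.first = first;
      this.prefixLengths = prefixLengths;
      this.fragments = fragments;
    }
  }

  private final List<Bucket> buckets;
  private int nextBucket = 0;
  private String[] current = new String[0]; // values of the bucket being iterated
  private int position = 0;

  BucketIteratorSketch(List<Bucket> buckets) {
    this.buckets = buckets;
  }

  /** Reconstructs every value of one bucket into an array. */
  static String[] decode(Bucket bucket) {
    String[] values = new String[1 + bucket.fragments.length];
    values[0] = bucket.first;
    for (int i = 0; i < bucket.fragments.length; i++) {
      values[i + 1] = bucket.first.substring(0, bucket.prefixLengths[i]) + bucket.fragments[i];
    }
    return values;
  }

  @Override
  public boolean hasNext() {
    // Decode the next bucket only once the current one has been fully consumed.
    while (position >= current.length && nextBucket < buckets.size()) {
      current = decode(buckets.get(nextBucket++));
      position = 0;
    }
    return position < current.length;
  }

  @Override
  public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return current[position++];
  }

  public static void main(String[] args) {
    List<Bucket> buckets = Arrays.asList(
        new Bucket("hello", new int[] {3, 3}, new String[] {"p", "per"}),
        new Bucket("world", new int[] {3}, new String[] {"m"}));
    Iterator<String> it = new BucketIteratorSketch(buckets);
    while (it.hasNext()) {
      System.out.println(it.next());
    }
  }
}
```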
This class is not thread-safe, since its operations modify the position of a shared buffer.
| Modifier and Type | Method and Description |
|---|---|
| ByteBuffer | get(int index): Get the value at the specified position |
| int | indexOf(ByteBuffer value): Returns the index of "value" in this Indexed object, or a negative number if the value is not present. |
| void | inspectRuntimeShape(RuntimeShapeInspector inspector): Implementations of this method should call inspector.visit() with all fields of this class which meet two conditions. |
| boolean | isSorted(): Indicates if this value set is sorted, the implication being that the contract of Indexed.indexOf(T) is strengthened to return a negative number equal to (-(insertion point) - 1) when the value is not present in the set. |
| Iterator<ByteBuffer> | iterator() |
| static com.google.common.base.Supplier<FrontCodedIndexed> | read(ByteBuffer buffer, ByteOrder ordering) |
| int | size(): Number of elements in the value set |
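Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Indexed: checkIndex
Methods inherited from interface Iterable: forEach, spliterator
public static com.google.common.base.Supplier<FrontCodedIndexed> read(ByteBuffer buffer, ByteOrder ordering)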
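public int size()
Description copied from interface: Indexed
Number of elements in the value set
Specified by: size in interface Indexed<ByteBuffer>
@Nullable public ByteBuffer get(int index)
Description copied from interface: Indexed
Get the value at the specified position
Specified by: get in interface Indexed<ByteBuffer>
public int indexOf(@Nullable ByteBuffer value)
Description copied from interface: Indexed
Returns the index of "value" in this Indexed object, or a negative number if the value is not present. The negative number is not guaranteed to be any particular number unless Indexed.isSorted() returns true, in
which case it will be a negative number equal to (-(insertion point) - 1), in the manner of Arrays.binarySearch.
Specified by: indexOf in interface Indexed<ByteBuffer>
Parameters: value - value to search for
Returns: index of value, or a negative number (equal to (-(insertion point) - 1) if Indexed.isSorted())
public boolean isSorted()
Description copied from interface: Indexed
Indicates if this value set is sorted, the implication being that the contract of Indexed.indexOf(T) is strengthened
to return a negative number equal to (-(insertion point) - 1) when the value is not present in the set.
Specified by: isSorted in interface Indexed<ByteBuffer>
public Iterator<ByteBuffer> iterator()
Specified by: iterator in interface Iterable<ByteBuffer>
public void inspectRuntimeShape(RuntimeShapeInspector inspector)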
Description copied from interface: HotLoopCallee
Implementations of this method should call inspector.visit() with all fields of this class, which meet two
conditions:
1. They are used in methods of this class, annotated with CalledFromHotLoop
2. They are either:
a. Nullable objects
b. Instances of HotLoopCallee
c. Objects which don't always have a specific class at runtime. For example, a field of type Set could be HashSet or TreeSet at runtime, depending on how
this instance (the instance on which inspectRuntimeShape() is called) is configured.
d. ByteBuffer or similar objects, where byte order matters
e. boolean flags, affecting branch taking
f. Arrays of objects, meeting any of conditions a-e.
Specified by: inspectRuntimeShape in interface HotLoopCallee
Copyright © 2011–2022 The Apache Software Foundation. All rights reserved.