Class FrontCodedIndexed

  • All Implemented Interfaces:
    Iterable<ByteBuffer>, HotLoopCallee, Indexed<ByteBuffer>
    Direct Known Subclasses:
    FrontCodedIndexed.FrontCodedV0, FrontCodedIndexed.FrontCodedV1

    public abstract class FrontCodedIndexed
    extends Object
    implements Indexed<ByteBuffer>
    Indexed specialized for storing variable-width binary values (such as UTF-8 encoded strings), which must be sorted and unique, using 'front coding'. Front coding is a type of delta encoding for byte arrays, where sorted values are grouped into buckets. The first value of the bucket is written entirely, and each remaining value is stored as a pair: an integer indicating how much of the first byte array of the bucket to use as a prefix, followed by the remaining bytes after the prefix that complete the value. With 'incremental' buckets, the prefix is computed against the immediately preceding value in the bucket rather than against the first value of the bucket.
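
The bucket encoding described above can be sketched in memory as follows. This is only an illustration under stated assumptions, not the class's actual writer: `FrontCodingSketch`, `Entry`, and `encodeBucket` are hypothetical names, and the first value is modeled as an entry with prefix length 0 rather than as a standalone blob.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FrontCodingSketch {
    /** Hypothetical in-memory entry: bytes to reuse as prefix, plus the remaining bytes. */
    public static final class Entry {
        public final int prefixLength;
        public final byte[] fragment;

        Entry(int prefixLength, byte[] fragment) {
            this.prefixLength = prefixLength;
            this.fragment = fragment;
        }
    }

    /** Number of leading bytes that a and b have in common. */
    static int commonPrefixLength(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    /** Front-codes one bucket of sorted, unique values. */
    public static List<Entry> encodeBucket(List<byte[]> sortedValues, boolean incremental) {
        List<Entry> out = new ArrayList<>();
        byte[] first = sortedValues.get(0);
        out.add(new Entry(0, first)); // the first value is written in full
        for (int i = 1; i < sortedValues.size(); i++) {
            byte[] value = sortedValues.get(i);
            // Plain buckets use the first value as the reference; incremental buckets use the previous value.
            byte[] reference = incremental ? sortedValues.get(i - 1) : first;
            int prefix = commonPrefixLength(reference, value);
            out.add(new Entry(prefix, Arrays.copyOfRange(value, prefix, value.length)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> bucket = new ArrayList<>();
        for (String s : new String[] {"druid", "druidism", "drum", "drummer"}) {
            bucket.add(s.getBytes(StandardCharsets.UTF_8));
        }
        for (boolean incremental : new boolean[] {false, true}) {
            System.out.println(incremental ? "incremental:" : "plain:");
            for (Entry e : encodeBucket(bucket, incremental)) {
                System.out.println("  " + e.prefixLength + " + \""
                    + new String(e.fragment, StandardCharsets.UTF_8) + "\"");
            }
        }
    }
}
```

For "drummer", the plain bucket stores prefix length 3 and fragment "mmer" (shared with "druid"), while the incremental bucket stores prefix length 4 and fragment "mer" (shared with "drum").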

    front coded indexed layout:

    | version | bucket size | has null? | number of values | size of "offsets" + "buckets" | "offsets" | "buckets" |
    | ------- | ----------- | --------- | ---------------- | ----------------------------- | --------- | --------- |
    | byte    | byte        | byte      | vbyte int        | vbyte int                     | int[]     | bucket[]  |

    "offsets" are the ending offsets of each bucket stored in order, stored as plain integers for easy random access.

    bucket layout:

    | first value | prefix length | fragment | ... | prefix length | fragment |
    | ----------- | ------------- | -------- | --- | ------------- | -------- |
    | blob        | vbyte int     | blob     | ... | vbyte int     | blob     |

    blob layout:

    | blob length | blob bytes |
    | ----------- | ---------- |
    | vbyte int   | byte[]     |

    Getting a value first picks the appropriate bucket, finds its offset in the underlying buffer, then scans the bucket values to seek to the correct position of the value within the bucket in order to reconstruct it using the prefix length.
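
The lookup steps above can be sketched against a simplified buffer. This is a hedged illustration, not the class's real format: `FrontCodedGetSketch` is a hypothetical name, every length is written as a plain 4-byte int instead of a vbyte int, only non-incremental buckets are modeled, and the header and null handling are omitted.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FrontCodedGetSketch {
    private final ByteBuffer buckets; // concatenated bucket bytes (simplified layout)
    private final int[] endOffsets;   // ending offset of each bucket, in order
    private final int bucketSize;

    /** Builds the simplified buffer from sorted, unique values (assumption: ASCII strings). */
    public FrontCodedGetSketch(String[] sortedValues, int bucketSize) {
        this.bucketSize = bucketSize;
        this.endOffsets = new int[(sortedValues.length + bucketSize - 1) / bucketSize];
        ByteBuffer buf = ByteBuffer.allocate(1 << 16); // plenty for a small sketch
        for (int i = 0; i < sortedValues.length; i++) {
            byte[] value = sortedValues[i].getBytes(StandardCharsets.UTF_8);
            int prefix = 0;
            if (i % bucketSize != 0) {
                // Not the first value of the bucket: store the prefix shared with the first value.
                byte[] first = sortedValues[i - i % bucketSize].getBytes(StandardCharsets.UTF_8);
                int n = Math.min(first.length, value.length);
                while (prefix < n && first[prefix] == value[prefix]) {
                    prefix++;
                }
                buf.putInt(prefix);
            }
            buf.putInt(value.length - prefix);            // blob length
            buf.put(value, prefix, value.length - prefix); // blob bytes
            endOffsets[i / bucketSize] = buf.position();
        }
        buf.flip();
        this.buckets = buf;
    }

    public String get(int index) {
        int bucket = index / bucketSize;
        int posInBucket = index % bucketSize;
        ByteBuffer b = buckets.duplicate(); // independent position, shared content
        b.position(bucket == 0 ? 0 : endOffsets[bucket - 1]);
        byte[] first = new byte[b.getInt()];
        b.get(first);
        if (posInBucket == 0) {
            return new String(first, StandardCharsets.UTF_8);
        }
        // Scan past the entries that precede the requested position.
        for (int i = 1; i < posInBucket; i++) {
            b.getInt(); // prefix length, not needed while skipping
            int fragmentLength = b.getInt();
            b.position(b.position() + fragmentLength);
        }
        int prefix = b.getInt();
        int fragmentLength = b.getInt();
        byte[] value = new byte[prefix + fragmentLength];
        System.arraycopy(first, 0, value, 0, prefix); // reuse the prefix of the first value
        b.get(value, prefix, fragmentLength);         // append the stored fragment
        return new String(value, StandardCharsets.UTF_8);
    }
}
```

Because entries are variable-width, the position of the n-th value inside a bucket is not known up front, which is why the sketch has to walk the preceding entries before it can rebuild the requested one.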

    Finding the index of a value involves binary searching the first values of each bucket to find the correct bucket, then linearly scanning within the bucket to find the matching value (or, for values that are not present, the negative result -(insertion point) - 1).
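
A minimal sketch of this two-phase search, assuming the buckets have already been decoded into plain strings. `FrontCodedIndexOfSketch` and its `indexOf` are hypothetical, and `String.compareTo` stands in for whatever byte comparison the real class uses.

```java
public class FrontCodedIndexOfSketch {
    /**
     * Returns the index of value, or -(insertion point) - 1 if it is absent.
     * Every bucket except possibly the last is assumed to hold bucketSize values.
     */
    public static int indexOf(String[][] buckets, int bucketSize, String value) {
        // Phase 1: binary search the first values for the last bucket whose first value <= value.
        int lo = 0;
        int hi = buckets.length - 1;
        int bucket = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = buckets[mid][0].compareTo(value);
            if (cmp == 0) {
                return mid * bucketSize; // exact match on a bucket's first value
            }
            if (cmp < 0) {
                bucket = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (bucket < 0) {
            return -1; // smaller than every value: insertion point 0
        }
        // Phase 2: linear scan of the remaining values in the chosen bucket.
        String[] values = buckets[bucket];
        for (int i = 1; i < values.length; i++) {
            int cmp = values[i].compareTo(value);
            if (cmp == 0) {
                return bucket * bucketSize + i;
            }
            if (cmp > 0) {
                return -(bucket * bucketSize + i) - 1; // would be inserted before values[i]
            }
        }
        return -(bucket * bucketSize + values.length) - 1; // would be inserted after the bucket
    }
}
```

With buckets {"druid", "druidism", "drum", "drummer"} and {"hello", "help", "helper"} and a bucket size of 4, searching for "help" returns 5, while searching for "eagle" returns -5 (insertion point 4).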

    The value iterator reads an entire bucket at a time, reconstructing the values into an array to iterate within the bucket before moving onto the next bucket as the iterator is consumed.

    This class is not thread-safe, since it modifies the positions of a shared buffer during operation.

    • Field Detail

      • adjustedNumValues

        protected final int adjustedNumValues
      • adjustIndex

        protected final int adjustIndex
      • bucketSize

        protected final int bucketSize
      • numBuckets

        protected final int numBuckets
      • div

        protected final int div
      • rem

        protected final int rem
      • offsetsPosition

        protected final int offsetsPosition
      • bucketsPosition

        protected final int bucketsPosition
      • hasNull

        protected final boolean hasNull
      • lastBucketNumValues

        protected final int lastBucketNumValues
    • Method Detail

      • validateVersion

        public static byte validateVersion(byte version)
      • size

        public int size()
        Description copied from interface: Indexed
        Number of elements in the value set
        Specified by:
        size in interface Indexed<ByteBuffer>
      • indexOf

        public int indexOf(@Nullable
                           ByteBuffer value)
        Description copied from interface: Indexed
        Returns the index of "value" in this Indexed object, or a negative number if the value is not present. The negative number is not guaranteed to be any particular number unless Indexed.isSorted() returns true, in which case it will be a negative number equal to (-(insertion point) - 1), in the manner of Arrays.binarySearch.
        Specified by:
        indexOf in interface Indexed<ByteBuffer>
        Parameters:
        value - value to search for
        Returns:
        index of value, or a negative number (equal to (-(insertion point) - 1) if Indexed.isSorted())
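
The (-(insertion point) - 1) convention can be illustrated with `Arrays.binarySearch` itself, which the description names as its model. `InsertionPointSketch` and `insertionPoint` are hypothetical helpers, not part of the Indexed interface.

```java
import java.util.Arrays;

public class InsertionPointSketch {
    /** Recovers the insertion point from a negative search result; non-negative results pass through. */
    public static int insertionPoint(int searchResult) {
        return searchResult >= 0 ? searchResult : -(searchResult + 1);
    }

    public static void main(String[] args) {
        String[] sorted = {"a", "c", "e"};
        int found = Arrays.binarySearch(sorted, "c");   // 1: present at index 1
        int missing = Arrays.binarySearch(sorted, "d"); // -3: absent, insertion point 2
        System.out.println(found + " " + missing + " " + insertionPoint(missing));
    }
}
```

Encoding the insertion point this way keeps every "not found" result strictly negative, including the case where the value would be inserted at index 0.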
      • isSorted

        public boolean isSorted()
        Description copied from interface: Indexed
        Indicates if this value set is sorted, the implication being that the contract of Indexed.indexOf(T) is strengthened to return a negative number equal to (-(insertion point) - 1) when the value is not present in the set.
        Specified by:
        isSorted in interface Indexed<ByteBuffer>
      • inspectRuntimeShape

        public void inspectRuntimeShape(RuntimeShapeInspector inspector)
        Description copied from interface: HotLoopCallee
        Implementations of this method should call inspector.visit() with all fields of this class, which meet two conditions:
        1. They are used in methods of this class, annotated with CalledFromHotLoop
        2. They are either:
           a. Nullable objects
           b. Instances of HotLoopCallee
           c. Objects, which don't always have a specific class in runtime. For example, a field of type Set could be HashSet or TreeSet in runtime, depending on how this instance (the instance on which inspectRuntimeShape() is called) is configured.
           d. ByteBuffer or similar objects, where byte order matters
           e. boolean flags, affecting branch taking
           f. Arrays of objects, meeting any of conditions a-e.
        Specified by:
        inspectRuntimeShape in interface HotLoopCallee