Class GenericIndexed<T>
- java.lang.Object
-
- org.apache.druid.segment.data.GenericIndexed<T>
-
- All Implemented Interfaces:
Closeable, AutoCloseable, Iterable<T>, HotLoopCallee, CloseableIndexed<T>, Indexed<T>, Serializer
public abstract class GenericIndexed<T> extends Object implements CloseableIndexed<T>, Serializer
A generic, flat storage mechanism. Use the static methods fromArray() or fromIterable() to construct. If the input is sorted, binary search index lookups are supported. If the input is not sorted, only array-like index lookups are supported.

V1 Storage Format:
byte 1: version (0x1)
byte 2 == 0x1 => allowReverseLookup
bytes 3-6 => numBytesUsed
bytes 7-10 => numElements
bytes 10-((numElements * 4) + 10): integers representing *end* offsets of byte serialized values
bytes ((numElements * 4) + 10)-(numBytesUsed + 2): 4-byte integer representing length of value, followed by bytes for value. The stored length has no meaning if the next offset is strictly greater than the current offset. If the two offsets are the same, -1 in this field means null, and 0 in this field means some object (potentially non-null; e.g. in the string case, one that is serialized as an empty sequence of bytes).
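The V1 layout above can be illustrated with a small round trip over a plain ByteBuffer. This is only a sketch, not Druid's implementation: the class V1LayoutSketch and its methods are hypothetical, the buffer is assumed to be big-endian, numBytesUsed is assumed to count every byte that follows that field, and end offsets are assumed to be relative to the start of the values region and to include each value's 4-byte length field.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the V1 layout; not part of Druid.
public class V1LayoutSketch {
    // Encodes strings as: version, allowReverseLookup, numBytesUsed,
    // numElements, end offsets, then (length field + bytes) per value.
    public static ByteBuffer encode(String[] values) {
        byte[][] encoded = new byte[values.length][];
        int valuesSize = 0;
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i] == null
                ? new byte[0]
                : values[i].getBytes(StandardCharsets.UTF_8);
            valuesSize += Integer.BYTES + encoded[i].length;
        }
        // Assumption: numBytesUsed covers numElements, the offsets and the values.
        int numBytesUsed = Integer.BYTES + values.length * Integer.BYTES + valuesSize;
        ByteBuffer buf = ByteBuffer.allocate(2 + Integer.BYTES + numBytesUsed);
        buf.put((byte) 0x1);       // version
        buf.put((byte) 0x1);       // allowReverseLookup
        buf.putInt(numBytesUsed);
        buf.putInt(values.length); // numElements
        int end = 0;
        for (byte[] e : encoded) {
            end += Integer.BYTES + e.length;
            buf.putInt(end);       // end offset of this value
        }
        for (int i = 0; i < values.length; i++) {
            buf.putInt(values[i] == null ? -1 : 0); // -1 marks null
            buf.put(encoded[i]);
        }
        buf.flip();
        return buf;
    }

    // Array-like lookup of element `index` using the end offsets.
    public static String get(ByteBuffer buf, int index) {
        int numElements = buf.getInt(6); // 0-based absolute position
        if (index < 0 || index >= numElements) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        int offsetsStart = 10;
        int valuesStart = offsetsStart + numElements * Integer.BYTES;
        int prevEnd = index == 0 ? 0 : buf.getInt(offsetsStart + (index - 1) * Integer.BYTES);
        int end = buf.getInt(offsetsStart + index * Integer.BYTES);
        int lengthField = buf.getInt(valuesStart + prevEnd);
        int start = prevEnd + Integer.BYTES; // skip the length field
        if (start == end) {
            // No value bytes: the length field distinguishes null from empty.
            return lengthField == -1 ? null : "";
        }
        byte[] out = new byte[end - start];
        ByteBuffer view = buf.duplicate();
        view.position(valuesStart + start);
        view.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buf = encode(new String[] {"a", null, "", "druid"});
        for (int i = 0; i < 4; i++) {
            System.out.println(i + " -> " + get(buf, i));
        }
    }
}
```

Note how the null entry and the empty string both occupy only a length field, so the -1 marker is the only thing that tells them apart.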
V2 Storage Format:
Meta, header and value files are separate, and the header file is stored in native endian byte order.
Meta File:
byte 1: version (0x2)
byte 2 == 0x1 => allowReverseLookup
bytes 3-6: numberOfElementsPerValueFile expressed as power of 2. That means all the value files contain the same number of items, except the last value file, which may have fewer elements.
bytes 7-10 => numElements
bytes 11-14 => columnNameLength
bytes 15-columnNameLength => columnName

The header file name is identified as: StringUtils.format("%s_header", columnName)
Value files are identified as: StringUtils.format("%s_value_%d", columnName, fileNumber)
Number of value files == numElements/numberOfElementsPerValueFile

The version EncodedStringDictionaryWriter.VERSION is reserved and must never be specified as the GenericIndexed version byte, else it will interfere with string column deserialization.
- See Also:
GenericIndexedWriter
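The V2 file naming scheme described above can be sketched with plain JDK calls. The class V2FileNamingSketch and its methods are hypothetical, String.format stands in for Druid's StringUtils.format, and two points are assumptions rather than documented behavior: that element i lives in value file (i >> log2ElementsPerFile), and that the file count is rounded up so the partially filled last file is included.

```java
import java.util.Locale;

// Hypothetical sketch of V2 file naming; not part of Druid.
public class V2FileNamingSketch {
    public static String headerFileName(String columnName) {
        return String.format(Locale.ENGLISH, "%s_header", columnName);
    }

    public static String valueFileName(String columnName, int fileNumber) {
        return String.format(Locale.ENGLISH, "%s_value_%d", columnName, fileNumber);
    }

    // Assumption: because the per-file element count is a power of 2,
    // the file holding an element can be found with a right shift.
    public static int fileNumberFor(int elementIndex, int log2ElementsPerFile) {
        return elementIndex >> log2ElementsPerFile;
    }

    // Assumption: round up so that a last, partially filled file is counted.
    public static int numValueFiles(int numElements, int log2ElementsPerFile) {
        int perFile = 1 << log2ElementsPerFile;
        return (numElements + perFile - 1) / perFile;
    }

    public static void main(String[] args) {
        System.out.println(headerFileName("dim"));
        System.out.println(valueFileName("dim", fileNumberFor(5, 2)));
        System.out.println(numValueFiles(10, 2));
    }
}
```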
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
class GenericIndexed.BufferIndexed
    Single-threaded view.
-
Field Summary
Fields
Modifier and Type, Field, Description
protected boolean allowReverseLookup
protected int size
protected ObjectStrategy<T> strategy
static ObjectStrategy<String> STRING_STRATEGY
static ObjectStrategy<ByteBuffer> UTF8_STRATEGY
    An ObjectStrategy that returns a big-endian ByteBuffer pointing to original data.
-
Constructor Summary
Constructors
Constructor, Description
GenericIndexed(ObjectStrategy<T> strategy, boolean allowReverseLookup, int size)
-
Method Summary
All Methods, Static Methods, Instance Methods, Abstract Methods, Concrete Methods
Modifier and Type, Method, Description
protected void checkIndex(int index)
    Checks if index is a valid `element index` in GenericIndexed.
void close()
protected T copyBufferAndGet(ByteBuffer valueBuffer, int startOffset, int endOffset)
static <T> GenericIndexed<T> fromArray(T[] objects, ObjectStrategy<T> strategy)
static <T> GenericIndexed<T> fromIterable(Iterable<T> objectsIterable, ObjectStrategy<T> strategy)
Class<? extends T> getClazz()
abstract long getSerializedSize()
    Returns the number of bytes that this Serializer will write to the output _channel_ (not smoosher) on a Serializer.writeTo(java.nio.channels.WritableByteChannel, org.apache.druid.java.util.common.io.smoosh.FileSmoosher) call.
int indexOf(T value)
    Returns the index of "value" in this GenericIndexed object, or (-(insertion point) - 1) if the value is not present, in the manner of Arrays.binarySearch.
boolean isSorted()
    Indicates if this value set is sorted, the implication being that the contract of Indexed.indexOf(T) is strengthened to return a negative number equal to (-(insertion point) - 1) when the value is not present in the set.
Iterator<T> iterator()
static GenericIndexed<ResourceHolder<ByteBuffer>> ofCompressedByteBuffers(Iterable<ByteBuffer> buffers, CompressionStrategy compression, int bufferSize, ByteOrder order, Closer closer)
static <T> GenericIndexed<T> read(ByteBuffer buffer, ObjectStrategy<T> strategy)
static <T> GenericIndexed<T> read(ByteBuffer buffer, ObjectStrategy<T> strategy, SmooshedFileMapper fileMapper)
abstract GenericIndexed.BufferIndexed singleThreaded()
int size()
    Number of elements in the value set
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.druid.query.monomorphicprocessing.HotLoopCallee
inspectRuntimeShape
-
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Methods inherited from interface org.apache.druid.segment.serde.Serializer
writeTo
-
Field Detail
-
UTF8_STRATEGY
public static final ObjectStrategy<ByteBuffer> UTF8_STRATEGY
An ObjectStrategy that returns a big-endian ByteBuffer pointing to original data. The returned ByteBuffer is a fresh read-only instance, so it is OK for callers to modify its position, limit, etc. However, it does point to the original data, so callers must take care not to use it if the original data may have been freed. The compare method of this instance uses StringUtils.compareUtf8UsingJavaStringOrdering(byte[], byte[]) so that behavior is consistent with STRING_STRATEGY.
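The "fresh read-only instance that still points to the original data" behavior can be illustrated with plain java.nio. This sketch does not use UTF8_STRATEGY itself; the class ReadOnlyViewSketch is hypothetical and only shows the general ByteBuffer semantics the description relies on.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration using only java.nio; not Druid code.
public class ReadOnlyViewSketch {
    // Returns a read-only view that shares content with `original`
    // but has its own position, limit and mark.
    public static ByteBuffer view(ByteBuffer original) {
        return original.asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        ByteBuffer original = ByteBuffer.wrap("druid".getBytes(StandardCharsets.UTF_8));
        ByteBuffer v = view(original);

        // Moving the view's position does not move the original's position.
        v.position(2);
        System.out.println(original.position());

        // The view cannot be written to.
        System.out.println(v.isReadOnly());

        // Content is shared: a change to the original is visible through the view.
        original.put(0, (byte) 'D');
        System.out.println((char) v.get(0));
    }
}
```

Because the content is shared rather than copied, the view is only safe to use for as long as the underlying data remains valid.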
-
STRING_STRATEGY
public static final ObjectStrategy<String> STRING_STRATEGY
-
strategy
protected final ObjectStrategy<T> strategy
-
allowReverseLookup
protected final boolean allowReverseLookup
-
size
protected final int size
-
Constructor Detail
-
GenericIndexed
public GenericIndexed(ObjectStrategy<T> strategy, boolean allowReverseLookup, int size)
-
Method Detail
-
read
public static <T> GenericIndexed<T> read(ByteBuffer buffer, ObjectStrategy<T> strategy)
-
read
public static <T> GenericIndexed<T> read(ByteBuffer buffer, ObjectStrategy<T> strategy, SmooshedFileMapper fileMapper)
-
fromArray
public static <T> GenericIndexed<T> fromArray(T[] objects, ObjectStrategy<T> strategy)
-
ofCompressedByteBuffers
public static GenericIndexed<ResourceHolder<ByteBuffer>> ofCompressedByteBuffers(Iterable<ByteBuffer> buffers, CompressionStrategy compression, int bufferSize, ByteOrder order, Closer closer)
-
fromIterable
public static <T> GenericIndexed<T> fromIterable(Iterable<T> objectsIterable, ObjectStrategy<T> strategy)
-
singleThreaded
public abstract GenericIndexed.BufferIndexed singleThreaded()
-
getSerializedSize
public abstract long getSerializedSize()
Description copied from interface: Serializer
Returns the number of bytes that this Serializer will write to the output _channel_ (not smoosher) on a Serializer.writeTo(java.nio.channels.WritableByteChannel, org.apache.druid.java.util.common.io.smoosh.FileSmoosher) call.
- Specified by:
getSerializedSize in interface Serializer
-
checkIndex
protected void checkIndex(int index)
Checks if index is a valid `element index` in GenericIndexed. Similar to Preconditions.checkElementIndex(), except that this method throws IAE with a custom error message. Used here to preserve the existing behavior (same error message and exception) of V1 GenericIndexed.
- Parameters:
index - index identifying an element of a GenericIndexed.
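A bounds check of the kind described above might look like the following sketch. The class CheckIndexSketch is hypothetical, the size is passed as a parameter instead of being read from a field, IllegalArgumentException is used as a stand-in for Druid's IAE, and the message texts are illustrative rather than the actual V1 messages.

```java
// Hypothetical sketch of an element-index check; not Druid code.
public class CheckIndexSketch {
    // Valid element indexes are 0 <= index < size.
    public static void checkIndex(int index, int size) {
        if (index < 0) {
            // Illustrative message; the real wording is not given in this document.
            throw new IllegalArgumentException(
                String.format("Index[%d] < 0", index));
        }
        if (index >= size) {
            throw new IllegalArgumentException(
                String.format("Index[%d] >= size[%d]", index, size));
        }
    }

    public static void main(String[] args) {
        checkIndex(0, 3); // passes silently
        try {
            checkIndex(3, 3);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```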
-
size
public int size()
Description copied from interface: Indexed
Number of elements in the value set
-
indexOf
public int indexOf(@Nullable T value)
Returns the index of "value" in this GenericIndexed object, or (-(insertion point) - 1) if the value is not present, in the manner of Arrays.binarySearch. This strengthens the contract of Indexed, which only guarantees that values-not-found will return some negative number.
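The (-(insertion point) - 1) convention is the same one used by java.util.Arrays.binarySearch, so it can be demonstrated without a GenericIndexed instance. In this sketch the class InsertionPointSketch and the helper insertionPoint are hypothetical names introduced for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration of the binarySearch return convention.
public class InsertionPointSketch {
    // Converts a binarySearch-style result into the position where the
    // value is, or where it would have to be inserted to keep the order.
    public static int insertionPoint(int result) {
        return result >= 0 ? result : -(result + 1);
    }

    public static void main(String[] args) {
        String[] sorted = {"apple", "banana", "cherry"};

        int found = Arrays.binarySearch(sorted, "banana");
        int missing = Arrays.binarySearch(sorted, "blueberry");

        System.out.println(found);                   // non-negative: value is present
        System.out.println(missing);                 // negative: value is absent
        System.out.println(insertionPoint(missing)); // where "blueberry" would go
    }
}
```

A caller can therefore test `result < 0` to detect absence and still recover the position the value would occupy in the sorted set.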
-
isSorted
public boolean isSorted()
Description copied from interface: Indexed
Indicates if this value set is sorted, the implication being that the contract of Indexed.indexOf(T) is strengthened to return a negative number equal to (-(insertion point) - 1) when the value is not present in the set.
-
copyBufferAndGet
@Nullable protected T copyBufferAndGet(ByteBuffer valueBuffer, int startOffset, int endOffset)
-
close
public void close()
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
-