EncodedType - class of a single encoded value
EncodedKeyComponentType - A row key contains a component for each dimension; this param specifies the class of this dimension's key component. A column type that supports multivalue rows should use an array type (e.g., Strings would use int[]). Column types without multivalue row support should use single objects (e.g., Long, Float).
ActualType - class of a single actual value

public interface DimensionIndexer<EncodedType extends Comparable<EncodedType>, EncodedKeyComponentType, ActualType extends Comparable<ActualType>>
IncrementalIndex).
Ingested row values are passed to a DimensionIndexer, which will update its internal data structures such as
a value->ID dictionary as row values are seen.
The DimensionIndexer is also responsible for implementing various value lookup operations,
such as conversion between an encoded value and its full representation. It maintains knowledge of the
mappings between encoded values and actual values.
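
To make the value->ID dictionary concrete, here is a minimal sketch of the idea. `ToyStringDictionary`, `encodeRow`, and `decodeRow` are hypothetical names invented for this illustration; they are not part of Druid and only loosely mirror what an indexer for a String dimension might do, assuming int IDs and an `int[]` key component.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a String dimension's dictionary; hypothetical, not a Druid class. */
public class ToyStringDictionary {
    // actual value -> encoded ID
    private final Map<String, Integer> valueToId = new HashMap<>();
    // encoded ID -> actual value (the list index is the ID)
    private final List<String> idToValue = new ArrayList<>();

    /** Encodes one row's values, assigning new IDs in order of first appearance. */
    public int[] encodeRow(List<String> rowValues) {
        int[] encoded = new int[rowValues.size()];
        for (int i = 0; i < encoded.length; i++) {
            String value = rowValues.get(i);
            Integer id = valueToId.get(value);
            if (id == null) {
                id = idToValue.size();
                valueToId.put(value, id);
                idToValue.add(value);
            }
            encoded[i] = id;
        }
        return encoded;
    }

    /** Converts an encoded key component back to its actual values. */
    public List<String> decodeRow(int[] encoded) {
        List<String> actual = new ArrayList<>(encoded.length);
        for (int id : encoded) {
            actual.add(idToValue.get(id));
        }
        return actual;
    }

    public static void main(String[] args) {
        ToyStringDictionary dict = new ToyStringDictionary();
        dict.encodeRow(Arrays.asList("Hello", "World"));               // encoded as 0, 1
        int[] second = dict.encodeRow(Arrays.asList("World", "Apple")); // encoded as 1, 2
        System.out.println(dict.decodeRow(second));                    // prints [World, Apple]
    }
}
```

The array return type reflects the note above that a column type supporting multivalue rows would use an array type for its key component.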
Sorting and Ordering
--------------------
When encoding is present, there are two relevant orderings for the encoded values.
1.) Ordering based on encoded value's order of ingestion
2.) Ordering based on converted actual value
Suppose we have a new String dimension DimA, which sees the values "Hello", "World", and "Apple", in that order.
This would correspond to dictionary encodings of "Hello"=0, "World"=1, and "Apple"=2, by the order
in which these values were first seen during ingestion.
However, some use cases require the encodings to be sorted by their associated actual values.
In this example, that ordering would be "Apple"=0, "Hello"=1, "World"=2.
The first ordering will be referred to as "Unsorted" in the documentation for this interface, and
the second ordering will be referred to as "Sorted".
The unsorted ordering is used during ingestion, within the IncrementalIndexRow
keys; the encodings are built as rows are ingested, taking the order in which new dimension values are seen.
The generation of a sorted encoding takes place during segment creation when indexes are merged/persisted.
The sorted ordering will be used for dimension value arrays in that context and when reading from
persisted segments.
Note that after calling the methods below that deal with sorted encodings,
- getUnsortedEncodedValueFromSorted()
- getSortedIndexedValues()
- convertUnsortedEncodedKeyComponentToSortedEncodedKeyComponent()
calling processRowValsToUnsortedEncodedKeyComponent() afterwards can invalidate previously read sorted encoding values
(i.e., new values could be added that are inserted between existing values in the ordering).
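
The DimA example and the invalidation caveat can be sketched as follows. `EncodingOrderSketch` and its methods are hypothetical and exist only for this illustration; the sketch assumes String values compared by natural ordering and is not how any Druid indexer is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of unsorted vs. sorted encodings; not a Druid class. */
public class EncodingOrderSketch {
    private final Map<String, Integer> valueToUnsortedId = new HashMap<>();
    private final List<String> valuesInIngestionOrder = new ArrayList<>();

    /** Returns the unsorted (ingestion-order) ID, assigning a new one for unseen values. */
    public int add(String value) {
        Integer id = valueToUnsortedId.get(value);
        if (id == null) {
            id = valuesInIngestionOrder.size();
            valueToUnsortedId.put(value, id);
            valuesInIngestionOrder.add(value);
        }
        return id;
    }

    /** Snapshot of the sorted encoding: result[sortedId] is the matching unsorted ID. */
    public int[] sortedToUnsorted() {
        List<String> sorted = new ArrayList<>(valuesInIngestionOrder);
        Collections.sort(sorted);
        int[] result = new int[sorted.size()];
        for (int sortedId = 0; sortedId < result.length; sortedId++) {
            result[sortedId] = valueToUnsortedId.get(sorted.get(sortedId));
        }
        return result;
    }

    public static void main(String[] args) {
        EncodingOrderSketch dimA = new EncodingOrderSketch();
        dimA.add("Hello"); // unsorted ID 0
        dimA.add("World"); // unsorted ID 1
        dimA.add("Apple"); // unsorted ID 2
        // Sorted order is Apple, Hello, World.
        System.out.println(Arrays.toString(dimA.sortedToUnsorted())); // prints [2, 0, 1]

        // A later write lands between existing values in the sorted order,
        // so "Hello" moves from sorted ID 1 to sorted ID 2.
        dimA.add("Banana"); // unsorted ID 3
        System.out.println(Arrays.toString(dimA.sortedToUnsorted())); // prints [2, 3, 0, 1]
    }
}
```

The second snapshot shows why sorted encoding values read before a write cannot be trusted afterwards: the unsorted IDs are stable, but the sorted IDs shift.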
Thread Safety
--------------------
Each DimensionIndexer exists within the context of a single IncrementalIndex. Before IndexMerger.persist() is
called on an IncrementalIndex, any associated DimensionIndexers should allow multiple threads to add data to the
indexer via processRowValsToUnsortedEncodedKeyComponent() and allow multiple threads to read data via methods that only
deal with unsorted encodings.
As mentioned in the "Sorting and Ordering" section, writes and calls to the sorted encoding
methods should not be interleaved: the sorted encoding methods should only be called when it is known that
writes to the indexer will no longer occur.
The implementations of methods dealing with sorted encodings are free to assume that they will be called
by only one thread.
The sorted encoding methods are not currently used outside of index merging/persisting (single-threaded context, and
no new events will be added to the indexer).
If an indexer is passed to a thread that will use the sorted encoding methods, the caller is responsible
for ensuring that previous writes to the indexer are visible to the thread that uses the sorted encoding space.
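
One way a caller might satisfy this visibility requirement is to rely on the Java memory model's guarantee that a call to `Thread.start()` happens-before every action in the started thread. The sketch below is only an illustration under that assumption; `PersistHandoffSketch` and the plain `ArrayList` standing in for an indexer's internal state are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of handing finished writes to a persist thread. */
public class PersistHandoffSketch {
    public static int addThenPersist() {
        // Plain, unsynchronized state standing in for an indexer's internal structures.
        List<String> dictionary = new ArrayList<>();
        dictionary.add("Hello");
        dictionary.add("World");
        dictionary.add("Apple");

        int[] sizeSeenByPersist = new int[1];
        // The writing thread itself starts the reader: start() happens-before
        // every action in the started thread, so all three adds are visible to it.
        Thread persistThread = new Thread(() -> sizeSeenByPersist[0] = dictionary.size());
        persistThread.start();
        try {
            // join() makes the persist thread's own write visible back here.
            persistThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sizeSeenByPersist[0];
    }

    public static void main(String[] args) {
        System.out.println(addThenPersist()); // prints 3
    }
}
```

If the reading thread were instead taken from an already-running pool, some other synchronization (for example, submitting the task through an `ExecutorService`) would be needed to establish the same ordering.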
For example, in the RealtimePlumber and IndexGeneratorJob, the thread that performs index persist is started
by the same thread that handles the row adds on an index, ensuring the adds are visible to the persist thread.

| Modifier and Type | Method and Description |
|---|---|
| `boolean` | `checkUnsortedEncodedKeyComponentsEqual(EncodedKeyComponentType lhs, EncodedKeyComponentType rhs)` Check if two row value arrays from Row keys are equal. |
| `int` | `compareUnsortedEncodedKeyComponents(EncodedKeyComponentType lhs, EncodedKeyComponentType rhs)` Compares the row values for this DimensionIndexer's dimension from a Row key. |
| `Object` | `convertUnsortedEncodedKeyComponentToActualList(EncodedKeyComponentType key)` Given a row value array from a Row key, as described in the documentation for `compareUnsortedEncodedKeyComponents(EncodedKeyComponentType, EncodedKeyComponentType)`, convert the unsorted encoded values to a list of actual values. |
| `ColumnValueSelector` | `convertUnsortedValuesToSorted(ColumnValueSelector selectorWithUnsortedValues)` Converts dictionary-encoded row values from unspecified (random) encoding order, to sorted encoding. |
| `long` | `estimateEncodedKeyComponentSize(EncodedKeyComponentType key)` Gives the estimated size in bytes for the given key. |
| `void` | `fillBitmapsFromUnsortedEncodedKeyComponent(EncodedKeyComponentType key, int rowNum, MutableBitmap[] bitmapIndexes, BitmapFactory factory)` Helper function for building bitmap indexes for integer-encoded dimensions. |
| `int` | `getCardinality()` Get the cardinality of this dimension's values. |
| `ActualType` | `getMaxValue()` Get the maximum dimension value seen by this indexer. |
| `ActualType` | `getMinValue()` Get the minimum dimension value seen by this indexer. |
| `CloseableIndexed<ActualType>` | `getSortedIndexedValues()` Returns an indexed structure of this dimension's sorted actual values. |
| `int` | `getUnsortedEncodedKeyComponentHashCode(EncodedKeyComponentType key)` Given a row value array from a Row key, generate a hashcode. |
| `EncodedType` | `getUnsortedEncodedValueFromSorted(EncodedType sortedIntermediateValue)` Given an encoded value that was ordered by associated actual value, return the equivalent encoded value ordered by time of ingestion. |
| `ColumnValueSelector<?>` | `makeColumnValueSelector(IncrementalIndexRowHolder currEntry, IncrementalIndex.DimensionDesc desc)` Return an object used to read values from this indexer's column. |
| `DimensionSelector` | `makeDimensionSelector(DimensionSpec spec, IncrementalIndexRowHolder currEntry, IncrementalIndex.DimensionDesc desc)` Return an object used to read values from this indexer's column as Strings. |
| `EncodedKeyComponentType` | `processRowValsToUnsortedEncodedKeyComponent(Object dimValues, boolean reportParseExceptions)` Given a single row value or list of row values (for multi-valued dimensions), update any internal data structures with the ingested values and return the row values as an array to be used within a Row key. |
| `void` | `setSparseIndexed()` This method will be called while building an IncrementalIndex whenever a known dimension column (either through an explicit schema on the ingestion spec, or auto-discovered while processing rows) is absent in any row that is processed, to allow an indexer to account for any missing rows if necessary. |


EncodedKeyComponentType processRowValsToUnsortedEncodedKeyComponent(@Nullable Object dimValues, boolean reportParseExceptions)
dimValues - Single row val to process
reportParseExceptions -

void setSparseIndexed()
This method will be called while building an IncrementalIndex whenever a known dimension column (either through an explicit schema on the ingestion spec, or auto-discovered while processing rows) is absent in any row that is processed, to allow an indexer to account for any missing rows if necessary. Useful so that a string DimensionSelector built on top of an IncrementalIndex may accurately report DimensionDictionarySelector.nameLookupPossibleInAdvance() by allowing it to track if it has any implicit null valued rows.
At index persist/merge time all missing columns for a row will be explicitly replaced with the appropriate null or default value.

long estimateEncodedKeyComponentSize(EncodedKeyComponentType key)
key - dimension value array from a TimeAndDims key

EncodedType getUnsortedEncodedValueFromSorted(EncodedType sortedIntermediateValue)
sortedIntermediateValue - value to convert

CloseableIndexed<ActualType> getSortedIndexedValues()

ActualType getMinValue()

ActualType getMaxValue()

int getCardinality()

DimensionSelector makeDimensionSelector(DimensionSpec spec, IncrementalIndexRowHolder currEntry, IncrementalIndex.DimensionDesc desc)
spec - Specifies the output name of a dimension and any extraction functions to be applied.
currEntry - Provides access to the current Row object in the Cursor
desc - Descriptor object for this dimension within an IncrementalIndex

ColumnValueSelector<?> makeColumnValueSelector(IncrementalIndexRowHolder currEntry, IncrementalIndex.DimensionDesc desc)
currEntry - Provides access to the current Row object in the Cursor
desc - Descriptor object for this dimension within an IncrementalIndex

int compareUnsortedEncodedKeyComponents(@Nullable EncodedKeyComponentType lhs, @Nullable EncodedKeyComponentType rhs)
DimensionHandler.getEncodedValueSelectorComparator(), otherwise incorrect ordering/merging of rows can occur during ingestion, causing issues such as imperfect rollup.
lhs - dimension value array from a Row key
rhs - dimension value array from a Row key

boolean checkUnsortedEncodedKeyComponentsEqual(@Nullable EncodedKeyComponentType lhs, @Nullable EncodedKeyComponentType rhs)
lhs - dimension value array from a Row key
rhs - dimension value array from a Row key

int getUnsortedEncodedKeyComponentHashCode(@Nullable EncodedKeyComponentType key)
key - dimension value array from a Row key

Object convertUnsortedEncodedKeyComponentToActualList(EncodedKeyComponentType key)
Given a row value array from a Row key, as described in the documentation for compareUnsortedEncodedKeyComponents(EncodedKeyComponentType, EncodedKeyComponentType), convert the unsorted encoded values to a list of actual values.
If the key has one element, this method should return a single Object instead of a list.
key - dimension value array from a Row key

ColumnValueSelector convertUnsortedValuesToSorted(ColumnValueSelector selectorWithUnsortedValues)
DimensionMerger.convertSortedSegmentRowValuesToMergedRowValues(int, org.apache.druid.segment.ColumnValueSelector). The latter method requires sorted encoding values on the input, because DimensionMerger.writeMergedValueDictionary(java.util.List<org.apache.druid.segment.IndexableAdapter>) takes sorted lookups as its input.

void fillBitmapsFromUnsortedEncodedKeyComponent(EncodedKeyComponentType key, int rowNum, MutableBitmap[] bitmapIndexes, BitmapFactory factory)
key - dimension value array from a Row key
rowNum - current row number
bitmapIndexes - array of bitmaps, indexed by integer dimension value
factory - bitmap factory

Copyright © 2011–2020 The Apache Software Foundation. All rights reserved.