public interface DimensionMerger
A per-dimension object used during the segment merging process (i.e., the work done by IndexMerger).
This object is responsible for:
- merging encoding dictionaries, if present
- writing the merged column data and any merged indexing structures (e.g., dictionaries, bitmaps) to disk
At a high level, the index merging process can be broken down into the following steps:
- Merge the segments' encoding dictionaries. These need to be merged across segments into a shared space of dictionary
mappings: writeMergedValueDictionary(List).
- Merge the rows across segments into a common sequence of rows. This is done outside the scope of this interface,
currently in IndexMergerV9.
- After constructing the merged sequence of rows, process each individual row via processMergedRow(org.apache.druid.segment.ColumnValueSelector),
potentially continuing to update the internal structures.
- Write the value representation metadata (dictionary, bitmaps), the sequence of row values,
and index structures to a merged segment: writeIndexes(java.util.List<java.nio.IntBuffer>).
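The call order implied by the steps above can be sketched as a small state machine. This is a toy stand-in, not Druid code: the class name MergeLifecycleSketch, the Phase enum, and the simplified method signatures (a String value instead of a ColumnValueSelector, no adapters or IntBuffers) are all assumptions made for illustration, and rejecting out-of-order calls with an exception is a design choice of the sketch rather than documented behavior. It assumes Java 9 or later for List.of.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a DimensionMerger; names and signatures are hypothetical.
final class MergeLifecycleSketch {
    enum Phase { NEW, DICTIONARY_WRITTEN, ROWS_PROCESSED, INDEXES_WRITTEN }

    private Phase phase = Phase.NEW;
    private final List<String> rows = new ArrayList<>();

    // Step 1: merge the per-segment dictionaries (details omitted in this sketch).
    void writeMergedValueDictionary() {
        expect(phase == Phase.NEW, "the dictionary is written first, and only once");
        phase = Phase.DICTIONARY_WRITTEN;
    }

    // Step 3: called once per row of the merged sequence (step 2 happens elsewhere).
    void processMergedRow(String value) {
        expect(phase == Phase.DICTIONARY_WRITTEN || phase == Phase.ROWS_PROCESSED,
               "rows are processed after the dictionary is written");
        rows.add(value);
        phase = Phase.ROWS_PROCESSED;
    }

    // Step 4: build index structures from the accumulated state; returns the row count.
    int writeIndexes() {
        expect(phase == Phase.DICTIONARY_WRITTEN || phase == Phase.ROWS_PROCESSED,
               "indexes are written after the rows are processed");
        phase = Phase.INDEXES_WRITTEN;
        return rows.size();
    }

    Phase phase() {
        return phase;
    }

    private static void expect(boolean ok, String message) {
        if (!ok) {
            throw new IllegalStateException(message);
        }
    }

    public static void main(String[] args) {
        MergeLifecycleSketch merger = new MergeLifecycleSketch();
        merger.writeMergedValueDictionary();
        for (String value : List.of("a", "b", "a")) {
            merger.processMergedRow(value);
        }
        System.out.println(merger.writeIndexes()); // 3
        System.out.println(merger.phase());        // INDEXES_WRITTEN
    }
}
```

Each call both reads and advances the internal state, which is the sense in which an implementation is stateful.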
A class implementing this interface is expected to be highly stateful, updating its internal state as these
functions are called.

| Modifier and Type | Method and Description |
|---|---|
| boolean | canSkip(): Return true if this dimension's data does not need to be written to the segment. |
| ColumnValueSelector | convertSortedSegmentRowValuesToMergedRowValues(int segmentIndex, ColumnValueSelector source): Creates a value selector, which converts values with per-segment, _sorted order_ (see DimensionIndexer.convertUnsortedValuesToSorted(org.apache.druid.segment.ColumnValueSelector)) encoding from the given selector to their equivalent representation in the merged set of rows. |
| void | processMergedRow(ColumnValueSelector selector): Process a column value(s) (potentially multi-value) of a row from the given selector and update the DimensionMerger's internal state. |
| void | writeIndexes(List<IntBuffer> segmentRowNumConversions): Internally construct any index structures relevant to this DimensionMerger. |
| void | writeMergedValueDictionary(List<IndexableAdapter> adapters): Given a list of segment adapters, read the _sorted order_ dictionary encoding information from the adapters, merge it into one sorted order dictionary, and write the merged dictionary. |
void writeMergedValueDictionary(List<IndexableAdapter> adapters) throws IOException
Given a list of segment adapters:
- Read _sorted order_ (e.g., see IncrementalIndexAdapter.getDimValueLookup(String)) dictionary encoding information
from the adapters
- Merge those sorted order dictionaries into one big sorted order dictionary and write this merged dictionary.
The implementer should maintain knowledge of the "index number" of the adapters in the input list,
i.e., the position of each adapter in the input list.
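A minimal sketch of the dictionary merge, under stated assumptions: each segment dictionary is modeled as a sorted List<String> with no null values, and the names DictionaryMergeSketch, MergedDictionary, and merge are hypothetical, not part of Druid. The sketch collects values in a TreeSet rather than streaming a merge over the sorted inputs, and keeps one conversion table per segment, indexed by the segment's position in the input list. It assumes Java 9 or later for List.of.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch; not Druid's implementation.
final class DictionaryMergeSketch {
    /** Merged dictionary plus, per segment, a table from segment-local id to merged id. */
    static final class MergedDictionary {
        final List<String> values;  // merged, sorted, de-duplicated values
        final int[][] conversions;  // conversions[segmentIndex][segmentId] == mergedId

        MergedDictionary(List<String> values, int[][] conversions) {
            this.values = values;
            this.conversions = conversions;
        }
    }

    static MergedDictionary merge(List<List<String>> segmentDictionaries) {
        // Union of all values in sorted order (assumption: no nulls).
        TreeSet<String> union = new TreeSet<>();
        for (List<String> dictionary : segmentDictionaries) {
            union.addAll(dictionary);
        }
        List<String> merged = new ArrayList<>(union);

        // The "index number" of a segment is its position in the input list.
        int[][] conversions = new int[segmentDictionaries.size()][];
        for (int segmentIndex = 0; segmentIndex < segmentDictionaries.size(); segmentIndex++) {
            List<String> dictionary = segmentDictionaries.get(segmentIndex);
            conversions[segmentIndex] = new int[dictionary.size()];
            for (int id = 0; id < dictionary.size(); id++) {
                // Every value is present in the union, so the search result is non-negative.
                conversions[segmentIndex][id] = Collections.binarySearch(merged, dictionary.get(id));
            }
        }
        return new MergedDictionary(merged, conversions);
    }

    public static void main(String[] args) {
        MergedDictionary result = merge(List.of(
            List.of("apple", "cherry"),
            List.of("banana", "cherry", "date")));
        System.out.println(result.values);                          // [apple, banana, cherry, date]
        System.out.println(Arrays.toString(result.conversions[0])); // [0, 2]
        System.out.println(Arrays.toString(result.conversions[1])); // [1, 2, 3]
    }
}
```

With such a table, converting a segment-local dictionary id to its merged id is a single array lookup keyed by the segment's index number.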
This "index number" will be used to refer to specific segments later
in convertSortedSegmentRowValuesToMergedRowValues(int, org.apache.druid.segment.ColumnValueSelector).
Parameters:
adapters - List of adapters to be merged.
Throws:
IOException
See Also:
DimensionIndexer.convertUnsortedValuesToSorted(org.apache.druid.segment.ColumnValueSelector)

ColumnValueSelector convertSortedSegmentRowValuesToMergedRowValues(int segmentIndex, ColumnValueSelector source)
Creates a value selector, which converts values with per-segment, _sorted order_ (see
DimensionIndexer.convertUnsortedValuesToSorted(org.apache.druid.segment.ColumnValueSelector)) encoding from the given selector to their equivalent
representation in the merged set of rows.
This method is used by the index merging process to build the merged sequence of rows.
The implementing class is expected to use the merged value metadata constructed
during writeMergedValueDictionary(List), if applicable.
For example, an implementation of this function for a dictionary-encoded String column would convert the
segment-specific, sorted order dictionary values within the row to the common merged dictionary values
determined during writeMergedValueDictionary(List).
Parameters:
segmentIndex - indicates which segment the row originated from, in the order established in
writeMergedValueDictionary(List)
source - the selector from which to take values to convert

void processMergedRow(ColumnValueSelector selector) throws IOException
Process a column value(s) (potentially multi-value) of a row from the given selector and update the
DimensionMerger's internal state.
Throws:
IOException

void writeIndexes(@Nullable List<IntBuffer> segmentRowNumConversions) throws IOException
Internally construct any index structures relevant to this DimensionMerger.
After receiving the merged rows via the processMergedRow(org.apache.druid.segment.ColumnValueSelector) calls, the DimensionMerger
can now build any index structures it needs.
For example, a dictionary encoded String implementation would create its bitmap indexes
for the merged segment during this step.
The index merger will provide a list of row number conversion IntBuffer objects.
Each IntBuffer is associated with one of the segments being merged; the position of the IntBuffer in the list
corresponds to the position of segment adapters within the input list of writeMergedValueDictionary(List).
For example, suppose there are two segments A and B.
If row 24 from segment A maps to row 99 in the merged sequence of rows,
the IntBuffer for segment A would have a mapping of 24 -> 99.
Parameters:
segmentRowNumConversions - A list of row number conversion IntBuffer objects.
Throws:
IOException

boolean canSkip()
Return true if this dimension's data does not need to be written to the segment.
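Returning to the row number conversions passed to writeIndexes: the sketch below shows one way the buffers could be used to translate per-segment row sets into merged row numbers. It rests on assumptions that are not stated in the documentation above: the buffer is read as "index = segment row number, value = merged row number" (so the 24 -> 99 mapping would be conversion.get(24) == 99), the list is non-null, and java.util.BitSet stands in for whatever bitmap type an implementation uses. The names RowNumConversionSketch, addConverted, and mergeRows are hypothetical. It assumes Java 9 or later for List.of.

```java
import java.nio.IntBuffer;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch; not Druid's implementation.
final class RowNumConversionSketch {
    /** Adds the merged row number of every set row in segmentRows to mergedRows. */
    static void addConverted(BitSet segmentRows, IntBuffer conversion, BitSet mergedRows) {
        for (int row = segmentRows.nextSetBit(0); row >= 0; row = segmentRows.nextSetBit(row + 1)) {
            // Absolute get: reads by index without moving the buffer position.
            mergedRows.set(conversion.get(row));
        }
    }

    /** Combines per-segment row sets for one value; list positions identify the segments. */
    static BitSet mergeRows(List<BitSet> perSegmentRows, List<IntBuffer> segmentRowNumConversions) {
        BitSet merged = new BitSet();
        for (int segmentIndex = 0; segmentIndex < perSegmentRows.size(); segmentIndex++) {
            addConverted(perSegmentRows.get(segmentIndex),
                         segmentRowNumConversions.get(segmentIndex),
                         merged);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Segment A rows 0..2 land on merged rows 0, 2, 4; segment B rows 0..1 on 1, 3.
        List<IntBuffer> conversions = List.of(
            IntBuffer.wrap(new int[] {0, 2, 4}),
            IntBuffer.wrap(new int[] {1, 3}));

        // Rows that contain some value "x" in each segment.
        BitSet rowsInA = new BitSet();
        rowsInA.set(1);
        rowsInA.set(2);
        BitSet rowsInB = new BitSet();
        rowsInB.set(0);

        System.out.println(mergeRows(List.of(rowsInA, rowsInB), conversions)); // {1, 2, 4}
    }
}
```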
Copyright © 2011–2020 The Apache Software Foundation. All rights reserved.