@EverythingIsNonnullByDefault public class GroupingAggregatorFactory extends AggregatorFactory

This class implements the grouping function, which determines the grouping that a row is part of. Different result rows
of a query can have different grouping columns when subtotals are used.
This aggregator factory takes the following arguments:
- name - Name of the aggregator
- groupings - List of dimensions that the user is interested in tracking
- keyDimensions - The list of grouping dimensions included in the result row. This list is a subset of groupings. This argument cannot be passed by the user; it is set by the Druid engine when a particular subtotal spec is being processed. Whenever the Druid engine processes a new subtotal spec, it sets that subtotal spec as the new keyDimensions.

When the key dimensions are updated, the value is updated as well. How the value is determined is captured
in groupingId(List, Set).
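
This page does not spell out how groupingId(List, Set) derives the value. The sketch below is one plausible reading, assuming the usual SQL GROUPING convention: one bit per entry in groupings, first entry most significant, with a bit set to 1 when that dimension is not among the key dimensions. The class name GroupingIdSketch and the treatment of a null keyDimensions as "all dimensions included" are assumptions made for illustration, not the Druid source.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; not the Druid implementation. */
final class GroupingIdSketch
{
  /**
   * Builds a bitmask with one bit per entry in groupings (first entry = most significant bit).
   * A bit is 1 when the dimension is absent from keyDimensions and 0 when it is present.
   * Assumption: a null keyDimensions means every dimension is included.
   */
  static long groupingId(List<String> groupings, Set<String> keyDimensions)
  {
    if (groupings == null || groupings.isEmpty()) {
      throw new IllegalArgumentException("groupings must be non-empty");
    }
    if (groupings.size() >= Long.SIZE) {
      // Keeping the count below 64 guarantees the result stays non-negative.
      throw new IllegalArgumentException("too many grouping dimensions");
    }
    long id = 0L;
    for (String dimension : groupings) {
      id <<= 1;
      boolean included = keyDimensions == null || keyDimensions.contains(dimension);
      if (!included) {
        id |= 1L;
      }
    }
    return id;
  }

  public static void main(String[] args)
  {
    List<String> groupings = Arrays.asList("country", "city");
    System.out.println(groupingId(groupings, new HashSet<>(groupings)));
    System.out.println(groupingId(groupings, Collections.singleton("country")));
    System.out.println(groupingId(groupings, Collections.singleton("city")));
    System.out.println(groupingId(groupings, Collections.<String>emptySet()));
  }
}
```

Under these assumptions, with groupings [country, city] the four calls in main yield 0, 1, 2 and 3: both dimensions present, only country present, only city present, and neither present.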
Since the grouping has to be calculated only once, it could have been implemented as a virtual function, a
post-aggregator, etc. We modelled it as an aggregation operator so that its output can be used in a post-aggregator.
Calcite also models the grouping function as an aggregation operator.

Since it is a non-trivial special aggregation, implementing it required changes in the core Druid engine. There
were a few possible approaches; we chose the one that required the fewest changes in core Druid.
Refer to https://github.com/apache/druid/pull/10518#discussion_r532941216 for more details.
Currently, it works in the following way:
- On data servers (no change):
  - This factory generates LongConstantAggregator / LongConstantBufferAggregator / LongConstantVectorAggregator with keyDimensions as null.
  - The aggregators don't actually aggregate anything and their result is not actually used. We could have removed these aggregators on data servers, but that would result in a signature mismatch between broker and data nodes, which requires extra handling and is error-prone.
- On brokers:
  - Results from data nodes are already re-processed for each subtotal spec. We made modifications in this path to update the grouping id for each row.

| Constructor and Description |
|---|
| GroupingAggregatorFactory(String name, List<String> groupings) |
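
To make the broker-side flow concrete, here is a minimal, self-contained sketch of how a factory constructed with (name, groupings) might be re-keyed once per subtotal spec through withKeyDimensions. GroupingFactorySketch is a hypothetical stand-in rather than the Druid class, valuesForSubtotals is an invented helper, and the bit layout in computeValue is an assumption based on the SQL GROUPING convention.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for GroupingAggregatorFactory; not the Druid class. */
final class GroupingFactorySketch
{
  private final String name;
  private final List<String> groupings;
  private final long value;

  /** Mirrors the public constructor: keyDimensions starts out as null. */
  GroupingFactorySketch(String name, List<String> groupings)
  {
    this(name, groupings, null);
  }

  private GroupingFactorySketch(String name, List<String> groupings, Set<String> keyDimensions)
  {
    this.name = name;
    this.groupings = groupings;
    this.value = computeValue(groupings, keyDimensions);
  }

  /** Returns a copy whose value reflects the new key dimensions. */
  GroupingFactorySketch withKeyDimensions(Set<String> newKeyDimensions)
  {
    return new GroupingFactorySketch(name, groupings, newKeyDimensions);
  }

  String getName()
  {
    return name;
  }

  long getValue()
  {
    return value;
  }

  // Assumed layout: one bit per grouping dimension, set when the dimension is not a key dimension.
  private static long computeValue(List<String> groupings, Set<String> keyDimensions)
  {
    long id = 0L;
    for (String dimension : groupings) {
      id <<= 1;
      if (keyDimensions != null && !keyDimensions.contains(dimension)) {
        id |= 1L;
      }
    }
    return id;
  }

  /** Invented helper: simulates a broker re-processing results once per subtotal spec. */
  static long[] valuesForSubtotals(List<String> groupings, List<List<String>> subtotalsSpec)
  {
    GroupingFactorySketch base = new GroupingFactorySketch("g", groupings);
    long[] values = new long[subtotalsSpec.size()];
    for (int i = 0; i < subtotalsSpec.size(); i++) {
      Set<String> keyDimensions = new LinkedHashSet<>(subtotalsSpec.get(i));
      values[i] = base.withKeyDimensions(keyDimensions).getValue();
    }
    return values;
  }

  public static void main(String[] args)
  {
    long[] values = valuesForSubtotals(
        Arrays.asList("country", "city"),
        Arrays.asList(
            Arrays.asList("country", "city"),
            Arrays.asList("country"),
            Collections.<String>emptyList()
        )
    );
    System.out.println(Arrays.toString(values));
  }
}
```

Returning a new instance from withKeyDimensions keeps the sketch immutable, so the base factory created with null key dimensions is left untouched while each subtotal spec gets its own value.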
| Modifier and Type | Method and Description |
|---|---|
| boolean | canVectorize(ColumnInspector columnInspector): Returns whether or not this aggregation class supports vectorization. |
| Object | combine(Object lhs, Object rhs): A method that knows how to combine the outputs of Aggregator.get() produced via AggregatorFactory.factorize(org.apache.druid.segment.ColumnSelectorFactory) or BufferAggregator.get(java.nio.ByteBuffer, int) produced via AggregatorFactory.factorizeBuffered(org.apache.druid.segment.ColumnSelectorFactory). |
| Object | deserialize(Object object): A method that knows how to "deserialize" the object from whatever form it might have been put into in order to transfer via JSON. |
| boolean | equals(Object o) |
| Aggregator | factorize(ColumnSelectorFactory metricFactory) |
| BufferAggregator | factorizeBuffered(ColumnSelectorFactory metricFactory) |
| VectorAggregator | factorizeVector(VectorColumnSelectorFactory selectorFactory): Create a VectorAggregator based on the provided column selector factory. |
| Object | finalizeComputation(Object object): "Finalizes" the computation of an object. |
| byte[] | getCacheKey() |
| AggregatorFactory | getCombiningFactory(): Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory. |
| Comparator | getComparator() |
| ValueType | getFinalizedType(): Get the type for the final form of this aggregator, i.e. the type of the value returned by AggregatorFactory.finalizeComputation(java.lang.Object). |
| List<String> | getGroupings() |
| int | getMaxIntermediateSize(): Returns the maximum size that this aggregator will require in bytes for intermediate storage of results. |
| String | getName() |
| List<AggregatorFactory> | getRequiredColumns(): Used by GroupByStrategyV1 when running nested groupBys, to "transfer" values from this aggregator to an incremental index that the outer query will run on. |
| ValueType | getType(): Get the "intermediate" ValueType for this aggregator. |
| long | getValue() |
| int | hashCode() |
| List<String> | requiredFields(): Get a list of fields that aggregators built by this factory will need to read. |
| String | toString() |
| GroupingAggregatorFactory | withKeyDimensions(Set<String> newKeyDimensions): Replace the param keyDimensions with the new set of key dimensions. |

Methods inherited from class AggregatorFactory: getComplexTypeName, getMaxIntermediateSizeWithNulls, getMergingFactory, makeAggregateCombiner, makeNullableAggregateCombiner, mergeAggregators, optimizeForSegment

public Aggregator factorize(ColumnSelectorFactory metricFactory)
factorize in class AggregatorFactory

public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
factorizeBuffered in class AggregatorFactory

public VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory)
Description copied from class: AggregatorFactory
Create a VectorAggregator based on the provided column selector factory.
factorizeVector in class AggregatorFactory

public boolean canVectorize(ColumnInspector columnInspector)
Description copied from class: AggregatorFactory
Returns whether or not this aggregation class supports vectorization.
canVectorize in class AggregatorFactory

public GroupingAggregatorFactory withKeyDimensions(Set<String> newKeyDimensions)
Replace the param keyDimensions with the new set of key dimensions.

public Comparator getComparator()
getComparator in class AggregatorFactory

public String getName()
getName in class AggregatorFactory

public long getValue()

@Nullable public Object combine(@Nullable Object lhs, @Nullable Object rhs)
Description copied from class: AggregatorFactory
A method that knows how to combine the outputs of Aggregator.get() produced via AggregatorFactory.factorize(org.apache.druid.segment.ColumnSelectorFactory) or BufferAggregator.get(java.nio.ByteBuffer, int) produced via AggregatorFactory.factorizeBuffered(org.apache.druid.segment.ColumnSelectorFactory). Note, even though this method is called "combine",
this method's contract *does* allow for mutation of the input objects. Thus, any use of lhs or rhs after calling
this method is highly discouraged.
combine in class AggregatorFactory
Parameters:
- lhs - The left hand side of the combine
- rhs - The right hand side of the combine

public AggregatorFactory getCombiningFactory()
Description copied from class: AggregatorFactory
Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory. For example, the CountAggregatorFactory getCombiningFactory method will return a
LongSumAggregatorFactory, because counts are combined by summing.
No matter what, `foo.getCombiningFactory()` and `foo.getCombiningFactory().getCombiningFactory()` should return
the same result.
getCombiningFactory in class AggregatorFactory

public List<AggregatorFactory> getRequiredColumns()
Description copied from class: AggregatorFactory
Used by GroupByStrategyV1 when running nested groupBys, to
"transfer" values from this aggregator to an incremental index that the outer query will run on. This method
only exists due to the design of GroupByStrategyV1, and should probably not be used for anything else. If you are
here because you are looking for a way to get the input fields required by this aggregator, and thought
"getRequiredColumns" sounded right, please use AggregatorFactory.requiredFields() instead.
getRequiredColumns in class AggregatorFactory
See also: AggregatorFactory.requiredFields(), a similarly-named method that is perhaps the one you want instead.

public Object deserialize(Object object)
Description copied from class: AggregatorFactory
A method that knows how to "deserialize" the object from whatever form it might have been put into in order to transfer via JSON.
deserialize in class AggregatorFactory
Parameters:
- object - the object to deserialize

@Nullable public Object finalizeComputation(@Nullable Object object)
Description copied from class: AggregatorFactory
"Finalizes" the computation of an object.
finalizeComputation in class AggregatorFactory
Parameters:
- object - the object to be finalized

public List<String> requiredFields()
Description copied from class: AggregatorFactory
Get a list of fields that aggregators built by this factory will need to read.
requiredFields in class AggregatorFactory

public ValueType getType()
Description copied from class: AggregatorFactory
Get the "intermediate" ValueType for this aggregator. This is the same as the type returned by
AggregatorFactory.deserialize(java.lang.Object) and the type accepted by AggregatorFactory.combine(java.lang.Object, java.lang.Object). However, it is *not* necessarily the same type
returned by AggregatorFactory.finalizeComputation(java.lang.Object).
Refer to the ValueType javadocs for details on the implications of choosing a type.
getType in class AggregatorFactory

public ValueType getFinalizedType()
Description copied from class: AggregatorFactory
Get the type for the final form of this aggregator, i.e. the type of the value returned by AggregatorFactory.finalizeComputation(java.lang.Object). This may be the same as or different than the types expected in AggregatorFactory.deserialize(java.lang.Object)
and AggregatorFactory.combine(java.lang.Object, java.lang.Object).
Refer to the ValueType javadocs for details on the implications of choosing a type.
getFinalizedType in class AggregatorFactory

public int getMaxIntermediateSize()
Description copied from class: AggregatorFactory
Returns the maximum size that this aggregator will require in bytes for intermediate storage of results.
getMaxIntermediateSize in class AggregatorFactory

public byte[] getCacheKey()

Copyright © 2011–2021 The Apache Software Foundation. All rights reserved.