package aggregate
Type Members
- case class ApproxPercentileFromTDigestExpr(child: Expression, percentiles: Either[Double, Array[Double]], finalDataType: DataType) extends Expression with GpuExpression with ShimExpression with Product with Serializable
This expression computes an approximate percentile using a t-digest as input.
- child
Expression that produces the t-digests.
- percentiles
Percentile scalar, or percentiles array to evaluate.
- finalDataType
Data type for results
- trait CpuToGpuAggregateBufferConverter extends AnyRef
- trait CpuToGpuBufferTransition extends UnaryExpression with ShimUnaryExpression with CodegenFallback
- class CpuToGpuCollectBufferConverter extends CpuToGpuAggregateBufferConverter
- case class CpuToGpuCollectBufferTransition(child: Expression, elementType: DataType) extends UnaryExpression with CpuToGpuBufferTransition with Product with Serializable
- case class CpuToGpuPercentileBufferConverter(elementType: DataType) extends CpuToGpuAggregateBufferConverter with Product with Serializable
Convert the incoming byte stream received from the Spark CPU into the internal histogram buffer format.
- case class CpuToGpuPercentileBufferTransition(child: Expression, elementType: DataType) extends UnaryExpression with CpuToGpuBufferTransition with Product with Serializable
- trait CudfAggregate extends Serializable
- class CudfCollectList extends CudfAggregate
- class CudfCollectSet extends CudfAggregate
Spark handles NaN equality differently for non-nested float/double and for float/double in nested types. When we use non-nested versions of floats and doubles, NaN values are considered unequal, but when we collect sets of nested versions, NaNs are considered equal on the CPU. So we set NaNEquality dynamically in CudfCollectSet and CudfMergeSets. Note that dataType is ArrayType(child.dataType) here.
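To make the NaNEquality difference concrete, here is a hedged, illustrative Python sketch (not the actual cuDF kernel; scalar lists stand in for columns) of how a set-collection step behaves under the two settings:

```python
import math

def collect_set(values, nans_equal):
    """Illustrative set collection; `nans_equal` mirrors the NaNEquality choice."""
    out = []
    seen_nan = False
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            # NaN != NaN, so a plain membership test never deduplicates NaNs
            if nans_equal and seen_nan:
                continue
            seen_nan = True
            out.append(v)
        elif v not in out:
            out.append(v)
    return out

nan = float("nan")
# Non-nested floats: NaNs unequal -> both NaNs kept
print(len(collect_set([1.0, nan, nan], nans_equal=False)))  # 3
# Nested types: NaNs equal on the CPU -> only one NaN kept
print(len(collect_set([1.0, nan, nan], nans_equal=True)))   # 2
```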
- class CudfCount extends CudfAggregate
- case class CudfHistogram(dataType: DataType) extends CudfAggregate with Product with Serializable
- class CudfM2 extends CudfAggregate
- class CudfMax extends CudfAggregate
- class CudfMaxBy extends CudfMaxMinByAggregate
- abstract class CudfMaxMinByAggregate extends CudfAggregate
- class CudfMean extends CudfAggregate
This class is only used by the M2 class aggregates; do not confuse it with GpuAverage. In the future, this aggregate class should be removed and the mean values should be generated in the output of libcudf's M2 aggregate.
- case class CudfMergeHistogram(dataType: DataType) extends CudfAggregate with Product with Serializable
- class CudfMergeLists extends CudfAggregate
- class CudfMergeM2 extends CudfAggregate
- class CudfMergeSets extends CudfAggregate
- class CudfMin extends CudfAggregate
- class CudfMinBy extends CudfMaxMinByAggregate
- class CudfNthLikeAggregate extends CudfAggregate
- class CudfSum extends CudfAggregate
- class CudfTDigestMerge extends CudfAggregate
- class CudfTDigestUpdate extends CudfAggregate
- case class GpuAggregateExpression(origAggregateFunction: GpuAggregateFunction, mode: AggregateMode, isDistinct: Boolean, filter: Option[Expression], resultId: ExprId) extends Expression with GpuExpression with ShimExpression with GpuUnevaluable with Product with Serializable
- trait GpuAggregateFunction extends Expression with GpuExpression with ShimExpression with GpuUnevaluable
Trait that all aggregate functions implement.
Aggregates start with some input from the child plan or from another aggregate (or from itself if the aggregate is merging several batches).
In general terms an aggregate function can be in one of two modes of operation: update or merge. Either the function is aggregating raw input, or it is merging previously aggregated data. Normally, Spark breaks up the processing of the aggregate into two exec nodes (a partial aggregate and a final one), and they are separated by a shuffle boundary. That is not true for all aggregates, especially when looking at other flavors of Spark. What doesn't change is the core function of updating or merging. Note that an aggregate can merge right after an update is performed, as we have cases where input batches are update-aggregated and then a bigger batch is built by merging together those pre-aggregated inputs.
Aggregates have an interface to Spark that is defined by aggBufferAttributes. This collection of attributes must match the Spark equivalent of the aggregate, so that if half of the aggregate (update or merge) executes on the CPU, we can be compatible. The GpuAggregateFunction adds special steps to ensure that it can produce (and consume) batches in the shape of aggBufferAttributes.
The general transitions that are implemented in the aggregate function are as follows:
1) inputProjection -> updateAggregates: inputProjection creates a sequence of values that are operated on by the updateAggregates. The length of inputProjection must be the same as updateAggregates, and updateAggregates (cuDF aggregates) should be able to work with the product of the inputProjection (i.e. types are compatible).
2) updateAggregates -> postUpdate: after the cuDF update aggregate, a post-processing step can (optionally) be performed. The postUpdate takes the output of updateAggregates, which must match the order of columns and types specified in aggBufferAttributes.
3) postUpdate -> preMerge: preMerge prepares batches before going into the mergeAggregates. The preMerge step binds to aggBufferAttributes, so it can be used to transform a Spark-compatible batch to a batch that the cuDF merge aggregate expects. Its input has the same shape as that produced by postUpdate.
4) mergeAggregates -> postMerge: postMerge optionally transforms the output of the cuDF merge aggregate in two situations: 1 - the step is used to match the aggBufferAttributes references for partial aggregates where each partially aggregated batch is getting merged with AggHelper(merge=true); 2 - in a final aggregate, the merged batches are transformed to what evaluateExpression expects. For simple aggregates like sum or count, evaluateExpression is just aggBufferAttributes, but for more complex aggregates it is an expression (see the GpuAverage and GpuM2 subclasses) that relies on the merge step producing columns in the shape of aggBufferAttributes.
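As an illustration of the update/merge split described above, here is a hedged Python sketch of the pipeline for an average-like aggregate. The real implementation operates on columnar batches via cuDF; the function names here only echo the transitions and are not the actual API:

```python
def input_projection(rows):
    # Project raw input into the values the update aggregates consume: (value, 1)
    return [(v, 1) for v in rows]

def update(projected):
    # Update mode: aggregate raw input into a buffer of (sum, count)
    return (sum(v for v, _ in projected), sum(n for _, n in projected))

def merge(buffers):
    # Merge mode: combine previously aggregated (sum, count) buffers
    return (sum(s for s, _ in buffers), sum(c for _, c in buffers))

def evaluate(buffer):
    # Final evaluation: turn the merged buffer into the result
    s, c = buffer
    return s / c if c else None

batches = [[1, 2, 3], [4, 5]]                       # two partial input batches
buffers = [update(input_projection(b)) for b in batches]
print(evaluate(merge(buffers)))                     # 3.0
```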
- case class GpuApproximatePercentile(child: Expression, percentageExpression: GpuLiteral, accuracyExpression: GpuLiteral = ...) extends Expression with GpuAggregateFunction with Product with Serializable
The ApproximatePercentile function returns the approximate percentile(s) of a column at the given percentage(s). A percentile is a watermark value below which a given percentage of the column values fall. For example, the percentile of column col at percentage 50% is the median of column col. This function supports partial aggregation.
The GPU implementation uses t-digest to perform the initial aggregation (see updateExpressions/mergeExpressions) and then applies the ApproxPercentileFromTDigestExpr expression to compute percentiles from the final t-digest (see evaluateExpression).
There are two different data types involved here. The t-digests are a map of centroids (Map[mean: Double -> weight: Double]) represented as List[Struct[Double, Double]], and the final output is either a single double or an array of doubles, depending on whether the percentageExpression parameter is a single value or an array.
- child
Child expression that can produce a column value with child.eval().
- percentageExpression
Expression that represents a single percentage value or an array of percentage values. Each percentage value must be between 0.0 and 1.0.
- accuracyExpression
Integer literal expression of approximation accuracy. A higher value yields better accuracy; the default value is DEFAULT_PERCENTILE_ACCURACY.
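To show how a percentile can be read out of a digest of (mean, weight) centroids like the one described above, here is a deliberately naive Python sketch. This is NOT the real t-digest interpolation; it simply walks the cumulative weight until the requested rank is covered:

```python
def percentile_from_centroids(centroids, percentage):
    """Naive percentile read-out from (mean, weight) centroids.

    Hedged sketch only: real t-digest evaluation interpolates between
    centroids; here we just return the mean of the centroid whose
    cumulative weight first covers the target rank.
    """
    total = sum(w for _, w in centroids)
    target = percentage * total
    cumulative = 0.0
    for mean, weight in sorted(centroids):
        cumulative += weight
        if cumulative >= target:
            return mean
    return sorted(centroids)[-1][0]

digest = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0), (4.0, 1.0)]
print(percentile_from_centroids(digest, 0.5))  # 2.0 under this naive rule
```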
- case class GpuAssembleSumChunks(chunkAttrs: Seq[AttributeReference], dataType: DecimalType, nullOnOverflow: Boolean) extends Expression with GpuExpression with ShimExpression with Product with Serializable
Reassembles a 128-bit value from four separate 64-bit sum results.
- chunkAttrs
attributes for the four 64-bit sum chunks ordered from least significant to most significant
- dataType
output type of the reconstructed 128-bit value
- nullOnOverflow
whether to produce null on overflows
- abstract class GpuAverage extends Expression with GpuAggregateFunction with GpuReplaceWindowFunction with Serializable
- case class GpuBasicAverage(child: Expression, dt: DataType) extends GpuAverage with Product with Serializable
- case class GpuBasicDecimalAverage(child: Expression, dt: DecimalType) extends GpuDecimalAverage with Product with Serializable
- case class GpuBasicDecimalSum(child: Expression, dt: DecimalType, failOnErrorOverride: Boolean) extends GpuDecimalSum with Product with Serializable
Sum aggregations for decimals up to and including DECIMAL64.
- case class GpuBasicMax(child: Expression) extends GpuMax with Product with Serializable
Max aggregation without NaN handling.
- case class GpuBasicMin(child: Expression) extends GpuMin with Product with Serializable
Min aggregation without NaN handling.
- case class GpuBasicSum(child: Expression, resultType: DataType, failOnErrorOverride: Boolean) extends GpuSum with Product with Serializable
Sum aggregation for non-decimal types.
- case class GpuCheckOverflowAfterSum(data: Expression, isEmpty: Expression, dataType: DecimalType, nullOnOverflow: Boolean) extends Expression with GpuExpression with ShimExpression with Product with Serializable
This is equivalent to what Spark does after a sum to check for overflow:
If(isEmpty, Literal.create(null, resultType), CheckOverflowInSum(sum, d, !SQLConf.get.ansiEnabled))
But we renamed it to avoid confusion with the overflow detection we do as a part of the sum itself, which takes the place of the overflow checking that happens with add.
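The equivalent check can be sketched in Python (hedged: parameter names are illustrative, the real expression operates on columns rather than scalars, and the sum is treated here as an unscaled decimal value):

```python
from decimal import Decimal

def check_overflow_after_sum(sum_value, is_empty, precision, null_on_overflow):
    # Mirrors If(isEmpty, null, CheckOverflowInSum(...)): an empty group yields
    # null; otherwise an out-of-range sum yields null (non-ANSI) or an error (ANSI).
    if is_empty:
        return None
    if sum_value is None or abs(sum_value) >= Decimal(10) ** precision:
        if null_on_overflow:
            return None
        raise ArithmeticError("Decimal overflow in SUM")
    return sum_value

print(check_overflow_after_sum(Decimal("123"), False, 5, True))     # 123
print(check_overflow_after_sum(Decimal("100000"), False, 5, True))  # None
```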
- trait GpuCollectBase extends Expression with GpuAggregateFunction with GpuDeterministicFirstLastCollectShim with GpuAggregateWindowFunction
- case class GpuCollectList(child: Expression, mutableAggBufferOffset: Int = 0, inputAggBufferOffset: Int = 0) extends Expression with GpuCollectBase with Product with Serializable
Collects and returns a list of non-unique elements.
The two 'offset' parameters are not used by the GPU version, but are here for compatibility with the CPU version and automated checks.
- case class GpuCollectSet(child: Expression, mutableAggBufferOffset: Int = 0, inputAggBufferOffset: Int = 0) extends Expression with GpuCollectBase with GpuUnboundedToUnboundedWindowAgg with Product with Serializable
Collects and returns a set of unique elements.
The two 'offset' parameters are not used by the GPU version, but are here for compatibility with the CPU version and automated checks.
- case class GpuCount(children: Seq[Expression], failOnError: Boolean = SQLConf.get.ansiEnabled) extends Expression with GpuAggregateFunction with GpuBatchedRunningWindowWithFixer with GpuUnboundToUnboundWindowWithFixer with GpuAggregateWindowFunction with GpuRunningWindowFunction with Product with Serializable
- case class GpuCreateHistogramIfValid(valuesExpr: Expression, frequenciesExpr: Expression, isReduction: Boolean, outputType: DataType) extends Expression with GpuExpression with ShimExpression with Product with Serializable
Create a histogram buffer from the input values and frequencies.
The frequencies are also checked to ensure that they are non-negative. If a negative frequency exists, an exception will be thrown.
- case class GpuDecimal128Average(child: Expression, dt: DecimalType) extends GpuDecimalAverage with Product with Serializable
Average aggregations for DECIMAL128.
To avoid the significantly slower sort-based aggregations in cudf for DECIMAL128 columns, the incoming DECIMAL128 values are split into four 32-bit chunks which are summed separately into 64-bit intermediate results and then recombined into a 128-bit result with overflow checking. See GpuDecimal128Sum for more details.
- case class GpuDecimal128Sum(child: Expression, dt: DecimalType, failOnErrorOverride: Boolean, forceWindowSumToNotBeReplaced: Boolean) extends GpuDecimalSum with GpuReplaceWindowFunction with Product with Serializable
Sum aggregations for DECIMAL128.
The sum aggregation is performed by splitting the original 128-bit values into 32-bit "chunks" and summing those. The chunking accomplishes two things. First, it helps avoid cudf resorting to a much slower aggregation since currently DECIMAL128 sums are only implemented for sort-based aggregations. Second, chunking allows detection of overflows.
The chunked approach to sum aggregation works as follows. The 128-bit value is split into its four 32-bit chunks, with the most significant chunk being an INT32 and the remaining three chunks being UINT32. When these are sum aggregated, cudf will implicitly upscale the accumulated result to a 64-bit value. Since cudf only allows up to 2**31 rows to be aggregated at a time, the "extra" upper 32-bits of the upscaled 64-bit accumulation values will be enough to hold the worst-case "carry" bits from summing each 32-bit chunk.
After the cudf aggregation has completed, the four 64-bit chunks are reassembled into a 128-bit value. The lowest 32-bits of the least significant 64-bit chunk are used directly as the lowest 32-bits of the final value, and the remaining 32-bits are added to the next most significant 64-bit chunk. The lowest 32-bits of that chunk then become the next 32-bits of the 128-bit value and the remaining 32-bits are added to the next 64-bit chunk, and so on. Finally after the 128-bit value is constructed, the remaining "carry" bits of the most significant chunk after reconstruction are checked against the sign bit of the 128-bit result to see if there was an overflow.
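The chunk/reassemble scheme described above can be sketched in plain Python (hedged: Python integers stand in for cuDF columns, and this models a single reduction rather than a group-by aggregation):

```python
MASK32 = 0xFFFFFFFF

def split_chunks(value):
    """Split a signed 128-bit value into four 32-bit chunks, least significant
    first; the top chunk keeps the sign (INT32), the rest are UINT32."""
    u = value & ((1 << 128) - 1)
    chunks = [(u >> (32 * i)) & MASK32 for i in range(4)]
    if chunks[3] >= 1 << 31:
        chunks[3] -= 1 << 32  # reinterpret the most significant chunk as signed
    return chunks

def chunked_sum(values):
    """Sum 128-bit values chunk-wise in 64-bit accumulators, then reassemble
    with carry propagation and check the leftover carry for overflow."""
    acc = [0, 0, 0, 0]
    for v in values:
        for i, c in enumerate(split_chunks(v)):
            acc[i] += c
    result, carry = 0, 0
    for i in range(4):
        s = acc[i] + carry            # add the carry bits from the lower chunk
        result |= (s & MASK32) << (32 * i)
        carry = s >> 32               # Python's arithmetic shift keeps the sign
    if result >= 1 << 127:            # sign-extend the reassembled 128-bit value
        result -= 1 << 128
    # With no overflow, the leftover carry must agree with the result's sign:
    # 0 for a non-negative result, -1 for a negative one
    overflow = carry not in (0, -1) or (carry == -1) != (result < 0)
    return result, overflow

print(chunked_sum([1, -1]))              # (0, False)
print(chunked_sum([2**126, 2**126])[1])  # True: 2**127 overflows signed INT128
```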
- abstract class GpuDecimalAverage extends GpuDecimalAverageBase
- abstract class GpuDecimalAverageBase extends GpuAverage
- abstract class GpuDecimalSum extends GpuSum
- case class GpuDecimalSumHighDigits(input: Expression, originalInputType: DecimalType) extends Expression with GpuExpression with ShimExpression with Product with Serializable
This extracts the highest digits from a Decimal value as a part of doing a SUM.
- case class GpuExtractChunk32(data: Expression, chunkIdx: Int, replaceNullsWithZero: Boolean) extends Expression with GpuExpression with ShimExpression with Product with Serializable
Extracts a 32-bit chunk from a 128-bit value.
- data
expression producing 128-bit values
- chunkIdx
index of chunk to extract (0-3)
- replaceNullsWithZero
whether to replace nulls with zero
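A hedged scalar sketch of the chunk extraction semantics (Python ints stand in for a DECIMAL128 column; only the top chunk keeps the sign, matching the chunking described under GpuDecimal128Sum):

```python
def extract_chunk32(value, chunk_idx, replace_nulls_with_zero=True):
    """Extract 32-bit chunk `chunk_idx` (0 = least significant) from a signed
    128-bit value; chunks 0-2 are unsigned, chunk 3 is interpreted as INT32."""
    if value is None:
        return 0 if replace_nulls_with_zero else None
    u = value & ((1 << 128) - 1)
    chunk = (u >> (32 * chunk_idx)) & 0xFFFFFFFF
    if chunk_idx == 3 and chunk >= 1 << 31:
        chunk -= 1 << 32  # sign-extend the most significant chunk
    return chunk

print(extract_chunk32(-1, 0))  # 4294967295 (UINT32 chunk)
print(extract_chunk32(-1, 3))  # -1 (signed top chunk)
```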
- case class GpuFirst(child: Expression, ignoreNulls: Boolean) extends Expression with GpuAggregateFunction with GpuBatchedRunningWindowWithFixer with GpuAggregateWindowFunction with GpuDeterministicFirstLastCollectShim with ImplicitCastInputTypes with Serializable with Product
- case class GpuFloatMax(child: Expression) extends GpuMax with GpuReplaceWindowFunction with Product with Serializable
Max aggregation for FloatType and DoubleType to handle NaNs.
In Spark, NaN is the max float value, but in cuDF a calculation involving NaN is undefined. We designed a workaround here to match Spark's behaviour. The high-level idea is that, in the projection stage, we create another column isNan. If any value in this column is true, return NaN; else, return what GpuBasicMax returns.
- case class GpuFloatMin(child: Expression) extends GpuMin with GpuReplaceWindowFunction with Product with Serializable
GpuMin for FloatType and DoubleType to handle NaNs.
In Spark, NaN is the max float value, but in cuDF a calculation involving NaN is undefined. We designed a workaround here to match Spark's behaviour. The high-level idea is: if the column contains only NaNs or nulls, then return NaN if the column contains any NaN, else return null; otherwise replace all NaNs with nulls and use the cuDF kernel to find the min value.
- case class GpuLast(child: Expression, ignoreNulls: Boolean) extends Expression with GpuAggregateFunction with GpuBatchedRunningWindowWithFixer with GpuAggregateWindowFunction with GpuDeterministicFirstLastCollectShim with ImplicitCastInputTypes with Serializable with Product
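The Spark-compatible NaN behaviour described for GpuFloatMax and GpuFloatMin above can be sketched in Python (hedged: scalar lists stand in for columns, None stands in for null):

```python
import math

def gpu_float_max(values):
    # Sketch of the isNan-projection workaround: if any value is NaN, the max
    # is NaN (NaN is the largest float in Spark); otherwise fall back to the
    # basic max over non-null values.
    present = [v for v in values if v is not None]
    if not present:
        return None
    if any(math.isnan(v) for v in present):
        return math.nan
    return max(present)

def gpu_float_min(values):
    # Sketch of GpuFloatMin: NaNs are replaced with nulls before the min,
    # unless the column holds only NaNs/nulls, in which case NaN wins over null.
    present = [v for v in values if v is not None]
    if not present:
        return None
    non_nan = [v for v in present if not math.isnan(v)]
    if not non_nan:
        return math.nan
    return min(non_nan)

print(gpu_float_max([1.0, math.nan, 2.0]))  # nan
print(gpu_float_min([1.0, math.nan]))       # 1.0
```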
- abstract class GpuM2 extends Expression with GpuAggregateFunction with ImplicitCastInputTypes with Serializable
Base class for overriding standard deviation and variance aggregations. This is also a GPU-based implementation of the 'CentralMomentAgg' aggregation class in Spark with the fixed 'momentOrder' variable set to '2'.
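With momentOrder fixed at 2, the aggregation buffer holds (count, mean, M2), where M2 is the sum of squared deviations from the mean; the parallel merge of two such buffers (the standard Chan et al. formula) can be sketched as follows (hedged scalar stand-in for the columnar implementation):

```python
def update_m2(values):
    """Update step: compute (count, mean, M2) for one batch of values."""
    n = len(values)
    mean = sum(values) / n
    return (n, mean, sum((v - mean) ** 2 for v in values))

def merge_m2(a, b):
    """Merge step: combine two (count, mean, M2) partial aggregates."""
    n1, m1, M1 = a
    n2, m2, M2 = b
    n = n1 + n2
    delta = m2 - m1
    mean = m1 + delta * n2 / n
    # Cross term accounts for the distance between the two batch means
    M = M1 + M2 + delta * delta * n1 * n2 / n
    return (n, mean, M)

# Merging two half-batches reproduces the single-pass result
merged = merge_m2(update_m2([1.0, 2.0]), update_m2([3.0, 4.0]))
print(merged)  # (4, 2.5, 5.0)
```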
- abstract class GpuMax extends Expression with GpuAggregateFunction with GpuBatchedRunningWindowWithFixer with GpuUnboundToUnboundWindowWithFixer with GpuAggregateWindowFunction with GpuRunningWindowFunction with Serializable
- case class GpuMaxBy(valueExpr: Expression, orderingExpr: Expression) extends GpuMaxMinByBase with Product with Serializable
- abstract class GpuMaxMinByBase extends Expression with GpuAggregateFunction with Serializable
- abstract class GpuMin extends Expression with GpuAggregateFunction with GpuBatchedRunningWindowWithFixer with GpuUnboundToUnboundWindowWithFixer with GpuAggregateWindowFunction with GpuRunningWindowFunction with Serializable
- case class GpuMinBy(valueExpr: Expression, orderingExpr: Expression) extends GpuMaxMinByBase with Product with Serializable
- case class GpuNthValue(child: Expression, offset: Expression, ignoreNulls: Boolean) extends Expression with GpuAggregateWindowFunction with GpuBatchedRunningWindowWithFixer with ImplicitCastInputTypes with Serializable with Product
- abstract class GpuPercentile extends Expression with GpuAggregateFunction with Serializable
- case class GpuPercentileDefault(childExpr: Expression, percentage: GpuLiteral, isReduction: Boolean) extends GpuPercentile with Product with Serializable
Compute percentiles from just the input values.
- case class GpuPercentileEvaluation(childExpr: Expression, percentage: Either[Double, Array[Double]], outputType: DataType, isReduction: Boolean) extends Expression with GpuExpression with ShimExpression with Product with Serializable
Perform the final evaluation step to compute percentiles from histograms.
- case class GpuPercentileWithFrequency(childExpr: Expression, percentage: GpuLiteral, frequencyExpr: Expression, isReduction: Boolean) extends GpuPercentile with Product with Serializable
Compute percentiles from the input values associated with frequencies.
- case class GpuPivotFirst(pivotColumn: Expression, valueColumn: Expression, pivotColumnValues: Seq[Any]) extends Expression with GpuAggregateFunction with Product with Serializable
- case class GpuReplaceNullmask(input: Expression, mask: Expression) extends Expression with GpuExpression with ShimExpression with Product with Serializable
- case class GpuStddevPop(child: Expression, nullOnDivideByZero: Boolean) extends GpuM2 with Product with Serializable
- case class GpuStddevSamp(child: Expression, nullOnDivideByZero: Boolean) extends GpuM2 with GpuReplaceWindowFunction with Product with Serializable
- abstract class GpuSum extends Expression with GpuAggregateFunction with ImplicitCastInputTypes with GpuBatchedRunningWindowWithFixer with GpuAggregateWindowFunction with GpuRunningWindowFunction with Serializable
- trait GpuToCpuAggregateBufferConverter extends AnyRef
- trait GpuToCpuBufferTransition extends UnaryExpression with ShimUnaryExpression with CodegenFallback
- class GpuToCpuCollectBufferConverter extends GpuToCpuAggregateBufferConverter
- case class GpuToCpuCollectBufferTransition(child: Expression) extends UnaryExpression with GpuToCpuBufferTransition with Product with Serializable
- case class GpuToCpuPercentileBufferConverter(elementType: DataType) extends GpuToCpuAggregateBufferConverter with Product with Serializable
Convert the internal histogram buffer into a byte stream that can be deserialized by the Spark CPU.
- case class GpuToCpuPercentileBufferTransition(child: Expression, elementType: DataType) extends UnaryExpression with GpuToCpuBufferTransition with Product with Serializable
- case class GpuVariancePop(child: Expression, nullOnDivideByZero: Boolean) extends GpuM2 with Product with Serializable
- case class GpuVarianceSamp(child: Expression, nullOnDivideByZero: Boolean) extends GpuM2 with Product with Serializable
- case class WindowStddevSamp(child: Expression, nullOnDivideByZero: Boolean) extends Expression with GpuAggregateWindowFunction with Product with Serializable
- case class WrappedAggFunction(aggregateFunction: GpuAggregateFunction, filter: Expression) extends Expression with GpuAggregateFunction with Product with Serializable
Value Members
- object CudfAll
Check if all values in a boolean column are true. The cuDF all aggregation does not work for reductions or group-by aggregations, so we use Min as a workaround for this.
- object CudfAny
Check if there is a true value in a boolean column. The cuDF any aggregation does not work for reductions or group-by aggregations, so we use Max as a workaround for this.
- object CudfMaxMinBy
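The Min/Max workaround used by CudfAll and CudfAny above relies on False sorting before True; a hedged scalar sketch (None stands in for null, which the real aggregations skip):

```python
def cudf_all(bools):
    # ALL over booleans == MIN, since False < True
    vals = [b for b in bools if b is not None]
    return min(vals) if vals else None

def cudf_any(bools):
    # ANY over booleans == MAX, since True > False
    vals = [b for b in bools if b is not None]
    return max(vals) if vals else None

print(cudf_all([True, False, None]))  # False
print(cudf_any([False, True, None]))  # True
```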
- object CudfNthLikeAggregate extends Serializable
- object CudfTDigest
- object GpuAverage extends Serializable
- object GpuDecimalSumOverflow
All decimal processing in Spark has overflow detection as a part of it. Either it replaces the value with a null in non-ANSI mode, or it throws an exception in ANSI mode. Spark will also do the processing for larger values as Decimal values, which are based on BigDecimal and have unbounded precision. So in most cases it is impossible to overflow/underflow so much that an incorrect value is returned. Spark will just use more and more memory to hold the value and then check for overflow at some point when the result needs to be turned back into a 128-bit value.
We cannot do the same thing. Instead we take four strategies to detect overflow.
1. For decimal values with a precision of 8 or under we follow Spark and do the SUM on the unscaled value as a long, and then bit-cast the result back to a Decimal value. This means that we can SUM 174,467,442,481 maximum or minimum decimal values with a precision of 8 before overflow can no longer be detected. It is much higher for decimal values with a smaller precision.
2. For decimal values with a precision from 9 to 20 inclusive we sum them as 128-bit values. This is very similar to what we do in the first strategy. The main differences are that we use a 128-bit value when doing the sum, and we check for overflow after processing each batch. In the case of group-by and reduction that happens after the update stage and also after each merge stage. This gives us enough room that we can always detect overflow when summing a single batch, even on a merge where we could be doing the aggregation on a batch that has all max output values in it.
3. For values from 21 to 28 inclusive we have enough room to not check for overflow on the update aggregation, but for the merge aggregation we need to do some extra checks. This is done by taking the digits above 28 and summing them separately. We then check to see if they would have overflowed the original limits. This lets us detect overflow in cases where the original value would have wrapped around. The reason this works is that we have a hard limit on the maximum number of values in a single batch being processed: Int.MaxValue, or about 2.2 billion values. So we use a precision on the higher values that is large enough to handle 2.2 billion values and still detect overflow. This equates to a precision of about 10 more than is needed to hold the higher digits. This effectively gives us unlimited overflow detection.
4. For anything larger than precision 28 we do the same overflow detection as strategy 3, but also do it on the update aggregation. This lets us fully detect overflows in any stage of an aggregation.
Note that for Window operations either there is no merge stage or it only has a single value being merged into a batch instead of an entire batch being merged together.
This lets us handle the overflow detection with what is built into GpuAdd.
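Strategy 1 above can be sketched as follows (hedged: Python ints emulate 64-bit wraparound, and widening the result precision by 10 digits follows Spark's SUM result type; this is an illustration, not the actual implementation):

```python
INT64_MIN = -(1 << 63)

def sum_unscaled_with_check(unscaled_values, input_precision):
    """Sum unscaled decimal values (precision <= 8) in a wrapping 64-bit long,
    then flag overflow if the result exceeds the widened result precision."""
    result_precision = input_precision + 10  # Spark widens SUM by 10 digits
    total = 0
    for v in unscaled_values:
        # Emulate 64-bit two's-complement wraparound of the long accumulator
        total = (total + v - INT64_MIN) % (1 << 64) + INT64_MIN
    if abs(total) >= 10 ** result_precision:
        return None  # overflow detected -> null in non-ANSI mode
    return total

print(sum_unscaled_with_check([100, 200], 8))  # 300
```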
- object GpuMax extends Serializable
- object GpuMin extends Serializable
- object GpuPercentile extends Serializable
- object GpuSum extends Serializable