package window
Type Members
-
case class
AggAndReplace[T](agg: T, nullReplacePolicy: Option[ReplacePolicy]) extends Product with Serializable
For Scan and GroupBy Scan aggregations nulls are not always treated the same way as they are in window operations. Often we have to run a post processing step and replace them. This groups those two together so we can have a complete picture of how to perform these types of aggregations.
-
class
AutoClosableArrayBuffer[T <: AutoCloseable] extends AutoCloseable
Just a simple wrapper to make working with buffers of AutoCloseable things play nicely with withResource.
-
trait
BasicWindowCalc extends AnyRef
Calculates the results of window operations. It assumes that any batching of the data or fixups after the fact to get the right answer is done outside of this.
- case class BatchedOps(running: Seq[NamedExpression], unboundedAgg: Seq[NamedExpression], unboundedDoublePass: Seq[NamedExpression], bounded: Seq[NamedExpression], passThrough: Seq[NamedExpression]) extends Product with Serializable
-
class
BatchedRunningWindowBinaryFixer extends BatchedRunningWindowFixer with Logging
This class fixes up batched running windows by performing a binary op on the previous value and those in the same partition by key group. It does not deal with nulls, so it works for things like row_number and count, which cannot produce nulls, or for NULL_MIN and NULL_MAX, which do the right thing when they see a null.
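As a rough illustration of the binary-op idea, the sketch below applies an arbitrary operation between the value saved from the previous batch and each row that continues the same partition by key group. It uses plain Scala collections in place of GPU columns; `BinaryFixSketch`, `fixUp`, and the `samePartition` flags are hypothetical names for this example, not the plugin's API.

```scala
// Hypothetical sketch: plain Scala sequences stand in for GPU columns.
object BinaryFixSketch {
  // previous:      the value carried over from the end of the previous batch
  // samePartition: true for rows that continue the previous batch's last key group
  // values:        the per-batch window results that may need fixing
  // op:            the binary operation (for example addition for row_number or count)
  def fixUp[T](previous: T, samePartition: Seq[Boolean], values: Seq[T])(op: (T, T) => T): Seq[T] =
    values.zip(samePartition).map {
      case (value, true)  => op(previous, value) // row continues the earlier group
      case (value, false) => value               // row belongs to a new group
    }
}
```

For row_number, a previous batch that ended at row 3 and a new batch numbered 1, 2, 1, 2 (the first two rows continuing the old group) would be fixed with addition to 4, 5, 1, 2.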
-
trait
BatchedRunningWindowFixer extends AutoCloseable with Retryable
Provides a way to process running window operations without needing to buffer and split the batches on partition by boundaries. When this happens, part of a partition by key set may have been processed in the last batch, and the rest of it will need to be updated. For example, if we are doing a running min operation, we may first get in something like
PARTS: 1, 1, 2, 2
VALUES: 2, 3, 10, 9
The output of processing this would result in a new column that would look like
MINS: 2, 2, 10, 9
But we don't know if the group with 2 in PARTS is done or not. So the fixer saves the last value in MINS, which is a 9. When the next batch shows up
PARTS: 2, 2, 3, 3
VALUES: 11, 5, 13, 14
we generate the window result again and get
MINS: 11, 5, 13, 13
But we cannot output this yet because there may have been overlap with the previous batch. The framework will figure that out and pass data into fixUp to do the fixing. It will pass in MINS, and also a column of boolean values
true, true, false, false
to indicate which rows overlapped with the previous batch. In our min example fixUp will do a min between the last value in the previous batch and the values that could overlap with it.
RESULT: 9, 5, 13, 13
which can be output.
- class BatchedUnboundedToUnboundedBinaryFixer extends BatchedUnboundedToUnboundedWindowFixer
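The running min walk-through given for BatchedRunningWindowFixer can be replayed with ordinary Scala collections. This is only a sketch under stated assumptions: `RunningMinBatchesSketch`, `runningMin`, and this simplified `fixUp` signature are made up for illustration and do not reflect the real trait's methods, which operate on GPU columns.

```scala
// Hypothetical sketch of the running min example; not the plugin's API.
object RunningMinBatchesSketch {
  // Running min inside one batch, restarting whenever the partition key changes.
  def runningMin(parts: Seq[Int], values: Seq[Int]): Seq[Int] =
    parts.zip(values).foldLeft(Vector.empty[(Int, Int)]) {
      case (acc, (part, value)) =>
        val min = acc.lastOption match {
          case Some((lastPart, lastMin)) if lastPart == part => math.min(lastMin, value)
          case _ => value
        }
        acc :+ ((part, min))
    }.map(_._2)

  // Rows flagged as overlapping take the min with the value saved from the previous batch.
  def fixUp(previousLast: Int, overlapped: Seq[Boolean], mins: Seq[Int]): Seq[Int] =
    mins.zip(overlapped).map {
      case (min, true)  => math.min(previousLast, min)
      case (min, false) => min
    }
}
```

Running the first batch gives 2, 2, 10, 9, the second gives 11, 5, 13, 13, and fixing the second with the saved 9 and the flags true, true, false, false gives 9, 5, 13, 13, matching the text.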
-
trait
BatchedUnboundedToUnboundedWindowFixer extends AutoCloseable
Provides a way to process window operations without needing to buffer and split the batches on partition by boundaries. When this happens, part of a partition by key set may have been processed in the previous batches, and may need to be updated. For example, if we are doing a min operation with unbounded preceding and unbounded following, we may first get in something like
PARTS: 1, 1, 2, 2
VALUES: 2, 3, 10, 9
The output of processing this would result in a new column that would look like
MINS: 2, 2, 9, 9
But we don't know if the group with 2 in PARTS is done or not. So the fixer saves the last value in MINS, which is a 9, and caches the batch. When the next batch shows up
PARTS: 2, 2, 3, 3
VALUES: 11, 5, 13, 14
we generate the window result again and get
MINS: 5, 5, 13, 13
And now we need to grab the first entry, which is a 5, and update the cached data with another min. The cached data for PARTS=2 is now 5. We then need to go back and fix up all of the previous batches that had something to do with PARTS=2. The first batch will be pulled from the cache and updated to look like
PARTS: 1, 1, 2, 2
VALUES: 2, 3, 10, 9
MINS: 2, 2, 5, 5
which can be output because we were able to fix up all of the PARTS in that batch.
- case class BigIntRangeBoundaryValue(value: BigInt) extends RangeBoundaryValue with Product with Serializable
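The second-pass fix described for BatchedUnboundedToUnboundedWindowFixer might look roughly like the following when written against plain Scala collections. `UnboundedMinFixSketch`, `Batch`, `updateCache`, and `fixBatch` are hypothetical names introduced for this sketch; the real fixer works on cached, spillable GPU batches.

```scala
// Hypothetical sketch of the unbounded-to-unbounded min example; not the plugin's API.
object UnboundedMinFixSketch {
  // One cached batch: the partition key of each row and its per-batch min.
  final case class Batch(parts: Seq[Int], mins: Seq[Int])

  // Fold the first window result of the next batch into the cached min for the open key.
  def updateCache(cached: Int, firstOfNext: Int): Int = math.min(cached, firstOfNext)

  // Rewrite every row of a cached batch that belongs to `key` with the final min.
  def fixBatch(batch: Batch, key: Int, finalMin: Int): Batch =
    batch.copy(mins = batch.parts.zip(batch.mins).map {
      case (part, _) if part == key => finalMin
      case (_, min)                 => min
    })
}
```

With the cached value 9 and the next batch's first result 5, `updateCache` yields 5, and `fixBatch` turns the first batch's MINS from 2, 2, 9, 9 into 2, 2, 5, 5.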
-
case class
BoundGpuWindowFunction(windowFunc: GpuWindowFunction, boundInputLocations: Array[Int]) extends Product with Serializable
The class represents a window function and the locations of its deduped inputs after an initial projection.
-
class
CountUnboundedToUnboundedFixer extends BatchedUnboundedToUnboundedWindowFixer
Fixes up a count operation for unbounded preceding to unbounded following
-
class
DenseRankFixer extends BatchedRunningWindowFixer with Logging
Fix up dense rank batches. A dense rank has no gaps in the rank values. The rank corresponds to the ordering column(s) equality. So when a batch finishes and another starts, that split can either be at the beginning of a new order by section or part way through one. If it is at the beginning, then like row number we want to just add in the previous value and go on. If it was part way through, then we want to add in the previous value minus 1. The minus one is to pick up where we left off. If anything is outside of a continuous partition by group then we just keep those values unchanged.
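The offset rule above can be sketched on plain Scala sequences. The names here (`DenseRankFixSketch`, `sameOrderAsPrevious`, `samePartition`) are assumptions made for the example; the actual fixer receives GPU columns and tracks this state itself.

```scala
// Hypothetical sketch of the dense rank fix-up rule; not the plugin's API.
object DenseRankFixSketch {
  // previousRank:        last dense rank emitted by the previous batch
  // sameOrderAsPrevious: true when this batch starts part way through an order by section
  // samePartition:       true for rows that continue the previous partition by group
  // denseRanks:          dense ranks computed for this batch alone (restarting at 1)
  def fixUp(previousRank: Int,
            sameOrderAsPrevious: Boolean,
            samePartition: Seq[Boolean],
            denseRanks: Seq[Int]): Seq[Int] = {
    // Part way through a section: reuse the previous rank, hence the minus one.
    val offset = if (sameOrderAsPrevious) previousRank - 1 else previousRank
    denseRanks.zip(samePartition).map {
      case (rank, true)  => rank + offset
      case (rank, false) => rank // new partition by group, left unchanged
    }
  }
}
```

If the previous batch ended at dense rank 3 and the new batch (ranks 1, 2, 1, with the last row in a new partition) starts with the same ordering value, the fixed ranks are 3, 4, 1; if it starts a new ordering value they are 4, 5, 1.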
- case class DoubleRangeBoundaryValue(value: Double) extends RangeBoundaryValue with Product with Serializable
-
abstract
class
FirstLastRunningWindowFixerBase extends BatchedRunningWindowFixer with Logging
Common base class for batched running window fixers for FIRST() and LAST() window functions. This mostly handles the checkpoint logic. The fixup logic is left to the concrete subclass.
- case class FirstPassAggResult(rideAlongColumns: SpillableColumnarBatch, aggResult: SpillableColumnarBatch) extends AutoCloseable with Product with Serializable
-
class
FirstRunningWindowFixer extends FirstLastRunningWindowFixerBase
Batched running window fixer for FIRST() window functions. Supports fixing for batched execution for ROWS and RANGE based window specifications.
- class FixerPair extends AutoCloseable
-
trait
GpuAggregateWindowFunction extends Expression with GpuWindowFunction
GPU Counterpart of AggregateWindowFunction. On the CPU this would extend DeclarativeAggregate and use the provided methods to build up the expressions needed to produce a result. For window operations we do it in a single pass, where all of the data is available, so instead we have our own set of expressions.
-
abstract
class
GpuBaseWindowExecMeta[WindowExecType <: SparkPlan] extends SparkPlanMeta[WindowExecType]
Base class for GPU Execs that implement window functions. This abstracts the method by which the window function's input expressions, partition specs, order-by specs, etc. are extracted from the specific WindowExecType.
- WindowExecType
The Exec class that implements window functions (E.g. o.a.s.sql.execution.window.WindowExec.)
- class GpuBatchedBoundedWindowExec extends GpuWindowExec
- class GpuBatchedBoundedWindowIterator extends Iterator[ColumnarBatch] with BasicWindowCalc with Logging
-
trait
GpuBatchedRunningWindowWithFixer extends AnyRef
For many operations a running window (unbounded preceding to current row) can process the data without dividing the data up into batches that contain all of the data for a given group by key set. Instead we store a small amount of state from a previous result and use it to fix the final result. This is a memory optimization.
-
case class
GpuCachedDoublePassWindowExec(windowOps: Seq[NamedExpression], gpuPartitionSpec: Seq[Expression], gpuOrderSpec: Seq[SortOrder], child: SparkPlan)(cpuPartitionSpec: Seq[Expression], cpuOrderSpec: Seq[SortOrder]) extends SparkPlan with GpuWindowBaseExec with Product with Serializable
This allows for batches of data to be processed without needing them to correspond to the partition by boundaries. This is similar to GpuRunningWindowExec, but for operations that need a small amount of information from all of the batches associated with a partition instead of just the previous batch. It does this by processing a batch, collecting and updating a small cache of information about the last partition in the batch, and then putting that batch into a form that would let it be spilled if needed. A batch is released when the last partition key in the batch is fully processed. Before it is released it will be updated to include any information needed from the cached data.
Currently this only works for unbounded to unbounded windows, but could be extended to more.
-
class
GpuCachedDoublePassWindowIterator extends Iterator[ColumnarBatch] with BasicWindowCalc
An iterator that can do aggregations on window queries that need a small amount of information from all of the batches to update the result in a second pass. It does this by having the aggregations be instances of GpuUnboundToUnboundWindowWithFixer which can fix up the window output for unbounded to unbounded windows. Because of this there is no requirement about how the input data is batched, but it must be sorted by both partitioning and ordering.
-
case class
GpuDenseRank(children: Seq[Expression]) extends Expression with GpuRunningWindowFunction with GpuBatchedRunningWindowWithFixer with Product with Serializable
Dense Rank is a special window operation where it is only supported as a running window. In cudf it is only supported as a scan and a group by scan.
- children
the order by columns.
- Note
this is a running window only operator
- case class GpuLag(input: Expression, offset: Expression, default: Expression) extends Expression with GpuOffsetWindowFunction with Product with Serializable
- case class GpuLead(input: Expression, offset: Expression, default: Expression) extends Expression with GpuOffsetWindowFunction with Product with Serializable
- trait GpuOffsetWindowFunction extends Expression with GpuAggregateWindowFunction
-
case class
GpuPercentRank(children: Seq[Expression]) extends Expression with GpuReplaceWindowFunction with Product with Serializable
percent_rank() is a running window function in that it only operates on a window of unbounded preceding to current row. But the percent part actually makes it need a full count of the number of rows in the window. This is why we rewrite the operator to allow us to compute the result in a way that will not overflow memory.
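To see why a full row count is needed, recall the usual SQL definition of percent_rank: (rank - 1) / (rows in the partition - 1), with 0 for a single-row partition. The sketch below computes it from precomputed ranks on a plain Scala sequence. `PercentRankSketch` is a hypothetical name, and this is one plausible decomposition into a rank plus a count, not necessarily the exact rewrite the operator performs.

```scala
// Hypothetical sketch: percent_rank from ranks plus the partition's row count.
object PercentRankSketch {
  // ranks: the rank() of every row in one partition, in order.
  def percentRank(ranks: Seq[Int]): Seq[Double] = {
    val rowCount = ranks.length // the full count the running window alone cannot supply
    ranks.map { rank =>
      if (rowCount > 1) (rank - 1).toDouble / (rowCount - 1) else 0.0
    }
  }
}
```

For ranks 1, 2, 2, 4, 5 in a five-row partition this gives 0.0, 0.25, 0.25, 0.75, 1.0.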
-
case class
GpuRank(children: Seq[Expression]) extends Expression with GpuRunningWindowFunction with GpuBatchedRunningWindowWithFixer with ShimExpression with Product with Serializable
Rank is a special window operation where it is only supported as a running window. In cudf it is only supported as a scan and a group by scan. But there are special requirements beyond that when doing the computation as a running batch. To fix up each batch it needs both the rank and the row number. To make this work and be efficient there is different behavior for batched running window vs non-batched. If it is for a running batch we include the row number values, in both the initial projections and in the corresponding aggregations. Then we combine them into a struct column in scanCombine before it is passed on to the RankFixer. If it is not a running batch, then we drop the row number part because it is just not needed.
- children
the order by columns.
- Note
this is a running window only operator.
-
trait
GpuReplaceWindowFunction extends Expression with GpuWindowFunction
This is a special window function that simply replaces itself with one or more window functions and other expressions that can be executed. This allows you to write GpuAverage in terms of GpuSum and GpuCount, which can both operate on all window optimizations, making GpuAverage able to do the same.
-
case class
GpuRunningWindowExec(windowOps: Seq[NamedExpression], gpuPartitionSpec: Seq[Expression], gpuOrderSpec: Seq[SortOrder], child: SparkPlan)(cpuPartitionSpec: Seq[Expression], cpuOrderSpec: Seq[SortOrder]) extends SparkPlan with GpuWindowBaseExec with Product with Serializable
This allows for batches of data to be processed without needing them to correspond to the partition by boundaries, but only for window operations that are unbounded preceding to current row (Running Window). This works because a small amount of data can be saved from a previous batch and used to update the current batch.
-
trait
GpuRunningWindowFunction extends Expression with GpuWindowFunction
A window function that is optimized for running windows using the cudf scan and group by scan operations. In some cases, like row number and rank, Spark only supports them as running window operations. This is why it directly extends GpuWindowFunction because it can be a stand alone window function. In all other cases it should be combined with GpuAggregateWindowFunction to provide a fully functional window operation. It should be noted that WindowExec tries to deduplicate input projections and aggregations to reduce memory usage. Because of tracking requirements it is required that there is a one to one relationship between an input projection and a corresponding aggregation.
-
class
GpuRunningWindowIterator extends GpuColumnarBatchIterator with BasicWindowCalc
An iterator that can do row based aggregations on running window queries (Unbounded preceding to current row) if and only if the aggregations are instances of GpuBatchedRunningWindowWithFixer which can fix up the window output when an aggregation is only partly done in one batch of data. Because of this there is no requirement about how the input data is batched, but it must be sorted by both partitioning and ordering.
- case class GpuSpecialFrameBoundary(boundary: SpecialFrameBoundary) extends Expression with GpuExpression with ShimExpression with GpuUnevaluable with Product with Serializable
- case class GpuSpecifiedWindowFrame(frameType: FrameType, lower: Expression, upper: Expression) extends Expression with GpuWindowFrame with Product with Serializable
- abstract class GpuSpecifiedWindowFrameMetaBase extends ExprMeta[SpecifiedWindowFrame]
-
trait
GpuUnboundToUnboundWindowWithFixer extends AnyRef
For many window operations the results in earlier rows depend on the results from the last or later rows. In many of these cases we chunk the data based off of the partition by groups and process the data at once. But this can lead to out of memory errors, or hitting the row limit on some columns. Doing two passes through the data where the first pass processes the data and a second pass fixes up the data can let us keep the data in the original batches and reduce total memory usage. But this requires that some of the batches be made spillable while we wait for the end of the partition by group.
Right now this is written to be specific to windows that are unbounded preceding to unbounded following, but it could be adapted to also work for current row to unbounded following, and possibly more situations.
- class GpuUnboundedToUnboundedAggFinalIterator extends Iterator[ColumnarBatch]
-
class
GpuUnboundedToUnboundedAggSliceBySizeIterator extends Iterator[SlicedBySize]
Try to slice the input batches into right sized output.
-
case class
GpuUnboundedToUnboundedAggStages(inputTypes: Seq[DataType], boundPartitionSpec: Seq[GpuExpression], boundRideAlong: Seq[GpuExpression], boundAggregations: Seq[GpuExpression], boundFinalProject: Seq[GpuExpression]) extends Serializable with Product
Holds the bound references for various aggregation stages
- boundRideAlong
used for a project that pulls out columns that are passing through unchanged.
- boundAggregations
aggregations to be done. NOTE THIS IS WIP
- boundFinalProject
the final project to get the output in the right order
-
case class
GpuUnboundedToUnboundedAggWindowExec(windowOps: Seq[NamedExpression], gpuPartitionSpec: Seq[Expression], gpuOrderSpec: Seq[SortOrder], child: SparkPlan)(cpuPartitionSpec: Seq[Expression], cpuOrderSpec: Seq[SortOrder], targetSizeBytes: Long) extends SparkPlan with GpuWindowBaseExec with Product with Serializable
This allows for batches of data to be processed without needing them to correspond to the partition by boundaries. This is specifically for unbounded to unbounded window operations that can be replaced with an aggregation and then expanded out/joined with the original input data.
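The "aggregate, then expand/join" idea can be sketched on an in-memory sequence of (partition key, value) rows. `AggThenJoinSketch` and `unboundedMin` are hypothetical names, and min is used only as an example aggregation; the real exec works on batched, spillable GPU data rather than a single collection.

```scala
// Hypothetical sketch: an unbounded-to-unbounded min as a group by plus a join back.
object AggThenJoinSketch {
  // rows: (partition key, value) pairs; the result appends the partition-wide min to each row.
  def unboundedMin(rows: Seq[(Int, Int)]): Seq[(Int, Int, Int)] = {
    // Step 1: a plain group by aggregation, one result per partition key.
    val minByKey: Map[Int, Int] =
      rows.groupBy(_._1).map { case (key, group) => key -> group.map(_._2).min }
    // Step 2: expand the aggregate back out by joining it to the original rows.
    rows.map { case (key, value) => (key, value, minByKey(key)) }
  }
}
```

The aggregate holds one row per key, so it is usually far smaller than the input, which is what makes the join back cheap relative to holding a whole partition in one batch.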
- class GpuUnboundedToUnboundedAggWindowFirstPassIterator extends Iterator[FirstPassAggResult]
- class GpuUnboundedToUnboundedAggWindowSecondPassIterator extends Iterator[SecondPassAggResult]
-
trait
GpuUnboundedToUnboundedWindowAgg extends Expression with GpuAggregateFunction
This is used to tag a GpuAggregateFunction as having been tested to work properly with GpuUnboundedToUnboundedAggWindowExec.
- trait GpuWindowBaseExec extends SparkPlan with ShimUnaryExecNode with GpuExec
- case class GpuWindowExec(windowOps: Seq[NamedExpression], gpuPartitionSpec: Seq[Expression], gpuOrderSpec: Seq[SortOrder], child: SparkPlan)(cpuPartitionSpec: Seq[Expression], cpuOrderSpec: Seq[SortOrder]) extends SparkPlan with GpuWindowBaseExec with Product with Serializable
-
class
GpuWindowExecMeta extends GpuBaseWindowExecMeta[WindowExec]
Specialization of GpuBaseWindowExecMeta for org.apache.spark.sql.execution.window.WindowExec. This class implements methods to extract the window-expressions, partition columns, order-by columns, etc. from WindowExec.
- case class GpuWindowExpression(windowFunction: Expression, windowSpec: GpuWindowSpecDefinition) extends Expression with GpuUnevaluable with ShimExpression with Product with Serializable
- abstract class GpuWindowExpressionMetaBase extends ExprMeta[WindowExpression]
- trait GpuWindowFrame extends Expression with GpuExpression with GpuUnevaluable with ShimExpression
- trait GpuWindowFunction extends Expression with GpuUnevaluable with ShimExpression
-
class
GpuWindowIterator extends Iterator[ColumnarBatch] with BasicWindowCalc
An Iterator that performs window operations on the input data. It is required that the input data is batched so all of the data for a given key is in the same batch. The input data must also be sorted by both partition by keys and order by keys.
- case class GpuWindowSpecDefinition(partitionSpec: Seq[Expression], orderSpec: Seq[SortOrder], frameSpecification: GpuWindowFrame) extends Expression with GpuExpression with ShimExpression with GpuUnevaluable with Product with Serializable
- class GpuWindowSpecDefinitionMeta extends ExprMeta[WindowSpecDefinition]
-
class
GroupedAggregations extends AnyRef
Window aggregations that are grouped together. It holds the aggregation and the offsets of its input columns, along with the output columns it should write the result to.
-
class
LastRunningWindowFixer extends FirstLastRunningWindowFixerBase
Batched running window fixer for LAST() window functions. Supports fixing for batched execution for ROWS and RANGE based window specifications.
- case class LongRangeBoundaryValue(value: Long) extends RangeBoundaryValue with Product with Serializable
- case class ParsedBoundary(isUnbounded: Boolean, value: RangeBoundaryValue) extends Product with Serializable
-
class
PartitionedFirstPassAggResult extends AnyRef
Partitions the aggregation results from the first pass into two groups:
1. The aggregation results (and the corresponding rows in the ride-along column) belonging to the last group. This group is deemed currently incomplete, because the end of the group hasn't been encountered yet.
2. The aggregation results (and the corresponding rows in the ride-along column) belonging to all the preceding groups. All those groups are deemed complete.
Note that PartitionedFirstPassAggResult is not constructed from FirstPassAggResult unless there are at least two distinct groups. (If there's only one group, it couldn't possibly be complete yet.)
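The split described above can be sketched as follows, assuming the first-pass result is available as one (group key, aggregate) entry per distinct group, in input order. `PartitionFirstPassSketch` and `partition` are hypothetical names; the real class also carries the ride-along rows and works on spillable batches.

```scala
// Hypothetical sketch of splitting first-pass results into complete and incomplete groups.
object PartitionFirstPassSketch {
  // aggByGroup: one (group key, aggregate) entry per distinct group, in input order.
  // Returns (complete preceding groups, last incomplete group), or None when
  // there are fewer than two groups and nothing can be complete yet.
  def partition[K, A](aggByGroup: Seq[(K, A)]): Option[(Seq[(K, A)], (K, A))] =
    if (aggByGroup.length < 2) None
    else Some((aggByGroup.init, aggByGroup.last))
}
```

Returning `None` for a single group mirrors the note that the partitioned result is not constructed unless at least two distinct groups are present.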
- class PendingSecondAggResults extends Iterator[SlicedBySize] with AutoCloseable
-
abstract
class
RangeBoundaryValue extends AnyRef
Abstraction for possible range-boundary specifications.
This provides type disjunction for Long, BigInt and Double, the three types that might represent a range boundary.
-
class
RankFixer extends BatchedRunningWindowFixer with Logging
Rank is more complicated than DenseRank to fix. This is because there are gaps in the rank values. The rank value of each group is the row number of the first row in the group. So values in the same partition group but not the same ordering are fixed by adding the row number from the previous batch to them. If they are a part of the same ordering and part of the same partition, then we need to just put in the previous rank value.
Because we need both a rank and a row number to fix things up, the input to this is a struct containing a rank column as the first entry and a row number column as the second entry. This happens in the scanCombine method for GpuRank. It is a little ugly but it works to maintain the requirement that the input to the fixer is a single column.
- case class SecondPassAggResult(rideAlongColumns: LinkedList[SpillableColumnarBatch], aggResult: SpillableColumnarBatch) extends AutoCloseable with Product with Serializable
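The two cases described for RankFixer can be sketched with plain Scala sequences. Everything named here (`RankFixSketch`, `RankAndRow`, the `samePartition` and `sameOrder` flags) is a hypothetical stand-in; the real fixer takes a single struct column and derives the flags from GPU data.

```scala
// Hypothetical sketch of the rank fix-up rule; not the plugin's API.
object RankFixSketch {
  // Mirrors the two-entry struct: the last rank and last row number of the previous batch.
  final case class RankAndRow(rank: Int, rowNumber: Int)

  // samePartition(i): row i continues the previous batch's partition by group
  // sameOrder(i):     row i also ties with the previous batch's last ordering value
  // ranks:            ranks computed for this batch alone (restarting at 1)
  def fixUp(previous: RankAndRow,
            samePartition: Seq[Boolean],
            sameOrder: Seq[Boolean],
            ranks: Seq[Int]): Seq[Int] =
    ranks.indices.map { i =>
      if (!samePartition(i)) ranks(i)              // new partition, leave unchanged
      else if (sameOrder(i)) previous.rank         // still tied with the previous batch
      else ranks(i) + previous.rowNumber           // shift by the rows already seen
    }
}
```

Suppose the previous batch ended at row number 5 with rank 4. A new batch with ranks 1, 1, 3, 3, 5, 1, whose first two rows tie with the previous ordering value and whose last row starts a new partition, is fixed to 4, 4, 8, 8, 10, 1.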
- case class SlicedBySize(rideAlongColumns: SpillableColumnarBatch, aggResults: SpillableColumnarBatch) extends AutoCloseable with Product with Serializable
-
class
SumBinaryFixer extends BatchedRunningWindowFixer with Logging
This class fixes up batched running windows for sum. Sum is a lot like other binary op fixers, but it has to special case nulls and that is not super generic. In the future we might be able to make this more generic but we need to see what the use case really is.
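One way to picture the null special-casing is with `Option` values standing in for nullable sums. The sketch assumes the usual SQL behavior that a running sum ignores null inputs and is null only while no non-null value has been seen. `SumFixSketch` and its parameters are hypothetical, and overflow handling is deliberately left out.

```scala
// Hypothetical sketch: fixing a batched running sum where None stands for SQL null.
object SumFixSketch {
  // previousSum:   running sum carried over from the previous batch (None if still null)
  // samePartition: true for rows that continue the previous partition by group
  // sums:          running sums computed for this batch alone
  def fixUp(previousSum: Option[Long],
            samePartition: Seq[Boolean],
            sums: Seq[Option[Long]]): Seq[Option[Long]] =
    sums.zip(samePartition).map {
      case (current, false) => current
      case (current, true) =>
        (previousSum, current) match {
          case (Some(prev), Some(cur)) => Some(prev + cur) // both present: plain addition
          case (Some(prev), None)      => Some(prev)       // only nulls so far in this batch
          case (None, cur)             => cur              // nothing carried over
        }
    }
}
```

A plain binary addition would turn the second case into null, which is why sum cannot simply reuse a null-unaware binary fixer.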
Value Members
- object AggResultBatchConventions
- object DenseRankFixer
- object GpuBatchedWindowIteratorUtils
-
object
GpuRowNumber extends Expression with GpuRunningWindowFunction with GpuBatchedRunningWindowWithFixer with Product with Serializable
The row number in the window.
- Note
this is a running window only operator
-
object
GpuUnboundedToUnboundedAggWindowIterator
An iterator that can do unbounded to unbounded window aggregations as group by aggregations followed by an expand/join.
- object GpuUnspecifiedFrame extends Expression with GpuWindowFrame with Product with Serializable
- object GpuWindowExec extends Serializable
- object GpuWindowExecMeta
- object GroupedAggregations
- object PendingSecondAggResults
- object RangeBoundaryValue
- object RankFixer
-
object
TableAndBatchUtils
Utilities for conversion between SpillableColumnarBatch, ColumnarBatch, and cudf.Table.