package execution
Type Members
- class BatchPartitionIdPassthrough extends Partitioner
  A dummy partitioner for use with records whose partition ids have been pre-computed (i.e. for use on RDDs of (Int, Row) pairs where the Int is a partition id in the expected range).
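A passthrough partitioner like this is trivial: the key already is the partition id, so partitioning just returns it. The sketch below illustrates the idea with a local stand-in trait instead of Spark's `org.apache.spark.Partitioner` (the names `SimplePartitioner` and `PassthroughPartitioner` are hypothetical, used only to keep the example self-contained).

```scala
// Minimal stand-in for org.apache.spark.Partitioner so this sketch compiles
// without Spark on the classpath; real code would extend the Spark class.
trait SimplePartitioner {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// Passthrough: the key IS the pre-computed partition id, so getPartition
// simply returns it unchanged. Keys must already be in [0, numPartitions).
class PassthroughPartitioner(override val numPartitions: Int) extends SimplePartitioner {
  override def getPartition(key: Any): Int = key.asInstanceOf[Int]
}
```

This only makes sense for RDDs whose keys were produced by an earlier step that already decided the target partition, as the (Int, Row) pairs described above.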
- class CoalescedBatchPartitioner extends Partitioner
  A Partitioner that might group together one or more partitions from the parent.
- class CrossJoinIterator extends Iterator[ColumnarBatch] with Arm
  An iterator that does a cross join against a stream of batches.
-
abstract
class
GpuBroadcastExchangeExecBase extends Exchange with GpuExec
In some versions of databricks we need to return the completionFuture in a different way.
- abstract class GpuBroadcastExchangeExecBaseWithFuture extends GpuBroadcastExchangeExecBase
- class GpuBroadcastMeta extends SparkPlanMeta[BroadcastExchangeExec]
- abstract class GpuBroadcastNestedLoopJoinExecBase extends SparkPlan with BinaryExecNode with GpuExec
- class GpuBroadcastNestedLoopJoinMeta extends GpuBroadcastJoinMeta[BroadcastNestedLoopJoinExec]
- case class GpuBroadcastToCpuExec(mode: BroadcastMode, child: SparkPlan) extends GpuBroadcastExchangeExecBaseWithFuture with Product with Serializable
  This is a specialized version of GpuColumnarToRow that wraps a GpuBroadcastExchange and converts the columnar results containing cuDF tables into Spark rows so that the results can feed a CPU BroadcastHashJoin. This is required for exchange reuse in AQE.
  - mode
    Broadcast mode
  - child
    Input to broadcast
- class GpuColumnToRowMapPartitionsRDD extends MapPartitionsRDD[InternalRow, ColumnarBatch]
- case class GpuCustomShuffleReaderExec(child: SparkPlan, partitionSpecs: Seq[ShufflePartitionSpec]) extends SparkPlan with UnaryExecNode with GpuExec with Product with Serializable
  A wrapper of shuffle query stage, which follows the given partition arrangement.
  - child
    Usually ShuffleQueryStageExec, but can be the shuffle exchange node during canonicalization.
  - partitionSpecs
    The partition specs that define the arrangement.
- trait GpuHashJoin extends SparkPlan with GpuExec
-
abstract
class
GpuShuffleExchangeExecBase extends Exchange with GpuExec
Performs a shuffle that will result in the desired partitioning.
-
abstract
class
GpuShuffleExchangeExecBaseWithMetrics extends GpuShuffleExchangeExecBase
Performs a shuffle that will result in the desired partitioning.
- class GpuShuffleMeta extends SparkPlanMeta[ShuffleExchangeExec]
- class HashJoinIterator extends Iterator[ColumnarBatch] with Arm with Logging
  An iterator that does a hash join against a stream of batches.
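To make the one-line summary concrete, here is a conceptual, CPU-side sketch of the pattern such an iterator embodies (not the GPU implementation, and `hashJoinBatches` is a hypothetical name): build a hash table from the build side once, then lazily probe it with each streamed batch, emitting one joined batch per input batch.

```scala
// Conceptual streaming hash join over "batches" modeled as Seq[(key, value)].
// The real HashJoinIterator operates on ColumnarBatch data on the GPU.
def hashJoinBatches[K, A, B](
    build: Seq[(K, A)],
    stream: Iterator[Seq[(K, B)]]): Iterator[Seq[(K, A, B)]] = {
  // Build side: group values by key so each probe is a single lookup.
  val table: Map[K, Seq[A]] =
    build.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }
  // Stream side: probe lazily, one batch at a time, keeping only matches.
  stream.map { batch =>
    batch.flatMap { case (k, b) => table.getOrElse(k, Nil).map(a => (k, a, b)) }
  }
}
```

Because `stream.map` is lazy on an `Iterator`, only one streamed batch needs to be resident at a time, which mirrors why the batch-at-a-time iterator shape is used here.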
- class SerializeBatchDeserializeHostBuffer extends Serializable with AutoCloseable
  - Annotations
    - @SerialVersionUID()
- class SerializeConcatHostBuffersDeserializeBatch extends Serializable with Arm with AutoCloseable
  - Annotations
    - @SerialVersionUID()
- class ShuffledBatchRDD extends RDD[ColumnarBatch]
  This is a specialized version of org.apache.spark.rdd.ShuffledRDD that is optimized for shuffling ColumnarBatch instead of Java key-value pairs.
  This RDD takes a ShuffleDependency (dependency), and an optional array of partition start indices as input arguments (specifiedPartitionStartIndices).
  The dependency has the parent RDD of this RDD, which represents the dataset before shuffle (i.e. map output). Elements of this RDD are (partitionId, Row) pairs. Partition ids should be in the range [0, numPartitions - 1]. dependency.partitioner is the original partitioner used to partition map output, and dependency.partitioner.numPartitions is the number of pre-shuffle partitions (i.e. the number of partitions of the map output).
  When specifiedPartitionStartIndices is defined, specifiedPartitionStartIndices.length will be the number of post-shuffle partitions. For this case, the i-th post-shuffle partition includes specifiedPartitionStartIndices[i] to specifiedPartitionStartIndices[i+1] - 1 (inclusive).
  When specifiedPartitionStartIndices is not defined, there will be dependency.partitioner.numPartitions post-shuffle partitions. For this case, a post-shuffle partition is created for every pre-shuffle partition.
- case class ShuffledBatchRDDPartition(index: Int, spec: ShufflePartitionSpec) extends Partition with Product with Serializable
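The specifiedPartitionStartIndices semantics described for ShuffledBatchRDD can be made concrete with a small helper (the function `postShuffleRanges` is a hypothetical illustration, not part of the API): each post-shuffle partition i covers pre-shuffle partitions startIndices(i) through startIndices(i+1) - 1, with the last one running to the end.

```scala
// Compute the inclusive (start, end) pre-shuffle partition range covered by
// each post-shuffle partition, following the ShuffledBatchRDD description.
def postShuffleRanges(
    startIndices: Array[Int],
    numPreShufflePartitions: Int): Seq[(Int, Int)] =
  startIndices.indices.map { i =>
    val start = startIndices(i)
    val end =
      if (i + 1 < startIndices.length) startIndices(i + 1) - 1 // next index bounds this range
      else numPreShufflePartitions - 1                          // last range runs to the end
    (start, end)
  }

// Example: 10 pre-shuffle partitions coalesced into 3 post-shuffle partitions.
val ranges = postShuffleRanges(Array(0, 4, 7), 10)
// ranges: (0, 3), (4, 6), (7, 9)
```

This is the mechanism AQE-style coalescing relies on: several small map-output partitions are read back as one post-shuffle partition.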
Value Members
- object GpuBroadcastExchangeExec
- object GpuBroadcastNestedLoopJoinExecBase extends Arm with Serializable
- object GpuColumnToRowMapPartitionsRDD extends Serializable
- object GpuHashJoin extends Arm with Serializable
- object GpuShuffleExchangeExec
-
object
InternalColumnarRddConverter extends Logging
Please don't use this class directly use com.nvidia.spark.rapids.ColumnarRdd instead.
Please don't use this class directly use com.nvidia.spark.rapids.ColumnarRdd instead. We had to place the implementation in a spark specific package to poke at the internals of spark more than anyone should know about.
This provides a way to get back out GPU Columnar data RDD[Table]. Each Table will have the same schema as the dataframe passed in. If the schema of the dataframe is something that Rapids does not currently support an
IllegalArgumentExceptionwill be thrown.The size of each table will be determined by what is producing that table but typically will be about the number of bytes set by
RapidsConf.GPU_BATCH_SIZE_BYTES.Table is not a typical thing in an RDD so special care needs to be taken when working with it. By default it is not serializable so repartitioning the RDD or any other operator that involves a shuffle will not work. This is because it is very expensive to serialize and deserialize a GPU Table using a conventional spark shuffle. Also most of the memory associated with the Table is on the GPU itself, so each table must be closed when it is no longer needed to avoid running out of GPU memory. By convention it is the responsibility of the one consuming the data to close it when they no longer need it.
- object JoinTypeChecks
- object TrampolineUtil