Packages

package execution

Ordering
  1. Alphabetic
Visibility
  1. Public
  2. All

Type Members

  1. abstract class BaseHashJoinIterator extends SplittableJoinIterator
  2. abstract class BaseSubHashJoinIterator extends Iterator[ColumnarBatch] with TaskAutoCloseableResource

    Base class for joins using the sub-partitioning algorithm

  3. class BatchPartitionIdPassthrough extends Partitioner

    A dummy partitioner for use with records whose partition ids have been pre-computed (i.e. for use on RDDs of (Int, Row) pairs where the Int is a partition id in the expected range).
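    The idea can be sketched without Spark. `PassthroughPartitioner` below is a hypothetical stand-in for illustration, not the real class (which extends Spark's `Partitioner`):

```scala
// Hypothetical sketch: a partitioner that trusts pre-computed partition ids.
// Each record's key is assumed to already be its partition id, so no hashing
// is performed at all -- the id is simply passed through.
case class PassthroughPartitioner(numPartitions: Int) {
  def getPartition(key: Any): Int = key.asInstanceOf[Int]
}
```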

  4. class CoalescedBatchPartitioner extends Partitioner

    A Partitioner that might group together one or more partitions from the parent.

  5. class ConditionalHashJoinIterator extends BaseHashJoinIterator

    An iterator that does a hash join against a stream of batches with an inequality condition. The compiled condition will be closed when this iterator is closed.

  6. class ConditionalNestedLoopExistenceJoinIterator extends ExistenceJoinIterator
  7. class ConditionalNestedLoopJoinIterator extends SplittableJoinIterator
  8. class CrossJoinIterator extends AbstractGpuJoinIterator

    An iterator that does a cross join against a stream of batches.

  9. class EmptyOuterNestedLoopJoinIterator extends Iterator[ColumnarBatch]

    Iterator for producing batches from an outer join where the build-side table is empty.

  10. abstract class ExistenceJoinIterator extends Iterator[ColumnarBatch] with TaskAutoCloseableResource

    Existence join generates an exists boolean column with true or false in it, then appends it to the output columns. A true in the exists column indicates the left table should retain that row; the row count of the exists column equals the row count of the left table.

    e.g.:

      select * from left_table
      where left_table.column_0 >= 3
         or exists (select * from right_table
                    where left_table.column_1 < right_table.column_1)

    The plan for this SQL is:

      Filter(left_table.column_0 >= 3 or `exists`)
        ExistenceJoin (left + `exists`)   // `exists` does not shrink or expand the rows of the left table
          left_table
          right_table
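    The semantics can be sketched on plain Scala collections. This is an illustration only (the real iterator operates on ColumnarBatches and supports a correlated condition); the uncorrelated key-membership case is shown:

```scala
object ExistenceJoinSketch {
  // Append an `exists` flag to each left row; the left row count is never
  // changed -- only the extra boolean column is added.
  def existenceJoin[K](left: Seq[K], rightKeys: Set[K]): Seq[(K, Boolean)] =
    left.map(k => (k, rightKeys.contains(k)))
}
```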

  11. class GpuBatchSubPartitionIterator extends Iterator[(Seq[Int], Seq[SpillableColumnarBatch])] with Logging

    Iterates over all the partitions in the input "batchSubPartitioner"; each call to "next()" returns one or more parts as a Seq of "SpillableColumnarBatch", or None for an empty partition, along with its partition id(s).

  12. class GpuBatchSubPartitioner extends AutoCloseable with Logging

    Drain the batches in the input iterator and partition each batch into smaller parts. It assumes all the batches are on the GPU.

    e.g. given two batches (2,4,6) and (6,8,8), split into 6 partitions, the result will be:

      0 -> (2)
      1 -> (6), (6)   (two separate, not merged, sub-batches)
      2 -> empty
      3 -> (4)
      4 -> empty
      5 -> (8, 8)     (a single batch)
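    The behavior can be sketched with plain collections. A modulo hash is used here as a stand-in for the real GPU hash, so the bucket numbers differ from the worked example above; the point is that sub-batches coming from different input batches land in the same bucket without being merged:

```scala
object SubPartitionSketch {
  // Partition each input batch independently into numParts buckets; each
  // bucket holds the (unmerged) sub-batches contributed by every input batch.
  def subPartition(batches: Seq[Seq[Int]], numParts: Int): Map[Int, Seq[Seq[Int]]] = {
    val perBatch = batches.map(_.groupBy(v => v % numParts))
    (0 until numParts).map(p => p -> perBatch.flatMap(_.get(p))).toMap
  }
}
```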

  13. case class GpuBroadcastExchangeExec(mode: BroadcastMode, child: SparkPlan)(cpuCanonical: BroadcastExchangeExec) extends GpuBroadcastExchangeExecBase with Product with Serializable
  14. abstract class GpuBroadcastExchangeExecBase extends Exchange with ShimBroadcastExchangeLike with ShimUnaryExecNode with GpuExec
  15. case class GpuBroadcastHashJoinExec(leftKeys: Seq[Expression], rightKeys: Seq[Expression], joinType: JoinType, buildSide: GpuBuildSide, condition: Option[Expression], left: SparkPlan, right: SparkPlan) extends GpuBroadcastHashJoinExecBase with Product with Serializable
  16. abstract class GpuBroadcastHashJoinExecBase extends SparkPlan with ShimBinaryExecNode with GpuHashJoin
  17. class GpuBroadcastHashJoinMeta extends GpuBroadcastHashJoinMetaBase
  18. abstract class GpuBroadcastHashJoinMetaBase extends GpuBroadcastJoinMeta[BroadcastHashJoinExec]
  19. class GpuBroadcastMeta extends SparkPlanMeta[BroadcastExchangeExec] with Logging
  20. case class GpuBroadcastNestedLoopJoinExec(left: SparkPlan, right: SparkPlan, joinType: JoinType, gpuBuildSide: GpuBuildSide, condition: Option[Expression], postBuildCondition: List[NamedExpression], targetSizeBytes: Long) extends GpuBroadcastNestedLoopJoinExecBase with Product with Serializable
  21. abstract class GpuBroadcastNestedLoopJoinExecBase extends SparkPlan with ShimBinaryExecNode with GpuExec
  22. class GpuBroadcastNestedLoopJoinMeta extends GpuBroadcastNestedLoopJoinMetaBase
  23. abstract class GpuBroadcastNestedLoopJoinMetaBase extends GpuBroadcastJoinMeta[BroadcastNestedLoopJoinExec]
  24. case class GpuBroadcastToRowExec(buildKeys: Seq[Expression], broadcastMode: BroadcastMode, child: SparkPlan) extends Exchange with ShimBroadcastExchangeLike with ShimUnaryExecNode with GpuExec with Logging with Product with Serializable
  25. class GpuColumnToRowMapPartitionsRDD extends MapPartitionsRDD[InternalRow, ColumnarBatch]
  26. case class GpuCustomShuffleReaderExec(child: SparkPlan, partitionSpecs: Seq[ShufflePartitionSpec]) extends SparkPlan with ShimUnaryExecNode with GpuExec with Product with Serializable

    A wrapper of shuffle query stage, which follows the given partition arrangement.

    child

    It is usually ShuffleQueryStageExec, but can be the shuffle exchange node during canonicalization.

    partitionSpecs

    The partition specs that define the arrangement.

  27. trait GpuHashJoin extends SparkPlan with GpuJoinExec
  28. trait GpuJoinExec extends SparkPlan with ShimBinaryExecNode with GpuExec
  29. abstract class GpuShuffleExchangeExecBase extends Exchange with ShimUnaryExecNode with GpuExec

    Performs a shuffle that will result in the desired partitioning.

  30. abstract class GpuShuffleExchangeExecBaseWithMetrics extends GpuShuffleExchangeExecBase

    Performs a shuffle that will result in the desired partitioning.

  31. class GpuShuffleMeta extends GpuShuffleMetaBase
  32. abstract class GpuShuffleMetaBase extends SparkPlanMeta[ShuffleExchangeExec]
  33. trait GpuSubPartitionHashJoin extends Logging
  34. class GpuSubPartitionPairIterator extends Iterator[PartitionPair] with AutoCloseable

    Iterator that returns a pair of batches (build side, stream side) with the same key set, as generated by the sub-partitioning algorithm, on each call to "next". Each pair may have data from one or multiple partitions, and the build-side batches are concatenated into a single one.

    It will skip the empty pairs by default. Set "skipEmptyPairs" to false to also get the empty pairs.

  35. case class GpuSubqueryBroadcastExec(name: String, indices: Seq[Int], buildKeys: Seq[Expression], child: SparkPlan)(modeKeys: Option[Seq[Expression]]) extends BaseSubqueryExec with ShimBaseSubqueryExec with GpuExec with ShimUnaryExecNode with Product with Serializable
  36. class GpuSubqueryBroadcastMeta extends GpuSubqueryBroadcastMetaBase
  37. abstract class GpuSubqueryBroadcastMetaBase extends SparkPlanMeta[SubqueryBroadcastExec]
  38. class HashJoinIterator extends BaseHashJoinIterator

    An iterator that does a hash join against a stream of batches.

  39. class HashJoinStreamSideIterator extends BaseHashJoinIterator

    An iterator that does the stream-side only of a hash join. Using full join as an example, it performs the left or right outer join for the stream side's view of a full outer join. As the join is performed, the build-side rows that are referenced during the join are tracked and can be retrieved after the iteration has completed to assist in performing the anti-join needed to produce the final results needed for the full outer join.
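    The decomposition can be sketched on plain collections with distinct keys. This is an illustration only; the real iterator tracks referenced build-side rows across many ColumnarBatches:

```scala
object FullOuterSketch {
  // Full outer join expressed as: a left-outer join from the stream side's
  // view, plus an anti-join emitting the build rows that were never referenced.
  def fullOuter[K](stream: Seq[K], build: Seq[K]): Seq[(Option[K], Option[K])] = {
    val buildSet = build.toSet
    val hits = scala.collection.mutable.Set.empty[K]
    // Stream side: record every build key that gets referenced during the join.
    val streamSide: Seq[(Option[K], Option[K])] = stream.map { k =>
      if (buildSet(k)) { hits += k; (Some(k), Some(k)) } else (Some(k), None)
    }
    // Anti-join: build rows that were never hit complete the full outer result.
    val unreferenced = build.filterNot(hits).map(k => (Option.empty[K], Some(k)))
    streamSide ++ unreferenced
  }
}
```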

  40. class HashOuterJoinIterator extends Iterator[ColumnarBatch] with TaskAutoCloseableResource

    An iterator that does a hash outer join against a stream of batches where either the join type is a full outer join or the join type is a left or right outer join and the build side matches the outer join side. It does this by doing a subset of the original join (e.g.: left outer for a full outer join) and keeping track of the hits on the build side. It then produces a final batch of all the build side rows that were not already included.

  41. class HashedExistenceJoinIterator extends ExistenceJoinIterator
  42. case class JoinBuildSideStats(streamMagnificationFactor: Double, isDistinct: Boolean) extends Product with Serializable

    Class to hold statistics on the build-side batch of a hash join.

    streamMagnificationFactor

    estimated magnification of a stream batch during join

    isDistinct

    true if all build-side join keys are distinct
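    A rough sketch of how such statistics could be derived, on plain collections. The exact formula here is an assumption for illustration; the real statistics are computed on the GPU build batch:

```scala
object BuildStatsSketch {
  // Average number of build-side matches per stream key: an estimate of how
  // much a stream batch grows (or shrinks) during the join.
  def magnification[K](buildKeys: Seq[K], streamKeys: Seq[K]): Double = {
    val counts = buildKeys.groupBy(identity).map { case (k, v) => k -> v.size }
    streamKeys.map(k => counts.getOrElse(k, 0)).sum.toDouble / streamKeys.size
  }

  // True if no build-side join key occurs more than once.
  def isDistinct[K](buildKeys: Seq[K]): Boolean =
    buildKeys.distinct.size == buildKeys.size
}
```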

  43. class PartitionPair extends AutoCloseable

    A utility class that takes ownership of the two input resources representing data from the build side and the stream side. Build data is passed in as an Option of a single batch, while stream data is a Seq of batches.

  44. class RapidsAnalysisException extends AnalysisException

    This class is to only be used to throw errors specific to the RAPIDS Accelerator or errors mirroring Spark where a raw AnalysisException is thrown directly rather than via an error utility class (this should be rare).

  45. class SerializeBatchDeserializeHostBuffer extends Serializable with AutoCloseable

    Object used for executors to serialize a result for their partition that will be collected on the driver to be broadcasted out as part of the exchange.

    Annotations
    @SerialVersionUID()
  46. class SerializeConcatHostBuffersDeserializeBatch extends Serializable with Logging

    Class that is used to broadcast results (a contiguous host batch) to executors.

    This is instantiated in the driver, serialized to an output stream provided by Spark to broadcast, and deserialized on the executor. Both the driver's and executor's copies are cleaned via GC. Because Spark closes AutoCloseable broadcast results after spilling to disk, this class does not subclass AutoCloseable. Instead we implement a closeInternal method only to be triggered via GC.

    Annotations
    @SerialVersionUID()
  47. class ShuffledBatchRDD extends RDD[ColumnarBatch]

    This is a specialized version of org.apache.spark.rdd.ShuffledRDD that is optimized for shuffling ColumnarBatch instead of Java key-value pairs.

    This RDD takes a ShuffleDependency (dependency), and an array of ShufflePartitionSpec as input arguments.

    The dependency has the parent RDD of this RDD, which represents the dataset before shuffle (i.e. map output). Elements of this RDD are (partitionId, Row) pairs. Partition ids should be in the range [0, numPartitions - 1]. dependency.partitioner is the original partitioner used to partition map output, and dependency.partitioner.numPartitions is the number of pre-shuffle partitions (i.e. the number of partitions of the map output).

    This code is made to try and match the Spark code as closely as possible to make maintenance simpler. Fixing compiler or IDE warnings in this code may not be ideal if the same warnings are in Spark.

  48. case class ShuffledBatchRDDPartition(index: Int, spec: ShufflePartitionSpec) extends Partition with Product with Serializable

Value Members

  1. object ExchangeMappingCache extends Logging

    Caches the mappings from canonical CPU exchanges to the GPU exchanges that replaced them

  2. object GpuBroadcastExchangeExecBase extends Serializable
  3. object GpuBroadcastHelper
  4. object GpuBroadcastNestedLoopJoinExecBase extends Serializable
  5. object GpuBroadcastToRowExec extends Serializable
  6. object GpuColumnToRowMapPartitionsRDD extends Serializable
  7. object GpuHashJoin extends Serializable
  8. object GpuShuffleExchangeExecBase extends Serializable
  9. object GpuShuffleMetaBase
  10. object GpuSubPartitionHashJoin
  11. object GpuSubqueryBroadcastExec extends Serializable
  12. object InternalColumnarRddConverter extends Logging

    Please don't use this class directly; use com.nvidia.spark.rapids.ColumnarRdd instead. We had to place the implementation in a Spark-specific package to poke at the internals of Spark more than anyone should know about.

    This provides a way to get GPU columnar data back out as an RDD[Table]. Each Table will have the same schema as the DataFrame passed in. If the schema of the DataFrame is something that RAPIDS does not currently support, an IllegalArgumentException will be thrown.

    The size of each table will be determined by what is producing that table but typically will be about the number of bytes set by RapidsConf.GPU_BATCH_SIZE_BYTES.

    A Table is not a typical element type for an RDD, so special care needs to be taken when working with it. By default it is not serializable, so repartitioning the RDD or any other operation that involves a shuffle will not work, because it is very expensive to serialize and deserialize a GPU Table using a conventional Spark shuffle. Also, most of the memory associated with the Table is on the GPU itself, so each Table must be closed when it is no longer needed to avoid running out of GPU memory. By convention, it is the responsibility of the one consuming the data to close it when it is no longer needed.

  13. object JoinBuildSideStats extends Serializable
  14. object JoinTypeChecks
  15. object SerializedHostTableUtils
  16. object ShimTrampolineUtil
  17. object TrampolineUtil

Ungrouped