Packages

package rapids

Type Members

  1. abstract class AbstractGpuCoalesceIterator extends Iterator[ColumnarBatch] with Logging
  2. abstract class AbstractGpuJoinIterator extends Iterator[ColumnarBatch] with TaskAutoCloseableResource

    Base class for iterators producing the results of a join.

  3. abstract class AbstractHostByteBufferIterator extends Iterator[ByteBuffer]
  4. abstract class AbstractProjectSplitIterator extends Iterator[ColumnarBatch]

    An iterator that is intended to split the input to or output of a project on rows. In practice this is only used for splitting the input prior to a project in some very special cases, where the projected size of the output is so large that it would risk us not being able to split it later on if we ran into trouble.

  5. class AcceleratedColumnarToRowIterator extends Iterator[InternalRow] with Serializable

    An iterator that uses the GPU for columnar to row conversion of fixed width types.

  6. case class AcquireFailed(numWaitingTasks: Int) extends TryAcquireResult with Product with Serializable

    To acquire the semaphore this thread would have to block.

    numWaitingTasks: the number of tasks waiting at the time the request was made. Note that this can change very quickly.

  7. class AdaptiveSparkPlanHelperImpl extends AdaptiveSparkPlanHelperShim with AdaptiveSparkPlanHelper
  8. abstract class AggExprMeta[INPUT <: AggregateFunction] extends ExprMeta[INPUT]

    Base class for metadata around AggregateFunction.

  9. class AggHelper extends Serializable

    Internal class used in computeAggregates for the pre, agg, and post steps

  10. case class AggregateModeInfo(uniqueModes: Seq[AggregateMode], hasPartialMode: Boolean, hasPartialMergeMode: Boolean, hasFinalMode: Boolean, hasCompleteMode: Boolean) extends Product with Serializable

    Utility class to convey information on the aggregation modes being used

  11. case class AllowSpillOnlyLazySpillableColumnarBatchImpl(wrapped: LazySpillableColumnarBatch) extends LazySpillableColumnarBatch with Product with Serializable

    A version of LazySpillableColumnarBatch where instead of closing the underlying batch it is only spilled.

    A version of LazySpillableColumnarBatch where instead of closing the underlying batch it is only spilled. This is used for cases, like with a streaming hash join where the data itself needs to out live the JoinGatherer it is handed off to.

  12. class AlluxioConfigReader extends AnyRef

    Alluxio master address and port reader. It reads from /opt/alluxio/conf/alluxio-site.properties

  13. class AlluxioFS extends AnyRef

    Interfaces for the Alluxio file system. Currently contains interfaces to get mount points and to mount.

  14. class AppendDataExecV1Meta extends SparkPlanMeta[AppendDataExecV1] with HasCustomTaggingData
  15. trait ArmScalaSpecificImpl extends AnyRef

    Implementation of the automatic-resource-management pattern

  16. class AtomicCreateTableAsSelectExecMeta extends SparkPlanMeta[AtomicCreateTableAsSelectExec]
  17. class AtomicReplaceTableAsSelectExecMeta extends SparkPlanMeta[AtomicReplaceTableAsSelectExec]
  18. class AutoCloseColumnBatchIterator[U] extends Iterator[ColumnarBatch]

    For columnar code on the CPU it is the responsibility of the SparkPlan exec that creates a ColumnarBatch to close it. In the case of code running on the GPU that would waste too much memory, so it is the responsibility of the code receiving the batch to close it when it is no longer needed.

    This class provides a simple way for CPU batch code to be sure that a batch gets closed. If your code is executing on the GPU do not use this class.
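
    The close-on-advance pattern this class implements can be sketched generically. The sketch below is illustrative only (plain AutoCloseable stand-ins and invented names), not the plugin's actual implementation:

```scala
// Illustrative stand-in for the close-on-advance pattern; not the plugin class.
class AutoCloseIterator[T <: AutoCloseable](wrapped: Iterator[T]) extends Iterator[T] {
  private var current: Option[T] = None

  private def closeCurrent(): Unit = {
    current.foreach(_.close())
    current = None
  }

  override def hasNext: Boolean = {
    val more = wrapped.hasNext
    // Once the input is exhausted, release the last batch we handed out.
    if (!more) closeCurrent()
    more
  }

  override def next(): T = {
    // Close the previously returned batch before producing the next one.
    closeCurrent()
    val batch = wrapped.next()
    current = Some(batch)
    batch
  }
}
```

    Each previously returned batch is closed as soon as the consumer asks for the next one, and the last batch is closed when the iterator reports exhaustion.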

  19. case class AutoCloseableTargetSize(targetSize: Long, minSize: Long) extends AutoCloseable with Product with Serializable

    This is a wrapper that turns a target size into an autocloseable to allow it to be used in withRetry blocks. It is intended to be used to help with cases where the split calculation happens inside the retry block, and depends on the target size. On a GpuSplitAndRetryOOM or CpuSplitAndRetryOOM, a split policy like splitTargetSizeInHalfGpu or splitTargetSizeInHalfCpu can be used to retry the block with a smaller target size.
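
    The interplay between a target size and a split-in-half retry policy can be sketched in isolation. This is a simplified, self-contained illustration with invented names (TargetSize, withRetryOnOom, a plain OutOfMemoryError as the failure signal), not the plugin's withRetry framework or its splitTargetSizeInHalfGpu/Cpu implementations:

```scala
// Simplified, self-contained illustration; names are invented for this sketch.
case class TargetSize(targetSize: Long, minSize: Long) extends AutoCloseable {
  // Nothing to release; AutoCloseable only so a retry framework can manage it.
  override def close(): Unit = ()
}

// Halve the target size for the next retry, refusing to go below minSize.
def splitTargetSizeInHalf(t: TargetSize): TargetSize = {
  val halved = t.targetSize / 2
  require(halved >= t.minSize, s"cannot split target below minSize=${t.minSize}")
  TargetSize(halved, t.minSize)
}

// Toy retry loop: run the body, and on OOM retry with a halved target size.
def withRetryOnOom[A](initial: TargetSize)(body: TargetSize => A): A = {
  var attempt = initial
  while (true) {
    try {
      return body(attempt)
    } catch {
      case _: OutOfMemoryError => attempt = splitTargetSizeInHalf(attempt)
    }
  }
  sys.error("unreachable")
}
```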

  20. case class AvoidTransition[INPUT <: SparkPlan](plan: SparkPlanMeta[INPUT]) extends Optimization with Product with Serializable
  21. class AvroDataFileReader extends AvroFileReader

    AvroDataFileReader reads the Avro file data in the iterator pattern. You can use it as below.

      while (reader.hasNextBlock) {
        val b = reader.peekBlock
        estimateBufSize(b) // allocate the batch buffer
        reader.readNextRawBlock(buffer_as_out_stream)
      }

  22. abstract class AvroFileReader extends AutoCloseable

    The parent of the Rapids Avro file readers

  23. class AvroFileWriter extends AnyRef

    AvroDataWriter, used to write an Avro file header to the output stream.

  24. class AvroMetaFileReader extends AvroFileReader

    AvroMetaFileReader collects the blocks' information from the Avro file without reading the block data.

  25. trait AvroProvider extends AnyRef
  26. abstract class BaseCrossJoinGatherMap extends LazySpillableGatherMap
  27. abstract class BaseExprMeta[INPUT <: Expression] extends RapidsMeta[INPUT, Expression, Expression]

    Base class for metadata around Expression.

  28. class BatchContext extends AnyRef

    A context lives during the whole process of reading partitioned files to a batch buffer (aka HostMemoryBuffer) to build a memory file. Children can extend this to add more necessary fields.

  29. class BatchToGenerate extends AutoCloseable
  30. case class BatchWithPartitionData(inputBatch: SpillableColumnarBatch, partitionedRowsData: Array[PartitionRowData], partitionSchema: StructType) extends AutoCloseable with Product with Serializable

    Class to wrap columnar batch and partition rows data and utility functions to merge them.

    inputBatch: Input ColumnarBatch.

    partitionedRowsData: Array of PartitionRowData, where each entry contains an InternalRow and a row number pair. These pairs specify how many rows to replicate the partition value.

    partitionSchema: Schema of the partitioned data.
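
    The replication described for partitionedRowsData can be illustrated with a tiny standalone helper (hypothetical name, not part of the plugin):

```scala
// Hypothetical helper illustrating how (partition value, row count) pairs are
// expanded into a partition column: each value is repeated for its row count.
def expandPartitionValues[A](pairs: Seq[(A, Int)]): Seq[A] =
  pairs.flatMap { case (value, numRows) => Seq.fill(numRows)(value) }
```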

  31. class BatchWithPartitionDataIterator extends GpuColumnarBatchIterator

    An iterator that provides ColumnarBatch instances by merging existing batches with partition columns. Each partition value column added is within the CUDF column size limit. It uses withRetry to support the retry framework and may spill data if needed.

  32. abstract class BatchedBufferDecompressor extends AutoCloseable with Logging

    Base class for batched decompressors

  33. case class BatchedByKey(gpuOrder: Seq[SortOrder])(cpuOrder: Seq[SortOrder]) extends CoalesceGoal with Product with Serializable

    Split the data into batches where a set of keys are all within a single batch. This is generally used for things like a window operation or a sort-based aggregation where you want all of the keys for a given operation to be available so the GPU can produce a correct answer. There is no limit on the target size, so if there is a lot of data skew for a key the batch may still run into limits set by Spark or cudf. It should be noted that a node in the Spark plan that requires this should also require an input ordering that satisfies this ordering as well.

    gpuOrder: the GPU keys that should be used for batching.

    cpuOrder: the CPU keys that should be used for batching.

  34. class BatchedCopyCompressor extends BatchedTableCompressor
  35. class BatchedCopyDecompressor extends BatchedBufferDecompressor
  36. class BatchedNvcompLZ4Compressor extends BatchedTableCompressor
  37. class BatchedNvcompLZ4Decompressor extends BatchedBufferDecompressor
  38. class BatchedNvcompZSTDCompressor extends BatchedTableCompressor
  39. class BatchedNvcompZSTDDecompressor extends BatchedBufferDecompressor
  40. abstract class BatchedTableCompressor extends AutoCloseable with Logging

    Base class for batched compressors

  41. case class BatchesToCoalesce(batches: Array[SpillableColumnarBatch]) extends AutoCloseable with Product with Serializable

    A helper class that contains a sequence of SpillableColumnarBatch and that can be used to split the sequence into two. This class is auto-closeable, as it is sent to code that will close it, and in turn close the SpillableColumnarBatch instances in batches.

    batches: a sequence of SpillableColumnarBatch to manage.

  42. class BigSizedJoinIterator extends Iterator[ColumnarBatch] with TaskAutoCloseableResource

    Iterator that produces the result of a large symmetric join where the build side of the join is too large for a single GPU batch. The prior join input probing phase has sized the build side of the join, so this partitions both the build side and stream side into N+1 partitions, where N is the size of the build side divided by the target GPU batch size.

    Once the build side is partitioned completely, the partitions are placed into "join groups" where all the build side data of a join group fits in the GPU target batch size. If the input data is skewed, a single build partition could be larger than the target GPU batch size. Currently such oversized partitions are placed in separate join groups consisting just of one partition each in the hopes that there will be enough GPU memory to proceed with the join despite the skew. We will need to revisit this for very large, skewed build side data arriving at a single task.

    Once the build side join groups are identified, each stream batch is partitioned into the same number of partitions as the build side with the same hash key used for the build side. The partitions from the batch are grouped into join groups matching the partition grouping from the build side, and each join group is processed as a sub-join. Once all the join groups for a stream batch have been processed, the next stream batch is fetched, partitioned, and sub-joins are processed against the build side join groups. Repeat until the stream side is exhausted.
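
    The join-group planning described above (greedily packing build-side partitions into groups that fit the target GPU batch size, with oversized skewed partitions isolated in their own groups) can be sketched as a standalone function. The name and signature are invented for illustration; the real iterator works on actual partitioned batches, not just sizes:

```scala
import scala.collection.mutable.ArrayBuffer

// Pack build-side partition sizes into "join groups" whose totals fit the
// target batch size; an oversized partition becomes a group of its own.
def planJoinGroups(partitionSizes: Seq[Long], targetBatchSize: Long): Seq[Seq[Int]] = {
  val groups = ArrayBuffer.empty[Seq[Int]]
  var current = ArrayBuffer.empty[Int]
  var currentBytes = 0L
  def flush(): Unit = if (current.nonEmpty) {
    groups += current.toSeq
    current = ArrayBuffer.empty[Int]
    currentBytes = 0L
  }
  partitionSizes.zipWithIndex.foreach { case (size, idx) =>
    if (size >= targetBatchSize) {
      // Data skew: an oversized partition is isolated in its own join group,
      // in the hope there is still enough GPU memory to process it.
      flush()
      groups += Seq(idx)
    } else {
      if (currentBytes + size > targetBatchSize) flush()
      current += idx
      currentBytes += size
    }
  }
  flush()
  groups.toSeq
}
```

    The stream side is then partitioned with the same hash key and its partitions are grouped by the same plan, so each group can be joined as a sub-join.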

  43. abstract class BinaryAstExprMeta[INPUT <: BinaryExpression] extends BinaryExprMeta[INPUT]

    Base metadata class for binary expressions that support conversion to AST

  44. abstract class BinaryExprMeta[INPUT <: BinaryExpression] extends ExprMeta[INPUT]

    Base class for metadata around BinaryExpression.

  45. case class BlockInfo(blockStart: Long, blockSize: Long, dataSize: Long, count: Long) extends Product with Serializable

    Information about each Avro block.

    blockStart: the start of the block

    blockSize: the whole block size = the size between two sync buffers + the sync buffer

    dataSize: the block data size

    count: how many entries are in this block

  46. case class BoundExpressionsModeAggregates(boundFinalProjections: Option[Seq[GpuExpression]], boundResultReferences: Seq[Expression]) extends Product with Serializable
  47. case class BufferSpill(spillBuffer: RapidsBuffer, newBuffer: Option[RapidsBuffer]) extends SpillAction with Product with Serializable
  48. case class BufferUnspill(spillBuffer: RapidsBuffer, newBuffer: Option[RapidsBuffer]) extends SpillAction with Product with Serializable
  49. class BuildSidePartitioner extends JoinPartitioner

    Join partitioner for the build side of a large join where the build side of the join does not fit in a single GPU batch.

  50. class ByteArrayInputFile extends InputFile
  51. class CSVPartitionReader extends CSVPartitionReaderBase[HostLineBufferer, HostLineBuffererFactory.type]
  52. abstract class CSVPartitionReaderBase[BUFF <: LineBufferer, FACT <: LineBuffererFactory[BUFF]] extends GpuTextBasedPartitionReader[BUFF, FACT]
  53. class CachedGpuBatchIterator extends GpuColumnarBatchIterator
  54. class CastChecks extends ExprChecks
  55. final class CastExprMeta[INPUT <: UnaryLike[Expression] with TimeZoneAwareExpression with NullIntolerant] extends CastExprMetaBase[INPUT]

    Meta-data for cast and ansi_cast.

  56. abstract class CastExprMetaBase[INPUT <: UnaryLike[Expression] with TimeZoneAwareExpression] extends UnaryExprMeta[INPUT]

    Meta-data for cast, ansi_cast and ToPrettyString

  57. class CastOptions extends Serializable

    This class is used to encapsulate parameters to use to help determine how to cast

  58. class ChunkedPacker extends Iterator[MemoryBuffer] with Logging with AutoCloseable

    ChunkedPacker is an Iterator that uses a cudf::chunked_pack to copy a cuDF Table to a target buffer in chunks.

    Each chunk is sized at most bounceBuffer.getLength, and the caller should cudaMemcpy bytes from bounceBuffer to a target buffer after each call to next().

    Note: ChunkedPacker must be closed by the caller as it has GPU and host resources associated with it.
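
    The chunked-copy contract (each next() fills at most one bounce buffer's worth of bytes, which the caller copies out before the following next()) can be illustrated with a CPU-only stand-in. This sketch uses byte arrays; in the real class the source is a packed cuDF Table and the copy out of the bounce buffer is a cudaMemcpy:

```scala
// CPU-only stand-in for the chunked-copy pattern; not the real ChunkedPacker.
class ChunkedCopier(src: Array[Byte], bounceBufferLen: Int) extends Iterator[Array[Byte]] {
  require(bounceBufferLen > 0, "bounce buffer must have positive length")
  private var offset = 0

  override def hasNext: Boolean = offset < src.length

  // Each call yields at most one bounce buffer's worth of bytes; the caller
  // must copy the chunk to its destination before calling next() again.
  override def next(): Array[Byte] = {
    val n = math.min(bounceBufferLen, src.length - offset)
    val chunk = java.util.Arrays.copyOfRange(src, offset, offset + n)
    offset += n
    chunk
  }
}
```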

  59. class CloseableBufferedIterator[T <: AutoCloseable] extends BufferedIterator[T] with AutoCloseable

    Helper iterator that wraps an Iterator of AutoCloseable subclasses. This iterator also implements AutoCloseable, so it can be closed in case of exceptions; when close is called on it, its buffered item will be closed as well.

    T: an AutoCloseable subclass
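
    The key property, that a peeked-but-unconsumed item is not leaked when the iterator is closed on an error path, can be sketched with a simplified standalone version (this is not the plugin class, and it implements its own head rather than wrapping a BufferedIterator):

```scala
// Simplified standalone sketch of a closeable buffered iterator.
class CloseableBuffered[T <: AutoCloseable](it: Iterator[T])
    extends Iterator[T] with AutoCloseable {
  private var buffered: Option[T] = None

  // Peek at the next item without consuming it; it is held until next() or close().
  def head: T = {
    if (buffered.isEmpty) buffered = Some(it.next())
    buffered.get
  }

  override def hasNext: Boolean = buffered.nonEmpty || it.hasNext

  override def next(): T = buffered match {
    case Some(v) =>
      buffered = None
      v
    case None => it.next()
  }

  // Closing the iterator closes any buffered item so it cannot leak.
  override def close(): Unit = {
    buffered.foreach(_.close())
    buffered = None
  }
}
```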

  60. class CloseableHolder[T <: AutoCloseable] extends AnyRef
  61. sealed abstract class CoalesceGoal extends Expression with GpuUnevaluable with ShimExpression

    Provides a goal for batching of data.

  62. sealed abstract class CoalesceSizeGoal extends CoalesceGoal
  63. class CollectTimeIterator[T] extends Iterator[T]
  64. class ColumnarCopyHelper extends AnyRef

    A helper class which efficiently transfers different types of host columnar data into cuDF. It is written in Java for two reasons: 1. a Scala for-loop is slower (a Scala while-loop is identical to a Java loop); 2. both ColumnBuilder and ColumnVector are Java classes.

  65. trait ColumnarFileFormat extends AnyRef

    Used to write columnar data to files.

  66. abstract class ColumnarOutputWriter extends HostBufferConsumer

    This is used to write columnar data to a file system. Subclasses of ColumnarOutputWriter must provide a zero-argument constructor. This is the columnar version of org.apache.spark.sql.execution.datasources.OutputWriter.

  67. abstract class ColumnarOutputWriterFactory extends Serializable

    A factory that produces ColumnarOutputWriters. A new ColumnarOutputWriterFactory is created on the driver side, and then gets serialized to the executor side to create ColumnarOutputWriters. This is the columnar version of org.apache.spark.sql.execution.datasources.OutputWriterFactory.

  68. case class ColumnarOverrideRules() extends ColumnarRule with Logging with Product with Serializable
  69. class ColumnarPartitionReaderWithPartitionValues extends PartitionReader[ColumnarBatch]

    A wrapper reader that always appends partition values to the ColumnarBatch produced by the input reader fileReader. Each scalar value is splatted to a column with the same number of rows as the batch returned by the reader.

  70. class ColumnarToRowIterator extends Iterator[InternalRow] with AutoCloseable

    ColumnarToRowIterator converts GPU ColumnarBatches to CPU InternalRows.

    Note: releaseSemaphore = true (the default) should only be used in cases where we are sure that no GPU memory is left unaccounted for (not spillable). One notable case where releaseSemaphore is false is when used in GpuUserDefinedFunction, which is evaluated as part of a projection that may or may not include other GPU columns.

  71. case class CombineConf(combineThresholdSize: Long, combineWaitTime: Int) extends Product with Serializable
  72. abstract class ComplexTypeMergingExprMeta[INPUT <: ComplexTypeMergingExpression] extends ExprMeta[INPUT]

    Base class for metadata around ComplexTypeMergingExpression.

  73. case class CompressedTable(compressedSize: Long, meta: TableMeta, buffer: DeviceMemoryBuffer) extends AutoCloseable with Product with Serializable

    Compressed table descriptor.

    compressedSize: size of the compressed data in bytes

    meta: metadata describing the table layout when uncompressed

    buffer: buffer containing the compressed data

  74. class ConfBuilder extends AnyRef
  75. abstract class ConfEntry[T] extends AnyRef
  76. class ConfEntryWithDefault[T] extends ConfEntry[T]
  77. case class ContextChecks(outputCheck: TypeSig, sparkOutputSig: TypeSig, paramCheck: Seq[ParamCheck] = Seq.empty, repeatingParamCheck: Option[RepeatingParamCheck] = None) extends TypeChecks[Map[String, SupportLevel]] with Product with Serializable

    Checks an expression that has input parameters and a single output. This is intended to be given for a specific ExpressionContext. If your expression does not meet this pattern you may need to create a custom ExprChecks instance.

  78. class CopyCompressionCodec extends TableCompressionCodec

    A table compression codec used only for testing that copies the data.

  79. class CostBasedOptimizer extends Optimizer with Logging

    Experimental cost-based optimizer that aims to avoid moving sections of the plan to the GPU when it would be better to keep that part of the plan on the CPU. For example, we don't want to move data to the GPU just for a trivial projection and then have to move data back to the CPU on the next step.

  80. trait CostModel extends AnyRef

    The cost model is behind a trait so that we can consider making this pluggable in the future so that users can override the cost model to suit specific use cases.

  81. class CpuCostModel extends CostModel
  82. abstract class CreatableRelationProviderMeta[INPUT <: CreatableRelationProvider] extends RapidsMeta[INPUT, CreatableRelationProvider, GpuCreatableRelationProvider]
  83. class CreatableRelationProviderRule[INPUT <: CreatableRelationProvider] extends ReplacementRule[INPUT, CreatableRelationProvider, CreatableRelationProviderMeta[INPUT]]
  84. trait CudfBinaryExpression extends BinaryExpression with GpuBinaryExpression
  85. abstract class CudfBinaryOperator extends BinaryOperator with GpuBinaryOperator with CudfBinaryExpression
  86. class CudfRegexTranspiler extends AnyRef

    Transpile Java/Spark regular expression to a format that cuDF supports, or throw an exception if this is not possible.

  87. trait CudfUnaryExpression extends GpuUnaryExpression
  88. case class CudfVersionMismatchException(errorMsg: String) extends PluginException with Product with Serializable
  89. trait DataBlockBase extends AnyRef
  90. trait DataFromReplacementRule extends AnyRef
  91. class DataTypeMeta extends AnyRef

    The metadata around DataType, which records the original data type, the desired data type for GPU overrides, and the reason for a potential conversion. The metadata ensures that TypeChecks tags the actual data types for the GPU runtime, since the data types of GPU overrides may slightly differ from their original CPU counterparts.

  92. abstract class DataWritingCommandMeta[INPUT <: DataWritingCommand] extends RapidsMeta[INPUT, DataWritingCommand, GpuDataWritingCommand]

    Base class for metadata around DataWritingCommand.

  93. class DataWritingCommandRule[INPUT <: DataWritingCommand] extends ReplacementRule[INPUT, DataWritingCommand, DataWritingCommandMeta[INPUT]]

    Holds everything that is needed to replace a DataWritingCommand with a GPU enabled version.

  94. sealed abstract class DateTimeRebaseMode extends Serializable

    Mirror of Spark's LegacyBehaviorPolicy.

    This provides a stable reference to other Java code in our codebase and also mitigates Spark's breaking changes that may cause issues if our code uses Spark's LegacyBehaviorPolicy.

  95. sealed class DegenerateRapidsBuffer extends RapidsBuffer

    A buffer with no corresponding device data (zero rows or columns). These buffers are not tracked in buffer stores since they have no device memory. They are only tracked in the catalog and provide a representative ColumnarBatch but cannot provide a MemoryBuffer.

  96. class DeviceMemoryEventHandler extends RmmEventHandler with Logging

    RMM event handler to trigger spilling from the device memory store.

  97. class DirectByteBufferFactory extends ByteBufferFactory
  98. final class DoNotReplaceOrWarnSparkPlanMeta[INPUT <: SparkPlan] extends SparkPlanMeta[INPUT]

    Metadata for a SparkPlan that should not be replaced or have any kind of warning issued for it.

  99. class DuplicateBufferException extends RuntimeException

    Exception thrown when inserting a buffer into the catalog with a duplicate buffer ID and storage tier combination.

  100. class DynamicGpuPartialSortAggregateIterator extends Iterator[ColumnarBatch]
  101. class EmptyGpuDataProducer[T] extends GpuDataProducer[T]
  102. class ExecChecks extends TypeChecks[Map[String, SupportLevel]]

    Checks the input and output types supported by a SparkPlan node. We don't currently separate input checks from output checks. We can add this in if something needs it.

    The namedChecks map can be used to provide checks for specific groups of expressions.

  103. class ExecRule[INPUT <: SparkPlan] extends ReplacementRule[INPUT, SparkPlan, SparkPlanMeta[INPUT]]

    Holds everything that is needed to replace a SparkPlan with a GPU enabled version.

  104. class ExecutedCommandExecMeta extends SparkPlanMeta[ExecutedCommandExec]
  105. trait ExplainPlanBase extends AnyRef
  106. class ExplainPlanImpl extends ExplainPlanBase

    Note, this class should not be referenced directly in source code. It should be loaded by reflection using ShimLoader.newInstanceOf, see ./docs/dev/shims.md

    Attributes: protected

  107. abstract class ExprChecks extends TypeChecks[Map[ExpressionContext, Map[String, SupportLevel]]]

    Base class all Expression checks must follow.

  108. case class ExprChecksImpl(contexts: Map[ExpressionContext, ContextChecks]) extends ExprChecks with Product with Serializable
  109. abstract class ExprMeta[INPUT <: Expression] extends BaseExprMeta[INPUT]
  110. class ExprRule[INPUT <: Expression] extends ReplacementRule[INPUT, Expression, BaseExprMeta[INPUT]]

    Holds everything that is needed to replace an Expression with a GPU enabled version.

  111. sealed abstract class ExpressionContext extends AnyRef
  112. trait ExtraInfo extends AnyRef

    A common trait for the extra information for different file formats

  113. class FileFormatChecks extends TypeChecks[SupportLevel]

    Checks for either a read or a write of a given file format.

  114. sealed trait FileFormatOp extends AnyRef
  115. sealed trait FileFormatType extends AnyRef
  116. abstract class FilePartitionReaderBase extends PartitionReader[ColumnarBatch] with Logging with ScanWithMetrics

    The base class for PartitionReader

  117. abstract class GeneratorExprMeta[INPUT <: Generator] extends ExprMeta[INPUT]
  118. class GetJsonObjectCombiner extends GpuExpressionCombiner
  119. case class GpuAlias(child: Expression, name: String)(exprId: ExprId = NamedExpression.newExprId, qualifier: Seq[String] = Seq.empty, explicitMetadata: Option[Metadata] = None) extends GpuUnaryExpression with NamedExpression with Product with Serializable
  120. case class GpuAppendDataExecV1(table: SupportsWrite, plan: LogicalPlan, refreshCache: () ⇒ Unit, write: V1Write) extends V2CommandExec with GpuV1FallbackWriters with Product with Serializable

    GPU version of AppendDataExecV1

    Physical plan node for append into a v2 table using V1 write interfaces.

    Rows in the output data set are appended.

  121. case class GpuArrayExists(argument: Expression, function: Expression, followThreeValuedLogic: Boolean, isBound: Boolean = false, boundIntermediate: Seq[GpuExpression] = Seq.empty) extends Expression with GpuArrayTransformBase with Product with Serializable
  122. case class GpuArrayFilter(argument: Expression, function: Expression, isBound: Boolean = false, boundIntermediate: Seq[GpuExpression] = Seq.empty) extends Expression with GpuArrayTransformBase with Product with Serializable
  123. case class GpuArrayTransform(argument: Expression, function: Expression, isBound: Boolean = false, boundIntermediate: Seq[GpuExpression] = Seq.empty) extends Expression with GpuArrayTransformBase with Product with Serializable
  124. trait GpuArrayTransformBase extends Expression with GpuSimpleHigherOrderFunction
  125. case class GpuAtLeastNNonNulls(n: Int, exprs: Seq[Expression]) extends Expression with GpuExpression with ShimExpression with Predicate with Product with Serializable

    A GPU accelerated predicate that is evaluated to be true if there are at least n non-null and non-NaN values.

  126. abstract class GpuBaseAggregateMeta[INPUT <: SparkPlan] extends SparkPlanMeta[INPUT]
  127. trait GpuBaseLimitExec extends SparkPlan with LimitExec with GpuExec with ShimUnaryExecNode

    Helper trait which defines methods that are shared by both GpuLocalLimitExec and GpuGlobalLimitExec.

  128. class GpuBaseLimitIterator extends Iterator[ColumnarBatch]
  129. trait GpuBatchScanExecMetrics extends SparkPlan with GpuExec
  130. trait GpuBinaryExpression extends BinaryExpression with ShimBinaryExpression with GpuExpression
  131. trait GpuBinaryExpressionArgsAnyScalar extends BinaryExpression with GpuBinaryExpression

    Expressions subclassing this trait guarantee that they implement doColumnar(GpuScalar, GpuScalar) and doColumnar(GpuColumnVector, GpuScalar).

    The default implementation throws for all other permutations.

    The binary expression must fall back to the CPU for the doColumnar cases that would throw. The default implementation here should never execute.

  132. trait GpuBinaryOperator extends BinaryOperator with GpuBinaryExpression
  133. trait GpuBind extends AnyRef

    A trait that allows an Expression to control how it and its child expressions are bound. This should be used with a lot of caution as binding can be really hard to debug if you get it wrong. The output of bind should have all instances of AttributeReference replaced with GpuBoundReference.

  134. case class GpuBoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)(exprId: ExprId, name: String) extends GpuLeafExpression with ShimExpression with Product with Serializable
  135. case class GpuBringBackToHost(child: SparkPlan) extends SparkPlan with ShimUnaryExecNode with GpuExec with Product with Serializable

    Pull back any data on the GPU to the host so the host can access it.

  136. sealed abstract class GpuBuildSide extends AnyRef

    Spark's BuildSide, BuildRight, and BuildLeft moved packages in Spark 3.1, so we create GPU versions of these that are agnostic to the Spark version.

  137. case class GpuCSVPartitionReaderFactory(sqlConf: SQLConf, broadcastedConf: Broadcast[SerializableConfiguration], dataSchema: StructType, readDataSchema: StructType, partitionSchema: StructType, parsedOptions: CSVOptions, maxReaderBatchSizeRows: Integer, maxReaderBatchSizeBytes: Long, maxGpuColumnSizeBytes: Long, metrics: Map[String, GpuMetric], params: Map[String, String]) extends ShimFilePartitionReaderFactory with Product with Serializable
  138. case class GpuCSVScan(sparkSession: SparkSession, fileIndex: PartitioningAwareFileIndex, dataSchema: StructType, readDataSchema: StructType, readPartitionSchema: StructType, options: CaseInsensitiveStringMap, partitionFilters: Seq[Expression], dataFilters: Seq[Expression], maxReaderBatchSizeRows: Integer, maxReaderBatchSizeBytes: Long, maxGpuColumnSizeBytes: Long) extends TextBasedFileScan with GpuScan with Product with Serializable
  139. case class GpuCaseWhen(branches: Seq[(Expression, Expression)], elseValue: Option[Expression] = None, caseWhenFuseEnabled: Boolean = true) extends Expression with GpuConditionalExpression with Serializable with Product
  140. case class GpuCast(child: Expression, dataType: DataType, ansiMode: Boolean = false, timeZoneId: Option[String] = None, legacyCastComplexTypesToString: Boolean = false, stringToDateAnsiModeEnabled: Boolean = false) extends GpuUnaryExpression with TimeZoneAwareExpression with NullIntolerant with Product with Serializable

    Casts using the GPU

  141. case class GpuCheckOverflow(child: Expression, dataType: DecimalType, nullOnOverflow: Boolean) extends GpuUnaryExpression with Product with Serializable

    A GPU substitution for CheckOverflow. This cannot match the Spark CheckOverflow 100% because Spark will calculate values in BigDecimal with unbounded precision and then see if there was an overflow. This will check bounds, but can only detect that an overflow happened if the result is outside the bounds of what the Spark type supports while not yet overflowing the bounds of what the CUDF type supports. For most operations where this is a possibility for the given precision, the operator should fall back to the CPU or have alternative ways of checking for overflow prior to this being called.

  142. case class GpuCoalesce(children: Seq[Expression]) extends Expression with GpuExpression with ShimExpression with ComplexTypeMergingExpression with Product with Serializable
  143. case class GpuCoalesceBatches(child: SparkPlan, goal: CoalesceGoal) extends SparkPlan with ShimUnaryExecNode with GpuExec with Product with Serializable
  144. case class GpuCoalesceExec(numPartitions: Int, child: SparkPlan) extends SparkPlan with ShimUnaryExecNode with GpuExec with Product with Serializable
  145. class GpuCoalesceIterator extends AbstractGpuCoalesceIterator
  146. class GpuCollectLimitMeta extends SparkPlanMeta[CollectLimitExec]
  147. class GpuColumnVector extends GpuColumnVectorBase

    A GPU accelerated version of the Spark ColumnVector.

    A GPU accelerated version of the Spark ColumnVector. Most of the standard Spark APIs should never be called, as they assume that the data is on the host, and we want to keep as much of the data on the device as possible. We also provide GPU accelerated versions of the transitions to and from rows.

  148. final class GpuColumnVectorFromBuffer extends GpuColumnVector

    GPU column vector carved from a single buffer, like those from cudf's contiguousSplit.

  149. abstract class GpuColumnarBatchIterator extends Iterator[ColumnarBatch] with AutoCloseable

    An abstract columnar batch iterator that gives options for auto closing when the associated task completes.

    An abstract columnar batch iterator that gives options for auto closing when the associated task completes. Also provides idempotent close semantics.

    This iterator follows the semantics of GPU RDD columnar batch iterators too in that if a batch is returned by next it is the responsibility of the receiver to close it.

    Generally it is good practice that if hasNext would return false, any outstanding resources should be closed so that waiting for an explicit close is not needed.
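
    The ownership contract above can be sketched with simplified stand-ins (the real iterator hands out org.apache.spark.sql.vectorized.ColumnarBatch instances backed by GPU memory; FakeBatch and SketchBatchIterator below are hypothetical):

    ```scala
    // Hypothetical stand-in for a columnar batch holding GPU resources.
    final class FakeBatch(val numRows: Int) extends AutoCloseable {
      var closed = false
      override def close(): Unit = closed = true
    }

    // Minimal sketch of the contract: next() transfers ownership of the
    // batch to the caller, and close() is idempotent and releases any
    // batches that were never handed out.
    class SketchBatchIterator(batches: Seq[FakeBatch])
        extends Iterator[FakeBatch] with AutoCloseable {
      private var remaining = batches.toList
      private var isClosed = false

      override def hasNext: Boolean = remaining.nonEmpty

      override def next(): FakeBatch = {
        val b = remaining.head
        remaining = remaining.tail
        b // the receiver is now responsible for closing this batch
      }

      // Idempotent close: safe to call more than once.
      override def close(): Unit = if (!isClosed) {
        isClosed = true
        remaining.foreach(_.close())
        remaining = Nil
      }
    }
    ```

    A batch returned by next is untouched by close; only batches still held by the iterator are released.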

  150. class GpuColumnarBatchSerializer extends Serializer with Serializable

    Serializer for serializing ColumnarBatches for use during normal shuffle.

    Serializer for serializing ColumnarBatches for use during normal shuffle.

    The serialization write path takes the cudf Table that is described by the ColumnarBatch and uses cudf APIs to serialize the data into a sequence of bytes on the host. The data is returned to the Spark shuffle code where it is compressed by the CPU and written to disk.

    The serialization read path is notably different. The sequence of serialized bytes IS NOT deserialized into a cudf Table but rather tracked in host memory by a ColumnarBatch that contains a SerializedTableColumn. During query planning, each GPU columnar shuffle exchange is followed by a GpuShuffleCoalesceExec that expects to receive only these custom batches of SerializedTableColumn. GpuShuffleCoalesceExec coalesces the smaller shuffle partitions into larger tables before placing them on the GPU for further processing.

    Note

    The RAPIDS shuffle does not use this code.

  151. class GpuColumnarBatchWithPartitionValuesIterator extends Iterator[ColumnarBatch]

    An iterator that appends partition columns to each batch in the input iterator.

    An iterator that appends partition columns to each batch in the input iterator.

    This iterator will correctly handle multiple partition values for each partition column for a chunked read.

  152. case class GpuColumnarToRowExec(child: SparkPlan, exportColumnarRdd: Boolean = false) extends SparkPlan with ShimUnaryExecNode with ColumnarToRowTransition with GpuExec with Product with Serializable
  153. trait GpuComplexTypeMergingExpression extends Expression with ComplexTypeMergingExpression with GpuExpression with ShimExpression
  154. final class GpuCompressedColumnVector extends GpuColumnVectorBase with WithTableBuffer

    A column vector that tracks a compressed table.

    A column vector that tracks a compressed table. Unlike a normal GPU column vector, the columnar data within cannot be accessed directly. This class primarily serves the role of tracking the compressed data and table metadata so it can be decompressed later.

  155. class GpuCompressionAwareCoalesceIterator extends GpuCoalesceIterator

    Compression codec-aware GpuCoalesceIterator subclass which should be used in cases where the RAPIDS Shuffle Manager could be configured, as batches to be coalesced may be compressed.

  156. trait GpuConditionalExpression extends Expression with ComplexTypeMergingExpression with GpuExpression with ShimExpression
  157. class GpuCostModel extends CostModel
  158. trait GpuCreatableRelationProvider extends CreatableRelationProvider

    Trait to mark a GPU version of a CreatableRelationProvider

  159. trait GpuDataProducer[T] extends AutoCloseable

    A GpuDataProducer produces data on the GPU.

    A GpuDataProducer produces data on the GPU. That data typically comes from other resources that are also held on the GPU, cannot be released until the iterator is closed, and cannot be made spillable. This behaves like an Iterator but deliberately is not one, because it must not be used in place of an Iterator, especially in the context of an RDD, where doing so would violate the semantics of the GpuSemaphore. Generally the lifetime of this should fall entirely within the time the GpuSemaphore is held. It is "generally" because there are a few cases where, for performance reasons, we might not grab the semaphore at all if we know that no data is going to be produced.

    T

    what it is that we are wrapping
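
    The shape described above can be sketched as follows (ProducerSketch, SeqProducer, and drain are illustrative names, not the real API): an iterator-like trait that is deliberately not an Iterator, consumed and closed while the producing resources are held.

    ```scala
    // Iterator-like, but not an Iterator, so it cannot be passed where an
    // Iterator is expected (e.g. into an RDD).
    trait ProducerSketch[T] extends AutoCloseable {
      def hasNext: Boolean
      def next(): T
    }

    // A trivial producer over pre-computed values, standing in for data
    // backed by GPU resources that are freed on close().
    final class SeqProducer[T](values: Seq[T]) extends ProducerSketch[T] {
      private var rest = values.toList
      var closed = false
      def hasNext: Boolean = rest.nonEmpty
      def next(): T = { val v = rest.head; rest = rest.tail; v }
      def close(): Unit = closed = true
    }

    // Consume everything while the producer (and, in the real code, the
    // GpuSemaphore) is held, then release it.
    def drain[T](p: ProducerSketch[T])(f: T => Unit): Unit =
      try { while (p.hasNext) f(p.next()) } finally p.close()
    ```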

  160. trait GpuDataWritingCommand extends LogicalPlan with DataWritingCommand with ShimUnaryCommand

    An extension of DataWritingCommand that allows columnar execution.

  161. case class GpuDataWritingCommandExec(cmd: GpuDataWritingCommand, child: SparkPlan) extends SparkPlan with ShimUnaryExecNode with GpuExec with Product with Serializable
  162. case class GpuDynamicPruningExpression(child: Expression) extends UnaryExpression with ShimUnaryExpression with GpuExpression with DynamicPruning with Product with Serializable
  163. trait GpuExec extends SparkPlan
  164. case class GpuExecutedCommandExec(cmd: RunnableCommand) extends SparkPlan with LeafExecNode with GpuExec with Product with Serializable

    GPU version of ExecutedCommandExec.

    GPU version of ExecutedCommandExec.

    This class is essentially identical to ExecutedCommandExec but marked with GpuExec so it's clear this is replacing a CPU operation with a GPU operation. The GPU operation is not performed directly here, rather it is the underlying command that will ultimately execute on the GPU.

  165. case class GpuExpandExec(projections: Seq[Seq[Expression]], output: Seq[Attribute], child: SparkPlan)(preprojectEnabled: Boolean = false, coalesceAfter: Boolean = true) extends SparkPlan with ShimUnaryExecNode with GpuExec with Product with Serializable

    Apply all of the GroupExpressions to every input row, hence we will get multiple output rows for an input row.

    Apply all of the GroupExpressions to every input row, hence we will get multiple output rows for an input row.

    projections

    The group of expressions; all of the group expressions should output the same schema specified by the parameter output

    output

    Attribute references to Output

    child

    Child operator

    preprojectEnabled

    Whether to enable pre-project before expanding

    coalesceAfter

    Whether to coalesce the output batches

  166. class GpuExpandExecMeta extends SparkPlanMeta[ExpandExec]
  167. class GpuExpandIterator extends Iterator[ColumnarBatch]
  168. case class GpuExplode(child: Expression) extends GpuExplodeBase with Product with Serializable
  169. abstract class GpuExplodeBase extends GpuUnevaluableUnaryExpression with GpuGenerator
  170. trait GpuExpression extends Expression

    An Expression that cannot be evaluated in the traditional row-by-row sense (hence Unevaluable) but instead can be evaluated on an entire column batch at once.

  171. case class GpuFastSampleExec(lowerBound: Double, upperBound: Double, withReplacement: Boolean, seed: Long, child: SparkPlan) extends SparkPlan with ShimUnaryExecNode with GpuExec with Product with Serializable
  172. case class GpuFilterExec(condition: Expression, child: SparkPlan)(coalesceAfter: Boolean = true) extends SparkPlan with ShimUnaryExecNode with ShimPredicateHelper with GpuExec with Product with Serializable
  173. case class GpuFilterExecMeta(filter: FilterExec, conf: RapidsConf, parentMetaOpt: Option[RapidsMeta[_, _, _]], rule: DataFromReplacementRule) extends SparkPlanMeta[FilterExec] with Product with Serializable
  174. case class GpuGenerateExec(generator: GpuGenerator, requiredChildOutput: Seq[Attribute], outer: Boolean, generatorOutput: Seq[Attribute], child: SparkPlan) extends SparkPlan with ShimUnaryExecNode with GpuExec with Product with Serializable
  175. class GpuGenerateExecSparkPlanMeta extends SparkPlanMeta[GenerateExec]
  176. class GpuGenerateIterator extends Iterator[ColumnarBatch] with TaskAutoCloseableResource
  177. trait GpuGenerator extends Expression with GpuUnevaluable

    GPU overrides of Generator, cooperating with GpuGenerateExec.

  178. case class GpuGetJsonObject(json: Expression, path: Expression)(savePathForVerify: Option[String], saveRowsForVerify: Int) extends BinaryExpression with GpuBinaryExpressionArgsAnyScalar with ExpectsInputTypes with GpuCombinable with Product with Serializable
  179. class GpuGetJsonObjectMeta extends BinaryExprMeta[GetJsonObject]
  180. case class GpuGlobalLimitExec(limit: Int = -1, child: SparkPlan, offset: Int = 0) extends SparkPlan with GpuBaseLimitExec with Product with Serializable

    Take the first limit elements of the child's single output partition.

  181. case class GpuHashAggregateExec(requiredChildDistributionExpressions: Option[Seq[Expression]], groupingExpressions: Seq[NamedExpression], aggregateExpressions: Seq[GpuAggregateExpression], aggregateAttributes: Seq[Attribute], resultExpressions: Seq[NamedExpression], child: SparkPlan, configuredTargetBatchSize: Long, estimatedPreProcessGrowth: Double, forceSinglePassAgg: Boolean, allowSinglePassAgg: Boolean, allowNonFullyAggregatedOutput: Boolean, skipAggPassReductionRatio: Double) extends SparkPlan with ShimUnaryExecNode with GpuExec with Product with Serializable

    The GPU version of SortAggregateExec that is intended for partial aggregations that are not reductions and so it sorts the input data ahead of time to do it in a single pass.

    The GPU version of SortAggregateExec that is intended for partial aggregations that are not reductions and so it sorts the input data ahead of time to do it in a single pass.

    requiredChildDistributionExpressions

    this is unchanged by the GPU. It is used in EnsureRequirements to be able to add shuffle nodes

    groupingExpressions

    The expressions that, when applied to the input batch, return the grouping key

    aggregateExpressions

    The GpuAggregateExpression instances for this node

    aggregateAttributes

    References to each GpuAggregateExpression (attribute references)

    resultExpressions

    the expected output expression of this hash aggregate (which this node should project)

    child

    incoming plan (where we get input columns from)

    configuredTargetBatchSize

    user-configured maximum device memory size of a batch

    allowNonFullyAggregatedOutput

    whether we can skip the third pass of aggregation (can omit non-fully-aggregated data for a non-final stage of aggregation)

    skipAggPassReductionRatio

    skip if the ratio of rows after a pass is bigger than this value

  182. class GpuHashAggregateMeta extends GpuBaseAggregateMeta[HashAggregateExec]
  183. case class GpuHashAggregateMetrics(numOutputRows: GpuMetric, numOutputBatches: GpuMetric, numTasksFallBacked: GpuMetric, opTime: GpuMetric, computeAggTime: GpuMetric, concatTime: GpuMetric, sortTime: GpuMetric, numAggOps: GpuMetric, numPreSplits: GpuMetric, singlePassTasks: GpuMetric, heuristicTime: GpuMetric) extends Product with Serializable

    Utility class to hold all of the metrics related to hash aggregation

  184. abstract class GpuHashPartitioningBase extends Expression with GpuExpression with ShimExpression with GpuPartitioning with Serializable
  185. trait GpuHigherOrderFunction extends Expression with GpuExpression with ShimExpression

    A higher order function takes one or more (lambda) functions and applies these to some objects.

    A higher order function takes one or more (lambda) functions and applies these to some objects. The function produces a number of variables which can be consumed by some lambda function.

  186. case class GpuIf(predicateExpr: Expression, trueExpr: Expression, falseExpr: Expression) extends Expression with GpuConditionalExpression with Product with Serializable
  187. case class GpuInSet(child: Expression, list: Seq[Any]) extends GpuUnaryExpression with Predicate with Product with Serializable
  188. case class GpuIsNan(child: Expression) extends GpuUnaryExpression with Predicate with Product with Serializable
  189. case class GpuIsNotNull(child: Expression) extends GpuUnaryExpression with Predicate with Product with Serializable
  190. case class GpuIsNull(child: Expression) extends GpuUnaryExpression with Predicate with Product with Serializable
  191. case class GpuJsonTuple(children: Seq[Expression]) extends Expression with GpuGenerator with ShimExpression with Product with Serializable
  192. class GpuKeyBatchingIterator extends Iterator[ColumnarBatch]

    Given a stream of data that is sorted by a set of keys, split the data so that each output batch contains all of the rows for the keys it includes (no key straddles a batch boundary).

    Given a stream of data that is sorted by a set of keys, split the data so that each output batch contains all of the rows for the keys it includes. This tries to get the batch sizes close to the target size. It assumes that the input batches will already be close to that size and does not try to split them much further.
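
    The cutting rule can be sketched on plain key/value pairs (keyBatches is an illustrative name; the real iterator works on columnar batches and device memory sizes rather than row counts):

    ```scala
    // Pack key-sorted rows into batches of roughly targetRows rows,
    // cutting only at key boundaries so no key straddles two batches.
    def keyBatches[K, V](rows: List[(K, V)], targetRows: Int): List[List[(K, V)]] = {
      // Group consecutive rows that share a key (input is assumed key-sorted).
      val groups: List[List[(K, V)]] =
        rows.foldRight(List.empty[List[(K, V)]]) {
          case (row, head :: tail) if head.head._1 == row._1 => (row :: head) :: tail
          case (row, acc) => List(row) :: acc
        }
      // Pack whole key groups into batches; a single oversized key group
      // still stays in one batch.
      val (full, last) = groups.foldLeft((List.empty[List[(K, V)]], List.empty[(K, V)])) {
        case ((done, cur), g) if cur.nonEmpty && cur.size + g.size > targetRows =>
          (done :+ cur, g)
        case ((done, cur), g) => (done, cur ++ g)
      }
      if (last.isEmpty) full else full :+ last
    }
    ```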

  193. case class GpuKnownFloatingPointNormalized(child: Expression) extends UnaryExpression with ShimTaggingExpression with GpuExpression with Product with Serializable

    This is a TaggingExpression in spark, which gets matched in NormalizeFloatingNumbers (which is a Rule).

  194. case class GpuKnownNotNull(child: Expression) extends UnaryExpression with ShimTaggingExpression with GpuExpression with Product with Serializable

    GPU version of the 'KnownNotNull', a TaggingExpression in spark, to tag an expression as known to not be null.

  195. case class GpuLambdaFunction(function: Expression, arguments: Seq[NamedExpression], hidden: Boolean = false) extends Expression with GpuExpression with ShimExpression with Product with Serializable

    A lambda function and its arguments on the GPU.

    A lambda function and its arguments on the GPU. This is mostly just a wrapper around the function expression, but it holds references to the arguments passed into it.

  196. abstract class GpuLeafExpression extends Expression with GpuExpression with ShimExpression
  197. case class GpuLiteral(value: Any, dataType: DataType) extends GpuLeafExpression with Product with Serializable

    In order to do type conversion and checking, use GpuLiteral.create() instead of the constructor.

  198. case class GpuLocalLimitExec(limit: Int, child: SparkPlan) extends SparkPlan with GpuBaseLimitExec with Product with Serializable

    Take the first limit elements of each child partition, but do not collect or shuffle them.

  199. case class GpuMakeDecimal(child: Expression, precision: Int, sparkScale: Int, nullOnOverflow: Boolean) extends GpuUnaryExpression with Product with Serializable
  200. case class GpuMapFilter(argument: Expression, function: Expression, isBound: Boolean = false, boundIntermediate: Seq[GpuExpression] = Seq.empty) extends Expression with GpuMapSimpleHigherOrderFunction with Product with Serializable
  201. case class GpuMapFromArraysMeta(expr: MapFromArrays, conf: RapidsConf, parent: Option[RapidsMeta[_, _, _]], rule: DataFromReplacementRule) extends BinaryExprMeta[MapFromArrays] with Product with Serializable
  202. trait GpuMapSimpleHigherOrderFunction extends Expression with GpuSimpleHigherOrderFunction with GpuBind
  203. class GpuMergeAggregateIterator extends Iterator[ColumnarBatch] with AutoCloseable with Logging

    Iterator that takes another columnar batch iterator as input and emits new columnar batches that are aggregated based on the specified grouping and aggregation expressions.

    Iterator that takes another columnar batch iterator as input and emits new columnar batches that are aggregated based on the specified grouping and aggregation expressions. This iterator tries to perform a hash-based aggregation but is capable of falling back to a sort-based aggregation which can operate on data that is either larger than can be represented by a cudf column or larger than can fit in GPU memory.

    The iterator starts by pulling all batches from the input iterator, performing an initial projection and aggregation on each individual batch via aggregateInputBatches(). The resulting aggregated batches are cached in memory as spillable batches. Once all input batches have been aggregated, tryMergeAggregatedBatches() is called to attempt a merge of the aggregated batches into a single batch. If this is successful then the resulting batch can be returned, otherwise buildSortFallbackIterator is used to sort the aggregated batches by the grouping keys and performs a final merge aggregation pass on the sorted batches.

  204. sealed abstract class GpuMetric extends Serializable
  205. case class GpuMonotonicallyIncreasingID() extends GpuLeafExpression with Product with Serializable

    An expression that returns monotonically increasing 64-bit integers just like org.apache.spark.sql.catalyst.expressions.MonotonicallyIncreasingID

    An expression that returns monotonically increasing 64-bit integers just like org.apache.spark.sql.catalyst.expressions.MonotonicallyIncreasingID

    The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. This implementation should match what Spark does, which is to put the partition ID in the upper 31 bits while the lower 33 bits represent the record number within each partition.
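
    The bit layout described above can be written out directly (monotonicallyIncreasingId is an illustrative name for the computation, not the real API):

    ```scala
    // Partition ID in the upper 31 bits, per-partition record number in
    // the lower 33 bits: IDs from a later partition always exceed IDs
    // from an earlier one, and IDs within a partition increase by one.
    def monotonicallyIncreasingId(partitionId: Int, recordNumber: Long): Long = {
      require(partitionId >= 0, "partition ID must fit in 31 bits")
      require(recordNumber >= 0 && recordNumber < (1L << 33),
        "record number must fit in 33 bits")
      (partitionId.toLong << 33) | recordNumber
    }
    ```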

  206. case class GpuMultiGetJsonObject(json: Expression, paths: Seq[Option[List[PathInstruction]]], output: StructType)(targetBatchSize: Long, parallel: Option[Int]) extends Expression with GpuExpression with ShimExpression with Product with Serializable
  207. case class GpuNaNvl(left: Expression, right: Expression) extends BinaryExpression with GpuBinaryExpression with Product with Serializable
  208. case class GpuNamedLambdaVariable(name: String, dataType: DataType, nullable: Boolean, exprId: ExprId = NamedExpression.newExprId) extends GpuLeafExpression with NamedExpression with GpuUnevaluable with Product with Serializable

    A named lambda variable.

    A named lambda variable. In Spark on the CPU this includes an AtomicReference to the value that is updated each time a lambda function is called. On the GPU we have to bind this and turn it into a GpuBoundReference for a modified input batch. In the future this should also work with AST when cudf supports that type of operation.

  209. case class GpuNansToNulls(child: Expression) extends GpuUnaryExpression with Product with Serializable
  210. class GpuObjectHashAggregateExecMeta extends GpuTypedImperativeSupportedAggregateExecMeta[ObjectHashAggregateExec]
  211. case class GpuOrcMultiFilePartitionReaderFactory(sqlConf: SQLConf, broadcastedConf: Broadcast[SerializableConfiguration], dataSchema: StructType, readDataSchema: StructType, partitionSchema: StructType, filters: Array[Filter], rapidsConf: RapidsConf, metrics: Map[String, GpuMetric], queryUsesInputFile: Boolean) extends MultiFilePartitionReaderFactoryBase with Product with Serializable

    The multi-file partition reader factory for creating cloud reading or coalescing reading for ORC file format.

    The multi-file partition reader factory for creating cloud reading or coalescing reading for ORC file format.

    sqlConf

    the SQLConf

    broadcastedConf

    the Hadoop configuration

    dataSchema

    schema of the data

    readDataSchema

    the Spark schema describing what will be read

    partitionSchema

    schema of partitions.

    filters

    filters on non-partition columns

    rapidsConf

    the Rapids configuration

    metrics

    the metrics

    queryUsesInputFile

    this is a parameter to easily allow turning it off in GpuTransitionOverrides if InputFileName, InputFileBlockStart, or InputFileBlockLength are used

  212. class GpuOrcPartitionReader extends FilePartitionReaderBase with OrcPartitionReaderBase

    A PartitionReader that reads an ORC file split on the GPU.

    A PartitionReader that reads an ORC file split on the GPU.

    Efficiently reading an ORC split on the GPU requires rebuilding the ORC file in memory such that only relevant data is present in the memory file. This avoids sending unnecessary data to the GPU and saves GPU memory.

  213. case class GpuOrcPartitionReaderFactory(sqlConf: SQLConf, broadcastedConf: Broadcast[SerializableConfiguration], dataSchema: StructType, readDataSchema: StructType, partitionSchema: StructType, pushedFilters: Array[Filter], rapidsConf: RapidsConf, metrics: Map[String, GpuMetric], params: Map[String, String]) extends ShimFilePartitionReaderFactory with Product with Serializable
  214. case class GpuOrcScan(sparkSession: SparkSession, hadoopConf: Configuration, fileIndex: PartitioningAwareFileIndex, dataSchema: StructType, readDataSchema: StructType, readPartitionSchema: StructType, options: CaseInsensitiveStringMap, pushedFilters: Array[Filter], partitionFilters: Seq[Expression], dataFilters: Seq[Expression], rapidsConf: RapidsConf, queryUsesInputFile: Boolean = false) extends FileScan with GpuScan with Logging with Product with Serializable
  215. case class GpuOutOfCoreSortIterator(iter: Iterator[ColumnarBatch], sorter: GpuSorter, targetSize: Long, opTime: GpuMetric, sortTime: GpuMetric, outputBatches: GpuMetric, outputRows: GpuMetric) extends Iterator[ColumnarBatch] with AutoCloseable with Product with Serializable

    Sorts incoming batches of data spilling if needed.

    Sorts incoming batches of data spilling if needed.
    The algorithm for this is a modified version of an external merge sort with multiple passes for large data. https://en.wikipedia.org/wiki/External_sorting#External_merge_sort
    The main difference is that we cannot stream the data when doing a merge sort. So, we instead divide the data into batches that are small enough that we can do a merge sort on N batches and still fit the output within the target batch size. When merging batches instead of individual rows we cannot assume that all of the resulting data is globally sorted. Hopefully, most of it is globally sorted but we have to use the first row from the next pending batch to determine the cutoff point between globally sorted data and data that still needs to be merged with other batches. The globally sorted portion is put into a sorted queue while the rest of the merged data is split and put back into a pending queue. The process repeats until we have enough data to output.
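
    The cutoff step above can be pictured with Ints standing in for whole rows (splitAtCutoff is an illustrative name; the real code splits GPU batches, not Scala lists): only rows smaller than the first row of every still-pending batch are known to be globally sorted.

    ```scala
    // Split a merged run into the globally sorted prefix and the portion
    // that must be merged again with the pending batches later.
    def splitAtCutoff(merged: List[Int],
        pendingFirstRows: List[Int]): (List[Int], List[Int]) =
      if (pendingFirstRows.isEmpty) (merged, Nil) // nothing pending: all sorted
      else {
        val cut = pendingFirstRows.min
        merged.partition(_ < cut)
      }
    ```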

  216. case class GpuOverrides() extends Rule[SparkPlan] with Logging with Product with Serializable
  217. trait GpuOverridesListener extends AnyRef

    Listener trait so that tests can confirm that the expected optimizations are being applied

  218. case class GpuOverwriteByExpressionExecV1(table: SupportsWrite, plan: LogicalPlan, refreshCache: () ⇒ Unit, write: V1Write) extends V2CommandExec with GpuV1FallbackWriters with Product with Serializable

    GPU version of OverwriteByExpressionExecV1

    GPU version of OverwriteByExpressionExecV1

    Physical plan node for overwrite into a v2 table with V1 write interfaces. Note that when this interface is used, the atomicity of the operation depends solely on the target data source.

    Overwrites data in a table matched by a set of filters. Rows matching all of the filters will be deleted and rows in the output data set are appended.

    This plan is used to implement SaveMode.Overwrite. The behavior of SaveMode.Overwrite is to truncate the table -- delete all rows -- and append the output data set. This uses the filter AlwaysTrue to delete all rows.

  219. final class GpuPackedTableColumn extends GpuColumnVectorBase with WithTableBuffer

    A GPU column tracking a packed table such as one generated by contiguous split.

    A GPU column tracking a packed table such as one generated by contiguous split. Unlike GpuColumnVectorFromBuffer, the columnar data cannot be accessed directly.

    This class primarily serves the role of tracking the packed table data in a ColumnarBatch without requiring the underlying table to be manifested along with all of the child columns. The typical use-case generates one of these columns per task output partition, and then the RAPIDS shuffle transmits the opaque host metadata and GPU data buffer to another host.

    NOTE: There should only be one instance of this column per ColumnarBatch.

  220. class GpuParquetFileFormat extends ColumnarFileFormat with Logging
  221. case class GpuParquetMultiFilePartitionReaderFactory(sqlConf: SQLConf, broadcastedConf: Broadcast[SerializableConfiguration], dataSchema: StructType, readDataSchema: StructType, partitionSchema: StructType, filters: Array[Filter], rapidsConf: RapidsConf, metrics: Map[String, GpuMetric], queryUsesInputFile: Boolean, alluxioPathReplacementMap: Option[Map[String, String]]) extends MultiFilePartitionReaderFactoryBase with Product with Serializable

    Similar to GpuParquetPartitionReaderFactory but extended for reading multiple files in an iteration.

    Similar to GpuParquetPartitionReaderFactory but extended for reading multiple files in an iteration. This will allow us to read multiple small files and combine them on the CPU side before sending them down to the GPU.

  222. case class GpuParquetPartitionReaderFactory(sqlConf: SQLConf, broadcastedConf: Broadcast[SerializableConfiguration], dataSchema: StructType, readDataSchema: StructType, partitionSchema: StructType, filters: Array[Filter], rapidsConf: RapidsConf, metrics: Map[String, GpuMetric], params: Map[String, String], alluxioPathReplacementMap: Option[Map[String, String]]) extends ShimFilePartitionReaderFactory with Logging with Product with Serializable
  223. case class GpuParquetScan(sparkSession: SparkSession, hadoopConf: Configuration, fileIndex: PartitioningAwareFileIndex, dataSchema: StructType, readDataSchema: StructType, readPartitionSchema: StructType, pushedFilters: Array[Filter], options: CaseInsensitiveStringMap, partitionFilters: Seq[Expression], dataFilters: Seq[Expression], rapidsConf: RapidsConf, queryUsesInputFile: Boolean = false) extends FileScan with GpuScan with Logging with Product with Serializable

    Base GpuParquetScan used for common code across Spark versions.

    Base GpuParquetScan used for common code across Spark versions. Gpu version of Spark's 'ParquetScan'.

    sparkSession

    SparkSession.

    hadoopConf

    Hadoop configuration.

    fileIndex

    File index of the relation.

    dataSchema

    Schema of the data.

    readDataSchema

    Schema to read.

    readPartitionSchema

    Partition schema.

    pushedFilters

    Filters on non-partition columns.

    options

    Parquet option settings.

    partitionFilters

    Filters on partition columns.

    dataFilters

    File source metadata filters.

    rapidsConf

    Rapids configuration.

    queryUsesInputFile

    This is a parameter to easily allow turning it off in GpuTransitionOverrides if InputFileName, InputFileBlockStart, or InputFileBlockLength are used

  224. class GpuParquetWriter extends ColumnarOutputWriter
  225. trait GpuPartitioning extends Partitioning
  226. case class GpuPosExplode(child: Expression) extends GpuExplodeBase with Product with Serializable
  227. case class GpuProjectAstExec(projectList: List[Expression], child: SparkPlan) extends SparkPlan with GpuProjectExecLike with Product with Serializable

    Use cudf AST expressions to project columnar batches

  228. case class GpuProjectExec(projectList: List[NamedExpression], child: SparkPlan) extends SparkPlan with GpuProjectExecLike with Product with Serializable
  229. trait GpuProjectExecLike extends SparkPlan with ShimUnaryExecNode with GpuExec
  230. class GpuProjectExecMeta extends SparkPlanMeta[ProjectExec] with Logging
  231. case class GpuPromotePrecision(child: Expression) extends GpuUnaryExpression with Product with Serializable

    A GPU substitution of PromotePrecision, which is a NOOP in Spark too.

  232. case class GpuQueryStagePrepOverrides() extends Rule[SparkPlan] with Logging with Product with Serializable

    Tag the initial plan when AQE is enabled

  233. case class GpuRangeExec(start: Long, end: Long, step: Long, numSlices: Int, output: Seq[Attribute], targetSizeBytes: Long) extends SparkPlan with ShimLeafExecNode with GpuExec with Product with Serializable

    Physical plan for range (generating a range of 64 bit numbers).

  234. case class GpuRangePartitioner(rangeBounds: Array[InternalRow], sorter: GpuSorter) extends Expression with GpuExpression with ShimExpression with GpuPartitioning with Product with Serializable
  235. class GpuReadCSVFileFormat extends CSVFileFormat with GpuReadFileFormatWithMetrics

    A FileFormat that allows reading CSV files with the GPU.

  236. trait GpuReadFileFormatWithMetrics extends FileFormat
  237. class GpuReadOrcFileFormat extends OrcFileFormat with GpuReadFileFormatWithMetrics

    A FileFormat that allows reading ORC files with the GPU.

  238. class GpuReadParquetFileFormat extends ParquetFileFormat with GpuReadFileFormatWithMetrics

    A FileFormat that allows reading Parquet files with the GPU.

  239. class GpuRegExpReplaceMeta extends QuaternaryExprMeta[RegExpReplace]
  240. trait GpuRegExpReplaceOpt extends Serializable
  241. case class GpuReplicateRows(children: Seq[Expression]) extends Expression with GpuGenerator with ShimExpression with Product with Serializable
  242. case class GpuRoundRobinPartitioning(numPartitions: Int) extends Expression with GpuExpression with ShimExpression with GpuPartitioning with Product with Serializable

    Represents a partitioning where incoming columnar batched rows are distributed evenly across output partitions by starting from a zero-th partition number and distributing rows in a round-robin fashion.

    Represents a partitioning where incoming columnar batched rows are distributed evenly across output partitions by starting from a zero-th partition number and distributing rows in a round-robin fashion. This partitioning is used when implementing the DataFrame.repartition() operator.
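
    The assignment rule can be shown on plain rows (roundRobinAssign is illustrative; the real operator deals out rows of columnar batches, and Spark randomizes the starting partition per task, which is fixed at zero here for simplicity):

    ```scala
    // Deal rows out evenly, starting from partition 0: row i goes to
    // partition i % numPartitions.
    def roundRobinAssign[T](rows: Seq[T], numPartitions: Int): Map[Int, Seq[T]] =
      rows.zipWithIndex
        .groupBy { case (_, i) => i % numPartitions }
        .map { case (p, rs) => p -> rs.map(_._1) }
    ```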

  243. trait GpuRowBasedUserDefinedFunction extends Expression with GpuExpression with ShimExpression with UserDefinedExpression with Serializable with Logging

    Execute a row-based UDF efficiently by pulling back to the host only the columns the UDF needs, and doing the processing on the CPU.

  244. case class GpuRowToColumnarExec(child: SparkPlan, goal: CoalesceSizeGoal) extends SparkPlan with ShimUnaryExecNode with GpuExec with Product with Serializable

    GPU version of row to columnar transition.

  245. trait GpuRunnableCommand extends LogicalPlan with RunnableCommand with ShimUnaryCommand

    An extension of RunnableCommand that allows columnar execution.

  246. case class GpuRunnableCommandExec(cmd: GpuRunnableCommand, child: SparkPlan) extends SparkPlan with ShimUnaryExecNode with GpuExec with Product with Serializable
  247. case class GpuSampleExec(lowerBound: Double, upperBound: Double, withReplacement: Boolean, seed: Long, child: SparkPlan) extends SparkPlan with ShimUnaryExecNode with GpuExec with Product with Serializable
  248. class GpuSampleExecMeta extends SparkPlanMeta[SampleExec] with Logging
  249. class GpuScalar extends AutoCloseable

    The wrapper of a Scala value and its corresponding cudf Scalar, along with its DataType.

    This class is introduced because many expressions require both the cudf Scalar and its corresponding Scala value to complete their computations, e.g. 'GpuStringSplit', 'GpuStringLocate', 'GpuDivide', 'GpuDateAddInterval', 'GpuTimeMath', etc. Holding only a cudf Scalar or only a Scala value cannot support such cases without copying data between the host and the device each time it is asked for.

    A GpuScalar can be created from either a cudf Scalar or a Scala value. By initializing the cudf Scalar or the Scala value lazily and caching it once created, it avoids unnecessary data copies.

    If a GpuScalar is created from a Scala value and is used only on the host side, there is no data copy and no cudf Scalar is created. If it is used on the device side, data is copied to the device only once, to create a cudf Scalar.

    Similarly, if a GpuScalar is created from a cudf Scalar, no data is copied to the host when it is used only on the device side (the ideal case, since everything stays on the GPU), and data is copied to the host only once if it is used on the host side.

    So a GpuScalar incurs at most one data copy while supporting all of these cases, and no round trip between host and device ever happens.

    Another reason for storing the Scala value in addition to the cudf Scalar is that 'GpuDateAddInterval' and 'GpuTimeMath' run different algorithms over the three members of a CalendarInterval, which a single cudf Scalar cannot currently represent.

    Do not create a GpuScalar with the constructor; instead call the factory APIs above.
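    The at-most-one-copy behavior described above can be sketched with lazy initialization. This is an illustrative model only: CachedScalar and DeviceScalar are hypothetical stand-ins invented for this sketch, not the real GpuScalar or cudf Scalar APIs.

```scala
// Hypothetical sketch: lazily materialize the host value and the
// "device" form at most once each, mirroring the behavior described
// above. DeviceScalar stands in for a real cudf Scalar.
final case class DeviceScalar[T](value: T)

final class CachedScalar[T] private (
    hostInit: Option[T],
    deviceInit: Option[DeviceScalar[T]]) {
  private var copies = 0

  // Materialized on first use and cached; no copy if already present.
  lazy val hostValue: T = hostInit.getOrElse {
    copies += 1 // simulated device -> host copy
    deviceInit.get.value
  }
  lazy val deviceValue: DeviceScalar[T] = deviceInit.getOrElse {
    copies += 1 // simulated host -> device copy
    DeviceScalar(hostInit.get)
  }

  def copyCount: Int = copies
}

object CachedScalar {
  // Factory APIs: one side is provided, the other is derived lazily.
  def fromHost[T](v: T): CachedScalar[T] = new CachedScalar(Some(v), None)
  def fromDevice[T](s: DeviceScalar[T]): CachedScalar[T] =
    new CachedScalar(None, Some(s))
}
```

    Using only the side a scalar was created from costs zero copies; touching the other side costs exactly one, no matter how many times it is accessed afterward.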

  250. trait GpuScan extends Scan with ScanWithMetrics
  251. abstract class GpuScanWrapper extends GpuScan
  252. case class GpuShuffleCoalesceExec(child: SparkPlan, targetBatchByteSize: Long) extends SparkPlan with ShimUnaryExecNode with GpuExec with Product with Serializable

    Coalesces serialized tables on the host up to the target batch size before transferring the coalesced result to the GPU. This reduces the overhead of copying data to the GPU and also helps avoid holding onto the GPU semaphore while shuffle I/O is being performed.

    Note

    This should ALWAYS appear in the plan after a GPU shuffle when RAPIDS shuffle is not being used.

  253. class GpuShuffleCoalesceIterator extends Iterator[ColumnarBatch]

    Iterator that coalesces columnar batches that are expected to only contain SerializedTableColumn. The serialized tables within are collected up to the target batch size and then concatenated on the host before the data is transferred to the GPU.

  254. case class GpuShuffledAsymmetricHashJoinExec(joinType: JoinType, leftKeys: Seq[Expression], rightKeys: Seq[Expression], condition: Option[Expression], left: SparkPlan, right: SparkPlan, isGpuShuffle: Boolean, gpuBatchSizeBytes: Long, isSkewJoin: Boolean)(cpuLeftKeys: Seq[Expression], cpuRightKeys: Seq[Expression], magnificationThreshold: Integer) extends GpuShuffledSizedHashJoinExec[ColumnarBatch] with Product with Serializable

    A GPU shuffled hash join optimized to handle asymmetric joins like left outer and right outer. Probes the sizes of the input tables before performing the join to determine which to use as the build side.

    leftKeys: join keys for the left table
    rightKeys: join keys for the right table
    condition: inequality portions of the join condition
    left: plan for the left table
    right: plan for the right table
    isGpuShuffle: whether the shuffle is GPU-centric (e.g.: UCX-based)
    gpuBatchSizeBytes: target GPU batch size
    cpuLeftKeys: original CPU expressions for the left join keys
    cpuRightKeys: original CPU expressions for the right join keys

  255. case class GpuShuffledHashJoinExec(leftKeys: Seq[Expression], rightKeys: Seq[Expression], joinType: JoinType, buildSide: GpuBuildSide, condition: Option[Expression], left: SparkPlan, right: SparkPlan, isSkewJoin: Boolean)(cpuLeftKeys: Seq[Expression], cpuRightKeys: Seq[Expression]) extends SparkPlan with ShimBinaryExecNode with GpuHashJoin with GpuSubPartitionHashJoin with Product with Serializable
  256. class GpuShuffledHashJoinMeta extends SparkPlanMeta[ShuffledHashJoinExec]
  257. abstract class GpuShuffledSizedHashJoinExec[HOST_BATCH_TYPE <: AutoCloseable] extends SparkPlan with GpuJoinExec
  258. case class GpuShuffledSymmetricHashJoinExec(joinType: JoinType, leftKeys: Seq[Expression], rightKeys: Seq[Expression], condition: Option[Expression], left: SparkPlan, right: SparkPlan, isGpuShuffle: Boolean, gpuBatchSizeBytes: Long, isSkewJoin: Boolean)(cpuLeftKeys: Seq[Expression], cpuRightKeys: Seq[Expression]) extends GpuShuffledSizedHashJoinExec[SpillableHostConcatResult] with Product with Serializable

    A GPU shuffled hash join optimized to handle symmetric joins like inner and full outer. Probes the sizes of the input tables before performing the join to determine which to use as the build side.

    leftKeys: join keys for the left table
    rightKeys: join keys for the right table
    condition: inequality portions of the join condition
    left: plan for the left table
    right: plan for the right table
    isGpuShuffle: whether the shuffle is GPU-centric (e.g.: UCX-based)
    gpuBatchSizeBytes: target GPU batch size
    cpuLeftKeys: original CPU expressions for the left join keys
    cpuRightKeys: original CPU expressions for the right join keys

  259. trait GpuSimpleHigherOrderFunction extends Expression with GpuHigherOrderFunction with GpuBind

    Trait for functions that take one argument and one function as input.

  260. class GpuSortAggregateExecMeta extends GpuTypedImperativeSupportedAggregateExecMeta[SortAggregateExec]
  261. case class GpuSortEachBatchIterator(iter: Iterator[ColumnarBatch], sorter: GpuSorter, singleBatch: Boolean, opTime: GpuMetric = NoopMetric, sortTime: GpuMetric = NoopMetric, outputBatches: GpuMetric = NoopMetric, outputRows: GpuMetric = NoopMetric) extends Iterator[ColumnarBatch] with Product with Serializable
  262. case class GpuSortExec(gpuSortOrder: Seq[SortOrder], global: Boolean, child: SparkPlan, sortType: SortExecType)(cpuSortOrder: Seq[SortOrder]) extends SparkPlan with ShimUnaryExecNode with GpuExec with Product with Serializable
  263. class GpuSortMergeJoinMeta extends SparkPlanMeta[SortMergeJoinExec]
  264. class GpuSortMeta extends SparkPlanMeta[SortExec]
  265. case class GpuSortOrderMeta(sortOrder: SortOrder, conf: RapidsConf, parentOpt: Option[RapidsMeta[_, _, _]], rule: DataFromReplacementRule) extends BaseExprMeta[SortOrder] with Product with Serializable
  266. class GpuSorter extends Serializable

    A class that provides convenience methods for sorting batches of data. A Spark SortOrder typically just references a single column through an AttributeReference. That is the simplest situation, where we only need to bind the attribute references to where they go, but it is also possible for a SortOrder to involve some computation, for example sorting strings by their length instead of in lexicographical order. Because cudf does not support this directly, we instead go through the SortOrder instances that are part of this sorter and find the ones that require computation.

    The sort is then done in a few stages. First we compute any needed columns from the SortOrder instances that require computation and add them to the original batch; the method appendProjectedColumns does this. The class then provides a number of methods that can be used to operate on a batch that has these new columns added to it, including sorting, merge sorting, and finding bounds. These can be combined in various ways to implement different algorithms. When you are done with these operations you can drop the temporary columns, which were added just for computation, using removeProjectedColumns.

    Sometimes you may want to pull data back to the CPU and sort rows there too; cpuOrders lets you do this on rows that have had the extra ordering columns added to them. This class also provides fullySortBatch as an optimization: if all you want to do is sort a batch, you don't want to have to sort the temp columns too, and fullySortBatch provides that.
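    The staged workflow above (append computed ordering columns, sort, drop the temporaries) can be sketched with plain Scala collections instead of cudf columns. Batch, SortSketch, and the "s_len" column are hypothetical names invented for this illustration; the real GpuSorter operates on columnar batches on the GPU.

```scala
// Illustrative sketch of sorting strings by their length: the ordering
// key requires computation, so it is appended as a temporary column,
// used for the sort, then removed.
final case class Batch(columns: Map[String, Vector[Any]])

object SortSketch {
  // Stage 1: append the computed ordering column (string length).
  def appendProjected(b: Batch): Batch = {
    val strs = b.columns("s").map(_.asInstanceOf[String])
    Batch(b.columns + ("s_len" -> strs.map(_.length)))
  }

  // Stage 2: reorder every column by the projected key column.
  def sortByProjected(b: Batch): Batch = {
    val keys = b.columns("s_len").map(_.asInstanceOf[Int])
    val order = keys.zipWithIndex.sortBy(_._1).map(_._2)
    Batch(b.columns.map { case (n, col) => n -> order.map(col) })
  }

  // Stage 3: drop the temporary column once sorting is done.
  def removeProjected(b: Batch): Batch =
    Batch(b.columns - "s_len")

  def fullySort(b: Batch): Batch =
    removeProjected(sortByProjected(appendProjected(b)))
}
```

    The intermediate batch carries both the original and the computed columns, which is what lets the same batch also be used for merge sorting or bounds finding before the temporaries are dropped.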

  267. case class GpuSparkPartitionID() extends GpuLeafExpression with Product with Serializable

    An expression that returns the current partition id just like org.apache.spark.sql.catalyst.expressions.SparkPartitionID

  268. class GpuStackMeta extends BaseExprMeta[Stack]
  269. trait GpuString2TrimExpression extends Expression with String2TrimExpression with GpuExpression with ShimExpression
  270. case class GpuTakeOrderedAndProjectExecMeta(takeExec: TakeOrderedAndProjectExec, rapidsConf: RapidsConf, parentOpt: Option[RapidsMeta[_, _, _]], rule: DataFromReplacementRule) extends SparkPlanMeta[TakeOrderedAndProjectExec] with Product with Serializable
  271. trait GpuTernaryExpression extends TernaryExpression with ShimTernaryExpression with GpuExpression
  272. trait GpuTernaryExpressionArgsAnyScalarScalar extends TernaryExpression with GpuTernaryExpression

    Expressions subclassing this trait guarantee that they implement: doColumnar(GpuScalar, GpuScalar, GpuScalar) and doColumnar(GpuColumnVector, GpuScalar, GpuScalar).

    The default implementation throws for all other permutations.

    The ternary expression must fall back to the CPU for the doColumnar cases that would throw. The default implementation here should never execute.

  273. trait GpuTernaryExpressionArgsScalarAnyScalar extends TernaryExpression with GpuTernaryExpression

    Expressions subclassing this trait guarantee that they implement: doColumnar(GpuScalar, GpuScalar, GpuScalar) and doColumnar(GpuScalar, GpuColumnVector, GpuScalar).

    The default implementation throws for all other permutations.

    The ternary expression must fall back to the CPU for the doColumnar cases that would throw. The default implementation here should never execute.

  274. abstract class GpuTextBasedPartitionReader[BUFF <: LineBufferer, FACT <: LineBuffererFactory[BUFF]] extends PartitionReader[ColumnarBatch] with ScanWithMetrics

    The text-based PartitionReader.

  275. case class GpuTieredProject(exprTiers: Seq[Seq[GpuExpression]]) extends Product with Serializable

    Do projections in a tiered fashion, where earlier tiers contain sub-expressions that are referenced in later tiers. Each tier adds columns to the original batch corresponding to the output of the sub-expressions. It also removes columns that are no longer needed, based on inputAttrTiers for the current tier and the next tier.

    Example of how this is processed:
      Original projection expressions: (((a + b) + c) * e), (((a + b) + d) * f), (a + e), (c + f)
      Input columns for tier 1: a, b, c, d, e, f (original projection inputs)
      Tier 1: (a + b) as ref1
      Input columns for tier 2: a, c, d, e, f, ref1
      Tier 2: (ref1 + c) as ref2, (ref1 + d) as ref3
      Input columns for tier 3: a, c, e, f, ref2, ref3
      Tier 3: (ref2 * e), (ref3 * f), (a + e), (c + f)
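    The tiered evaluation above can be sketched numerically, with scalars standing in for columns. TieredProjectSketch is a hypothetical name for this illustration; the point is that the shared sub-expression (a + b) is computed once in tier 1 and reused by both tier 2 expressions.

```scala
// Worked scalar sketch of the three-tier example described above.
object TieredProjectSketch {
  def project(a: Int, b: Int, c: Int, d: Int, e: Int, f: Int): Seq[Int] = {
    val ref1 = a + b                       // tier 1: shared sub-expression
    val ref2 = ref1 + c                    // tier 2
    val ref3 = ref1 + d                    // tier 2
    Seq(ref2 * e, ref3 * f, a + e, c + f)  // tier 3: final outputs
  }
}
```

    With a = 1, b = 2, c = 3, d = 4, e = 5, f = 6 this yields ref1 = 3, ref2 = 6, ref3 = 7 and outputs 30, 42, 6, 9, with (a + b) evaluated only once instead of twice.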

  276. case class GpuTopN(limit: Int, gpuSortOrder: Seq[SortOrder], projectList: Seq[NamedExpression], child: SparkPlan, offset: Int = 0)(cpuSortOrder: Seq[SortOrder]) extends SparkPlan with GpuBaseLimitExec with Product with Serializable

    Take the first limit elements as defined by the sortOrder, and do projection if needed. This is logically equivalent to having a Limit operator after a SortExec operator, or having a ProjectExec operator between them. This could have been named TopK, but Spark's top operator does the opposite in ordering so we name it TakeOrdered to avoid confusion.

  277. case class GpuTransformKeys(argument: Expression, function: Expression, isBound: Boolean = false, boundIntermediate: Seq[GpuExpression] = Seq.empty) extends Expression with GpuMapSimpleHigherOrderFunction with Product with Serializable
  278. case class GpuTransformValues(argument: Expression, function: Expression, isBound: Boolean = false, boundIntermediate: Seq[GpuExpression] = Seq.empty) extends Expression with GpuMapSimpleHigherOrderFunction with Product with Serializable
  279. class GpuTransitionOverrides extends Rule[SparkPlan]

    Rules that run after the row to columnar and columnar to row transitions have been inserted. These rules insert transitions to and from the GPU, and then optimize various transitions.

  280. abstract class GpuTypedImperativeSupportedAggregateExecMeta[INPUT <: BaseAggregateExec] extends GpuBaseAggregateMeta[INPUT]

    Base class for metadata around SortAggregateExec and ObjectHashAggregateExec, which may contain TypedImperativeAggregate functions in aggregate expressions.

  281. abstract class GpuUnaryExpression extends UnaryExpression with ShimUnaryExpression with GpuExpression
  282. trait GpuUnevaluable extends Expression with GpuExpression
  283. abstract class GpuUnevaluableUnaryExpression extends GpuUnaryExpression with GpuUnevaluable
  284. case class GpuUnionExec(children: Seq[SparkPlan]) extends SparkPlan with ShimSparkPlan with GpuExec with Product with Serializable
  285. case class GpuUnscaledValue(child: Expression) extends GpuUnaryExpression with Product with Serializable
  286. class GpuUnsignedIntegerType extends DataType

    An unsigned, 32-bit integer type that maps to DType.UINT32 in cudf.

    Note

    This type should NOT be used in Catalyst plan nodes that could be exposed to CPU expressions.

  287. class GpuUnsignedLongType extends DataType

    An unsigned, 64-bit integer type that maps to DType.UINT64 in cudf.

    Note

    This type should NOT be used in Catalyst plan nodes that could be exposed to CPU expressions.

  288. trait GpuUserDefinedFunction extends Expression with GpuExpression with ShimExpression with UserDefinedExpression with Serializable

    Common implementation across all RAPIDS accelerated UDF types

  289. trait GpuV1FallbackWriters extends V2CommandExec with LeafV2CommandExec with SupportsV1Write with GpuExec

    GPU version of V1FallbackWriters

  290. class HMBInputFile extends InputFile
  291. class HMBSeekableInputStream extends SeekableInputStream with HostMemoryInputStreamMixIn

    A Parquet-compatible stream that allows reading Parquet data from a HostMemoryBuffer. The majority of the code here was copied from Parquet's DelegatingSeekableInputStream, with minor modifications to make it Scala and to call into the HostMemoryInputStreamMixIn's state.

  292. trait HasCustomTaggingData extends AnyRef
  293. final class HashedPriorityQueue[T] extends AbstractQueue[T]

    Implements a priority queue based on a heap. Like many priority queue implementations, this provides logarithmic time for inserting elements and removing the top element. However, unlike many implementations, it provides logarithmic rather than linear time for the random-access contains and remove methods. The queue also provides a mechanism for updating the heap after an element's priority has changed, via the priorityUpdated method, instead of requiring the element to be removed and re-inserted.

    The queue is NOT thread-safe.

    The iterator does NOT return elements in priority order.
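    The logarithmic contains/remove behavior is commonly achieved by pairing the heap array with a hash map from element to heap index. A minimal sketch under that assumption; HashedHeap is a hypothetical name and a min-heap over an Int priority, not the actual HashedPriorityQueue source.

```scala
import scala.collection.mutable

// A binary min-heap plus a position map, giving O(1) contains and
// O(log n) remove for arbitrary elements (assumes distinct elements).
final class HashedHeap[T] {
  private val heap = mutable.ArrayBuffer.empty[(Int, T)]
  private val pos  = mutable.HashMap.empty[T, Int]

  def size: Int = heap.size
  def contains(x: T): Boolean = pos.contains(x) // O(1) via the map

  def offer(priority: Int, x: T): Unit = {
    heap += ((priority, x)); pos(x) = heap.size - 1
    siftUp(heap.size - 1)
  }

  def poll(): T = { val top = heap(0)._2; removeAt(0); top }

  def remove(x: T): Boolean = pos.get(x) match { // O(log n)
    case Some(i) => removeAt(i); true
    case None    => false
  }

  private def removeAt(i: Int): Unit = {
    val last = heap.size - 1
    swap(i, last)
    pos -= heap(last)._2
    heap.remove(last)
    if (i < heap.size) { siftDown(i); siftUp(i) }
  }

  private def swap(i: Int, j: Int): Unit = {
    val t = heap(i); heap(i) = heap(j); heap(j) = t
    pos(heap(i)._2) = i; pos(heap(j)._2) = j
  }

  private def siftUp(i0: Int): Unit = {
    var i = i0
    while (i > 0 && heap((i - 1) / 2)._1 > heap(i)._1) {
      swap(i, (i - 1) / 2); i = (i - 1) / 2
    }
  }

  private def siftDown(i0: Int): Unit = {
    var i = i0
    var done = false
    while (!done) {
      val l = 2 * i + 1; val r = 2 * i + 2
      var m = i
      if (l < heap.size && heap(l)._1 < heap(m)._1) m = l
      if (r < heap.size && heap(r)._1 < heap(m)._1) m = r
      if (m == i) done = true else { swap(i, m); i = m }
    }
  }
}
```

    A priority update can reuse the same trick: look up the element's index in the map, then sift it up or down from there, which is exactly the shortcut priorityUpdated offers over remove-and-reinsert.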

  294. case class Header(meta: Map[String, Array[Byte]], syncBuffer: Array[Byte]) extends Product with Serializable

    The header information of an Avro file.

  295. trait HiveProvider extends AnyRef

    The subclass of HiveProvider imports spark-hive classes. This file itself should not import spark-hive, because a ClassNotFoundException may be thrown if spark-hive does not exist at runtime. For details see: https://github.com/NVIDIA/spark-rapids/issues/5648

  296. class HostByteBufferIterator extends AbstractHostByteBufferIterator

    Create an iterator that will emit ByteBuffer instances sequentially to work around the 2GB ByteBuffer size limitation. This allows the entire address range of a >2GB host buffer to be covered by a sequence of ByteBuffer instances.

    NOTE: It is the caller's responsibility to ensure this iterator does not outlive the host buffer. The iterator DOES NOT increment the reference count of the host buffer to ensure it remains valid.
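    The chunking itself is simple offset arithmetic: cover the buffer's address range with slices no larger than the ByteBuffer limit. A small sketch of that arithmetic; ChunkedRanges is a hypothetical helper that computes offsets only and touches no native memory.

```scala
// Compute (offset, length) pairs covering [0, totalSize), each no
// larger than maxChunk (Int.MaxValue for real ByteBuffers), so a
// >2GB host buffer can be exposed as a sequence of ByteBuffers.
object ChunkedRanges {
  def chunks(totalSize: Long, maxChunk: Long): Seq[(Long, Long)] = {
    require(maxChunk > 0)
    Iterator
      .iterate(0L)(_ + maxChunk)
      .takeWhile(_ < totalSize)
      .map(off => (off, math.min(maxChunk, totalSize - off)))
      .toSeq
  }
}
```

    Each pair would then back one ByteBuffer slice over the host buffer, with the final chunk carrying whatever remainder is left.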

    returns: ByteBuffer iterator

  297. case class HostColumnarToGpu(child: SparkPlan, goal: CoalesceSizeGoal) extends SparkPlan with ShimUnaryExecNode with GpuExec with Product with Serializable

    Put columnar formatted data on the GPU.

  298. class HostLineBufferer extends LineBufferer

    Buffer the lines in a single HostMemoryBuffer with the separator inserted between each of the lines.

  299. trait HostMemoryBuffersWithMetaDataBase extends AnyRef

    The base HostMemoryBuffer information read from a single file.

  300. class HostMemoryInputStream extends InputStream with HostMemoryInputStreamMixIn

    An implementation of InputStream that reads from a HostMemoryBuffer.

    NOTE: Closing this input stream does NOT close the buffer!

  301. trait HostMemoryInputStreamMixIn extends InputStream
  302. class HostMemoryOutputStream extends OutputStream

    An implementation of OutputStream that writes to a HostMemoryBuffer.

    NOTE: Closing this output stream does NOT close the buffer!

  303. class HostQueueBatchIterator extends GpuColumnarBatchIterator

    Iterator that produces SerializedTableColumn batches from a queue of spillable host memory batches that were fetched first during probing and the (possibly empty) remaining iterator of un-probed host memory batches. The iterator returns the queue elements first, followed by the elements of the remaining iterator.

  304. class HostShuffleCoalesceIterator extends Iterator[HostConcatResult] with AutoCloseable

    Iterator that coalesces columnar batches that are expected to only contain SerializedTableColumn. The serialized tables within are collected up to the target batch size and then concatenated on the host before handing them to the caller on .next()

  305. class HostStringColBufferer extends LineBufferer

    Buffer the lines as a HostColumnVector of strings, one per line.

  306. class HostToGpuCoalesceIterator extends AbstractGpuCoalesceIterator

    This iterator builds GPU batches from host batches. The host batches potentially use Spark's UnsafeRow so it is not safe to cache these batches. Rows must be read and immediately written to CuDF builders.

  307. abstract class ImperativeAggExprMeta[INPUT <: ImperativeAggregate] extends AggExprMeta[INPUT]

    Base class for metadata around ImperativeAggregate.

  308. case class InputCheck(cudf: TypeSig, spark: TypeSig, notes: List[String] = List.empty) extends Product with Serializable

    Checks a set of named inputs to an SparkPlan node against a TypeSig

  309. final class InsertIntoHadoopFsRelationCommandMeta extends DataWritingCommandMeta[InsertIntoHadoopFsRelationCommand]
  310. class InternalExclusiveModeGpuDiscoveryPlugin extends ResourceDiscoveryPlugin with Logging

    Note, this class should not be referenced directly in source code. It should be loaded by reflection using ShimLoader.newInstanceOf, see ./docs/dev/shims.md

    Attributes
    protected
  311. abstract class InternalRowToColumnarBatchIterator extends Iterator[ColumnarBatch]

    This class converts InternalRow instances to ColumnarBatches on the GPU through the magic of code generation. It provides most of the framework; a concrete implementation is generated based on the schema. The InternalRow instances are first converted to UnsafeRow (cheaply, if the instance is already an UnsafeRow), and then the UnsafeRow data is collected into a ColumnarBatch.

  312. trait JoinGatherer extends LazySpillable

    Generic trait for all join gather instances. A JoinGatherer takes the gather maps that are the result of a cudf join call along with the data batches that need to be gathered and allow someone to materialize the join in batches. It also provides APIs to help decide on how many rows to gather.

    This is a LazySpillable instance so the life cycle follows that too.

  313. class JoinGathererImpl extends JoinGatherer

    JoinGatherer for a single map/table

  314. class JoinGathererSameTable extends JoinGatherer

    JoinGatherer for the case where the gather produces the same table as the input table.

  315. class JoinPartition extends AutoCloseable

    Tracks a collection of batches associated with a partition in a large join

  316. abstract class JoinPartitioner extends AutoCloseable

    Base class for a partitioner in a large join.

  317. class JustRowsColumnarBatch extends SpillableColumnarBatch

    Cudf does not support a table with columns and no rows. This takes care of making one of those spillable, even though in reality there is no backing buffer. It does this by just keeping the row count in memory, and not dealing with the catalog at all.

  318. class JustRowsHostColumnarBatch extends SpillableColumnarBatch
  319. trait LazySpillable extends AutoCloseable with Retryable

    Holds something that can be spilled if it is marked as such, but it does not modify the data until it is ready to be spilled. This avoids the performance penalty of reformatting the underlying data before it actually needs to be spilled.

    Call allowSpilling to indicate that the data can be released for spilling, and call close to indicate that the data is no longer needed.

    If the data is needed after allowSpilling is called, implementations should get the data back and cache it again until allowSpilling is called once more.

  320. trait LazySpillableColumnarBatch extends LazySpillable

    Holds a Columnar batch that is LazySpillable.

  321. class LazySpillableColumnarBatchImpl extends LazySpillableColumnarBatch

    Holds a columnar batch that is cached until it is marked that it can be spilled.

  322. trait LazySpillableGatherMap extends LazySpillable
  323. class LazySpillableGatherMapImpl extends LazySpillableGatherMap

    Holds a gather map that is also lazy spillable.

  324. class LeftCrossGatherMap extends BaseCrossJoinGatherMap
  325. trait LineBufferer extends AutoCloseable

    We read text files in one line at a time from Spark. This provides an abstraction in how those lines are buffered before being processed on the GPU.

  326. trait LineBuffererFactory[BUFF <: LineBufferer] extends AnyRef

    Factory to create a LineBufferer instance that can be used to buffer lines being read in.

  327. class LiteralExprMeta extends ExprMeta[Literal]
  328. final class LocalGpuMetric extends GpuMetric

    A GPU metric class that just accumulates into a variable without implicit publishing.

  329. class MemoryBufferToHostByteBufferIterator extends AbstractHostByteBufferIterator

    Create an iterator that will emit ByteBuffer instances sequentially to work around the 2GB ByteBuffer size limitation after copying a MemoryBuffer (which is likely a DeviceMemoryBuffer) to a host-backed bounce buffer that is likely smaller than 2GB.

    returns: ByteBuffer iterator

    Note

    It is the caller's responsibility to ensure this iterator does not outlive memoryBuffer. The iterator DOES NOT increment the reference count of memoryBuffer to ensure it remains valid.

  330. sealed trait MemoryState extends AnyRef
  331. class MetricRange extends AutoCloseable
  332. class MetricsBatchIterator extends Iterator[ColumnarBatch]
  333. sealed class MetricsLevel extends Serializable
  334. class MultiFileCloudOrcPartitionReader extends MultiFileCloudPartitionReaderBase with MultiFileReaderFunctions with OrcPartitionReaderBase

    A PartitionReader that can read multiple ORC files in parallel. This is most efficient running in a cloud environment where the I/O of reading is slow.

    Efficiently reading an ORC split on the GPU requires re-constructing the ORC file in memory so that it contains just the Stripes that are needed. This avoids sending unnecessary data to the GPU and saves GPU memory.

  335. class MultiFileCloudParquetPartitionReader extends MultiFileCloudPartitionReaderBase with ParquetPartitionReaderBase

    A PartitionReader that can read multiple Parquet files in parallel. This is most efficient running in a cloud environment where the I/O of reading is slow.

    Efficiently reading a Parquet split on the GPU requires re-constructing the Parquet file in memory so that it contains just the column chunks that are needed. This avoids sending unnecessary data to the GPU and saves GPU memory.

  336. abstract class MultiFileCloudPartitionReaderBase extends FilePartitionReaderBase

    The abstract multi-file cloud reading framework.

    The data flow: next() -> if (first time) initAndStartReaders -> submit tasks (getBatchRunner) -> wait for the tasks to finish sequentially -> decode on the GPU (readBatch)

  337. abstract class MultiFileCoalescingPartitionReaderBase extends FilePartitionReaderBase with MultiFileReaderFunctions

    The abstract multi-file coalescing reading class, which tries to coalesce small ColumnarBatches into a bigger ColumnarBatch according to maxReadBatchSizeRows, maxReadBatchSizeBytes and the checkIfNeedToSplitDataBlock.

    Please note, this class applies to file formats with a layout similar to:

    | HEADER | -> optional

    | block | -> repeated

    | FOOTER | -> optional

    The data flow:

    next() -> populateCurrentBlockChunk (try the best to coalesce ColumnarBatches) -> allocate a bigger HostMemoryBuffer for HEADER + the populated block chunks + FOOTER -> write header to HostMemoryBuffer -> launch tasks to copy the blocks to the HostMemoryBuffer -> wait for all tasks to finish -> write footer to HostMemoryBuffer -> decode the HostMemoryBuffer on the GPU
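    The buffer-assembly step of the flow above (HEADER, then the selected block chunks, then FOOTER written into one allocation) can be sketched with byte arrays standing in for HostMemoryBuffer; CoalesceSketch and its names are illustrative only.

```scala
// Concatenate header, coalesced block chunks, and footer into one
// contiguous buffer, the layout that is then decoded on the GPU.
object CoalesceSketch {
  def assemble(header: Array[Byte],
               blocks: Seq[Array[Byte]],
               footer: Array[Byte]): Array[Byte] = {
    val total = header.length + blocks.map(_.length).sum + footer.length
    val out = new Array[Byte](total)
    var off = 0
    for (part <- (header +: blocks) :+ footer) {
      System.arraycopy(part, 0, out, off, part.length)
      off += part.length
    }
    out
  }
}
```

    In the real reader the block copies happen in parallel tasks, which is possible because each block's destination offset is known up front, exactly as in this sequential sketch.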

  338. class MultiFileOrcPartitionReader extends MultiFileCoalescingPartitionReaderBase with OrcCommonFunctions

  339. class MultiFileParquetPartitionReader extends MultiFileCoalescingPartitionReaderBase with ParquetPartitionReaderBase

    A PartitionReader that can read multiple Parquet files up to a certain size. It will coalesce small files together and copy the block data in a separate thread pool to speed up processing the small files before sending them down to the GPU.

    Efficiently reading a Parquet split on the GPU requires re-constructing the Parquet file in memory so that it contains just the column chunks that are needed. This avoids sending unnecessary data to the GPU and saves GPU memory.

  340. abstract class MultiFilePartitionReaderFactoryBase extends PartitionReaderFactory with Logging

    The base multi-file partition reader factory to create the cloud reading or coalescing reading respectively.

  341. trait MultiFileReaderFunctions extends AnyRef
  342. case class MultiJoinGather(left: JoinGatherer, right: JoinGatherer) extends JoinGatherer with Product with Serializable

    Join Gatherer for a left table and a right table

  343. case class MutableBlockInfo(blockSize: Long, dataSize: Long, count: Long) extends Product with Serializable

    The mutable version of the BlockInfo without block start. This is for reusing an existing instance when accessing data in the iterator pattern.

    blockSize: the whole block size (the size between two sync buffers + sync buffer size)
    dataSize: the data size in this block
    count: how many entries in this block

  344. final class NoRuleDataFromReplacementRule extends DataFromReplacementRule

    A version of DataFromReplacementRule that is used when no replacement rule can be found.

  345. class NullFilteredBatchIterator extends Iterator[ColumnarBatch] with AutoCloseable

    Iterator that filters out rows with null keys.

  346. final class NullHostMemoryOutputStream extends HostMemoryOutputStream

    A HostMemoryOutputStream that only counts the written bytes; nothing is actually written.

  347. class NvcompLZ4CompressionCodec extends TableCompressionCodec

    A table compression codec that uses nvcomp's LZ4-GPU codec

  348. class NvcompZSTDCompressionCodec extends TableCompressionCodec

    A table compression codec that uses nvcomp's ZSTD-GPU codec

  349. class NvtxWithMetrics extends NvtxRange

    An NvtxRange with the option to pass one or more nano-timing metrics that are updated upon close by the amount of time spent in the range.

  350. case class OomInjectionConf(numOoms: Int, skipCount: Int, withSplit: Boolean, oomInjectionFilter: OomInjectionType) extends Product with Serializable
  351. sealed abstract class Optimization extends AnyRef
  352. trait Optimizer extends AnyRef

    Optimizer that can operate on a physical query plan.

  353. class OptionalConfEntry[T] extends ConfEntry[Option[T]]
  354. case class OrcBlockMetaForSplitCheck(filePath: Path, typeDescription: TypeDescription, compressionKind: CompressionKind, requestedMapping: Option[Array[Int]]) extends Product with Serializable
  355. trait OrcCodecWritingHelper extends AnyRef
  356. trait OrcCommonFunctions extends OrcCodecWritingHelper

    A collection of common functions for ORC.

  357. case class OrcExtraInfo(requestedMapping: Option[Array[Int]]) extends ExtraInfo with Product with Serializable

    ORC extra information containing the requested column ids for the current coalescing stripes.

  358. case class OrcOutputStripe(infoBuilder: Builder, footer: StripeFooter, inputDataRanges: DiskRangeList) extends Product with Serializable

    This class describes a stripe that will appear in the ORC output memory file.

    infoBuilder

    builder for output stripe info that has been populated with all fields except those that can only be known when the file is being written (e.g.: file offset, compressed footer length)

    footer

    stripe footer

    inputDataRanges

    input file ranges (based at file offset 0) of stripe data

  359. trait OrcPartitionReaderBase extends OrcCommonFunctions with Logging with ScanWithMetrics

    A base ORC partition reader that provides some common methods.

  360. case class OrcPartitionReaderContext(filePath: Path, conf: Configuration, fileSchema: TypeDescription, updatedReadSchema: TypeDescription, evolution: SchemaEvolution, fileTail: FileTail, compressionSize: Int, compressionKind: CompressionKind, readerOpts: Options, blockIterator: BufferedIterator[OrcOutputStripe], requestedMapping: Option[Array[Int]]) extends Product with Serializable

    This class holds fields needed to read and iterate over the OrcFile.

    filePath

    ORC file path

    conf

    the Hadoop configuration

    fileSchema

    the schema of the whole ORC file

    updatedReadSchema

    read schema mapped to the file's field names

    evolution

    infer and track the evolution between the schema as stored in the file and the schema that has been requested by the reader.

    fileTail

    the ORC FileTail

    compressionSize

    the ORC compression size

    compressionKind

    the ORC compression type

    readerOpts

    options for creating a RecordReader.

    blockIterator

    an iterator over the ORC output stripes

    requestedMapping

    the optional requested column ids

  361. case class OrcStripeWithMeta(stripe: OrcOutputStripe, ctx: OrcPartitionReaderContext) extends OrcCodecWritingHelper with Product with Serializable
  362. case class OrcTableReader(conf: Configuration, chunkSizeByteLimit: Long, maxChunkedReaderMemoryUsageSizeBytes: Long, parseOpts: ORCOptions, buffer: HostMemoryBuffer, offset: Long, bufferSize: Long, metrics: Map[String, GpuMetric], isSchemaCaseSensitive: Boolean, readDataSchema: StructType, tableSchema: TypeDescription, splits: Array[PartitionedFile], debugDumpPrefix: Option[String], debugDumpAlways: Boolean) extends GpuDataProducer[Table] with Logging with Product with Serializable
  363. case class OutOfCoreBatch(buffer: SpillableColumnarBatch, firstRow: UnsafeRow) extends AutoCloseable with Product with Serializable

    Holds data for the out of core sort. It includes the batch of data and the first row in that batch so we can sort the batches.

  364. class OverwriteByExpressionExecV1Meta extends SparkPlanMeta[OverwriteByExpressionExecV1] with HasCustomTaggingData
  365. case class ParamCheck(name: String, cudf: TypeSig, spark: TypeSig) extends Product with Serializable

    Checks a single parameter by position against a TypeSig

  366. case class ParquetCachedBatch(numRows: Int, buffer: Array[Byte]) extends CachedBatch with Product with Serializable
  367. class ParquetCachedBatchSerializer extends GpuCachedBatchSerializer

    This class assumes the data is Columnar and the plugin is on. Note, this class should not be referenced directly in source code. It should be loaded by reflection using ShimLoader.newInstanceOf, see ./docs/dev/shims.md

    Attributes
    protected
  368. class ParquetDumper extends HostBufferConsumer with AutoCloseable
  369. class ParquetExtraInfo extends ExtraInfo

    Parquet extra information containing rebase modes and whether there is int96 timestamp

  370. case class ParquetFileInfoWithBlockMeta(filePath: Path, blocks: Seq[BlockMetaData], partValues: InternalRow, schema: MessageType, readSchema: StructType, dateRebaseMode: DateTimeRebaseMode, timestampRebaseMode: DateTimeRebaseMode, hasInt96Timestamps: Boolean) extends Product with Serializable
  371. class ParquetPartitionReader extends FilePartitionReaderBase with ParquetPartitionReaderBase

    A PartitionReader that reads a Parquet file split on the GPU.

    Efficiently reading a Parquet split on the GPU requires re-constructing the Parquet file in memory that contains just the column chunks that are needed. This avoids sending unnecessary data to the GPU and saves GPU memory.

  372. trait ParquetPartitionReaderBase extends Logging with ScanWithMetrics with MultiFileReaderFunctions
  373. case class ParquetTableReader(conf: Configuration, chunkSizeByteLimit: Long, maxChunkedReaderMemoryUsageSizeBytes: Long, opts: ParquetOptions, buffer: HostMemoryBuffer, offset: Long, len: Long, metrics: Map[String, GpuMetric], dateRebaseMode: DateTimeRebaseMode, timestampRebaseMode: DateTimeRebaseMode, hasInt96Timestamps: Boolean, isSchemaCaseSensitive: Boolean, useFieldId: Boolean, readDataSchema: StructType, clippedParquetSchema: MessageType, splits: Array[PartitionedFile], debugDumpPrefix: Option[String], debugDumpAlways: Boolean) extends GpuDataProducer[Table] with Logging with Product with Serializable
  374. abstract class PartChecks extends TypeChecks[Map[String, SupportLevel]]

    Base class all Partition checks must follow

  375. case class PartChecksImpl(paramCheck: Seq[ParamCheck] = Seq.empty, repeatingParamCheck: Option[RepeatingParamCheck] = None) extends PartChecks with Product with Serializable
  376. abstract class PartMeta[INPUT <: Partitioning] extends RapidsMeta[INPUT, Partitioning, GpuPartitioning]

    Base class for metadata around Partitioning.

  377. class PartRule[INPUT <: Partitioning] extends ReplacementRule[INPUT, Partitioning, PartMeta[INPUT]]

    Holds everything that is needed to replace a Partitioning with a GPU enabled version.

  378. class PartiallySupported extends SupportLevel

    The plugin partially supports this type.

  379. class PartitionIterator[T] extends Iterator[T]
  380. class PartitionReaderIterator extends Iterator[ColumnarBatch] with AutoCloseable

    An adaptor class that provides an Iterator interface for a PartitionReader.

  381. class PartitionReaderWithBytesRead extends PartitionReader[ColumnarBatch]

    Wraps a columnar PartitionReader to update bytes read metric based on filesystem statistics.

  382. case class PartitionRowData(rowValue: InternalRow, rowNum: Int) extends Product with Serializable

    Wrapper class that specifies how many rows to replicate the partition value.

  383. case class PartitionedFileInfoOptAlluxio(toRead: PartitionedFile, original: Option[PartitionedFile]) extends Product with Serializable
  384. sealed trait PathInstruction extends AnyRef
  385. class Pending extends AutoCloseable

    Data that the out of core sort algorithm has not finished sorting. This acts as a priority queue with each batch sorted by the first row in that batch.

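    The priority-queue behavior described for Pending can be sketched in plain Scala. Note this is a hypothetical simplification for illustration, not the plugin's actual types: SortedBatch is a stand-in that tracks only a batch's first key.

    ```scala
    import scala.collection.mutable

    // Hypothetical stand-in for a sorted batch: we track only its first key.
    case class SortedBatch(firstKey: Int, rows: Seq[Int])

    // Order the queue so the batch with the smallest first row is dequeued first
    // (PriorityQueue is a max-heap, so reverse the natural ordering).
    val pending = mutable.PriorityQueue.empty[SortedBatch](
      Ordering.by[SortedBatch, Int](_.firstKey).reverse)

    pending.enqueue(SortedBatch(10, Seq(10, 12)))
    pending.enqueue(SortedBatch(1, Seq(1, 3)))
    pending.enqueue(SortedBatch(5, Seq(5, 7)))

    // Batches come back out ordered by their first row: 1, 5, 10.
    assert(pending.dequeue().firstKey == 1)
    assert(pending.dequeue().firstKey == 5)
    assert(pending.dequeue().firstKey == 10)
    ```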
  386. trait PlanShims extends AnyRef
  387. class PluginException extends RuntimeException
  388. class PreProjectSplitIterator extends AbstractProjectSplitIterator

    An iterator that can be used to split the input of a project before it happens to prevent situations where the output could not be split later on. In testing we tried to see what would happen if we split it to the target batch size, but there was a very significant performance degradation when that happened. For now this is only used in a few specific places and not everywhere. In the future this could be extended, but if we do that there are some places where we don't want a split, like a project before a window operation.

  389. class PrioritySemaphore[T] extends AnyRef
  390. case class ProfileEndMsg(executorId: String, path: String) extends ProfileMsg with Product with Serializable
  391. case class ProfileInitMsg(executorId: String, path: String) extends ProfileMsg with Product with Serializable
  392. case class ProfileJobStageQueryMsg(activeJobs: Array[Int], activeStages: Array[Int]) extends ProfileMsg with Product with Serializable
  393. trait ProfileMsg extends AnyRef
  394. case class ProfileStatusMsg(executorId: String, msg: String) extends ProfileMsg with Product with Serializable
  395. class ProfileWriter extends DataWriter with Logging
  396. sealed case class QuantifierFixedLength(length: Int) extends RegexQuantifier with Product with Serializable
  397. sealed case class QuantifierVariableLength(minLength: Int, maxLength: Option[Int]) extends RegexQuantifier with Product with Serializable
  398. abstract class QuaternaryExprMeta[INPUT <: QuaternaryExpression] extends ExprMeta[INPUT]

    Base class for metadata around QuaternaryExpression.

  399. class RangeConfMatcher extends AnyRef

    Determines if a value is in a comma-separated list of values and/or hyphenated ranges provided by the user for a configuration setting.

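    The matching behavior that RangeConfMatcher describes can be illustrated with a minimal sketch. This parser is a hypothetical simplification for illustration only, not the plugin's implementation:

    ```scala
    // Minimal sketch of matching a value against "0-2,5,7-9"-style config strings.
    def parseRanges(conf: String): Seq[(Int, Int)] =
      conf.split(',').map(_.trim).filter(_.nonEmpty).map { part =>
        part.split('-') match {
          case Array(single)  => (single.toInt, single.toInt) // single value
          case Array(lo, hi)  => (lo.toInt, hi.toInt)         // hyphenated range
        }
      }.toSeq

    def containsValue(conf: String, value: Int): Boolean =
      parseRanges(conf).exists { case (lo, hi) => value >= lo && value <= hi }

    assert(containsValue("0-2,5,7-9", 1))   // inside the 0-2 range
    assert(containsValue("0-2,5,7-9", 5))   // exact single value
    assert(!containsValue("0-2,5,7-9", 6))  // in a gap between ranges
    ```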
  400. trait RapidsBuffer extends AutoCloseable

    Interface provided by all types of RAPIDS buffers

  401. class RapidsBufferCatalog extends AutoCloseable with Logging

    Catalog for lookup of buffers by ID. The constructor is only visible for testing, generally RapidsBufferCatalog.singleton should be used instead.

  402. trait RapidsBufferChannelWritable extends AnyRef
  403. class RapidsBufferCopyIterator extends Iterator[MemoryBuffer] with AutoCloseable with Logging

    This iterator encapsulates a buffer's internal MemoryBuffer access for spill reasons. Internally, there are two known implementations:

    - either this is a "single shot" copy, where the entirety of the RapidsBuffer is already represented as a single contiguous blob of memory, then the expectation is that this iterator is exhausted with a single call to next
    - or, we have a RapidsBuffer that isn't contiguous. This iteration will then drive a ChunkedPacker to pack the RapidsBuffer's table as needed. The iterator will likely need several calls to next to be exhausted.

  404. trait RapidsBufferHandle extends AutoCloseable

    An object that client code uses to interact with an underlying RapidsBufferId.

    A handle is obtained when a buffer, batch, or table is added to the spill framework via the RapidsBufferCatalog api.

  405. trait RapidsBufferId extends AnyRef

    An identifier for a RAPIDS buffer that can be automatically spilled between buffer stores. NOTE: Derived classes MUST implement proper hashCode and equals methods, as these objects are used as keys in hash maps. Scala case classes are recommended.

  406. abstract class RapidsBufferStore extends AutoCloseable with Logging

    Base class for all buffer store types.

  407. abstract class RapidsBufferStoreWithoutSpill extends RapidsBufferStore

    Buffers that inherit from this type do not support changing the spillable status of a RapidsBuffer. This is only used right now for disk.

  408. class RapidsConf extends Logging
  409. class RapidsDeviceMemoryStore extends RapidsBufferStore

    Buffer storage using device memory.

  410. class RapidsDiskStore extends RapidsBufferStoreWithoutSpill

    A buffer store using files on the local disks.

  411. class RapidsDriverPlugin extends DriverPlugin with Logging

    The Spark driver plugin provided by the RAPIDS Spark plugin.

  412. class RapidsExecutorPlugin extends ExecutorPlugin with Logging

    The Spark executor plugin provided by the RAPIDS Spark plugin.

  413. trait RapidsHostBatchBuffer extends AutoCloseable
  414. final class RapidsHostColumnBuilder extends AutoCloseable

    This is a copy of the cudf HostColumnVector.ColumnBuilder class. Moving this here to allow for iterating on host memory oom handling.

  415. final class RapidsHostColumnVector extends RapidsHostColumnVectorCore

    A GPU accelerated version of the Spark ColumnVector. Most of the standard Spark APIs should never be called, as they assume that the data is on the host, and we want to keep as much of the data on the device as possible. We also provide GPU accelerated versions of the transitions to and from rows.

  416. class RapidsHostColumnVectorCore extends ColumnVector

    A GPU accelerated version of the Spark ColumnVector. Most of the standard Spark APIs should never be called, as they assume that the data is on the host, and we want to keep as much of the data on the device as possible. We also provide GPU accelerated versions of the transitions to and from rows.

  417. class RapidsHostMemoryStore extends RapidsBufferStore

    A buffer store using host memory.

  418. abstract class RapidsMeta[INPUT <: BASE, BASE, OUTPUT <: BASE] extends AnyRef

    Holds metadata about a stage in the physical plan that is separate from the plan itself. This is helpful in deciding when to replace part of the plan with a GPU enabled version.

    INPUT

    the exact type of the class we are wrapping.

    BASE

    the generic base class for this type of stage, i.e. SparkPlan, Expression, etc.

    OUTPUT

    when converting to a GPU enabled version of the plan, the generic base type for all GPU enabled versions.

  419. final class RapidsNullSafeHostColumnVector extends RapidsNullSafeHostColumnVectorCore

    Wrapper of a RapidsHostColumnVector, which will check nulls in each "getXXX" call and return the default value of a type when trying to read a null. The performance may not be good enough, so use it only when there is no other way.

  420. class RapidsNullSafeHostColumnVectorCore extends ColumnVector

    Wrapper of a RapidsHostColumnVectorCore, which will check nulls in each "getXXX" call and return the default value of a type when trying to read a null. The performance may not be good enough, so use it only when there is no other way.

  421. class RapidsSerializerManager extends AnyRef

    It's a wrapper of Spark's SerializerManager, which supports compression and encryption on data streams. Compression is turned on/off via separate RAPIDS configurations, and the underlying compression codec uses Spark's existing codecs. Encryption is controlled by Spark's configuration.

  422. class RapidsShuffleHeartbeatEndpoint extends Logging with AutoCloseable
  423. class RapidsShuffleHeartbeatManager extends Logging
  424. class RapidsStack[T] extends AnyRef
  425. sealed trait RegexAST extends AnyRef
  426. sealed case class RegexBackref(num: Int, isNew: Boolean = false) extends RegexAST with Product with Serializable
  427. sealed case class RegexChar(ch: Char) extends RegexCharacterClassComponent with Product with Serializable
  428. sealed case class RegexCharacterClass(negated: Boolean, characters: ListBuffer[RegexCharacterClassComponent]) extends RegexAST with Product with Serializable
  429. sealed trait RegexCharacterClassComponent extends RegexAST
  430. sealed case class RegexCharacterRange(start: RegexCharacterClassComponent, end: RegexCharacterClassComponent) extends RegexCharacterClassComponent with Product with Serializable
  431. sealed case class RegexChoice(a: RegexAST, b: RegexAST) extends RegexAST with Product with Serializable
  432. sealed case class RegexEmpty() extends RegexAST with Product with Serializable
  433. sealed case class RegexEscaped(a: Char) extends RegexCharacterClassComponent with Product with Serializable
  434. sealed case class RegexGroup(capture: Boolean, term: RegexAST, lookahead: Option[RegexLookahead]) extends RegexAST with Product with Serializable
  435. sealed case class RegexHexDigit(a: String) extends RegexCharacterClassComponent with Product with Serializable
  436. sealed trait RegexLookahead extends AnyRef
  437. sealed trait RegexMode extends AnyRef
  438. sealed case class RegexOctalChar(a: String) extends RegexCharacterClassComponent with Product with Serializable
  439. sealed trait RegexOptimizationType extends AnyRef
  440. class RegexParser extends AnyRef

    Regular expression parser based on a Pratt Parser design.

    The goal of this parser is to build a minimal AST that allows us to validate that we can support the expression on the GPU. The goal is not to parse with the level of detail that would be required if we were building an evaluation engine. For example, operator precedence is largely ignored but could be added if we need it later.

    The Java and cuDF regular expression documentation has been used as a reference:

    Java regex: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
    cuDF regex: https://docs.rapids.ai/api/libcudf/stable/md_regex.html

    The following blog posts provide some background on Pratt Parsers and parsing regex.

    - https://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/
    - https://matt.might.net/articles/parsing-regex-with-recursive-descent/

  441. sealed trait RegexQuantifier extends RegexAST
  442. sealed case class RegexRepetition(a: RegexAST, quantifier: RegexQuantifier) extends RegexAST with Product with Serializable
  443. sealed case class RegexReplacement(parts: ListBuffer[RegexAST], numCaptureGroups: Int = 0) extends RegexAST with Product with Serializable
  444. sealed class RegexRewriteFlags extends AnyRef
  445. sealed case class RegexSequence(parts: ListBuffer[RegexAST]) extends RegexAST with Product with Serializable
  446. class RegexUnsupportedException extends SQLException
  447. case class RepeatingParamCheck(name: String, cudf: TypeSig, spark: TypeSig) extends Product with Serializable

    Checks the type signature for a parameter that repeats (can only be used at the end of a list of positional parameters)

  448. case class ReplaceSection[INPUT <: SparkPlan](plan: SparkPlanMeta[INPUT], totalCpuCost: Double, totalGpuCost: Double) extends Optimization with Product with Serializable
  449. abstract class ReplacementRule[INPUT <: BASE, BASE, WRAP_TYPE <: RapidsMeta[INPUT, BASE, _]] extends DataFromReplacementRule

    Base class for all ReplacementRules

    INPUT

    the exact type of the class we are wrapping.

    BASE

    the generic base class for this type of stage, i.e. SparkPlan, Expression, etc.

    WRAP_TYPE

    base class that should be returned by doWrap.

  450. abstract class ReplicateRowsExprMeta[INPUT <: ReplicateRows] extends GeneratorExprMeta[INPUT]

    Base class for metadata around ReplicateRows.

  451. trait RequireSingleBatchLike extends AnyRef

    Trait used for pattern matching for single batch coalesce goals.

  452. case class RequireSingleBatchWithFilter(filterExpression: GpuExpression) extends CoalesceSizeGoal with RequireSingleBatchLike with Product with Serializable

    This is exactly the same as RequireSingleBatch except that if the batch would fail to coalesce because it reaches cuDF row-count limits, the coalesce code is free to null filter given the filter expression in filterExpression.

    Note

    This is an ugly hack because ideally these rows are never read from the input source given that we normally push down IsNotNull in Spark. This should be removed when we can handle this in a proper way, likely at the logical plan optimization level. More details here: https://issues.apache.org/jira/browse/SPARK-39131

  453. class RightCrossGatherMap extends BaseCrossJoinGatherMap
  454. class RowToColumnarIterator extends Iterator[ColumnarBatch]
  455. final class RuleNotFoundCreatableRelationProviderMeta[INPUT <: CreatableRelationProvider] extends CreatableRelationProviderMeta[INPUT]
  456. final class RuleNotFoundDataWritingCommandMeta[INPUT <: DataWritingCommand] extends DataWritingCommandMeta[INPUT]

    Metadata for DataWritingCommand with no rule found

  457. final class RuleNotFoundExprMeta[INPUT <: Expression] extends ExprMeta[INPUT]

    Metadata for Expression with no rule found

  458. final class RuleNotFoundPartMeta[INPUT <: Partitioning] extends PartMeta[INPUT]

    Metadata for Partitioning with no rule found

  459. final class RuleNotFoundRunnableCommandMeta[INPUT <: RunnableCommand] extends RunnableCommandMeta[INPUT]

    Metadata for RunnableCommand with no rule found

  460. final class RuleNotFoundScanMeta[INPUT <: Scan] extends ScanMeta[INPUT]

    Metadata for Scan with no rule found

  461. final class RuleNotFoundSparkPlanMeta[INPUT <: SparkPlan] extends SparkPlanMeta[INPUT]

    Metadata for SparkPlan with no rule found

  462. abstract class RunnableCommandMeta[INPUT <: RunnableCommand] extends RapidsMeta[INPUT, RunnableCommand, RunnableCommand]

    Base class for metadata around RunnableCommand.

  463. class RunnableCommandRule[INPUT <: RunnableCommand] extends ReplacementRule[INPUT, RunnableCommand, RunnableCommandMeta[INPUT]]

    Holds everything that is needed to replace a RunnableCommand with a GPU enabled version.

  464. abstract class RuntimeReplaceableUnaryAstExprMeta[INPUT <: RuntimeReplaceable] extends RuntimeReplaceableUnaryExprMeta[INPUT]

    Base metadata class for RuntimeReplaceable expressions that support conversion to AST as well

  465. abstract class RuntimeReplaceableUnaryExprMeta[INPUT <: RuntimeReplaceable] extends UnaryExprMetaBase[INPUT]

    Base class for metadata around RuntimeReplaceableExpression. We will never get a RuntimeReplaceableExpression as it will be converted to the actual Expression by the time we get it. We need to have this here as some Expressions e.g. UnaryPositive don't extend UnaryExpression.

  466. class SaveIntoDataSourceCommandMeta extends RunnableCommandMeta[SaveIntoDataSourceCommand]
  467. abstract class ScanMeta[INPUT <: Scan] extends RapidsMeta[INPUT, Scan, GpuScan]

    Base class for metadata around Scan.

  468. class ScanRule[INPUT <: Scan] extends ReplacementRule[INPUT, Scan, ScanMeta[INPUT]]

    Holds everything that is needed to replace a Scan with a GPU enabled version.

  469. trait ScanWithMetrics extends AnyRef
  470. trait SchemaBase extends AnyRef

    A common trait for the different schemas in the MultiFileCoalescingPartitionReaderBase.

    The sub-class should wrap the real schema for the specific file format

  471. class SerializedBatchIterator extends Iterator[(Int, ColumnarBatch)]
  472. class SerializedTableColumn extends GpuColumnVectorBase

    A special ColumnVector that describes a serialized table read from shuffle. This appears in a ColumnarBatch to pass serialized tables to GpuShuffleCoalesceExec which should always appear in the query plan immediately after a shuffle.

  473. trait ShimTaggingExpression extends UnaryExpression with TaggingExpression with ShimUnaryExpression
  474. class ShuffleBufferCatalog extends Logging

    Catalog for lookup of shuffle buffers by block ID

  475. case class ShuffleBufferId(blockId: ShuffleBlockId, tableId: Int) extends RapidsBufferId with Product with Serializable

    Identifier for a shuffle buffer that holds the data for a table

  476. class ShuffleReceivedBufferCatalog extends Logging

    Catalog for lookup of shuffle buffers by block ID

  477. case class ShuffleReceivedBufferId(tableId: Int) extends RapidsBufferId with Product with Serializable

    Identifier for a shuffle buffer that holds the data for a table on the read side

  478. sealed case class SimpleQuantifier(ch: Char) extends RegexQuantifier with Product with Serializable
  479. trait SingleDataBlockInfo extends AnyRef

    A single block's info for a file. E.g., a Parquet file with 3 RowGroups will produce 3 SingleBlockInfoWithMeta instances.

  480. class SingleGpuColumnarBatchIterator extends GpuColumnarBatchIterator
  481. class SingleGpuDataProducer[T <: AnyRef] extends GpuDataProducer[T]
  482. case class SingleHMBAndMeta(hmb: HostMemoryBuffer, bytes: Long, numRows: Long, blockMeta: Seq[DataBlockBase]) extends Product with Serializable

    This contains a single HostMemoryBuffer along with other metadata needed for combining the buffers before sending to GPU.

  483. class SlicedGpuColumnVector extends ColumnVector

    Wraps a GpuColumnVector but only points to a slice of it. This is intended to only be used during shuffle after the data is partitioned and before it is serialized.

  484. sealed trait SortExecType extends Serializable
  485. abstract class SparkPlanMeta[INPUT <: SparkPlan] extends RapidsMeta[INPUT, SparkPlan, GpuExec]

    Base class for metadata around SparkPlan.

  486. case class SparkRapidsBuildInfoEvent(sparkRapidsBuildInfo: Map[String, String], sparkRapidsJniBuildInfo: Map[String, String], cudfBuildInfo: Map[String, String], sparkRapidsPrivateBuildInfo: Map[String, String]) extends SparkListenerEvent with Product with Serializable
  487. trait SparkShims extends AnyRef
  488. trait SpillAction extends AnyRef

    Helper case classes that contain the buffer we spilled or unspilled from our current tier and likely a new buffer created in a target store tier, but it can be set to None. If the buffer already exists in the target store, newBuffer will be None.

  489. class SpillableBuffer extends AutoCloseable

    Just like a SpillableColumnarBatch but for buffers.

  490. trait SpillableColumnarBatch extends AutoCloseable

    Holds a ColumnarBatch whose backing buffers can be spilled.

  491. class SpillableColumnarBatchImpl extends SpillableColumnarBatch

    The implementation of SpillableColumnarBatch that points to buffers that can be spilled.

    Note

    the buffer should be in the cache by the time this is created and this is taking over ownership of the life cycle of the batch. So don't call this constructor directly; please use SpillableColumnarBatch.apply instead.

  492. class SpillableColumnarBatchQueueIterator extends GpuColumnarBatchIterator

    Iterator that produces columnar batches from a queue of spillable batches that were fetched first during probing and the (possibly empty) remaining iterator of un-probed batches. The iterator returns the queue elements first, followed by the elements of the remaining iterator.

  493. class SpillableHostBuffer extends AutoCloseable

    This represents a spillable HostMemoryBuffer and adds an interface to access this host buffer at the host layer, unlike SpillableBuffer (device)

  494. class SpillableHostColumnarBatchImpl extends SpillableColumnarBatch

    The implementation of SpillableHostColumnarBatch that points to buffers that can be spilled.

    Note

    the buffer should be in the cache by the time this is created and this is taking over ownership of the life cycle of the batch. So don't call this constructor directly; please use SpillableHostColumnarBatch.apply instead.

  495. class SpillableHostConcatResult extends AutoCloseable

    A spillable form of a HostConcatResult. Takes ownership of the specified host buffer.

  496. class SpillableHostConcatResultFromColumnarBatchIterator extends Iterator[SpillableHostConcatResult]

    Converts an iterator of shuffle batches in host memory into an iterator of spillable host memory batches.

  497. trait SplittableGoal extends AnyRef

    Trait used for pattern matching for goals that could be split, as they only specify that batches won't be too much bigger than a maximum target size in bytes.

  498. abstract class SplittableJoinIterator extends AbstractGpuJoinIterator with Logging

    Base class for join iterators that split and spill batches to avoid GPU OOM errors.

  499. class StrategyRules extends Strategy

    Provides a Strategy that can implement rules for translating custom logical plan nodes to physical plan nodes.

    Note

    This is instantiated via reflection from ShimLoader.

  500. class StreamSidePartitioner extends JoinPartitioner

    Join partitioner for the stream side of a large join.

  501. abstract class String2TrimExpressionMeta[INPUT <: String2TrimExpression] extends ExprMeta[INPUT]
  502. sealed abstract class SupportLevel extends AnyRef

    The level of support that the plugin has for a given type. Used for documentation generation.

  503. class Supported extends SupportLevel

    Both Spark and the plugin support this.

  504. trait TableCompressionCodec extends AnyRef

    An interface to a compression codec that can compress a contiguous Table on the GPU

  505. case class TableCompressionCodecConfig(lz4ChunkSize: Long, zstdChunkSize: Long) extends Product with Serializable

    A small case class used to carry codec-specific settings.

  506. case class TargetSize(targetSizeBytes: Long) extends CoalesceSizeGoal with SplittableGoal with Product with Serializable

    Produce a stream of batches that are at most the given size in bytes. The size is estimated in some cases so it may go over a little, but it should generally be very close to the target size. Generally you should not go over 2 GiB to avoid limitations in cudf for nested type columns.

    targetSizeBytes

    the size of each batch in bytes.
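
    Purely for illustration, a TargetSize goal can be pictured as grouping incoming batch sizes so that no group exceeds the target. The coalesceBySize helper below is hypothetical and operates on plain byte counts rather than real ColumnarBatch data:

```scala
// Hypothetical sketch: group a sequence of batch sizes (in bytes) so that
// each coalesced group stays at or under a target size. Real coalescing
// works on ColumnarBatch data and estimates sizes, so it may go over a little.
def coalesceBySize(sizes: Seq[Long], targetSizeBytes: Long): Seq[Seq[Long]] =
  sizes.foldLeft(Vector.empty[Vector[Long]]) { (groups, size) =>
    groups.lastOption match {
      case Some(last) if last.sum + size <= targetSizeBytes =>
        groups.init :+ (last :+ size) // fits: append to the current group
      case _ =>
        groups :+ Vector(size)        // start a new group
    }
  }
```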

  507. trait TaskAutoCloseableResource extends AutoCloseable
  508. trait TaskCompletionCallbackHandle extends AnyRef

    A handle that can be used to remove a callback if needed.

  509. abstract class TernaryExprMeta[INPUT <: TernaryExpression] extends ExprMeta[INPUT]

    Base class for metadata around TernaryExpression.

  510. class ThreadFactoryBuilder extends AnyRef
  511. class ToPrettyStringChecks extends CastChecks

    This class restricts the 'to' dataType to StringType in the CastChecks class.

  512. sealed trait TryAcquireResult extends AnyRef

    The result of trying to acquire a semaphore could be SemaphoreAcquired or AcquireFailed.
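
    Since the hierarchy is sealed, callers can pattern match exhaustively. A minimal standalone sketch, re-declaring simplified stand-ins for the real types:

```scala
// Simplified stand-ins for the plugin's types, just to show the match shape.
sealed trait TryAcquireResult
case object SemaphoreAcquired extends TryAcquireResult
case class AcquireFailed(numWaitingTasks: Int) extends TryAcquireResult

def describe(result: TryAcquireResult): String = result match {
  case SemaphoreAcquired      => "acquired without blocking"
  case AcquireFailed(waiting) => s"would block behind $waiting waiting task(s)"
}
```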

  513. abstract class TypeChecks[RET] extends AnyRef
  514. final class TypeSig extends AnyRef

    A type signature. This is a bit limited in what it supports right now, but can express a set of base types and a separate set of types that can be nested under the base types (child types). It can also express if a particular base type has to be a literal or not.
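
    As a hypothetical simplification (SimpleTypeSig is not the real API), the idea of separate base-type and child-type sets can be sketched as:

```scala
// Hypothetical sketch of the TypeSig idea: a set of supported base types and
// a separate set of types that may be nested under them as children.
case class SimpleTypeSig(baseTypes: Set[String], childTypes: Set[String]) {
  def supports(base: String, children: Seq[String] = Seq.empty): Boolean =
    baseTypes.contains(base) && children.forall(childTypes.contains)
}
```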

  515. trait TypeSigUtilBase extends AnyRef

    Trait of TypeSigUtil for different Spark versions.

  516. class TypedConfBuilder[T] extends AnyRef
  517. abstract class TypedImperativeAggExprMeta[INPUT <: TypedImperativeAggregate[_]] extends ImperativeAggExprMeta[INPUT]

    Base class for metadata around TypedImperativeAggregate.

  518. abstract class UnaryAstExprMeta[INPUT <: UnaryExpression] extends UnaryExprMeta[INPUT]

    Base metadata class for unary expressions that support conversion to AST as well.

  519. abstract class UnaryExprMeta[INPUT <: Expression with UnaryLike[Expression]] extends UnaryExprMetaBase[INPUT]

    Base class for metadata around UnaryExpression.

  520. abstract class UnaryExprMetaBase[INPUT <: Expression] extends ExprMeta[INPUT]
    Attributes
    protected
  521. trait WithTableBuffer extends AnyRef

    An interface for obtaining the device buffer backing a contiguous/packed table.

  522. class WrappedGpuDataProducer[T, U] extends GpuDataProducer[T]
  523. final case class WrappedGpuMetric(sqlMetric: SQLMetric, withMetricsExclSemWait: Boolean = false) extends GpuMetric with Product with Serializable

Value Members

  1. object AggregateModeInfo extends Serializable
  2. object AggregateUtils
  3. object AlluxioCfgUtils
  4. object AlluxioUtils extends Logging
  5. object Arm extends ArmScalaSpecificImpl

    Implementation of the automatic-resource-management pattern
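
    As a rough sketch of the pattern (the real Arm object provides more variants; this is just the core shape), a withResource helper guarantees the resource is closed even if the body throws:

```scala
// Core shape of the automatic-resource-management pattern: run the body with
// the resource and always close it afterwards, even on an exception.
def withResource[R <: AutoCloseable, T](resource: R)(body: R => T): T =
  try {
    body(resource)
  } finally {
    resource.close()
  }
```

    For example, withResource(new java.io.StringReader("x")) { r => r.read() } closes the reader whether or not the read throws.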

  6. object ArrayIndexUtils
  7. object AstExprContext extends ExpressionContext

    This is a special context. All other contexts are determined by the Spark query in a generic way. AST support in many cases is an optimization and so it is tagged and checked after it is determined that this operation will run on the GPU. In other cases it is required. In those cases AST support is determined and used when tagging the metas to see if they will work on the GPU or not. This part is not done automatically.

  8. object AstUtil
  9. object AutoCloseColumnBatchIterator
  10. object AvroFileReader
  11. object AvroFileWriter
  12. object AvroFormatType extends FileFormatType
  13. object BatchWithPartitionDataUtils
  14. object BoolUtils
  15. object CSVPartitionReader
  16. object CachedGpuBatchIterator

    Provides a transition between a GpuDataProducer[Table] and an Iterator[ColumnarBatch]. Because of the disconnect in semantics between a GpuDataProducer and how we generally use an Iterator[ColumnarBatch] pointing to GPU data, this will drain the producer, converting the data to columnar batches and making them all spillable so the GpuSemaphore can be released between each call to next. There is one special case: if there is only one table from the producer, it will not be made spillable, on the assumption that the semaphore is already held and will not be released before the first table is consumed. This also fits the semantics of how we use an Iterator[ColumnarBatch] pointing to GPU data.

  17. object CaseWhenCheck extends ExprChecks

    This is specific to CaseWhen, because it does not follow the typical parameter convention.

  18. object CastOptions extends Serializable
  19. object ChunkedPacker
  20. object CoalesceGoal
  21. object ColumnCastUtil

    This class casts a column to another column if the predicate passed resolves to true. The methods here should be able to handle both nested and non-nested types.

    At this time this is strictly a place for casting methods.

  22. object ColumnarOutputWriter
  23. object ColumnarPartitionReaderWithPartitionValues
  24. object ConcatAndConsumeAll

    Consumes an Iterator of ColumnarBatches and concatenates them into a single ColumnarBatch. The batches will be closed when this operation is done.

  25. object ConfHelper
  26. object CreateMapCheck extends ExprChecks
  27. object CreateNamedStructCheck extends ExprChecks

    A check for CreateNamedStruct. The parameter values alternate between one type and another. If this pattern shows up again we can make this more generic at that point.

  28. object CsvFormatType extends FileFormatType
  29. object CudfRowTransitions
  30. object CudfUnaryExpression
  31. object DataTypeMeta
  32. object DataTypeUtils
  33. object DateTimeRebaseCorrected extends DateTimeRebaseMode with Product with Serializable

    Mirror of Spark's LegacyBehaviorPolicy.CORRECTED.

  34. object DateTimeRebaseException extends DateTimeRebaseMode with Product with Serializable

    Mirror of Spark's LegacyBehaviorPolicy.EXCEPTION.

  35. object DateTimeRebaseLegacy extends DateTimeRebaseMode with Product with Serializable

    Mirror of Spark's LegacyBehaviorPolicy.LEGACY.

  36. object DateTimeRebaseMode extends Serializable
  37. object DateTimeRebaseUtils
  38. object DateUtils

    Class for helper functions for Date

  39. object DecimalUtil
  40. object DeltaFormatType extends FileFormatType
  41. object DeviceBuffersUtils
  42. object DumpUtils extends Logging
  43. object EmptyGpuColumnarBatchIterator extends GpuColumnarBatchIterator
  44. object EmptyTableReader extends EmptyGpuDataProducer[Table]
  45. object ExecChecks

    Gives users an API to create ExecChecks.

  46. object Explain
  47. object ExplainPlan
  48. object ExprChecks
  49. object ExpressionContext
  50. object FileFormatChecks
  51. object FileUtils
  52. object FilterEmptyHostLineBuffererFactory extends LineBuffererFactory[HostLineBufferer]
  53. object FloatUtils
  54. object FullSortSingleBatch extends SortExecType
  55. object GatherUtils
  56. object GeneratedInternalRowToCudfRowIterator extends Logging
  57. object GpuAggFinalPassIterator
  58. object GpuAggFirstPassIterator
  59. object GpuAggregateIterator extends Logging
  60. object GpuBaseAggregateMeta
  61. object GpuBatchUtils

    Utility class with methods for calculating various metrics about GPU memory usage prior to allocation, along with some operations with batches.

  62. object GpuBindReferences extends Logging
  63. object GpuBuildLeft extends GpuBuildSide with Product with Serializable
  64. object GpuBuildRight extends GpuBuildSide with Product with Serializable
  65. object GpuCSVScan extends Serializable
  66. object GpuCanonicalize

    Rewrites an expression using rules that are guaranteed to preserve the result while attempting to remove cosmetic variations. Deterministic expressions that are equal after canonicalization will always return the same answer given the same input (i.e. false positives should not be possible). However, it is possible that two canonical expressions that are not equal will in fact return the same answer given any input (i.e. false negatives are possible).

    The following rules are applied:

    • Names and nullability hints for org.apache.spark.sql.types.DataTypes are stripped.
    • Names for GetStructField are stripped.
    • TimeZoneId for Cast and AnsiCast are stripped if needsTimeZone is false.
    • Commutative and associative operations (Add and Multiply) have their children ordered by hashCode.
    • EqualTo and EqualNullSafe are reordered by hashCode.
    • Other comparisons (GreaterThan, LessThan) are reversed by hashCode.
    • Elements in In are reordered by hashCode.

    This is essentially a copy of the Spark Canonicalize class but updated for GPU operators.
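
    The child-ordering rule for commutative operations can be sketched with hypothetical stand-in expression types (Leaf and Add below are not the Spark classes):

```scala
// Stand-in expression types, just to illustrate one canonicalization rule:
// ordering the children of a commutative op by hashCode makes a + b and
// b + a canonicalize to the same expression.
sealed trait Expr
case class Leaf(name: String) extends Expr
case class Add(children: Seq[Expr]) extends Expr

def canonicalizeAdd(add: Add): Add = Add(add.children.sortBy(_.hashCode))
```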

  67. object GpuCast extends Serializable
  68. object GpuCoalesceExec extends Serializable
  69. object GpuColumnarToRowExec extends Serializable
  70. object GpuCoreDumpHandler extends Logging
  71. object GpuDataProducer
  72. object GpuDataWritingCommand
  73. object GpuDeviceManager extends Logging
  74. object GpuEvalMode extends Enumeration

    Expression evaluation modes.

    • LEGACY: the default evaluation mode, which is compliant with Hive SQL.
    • ANSI: an evaluation mode that is compliant with the ANSI SQL standard.
    • TRY: an evaluation mode for try_* functions. It is identical to the ANSI evaluation mode except that it returns a null result on errors.
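
    As a sketch, a Scala Enumeration mirroring the three modes listed above (not the plugin's actual definition):

```scala
// Minimal Enumeration mirroring the three evaluation modes described above.
object EvalMode extends Enumeration {
  val LEGACY, ANSI, TRY = Value
}
```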
  75. object GpuExec extends Serializable
  76. object GpuExpressionWithSideEffectUtils
  77. object GpuExpressionsUtils
  78. object GpuFilter

    Run a filter on a batch. The batch will be consumed.

  79. object GpuGetJsonObject extends Serializable
  80. object GpuHashAggregateExecBase
  81. object GpuHashPartitioningBase extends Serializable
  82. object GpuJoinUtils
  83. object GpuKeyBatchingIterator
  84. object GpuListUtils

    Provide a set of APIs to manipulate array/list columns in common ways.

  85. object GpuLiteral extends Serializable
  86. object GpuMapUtils

    Provide a set of APIs to manipulate map columns in common ways. CUDF does not officially support maps so we store it as a list of key/value structs.

  87. object GpuMetric extends Logging with Serializable
  88. object GpuNvl
  89. object GpuOrcScan extends Serializable
  90. object GpuOverrideUtil extends Logging
  91. object GpuOverrides extends Logging with Serializable
  92. object GpuParquetFileFormat
  93. object GpuParquetScan extends Serializable
  94. object GpuParquetUtils extends Logging
  95. object GpuPartitioning
  96. object GpuProjectExec extends Serializable
  97. object GpuProjectExecLike extends Serializable
  98. object GpuRangePartitioner extends Serializable
  99. object GpuReadCSVFileFormat
  100. object GpuReadOrcFileFormat extends Serializable
  101. object GpuReadParquetFileFormat extends Serializable
  102. object GpuRegExpStringReplace extends GpuRegExpReplaceOpt
    Annotations
    @SerialVersionUID()
  103. object GpuRegExpStringReplaceMulti extends GpuRegExpReplaceOpt
    Annotations
    @SerialVersionUID()
  104. object GpuRunnableCommand
  105. object GpuScalar extends Logging
  106. object GpuSemaphore
  107. object GpuShuffledAsymmetricHashJoinExec extends Serializable
  108. object GpuShuffledHashJoinExec extends Logging with Serializable
  109. object GpuShuffledSizedHashJoinExec extends Serializable
  110. object GpuShuffledSymmetricHashJoinExec extends Serializable
  111. object GpuSinglePartitioning extends Expression with GpuExpression with ShimExpression with GpuPartitioning with Product with Serializable
  112. object GpuSortExec extends Serializable
  113. object GpuSpillableProjectedSortEachBatchIterator

    Create an iterator that will sort each batch as it comes in. It will keep any projected columns in place after doing the sort on the assumption that you want to possibly combine them in some way afterwards.

  114. object GpuTextBasedDateUtils
  115. object GpuTextBasedPartitionReader
  116. object GpuTopN extends Serializable
  117. object GpuTransitionOverrides
  118. object GpuTypedImperativeSupportedAggregateExecMeta
  119. object GpuUnsignedIntegerType extends GpuUnsignedIntegerType with Product with Serializable
  120. object GpuUnsignedLongType extends GpuUnsignedLongType with Product with Serializable
  121. object GpuUserDefinedFunction extends Serializable
  122. object GroupByAggExprContext extends ExpressionContext
  123. object Header extends Serializable
  124. object HiveDelimitedTextFormatType extends FileFormatType
  125. object HostAlloc

    A new API for host memory allocation. This can be used to limit the amount of host memory used.
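
    A minimal sketch of the limiting idea, assuming a simple byte-count cap (HostLimiter is hypothetical; the real HostAlloc is considerably more involved):

```scala
// Hypothetical limiter: track outstanding host bytes against a cap and
// refuse allocations that would exceed it.
class HostLimiter(limitBytes: Long) {
  private var used = 0L

  def tryAlloc(bytes: Long): Boolean = synchronized {
    if (used + bytes <= limitBytes) { used += bytes; true } else false
  }

  def free(bytes: Long): Unit = synchronized {
    used -= bytes
  }
}
```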

  126. object HostColumnarToGpu extends Logging with Serializable
  127. object HostLineBuffererFactory extends LineBuffererFactory[HostLineBufferer]
  128. object HostStringColBuffererFactory extends LineBuffererFactory[HostStringColBufferer]
  129. object IcebergFormatType extends FileFormatType
  130. object InputFileBlockRule

    A rule that prevents plans in the range [SparkPlan (with the first input_file_xxx expression), FileScan) from running on the GPU. For more details, see https://github.com/NVIDIA/spark-rapids/issues/3333.

  131. object JoinGatherer
  132. object JoinGathererImpl
  133. object JoinPartitioner
  134. object JsonFormatType extends FileFormatType
  135. object JsonPathParser extends RegexParsers
  136. object LazySpillableColumnarBatch
  137. object LazySpillableGatherMap
  138. object MakeOrcTableProducer extends Logging
  139. object MakeParquetTableProducer extends Logging
  140. object MemoryCostHelper
  141. object MetaUtils
  142. object MetricsLevel extends Serializable
  143. object MultiFileReaderThreadPool extends Logging
  144. object MultiFileReaderUtils
  145. object NoopMetric extends GpuMetric
  146. object NotApplicable extends SupportLevel

    N/A: neither Spark nor the plugin supports this.

  147. object NotSupported extends SupportLevel

    Spark supports this but the plugin does not.

  148. object NvtxWithMetrics
  149. object OrcBlockMetaForSplitCheck extends Serializable
  150. object OrcFormatType extends FileFormatType
  151. object OutOfCoreSort extends SortExecType
  152. object ParquetDumper
  153. object ParquetFormatType extends FileFormatType
  154. object ParquetPartitionReader
  155. object ParquetSchemaUtils
  156. object PartChecks
  157. object PartitionReaderIterator
  158. object PartitionRowData extends Serializable
  159. object PathInstruction
  160. object PlanShims
  161. object PreProjectSplitIterator
  162. object ProfilerOnDriver extends Logging
  163. object ProfilerOnExecutor extends Logging
  164. object ProjectExprContext extends ExpressionContext
  165. object RangeConfMatcher
  166. object RapidsBufferCatalog extends Logging
  167. object RapidsConf
  168. object RapidsExecutorPlugin
  169. object RapidsMeta
  170. object RapidsPluginImplicits

    Adds implicit functions for ColumnarBatch, Seq, Seq[AutoCloseable], and Array[AutoCloseable] that help make resource management easier within the project.

  171. object RapidsPluginUtils extends Logging
  172. object RapidsReaderType extends Enumeration
  173. object ReadFileOp extends FileFormatOp
  174. object ReductionAggExprContext extends ExpressionContext
  175. object RegexComplexityEstimator
  176. object RegexFindMode extends RegexMode
  177. object RegexNegativeLookahead extends RegexLookahead
  178. object RegexOptimizationType
  179. object RegexParser
  180. object RegexPositiveLookahead extends RegexLookahead
  181. object RegexReplaceMode extends RegexMode
  182. object RegexRewrite
  183. object RegexSplitMode extends RegexMode
  184. object RequireSingleBatch extends CoalesceSizeGoal with RequireSingleBatchLike with Product with Serializable

    A single batch is required as the input to a node in the SparkPlan. This means all of the data for a given task is in a single batch. This should be avoided as much as possible because it can result in running out of memory or running into batch-size limitations imposed by both Spark and cudf.

  185. object RmmRapidsRetryIterator extends Logging
  186. object RowCountPlanVisitor

    Estimate the number of rows that an operator will output. Note that these row counts are the aggregate across all output partitions.

    Logic is based on Spark's SizeInBytesOnlyStatsPlanVisitor, which operates on logical plans and only computes data sizes, not row counts.

  187. object SamplingUtils
  188. object ScalableTaskCompletion

    Provides task completion listeners in Spark that can be removed if needed, to help with scaling. Spark guarantees LIFO order for its callbacks, but we do not. If you need that kind of a guarantee then use the Spark task APIs directly.
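
    The removable-callback idea can be sketched as a registry whose registration method returns a handle (CallbackRegistry and all names below are hypothetical):

```scala
// Hypothetical registry: registering a callback returns a handle that can
// remove it later. Unlike Spark's task listeners, no LIFO order is promised.
class CallbackRegistry {
  private val callbacks = scala.collection.mutable.LinkedHashMap.empty[Int, () => Unit]
  private var nextId = 0

  trait Handle { def remove(): Unit }

  def onTaskCompletion(cb: () => Unit): Handle = synchronized {
    val id = nextId
    nextId += 1
    callbacks += id -> cb
    new Handle {
      def remove(): Unit = CallbackRegistry.this.synchronized { callbacks -= id }
    }
  }

  // Invoke and clear all callbacks still registered (e.g. at task completion).
  def fireAll(): Unit = synchronized {
    callbacks.values.foreach(_.apply())
    callbacks.clear()
  }
}
```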

  189. object SchemaUtils
  190. object SemaphoreAcquired extends TryAcquireResult with Product with Serializable

    The Semaphore was successfully acquired.

  191. object SerializedTableColumn
  192. object ShimLoaderTemp
  193. object ShuffleBufferCatalog
  194. object ShuffleMetadata extends Logging
  195. object ShuffleReceivedBufferCatalog
  196. object SingleHMBAndMeta extends Serializable
  197. object SortEachBatch extends SortExecType
  198. object SortUtils
  199. object SpillPriorities

    Utility methods for managing spillable buffer priorities. The spill priority numerical space is divided into potentially overlapping ranges based on the type of buffer.

  200. object SpillableBuffer
  201. object SpillableColumnarBatch
  202. object SpillableHostBuffer
  203. object SpillableHostColumnarBatch
  204. object StorageTier extends Enumeration

    Enumeration of the storage tiers.

  205. object SupportedOpsDocs

    Used for generating the support docs.

  206. object SupportedOpsForTools
  207. object TableCompressionCodec extends Logging
  208. object TaskRegistryTracker

    This handles keeping track of task threads and registering them with RMMSpark as needed. It provides an efficient and lazy way to make sure we can use the Retry API behind the scenes without needing callbacks whenever a task starts, or injecting code into all of the operators that would first start on the GPU.

  209. object TypeEnum extends Enumeration

    The Supported Types. The TypeSig API should be preferred over this, except in the few cases where TypeSig asks for a TypeEnum.

  210. object TypeSig
  211. object VersionUtils
  212. object WindowAggExprContext extends ExpressionContext
  213. object WindowSpecCheck extends ExprChecks

    This is specific to WindowSpec, because it does not follow the typical parameter convention.

  214. object WriteFileOp extends FileFormatOp

Ungrouped