com.nvidia.spark.rapids.GpuShuffledSizedHashJoinExec

HostHostUnspillableJoinSizer

trait HostHostUnspillableJoinSizer extends JoinSizer[ColumnarBatch]

Very similar to HostHostJoinSizer, except that it does not support host-spillable data. Use it only when the amount of data being probed is at most the target batch size, which matches the behavior of normal shuffle processing today. Ideally HostHostJoinSizer would be used everywhere, but this variant avoids the overhead of registering and unregistering all of the shuffle buffers with the spill framework. See https://github.com/NVIDIA/spark-rapids/issues/11322.

Linear Supertypes
JoinSizer[ColumnarBatch], AnyRef, Any

Abstract Value Members

  1. abstract def getJoinInfo(joinType: JoinType, leftKeys: Seq[Expression], leftOutput: Seq[Attribute], rawLeftIter: Iterator[ColumnarBatch], rightKeys: Seq[Expression], rightOutput: Seq[Attribute], rawRightIter: Iterator[ColumnarBatch], condition: Option[Expression], gpuBatchSizeBytes: Long, metrics: Map[String, GpuMetric]): JoinInfo

    Probe the left and right join inputs to determine which side should be used as the build side and which should be used as the stream side.

    joinType: type of join to perform
    leftKeys: join keys for the left table
    leftOutput: schema of the left table
    rawLeftIter: iterator of batches for the left table
    rightKeys: join keys for the right table
    rightOutput: schema of the right table
    rawRightIter: iterator of batches for the right table
    condition: inequality portions of the join condition
    gpuBatchSizeBytes: target GPU batch size
    metrics: map of metrics to update
    returns: join information including build side, bound expressions, etc.

    Definition Classes
    JoinSizer
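
The probing idea behind getJoinInfo can be sketched roughly as follows. This is a minimal illustration and not the spark-rapids implementation: `Batch`, `ProbeResult`, `ProbeSketch`, and the "smaller probed side becomes the build side" rule are all assumptions made for the example, and the sketch ignores join-type restrictions on which side may be the build side.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a columnar batch; only its size matters here.
final case class Batch(sizeBytes: Long)

sealed trait BuildSide
case object BuildLeft extends BuildSide
case object BuildRight extends BuildSide

// Hypothetical result type: the chosen side plus the batches already fetched while probing.
final case class ProbeResult(
    buildSide: BuildSide,
    leftQueue: mutable.Queue[Batch],
    rightQueue: mutable.Queue[Batch])

object ProbeSketch {
  // Fetch batches into a queue until the queued bytes reach targetBytes
  // or the iterator is exhausted. Returns the queue and the bytes queued.
  def probe(iter: Iterator[Batch], targetBytes: Long): (mutable.Queue[Batch], Long) = {
    val queue = mutable.Queue.empty[Batch]
    var bytes = 0L
    while (bytes < targetBytes && iter.hasNext) {
      val batch = iter.next()
      queue.enqueue(batch)
      bytes += batch.sizeBytes
    }
    (queue, bytes)
  }

  // Assumed rule for illustration: the side with fewer probed bytes is the build side.
  def chooseBuildSide(
      left: Iterator[Batch],
      right: Iterator[Batch],
      targetBytes: Long): ProbeResult = {
    val (leftQueue, leftBytes) = probe(left, targetBytes)
    val (rightQueue, rightBytes) = probe(right, targetBytes)
    val side = if (leftBytes <= rightBytes) BuildLeft else BuildRight
    ProbeResult(side, leftQueue, rightQueue)
  }
}
```

The queues are returned alongside the decision because the batches consumed during probing still have to be joined afterwards.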

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  10. def getProbeBatchDataSize(batch: ColumnarBatch): Long

    Get the data size in bytes of a batch of data

    Definition Classes
    HostHostUnspillableJoinSizer → JoinSizer
  11. def getProbeBatchRowCount(batch: ColumnarBatch): Long

    Get the row count of a batch of data

    Definition Classes
    HostHostUnspillableJoinSizer → JoinSizer
  12. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  13. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  14. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  15. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  16. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  17. def setupForJoin(queue: Queue[ColumnarBatch], remainingIter: Iterator[ColumnarBatch], batchTypes: Array[DataType], gpuBatchSizeBytes: Long, metrics: Map[String, GpuMetric]): Iterator[ColumnarBatch]

    Build an iterator in preparation for using it for sub-joins.

    queue: a possibly empty queue of data that has already been fetched from the underlying iterator as part of probing sizes of the join inputs
    remainingIter: the data remaining to be fetched from the original iterator. Iterating the queue followed by this iterator reconstructs the iteration order of the original input iterator.
    batchTypes: the schema of the data
    gpuBatchSizeBytes: target GPU batch size in bytes
    metrics: metrics to update (e.g.: if coalescing batches)
    returns: iterator of columnar batches to use in sub-joins

    Definition Classes
    HostHostUnspillableJoinSizer → JoinSizer
  18. def setupForProbe(iter: Iterator[ColumnarBatch]): Iterator[ColumnarBatch]

    Wrap, if necessary, an iterator in preparation for probing the size before a join.

    Definition Classes
    HostHostUnspillableJoinSizer → JoinSizer
  19. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  20. def toString(): String
    Definition Classes
    AnyRef → Any
  21. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
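
The setupForJoin parameter notes state that iterating the queue followed by remainingIter reconstructs the original iteration order. A minimal sketch of that property is shown here, using a hypothetical `ReplaySketch.replay` helper over a generic element type. The real method may also coalesce batches and update metrics, which this sketch does not attempt.

```scala
import scala.collection.mutable

object ReplaySketch {
  // Hypothetical helper: drain the already-fetched queue first,
  // then continue with whatever is left in the original iterator.
  def replay[T](queue: mutable.Queue[T], remainingIter: Iterator[T]): Iterator[T] =
    new Iterator[T] {
      override def hasNext: Boolean = queue.nonEmpty || remainingIter.hasNext
      override def next(): T =
        if (queue.nonEmpty) queue.dequeue() else remainingIter.next()
    }
}
```

For example, if two elements of `Iterator(1, 2, 3, 4, 5)` were moved into the queue during probing, `ReplaySketch.replay(queue, original).toList` yields `List(1, 2, 3, 4, 5)`.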

Inherited from JoinSizer[ColumnarBatch]

Inherited from AnyRef

Inherited from Any
