com.nvidia.spark.rapids

HostToGpuCoalesceIterator

class HostToGpuCoalesceIterator extends AbstractGpuCoalesceIterator

This iterator builds GPU batches from host batches. The host batches may be backed by Spark's UnsafeRow, so it is not safe to cache them. Rows must be read and immediately written to cuDF builders.
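
The caching hazard described above can be illustrated with a small, self-contained sketch. It does not use Spark or the plugin: MutableRow and reusingProducer are hypothetical stand-ins that assume the producer reuses a single mutable row object between calls, which is the kind of reuse that makes caching unsafe.

```scala
import scala.collection.mutable.ArrayBuffer

object ReusedRowSketch {
  // Hypothetical stand-in for a row object that the producer mutates in place.
  final class MutableRow(var value: Int)

  // Hypothetical producer: every element it yields is the same shared instance.
  def reusingProducer(values: Seq[Int]): Iterator[MutableRow] = {
    val shared = new MutableRow(0)
    values.iterator.map { v =>
      shared.value = v
      shared
    }
  }

  // Unsafe: the references are cached first and read later, so every
  // cached entry shows the last value written to the shared row.
  val cached: List[Int] = reusingProducer(Seq(1, 2, 3)).toList.map(_.value)

  // Safe: each value is copied into a builder as soon as the row is read.
  val copied: List[Int] = {
    val builder = ArrayBuffer.empty[Int]
    reusingProducer(Seq(1, 2, 3)).foreach(row => builder += row.value)
    builder.toList
  }
}
```

Here cached evaluates to List(3, 3, 3), while copied evaluates to List(1, 2, 3).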

Linear Supertypes
AbstractGpuCoalesceIterator, Logging, Iterator[ColumnarBatch], TraversableOnce[ColumnarBatch], GenTraversableOnce[ColumnarBatch], AnyRef, Any

Instance Constructors

  1. new HostToGpuCoalesceIterator(iter: Iterator[ColumnarBatch], goal: CoalesceSizeGoal, schema: StructType, numInputRows: GpuMetric, numInputBatches: GpuMetric, numOutputRows: GpuMetric, numOutputBatches: GpuMetric, streamTime: GpuMetric, concatTime: GpuMetric, copyBufTime: GpuMetric, opTime: GpuMetric, opName: String, useArrowCopyOpt: Boolean)

Type Members

  1. class GroupedIterator[B >: A] extends AbstractIterator[Seq[B]] with Iterator[Seq[B]]
    Definition Classes
    Iterator

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. def ++[B >: ColumnarBatch](that: ⇒ GenTraversableOnce[B]): Iterator[B]
    Definition Classes
    Iterator
  4. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  5. def addBatchToConcat(batch: ColumnarBatch): Unit

    addBatchToConcat for HostToGpuCoalesceIterator does not need to close batch because the batch is closed by the producer iterator. See: https://github.com/NVIDIA/spark-rapids/issues/6995

    batch

    the batch to add in.

    Definition Classes
    HostToGpuCoalesceIterator → AbstractGpuCoalesceIterator
  6. def addString(b: StringBuilder): StringBuilder
    Definition Classes
    TraversableOnce
  7. def addString(b: StringBuilder, sep: String): StringBuilder
    Definition Classes
    TraversableOnce
  8. def addString(b: StringBuilder, start: String, sep: String, end: String): StringBuilder
    Definition Classes
    TraversableOnce
  9. def aggregate[B](z: ⇒ B)(seqop: (B, ColumnarBatch) ⇒ B, combop: (B, B) ⇒ B): B
    Definition Classes
    TraversableOnce → GenTraversableOnce
  10. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  11. var batchBuilder: GpuColumnarBatchBuilderBase
  12. var batchRowLimit: Int

    Optional row limit

    Definition Classes
    AbstractGpuCoalesceIterator
  13. def buffered: BufferedIterator[ColumnarBatch]
    Definition Classes
    Iterator
  14. def cleanupConcatIsDone(): Unit

    Called to clean up any state when a batch is done (even if there was a failure).

    Definition Classes
    HostToGpuCoalesceIterator → AbstractGpuCoalesceIterator
  15. def cleanupInputBatch(batch: ColumnarBatch): Unit

    Perform the necessary cleanup for an input batch.

    Attributes
    protected
    Definition Classes
    HostToGpuCoalesceIterator → AbstractGpuCoalesceIterator
  16. def clearOnDeck(): Unit

    If there is anything saved on deck, close it.

    Attributes
    protected
    Definition Classes
    HostToGpuCoalesceIterator → AbstractGpuCoalesceIterator
  17. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  18. def collect[B](pf: PartialFunction[ColumnarBatch, B]): Iterator[B]
    Definition Classes
    Iterator
    Annotations
    @migration
    Migration

    (Changed in version 2.8.0) collect has changed. The previous behavior can be reproduced with toSeq.

  19. def collectFirst[B](pf: PartialFunction[ColumnarBatch, B]): Option[B]
    Definition Classes
    TraversableOnce
  20. def concatAllAndPutOnGPU(): ColumnarBatch

    Called after all of the batches have been added in.

    returns

    the concatenated batches on the GPU.

    Definition Classes
    HostToGpuCoalesceIterator → AbstractGpuCoalesceIterator
  21. def contains(elem: Any): Boolean
    Definition Classes
    Iterator
  22. def copyToArray[B >: ColumnarBatch](xs: Array[B], start: Int, len: Int): Unit
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  23. def copyToArray[B >: ColumnarBatch](xs: Array[B]): Unit
    Definition Classes
    TraversableOnce → GenTraversableOnce
  24. def copyToArray[B >: ColumnarBatch](xs: Array[B], start: Int): Unit
    Definition Classes
    TraversableOnce → GenTraversableOnce
  25. def copyToBuffer[B >: ColumnarBatch](dest: Buffer[B]): Unit
    Definition Classes
    TraversableOnce
  26. def corresponds[B](that: GenTraversableOnce[B])(p: (ColumnarBatch, B) ⇒ Boolean): Boolean
    Definition Classes
    Iterator
  27. def count(p: (ColumnarBatch) ⇒ Boolean): Int
    Definition Classes
    TraversableOnce → GenTraversableOnce
  28. def drop(n: Int): Iterator[ColumnarBatch]
    Definition Classes
    Iterator
  29. def dropWhile(p: (ColumnarBatch) ⇒ Boolean): Iterator[ColumnarBatch]
    Definition Classes
    Iterator
  30. def duplicate: (Iterator[ColumnarBatch], Iterator[ColumnarBatch])
    Definition Classes
    Iterator
  31. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  32. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  33. def exists(p: (ColumnarBatch) ⇒ Boolean): Boolean
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  34. def filter(p: (ColumnarBatch) ⇒ Boolean): Iterator[ColumnarBatch]
    Definition Classes
    Iterator
  35. def filterNot(p: (ColumnarBatch) ⇒ Boolean): Iterator[ColumnarBatch]
    Definition Classes
    Iterator
  36. val filteringModeRowsThreshold: Int

    For tests only. Int.MaxValue is too large for unit tests, so tests override this with a smaller value.

    Attributes
    protected
    Definition Classes
    AbstractGpuCoalesceIterator
  37. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  38. def find(p: (ColumnarBatch) ⇒ Boolean): Option[ColumnarBatch]
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  39. def flatMap[B](f: (ColumnarBatch) ⇒ GenTraversableOnce[B]): Iterator[B]
    Definition Classes
    Iterator
  40. def fold[A1 >: ColumnarBatch](z: A1)(op: (A1, A1) ⇒ A1): A1
    Definition Classes
    TraversableOnce → GenTraversableOnce
  41. def foldLeft[B](z: B)(op: (B, ColumnarBatch) ⇒ B): B
    Definition Classes
    TraversableOnce → GenTraversableOnce
  42. def foldRight[B](z: B)(op: (ColumnarBatch, B) ⇒ B): B
    Definition Classes
    TraversableOnce → GenTraversableOnce
  43. def forall(p: (ColumnarBatch) ⇒ Boolean): Boolean
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  44. def foreach[U](f: (ColumnarBatch) ⇒ U): Unit
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  45. def getBatchDataSize(batch: ColumnarBatch): Long

    Gets the size in bytes of the data buffer for a given column

    Definition Classes
    HostToGpuCoalesceIterator → AbstractGpuCoalesceIterator
  46. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  47. def getCoalesceRetryIterator: Iterator[ColumnarBatch]

    Returns a retry iterator that yields batches coalesced as much as possible.

    Note: this throws if the subclass does not support splitting its input (supportsRetryIterator = false).

    returns

    an iterator that should be used to obtain coalesced batches

    Definition Classes
    HostToGpuCoalesceIterator → AbstractGpuCoalesceIterator
  48. def grouped[B >: ColumnarBatch](size: Int): GroupedIterator[B]
    Definition Classes
    Iterator
  49. def hasAnyToConcat: Boolean

    True if there are some batches to be concatenated, otherwise false.

    Definition Classes
    HostToGpuCoalesceIterator → AbstractGpuCoalesceIterator
  50. def hasDefiniteSize: Boolean
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  51. def hasNext: Boolean
    Definition Classes
    AbstractGpuCoalesceIterator → Iterator
  52. def hasOnDeck: Boolean

    Return true if there is something saved on deck for later processing.

    Attributes
    protected
    Definition Classes
    HostToGpuCoalesceIterator → AbstractGpuCoalesceIterator
  53. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  54. def indexOf[B >: ColumnarBatch](elem: B, from: Int): Int
    Definition Classes
    Iterator
  55. def indexOf[B >: ColumnarBatch](elem: B): Int
    Definition Classes
    Iterator
  56. def indexWhere(p: (ColumnarBatch) ⇒ Boolean, from: Int): Int
    Definition Classes
    Iterator
  57. def indexWhere(p: (ColumnarBatch) ⇒ Boolean): Int
    Definition Classes
    Iterator
  58. def initNewBatch(batch: ColumnarBatch): Unit

    Initialize the builders using an estimated row count based on the schema and the desired batch size defined by RapidsConf.GPU_BATCH_SIZE_BYTES.

    Definition Classes
    HostToGpuCoalesceIterator → AbstractGpuCoalesceIterator
  59. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  60. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  61. def isEmpty: Boolean
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  62. def isInFilteringMode: Boolean

    For tests only

    Definition Classes
    AbstractGpuCoalesceIterator
  63. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  64. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  65. def isTraversableAgain: Boolean
    Definition Classes
    Iterator → GenTraversableOnce
  66. def length: Int
    Definition Classes
    Iterator
  67. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  68. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  69. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  70. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  71. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  72. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  73. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  74. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  75. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  76. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  77. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  78. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  79. def map[B](f: (ColumnarBatch) ⇒ B): Iterator[B]
    Definition Classes
    Iterator
  80. def max[B >: ColumnarBatch](implicit cmp: Ordering[B]): ColumnarBatch
    Definition Classes
    TraversableOnce → GenTraversableOnce
  81. def maxBy[B](f: (ColumnarBatch) ⇒ B)(implicit cmp: Ordering[B]): ColumnarBatch
    Definition Classes
    TraversableOnce → GenTraversableOnce
  82. def min[B >: ColumnarBatch](implicit cmp: Ordering[B]): ColumnarBatch
    Definition Classes
    TraversableOnce → GenTraversableOnce
  83. def minBy[B](f: (ColumnarBatch) ⇒ B)(implicit cmp: Ordering[B]): ColumnarBatch
    Definition Classes
    TraversableOnce → GenTraversableOnce
  84. def mkString: String
    Definition Classes
    TraversableOnce → GenTraversableOnce
  85. def mkString(sep: String): String
    Definition Classes
    TraversableOnce → GenTraversableOnce
  86. def mkString(start: String, sep: String, end: String): String
    Definition Classes
    TraversableOnce → GenTraversableOnce
  87. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  88. def next(): ColumnarBatch

    Each call to next() will combine batches according to the goal specified. However, if any incoming batch is greater than this size it will be passed through unmodified.

    If the coalesce goal is RequireSingleBatch then an exception will be thrown if there is remaining data after the first batch is produced.

    If OOMs occur while coalescing (which may include decompression, depending on the instance), the operation may be retried. As a result, the returned ColumnarBatch may be smaller than desired: the retry follows a "coalesce half of the batches" strategy, which halves the number of batches that are candidates for coalescing at each OOM and leaves the rest for a subsequent call to next().

    returns

    The coalesced batch

    Definition Classes
    AbstractGpuCoalesceIterator → Iterator
  89. def nonEmpty: Boolean
    Definition Classes
    TraversableOnce → GenTraversableOnce
  90. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  91. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  92. def padTo[A1 >: ColumnarBatch](len: Int, elem: A1): Iterator[A1]
    Definition Classes
    Iterator
  93. def partition(p: (ColumnarBatch) ⇒ Boolean): (Iterator[ColumnarBatch], Iterator[ColumnarBatch])
    Definition Classes
    Iterator
  94. def patch[B >: ColumnarBatch](from: Int, patchElems: Iterator[B], replaced: Int): Iterator[B]
    Definition Classes
    Iterator
  95. def popOnDeck(): ColumnarBatch

    Remove whatever is on deck and return it.

    Attributes
    protected
    Definition Classes
    HostToGpuCoalesceIterator → AbstractGpuCoalesceIterator
  96. def populateCandidateBatches(): Boolean

    Add input batches to the batches collection up to the limit specified by the goal. Note: for a size goal, if any incoming batch is greater than this size it will be passed through unmodified.

    If the coalesce goal is RequireSingleBatch then an exception will be thrown if there is remaining data after the first batch is added.

    returns

    boolean that is true if this call reached the last input batch.

    Attributes
    protected
    Definition Classes
    AbstractGpuCoalesceIterator
    Note

    protected for testing

  97. def product[B >: ColumnarBatch](implicit num: Numeric[B]): B
    Definition Classes
    TraversableOnce → GenTraversableOnce
  98. def reduce[A1 >: ColumnarBatch](op: (A1, A1) ⇒ A1): A1
    Definition Classes
    TraversableOnce → GenTraversableOnce
  99. def reduceLeft[B >: ColumnarBatch](op: (B, ColumnarBatch) ⇒ B): B
    Definition Classes
    TraversableOnce
  100. def reduceLeftOption[B >: ColumnarBatch](op: (B, ColumnarBatch) ⇒ B): Option[B]
    Definition Classes
    TraversableOnce → GenTraversableOnce
  101. def reduceOption[A1 >: ColumnarBatch](op: (A1, A1) ⇒ A1): Option[A1]
    Definition Classes
    TraversableOnce → GenTraversableOnce
  102. def reduceRight[B >: ColumnarBatch](op: (ColumnarBatch, B) ⇒ B): B
    Definition Classes
    TraversableOnce → GenTraversableOnce
  103. def reduceRightOption[B >: ColumnarBatch](op: (ColumnarBatch, B) ⇒ B): Option[B]
    Definition Classes
    TraversableOnce → GenTraversableOnce
  104. def reversed: List[ColumnarBatch]
    Attributes
    protected[this]
    Definition Classes
    TraversableOnce
  105. def sameElements(that: Iterator[_]): Boolean
    Definition Classes
    Iterator
  106. def saveOnDeck(batch: ColumnarBatch): Unit

    Save a batch for later processing. If an exception is raised while saving the batch, saveOnDeck guarantees that it closes batch.

    Attributes
    protected
    Definition Classes
    HostToGpuCoalesceIterator → AbstractGpuCoalesceIterator
  107. def scanLeft[B](z: B)(op: (B, ColumnarBatch) ⇒ B): Iterator[B]
    Definition Classes
    Iterator
  108. def scanRight[B](z: B)(op: (ColumnarBatch, B) ⇒ B): Iterator[B]
    Definition Classes
    Iterator
  109. def seq: Iterator[ColumnarBatch]
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  110. def size: Int
    Definition Classes
    TraversableOnce → GenTraversableOnce
  111. def sizeHintIfCheap: Int
    Attributes
    protected[collection]
    Definition Classes
    GenTraversableOnce
  112. def slice(from: Int, until: Int): Iterator[ColumnarBatch]
    Definition Classes
    Iterator
  113. def sliceIterator(from: Int, until: Int): Iterator[ColumnarBatch]
    Attributes
    protected
    Definition Classes
    Iterator
  114. def sliding[B >: ColumnarBatch](size: Int, step: Int): GroupedIterator[B]
    Definition Classes
    Iterator
  115. def span(p: (ColumnarBatch) ⇒ Boolean): (Iterator[ColumnarBatch], Iterator[ColumnarBatch])
    Definition Classes
    Iterator
  116. def splitBatchesToCoalesceFn: (BatchesToCoalesce) ⇒ Seq[BatchesToCoalesce]

    Splits a BatchesToCoalesce instance into two.

    returns

    Seq[BatchesToCoalesce] with 2 items.

    Attributes
    protected
    Definition Classes
    AbstractGpuCoalesceIterator
  117. def sum[B >: ColumnarBatch](implicit num: Numeric[B]): B
    Definition Classes
    TraversableOnce → GenTraversableOnce
  118. val supportsRetryIterator: Boolean
  119. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  120. def take(n: Int): Iterator[ColumnarBatch]
    Definition Classes
    Iterator
  121. def takeWhile(p: (ColumnarBatch) ⇒ Boolean): Iterator[ColumnarBatch]
    Definition Classes
    Iterator
  122. def to[Col[_]](implicit cbf: CanBuildFrom[Nothing, ColumnarBatch, Col[ColumnarBatch]]): Col[ColumnarBatch]
    Definition Classes
    TraversableOnce → GenTraversableOnce
  123. def toArray[B >: ColumnarBatch](implicit arg0: ClassTag[B]): Array[B]
    Definition Classes
    TraversableOnce → GenTraversableOnce
  124. def toBuffer[B >: ColumnarBatch]: Buffer[B]
    Definition Classes
    TraversableOnce → GenTraversableOnce
  125. def toIndexedSeq: IndexedSeq[ColumnarBatch]
    Definition Classes
    TraversableOnce → GenTraversableOnce
  126. def toIterable: Iterable[ColumnarBatch]
    Definition Classes
    TraversableOnce → GenTraversableOnce
  127. def toIterator: Iterator[ColumnarBatch]
    Definition Classes
    Iterator → GenTraversableOnce
  128. def toList: List[ColumnarBatch]
    Definition Classes
    TraversableOnce → GenTraversableOnce
  129. def toMap[T, U](implicit ev: <:<[ColumnarBatch, (T, U)]): Map[T, U]
    Definition Classes
    TraversableOnce → GenTraversableOnce
  130. def toSeq: Seq[ColumnarBatch]
    Definition Classes
    TraversableOnce → GenTraversableOnce
  131. def toSet[B >: ColumnarBatch]: Set[B]
    Definition Classes
    TraversableOnce → GenTraversableOnce
  132. def toStream: Stream[ColumnarBatch]
    Definition Classes
    Iterator → GenTraversableOnce
  133. def toString(): String
    Definition Classes
    Iterator → AnyRef → Any
  134. def toTraversable: Traversable[ColumnarBatch]
    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  135. def toVector: Vector[ColumnarBatch]
    Definition Classes
    TraversableOnce → GenTraversableOnce
  136. var totalRows: Int
  137. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  138. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  139. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  140. var wasLastBatch: Boolean
    Definition Classes
    AbstractGpuCoalesceIterator
  141. def withFilter(p: (ColumnarBatch) ⇒ Boolean): Iterator[ColumnarBatch]
    Definition Classes
    Iterator
  142. def zip[B](that: Iterator[B]): Iterator[(ColumnarBatch, B)]
    Definition Classes
    Iterator
  143. def zipAll[B, A1 >: ColumnarBatch, B1 >: B](that: Iterator[B], thisElem: A1, thatElem: B1): Iterator[(A1, B1)]
    Definition Classes
    Iterator
  144. def zipWithIndex: Iterator[(ColumnarBatch, Int)]
    Definition Classes
    Iterator
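
The batching rules described above for next() and populateCandidateBatches (combine batches up to the goal, pass an oversized batch through unmodified, and keep a batch that does not fit "on deck" for the next call) can be sketched in plain Scala. This is a simplified model under stated assumptions, not the plugin's implementation: Batch, RowGoalCoalescer and coalesce are hypothetical names, the size is measured in rows for simplicity, and neither RequireSingleBatch nor the OOM retry path is modeled.

```scala
object CoalesceSketch {
  // Hypothetical stand-in for a batch: only its row count is tracked.
  final case class Batch(rows: Int)

  // Simplified model that coalesces input batches up to targetRows per output.
  final class RowGoalCoalescer(input: Iterator[Batch], targetRows: Int)
      extends Iterator[Batch] {

    // A batch that did not fit into the previous output is saved "on deck".
    private var onDeck: Option[Batch] = None

    override def hasNext: Boolean = onDeck.isDefined || input.hasNext

    override def next(): Batch = {
      if (!hasNext) throw new NoSuchElementException("no more batches")
      var total = 0
      var done = false
      while (!done && (onDeck.isDefined || input.hasNext)) {
        // Prefer the on-deck batch; only pull from the input when it is empty.
        val b = onDeck.getOrElse(input.next())
        onDeck = None
        if (total == 0 || total + b.rows <= targetRows) {
          // The first batch is always accepted, so an oversized batch
          // passes through on its own.
          total += b.rows
          done = total >= targetRows
        } else {
          // Does not fit: save it for the next call and emit what we have.
          onDeck = Some(b)
          done = true
        }
      }
      Batch(total)
    }
  }

  // Convenience wrapper returning the row count of each coalesced batch.
  def coalesce(rows: Seq[Int], targetRows: Int): List[Int] =
    new RowGoalCoalescer(rows.iterator.map(Batch(_)), targetRows).map(_.rows).toList
}
```

With a target of 6 rows, CoalesceSketch.coalesce(Seq(3, 4, 10, 2, 2), 6) yields List(3, 4, 10, 4): the 4-row batch waits on deck because 3 + 4 exceeds the target, the 10-row batch passes through alone, and the two trailing 2-row batches are combined.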

Deprecated Value Members

  1. def /:[B](z: B)(op: (B, ColumnarBatch) ⇒ B): B
    Definition Classes
    TraversableOnce → GenTraversableOnce
    Annotations
    @deprecated
    Deprecated

    (Since version 2.12.10) Use foldLeft instead of /:

  2. def :\[B](z: B)(op: (ColumnarBatch, B) ⇒ B): B
    Definition Classes
    TraversableOnce → GenTraversableOnce
    Annotations
    @deprecated
    Deprecated

    (Since version 2.12.10) Use foldRight instead of :\
