org.apache.spark.sql.rapids

GpuDynamicPartitionDataConcurrentWriter

class GpuDynamicPartitionDataConcurrentWriter extends GpuDynamicPartitionDataSingleWriter with Logging

Dynamic partition writer with concurrent writers, meaning multiple concurrent writers are opened for writing.

The process has the following steps:

  • Step 1: Maintain a map of output writers, one per set of partition column values, and keep all writers open. Cache each input batch by splitting it into sub-groups, so that each partition holds a list of spillable sub-groups. If the total cached data exceeds the limit, find the partition with the most pending data and write it out.
  • Step 2: If the number of concurrent writers exceeds the limit, fall back to the sort-based write (GpuDynamicPartitionDataSingleWriter): sort the rest of the batches on the partition columns, write them batch by batch, and eagerly close each writer when it finishes.

Callers are expected to call writeWithIterator() instead of write() to write records. Note: when falling back to GpuDynamicPartitionDataSingleWriter, the single writer should restore the un-closed writers and handle the un-flushed spillable caches.
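The two steps above can be sketched with plain Scala collections. This is only an illustrative model under stated assumptions, not the plugin's implementation: SketchWriter, ConcurrentThenSortedSketch, maxWriters and maxCachedRows are hypothetical names, rows are modelled as (partitionKey, value) pairs, and there is no GPU memory or spilling involved.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an output writer; it only records what it receives.
final class SketchWriter(val partition: String) {
  val rows: mutable.ArrayBuffer[Int] = mutable.ArrayBuffer.empty[Int]
  var closed: Boolean = false
  def write(values: Seq[Int]): Unit = {
    require(!closed, s"writer for $partition is already closed")
    rows ++= values
  }
  def close(): Unit = { closed = true }
}

// Hypothetical model of "concurrent writers first, sort-based fallback second".
final class ConcurrentThenSortedSketch(maxWriters: Int, maxCachedRows: Int) {
  val writers = mutable.LinkedHashMap.empty[String, SketchWriter]
  private val pending = mutable.Map.empty[String, mutable.ArrayBuffer[Seq[Int]]]
  private var cachedRows = 0
  var fellBack = false

  private def writerFor(key: String): SketchWriter =
    writers.getOrElseUpdate(key, new SketchWriter(key))

  // Write out every cached sub-group of one partition.
  private def flush(key: String): Unit =
    pending.remove(key).foreach { groups =>
      groups.foreach { g => writerFor(key).write(g); cachedRows -= g.size }
    }

  // Step 1 helper: pick the partition with the most pending rows and write it.
  private def flushLargest(): Unit =
    flush(pending.maxBy { case (_, groups) => groups.map(_.size).sum }._1)

  def writeWithIterator(batches: Iterator[Seq[(String, Int)]]): Unit = {
    // Step 1: one open writer per partition, batches cached as sub-groups.
    while (!fellBack && batches.hasNext) {
      batches.next().groupBy(_._1).foreach { case (key, rows) =>
        writerFor(key)
        pending.getOrElseUpdate(key, mutable.ArrayBuffer.empty[Seq[Int]]) += rows.map(_._2)
        cachedRows += rows.size
      }
      while (cachedRows > maxCachedRows) flushLargest()
      if (writers.size > maxWriters) fellBack = true
    }
    // Un-flushed caches are handled before any sorted write happens.
    pending.keys.toList.foreach(flush)
    if (fellBack) {
      // Step 2: sort what is left on the partition key, reuse un-closed writers,
      // and close each writer as soon as its partition is finished.
      val rest = batches.flatMap(_.iterator).toVector.sortBy(_._1)
      var current: Option[SketchWriter] = None
      rest.foreach { case (key, value) =>
        if (!current.exists(_.partition == key)) {
          current.foreach(_.close())
          current = Some(writerFor(key))
        }
        current.foreach(_.write(Seq(value)))
      }
      current.foreach(_.close())
    }
  }

  def close(): Unit = writers.values.foreach(_.close())
}
```

In this model, a writer that was opened before the fallback but receives no further rows stays open until close() is called, which is why the fallback path has to know about the writers map.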
Linear Supertypes
  1. Logging
  2. GpuDynamicPartitionDataSingleWriter
  3. GpuFileFormatDataWriter
  4. DataWriter
  5. Closeable
  6. AutoCloseable
  7. AnyRef
  8. Any

Instance Constructors

  1. new GpuDynamicPartitionDataConcurrentWriter(description: GpuWriteJobDescription, taskAttemptContext: TaskAttemptContext, committer: FileCommitProtocol, spec: GpuConcurrentOutputWriterSpec)

Type Members

  1. class WriterIndex extends Product2[Option[String], Option[Int]]

    Wrapper class to index a unique concurrent output writer.

    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  2. class WriterAndStatus extends AnyRef
    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
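As an illustration of how a (partition path, bucket id) pair can index writers, here is a minimal sketch of a Product2-based key. It is a hypothetical re-creation: SketchWriterIndex is not the plugin's class, and the equals and hashCode definitions are assumptions added so that the key behaves as a value in a map; the real WriterIndex may differ.

```scala
// Hypothetical key type mirroring the documented signature
// `extends Product2[Option[String], Option[Int]]`.
final class SketchWriterIndex(val partitionPath: Option[String], val bucketId: Option[Int])
    extends Product2[Option[String], Option[Int]] {
  override def _1: Option[String] = partitionPath
  override def _2: Option[Int] = bucketId
  override def canEqual(that: Any): Boolean = that.isInstanceOf[SketchWriterIndex]

  // Assumption: value equality, so two indexes for the same partition and
  // bucket refer to the same writer when used as map keys.
  override def equals(other: Any): Boolean = other match {
    case that: SketchWriterIndex => _1 == that._1 && _2 == that._2
    case _ => false
  }
  override def hashCode(): Int = (_1, _2).hashCode()
}
```

With value equality in place, a lookup such as `writersByIndex.get(new SketchWriterIndex(Some("p=1"), None))` finds the entry stored under an equal key, where `writersByIndex` is any hypothetical map keyed by this type.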

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val MAX_FILE_COUNTER: Int

    Max number of files a single task writes out due to file size. In most cases the number of files written should be very small. This is just a safeguard against some really bad settings, e.g. maxRecordsPerFile = 1.

    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  5. def abort(): Unit
    Definition Classes
    GpuFileFormatDataWriter → DataWriter
  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  8. def close(): Unit
    Definition Classes
    GpuFileFormatDataWriter → Closeable → AutoCloseable
  9. def commit(): WriteTaskResult

    Returns a summary of the relevant information, which includes the list of partition strings written out. The list of partitions is sent back to the driver and used to update the catalog. Other information is also sent back to the driver and used, for example, to update the metrics in the UI.

    Definition Classes
    GpuFileFormatDataWriter → DataWriter
  10. def copyToHostAsBatch(input: Table, colTypes: Array[DataType]): ColumnarBatch
    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  11. def currentMetricsValues(): Array[CustomTaskMetric]
    Definition Classes
    DataWriter
  12. var currentWriterStatus: WriterAndStatus
    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  13. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  14. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  15. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. def genGetBucketIdFunc(keyHostCb: ColumnarBatch): (Int) ⇒ Option[Int]
    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  17. def genGetPartitionPathFunc(keyHostCb: ColumnarBatch): (Int) ⇒ Option[String]
    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  18. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  19. lazy val getDataColumnsAsBatch: (ColumnarBatch) ⇒ ColumnarBatch

    Extracts the output values of an input batch.

    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  20. def getKeysBatch(cb: ColumnarBatch): ColumnarBatch
    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  21. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  22. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  23. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  24. val isBucketed: Boolean

    Flag saying whether or not the data to be written out is bucketed.

    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  25. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  26. val isPartitioned: Boolean

    Flag saying whether or not the data to be written out is partitioned.

    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  27. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  28. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  29. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  30. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  31. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  32. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  33. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  34. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  35. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  36. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  37. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  38. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  39. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  40. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  41. def newWriter(partDir: Option[String], bucketId: Option[Int], fileCounter: Int): ColumnarOutputWriter

    Opens a new OutputWriter given a partition key and/or a bucket id. If a bucket id is specified, it is appended to the end of the file name, but before the file extension, e.g. part-r-00009-ea518ad4-455a-4431-b471-d24e03814677-00002.gz.parquet

    partDir

    the partition directory

    bucketId

    the bucket to which all tuples written by this OutputWriter belong. Bucket ids are not currently supported, so this is always None

    fileCounter

    integer indicating the number of files to be written to partDir

    Definition Classes
    GpuDynamicPartitionDataSingleWriter
    Annotations
    @nowarn()
  42. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  43. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  44. def preUpdateCurrentWriterStatus(curWriterId: WriterIndex): Unit

    This is for the fallback case, used to clean the writers map.

    curWriterId

    the current writer index

    Definition Classes
    GpuDynamicPartitionDataConcurrentWriter → GpuDynamicPartitionDataSingleWriter
  45. final def releaseOutWriter(status: WriterAndStatus): Unit

    Releases the resources of a WriterAndStatus.

    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  46. def releaseResources(): Unit

    Release all resources.

    Definition Classes
    GpuDynamicPartitionDataConcurrentWriter → GpuFileFormatDataWriter
  47. final def renewOutWriter(newWriterId: WriterIndex, curWriterStatus: WriterAndStatus, closeOldWriter: Boolean = true): Unit

    Create a new writer according to the given writer id, and update the given writer status. It also closes the old writer in the writer status by default.

    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  48. def setupCurrentWriter(newWriterId: WriterIndex, writerStatus: WriterAndStatus, closeOldWriter: Boolean): Unit

    Used in the fallback case: tries to find the writer in the cache first.

    Definition Classes
    GpuDynamicPartitionDataConcurrentWriter → GpuDynamicPartitionDataSingleWriter
  49. val statsTrackers: Seq[ColumnarWriteTaskStatsTracker]

    Trackers for computing various statistics on the data as it's being written out.

    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  50. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  51. def toString(): String
    Definition Classes
    AnyRef → Any
  52. val updatedPartitions: Set[String]
    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  53. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  54. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  55. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  56. def write(cb: ColumnarBatch): Unit

    The write path of concurrent writers

    cb

    the columnar batch to be written

    Definition Classes
    GpuDynamicPartitionDataConcurrentWriter → GpuDynamicPartitionDataSingleWriter → GpuFileFormatDataWriter → DataWriter
  57. final def writeBatchPerMaxRecordsAndClose(scb: SpillableColumnarBatch, writerId: WriterIndex, writerStatus: WriterAndStatus): Unit
    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  58. final def writeUpdateMetricsAndClose(scb: SpillableColumnarBatch, writerStatus: WriterAndStatus): Unit
    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  59. def writeWithIterator(iterator: Iterator[ColumnarBatch]): Unit

    Writes an iterator of columnar batches.

    Definition Classes
    GpuDynamicPartitionDataConcurrentWriter → GpuFileFormatDataWriter
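The members listed above (writeWithIterator, commit, abort, close) suggest the usual writer lifecycle: write everything, commit on success, abort on failure, and always close. The sketch below shows that pattern against a hypothetical SketchDataWriter trait that only mirrors those member names; it is not Spark's or the plugin's task code, and WriterLifecycle.runTask is an assumed helper name.

```scala
// Hypothetical trait exposing only the lifecycle members listed on this page.
trait SketchDataWriter[T, R] {
  def writeWithIterator(iterator: Iterator[T]): Unit
  def commit(): R
  def abort(): Unit
  def close(): Unit
}

object WriterLifecycle {
  // Drives one writer over all batches of a task.
  def runTask[T, R](writer: SketchDataWriter[T, R], batches: Iterator[T]): R =
    try {
      try {
        // The concurrent writer expects writeWithIterator() rather than
        // per-batch write() calls, so the whole iterator is handed over.
        writer.writeWithIterator(batches)
        writer.commit()
      } catch {
        case t: Throwable =>
          // Give the writer a chance to discard partial output, then rethrow.
          writer.abort()
          throw t
      }
    } finally {
      // Resources are released whether the task succeeded or failed.
      writer.close()
    }
}
```

Passing the whole iterator matters for this class because the sort-based fallback needs access to the batches that have not been consumed yet.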
