org.apache.spark.sql.rapids

GpuDynamicPartitionDataConcurrentWriter

class GpuDynamicPartitionDataConcurrentWriter extends GpuDynamicPartitionDataSingleWriter

Dynamic partition writer that keeps multiple output writers open concurrently.

The process has the following steps:

  • Step 1: Maintain a map of output writers keyed by partition column values and keep all writers open. Cache the input batches by splitting them into sub-groups, so that each partition holds a list of spillable sub-groups. If the total cache size exceeds the limit, find the partition with the most pending data and write it out.
  • Step 2: If the number of concurrent writers exceeds the limit, fall back to the sort-based write (GpuDynamicPartitionDataSingleWriter): sort the remaining batches on the partition columns, write batch by batch, and eagerly close each writer when it finishes.

Callers are expected to call writeWithIterator() instead of write() to write records. Note: when falling back to GpuDynamicPartitionDataSingleWriter, the single writer should restore the unclosed writers and handle the unflushed spillable caches.
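
The two steps above can be sketched in plain Scala. This is a minimal, CPU-only illustration under stated assumptions, not the plugin's implementation: rows are (partition, value) pairs instead of GPU columnar batches, Sink is a hypothetical in-memory stand-in for an output writer, and maxCachedRows is a hypothetical cache limit counted in rows rather than bytes. Only maxWriters corresponds to a name used in this page.

```scala
import scala.collection.mutable

object ConcurrentWriteSketch {
  type Row = (String, Int) // (partition key, value); hypothetical stand-in for a batch row

  // Hypothetical in-memory stand-in for an output writer.
  final class Sink(val partition: String) {
    val rows = mutable.ArrayBuffer.empty[Int]
    private var closed = false
    def write(values: Seq[Int]): Unit = {
      require(!closed, s"writer for $partition is already closed")
      rows ++= values
    }
    def close(): Unit = { closed = true }
  }

  /** Returns the values written per partition and whether the fallback was taken. */
  def writeAll(
      batches: Iterator[Seq[Row]],
      maxWriters: Int,
      maxCachedRows: Int): (Map[String, List[Int]], Boolean) = {
    require(maxCachedRows >= 0, "maxCachedRows must be non-negative")
    val writers = mutable.LinkedHashMap.empty[String, Sink]
    val caches = mutable.LinkedHashMap.empty[String, mutable.ArrayBuffer[Int]]
    var fellBack = false

    // Write out everything cached for one partition.
    def flush(partition: String): Unit =
      caches.remove(partition).foreach(pending => writers(partition).write(pending.toList))

    while (!fellBack && batches.hasNext) {
      val batch = batches.next()
      val groups = batch.groupBy(_._1)
      val newPartitions = groups.keys.count(p => !writers.contains(p))
      if (writers.size + newPartitions > maxWriters) {
        // Step 2: too many writers. Flush the pending caches, then sort the
        // remaining rows by partition and write them one partition at a time.
        fellBack = true
        caches.keys.toList.foreach(flush)
        val rest = (batch.iterator ++ batches.flatMap(b => b)).toVector.sortBy(_._1)
        var current: Option[Sink] = None
        rest.foreach { case (partition, value) =>
          if (!current.exists(_.partition == partition)) {
            current.foreach(_.close()) // eagerly close the finished writer
            // Reuse a writer left open by step 1, or open a new one.
            current = Some(writers.getOrElseUpdate(partition, new Sink(partition)))
          }
          current.foreach(_.write(List(value)))
        }
      } else {
        // Step 1: keep one open writer per partition and cache the sub-groups.
        groups.foreach { case (partition, rows) =>
          writers.getOrElseUpdate(partition, new Sink(partition))
          caches.getOrElseUpdate(partition, mutable.ArrayBuffer.empty[Int]) ++= rows.map(_._2)
        }
        // While the cache is over the limit, write out the largest pending partition.
        while (caches.values.map(_.size).sum > maxCachedRows) {
          val (largest, _) = caches.maxBy(_._2.size)
          flush(largest)
        }
      }
    }
    caches.keys.toList.foreach(flush)
    writers.values.foreach(_.close())
    (writers.map { case (p, sink) => p -> sink.rows.toList }.toMap, fellBack)
  }
}
```

In this sketch, the batch that would push the writer count over maxWriters is handed to the sort-based path as a whole, together with all batches after it, which mirrors the note on write(cb) in this page. Because the remaining rows are sorted by partition, each writer can be closed as soon as its partition ends.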

Instance Constructors

  1. new GpuDynamicPartitionDataConcurrentWriter(description: GpuWriteJobDescription, taskAttemptContext: TaskAttemptContext, committer: FileCommitProtocol, spec: GpuConcurrentOutputWriterSpec, taskContext: TaskContext)

Type Members

  1. class WriterStatus extends AnyRef

    Wrapper class for status of a unique single output writer.

    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  2. class WriterStatusWithCaches extends AnyRef

    Wrapper class for status and caches of a unique concurrent output writer. Used by GpuDynamicPartitionDataConcurrentWriter.

    Definition Classes
    GpuDynamicPartitionDataSingleWriter

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val MAX_FILE_COUNTER: Int

    Maximum number of files a single task may write out because of file size limits. In most cases the number of files written should be very small. This is just a safeguard against really bad settings, e.g. maxRecordsPerFile = 1.

    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  5. def abort(): Unit
  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  8. def close(): Unit
    Definition Classes
    GpuFileFormatDataWriter → Closeable → AutoCloseable
  9. def closeCachesAndWriters(): Unit
  10. def commit(): WriteTaskResult

    Returns a summary of the relevant information, which includes the list of partition strings written out. The list of partitions is sent back to the driver and used to update the catalog. Other information is also sent back to the driver and used to, e.g., update the metrics in the UI.

    Definition Classes
    GpuFileFormatDataWriter → DataWriter
  11. def copyToHostAsBatch(input: Table, colTypes: Array[DataType]): ColumnarBatch
    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  12. var currentWriter: ColumnarOutputWriter
    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  13. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  14. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  15. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  17. lazy val getOutputCb: (ColumnarBatch) ⇒ ColumnarBatch

    Extracts the output values of an input batch.

    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  18. lazy val getPartitionColumnsAsBatch: (ColumnarBatch) ⇒ ColumnarBatch

    Extracts the partition values out of an input batch.

    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  19. lazy val getPartitionPath: (InternalRow) ⇒ String

    Evaluates partitionPathExpression on a row of partitionValues and returns the partition string.

    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  20. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  21. val isBucketed: Boolean

    Flag saying whether or not the data to be written out is bucketed.

    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  22. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  23. val isPartitioned: Boolean

    Flag saying whether or not the data to be written out is partitioned.

    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  24. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  25. def newWriter(partDir: String, bucketId: Option[Int], fileCounter: Int): ColumnarOutputWriter

    Opens a new OutputWriter given a partition key and/or a bucket id. If a bucket id is specified, it is appended to the end of the file name, but before the file extension, e.g. part-r-00009-ea518ad4-455a-4431-b471-d24e03814677-00002.gz.parquet

    partDir

    the partition directory

    bucketId

    the bucket to which all tuples written by this OutputWriter belong. Bucketing is currently not supported here, so this is always None

    fileCounter

    integer indicating the number of files to be written to partDir

    Definition Classes
    GpuDynamicPartitionDataSingleWriter
    Annotations
    @nowarn()
  26. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  27. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  28. def releaseCurrentWriter(): Unit

    Release resources of currentWriter.

    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  29. def releaseResources(): Unit

    Release all resources.

  30. val statsTrackers: Seq[ColumnarWriteTaskStatsTracker]

    Trackers for computing various statistics on the data as it's being written out.

    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  31. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  32. def toString(): String
    Definition Classes
    AnyRef → Any
  33. val updatedPartitions: Set[String]
    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  34. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  35. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  37. def write(cb: ColumnarBatch): Unit

    Writes the columnar batch concurrently. Note: if the number of new partitions in cb plus the number of existing partitions is greater than the maxWriters limit, the whole cb is handed back to the single writer.

    cb

    the columnar batch

    Definition Classes
    GpuDynamicPartitionDataConcurrentWriter → GpuDynamicPartitionDataSingleWriter → GpuFileFormatDataWriter → DataWriter
  38. def write(batch: ColumnarBatch, cachesMap: Option[HashMap[String, WriterStatusWithCaches]]): Unit

    Writes a columnar batch. If cachesMap is not empty, this single writer should restore the writers and caches in cachesMap, and should combine the caches with the current split data for a specific partition before writing.

    cachesMap

    used by GpuDynamicPartitionDataConcurrentWriter when falling back to the single writer; the single writer should handle the stored writers and the pending caches

    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  39. def writeUpdateMetricsAndClose(writerStatus: WriterStatus, spillableBatch: SpillableColumnarBatch): Unit
    Attributes
    protected
    Definition Classes
    GpuDynamicPartitionDataSingleWriter
  40. def writeWithIterator(iterator: Iterator[ColumnarBatch]): Unit

    Writes an iterator of columnar batches.

    iterator

    the iterator of columnar batches

    Definition Classes
    GpuDynamicPartitionDataConcurrentWriter → GpuFileFormatDataWriter
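
The relationship between MAX_FILE_COUNTER and the fileCounter argument of newWriter can be illustrated with a small sketch. It assumes that a writer rolls over to a new file once maxRecordsPerFile records have gone into the current one, and that the counter is checked against the limit on each rollover. The limit value, the per-record granularity, and the assignFiles helper are all hypothetical; the real writer operates on columnar batches.

```scala
object FileRolloverSketch {
  // Hypothetical value; the real constant is defined in GpuFileFormatDataWriter.
  val MaxFileCounter = 1000

  /** Returns, for each record in order, the file counter of the file it lands in. */
  def assignFiles(numRecords: Int, maxRecordsPerFile: Int): Seq[Int] = {
    var fileCounter = 0
    var recordsInFile = 0
    (0 until numRecords).map { _ =>
      // A non-positive maxRecordsPerFile is treated as "no limit" in this sketch.
      if (maxRecordsPerFile > 0 && recordsInFile >= maxRecordsPerFile) {
        fileCounter += 1
        // Safeguard against settings that would produce a huge number of files.
        assert(fileCounter < MaxFileCounter,
          s"File counter $fileCounter is beyond max value $MaxFileCounter")
        recordsInFile = 0
      }
      recordsInFile += 1
      fileCounter
    }
  }
}
```

With maxRecordsPerFile = 1, every record starts a new file, which is the kind of setting the safeguard is meant to catch.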
