org.apache.spark.sql.rapids

GpuDynamicPartitionDataSingleWriter

class GpuDynamicPartitionDataSingleWriter extends GpuFileFormatDataWriter

Dynamic partition writer that uses a single writer: only one output writer is open at any time, yet this one data writer can write to multiple directories (partitions) and/or files (bucketing). The data to be written must be sorted on the partition and/or bucket column(s) before writing.
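
The sorted-input requirement above is what makes a single open writer sufficient. The following is a minimal sketch of that idea only, not the actual GPU implementation: `ToyWriter`, `SingleWriterSketch`, and `writeSorted` are hypothetical names, and rows are modeled as plain `(partition, value)` pairs instead of columnar batches.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for an output writer; it only records what happened.
final class ToyWriter(val partition: String, log: ArrayBuffer[String]) {
  log += s"open $partition"
  def write(row: String): Unit = log += s"write $partition:$row"
  def close(): Unit = log += s"close $partition"
}

object SingleWriterSketch {
  // Writes rows that are assumed to be sorted by partition key, keeping at
  // most one ToyWriter open at any time. Returns the event log.
  def writeSorted(rows: Seq[(String, String)]): List[String] = {
    val log = ArrayBuffer.empty[String]
    var current: Option[ToyWriter] = None
    for ((partition, row) <- rows) {
      // With sorted input, a change of key means the previous partition is
      // complete, so its writer can be closed before the next one is opened.
      if (!current.exists(_.partition == partition)) {
        current.foreach(_.close())
        current = Some(new ToyWriter(partition, log))
      }
      current.foreach(_.write(row))
    }
    current.foreach(_.close())
    log.toList
  }
}
```

If the input were not sorted, the same partition key would reappear after its writer had been closed and the sketch would open that partition a second time, which is why the sort is a precondition.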

Linear Supertypes
  1. GpuFileFormatDataWriter
  2. DataWriter
  3. Closeable
  4. AutoCloseable
  5. AnyRef
  6. Any

Instance Constructors

  1. new GpuDynamicPartitionDataSingleWriter(description: GpuWriteJobDescription, taskAttemptContext: TaskAttemptContext, committer: FileCommitProtocol)

Type Members

  1. class WriterIndex extends Product2[Option[String], Option[Int]]

    Wrapper class to index a unique concurrent output writer.

    Attributes
    protected
  2. class WriterAndStatus extends AnyRef
    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val MAX_FILE_COUNTER: Int

    Maximum number of files a single task writes out due to file size. In most cases the number of files written should be very small. This is just a safeguard against really bad settings, e.g. maxRecordsPerFile = 1.

    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  5. def abort(): Unit
    Definition Classes
    GpuFileFormatDataWriter → DataWriter
  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  8. def close(): Unit
    Definition Classes
    GpuFileFormatDataWriter → Closeable → AutoCloseable
  9. def commit(): WriteTaskResult

    Returns a summary of relevant information, which includes the list of partition strings written out. The list of partitions is sent back to the driver and used to update the catalog. Other information is also sent back to the driver and used, e.g., to update the metrics in the UI.

    Definition Classes
    GpuFileFormatDataWriter → DataWriter
  10. def copyToHostAsBatch(input: Table, colTypes: Array[DataType]): ColumnarBatch
    Attributes
    protected
  11. def currentMetricsValues(): Array[CustomTaskMetric]
    Definition Classes
    DataWriter
  12. var currentWriterStatus: WriterAndStatus
    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  13. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  14. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  15. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. def genGetBucketIdFunc(keyHostCb: ColumnarBatch): (Int) ⇒ Option[Int]
    Attributes
    protected
  17. def genGetPartitionPathFunc(keyHostCb: ColumnarBatch): (Int) ⇒ Option[String]
    Attributes
    protected
  18. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  19. lazy val getDataColumnsAsBatch: (ColumnarBatch) ⇒ ColumnarBatch

    Extracts the output values of an input batch.

    Attributes
    protected
  20. def getKeysBatch(cb: ColumnarBatch): ColumnarBatch
    Attributes
    protected
  21. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  22. val isBucketed: Boolean

    Flag saying whether or not the data to be written out is bucketed.

    Attributes
    protected
  23. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  24. val isPartitioned: Boolean

    Flag saying whether or not the data to be written out is partitioned.

    Attributes
    protected
  25. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  26. def newWriter(partDir: Option[String], bucketId: Option[Int], fileCounter: Int): ColumnarOutputWriter

    Opens a new OutputWriter given a partition key and/or a bucket id. If bucket id is specified, we will append it to the end of the file name, but before the file extension, e.g. part-r-00009-ea518ad4-455a-4431-b471-d24e03814677-00002.gz.parquet

    partDir

    the partition directory

    bucketId

    the bucket that all tuples written by this OutputWriter belong to. A bucketId is not currently supported, so this is always None

    fileCounter

    integer indicating the number of files to be written to partDir

    Annotations
    @nowarn()
  27. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  28. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  29. def preUpdateCurrentWriterStatus(curWriterId: WriterIndex): Unit

    Called just before updating the current writer status when seeing a new partition or a bucket.

    curWriterId

    the current writer index

    Attributes
    protected
  30. final def releaseOutWriter(status: WriterAndStatus): Unit

    Releases the resources of a WriterAndStatus.

    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  31. def releaseResources(): Unit

    Releases all resources. Public for testing.

    Definition Classes
    GpuFileFormatDataWriter
  32. final def renewOutWriter(newWriterId: WriterIndex, curWriterStatus: WriterAndStatus, closeOldWriter: Boolean = true): Unit

    Create a new writer according to the given writer id, and update the given writer status. It also closes the old writer in the writer status by default.

    Attributes
    protected
  33. def setupCurrentWriter(newWriterId: WriterIndex, curWriterStatus: WriterAndStatus, closeOldWriter: Boolean = true): Unit

    Set up a writer to the given writer status for the given writer id. It will create a new one if needed. This is used when seeing a new partition and/or a new bucket id.

    Attributes
    protected
  34. val statsTrackers: Seq[ColumnarWriteTaskStatsTracker]

    Trackers for computing various statistics on the data as it's being written out.

    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  35. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  36. def toString(): String
    Definition Classes
    AnyRef → Any
  37. val updatedPartitions: Set[String]
    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  38. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  39. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  40. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  41. def write(batch: ColumnarBatch): Unit

    Writes a columnar batch of records.

    Definition Classes
    GpuDynamicPartitionDataSingleWriter → GpuFileFormatDataWriter → DataWriter
  42. final def writeBatchPerMaxRecordsAndClose(scb: SpillableColumnarBatch, writerId: WriterIndex, writerStatus: WriterAndStatus): Unit
    Attributes
    protected
  43. final def writeUpdateMetricsAndClose(scb: SpillableColumnarBatch, writerStatus: WriterAndStatus): Unit
    Attributes
    protected
    Definition Classes
    GpuFileFormatDataWriter
  44. def writeWithIterator(iterator: Iterator[ColumnarBatch]): Unit

    Writes an iterator of columnar batches.

    Definition Classes
    GpuFileFormatDataWriter
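
The MAX_FILE_COUNTER entry above describes a guard on how many files a task may produce when output is split by record count. The sketch below illustrates one way such a guard could behave. It is an assumption-laden model, not the library's code: `FileRolloverSketch`, `planFiles`, and the limit of 1000 are hypothetical, and the real value and failure mode of MAX_FILE_COUNTER are not stated on this page.

```scala
object FileRolloverSketch {
  // Hypothetical limit chosen for illustration only.
  val MaxFileCounter: Int = 1000

  // Splits `totalRecords` into per-file record counts of at most
  // `maxRecordsPerFile`, failing once the file counter reaches the limit.
  def planFiles(totalRecords: Long, maxRecordsPerFile: Long): Vector[Long] = {
    require(maxRecordsPerFile > 0, "maxRecordsPerFile must be positive")
    require(totalRecords >= 0, "totalRecords must be non-negative")
    val files = Vector.newBuilder[Long]
    var remaining = totalRecords
    var fileCounter = 0
    while (remaining > 0) {
      // Guard against settings such as maxRecordsPerFile = 1 on large inputs.
      require(fileCounter < MaxFileCounter,
        s"File counter $fileCounter is beyond max value $MaxFileCounter")
      val recordsInThisFile = math.min(remaining, maxRecordsPerFile)
      files += recordsInThisFile
      remaining -= recordsInThisFile
      fileCounter += 1
    }
    files.result()
  }
}
```

For example, `FileRolloverSketch.planFiles(10, 4)` plans three files, while `planFiles(5000, 1)` fails because it would need more files than the assumed limit allows.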
