class GpuDynamicPartitionDataSingleWriter extends GpuFileFormatDataWriter
Dynamic partition writer with a single writer: only one writer is open at any time, yet this single writer can write to multiple directories (partitions) or files (bucketing). The data to be written must be sorted on the partition and/or bucket column(s) before writing.
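A single writer is enough because sorted input delivers every partition (or bucket) key as one contiguous run. The writer therefore only has to close the current file and open the next one when the key changes. The sketch below illustrates this on plain Scala sequences rather than GPU columnar batches. `SortedRunSplitter` and `runs` are hypothetical names used for illustration, not part of the spark-rapids API.

```scala
// Hypothetical illustration only: models partition keys as plain values
// instead of columns in a ColumnarBatch.
object SortedRunSplitter {
  /** Splits a key sequence into (key, startIndex, length) runs of equal
    * adjacent keys. When the input is sorted, each distinct key yields
    * exactly one run, so opening one writer per run, one at a time, suffices. */
  def runs[K](keys: IndexedSeq[K]): Seq[(K, Int, Int)] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[(K, Int, Int)]
    var start = 0
    while (start < keys.length) {
      var end = start + 1
      // Extend the run while the key stays the same.
      while (end < keys.length && keys(end) == keys(start)) end += 1
      out += ((keys(start), start, end - start))
      start = end
    }
    out.toSeq
  }
}
```

For example, `SortedRunSplitter.runs(Vector("a=1", "a=1", "a=2"))` yields two runs, one per partition, while the unsorted `Vector("a=1", "a=2", "a=1")` yields three, which would force the writer to reopen `a=1`. That is why the input must be sorted beforehand.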
Linear Supertypes: GpuFileFormatDataWriter, DataWriter, Closeable, AutoCloseable, AnyRef, Any
Instance Constructors
- new GpuDynamicPartitionDataSingleWriter(description: GpuWriteJobDescription, taskAttemptContext: TaskAttemptContext, committer: FileCommitProtocol)
Type Members
- class WriterIndex extends Product2[Option[String], Option[Int]]
Wrapper class to index a unique concurrent output writer, keyed by an optional partition path and an optional bucket id.
- Attributes
- protected
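Because `WriterIndex` extends `Product2[Option[String], Option[Int]]`, it can be compared and hashed by its two components. Assuming those components are an optional partition path and an optional bucket id (an inference from the parameters of `newWriter`), a hypothetical stand-in might look like the following. `WriterIndexSketch` is an illustrative name only, not the real class.

```scala
// Hypothetical stand-in for WriterIndex: identifies one output writer by its
// optional partition path (_1) and optional bucket id (_2).
class WriterIndexSketch(val partitionPath: Option[String], val bucketId: Option[Int])
    extends Product2[Option[String], Option[Int]] {
  override def _1: Option[String] = partitionPath
  override def _2: Option[Int] = bucketId

  override def canEqual(that: Any): Boolean = that.isInstanceOf[WriterIndexSketch]

  // Two indices are equal when both components match, so they can serve as map keys.
  override def equals(obj: Any): Boolean = obj match {
    case o: WriterIndexSketch => o.canEqual(this) && _1 == o._1 && _2 == o._2
    case _ => false
  }

  override def hashCode(): Int = (partitionPath, bucketId).hashCode()
}
```

A case class would derive `equals` and `hashCode` automatically; the sketch spells them out to mirror the `Product2` signature.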
- class WriterAndStatus extends AnyRef
- Attributes
- protected
- Definition Classes
- GpuFileFormatDataWriter
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- val MAX_FILE_COUNTER: Int
Maximum number of files a single task writes out due to file size. In most cases the number of files written should be very small; this is just a safeguard against really bad settings, e.g. maxRecordsPerFile = 1.
- Attributes
- protected
- Definition Classes
- GpuFileFormatDataWriter
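The following minimal sketch shows how a file counter and a `MAX_FILE_COUNTER`-style safeguard might interact when a task splits its rows into files of at most `maxRecordsPerFile` records. `FileSplitSketch.plan` is a hypothetical helper that works on row counts; the real writer operates on columnar batches, and its exact check and limit value may differ.

```scala
// Hypothetical helper: plans how rows would be spread across files.
object FileSplitSketch {
  /** Splits `numRows` rows into (fileCounter, startRow, rowCount) chunks of at most
    * `maxRecordsPerFile` rows each. A non-positive limit means "no limit" (one file).
    * Fails if more than `maxFileCounter` files would be needed. */
  def plan(numRows: Long, maxRecordsPerFile: Long, maxFileCounter: Int): Seq[(Int, Long, Long)] = {
    if (numRows <= 0) Seq.empty
    else if (maxRecordsPerFile <= 0) Seq((0, 0L, numRows))
    else {
      val files = (numRows + maxRecordsPerFile - 1) / maxRecordsPerFile // ceiling division
      // Safeguard against pathological settings such as maxRecordsPerFile = 1.
      require(files <= maxFileCounter,
        s"File counter ${files - 1} is beyond max value $maxFileCounter")
      (0L until files).map { i =>
        val start = i * maxRecordsPerFile
        (i.toInt, start, math.min(maxRecordsPerFile, numRows - start))
      }
    }
  }
}
```

For instance, five rows with `maxRecordsPerFile = 2` are planned as three files holding 2, 2, and 1 rows, while `maxRecordsPerFile = 1` with a limit of three files is rejected.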
- def abort(): Unit
- Definition Classes
- GpuFileFormatDataWriter → DataWriter
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- def close(): Unit
- Definition Classes
- GpuFileFormatDataWriter → Closeable → AutoCloseable
- def commit(): WriteTaskResult
Returns a summary of the relevant information, including the list of partition strings written out. The list of partitions is sent back to the driver and used to update the catalog. Other information is sent back to the driver too and used, e.g., to update the metrics in the UI.
- Definition Classes
- GpuFileFormatDataWriter → DataWriter
- def copyToHostAsBatch(input: Table, colTypes: Array[DataType]): ColumnarBatch
- Attributes
- protected
- def currentMetricsValues(): Array[CustomTaskMetric]
- Definition Classes
- DataWriter
- var currentWriterStatus: WriterAndStatus
- Attributes
- protected
- Definition Classes
- GpuFileFormatDataWriter
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- def genGetBucketIdFunc(keyHostCb: ColumnarBatch): (Int) ⇒ Option[Int]
- Attributes
- protected
- def genGetPartitionPathFunc(keyHostCb: ColumnarBatch): (Int) ⇒ Option[String]
- Attributes
- protected
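Both `genGetBucketIdFunc` and `genGetPartitionPathFunc` return a function from a row index to an optional key. A hypothetical sketch of the partition-path variant, built over host-side key columns represented as plain arrays instead of a `ColumnarBatch`, might render Hive-style `column=value` segments. Using `__HIVE_DEFAULT_PARTITION__` for nulls follows Spark's convention, but path escaping is deliberately omitted, so treat this as an approximation rather than the actual implementation.

```scala
// Hypothetical sketch: key columns are (columnName, values) pairs on the host.
object PartitionPathSketch {
  // Spark's conventional directory name for a null partition value.
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  /** Builds a row-index => partition-path function. Returns None for every row
    * when there are no partition columns. Path escaping is omitted here. */
  def genGetPartitionPathFunc(keyCols: Seq[(String, Array[String])]): Int => Option[String] = {
    if (keyCols.isEmpty) {
      (_: Int) => None
    } else {
      (row: Int) => Some(keyCols.map { case (name, values) =>
        val v = values(row)
        s"$name=${if (v == null) DefaultPartitionName else v}"
      }.mkString("/"))
    }
  }
}
```

The returned closure captures the key columns once, so callers can look up the path of any row in the batch by index.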
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- lazy val getDataColumnsAsBatch: (ColumnarBatch) ⇒ ColumnarBatch
Extracts the output values of an input batch.
- Attributes
- protected
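`getDataColumnsAsBatch` is a lazily built function. Assuming that, as in Spark's row-based file writers, the output values are the non-partition (data) columns, whose partition values are already encoded in the directory path, a hypothetical row-level projection could be sketched as follows. `DataColumnsSketch` works on `IndexedSeq[Any]` rows instead of columnar batches.

```scala
// Hypothetical sketch: projects away partition columns from a row.
object DataColumnsSketch {
  /** Returns the positions of the data (non-partition) columns within `allColumns`. */
  def dataColumnOrdinals(allColumns: Seq[String], partitionColumns: Set[String]): Seq[Int] =
    allColumns.indices.filterNot(i => partitionColumns.contains(allColumns(i)))

  /** Builds a projection that keeps only the data columns of a row. */
  def dataProjection(allColumns: Seq[String],
      partitionColumns: Set[String]): IndexedSeq[Any] => IndexedSeq[Any] = {
    val ordinals = dataColumnOrdinals(allColumns, partitionColumns) // computed once
    row => ordinals.map(row).toIndexedSeq
  }
}
```

Computing the ordinals once and reusing the returned function mirrors why a `lazy val` is a natural fit: the projection depends only on the schema, not on each batch.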
- def getKeysBatch(cb: ColumnarBatch): ColumnarBatch
- Attributes
- protected
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- val isBucketed: Boolean
Flag saying whether or not the data to be written out is bucketed.
- Attributes
- protected
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- val isPartitioned: Boolean
Flag saying whether or not the data to be written out is partitioned.
- Attributes
- protected
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def newWriter(partDir: Option[String], bucketId: Option[Int], fileCounter: Int): ColumnarOutputWriter
Opens a new OutputWriter given a partition key and/or a bucket id. If a bucket id is specified, it is appended to the end of the file name, but before the file extension, e.g. part-r-00009-ea518ad4-455a-4431-b471-d24e03814677-00002.gz.parquet
- partDir
the partition directory
- bucketId
the bucket which all tuples being written by this OutputWriter belong to; bucketId is currently not supported, so it is always None
- fileCounter
integer indicating the number of files to be written to partDir
- Annotations
- @nowarn()
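The file-name layout from the example in the `newWriter` description can be reproduced with a small, hypothetical helper. The field order and five-digit zero padding are read off that example name only; in practice the name is produced through the commit protocol and may carry extra parts, and this writer currently always passes `bucketId = None`.

```scala
// Hypothetical helper mirroring the example name in the newWriter description.
object FileNameSketch {
  /** Builds part-r-<split>-<jobUuid>[-<bucketId>]<ext>, inserting the
    * zero-padded bucket id (if any) just before the file extension. */
  def fileName(split: Int, jobUuid: String, bucketId: Option[Int], ext: String): String = {
    val bucketPart = bucketId.map(id => f"-$id%05d").getOrElse("")
    f"part-r-$split%05d-$jobUuid$bucketPart$ext"
  }
}
```

With `split = 9`, bucket id `2`, and extension `.gz.parquet`, the helper reproduces the example name exactly; with `bucketId = None` the bucket segment is simply omitted.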
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def preUpdateCurrentWriterStatus(curWriterId: WriterIndex): Unit
Called just before updating the current writer status when seeing a new partition or a bucket.
- curWriterId
the current writer index
- Attributes
- protected
- final def releaseOutWriter(status: WriterAndStatus): Unit
Releases the resources held by a WriterAndStatus.
- Attributes
- protected
- Definition Classes
- GpuFileFormatDataWriter
- def releaseResources(): Unit
Releases all resources. Public for testing.
- Definition Classes
- GpuFileFormatDataWriter
- final def renewOutWriter(newWriterId: WriterIndex, curWriterStatus: WriterAndStatus, closeOldWriter: Boolean = true): Unit
Creates a new writer according to the given writer id and updates the given writer status. By default, it also closes the old writer held in the writer status.
- Attributes
- protected
- def setupCurrentWriter(newWriterId: WriterIndex, curWriterStatus: WriterAndStatus, closeOldWriter: Boolean = true): Unit
Sets up a writer in the given writer status for the given writer id, creating a new one if needed. This is used when a new partition and/or a new bucket id is seen.
- Attributes
- protected
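A simplified model of the writer switch performed by `setupCurrentWriter` and `renewOutWriter` is sketched below: the open writer is kept while the writer id is unchanged, and by default the old writer is closed before a new one is opened. `FakeWriter`, `Status`, and the string writer ids are hypothetical stand-ins for `ColumnarOutputWriter`, `WriterAndStatus`, and `WriterIndex`; the real methods may differ in detail.

```scala
// Hypothetical model of the single-writer switch; records open/close events in a log.
object WriterLifecycleSketch {
  import scala.collection.mutable.ArrayBuffer

  // Stand-in for an output writer: logs when it is opened and closed.
  final class FakeWriter(val id: String, log: ArrayBuffer[String]) {
    log += s"open $id"
    def close(): Unit = log += s"close $id"
  }

  // Mutable holder for the currently open writer, loosely like WriterAndStatus.
  final class Status(var writer: Option[FakeWriter] = None, var id: Option[String] = None)

  /** Switches `status` to a writer for `newId`, closing the old one first unless
    * `closeOldWriter` is false. Keeps the open writer when the id is unchanged. */
  def setupCurrentWriter(newId: String, status: Status, log: ArrayBuffer[String],
      closeOldWriter: Boolean = true): Unit = {
    if (!status.id.contains(newId)) {
      if (closeOldWriter) status.writer.foreach(_.close())
      status.writer = Some(new FakeWriter(newId, log))
      status.id = Some(newId)
    }
  }
}
```

Feeding the ids `a=1`, `a=1`, `a=2` in order opens `a=1` once, then closes it and opens `a=2`, which is exactly the pattern sorted input makes possible.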
- val statsTrackers: Seq[ColumnarWriteTaskStatsTracker]
Trackers for computing various statistics on the data as it's being written out.
- Attributes
- protected
- Definition Classes
- GpuFileFormatDataWriter
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- val updatedPartitions: Set[String]
- Attributes
- protected
- Definition Classes
- GpuFileFormatDataWriter
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- def write(batch: ColumnarBatch): Unit
Writes a columnar batch of records.
- Definition Classes
- GpuDynamicPartitionDataSingleWriter → GpuFileFormatDataWriter → DataWriter
- final def writeBatchPerMaxRecordsAndClose(scb: SpillableColumnarBatch, writerId: WriterIndex, writerStatus: WriterAndStatus): Unit
- Attributes
- protected
- final def writeUpdateMetricsAndClose(scb: SpillableColumnarBatch, writerStatus: WriterAndStatus): Unit
- Attributes
- protected
- Definition Classes
- GpuFileFormatDataWriter
- def writeWithIterator(iterator: Iterator[ColumnarBatch]): Unit
Writes an iterator of columnar batches.
- Definition Classes
- GpuFileFormatDataWriter