package filesystem
Type Members
-
class
DeltaBulkBucketWriter[IN, BucketID] extends BulkBucketWriter[IN, BucketID]
A factory that creates DeltaBulkPartWriters.
This class is provided as part of a workaround for getting the actual file size.
Compared to its original version, BulkBucketWriter, it changes only the return types of the methods DeltaBulkBucketWriter#resumeFrom and DeltaBulkBucketWriter#openNew to a custom implementation of BulkPartWriter, namely DeltaBulkPartWriter.
-
class
DeltaBulkPartWriter[IN, BucketID] extends AbstractPartFileWriter[IN, BucketID]
This class is an implementation of InProgressFileWriter for writing elements to a part using BulkPartWriter. It also implements PartFileInfo.
An instance of this class represents one in-progress file that is currently "opened" by one of the io.delta.flink.sink.internal.writer.DeltaWriterBucket instances.
It is provided as a workaround for getting the actual size of an in-progress file right before transitioning it to a pending state ("closing").
The changed behaviour compared to the original BulkPartWriter is the addition of the DeltaBulkPartWriter#closeWriter method, which is called first during the "close" operation for an in-progress file. After calling it, we can safely get the actual file size and then call the DeltaBulkPartWriter#closeForCommit() method.
This workaround is needed because, for the Parquet format, the writer's buffer must be explicitly flushed before getting the file size (and there is also no easy way to track the bytes sent to the writer). If such a flush is not performed, PartFileInfo#getSize will report a file size that does not include the data buffered in the writer's memory (which in most cases is all the events consumed within the given checkpoint interval).
Lifecycle of instances of this class is as follows:
- Since it's a class member of DeltaInProgressPart, it shares its life span as well.
- Instances of this class are being created inside an io.delta.flink.sink.internal.writer.DeltaWriterBucket method every time a bucket processes the first event or if the previously opened file met the conditions for rolling (e.g. a size threshold).
- Its life span holds as long as the underlying file stays in an in-progress state (so until it's "rolled"), but no longer than a single checkpoint interval.
- During the pre-commit phase every existing DeltaInProgressPart instance is automatically transformed ("rolled") into a DeltaPendingFile instance.
This class is an almost exact copy of OutputStreamBasedPartFileWriter. The only modified behaviour is that the DeltaBulkPartWriter#closeWriter() method is extended to flush the internal buffer.
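The flush-before-size problem can be illustrated with a small, self-contained sketch. It uses neither Flink nor Parquet: `BufferingBulkWriter` and `SketchPartWriter` are hypothetical stand-ins, assumed to behave like a bulk encoder that keeps records in memory and like DeltaBulkPartWriter respectively, with an in-memory `ByteArrayOutputStream` playing the part file.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for a bulk encoder (e.g. a Parquet writer): records
// stay in an in-memory buffer and only reach the file stream on flush().
final class BufferingBulkWriter(file: ByteArrayOutputStream) {
  private val buffer = new ByteArrayOutputStream()

  def addElement(record: String): Unit =
    buffer.write(record.getBytes(StandardCharsets.UTF_8))

  def flush(): Unit = {
    buffer.writeTo(file) // copy the buffered bytes into the "file"
    buffer.reset()
  }
}

// Hypothetical analogue of DeltaBulkPartWriter; not the connector's code.
final class SketchPartWriter(val fileName: String) {
  private val file = new ByteArrayOutputStream() // plays the role of the part file
  private val writer = new BufferingBulkWriter(file)
  private var writerClosed = false

  def write(record: String): Unit = {
    require(!writerClosed, "writer is already closed")
    writer.addElement(record)
  }

  // Like PartFileInfo#getSize: counts only bytes that reached the file.
  def getSize: Long = file.size().toLong

  // Like the added closeWriter(): flush the buffer so getSize is accurate.
  def closeWriter(): Unit =
    if (!writerClosed) {
      writer.flush()
      writerClosed = true
    }

  // Like closeForCommit(): safe to call after closeWriter(); returns the final size.
  def closeForCommit(): Long = {
    closeWriter()
    getSize
  }
}

object FlushBeforeSizeDemo {
  def main(args: Array[String]): Unit = {
    val part = new SketchPartWriter("part-0.parquet")
    part.write("a" * 10)
    part.write("b" * 5)
    println(part.getSize) // 0: the data is still in the writer's buffer
    part.closeWriter()
    println(part.getSize) // 15: the size now includes the flushed data
  }
}
```

In this sketch, reading the size before `closeWriter()` reports 0 bytes even though 15 bytes were written, which is the situation the workaround avoids by flushing first and only then reading the size.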
-
class
DeltaInProgressPart[IN] extends AnyRef
Wrapper class for part files in the io.delta.flink.sink.DeltaSink. Part files are files that are currently "opened" for writing new data. Similar behaviour can be observed in org.apache.flink.connector.file.sink.FileSink; however, as opposed to the FileSink, in DeltaSink we need to keep the name of the file attached to the opened file, so that the DeltaInProgressPart instance can later be transformed into a DeltaPendingFile instance and the written file can finally be committed to the io.delta.standalone.DeltaLog during the global commit phase.
Additionally, we need the custom DeltaBulkPartWriter implementation as a workaround for getting the actual file size (which is currently not possible for bulk formats when operating on the interface level of PartFileInfo; see DeltaBulkPartWriter for details).
Lifecycle of instances of this class is as follows:
- Instances of this class are being created inside the io.delta.flink.sink.internal.writer.DeltaWriterBucket#rollPartFile method every time a bucket processes the first event or if the previously opened file met the conditions for rolling (e.g. a size threshold).
- Its life span holds as long as the underlying file stays in an in-progress state (so until it's "rolled"), but no longer than a single checkpoint interval.
- During the pre-commit phase every existing DeltaInProgressPart instance is automatically transformed ("rolled") into a DeltaPendingFile instance.
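The lifecycle above can be modeled with a simplified bucket. Everything here is a hypothetical sketch, not the connector's API: `SketchBucket`, `InProgressPart`, `PendingFile`, the byte-count roll threshold and the file naming scheme are assumptions; only the method names `rollPartFile` and `closePartFile` echo the DeltaWriterBucket methods mentioned on this page.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical analogue of DeltaInProgressPart: the file name travels with the open file.
final class InProgressPart(val fileName: String) {
  var size: Long = 0L
}

// Hypothetical analogue of DeltaPendingFile: keeps the name so the file can be committed later.
final case class PendingFile(fileName: String, size: Long)

final class SketchBucket(bucketId: String, rollSizeBytes: Long) {
  private var partCounter = 0
  private var inProgress: Option[InProgressPart] = None
  private val pending = ArrayBuffer.empty[PendingFile]

  // Close the current part (if any) and open a new one.
  private def rollPartFile(): InProgressPart = {
    closePartFile()
    val part = new InProgressPart(s"$bucketId/part-$partCounter.parquet")
    partCounter += 1
    inProgress = Some(part)
    part
  }

  // Turn the in-progress part into a pending file, carrying its name and size.
  private def closePartFile(): Unit = {
    inProgress.foreach(p => pending += PendingFile(p.fileName, p.size))
    inProgress = None
  }

  def write(recordBytes: Long): Unit = {
    val part = inProgress match {
      case Some(p) if p.size < rollSizeBytes => p
      case _ => rollPartFile() // first event, or the current file hit the threshold
    }
    part.size += recordBytes
  }

  // Pre-commit: every in-progress part is rolled into a pending file,
  // so no part outlives the checkpoint interval.
  def prepareCommit(): List[PendingFile] = {
    closePartFile()
    val result = pending.toList
    pending.clear()
    result
  }
}
```

Because the roll check in this sketch runs before a record is written, a file can exceed the threshold by one record; a real rolling policy may behave differently.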
-
class
DeltaPendingFile extends AnyRef
Wrapper class for the InProgressFileWriter.PendingFileRecoverable object. This class carries the internal committable information to be used during the checkpoint/commit phase.
Similarly to org.apache.flink.connector.file.sink.FileSink, we need to carry the InProgressFileWriter.PendingFileRecoverable information to perform a "local" commit on the file that the sink has written data to. However, as opposed to the FileSink, in DeltaSink we also need to perform a "global" commit to the io.delta.standalone.DeltaLog, and for that additional file metadata must be provided. Hence, this class provides the required information for both types of commits by wrapping the pending file and attaching the file's metadata.
Lifecycle of instances of this class is as follows:
- Instances of this class are being created inside the io.delta.flink.sink.internal.writer.DeltaWriterBucket#closePartFile method every time any in-progress file is called to be closed. This happens either when some conditions for closing are met or at the end of every checkpoint interval during the pre-commit phase, when all the open files in all buckets are closed.
- Its life span holds only until the end of a checkpoint interval.
- During the pre-commit phase (and after closing all in-progress files) every existing DeltaPendingFile instance is automatically transformed into a DeltaCommittable instance.
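The split between the "local" and the "global" commit can be sketched as follows. All types are hypothetical stand-ins: `Recoverable` plays the role of InProgressFileWriter.PendingFileRecoverable, `PendingFile` of DeltaPendingFile and `Committable` of DeltaCommittable. The metadata fields (file name, partition values, record count, file size) are assumptions about what a Delta log entry needs, not the actual fields of those classes.

```scala
object CommitSketch {
  // Stand-in for PendingFileRecoverable: enough to finalize the written file.
  final case class Recoverable(tmpPath: String, finalPath: String)

  // Stand-in for DeltaPendingFile: the recoverable plus file metadata (assumed fields).
  final case class PendingFile(
      recoverable: Recoverable,
      fileName: String,
      partitionValues: Map[String, String],
      recordCount: Long,
      fileSize: Long)

  // Stand-in for DeltaCommittable: a pending file tagged with its checkpoint.
  final case class Committable(pending: PendingFile, checkpointId: Long)

  // Pre-commit: every pending file of the checkpoint becomes a committable.
  def toCommittables(files: Seq[PendingFile], checkpointId: Long): Seq[Committable] =
    files.map(f => Committable(f, checkpointId))

  // "Local" commit: only the recoverable is needed (e.g. a tmp -> final rename).
  def localCommit(c: Committable): (String, String) =
    (c.pending.recoverable.tmpPath, c.pending.recoverable.finalPath)

  // "Global" commit: the metadata of all files from one checkpoint is summarized
  // as (file count, total records, total bytes) for a single log entry.
  def globalCommitSummary(cs: Seq[Committable]): (Int, Long, Long) = {
    require(cs.map(_.checkpointId).distinct.size <= 1, "committables from mixed checkpoints")
    (cs.size, cs.map(_.pending.recordCount).sum, cs.map(_.pending.fileSize).sum)
  }
}
```

In this sketch, the local commit never looks at the metadata and the global commit never looks at the recoverable, which mirrors why the wrapper has to carry both.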