package committer
Type Members
-
class
DeltaCommitter extends Committer[DeltaCommittable]
Committer implementation for DeltaSink.
This committer is responsible for taking staged part-files, i.e. part-files in "pending" state, created by the io.delta.flink.sink.internal.writer.DeltaWriter, and putting them in "finished" state, ready to be committed to the DeltaLog during the "global" commit.
This class behaves almost the same way as its equivalent, org.apache.flink.connector.file.sink.committer.FileCommitter in org.apache.flink.connector.file.sink.FileSink. The only differences are:
- use of DeltaCommittable instead of org.apache.flink.connector.file.sink.FileSinkCommittable
- some simplifications of the committable's internal information and commit behaviour. In particular, the DeltaCommitter#commit method does not handle the state of any in-progress files (as opposed to org.apache.flink.connector.file.sink.committer.FileCommitter#commit), because DeltaWriter#prepareCommit always rolls all in-progress files. Note that this is also the default behaviour of org.apache.flink.connector.file.sink.FileSink for all bulk formats (Parquet included).
Lifecycle of instances of this class is as follows:
- Instances of this class are created during a commit stage.
- For every DeltaWriter object there is exactly one corresponding DeltaCommitter, so the number of created instances equals the parallelism of the application's sink.
- Every instance exists only during a given commit stage, after a particular checkpoint interval finishes. Despite being tied to the finish phase of a checkpoint interval, a single instance of DeltaCommitter may process committables from multiple checkpoint intervals (this happens, for example, when the application failed and Flink recovered committables from a previous commit stage to be re-committed).
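The move from "pending" to "finished" state described for DeltaCommitter can be illustrated with a minimal sketch. It does not use the connector's or Flink's classes: PendingPartFile and PendingFileCommitterSketch are hypothetical names, the file names are made up, and modelling the state change as a plain file move on the local file system is an assumption made only for illustration. The return value loosely mirrors the idea of returning the committables that need to be retried.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a committable: one staged part-file.
final class PendingPartFile {
    final Path pendingPath; // staged location written by the writer
    final Path finalPath;   // visible location once the file is "finished"

    PendingPartFile(Path pendingPath, Path finalPath) {
        this.pendingPath = pendingPath;
        this.finalPath = finalPath;
    }
}

final class PendingFileCommitterSketch {

    // Moves every pending file to its final location.
    // Returns the committables that need a retry (always empty in this sketch).
    static List<PendingPartFile> commit(List<PendingPartFile> committables) throws IOException {
        for (PendingPartFile file : committables) {
            if (Files.exists(file.pendingPath)) {
                Files.move(file.pendingPath, file.finalPath, StandardCopyOption.REPLACE_EXISTING);
            } else if (!Files.exists(file.finalPath)) {
                throw new IOException("Neither pending nor finished file found: " + file.finalPath);
            }
            // Otherwise the file was already finished by an earlier attempt: nothing to do.
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("delta-committer-sketch");
        Path pending = Files.write(dir.resolve(".part-0.pending"), new byte[] {1, 2, 3});
        Path finished = dir.resolve("part-0.parquet");
        List<PendingPartFile> batch = List.of(new PendingPartFile(pending, finished));

        commit(batch);
        commit(batch); // simulated re-commit after recovery: no error, no duplicate
        System.out.println(Files.exists(finished) && !Files.exists(pending)); // prints "true"
    }
}
```

Because no in-progress state has to be handled, the commit step in this model only deals with files that are already complete, which is what makes a repeated commit of recovered committables harmless.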
-
class
DeltaGlobalCommitter extends GlobalCommitter[DeltaCommittable, DeltaGlobalCommittable]
A GlobalCommitter implementation for io.delta.flink.sink.DeltaSink.
It commits written files to the DeltaLog and provides exactly-once semantics by guaranteeing idempotent behaviour of the commit phase. This means that, when given the same set of DeltaCommittable objects (which contain metadata about the written files along with the unique identifier of the Flink job and the checkpoint id), it will never commit them more than once. This behaviour is achieved by constructing a transactional id from the aforementioned application identifier and checkpointId.
Lifecycle of instances of this class is as follows:
- Instances of this class are created during a (global) commit stage.
- For a given commit stage there is only one singleton instance of DeltaGlobalCommitter.
- Every instance exists only during a given commit stage, after a particular checkpoint interval finishes. Despite being tied to the finish phase of a checkpoint interval, a single instance of DeltaGlobalCommitter may process committables from multiple checkpoint intervals (this happens, for example, when the application failed and Flink recovered committables from a previous commit stage to be re-committed).
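The idempotent commit described for DeltaGlobalCommitter can be sketched as follows. This is a simplified in-memory model under stated assumptions, not the connector's implementation: FileCommittable, InMemoryLog and IdempotentGlobalCommitSketch are hypothetical names, and the log is assumed to remember, per application id, the highest checkpoint id it has already committed. Committables whose checkpoint id is not newer than that value are skipped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a committable: file metadata plus app id and checkpoint id.
final class FileCommittable {
    final String appId;
    final long checkpointId;
    final String path;

    FileCommittable(String appId, long checkpointId, String path) {
        this.appId = appId;
        this.checkpointId = checkpointId;
        this.path = path;
    }
}

// Hypothetical in-memory log that records the last committed checkpoint per application.
final class InMemoryLog {
    private final Map<String, Long> lastCheckpointByApp = new HashMap<>();
    private final List<String> committedFiles = new ArrayList<>();

    long lastCommittedCheckpoint(String appId) {
        return lastCheckpointByApp.getOrDefault(appId, -1L);
    }

    // Files and the (appId, checkpointId) marker are recorded together.
    void commit(String appId, long checkpointId, List<String> files) {
        committedFiles.addAll(files);
        lastCheckpointByApp.put(appId, checkpointId);
    }

    List<String> committedFiles() {
        return committedFiles;
    }
}

final class IdempotentGlobalCommitSketch {

    static void commit(InMemoryLog log, List<FileCommittable> committables) {
        // Group files by application id, then by checkpoint id in ascending order.
        Map<String, TreeMap<Long, List<String>>> grouped = new HashMap<>();
        for (FileCommittable c : committables) {
            grouped.computeIfAbsent(c.appId, k -> new TreeMap<>())
                   .computeIfAbsent(c.checkpointId, k -> new ArrayList<>())
                   .add(c.path);
        }
        for (Map.Entry<String, TreeMap<Long, List<String>>> app : grouped.entrySet()) {
            long last = log.lastCommittedCheckpoint(app.getKey());
            for (Map.Entry<Long, List<String>> checkpoint : app.getValue().entrySet()) {
                if (checkpoint.getKey() <= last) {
                    continue; // already committed by an earlier attempt
                }
                log.commit(app.getKey(), checkpoint.getKey(), checkpoint.getValue());
            }
        }
    }

    public static void main(String[] args) {
        InMemoryLog log = new InMemoryLog();
        List<FileCommittable> first = List.of(
            new FileCommittable("app-1", 1, "a"),
            new FileCommittable("app-1", 1, "b"),
            new FileCommittable("app-1", 2, "c"));
        commit(log, first);

        // After a simulated failure the same committables are recovered, plus a new one.
        List<FileCommittable> recovered = new ArrayList<>(first);
        recovered.add(new FileCommittable("app-1", 3, "d"));
        commit(log, recovered);

        System.out.println(log.committedFiles()); // prints "[a, b, c, d]"
    }
}
```

Grouping by checkpoint before comparing against the last committed value matters in this model: all files of one checkpoint are committed together, so a second file from the same checkpoint is not mistaken for a duplicate.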