
package committer

Type Members

  1. class DeltaCommitter extends Committer[DeltaCommittable]

    Committer implementation for DeltaSink.


    This committer takes staged part-files, i.e. part-files in the "pending" state created by io.delta.flink.sink.internal.writer.DeltaWriter, and puts them in the "finished" state, ready to be committed to the DeltaLog during the "global" commit.

    This class behaves almost identically to its equivalent, org.apache.flink.connector.file.sink.committer.FileCommitter, in org.apache.flink.connector.file.sink.FileSink. The only differences are:

    • use of DeltaCommittable instead of org.apache.flink.connector.file.sink.FileSinkCommittable
    • some simplifications of the committable's internal information and of the commit behaviour. In particular, the DeltaCommitter#commit method does not handle the state of any in-progress files (as opposed to org.apache.flink.connector.file.sink.committer.FileCommitter#commit), because DeltaWriter#prepareCommit always rolls all in-progress files. Note that this is also the default behaviour of org.apache.flink.connector.file.sink.FileSink for all bulk formats (Parquet included).
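
    The "pending" to "finished" transition described above can be illustrated with a minimal, self-contained sketch. It does not use the Flink or Delta APIs: PendingFileCommitSketch, PendingFile and the file names are hypothetical, and the choice to treat an already finished file as a no-op is an assumption made here so that a re-commit after recovery is harmless.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PendingFileCommitSketch {

    /** Hypothetical stand-in for a committable: a staged file and its final location. */
    static final class PendingFile {
        final Path pendingPath;
        final Path finalPath;

        PendingFile(Path pendingPath, Path finalPath) {
            this.pendingPath = pendingPath;
            this.finalPath = finalPath;
        }
    }

    /** Moves every pending file to its final name and returns the finished paths. */
    static List<Path> commit(List<PendingFile> committables) {
        List<Path> finished = new ArrayList<>();
        for (PendingFile file : committables) {
            try {
                if (Files.exists(file.pendingPath)) {
                    // Normal case: publish the staged file under its final name.
                    Files.move(file.pendingPath, file.finalPath);
                } else if (!Files.exists(file.finalPath)) {
                    throw new IllegalStateException("Missing file: " + file.pendingPath);
                }
                // Assumption: if only the final file exists, it was already
                // finished by an earlier attempt, so there is nothing to do.
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            finished.add(file.finalPath);
        }
        return finished;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pending-sketch");
        Path pending = dir.resolve(".part-0-0.pending");
        Path finalPath = dir.resolve("part-0-0.parquet");
        Files.write(pending, new byte[] {1, 2, 3});

        List<PendingFile> committables =
                Collections.singletonList(new PendingFile(pending, finalPath));
        commit(committables);
        commit(committables); // simulated re-commit after recovery: no error
        System.out.println(Files.exists(finalPath));
    }
}
```

    Because every in-progress file is rolled before the commit stage, the sketch only has to deal with fully written, staged files and never with partially written ones.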

    The lifecycle of instances of this class is as follows:

    • Instances of this class are created during a commit stage.
    • For every DeltaWriter object exactly one corresponding DeltaCommitter is created, so the number of created instances equals the parallelism of the application's sink.
    • Every instance exists only during a given commit stage, which follows the end of a particular checkpoint interval. Although it is tied to the final phase of a checkpoint interval, a single DeltaCommitter instance may process committables from multiple checkpoint intervals (this happens, for example, when the application failed and Flink recovered committables from a previous commit stage to be re-committed).
  2. class DeltaGlobalCommitter extends GlobalCommitter[DeltaCommittable, DeltaGlobalCommittable]

    A GlobalCommitter implementation for io.delta.flink.sink.DeltaSink.


    It commits written files to the DeltaLog and provides exactly-once semantics by guaranteeing idempotent behaviour of the commit phase. That is, when given the same set of DeltaCommittable objects (which contain metadata about the written files along with the unique identifier of the given Flink job and the checkpoint id), it will never commit them more than once. This behaviour is achieved by constructing a transactional id from that application identifier and the checkpointId.
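
    The idempotence argument above can be sketched with an in-memory stand-in for the log. This is an illustration only, not the Delta API: IdempotentCommitSketch and InMemoryLog are hypothetical, and the rule "skip any checkpoint id that is not greater than the last one recorded for the same application id" is an assumption about how such a transactional id might be checked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IdempotentCommitSketch {

    /** Hypothetical in-memory stand-in for a transaction log. */
    static final class InMemoryLog {
        // Last committed checkpoint id per application id (the "transactional id").
        private final Map<String, Long> lastCommitted = new HashMap<>();
        private final List<String> committedFiles = new ArrayList<>();

        /** Returns true if the files were committed, false if the commit was skipped. */
        boolean commit(String appId, long checkpointId, List<String> files) {
            long last = lastCommitted.getOrDefault(appId, -1L);
            if (checkpointId <= last) {
                // Assumption: this checkpoint was already committed for this app.
                return false;
            }
            // Record the files and the new version together as one logical commit.
            committedFiles.addAll(files);
            lastCommitted.put(appId, checkpointId);
            return true;
        }

        List<String> committedFiles() {
            return new ArrayList<>(committedFiles);
        }
    }

    public static void main(String[] args) {
        InMemoryLog log = new InMemoryLog();
        List<String> files = Arrays.asList("part-0.parquet");
        System.out.println(log.commit("app-1", 1L, files)); // true
        System.out.println(log.commit("app-1", 1L, files)); // false (replayed commit)
        System.out.println(log.committedFiles());           // [part-0.parquet]
    }
}
```

    Replaying the same committables after a failure therefore leaves the log unchanged, which is what turns at-least-once delivery of committables into exactly-once results.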

    The lifecycle of instances of this class is as follows:

    • Instances of this class are created during a (global) commit stage.
    • For a given commit stage there is only one, singleton instance of DeltaGlobalCommitter.
    • Every instance exists only during a given commit stage, which follows the end of a particular checkpoint interval. Although it is tied to the final phase of a checkpoint interval, a single DeltaGlobalCommitter instance may process committables from multiple checkpoint intervals (this happens, for example, when the application failed and Flink recovered committables from a previous commit stage to be re-committed).
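
    Since one instance may receive committables from several checkpoint intervals, one plausible way to handle them is to group them by checkpoint id and process the groups in ascending order. The sketch below shows only that grouping step. GroupByCheckpointSketch and Committable are hypothetical names, and the grouping strategy itself is an assumption rather than something stated by this documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class GroupByCheckpointSketch {

    /** Hypothetical committable: a written file tagged with its checkpoint id. */
    static final class Committable {
        final long checkpointId;
        final String file;

        Committable(long checkpointId, String file) {
            this.checkpointId = checkpointId;
            this.file = file;
        }
    }

    /** Groups files by checkpoint id; the TreeMap keeps ids in ascending order. */
    static SortedMap<Long, List<String>> groupByCheckpoint(List<Committable> committables) {
        SortedMap<Long, List<String>> grouped = new TreeMap<>();
        for (Committable c : committables) {
            grouped.computeIfAbsent(c.checkpointId, id -> new ArrayList<>()).add(c.file);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Committables recovered from checkpoint 1 arrive mixed with those of checkpoint 2.
        List<Committable> recovered = Arrays.asList(
                new Committable(2L, "b"),
                new Committable(1L, "a"),
                new Committable(2L, "c"));
        System.out.println(groupByCheckpoint(recovered)); // {1=[a], 2=[b, c]}
    }
}
```

    Processing the groups oldest first keeps the recorded checkpoint id increasing, which fits a scheme where replayed checkpoints are recognised by comparing ids.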
