io.delta.flink.sink.internal.committer

DeltaGlobalCommitter

class DeltaGlobalCommitter extends GlobalCommitter[DeltaCommittable, DeltaGlobalCommittable]

A GlobalCommitter implementation for io.delta.flink.sink.DeltaSink.

It commits written files to the DeltaLog and provides exactly-once semantics by guaranteeing that the commit phase is idempotent: given the same set of DeltaCommittable objects (which contain metadata about the written files along with the unique identifier of the Flink job and the checkpoint id), it never commits them more than once. This is achieved by constructing a transactional id from the application identifier and the checkpointId.
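
The idempotence guarantee described above can be illustrated with a minimal sketch. It is not the connector's implementation: the class IdempotentCommitSketch, its in-memory map, and the file names are hypothetical stand-ins for the DeltaLog, and the sketch only assumes that the last committed checkpointId is remembered per appId.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a transaction log; not the Delta API. */
public class IdempotentCommitSketch {
    // Last committed transaction version (checkpointId) per application id.
    private final Map<String, Long> lastVersionByApp = new HashMap<>();
    // Paths of all files committed so far, in commit order.
    private final List<String> committedFiles = new ArrayList<>();

    /** Commits the files unless this appId has already committed this checkpointId or a later one. */
    public boolean commit(String appId, long checkpointId, List<String> files) {
        long lastVersion = lastVersionByApp.getOrDefault(appId, -1L);
        if (checkpointId <= lastVersion) {
            return false; // already committed, so the call is a no-op
        }
        committedFiles.addAll(files);
        lastVersionByApp.put(appId, checkpointId);
        return true;
    }

    public List<String> committedFiles() {
        return committedFiles;
    }

    public static void main(String[] args) {
        IdempotentCommitSketch log = new IdempotentCommitSketch();
        System.out.println(log.commit("app-1", 1L, List.of("part-0.parquet"))); // true
        // The same committables delivered again, e.g. after a recovery.
        System.out.println(log.commit("app-1", 1L, List.of("part-0.parquet"))); // false
        System.out.println(log.committedFiles()); // [part-0.parquet]
    }
}
```

Because the check depends only on the (appId, checkpointId) pair, re-delivering the same committables after a failure leaves the committed file list unchanged.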

The lifecycle of instances of this class is as follows:

  • Instances of this class are created during a (global) commit stage.
  • For a given commit stage there is exactly one instance of DeltaGlobalCommitter.
  • Every instance exists only for the duration of its commit stage, which follows the end of a particular checkpoint interval. Although it is bound to the finish phase of one checkpoint interval, a single instance of DeltaGlobalCommitter may process committables from multiple checkpoint intervals (for example, when the application failed and Flink recovered committables from a previous commit stage to be re-committed).
Linear Supertypes
GlobalCommitter[DeltaCommittable, DeltaGlobalCommittable], AutoCloseable, AnyRef, Any

Instance Constructors

  1. new DeltaGlobalCommitter(conf: Configuration, basePath: Path, rowType: RowType, mergeSchema: Boolean)

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. def close(): Unit
    Definition Classes
    DeltaGlobalCommitter → AutoCloseable
    Annotations
    @Override()
  7. def combine(committables: List[DeltaCommittable]): DeltaGlobalCommittable

    Compute an aggregated committable from a list of committables.

    The received list of committables is simply wrapped in a DeltaGlobalCommittable instance, because all processing is done in the GlobalCommitter#commit method.

    committables

    list of committable objects that may come from multiple checkpoint intervals

    returns

    DeltaGlobalCommittable serving as a wrapper class for received committables

    Definition Classes
    DeltaGlobalCommitter → GlobalCommitter
    Annotations
    @Override()
  8. def commit(globalCommittables: List[DeltaGlobalCommittable]): List[DeltaGlobalCommittable]

    Commits already written files to the Delta table using the unique identifier of the given Flink job (appId) and the checkpointId delivered with every committable object. Together these ids form the transactionId that is used to verify whether a given set of files has already been committed to the Delta table.

    During the commit preparation phase:

    • First, the appId is resolved from any of the provided committables. If no appId can be resolved, no committables were provided and no commit is performed. This may happen when, for example, no stream events were received within the given checkpoint interval.
    • If the appId is resolved, the provided set of committables is flattened (one DeltaGlobalCommittable contains a list of DeltaCommittable objects), mapped to AddFile objects and then grouped by checkpointId. The grouping is necessary because the committer may receive committables from different checkpoint intervals.
    • Each resolved checkpointId is processed in increasing order.
    • For each checkpointId and its committables, the DeltaLog is first queried for the last committed transaction version for the given appId. Here the transaction version equals the checkpointId. The transaction proceeds only if the current checkpointId is greater than the last committed transaction version.
    • If that condition is met, the metadata for the stream's data is handled by comparing the stream's schema with the schema of the current table snapshot. The transaction proceeds only if the schemas match or if the sink was explicitly configured at creation time to attempt a schema update.
    • If that validation passes, the final set of Action objects is prepared for commit, along with the transaction's metadata and mandatory parameters.
    • The prepared transaction is committed.
    • If the commit fails, the application fails as well. If it succeeds, processing continues with the next checkpointId (if any).
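
The flattening, grouping and ordering steps listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the connector's code: CommitPlanSketch, its nested Committable class and the plan method are hypothetical names, each global committable is modelled as a plain list, file paths stand in for AddFile objects, and a single appId is assumed. Schema validation and the actual commit are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class CommitPlanSketch {
    /** Hypothetical stand-in for DeltaCommittable: one written file and its checkpoint id. */
    public static final class Committable {
        final long checkpointId;
        final String path;

        public Committable(long checkpointId, String path) {
            this.checkpointId = checkpointId;
            this.path = path;
        }
    }

    /**
     * Flattens the global committables, groups file paths by checkpointId and
     * returns only the checkpoints newer than lastCommittedVersion, in increasing order.
     */
    public static SortedMap<Long, List<String>> plan(
            List<List<Committable>> globalCommittables, long lastCommittedVersion) {
        TreeMap<Long, List<String>> byCheckpoint = new TreeMap<>();
        for (List<Committable> global : globalCommittables) {      // flatten
            for (Committable c : global) {                         // group by checkpointId
                byCheckpoint.computeIfAbsent(c.checkpointId, k -> new ArrayList<>()).add(c.path);
            }
        }
        // Keep only checkpoints strictly greater than the last committed version.
        return byCheckpoint.tailMap(lastCommittedVersion, false);
    }

    public static void main(String[] args) {
        List<List<Committable>> received = List.of(
                List.of(new Committable(3, "c"), new Committable(2, "b")),
                List.of(new Committable(2, "b2"), new Committable(1, "a")));
        // Checkpoint 1 was committed earlier, so only checkpoints 2 and 3 remain.
        System.out.println(plan(received, 1L)); // {2=[b, b2], 3=[c]}
    }
}
```

A sorted map keeps the increasing-order requirement implicit in the data structure. In the steps above the last committed version is queried for each checkpointId in turn; with ascending processing, a single lower bound gives the same result in this simplified setting.
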
    globalCommittables

    list of combined committable objects

    returns

    always an empty collection, as no retry behaviour is wanted

    Definition Classes
    DeltaGlobalCommitter → GlobalCommitter
    Annotations
    @Override()
  9. def endOfInput(): Unit
    Definition Classes
    DeltaGlobalCommitter → GlobalCommitter
    Annotations
    @Override()
  10. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  12. def filterRecoveredCommittables(globalCommittables: List[DeltaGlobalCommittable]): List[DeltaGlobalCommittable]

    Filters committables that will be provided to GlobalCommitter#commit method.

    All committables are always returned, because GlobalCommitter#commit implements no retry behaviour and every received committable should be attempted.

    If there are committables from checkpoint intervals earlier than the most recent one, they are committed idempotently in DeltaGlobalCommitter#commit rather than being filtered out here.

    globalCommittables

    list of combined committable objects

    returns

    same as input

    Definition Classes
    DeltaGlobalCommitter → GlobalCommitter
    Annotations
    @Override()
  13. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  15. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  16. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  17. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  18. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  19. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  20. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  21. def toString(): String
    Definition Classes
    AnyRef → Any
  22. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
