class DeltaGlobalCommitter extends GlobalCommitter[DeltaCommittable, DeltaGlobalCommittable]
A GlobalCommitter implementation for io.delta.flink.sink.DeltaSink.
It commits written files to the DeltaLog and provides exactly-once semantics by guaranteeing that the commit phase is idempotent: when given the same set of DeltaCommittable objects (which contain metadata about written files along with the unique identifier of the Flink job and the checkpoint id), it will never commit them more than once.
This behaviour is achieved by constructing a transactional id from the app identifier and the checkpointId.
The lifecycle of instances of this class is as follows:
- Instances of this class are created during a (global) commit stage.
- For a given commit stage there is only one instance of DeltaGlobalCommitter.
- Every instance exists only during the given commit stage, which follows the end of a particular checkpoint interval. Although it is tied to the finish phase of a checkpoint interval, a single instance of DeltaGlobalCommitter may process committables from multiple checkpoint intervals (this happens, for example, when the application failed and Flink recovered committables from a previous commit stage to be re-committed).
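The idempotence guarantee described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory map in place of the DeltaLog; the class `IdempotentCommitSketch` and its method `commitOnce` are illustrative names and are not part of the Delta or Flink APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the DeltaLog; not the real Delta API.
class IdempotentCommitSketch {
    // Last committed transaction version (checkpointId) per application id.
    private final Map<String, Long> lastVersionByAppId = new HashMap<>();

    /**
     * Applies a commit for (appId, checkpointId) at most once.
     * Returns true if the commit was applied, false if it was skipped
     * because the same or a later checkpoint was already committed.
     */
    boolean commitOnce(String appId, long checkpointId) {
        long lastVersion = lastVersionByAppId.getOrDefault(appId, -1L);
        if (checkpointId <= lastVersion) {
            return false; // already committed, so re-delivery is a no-op
        }
        lastVersionByAppId.put(appId, checkpointId);
        return true;
    }
}
```

Re-delivering the same committables after a recovery then results in a skipped commit rather than duplicated files.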
Instance Constructors
- new DeltaGlobalCommitter(conf: Configuration, basePath: Path, rowType: RowType, mergeSchema: Boolean)
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
close(): Unit
- Definition Classes
- DeltaGlobalCommitter → AutoCloseable
- Annotations
- @Override()
-
def
combine(committables: List[DeltaCommittable]): DeltaGlobalCommittable
Compute an aggregated committable from a list of committables.
We just wrap the received list of committables inside a DeltaGlobalCommittable instance, as all of the processing is done in the GlobalCommitter#commit method.
- committables
list of committable objects that may come from multiple checkpoint intervals
- returns
DeltaGlobalCommittable serving as a wrapper class for the received committables
- Definition Classes
- DeltaGlobalCommitter → GlobalCommitter
- Annotations
- @Override()
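As a rough illustration of the wrapping performed by combine, the sketch below uses hypothetical stand-in classes (`Committable`, `GlobalCommittable`) rather than the real DeltaCommittable and DeltaGlobalCommittable types, whose fields and constructors are not shown on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CombineSketch {
    // Hypothetical stand-in for DeltaCommittable.
    static final class Committable {
        final String appId;
        final long checkpointId;
        final String filePath;

        Committable(String appId, long checkpointId, String filePath) {
            this.appId = appId;
            this.checkpointId = checkpointId;
            this.filePath = filePath;
        }
    }

    // Hypothetical stand-in for DeltaGlobalCommittable: a plain wrapper.
    static final class GlobalCommittable {
        private final List<Committable> committables;

        GlobalCommittable(List<Committable> committables) {
            // Defensive copy so later changes to the input list are not visible.
            this.committables = Collections.unmodifiableList(new ArrayList<>(committables));
        }

        List<Committable> getCommittables() {
            return committables;
        }
    }

    // No aggregation logic here: the list is wrapped as-is, in its original order.
    static GlobalCommittable combine(List<Committable> committables) {
        return new GlobalCommittable(committables);
    }
}
```

Keeping combine trivial leaves all grouping and deduplication to the commit step, where the table state is available.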
-
def
commit(globalCommittables: List[DeltaGlobalCommittable]): List[DeltaGlobalCommittable]
Commits already written files to the Delta table using the unique identifier of the given Flink job (appId) and the checkpointId delivered with every committable object. Together these ids form the transactionId that is used to verify whether a given set of files has already been committed to the Delta table.
During the commit preparation phase:
- First, appId is resolved from any of the provided committables. If no appId is resolved, then no committables were provided and no commit is performed. This may happen, for example, when no stream events were received within the given checkpoint interval.
- If appId is successfully resolved, the provided set of committables is flattened (as one DeltaGlobalCommittable contains a list of DeltaCommittable), mapped to AddFile objects and then grouped by checkpointId. The grouping is necessary because the committer may receive committables from different checkpoint intervals.
- We process each of the resolved checkpointIds in increasing order.
- While processing each checkpointId and its committables, we first query the DeltaLog for the last committed transaction version for the given appId. Here the transaction version equals the checkpointId. We proceed with the transaction only if the current checkpointId is greater than the last committed transaction version.
- If the above condition is met, we handle the metadata for the data in the given stream by comparing the stream's schema with the current table snapshot's schema. We proceed with the transaction only when the schemas match or when the sink was explicitly configured at creation time to attempt a schema update.
- If the above validation passes, we prepare the final set of Action objects to be committed along with the transaction's metadata and mandatory parameters.
- We try to commit the prepared transaction.
- If the commit fails, we fail the application as well. If it succeeds, we proceed with the next checkpointId (if any).
- globalCommittables
list of combined committable objects
- returns
always empty collection as we do not want any retry behaviour
- Definition Classes
- DeltaGlobalCommitter → GlobalCommitter
- Annotations
- @Override()
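The commit steps described for this method can be sketched roughly as follows. This is a simplified illustration under several assumptions: the table state is a hypothetical in-memory object, schemas are compared as plain strings, file paths stand in for AddFile actions, and all committables are assumed to share one appId. None of the names below come from the Delta or Flink APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CommitFlowSketch {
    // Hypothetical stand-in for DeltaCommittable.
    static final class Committable {
        final String appId;
        final long checkpointId;
        final String filePath;

        Committable(String appId, long checkpointId, String filePath) {
            this.appId = appId;
            this.checkpointId = checkpointId;
            this.filePath = filePath;
        }
    }

    // Hypothetical in-memory table state standing in for the DeltaLog.
    long lastCommittedVersion = -1L;
    String tableSchema;
    final List<String> committedFiles = new ArrayList<>();

    CommitFlowSketch(String tableSchema) {
        this.tableSchema = tableSchema;
    }

    void commit(List<List<Committable>> globalCommittables, String streamSchema, boolean mergeSchema) {
        // Flatten and group by checkpointId; a TreeMap iterates keys in increasing order.
        Map<Long, List<String>> filesByCheckpoint = new TreeMap<>();
        for (List<Committable> global : globalCommittables) {
            for (Committable c : global) {
                filesByCheckpoint.computeIfAbsent(c.checkpointId, k -> new ArrayList<>()).add(c.filePath);
            }
        }
        for (Map.Entry<Long, List<String>> entry : filesByCheckpoint.entrySet()) {
            long checkpointId = entry.getKey();
            if (checkpointId <= lastCommittedVersion) {
                continue; // this checkpoint was already committed
            }
            if (!streamSchema.equals(tableSchema)) {
                if (!mergeSchema) {
                    throw new IllegalStateException("Stream schema does not match table schema");
                }
                tableSchema = streamSchema; // schema update was explicitly allowed
            }
            committedFiles.addAll(entry.getValue());
            lastCommittedVersion = checkpointId;
        }
    }
}
```

With empty input the grouping map is empty and nothing is committed, which mirrors the case where no appId can be resolved.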
-
def
endOfInput(): Unit
- Definition Classes
- DeltaGlobalCommitter → GlobalCommitter
- Annotations
- @Override()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
filterRecoveredCommittables(globalCommittables: List[DeltaGlobalCommittable]): List[DeltaGlobalCommittable]
Filters committables that will be provided to the GlobalCommitter#commit method.
We always return all the committables, as we do not implement any retry behaviour in the GlobalCommitter#commit method and always want to try to commit all the received committables.
If there are any previous committables from checkpoint intervals other than the most recent one, we try to commit them in an idempotent manner in the DeltaGlobalCommitter#commit method rather than filtering them out.
- globalCommittables
list of combined committable objects
- returns
same as input
- Definition Classes
- DeltaGlobalCommitter → GlobalCommitter
- Annotations
- @Override()
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()