class DeltaSinkInternal[IN] extends Sink[IN, DeltaCommittable, DeltaWriterBucketState, DeltaGlobalCommittable]
A unified sink that emits its input elements to file system files within buckets using Parquet
format and commits those files to the io.delta.standalone.DeltaLog. This sink achieves
exactly-once semantics for both BATCH and STREAMING.
The behaviour of this sink is split into two phases. The first phase takes place between the
application's checkpoints, when records are flushed to files (or appended to the writers'
buffers). In this phase the behaviour is almost identical to that of org.apache.flink.connector.file.sink.FileSink.
Next, during the checkpoint phase, files are "closed" (renamed) by independent instances of
io.delta.flink.sink.internal.committer.DeltaCommitter, which behave very similarly to org.apache.flink.connector.file.sink.committer.FileCommitter. When all the parallel committers
are done, all the files are committed at once by the single-parallelism
io.delta.flink.sink.internal.committer.DeltaGlobalCommitter.
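The two phases can be pictured with a small, dependency-free Java model. Everything in it is hypothetical: the `Writer`, `commitLocally` and `GlobalCommitter` names, the part-file naming scheme and the `.inprogress` suffix are illustrative assumptions, not the connector's classes or file layout. It only shows the division of labour: each parallel writer hands over a pending file at checkpoint time, each committer finalizes (renames) its own files, and a single global committer records all finalized files of the checkpoint in one commit.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the two-phase flow. None of these classes exist in the
 * Delta connector; they only mirror the roles of the writer, DeltaCommitter and
 * DeltaGlobalCommitter.
 */
public class TwoPhaseSinkSketch {

    /** Phase 1: a writer buffers records and, at checkpoint time, reports a pending file. */
    static final class Writer {
        private final int subtask;
        private final List<String> buffer = new ArrayList<>();
        private int fileCounter = 0;

        Writer(int subtask) {
            this.subtask = subtask;
        }

        void write(String record) {
            buffer.add(record);
        }

        /** Simulates flushing the buffer and returns the name of the resulting pending file. */
        String prepareCommit() {
            String pending = "part-" + subtask + "-" + (fileCounter++) + ".parquet.inprogress";
            buffer.clear();
            return pending;
        }
    }

    /** Phase 2a: each parallel committer "closes" (renames) its own pending file. */
    static String commitLocally(String pendingFile) {
        return pendingFile.replace(".inprogress", "");
    }

    /** Phase 2b: a single global committer commits all finalized files of a checkpoint at once. */
    static final class GlobalCommitter {
        final List<List<String>> tableCommits = new ArrayList<>();

        void commit(List<String> finalizedFiles) {
            // One log entry per checkpoint, containing the files from every subtask.
            tableCommits.add(List.copyOf(finalizedFiles));
        }
    }

    public static void main(String[] args) {
        List<Writer> writers = List.of(new Writer(0), new Writer(1));
        writers.get(0).write("a");
        writers.get(1).write("b");

        // Checkpoint: every writer hands over a pending file, which its committer finalizes.
        List<String> finalized = new ArrayList<>();
        for (Writer w : writers) {
            finalized.add(commitLocally(w.prepareCommit()));
        }

        // Only after all parallel committers are done does the global committer run.
        GlobalCommitter global = new GlobalCommitter();
        global.commit(finalized);
        System.out.println(global.tableCommits);
    }
}
```

Keeping the final commit in one single-parallelism component is what lets all files of a checkpoint become visible together rather than subtask by subtask.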
This DeltaSinkInternal reuses many specific implementations from org.apache.flink.connector.file.sink.FileSink, so for most of the low-level behaviour one may
refer to the docs of that module. The most notable differences from FileSink are:
- tight coupling of DeltaSink to the Bulk/Parquet format
- extending the committable information with file metadata (name, size, rows, last update timestamp)
- providing Delta Lake-specific behaviour, which is mostly contained in
io.delta.flink.sink.internal.committer.DeltaGlobalCommitter, implementing the commit to the io.delta.standalone.DeltaLog at the final stage of each checkpoint.
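To make the second difference concrete, here is a hypothetical Java shape for a committable that carries file metadata, and for a global committable grouping one checkpoint's files. The record names, fields and the per-checkpoint grouping are illustrative assumptions, not the actual DeltaCommittable or DeltaGlobalCommittable classes.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: per-file committables enriched with metadata, and a
 * global committable aggregating the committables of one checkpoint.
 */
public class CommittableMetadataSketch {

    /** Per-file metadata: name, size, rows and last update timestamp. */
    record FileCommittable(long checkpointId, String fileName, long fileSizeBytes,
                           long recordCount, long lastUpdateTimeMillis) {}

    /** Everything a single global committer would commit for one checkpoint. */
    record GlobalCommittable(long checkpointId, List<FileCommittable> files) {
        long totalRows() {
            return files.stream().mapToLong(FileCommittable::recordCount).sum();
        }

        long totalBytes() {
            return files.stream().mapToLong(FileCommittable::fileSizeBytes).sum();
        }
    }

    /** Keeps only the committables that belong to the given checkpoint. */
    static GlobalCommittable forCheckpoint(long checkpointId, List<FileCommittable> all) {
        List<FileCommittable> files = all.stream()
                .filter(c -> c.checkpointId() == checkpointId)
                .collect(Collectors.toList());
        return new GlobalCommittable(checkpointId, files);
    }
}
```

Carrying size and row counts with each committable means the global committer can describe every file in the table log without reopening the Parquet files.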
Linear supertypes: Sink, Serializable, AnyRef, Any
Instance Constructors
- new DeltaSinkInternal(sinkBuilder: DeltaSinkBuilder[IN])
  - Attributes: protected[internal]
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def clone(): AnyRef
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- def createCommitter(): Optional[Committer[DeltaCommittable]]
  - Definition Classes: DeltaSinkInternal → Sink
  - Annotations: @Override()
- def createGlobalCommitter(): Optional[GlobalCommitter[DeltaCommittable, DeltaGlobalCommittable]]
  - Definition Classes: DeltaSinkInternal → Sink
  - Annotations: @Override()
- def createWriter(context: InitContext, states: List[DeltaWriterBucketState]): SinkWriter[IN, DeltaCommittable, DeltaWriterBucketState]
  This method creates the SinkWriter instance that is responsible for passing incoming stream events to the correct bucket writer, which then flushes them to the underlying files.
  The logic for resolving the constructor parameters differs depending on whether any previous writers' states were provided. If there are no previous states, we assume that this is a fresh start of the application: the next checkpoint id in io.delta.flink.sink.internal.writer.DeltaWriter is set to 1, and the app id is taken from DeltaSinkBuilder#getAppId, which guarantees that every writer gets the same value. Otherwise, if the Flink framework provides some previous writers' states, those states are used to restore the values of appId and nextCheckpointId.
  - context: SinkWriter init context object
  - states: restored states of the writers; an empty collection on a fresh start
  - returns: new SinkWriter object
  - Definition Classes: DeltaSinkInternal → Sink
  - Annotations: @Override()
  - Exceptions thrown: IOException when the recoverable writer cannot be instantiated.
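The parameter resolution performed by createWriter can be sketched in plain Java. `WriterState` is a hypothetical stand-in for DeltaWriterBucketState, and the assumption that each restored state records the next checkpoint id directly (with the maximum taken across states) is made only for illustration; the description above says no more than that appId and nextCheckpointId are restored from the states.

```java
import java.util.List;

/** Hypothetical simplification of how createWriter resolves the writer's appId and nextCheckpointId. */
public class WriterParamsSketch {

    /** Stand-in for a restored writer state (fields are assumptions). */
    record WriterState(String appId, long nextCheckpointId) {}

    /** The two values the writer is constructed with. */
    record WriterParams(String appId, long nextCheckpointId) {}

    static WriterParams resolve(String builderAppId, List<WriterState> restoredStates) {
        if (restoredStates.isEmpty()) {
            // Fresh start: every writer receives the same appId from the builder,
            // and the first checkpoint id is 1.
            return new WriterParams(builderAppId, 1L);
        }
        // Restore: all writers were created with the same appId, so any state will do.
        String appId = restoredStates.get(0).appId();
        // Assumption for this sketch: continue from the highest recorded next checkpoint id.
        long nextCheckpointId = restoredStates.stream()
                .mapToLong(WriterState::nextCheckpointId)
                .max()
                .getAsLong();
        return new WriterParams(appId, nextCheckpointId);
    }
}
```

Reusing the persisted appId after a restart, rather than the builder's value, keeps the identifier stable across restarts of the same job, which is what the checkpoint ids are tracked against.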
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def finalize(): Unit
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- def getCommittableSerializer(): Optional[SimpleVersionedSerializer[DeltaCommittable]]
  - Definition Classes: DeltaSinkInternal → Sink
  - Annotations: @Override()
- def getCompatibleStateNames(): Collection[String]
  - Definition Classes: Sink
- def getGlobalCommittableSerializer(): Optional[SimpleVersionedSerializer[DeltaGlobalCommittable]]
  - Definition Classes: DeltaSinkInternal → Sink
  - Annotations: @Override()
- def getWriterStateSerializer(): Optional[SimpleVersionedSerializer[DeltaWriterBucketState]]
  - Definition Classes: DeltaSinkInternal → Sink
  - Annotations: @Override()
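getCommittableSerializer, getGlobalCommittableSerializer and getWriterStateSerializer each return a SimpleVersionedSerializer, Flink's contract of a version number plus serialize/deserialize methods. To illustrate that contract without a Flink dependency, the sketch defines a local `VersionedSerializer` interface with the same three methods, and a hypothetical `FileCommittable` carrying a file name plus size, row count and last-update timestamp. The binary layout is an assumption, not the connector's actual DeltaCommittable format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class CommittableSerializerSketch {

    /** Local stand-in with the same three methods as Flink's SimpleVersionedSerializer. */
    interface VersionedSerializer<E> {
        int getVersion();

        byte[] serialize(E obj) throws IOException;

        E deserialize(int version, byte[] serialized) throws IOException;
    }

    /** Hypothetical committable: file name plus the metadata the sink tracks. */
    record FileCommittable(String fileName, long fileSizeBytes, long recordCount,
                           long lastUpdateTimeMillis) {}

    static final class FileCommittableSerializer implements VersionedSerializer<FileCommittable> {
        static final int VERSION = 1;

        @Override
        public int getVersion() {
            return VERSION;
        }

        @Override
        public byte[] serialize(FileCommittable c) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeUTF(c.fileName());
                out.writeLong(c.fileSizeBytes());
                out.writeLong(c.recordCount());
                out.writeLong(c.lastUpdateTimeMillis());
            }
            // The stream is closed (and flushed) before the bytes are read out.
            return bytes.toByteArray();
        }

        @Override
        public FileCommittable deserialize(int version, byte[] serialized) throws IOException {
            if (version != VERSION) {
                throw new IOException("Unsupported serializer version: " + version);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
                // Fields are read in the same order they were written.
                return new FileCommittable(in.readUTF(), in.readLong(), in.readLong(), in.readLong());
            }
        }
    }
}
```

The explicit version check is the point of the versioned contract: a newer serializer can recognize, migrate or reject bytes written with an older layout when restoring from a checkpoint.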
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- def toString(): String
  - Definition Classes: AnyRef → Any
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()