class DeltaSink[IN] extends DeltaSinkInternal[IN]
A unified sink that writes its input elements to Parquet files within buckets on the
file system and commits those files to the DeltaLog. This sink achieves exactly-once
semantics in both BATCH and STREAMING modes.
For most use cases, users should instantiate the sink with the DeltaSink#forRowData utility
method, which provides the proper writer factory implementation for a stream of RowData.
To create a new instance of the sink to a non-partitioned Delta table for a stream of
RowData:
DataStream<RowData> stream = ...;
RowType rowType = ...;
...
// sets a sink to a non-partitioned Delta table
DeltaSink<RowData> deltaSink = DeltaSink.forRowData(
        new Path(deltaTablePath),
        new Configuration(),
        rowType).build();
stream.sinkTo(deltaSink);
To create a new instance of the sink to a partitioned Delta table for a stream of RowData:
String[] partitionCols = ...; // array of partition columns' names
DeltaSink<RowData> deltaSink = DeltaSink.forRowData(
        new Path(deltaTablePath),
        new Configuration(),
        rowType)
    .withPartitionColumns(partitionCols)
    .build();
stream.sinkTo(deltaSink);
The behaviour of this sink is split into two phases. The first phase takes place between
the application's checkpoints, when records are flushed to files (or appended to the writers'
buffers); here the behaviour is almost identical to that of
org.apache.flink.connector.file.sink.FileSink.
Next, during the checkpoint phase, files are "closed" (renamed) by independent instances of
io.delta.flink.sink.internal.committer.DeltaCommitter, which behave very similarly
to org.apache.flink.connector.file.sink.committer.FileCommitter.
When all the parallel committers are done, all the files are committed at once by the
single-parallelism io.delta.flink.sink.internal.committer.DeltaGlobalCommitter.
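The checkpoint phase described above can be illustrated with a small plain-Java model. This is only a sketch of the idea, not the connector's implementation: `TwoPhaseCommitSketch`, `PendingFile`, `TableLog`, `closeFile`, `onCheckpoint` and the `.inprogress` suffix are all hypothetical names invented for illustration, and no Flink or Delta classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the checkpoint phase; none of these types exist in the connector.
class TwoPhaseCommitSketch {

    /** A file written between checkpoints that is not yet visible to table readers. */
    static final class PendingFile {
        final String inProgressName;

        PendingFile(String inProgressName) {
            this.inProgressName = inProgressName;
        }
    }

    /** Stand-in for a per-subtask committer: "closes" a file by renaming it. */
    static String closeFile(PendingFile file) {
        // Assumption: in-progress files carry a ".inprogress" suffix that is dropped on close.
        return file.inProgressName.replace(".inprogress", "");
    }

    /** Stand-in for the table log: every commit adds exactly one new version. */
    static final class TableLog {
        private final List<List<String>> versions = new ArrayList<>();

        void commit(List<String> closedFiles) {
            versions.add(Collections.unmodifiableList(new ArrayList<>(closedFiles)));
        }

        int versionCount() {
            return versions.size();
        }

        List<String> filesInVersion(int version) {
            return versions.get(version);
        }
    }

    /** Closes the files of every subtask, then commits all of them in a single step. */
    static void onCheckpoint(List<List<PendingFile>> pendingPerSubtask, TableLog log) {
        List<String> closed = new ArrayList<>();
        for (List<PendingFile> subtask : pendingPerSubtask) {
            for (PendingFile file : subtask) {
                closed.add(closeFile(file));
            }
        }
        // Single-parallelism step: one commit covering the files of all subtasks.
        log.commit(closed);
    }

    public static void main(String[] args) {
        TableLog log = new TableLog();
        List<List<PendingFile>> pending = Arrays.asList(
                Arrays.asList(new PendingFile("part-0-0.parquet.inprogress")),
                Arrays.asList(new PendingFile("part-1-0.parquet.inprogress")));
        onCheckpoint(pending, log);
        System.out.println(log.versionCount() + " version(s): " + log.filesInVersion(0));
    }
}
```

In this model, files closed by different subtasks become part of the table only through the one `commit` call, which mirrors the "committed at once" step of the description.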
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
createCommitter(): Optional[Committer[DeltaCommittable]]
- Definition Classes
- DeltaSinkInternal → Sink
- Annotations
- @Override()
-
def
createGlobalCommitter(): Optional[GlobalCommitter[DeltaCommittable, DeltaGlobalCommittable]]
- Definition Classes
- DeltaSinkInternal → Sink
- Annotations
- @Override()
-
def
createWriter(context: InitContext, states: List[DeltaWriterBucketState]): SinkWriter[IN, DeltaCommittable, DeltaWriterBucketState]
Creates the SinkWriter instance that is responsible for passing incoming stream events to the correct bucket writer, from where they are flushed to the underlying files.
The logic for resolving constructor parameters differs depending on whether any previous writers' states were provided. If there are no previous states, we assume that this is a fresh start of the application: the next checkpoint id in io.delta.flink.sink.internal.writer.DeltaWriter is set to 1, and the app id is taken from DeltaSinkBuilder#getAppId, which guarantees that each writer gets the same value. Otherwise, if the Flink framework provides previous writers' states, we use those to restore the values of appId and nextCheckpointId.
- context
SinkWriter init context object
- states
restored states of the writers. Will be an empty collection for a fresh start.
- returns
new SinkWriter object
- Definition Classes
- DeltaSinkInternal → Sink
- Annotations
- @Override()
- Exceptions thrown
IOException when the recoverable writer cannot be instantiated.
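The parameter resolution described for createWriter can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions, not the actual method body: `WriterInitSketch`, `BucketState`, `WriterParams` and `resolve` are hypothetical stand-ins, and reading the restored values from the first state in the list is an assumption made here, since the description does not say how multiple states are combined.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the fresh-start vs. restore decision; not connector code.
class WriterInitSketch {

    /** Simplified stand-in for a restored writer state. */
    static final class BucketState {
        final String appId;
        final long nextCheckpointId;

        BucketState(String appId, long nextCheckpointId) {
            this.appId = appId;
            this.nextCheckpointId = nextCheckpointId;
        }
    }

    /** The two constructor parameters that the description talks about. */
    static final class WriterParams {
        final String appId;
        final long nextCheckpointId;

        WriterParams(String appId, long nextCheckpointId) {
            this.appId = appId;
            this.nextCheckpointId = nextCheckpointId;
        }
    }

    static WriterParams resolve(List<BucketState> states, String builderAppId) {
        if (states.isEmpty()) {
            // Fresh start: app id comes from the builder, checkpoint counting starts at 1.
            return new WriterParams(builderAppId, 1L);
        }
        // Restore: assumption for this sketch, take both values from the first state.
        BucketState first = states.get(0);
        return new WriterParams(first.appId, first.nextCheckpointId);
    }

    public static void main(String[] args) {
        WriterParams fresh = resolve(Collections.<BucketState>emptyList(), "app-1");
        WriterParams restored = resolve(Arrays.asList(new BucketState("app-0", 7L)), "app-1");
        System.out.println(fresh.appId + " / " + fresh.nextCheckpointId);
        System.out.println(restored.appId + " / " + restored.nextCheckpointId);
    }
}
```

Taking the app id from a single shared source on a fresh start is what lets every parallel writer agree on the same value; after a restore, the values recorded in the state take precedence over the builder's.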
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getCommittableSerializer(): Optional[SimpleVersionedSerializer[DeltaCommittable]]
- Definition Classes
- DeltaSinkInternal → Sink
- Annotations
- @Override()
-
def
getCompatibleStateNames(): Collection[String]
- Definition Classes
- Sink
-
def
getGlobalCommittableSerializer(): Optional[SimpleVersionedSerializer[DeltaGlobalCommittable]]
- Definition Classes
- DeltaSinkInternal → Sink
- Annotations
- @Override()
-
def
getWriterStateSerializer(): Optional[SimpleVersionedSerializer[DeltaWriterBucketState]]
- Definition Classes
- DeltaSinkInternal → Sink
- Annotations
- @Override()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()