io.delta.flink.sink

DeltaSink

class DeltaSink[IN] extends DeltaSinkInternal[IN]

A unified sink that writes its input elements to files in buckets on a file system, using the Parquet format, and commits those files to the DeltaLog. This sink achieves exactly-once semantics in both BATCH and STREAMING execution modes.

In most cases, users should instantiate the sink with the DeltaSink#forRowData utility method, which provides a proper writer factory implementation for a stream of RowData.

To create a new instance of the sink for a non-partitioned Delta table and a stream of RowData:

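    import io.delta.flink.sink.DeltaSink;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.table.data.RowData;
    import org.apache.flink.table.types.logical.RowType;
    import org.apache.hadoop.conf.Configuration;

    String deltaTablePath = ...;      // location of the target Delta table
    DataStream<RowData> stream = ...;
    RowType rowType = ...;            // schema of the incoming RowData records
    ...

    // sets a sink to a non-partitioned Delta table
    DeltaSink<RowData> deltaSink = DeltaSink.forRowData(
            new Path(deltaTablePath),
            new Configuration(),      // Hadoop configuration
            rowType).build();
    stream.sinkTo(deltaSink);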

To create a new instance of the sink for a partitioned Delta table and a stream of RowData:

    String[] partitionCols = ...; // array of partition columns' names

    DeltaSink<RowData> deltaSink = DeltaSink.forRowData(
            new Path(deltaTablePath),
            new Configuration(),
            rowType)
        .withPartitionColumns(partitionCols)
        .build();
    stream.sinkTo(deltaSink);
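For a partitioned table, each record is routed to a bucket determined by its values in the partition columns. The following is a hypothetical, simplified model of that mapping, assuming a Hive-style `column=value` directory layout. PartitionBucketSketch and bucketId are illustrative names only, not part of the connector, and the real sink may format or escape bucket paths differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, NOT part of the Delta connector API.
public class PartitionBucketSketch {

    /**
     * Builds a Hive-style bucket id such as "country=US/year=2023" from a row's
     * partition values (assumed to be already rendered as strings; no escaping is done).
     * Returns "" for a non-partitioned table, i.e. every record lands in the table root.
     */
    static String bucketId(List<String> partitionCols, Map<String, String> rowValues) {
        StringJoiner path = new StringJoiner("/");
        for (String col : partitionCols) {
            String value = rowValues.get(col);
            if (value == null) {
                throw new IllegalArgumentException("missing value for partition column " + col);
            }
            path.add(col + "=" + value);
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("country", "US");
        row.put("year", "2023");
        row.put("amount", "42"); // non-partition column: ignored for bucketing
        System.out.println(bucketId(List.of("country", "year"), row)); // country=US/year=2023
        System.out.println(bucketId(List.of(), row).isEmpty());        // true
    }
}
```

The order of the path segments follows the order of the partition column names passed to withPartitionColumns in this model, so the same row always maps to the same bucket.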

The behaviour of this sink is split into two phases. The first phase takes place between the application's checkpoints, while records are flushed to files (or appended to the writers' buffers); here the behaviour is almost identical to that of org.apache.flink.connector.file.sink.FileSink. Next, during the checkpoint phase, files are "closed" (renamed) by independent instances of io.delta.flink.sink.internal.committer.DeltaCommitter, which behave very similarly to org.apache.flink.connector.file.sink.committer.FileCommitter. When all the parallel committers are done, all the files are committed at once by the single-parallelism io.delta.flink.sink.internal.committer.DeltaGlobalCommitter.
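To make the two phases more concrete, here is a minimal, self-contained Java model of the flow described above. It does not use Flink or the Delta connector: Writer, commitLocally and GlobalCommitter are hypothetical stand-ins for the sink writer, DeltaCommitter and DeltaGlobalCommitter, and the "log" is just an in-memory list in which each entry plays the role of one table version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the sink's two phases; no Flink or Delta classes involved.
public class TwoPhaseSinkModel {

    /** A data file produced by one writer subtask for one checkpoint. */
    static final class DataFile {
        final String name;
        final int recordCount;
        boolean finalized; // set by the per-subtask committer

        DataFile(String name, int recordCount) {
            this.name = name;
            this.recordCount = recordCount;
        }
    }

    /** Phase 1: between checkpoints, a writer subtask buffers incoming records. */
    static final class Writer {
        private final int subtask;
        private final List<String> buffer = new ArrayList<>();

        Writer(int subtask) { this.subtask = subtask; }

        void write(String record) { buffer.add(record); }

        /** At checkpoint time: close the current file and hand it over as a committable. */
        DataFile prepareCommit(long checkpointId) {
            DataFile file = new DataFile("part-" + subtask + "-" + checkpointId + ".parquet", buffer.size());
            buffer.clear();
            return file;
        }
    }

    /** Phase 2a: per-subtask committer, standing in for DeltaCommitter's file rename. */
    static void commitLocally(DataFile file) { file.finalized = true; }

    /** Phase 2b: single-parallelism committer; each log entry models one table version. */
    static final class GlobalCommitter {
        final List<List<String>> log = new ArrayList<>();

        void commit(List<DataFile> files) {
            List<String> version = new ArrayList<>();
            for (DataFile f : files) {
                if (!f.finalized) {
                    throw new IllegalStateException(f.name + " has not been finalized");
                }
                version.add(f.name);
            }
            log.add(version); // all files of the checkpoint become visible together
        }
    }

    public static void main(String[] args) {
        List<Writer> writers = List.of(new Writer(0), new Writer(1));
        writers.get(0).write("a");
        writers.get(1).write("b");

        long checkpointId = 1;
        List<DataFile> committables = new ArrayList<>();
        for (Writer w : writers) {
            committables.add(w.prepareCommit(checkpointId));
        }
        committables.forEach(TwoPhaseSinkModel::commitLocally);

        GlobalCommitter global = new GlobalCommitter();
        global.commit(committables);
        System.out.println(global.log); // [[part-0-1.parquet, part-1-1.parquet]]
    }
}
```

In this model, no file appears in the log until the single global committer appends it; the per-subtask committers only finalize files, so all files of one checkpoint are published together as one version.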

Linear Supertypes
  DeltaSinkInternal, Sink, Serializable, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. def createCommitter(): Optional[Committer[DeltaCommittable]]
    Definition Classes
    DeltaSinkInternal → Sink
    Annotations
    @Override()
  7. def createGlobalCommitter(): Optional[GlobalCommitter[DeltaCommittable, DeltaGlobalCommittable]]
    Definition Classes
    DeltaSinkInternal → Sink
    Annotations
    @Override()
  8. def createWriter(context: InitContext, states: List[DeltaWriterBucketState]): SinkWriter[IN, DeltaCommittable, DeltaWriterBucketState]

    This method creates the SinkWriter instance that is responsible for passing incoming stream events to the correct bucket writer, from which they are then flushed to the underlying files.

    The logic for resolving the constructor params differs depending on whether any previous writer states were provided. If there are no previous states, we assume that this is a fresh start of the app: the next checkpoint id in io.delta.flink.sink.internal.writer.DeltaWriter is set to 1, and the app id is taken from DeltaSinkBuilder#getAppId, which guarantees that each writer gets the same value. Otherwise, if the Flink framework provides some previous writers' states, we use those to restore the values of appId and nextCheckpointId.

    context

    SinkWriter init context object

    states

    restored states of the writers; an empty collection on a fresh start.

    returns

    new SinkWriter object

    Definition Classes
    DeltaSinkInternal → Sink
    Annotations
    @Override()
    Exceptions thrown

    IOException When the recoverable writer cannot be instantiated.

  9. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  11. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  13. def getCommittableSerializer(): Optional[SimpleVersionedSerializer[DeltaCommittable]]
    Definition Classes
    DeltaSinkInternal → Sink
    Annotations
    @Override()
  14. def getCompatibleStateNames(): Collection[String]
    Definition Classes
    Sink
  15. def getGlobalCommittableSerializer(): Optional[SimpleVersionedSerializer[DeltaGlobalCommittable]]
    Definition Classes
    DeltaSinkInternal → Sink
    Annotations
    @Override()
  16. def getWriterStateSerializer(): Optional[SimpleVersionedSerializer[DeltaWriterBucketState]]
    Definition Classes
    DeltaSinkInternal → Sink
    Annotations
    @Override()
  17. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  18. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  19. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  20. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  21. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  22. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  23. def toString(): String
    Definition Classes
    AnyRef → Any
  24. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  25. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
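The constructor-parameter resolution described for createWriter can be sketched in plain Java as follows. RestoredState is a hypothetical stand-in for DeltaWriterBucketState; its fields (appId, checkpointId) and the rule "next checkpoint id = highest restored checkpoint id + 1" are assumptions made for illustration, not taken from the connector's source.

```java
import java.util.List;

// Hypothetical sketch of createWriter's parameter resolution; not the connector's code.
public class WriterInitSketch {

    /** Stand-in for a restored writer state (fields are assumptions). */
    static final class RestoredState {
        final String appId;
        final long checkpointId;

        RestoredState(String appId, long checkpointId) {
            this.appId = appId;
            this.checkpointId = checkpointId;
        }
    }

    /** Values a writer needs at construction time. */
    static final class WriterParams {
        final String appId;
        final long nextCheckpointId;

        WriterParams(String appId, long nextCheckpointId) {
            this.appId = appId;
            this.nextCheckpointId = nextCheckpointId;
        }
    }

    static WriterParams resolve(List<RestoredState> states, String builderAppId) {
        if (states.isEmpty()) {
            // Fresh start: every writer uses the builder's app id and starts at checkpoint 1.
            return new WriterParams(builderAppId, 1L);
        }
        // Restore: reuse the app id from state (the builder's value is ignored)
        // and continue after the highest checkpoint id found in the restored states.
        String appId = states.get(0).appId;
        long maxCheckpointId = 0L;
        for (RestoredState s : states) {
            if (!s.appId.equals(appId)) {
                throw new IllegalStateException("restored states carry different app ids");
            }
            maxCheckpointId = Math.max(maxCheckpointId, s.checkpointId);
        }
        return new WriterParams(appId, maxCheckpointId + 1);
    }

    public static void main(String[] args) {
        WriterParams fresh = resolve(List.of(), "app-from-builder");
        System.out.println(fresh.appId + " " + fresh.nextCheckpointId);    // app-from-builder 1

        WriterParams restored = resolve(
                List.of(new RestoredState("app-42", 5), new RestoredState("app-42", 5)), "ignored");
        System.out.println(restored.appId + " " + restored.nextCheckpointId); // app-42 6
    }
}
```

Keeping the app id stable across restarts in this way is what lets every writer, and every restored writer, report the same application identity.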
