package sink

Type Members

  1. class DeltaSink[IN] extends DeltaSinkInternal[IN]

    A unified sink that writes its input elements as Parquet files within buckets on the file system and commits those files to the DeltaLog. This sink achieves exactly-once semantics in both BATCH and STREAMING modes.

    For most use cases, users should instantiate the sink with the DeltaSink#forRowData utility method, which provides a proper writer factory implementation for a stream of RowData.

    To create a new instance of the sink that writes a stream of RowData to a non-partitioned Delta table:

        DataStream<RowData> stream = ...;
        RowType rowType = ...;
        ...
    
        // sets a sink to a non-partitioned Delta table
        DeltaSink<RowData> deltaSink = DeltaSink.forRowData(
                new Path(deltaTablePath),
                new Configuration(),
                rowType).build();
        stream.sinkTo(deltaSink);
    

    To create a new instance of the sink that writes a stream of RowData to a partitioned Delta table:

        String[] partitionCols = ...; // array of partition columns' names
    
        DeltaSink<RowData> deltaSink = DeltaSink.forRowData(
                new Path(deltaTablePath),
                new Configuration(),
                rowType)
            .withPartitionColumns(partitionCols)
            .build();
        stream.sinkTo(deltaSink);
    

    The sink operates in two phases. The first phase takes place between the application's checkpoints, when records are flushed to files (or appended to the writers' buffers); here the behaviour is almost identical to that of org.apache.flink.connector.file.sink.FileSink. Then, during the checkpoint phase, files are "closed" (renamed) by independent instances of io.delta.flink.sink.internal.committer.DeltaCommitter, which behave very similarly to org.apache.flink.connector.file.sink.committer.FileCommitter. Once all the parallel committers are done, all the files are committed at once by the single-parallelism io.delta.flink.sink.internal.committer.DeltaGlobalCommitter.
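
    The two phases can be illustrated with a small, dependency-free toy model. This is only a sketch under stated assumptions, not the connector's implementation: the names TwoPhaseSinkSketch, Subtask, ToyDeltaLog and checkpoint, the file naming scheme, and the ".inprogress" suffix are all hypothetical and chosen for illustration. It shows parallel subtasks buffering records between checkpoints, each subtask "closing" its file by renaming it, and a single global step recording all files of a checkpoint as one log version.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the two-phase behaviour; all names here are hypothetical. */
class TwoPhaseSinkSketch {

    /** Stand-in for one parallel writer together with its local committer. */
    static final class Subtask {
        private final int id;
        private int fileCounter = 0;
        private final List<String> buffer = new ArrayList<>();

        Subtask(int id) {
            this.id = id;
        }

        /** Phase 1: between checkpoints, records are appended to the writer's buffer. */
        void write(String record) {
            buffer.add(record);
        }

        /** Phase 2, local step: "close" the in-progress file by renaming it. */
        List<String> localCommit() {
            List<String> finalized = new ArrayList<>();
            if (!buffer.isEmpty()) {
                String inProgress = "part-" + id + "-" + fileCounter++ + ".parquet.inprogress";
                // The rename is modelled by dropping the (assumed) in-progress suffix.
                finalized.add(inProgress.replace(".inprogress", ""));
                buffer.clear();
            }
            return finalized;
        }
    }

    /** Stand-in for the table log: each entry is one atomically committed version. */
    static final class ToyDeltaLog {
        private final List<List<String>> versions = new ArrayList<>();

        void commit(List<String> files) {
            versions.add(new ArrayList<>(files));
        }

        int versionCount() {
            return versions.size();
        }

        List<String> filesAt(int version) {
            return versions.get(version);
        }
    }

    /** Phase 2, global step: once every subtask is done, commit all files at once. */
    static void checkpoint(List<Subtask> subtasks, ToyDeltaLog log) {
        List<String> allFiles = new ArrayList<>();
        for (Subtask subtask : subtasks) {
            allFiles.addAll(subtask.localCommit());
        }
        if (!allFiles.isEmpty()) {
            log.commit(allFiles); // a single log entry per checkpoint
        }
    }

    public static void main(String[] args) {
        List<Subtask> subtasks = new ArrayList<>();
        subtasks.add(new Subtask(0));
        subtasks.add(new Subtask(1));
        ToyDeltaLog log = new ToyDeltaLog();

        subtasks.get(0).write("r1");
        subtasks.get(1).write("r2");
        checkpoint(subtasks, log);

        for (int v = 0; v < log.versionCount(); v++) {
            System.out.println("version " + v + ": " + log.filesAt(v));
        }
    }
}
```

    In this model a checkpoint with no buffered records produces no new version, which is a simplification chosen for the sketch rather than a statement about the real committers.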

  2. class RowDataDeltaSinkBuilder extends AnyRef

    A builder class for DeltaSink for a stream of RowData.

    For most common use cases, use the DeltaSink#forRowData utility method to instantiate the sink. After instantiating this builder, you can either call RowDataDeltaSinkBuilder#build() to get a DeltaSink instance, or first configure additional behaviour (such as schema merging or partition columns) and then build the sink.
