package internal
Type Members
-
class
DeltaBucketAssigner[T] extends BucketAssigner[T, String]
Custom implementation of the BucketAssigner class, required to define how particular events are mapped to buckets (aka partitions).

This implementation can be seen as a utility class for complying with Delta Lake's partitioning style, which follows Apache Hive's convention of encoding partition columns and their values as filesystem directory paths, e.g. "/some_path/table_1/date=2020-01-01". It is still possible for users to roll out their own version of BucketAssigner and pass it to the DeltaSinkBuilder during creation of the sink.

This DeltaBucketAssigner is applicable only to DeltaSinkBuilder and not to RowDataDeltaSinkBuilder. The former lets you use this DeltaBucketAssigner to provide the required custom bucketing behaviour, while the latter does not expose a custom bucketing API and only accepts partition column keys. Thus, this DeltaBucketAssigner is currently not exposed to the user through any public API.

In the future, if you'd like to implement your own custom bucketing...
/////////////////////////////////////////
// implements a custom partition computer
/////////////////////////////////////////

static class CustomPartitionColumnComputer implements DeltaPartitionComputer<RowData> {

    @Override
    public LinkedHashMap<String, String> generatePartitionValues(
            RowData element, BucketAssigner.Context context) {
        String f1 = element.getString(0).toString();
        int f3 = element.getInt(2);
        LinkedHashMap<String, String> partitionSpec = new LinkedHashMap<>();
        partitionSpec.put("f1", f1);
        partitionSpec.put("f3", Integer.toString(f3));
        return partitionSpec;
    }
}

...

/////////////////////////////////////////
// creates partition assigner for a custom partition computer
/////////////////////////////////////////

DeltaBucketAssigner<RowData> partitionAssigner =
    new DeltaBucketAssigner<>(new CustomPartitionColumnComputer());

...

/////////////////////////////////////////
// create the builder
/////////////////////////////////////////

DeltaSinkBuilder<RowData> foo = new DeltaSinkBuilder.DefaultDeltaFormatBuilder<>(
    ..., partitionAssigner, ...);

-
trait
DeltaPartitionComputer[T] extends Serializable
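The Hive-style bucket IDs described above can be illustrated with a small, self-contained sketch. This is plain Java for illustration only, not the connector's code; the class and method names (`PartitionPathSketch`, `toHivePartitionPath`) are made up for this example:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionPathSketch {

    // Builds a Hive-style partition path ("col1=val1/col2=val2/") from an
    // ordered partition spec, mirroring the directory layout described above
    // (e.g. "/some_path/table_1/date=2020-01-01"). Insertion order matters,
    // which is why a LinkedHashMap is used.
    static String toHivePartitionPath(LinkedHashMap<String, String> partitionSpec) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : partitionSpec.entrySet()) {
            sb.append(entry.getKey()).append('=').append(entry.getValue()).append('/');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> spec = new LinkedHashMap<>();
        spec.put("date", "2020-01-01");
        System.out.println(toHivePartitionPath(spec)); // date=2020-01-01/
    }
}
```

A partition spec of {f1=a, f3=3}, as produced by the CustomPartitionColumnComputer above, would yield the bucket ID "f1=a/f3=3/".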
-
class
DeltaSinkBuilder[IN] extends Serializable
A builder class for DeltaSinkInternal.

For most common use cases, use the DeltaSink#forRowData utility method to instantiate the sink. This builder should be used only if you need to provide a custom writer factory instance or to configure some low-level settings for the sink.

Example of how to use this class for a stream of RowData:

RowType rowType = ...;
Configuration conf = new Configuration();
conf.set("parquet.compression", "SNAPPY");
ParquetWriterFactory<RowData> writerFactory =
    ParquetRowDataBuilder.createWriterFactory(rowType, conf, true);

DeltaSinkBuilder<RowData> sinkBuilder = new DeltaSinkBuilder(
    basePath,
    conf,
    bucketCheckInterval,
    writerFactory,
    new BasePathBucketAssigner<>(),
    OnCheckpointRollingPolicy.build(),
    OutputFileConfig.builder().withPartSuffix(".snappy.parquet").build(),
    appId,
    rowType,
    mergeSchema
);

DeltaSink<RowData> sink = sinkBuilder.build();

-
class
DeltaSinkInternal[IN] extends Sink[IN, DeltaCommittable, DeltaWriterBucketState, DeltaGlobalCommittable]
A unified sink that emits its input elements to file system files within buckets using the Parquet format and commits those files to the io.delta.standalone.DeltaLog. This sink achieves exactly-once semantics for both BATCH and STREAMING.

The behaviour of this sink splits into two phases. The first phase takes place between the application's checkpoints, when records are flushed to files (or appended to writers' buffers); here the behaviour is almost identical to that of org.apache.flink.connector.file.sink.FileSink.

Next, during the checkpoint phase, files are "closed" (renamed) by independent instances of io.delta.flink.sink.internal.committer.DeltaCommitter, which behave very similarly to org.apache.flink.connector.file.sink.committer.FileCommitter. When all the parallel committers are done, all the files are committed at once by the single-parallelism io.delta.flink.sink.internal.committer.DeltaGlobalCommitter.

This DeltaSinkInternal sources many specific implementations from org.apache.flink.connector.file.sink.FileSink, so for most of the low-level behaviour one may refer to the docs of that module. The most notable differences to the FileSink are:
- tightly coupling DeltaSink to the Bulk-/ParquetFormat,
- extending committable information with file metadata (name, size, row count, last update timestamp),
- providing Delta Lake-specific behaviour, mostly contained in io.delta.flink.sink.internal.committer.DeltaGlobalCommitter, which implements the commit to the io.delta.standalone.DeltaLog at the final stage of each checkpoint.
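The two-phase flow described above (parallel committers closing files, then a single global committer committing them all at once) can be sketched conceptually. This is a plain-Java simulation under assumed names (`TwoPhaseCommitSketch`, `Committable`, `commitLocally`, `commitGlobally`), not the connector's actual classes; the metadata values are placeholders:

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only: models how per-checkpoint committables flow from
// parallel committers to a single-parallelism global committer.
public class TwoPhaseCommitSketch {

    // A committable carries the final file name plus the file metadata the
    // text mentions (size, row count).
    record Committable(String fileName, long sizeInBytes, long rowCount) {}

    // Phase 1 (per parallel committer): "close" in-progress files by renaming
    // them to their final names, producing committables.
    static List<Committable> commitLocally(List<String> inProgressFiles) {
        List<Committable> out = new ArrayList<>();
        for (String f : inProgressFiles) {
            String finalName = f.replace(".inprogress", "");
            out.add(new Committable(finalName, 1024, 100)); // placeholder metadata
        }
        return out;
    }

    // Phase 2 (single global committer): commit all files from all parallel
    // committers at once; this is where the DeltaLog transaction would happen.
    static int commitGlobally(List<List<Committable>> perCommitterResults) {
        int committed = 0;
        for (List<Committable> result : perCommitterResults) {
            committed += result.size(); // one atomic commit covering every file
        }
        return committed;
    }
}
```

The key design point this mirrors is that the Delta log commit is serialized through a single task, so a checkpoint either lands as one atomic Delta commit or not at all.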
-
class
DeltaSinkOptions extends AnyRef
This class contains all available options for io.delta.flink.sink.DeltaSink, with their types and default values.
-
class
SchemaConverter extends AnyRef
This is a utility class for converting Flink's RowType into Delta Lake's StructType, which is used for schema-matching comparisons during io.delta.standalone.DeltaLog commits.
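The conversion idea can be sketched with a plain-Java toy that maps Flink logical type names to Delta Lake type names. This is purely illustrative: the real SchemaConverter operates on RowType and StructType objects, and every name here (`SchemaConversionSketch`, `toDeltaFieldTypes`, the string mapping itself) is an assumption made for the example:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the conversion idea only: maps Flink logical type
// names to their Delta Lake (io.delta.standalone.types) counterparts.
public class SchemaConversionSketch {

    // A few representative Flink -> Delta type-name pairs (illustrative subset).
    private static final Map<String, String> TYPE_MAPPING = Map.of(
        "INT", "integer",
        "BIGINT", "long",
        "VARCHAR", "string",
        "DOUBLE", "double",
        "BOOLEAN", "boolean"
    );

    // Converts an ordered (fieldName -> Flink type name) map into an ordered
    // (fieldName -> Delta type name) map, preserving field order as a schema
    // conversion must.
    static LinkedHashMap<String, String> toDeltaFieldTypes(
            LinkedHashMap<String, String> flinkFields) {
        LinkedHashMap<String, String> deltaFields = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : flinkFields.entrySet()) {
            String deltaType = TYPE_MAPPING.get(field.getValue());
            if (deltaType == null) {
                throw new IllegalArgumentException(
                    "Unsupported Flink type: " + field.getValue());
            }
            deltaFields.put(field.getKey(), deltaType);
        }
        return deltaFields;
    }
}
```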