io.delta.flink.sink.internal

DeltaSinkBuilder

class DeltaSinkBuilder[IN] extends Serializable

A builder class for DeltaSinkInternal.

For most common use cases, use the DeltaSink#forRowData utility method to instantiate the sink. Use this builder only if you need to provide a custom writer factory instance or to configure low-level settings for the sink.

Example of using this class for a stream of RowData:

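    import io.delta.flink.internal.options.DeltaConnectorConfiguration;
    import io.delta.flink.sink.internal.DeltaSinkBuilder;
    import io.delta.flink.sink.internal.DeltaSinkInternal;
    import org.apache.flink.formats.parquet.ParquetWriterFactory;
    import org.apache.flink.formats.parquet.row.ParquetRowDataBuilder;
    import org.apache.flink.streaming.api.functions.sink.filesystem.OutputFileConfig;
    import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.BasePathBucketAssigner;
    import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;
    import org.apache.flink.table.data.RowData;
    import org.apache.flink.table.types.logical.RowType;
    import org.apache.hadoop.conf.Configuration;

    // basePath, bucketCheckInterval, appId, mergeSchema and sinkConfiguration
    // are assumed to be defined by the caller.
    RowType rowType = ...;
    Configuration conf = new Configuration();
    conf.set("parquet.compression", "SNAPPY");
    ParquetWriterFactory<RowData> writerFactory =
        ParquetRowDataBuilder.createWriterFactory(rowType, conf, true);

    // The constructor is protected: call it from the io.delta.flink.sink.internal
    // package or from a subclass.
    DeltaSinkBuilder<RowData> sinkBuilder = new DeltaSinkBuilder<>(
        basePath,
        conf,
        bucketCheckInterval,
        writerFactory,
        new BasePathBucketAssigner<>(),
        OnCheckpointRollingPolicy.build(),
        OutputFileConfig.builder().withPartSuffix(".snappy.parquet").build(),
        appId,
        rowType,
        mergeSchema,
        sinkConfiguration
    );

    DeltaSinkInternal<RowData> sink = sinkBuilder.build();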
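Once built, the sink still has to be attached to a stream. The following is a minimal sketch, assuming that DeltaSinkInternal implements the Flink Sink interface accepted by DataStream#sinkTo and that Delta commits are made when checkpoints complete (so checkpointing is enabled on the environment). DeltaSinkWiring and its methods are hypothetical helpers, and the checkpoint interval is an arbitrary example value.

```java
import io.delta.flink.sink.internal.DeltaSinkInternal;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;

/** Hypothetical helpers for wiring an already-built Delta sink into a job. */
public final class DeltaSinkWiring {

    private DeltaSinkWiring() {}

    // Assumption: Delta commits are tied to checkpoints, so the environment
    // enables checkpointing with the given interval (in milliseconds).
    public static StreamExecutionEnvironment newEnvironment(long checkpointIntervalMs) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(checkpointIntervalMs);
        return env;
    }

    // Attaches the sink produced by DeltaSinkBuilder#build to a RowData stream.
    public static void attach(DataStream<RowData> stream, DeltaSinkInternal<RowData> sink) {
        stream.sinkTo(sink);
    }
}
```

A job would create the environment with newEnvironment, build its DataStream<RowData> from that environment, call attach with the sink, and then execute the environment.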

Linear Supertypes
Serializable, AnyRef, Any

Instance Constructors

  1. new DeltaSinkBuilder(basePath: Path, conf: Configuration, bucketCheckInterval: Long, writerFactory: ParquetWriterFactory[IN], assigner: BucketAssigner[IN, String], policy: CheckpointRollingPolicy[IN, String], outputFileConfig: OutputFileConfig, appId: String, rowType: RowType, mergeSchema: Boolean, sinkConfiguration: DeltaConnectorConfiguration)

    Creates an instance of the builder for DeltaSink.

    basePath

    path to a Delta table

    conf

    Hadoop's conf object

    bucketCheckInterval

    interval (in milliseconds) for triggering Sink.ProcessingTimeService within the internal io.delta.flink.sink.internal.writer.DeltaWriter instance

    writerFactory

    a factory used at runtime to create instances of org.apache.flink.api.common.serialization.BulkWriter

    assigner

    BucketAssigner used with a Delta sink to determine the bucket each incoming element should be put into

    policy

    instance of CheckpointRollingPolicy which rolls on every checkpoint by default

    outputFileConfig

    part file name configuration. This allows defining a prefix and a suffix for the part file name.

    appId

    unique identifier of the Flink application, used as part of the transactional id in Delta's transactions. This value must be unique across all applications committing to a given Delta table.

    rowType

    Flink's logical type to indicate the structure of the events in the stream

    mergeSchema

    whether to try to update the table's schema with the stream's schema when they do not match. The update is not guaranteed, because checks are still performed to verify that the schema changes are compatible.

    Attributes
    protected[internal]
  2. new DeltaSinkBuilder(basePath: Path, conf: Configuration, writerFactory: ParquetWriterFactory[IN], assigner: BucketAssigner[IN, String], policy: CheckpointRollingPolicy[IN, String], rowType: RowType, mergeSchema: Boolean, sinkConfiguration: DeltaConnectorConfiguration)

    Attributes
    protected[internal]
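The uniqueness requirement on appId can be illustrated with a simplified, in-memory model. The sketch assumes, in the spirit of the txn (SetTransaction) action in Delta's transaction log protocol, that each commit records an application id together with a monotonically increasing version (assumed here to be a checkpoint id), and that a commit whose version is not newer than the last one recorded for that id is skipped as a replay. TxnLogModel and its methods are hypothetical; this is not the connector's commit code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of idempotent commits keyed by application id. */
public final class TxnLogModel {

    // Highest committed version per application id.
    private final Map<String, Long> lastVersionByAppId = new HashMap<>();

    /**
     * Returns true if the commit is applied, false if it is skipped because a
     * version greater than or equal to this one is already recorded for appId.
     */
    public boolean commit(String appId, long version) {
        long last = lastVersionByAppId.getOrDefault(appId, -1L);
        if (version <= last) {
            return false; // treated as a replay of an already-committed transaction
        }
        lastVersionByAppId.put(appId, version);
        return true;
    }

    public static void main(String[] args) {
        TxnLogModel log = new TxnLogModel();
        System.out.println(log.commit("app-a", 5)); // true
        System.out.println(log.commit("app-a", 5)); // false: replay after recovery
        // A second application that reuses "app-a" with a lower checkpoint id
        // is wrongly treated as a replay, so its data is never committed.
        System.out.println(log.commit("app-a", 3)); // false
        System.out.println(log.commit("app-b", 3)); // true
    }
}
```

The third call shows why appId has to be unique per application: sharing an id makes one job's commits look like replays of another's.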

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def build(): DeltaSinkInternal[IN]

    Creates the actual sink.

    returns

    the constructed DeltaSinkInternal object

  6. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  7. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  10. def getAppId(): String
    Attributes
    protected[internal]
  11. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  12. def getSerializableConfiguration(): SerializableConfiguration
    Attributes
    protected[internal]
  13. def getTableBasePath(): Path
    Attributes
    protected[internal]
  14. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  15. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  16. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  17. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  18. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  19. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  20. def toString(): String
    Definition Classes
    AnyRef → Any
  21. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  24. def withBucketAssigner(assigner: BucketAssigner[IN, String]): DeltaSinkBuilder[IN]

    Sets the bucket assigner responsible for mapping events to their partitions.

    assigner

    bucket assigner instance for this sink

    returns

    builder for DeltaSink

  25. def withMergeSchema(mergeSchema: Boolean): DeltaSinkBuilder[IN]

    Sets the sink's option for whether, if the stream's schema and the Delta table's schema differ, the sink should try to update the table's schema during the commit to the io.delta.standalone.DeltaLog. The update is not guaranteed, because compatibility checks are still performed.

    mergeSchema

    whether to try to update the table's schema with the stream's schema when they do not match. See DeltaSinkBuilder#mergeSchema for details.

    returns

    builder for DeltaSink
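The compatibility checks behind withMergeSchema are not specified on this page. Purely as an illustration, the sketch below models one simple rule that is assumed here rather than taken from the connector: columns present in both schemas must keep the same type, columns only in the stream are appended, and columns only in the table are kept. Schemas are modeled as hypothetical ordered maps from column name to type name; real Delta schema merging also considers nullability, nested types and more.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified model of a schema merge check. */
public final class SchemaMergeSketch {

    private SchemaMergeSketch() {}

    /**
     * Merges streamSchema into tableSchema (column name -> type name).
     * New stream columns are appended; a column whose type differs between
     * the two schemas makes the merge fail.
     */
    public static Map<String, String> merge(Map<String, String> tableSchema,
                                            Map<String, String> streamSchema) {
        Map<String, String> merged = new LinkedHashMap<>(tableSchema);
        for (Map.Entry<String, String> column : streamSchema.entrySet()) {
            String existingType = merged.get(column.getKey());
            if (existingType == null) {
                merged.put(column.getKey(), column.getValue()); // new column: append
            } else if (!existingType.equals(column.getValue())) {
                throw new IllegalArgumentException(
                    "Incompatible type for column " + column.getKey() + ": "
                        + existingType + " vs " + column.getValue());
            }
        }
        return merged;
    }
}
```

Under this model, a stream that adds a country column to a table with id and name yields the schema id, name, country, while a stream that changes the type of id is rejected.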

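withBucketAssigner accepts any BucketAssigner<IN, String>. As a sketch of a custom assigner for RowData streams, the hypothetical class below puts each row into a bucket named after the value of one string column. The "name=value" bucket-id format, the fixed column position, and the __null__ placeholder for null values are all assumptions for illustration, not requirements documented for the Delta sink.

```java
import org.apache.flink.core.io.SimpleVersionedSerializer;
import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer;
import org.apache.flink.table.data.RowData;

/** Hypothetical assigner that buckets rows by a string column at a fixed position. */
public final class ColumnValueBucketAssigner implements BucketAssigner<RowData, String> {

    private final String columnName;
    private final int columnIndex;

    public ColumnValueBucketAssigner(String columnName, int columnIndex) {
        this.columnName = columnName;
        this.columnIndex = columnIndex;
    }

    @Override
    public String getBucketId(RowData element, BucketAssigner.Context context) {
        // "name=value" bucket id; null values share a placeholder bucket.
        String value = element.isNullAt(columnIndex)
            ? "__null__"
            : element.getString(columnIndex).toString();
        return columnName + "=" + value;
    }

    @Override
    public SimpleVersionedSerializer<String> getSerializer() {
        // Bucket ids are plain strings, so Flink's string serializer suffices.
        return SimpleVersionedStringSerializer.INSTANCE;
    }
}
```

It would be registered with sinkBuilder.withBucketAssigner(new ColumnValueBucketAssigner("country", 0)) before calling build().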