class DeltaSinkBuilder[IN] extends Serializable
A builder class for DeltaSinkInternal.
For the most common use cases, use the DeltaSink#forRowData utility method to instantiate the sink. Use this builder only if you need to provide a custom writer factory instance or to configure low-level settings for the sink.
Example of how to use this class for a stream of RowData:

    RowType rowType = ...;
    Configuration conf = new Configuration();
    conf.set("parquet.compression", "SNAPPY");
    ParquetWriterFactory<RowData> writerFactory =
        ParquetRowDataBuilder.createWriterFactory(rowType, conf, true);

    DeltaSinkBuilder<RowData> sinkBuilder = new DeltaSinkBuilder<>(
        basePath,
        conf,
        bucketCheckInterval,
        writerFactory,
        new BasePathBucketAssigner<>(),
        OnCheckpointRollingPolicy.build(),
        OutputFileConfig.builder().withPartSuffix(".snappy.parquet").build(),
        appId,
        rowType,
        mergeSchema,
        new DeltaConnectorConfiguration()
    );

    DeltaSink<RowData> sink = sinkBuilder.build();
Linear Supertypes: Serializable, AnyRef, Any
Instance Constructors
- new DeltaSinkBuilder(basePath: Path, conf: Configuration, bucketCheckInterval: Long, writerFactory: ParquetWriterFactory[IN], assigner: BucketAssigner[IN, String], policy: CheckpointRollingPolicy[IN, String], outputFileConfig: OutputFileConfig, appId: String, rowType: RowType, mergeSchema: Boolean, sinkConfiguration: DeltaConnectorConfiguration)
Creates an instance of the builder for DeltaSink.
- basePath
path to a Delta table
- conf
Hadoop's conf object
- bucketCheckInterval
interval (in milliseconds) for triggering Sink.ProcessingTimeService within the internal io.delta.flink.sink.internal.writer.DeltaWriter instance
- writerFactory
a factory used at runtime to create instances of org.apache.flink.api.common.serialization.BulkWriter
- assigner
BucketAssigner used with a Delta sink to determine the bucket each incoming element should be put into
- policy
instance of CheckpointRollingPolicy, which by default rolls on every checkpoint
- outputFileConfig
part file name configuration. This allows defining a prefix and a suffix for the part file name.
- appId
unique identifier of the Flink application, used as part of the transactional id in Delta's transactions. It is crucial for this value to be unique across all applications committing to a given Delta table.
- rowType
Flink's logical type indicating the structure of the events in the stream
- mergeSchema
indicator of whether we should try to update the table's schema with the stream's schema in case they do not match. The update is not guaranteed, as checks will still be performed to verify that the schema updates are compatible.
Attributes: protected[internal]
- new DeltaSinkBuilder(basePath: Path, conf: Configuration, writerFactory: ParquetWriterFactory[IN], assigner: BucketAssigner[IN, String], policy: CheckpointRollingPolicy[IN, String], rowType: RowType, mergeSchema: Boolean, sinkConfiguration: DeltaConnectorConfiguration)
Attributes: protected[internal]
Value Members
- final def !=(arg0: Any): Boolean
Definition Classes: AnyRef → Any
- final def ##(): Int
Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
Definition Classes: AnyRef → Any
- final def asInstanceOf[T0]: T0
Definition Classes: Any
- def build(): DeltaSinkInternal[IN]
Creates the actual sink.
- returns
constructed DeltaSink object
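As a hedged sketch of how the built sink is typically attached to a pipeline (the environment, stream, and builder variables below are assumptions, not part of this class): checkpointing should be enabled, because the sink rolls part files and commits to the Delta log on checkpoints.

```java
// Sketch: wiring a built Delta sink into a Flink job.
// `env`, `stream`, and `sinkBuilder` are assumed to exist; `sinkBuilder`
// would be configured as in the class-level example above.
import io.delta.flink.sink.internal.DeltaSinkBuilder;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;

class DeltaSinkWiring {
    static void attach(StreamExecutionEnvironment env,
                       DataStream<RowData> stream,
                       DeltaSinkBuilder<RowData> sinkBuilder) {
        // Commits to the Delta log happen on checkpoints, so checkpointing
        // must be enabled for data to become visible in the table.
        env.enableCheckpointing(10_000L); // every 10 seconds
        stream.sinkTo(sinkBuilder.build());
    }
}
```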
- def clone(): AnyRef
Attributes: protected[lang]
Definition Classes: AnyRef
Annotations: @throws( ... ) @native()
- final def eq(arg0: AnyRef): Boolean
Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
Definition Classes: AnyRef → Any
- def finalize(): Unit
Attributes: protected[lang]
Definition Classes: AnyRef
Annotations: @throws( classOf[java.lang.Throwable] )
- def getAppId(): String
Attributes: protected[internal]
- final def getClass(): Class[_]
Definition Classes: AnyRef → Any
Annotations: @native()
- def getSerializableConfiguration(): SerializableConfiguration
Attributes: protected[internal]
- def getTableBasePath(): Path
Attributes: protected[internal]
- def hashCode(): Int
Definition Classes: AnyRef → Any
Annotations: @native()
- final def isInstanceOf[T0]: Boolean
Definition Classes: Any
- final def ne(arg0: AnyRef): Boolean
Definition Classes: AnyRef
- final def notify(): Unit
Definition Classes: AnyRef
Annotations: @native()
- final def notifyAll(): Unit
Definition Classes: AnyRef
Annotations: @native()
- final def synchronized[T0](arg0: ⇒ T0): T0
Definition Classes: AnyRef
- def toString(): String
Definition Classes: AnyRef → Any
- final def wait(): Unit
Definition Classes: AnyRef
Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
Definition Classes: AnyRef
Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
Definition Classes: AnyRef
Annotations: @throws( ... ) @native()
- def withBucketAssigner(assigner: BucketAssigner[IN, String]): DeltaSinkBuilder[IN]
Sets the bucket assigner responsible for mapping events to their partitions.
- assigner
bucket assigner instance for this sink
- returns
builder for DeltaSink
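As a hedged illustration of the BucketAssigner contract (not the connector's own partition-aware assigner): a custom assigner returns a bucket id per element, which becomes the subdirectory for that element's part files. For a real partitioned Delta table the bucket id must match the table's partition directory layout; the column name below is a made-up assumption.

```java
// Hedged sketch: a custom BucketAssigner routing each RowData record into
// a bucket derived from its first string field. Illustrative only; the
// "country" partition column is an assumption, not part of this API.
import org.apache.flink.core.io.SimpleVersionedSerializer;
import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer;
import org.apache.flink.table.data.RowData;

class CountryBucketAssigner implements BucketAssigner<RowData, String> {
    @Override
    public String getBucketId(RowData element, Context context) {
        // The bucket id doubles as the partition subdirectory name.
        return "country=" + element.getString(0);
    }

    @Override
    public SimpleVersionedSerializer<String> getSerializer() {
        return SimpleVersionedStringSerializer.INSTANCE;
    }
}
```

It would then be plugged in via `sinkBuilder.withBucketAssigner(new CountryBucketAssigner())`.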
- def withMergeSchema(mergeSchema: Boolean): DeltaSinkBuilder[IN]
Sets the sink's option for whether, in case of any differences between the stream's schema and the Delta table's schema, we should try to update the table's schema during the commit to the io.delta.standalone.DeltaLog. The update is not guaranteed, as some compatibility checks will be performed.
- mergeSchema
whether we should try to update the table's schema with the stream's schema in case they do not match. See DeltaSinkBuilder#mergeSchema for details.
- returns
builder for DeltaSink
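A hedged sketch of toggling this option on an existing builder (the `sinkBuilder` variable is assumed to be configured as in the class-level example): with schema merging enabled, an extra nullable column appearing in the stream's RowType may be added to the table schema at commit time, while incompatible changes are still rejected.

```java
// Sketch: enabling schema merging before building the sink.
// `sinkBuilder` is assumed to exist; schema updates are attempted only at
// commit time, and only if the compatibility checks pass.
DeltaSinkBuilder<RowData> merging = sinkBuilder.withMergeSchema(true);
DeltaSinkInternal<RowData> sink = merging.build();
```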