Class KafkaIO.WriteRecords<K,V>
- java.lang.Object
-
- org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<org.apache.kafka.clients.producer.ProducerRecord<K,V>>,org.apache.beam.sdk.values.PDone>
-
- org.apache.beam.sdk.io.kafka.KafkaIO.WriteRecords<K,V>
-
- All Implemented Interfaces:
java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData
- Enclosing class:
- KafkaIO
public abstract static class KafkaIO.WriteRecords<K,V> extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<org.apache.kafka.clients.producer.ProducerRecord<K,V>>,org.apache.beam.sdk.values.PDone>

A PTransform to write to a Kafka topic with ProducerRecords. See KafkaIO for more information on usage and configuration.
- See Also:
- Serialized Form
-
-
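As a quick orientation (full details in KafkaIO), a typical application of this transform might look like the following sketch. The `records` PCollection of ProducerRecord<String, String>, the broker addresses, and the topic name are assumptions for illustration:

```java
// Sketch: write an existing PCollection<ProducerRecord<String, String>>
// (here called `records`) to Kafka. Broker and topic names are placeholders.
records.apply(KafkaIO.<String, String>writeRecords()
    .withBootstrapServers("broker-1:9092,broker-2:9092")
    .withTopic("events")                        // default topic; a ProducerRecord can name its own
    .withKeySerializer(StringSerializer.class)  // org.apache.kafka.common.serialization.StringSerializer
    .withValueSerializer(StringSerializer.class));
```

The transform returns PDone, so it terminates that branch of the pipeline.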
Constructor Summary
Constructor: WriteRecords()
-
Method Summary
All Methods · Instance Methods · Abstract Methods · Concrete Methods · Deprecated Methods

- org.apache.beam.sdk.values.PDone expand(org.apache.beam.sdk.values.PCollection<org.apache.kafka.clients.producer.ProducerRecord<K,V>> input)
- abstract org.apache.beam.sdk.transforms.errorhandling.ErrorHandler<org.apache.beam.sdk.transforms.errorhandling.BadRecord,?> getBadRecordErrorHandler()
- abstract org.apache.beam.sdk.transforms.errorhandling.BadRecordRouter getBadRecordRouter()
- abstract @Nullable org.apache.beam.sdk.transforms.SerializableFunction<java.util.Map<java.lang.String,java.lang.Object>,? extends org.apache.kafka.clients.consumer.Consumer<?,?>> getConsumerFactoryFn()
- abstract @Nullable java.lang.Class<? extends org.apache.kafka.common.serialization.Serializer<K>> getKeySerializer()
- abstract int getNumShards()
- abstract java.util.Map<java.lang.String,java.lang.Object> getProducerConfig()
- abstract @Nullable org.apache.beam.sdk.transforms.SerializableFunction<java.util.Map<java.lang.String,java.lang.Object>,org.apache.kafka.clients.producer.Producer<K,V>> getProducerFactoryFn()
- abstract @Nullable KafkaPublishTimestampFunction<org.apache.kafka.clients.producer.ProducerRecord<K,V>> getPublishTimestampFunction()
- abstract @Nullable java.lang.String getSinkGroupId()
- abstract @Nullable java.lang.String getTopic()
- abstract @Nullable java.lang.Class<? extends org.apache.kafka.common.serialization.Serializer<V>> getValueSerializer()
- abstract boolean isEOS()
- void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
- KafkaIO.WriteRecords<K,V> updateProducerProperties(java.util.Map<java.lang.String,java.lang.Object> configUpdates) - Deprecated as of version 2.13.
- void validate(@Nullable org.apache.beam.sdk.options.PipelineOptions options)
- KafkaIO.WriteRecords<K,V> withBadRecordErrorHandler(org.apache.beam.sdk.transforms.errorhandling.ErrorHandler<org.apache.beam.sdk.transforms.errorhandling.BadRecord,?> badRecordErrorHandler)
- KafkaIO.WriteRecords<K,V> withBootstrapServers(java.lang.String bootstrapServers) - Returns a new KafkaIO.Write transform with the Kafka producer pointing to bootstrapServers.
- KafkaIO.WriteRecords<K,V> withConsumerFactoryFn(org.apache.beam.sdk.transforms.SerializableFunction<java.util.Map<java.lang.String,java.lang.Object>,? extends org.apache.kafka.clients.consumer.Consumer<?,?>> consumerFactoryFn) - When exactly-once semantics are enabled (see withEOS(int, String)), the sink needs to fetch previously stored state from the Kafka topic.
- KafkaIO.WriteRecords<K,V> withEOS(int numShards, java.lang.String sinkGroupId) - Provides exactly-once semantics while writing to Kafka, enabling applications with end-to-end exactly-once guarantees on top of exactly-once semantics within Beam pipelines.
- KafkaIO.WriteRecords<K,V> withInputTimestamp() - Sets the timestamp of each published record to the timestamp of its element in the pipeline.
- KafkaIO.WriteRecords<K,V> withKeySerializer(java.lang.Class<? extends org.apache.kafka.common.serialization.Serializer<K>> keySerializer) - Sets a Serializer for serializing the key (if any) to bytes.
- KafkaIO.WriteRecords<K,V> withProducerConfigUpdates(java.util.Map<java.lang.String,java.lang.Object> configUpdates) - Updates configuration for the producer.
- KafkaIO.WriteRecords<K,V> withProducerFactoryFn(org.apache.beam.sdk.transforms.SerializableFunction<java.util.Map<java.lang.String,java.lang.Object>,org.apache.kafka.clients.producer.Producer<K,V>> producerFactoryFn) - Sets a custom function to create the Kafka producer.
- KafkaIO.WriteRecords<K,V> withPublishTimestampFunction(KafkaPublishTimestampFunction<org.apache.kafka.clients.producer.ProducerRecord<K,V>> timestampFunction) - Deprecated. Use ProducerRecords to set the publish timestamp.
- KafkaIO.WriteRecords<K,V> withTopic(java.lang.String topic) - Sets the default Kafka topic to write to.
- KafkaIO.WriteRecords<K,V> withValueSerializer(java.lang.Class<? extends org.apache.kafka.common.serialization.Serializer<V>> valueSerializer) - Sets a Serializer for serializing the value to bytes.
Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate
-
-
-
-
Method Detail
-
getTopic
@Pure public abstract @Nullable java.lang.String getTopic()
-
getProducerConfig
@Pure public abstract java.util.Map<java.lang.String,java.lang.Object> getProducerConfig()
-
getProducerFactoryFn
@Pure public abstract @Nullable org.apache.beam.sdk.transforms.SerializableFunction<java.util.Map<java.lang.String,java.lang.Object>,org.apache.kafka.clients.producer.Producer<K,V>> getProducerFactoryFn()
-
getKeySerializer
@Pure public abstract @Nullable java.lang.Class<? extends org.apache.kafka.common.serialization.Serializer<K>> getKeySerializer()
-
getValueSerializer
@Pure public abstract @Nullable java.lang.Class<? extends org.apache.kafka.common.serialization.Serializer<V>> getValueSerializer()
-
getPublishTimestampFunction
@Pure public abstract @Nullable KafkaPublishTimestampFunction<org.apache.kafka.clients.producer.ProducerRecord<K,V>> getPublishTimestampFunction()
-
isEOS
@Pure public abstract boolean isEOS()
-
getSinkGroupId
@Pure public abstract @Nullable java.lang.String getSinkGroupId()
-
getNumShards
@Pure public abstract int getNumShards()
-
getConsumerFactoryFn
@Pure public abstract @Nullable org.apache.beam.sdk.transforms.SerializableFunction<java.util.Map<java.lang.String,java.lang.Object>,? extends org.apache.kafka.clients.consumer.Consumer<?,?>> getConsumerFactoryFn()
-
getBadRecordRouter
@Pure public abstract org.apache.beam.sdk.transforms.errorhandling.BadRecordRouter getBadRecordRouter()
-
getBadRecordErrorHandler
@Pure public abstract org.apache.beam.sdk.transforms.errorhandling.ErrorHandler<org.apache.beam.sdk.transforms.errorhandling.BadRecord,?> getBadRecordErrorHandler()
-
withBootstrapServers
public KafkaIO.WriteRecords<K,V> withBootstrapServers(java.lang.String bootstrapServers)
Returns a new KafkaIO.Write transform with the Kafka producer pointing to bootstrapServers.
-
withTopic
public KafkaIO.WriteRecords<K,V> withTopic(java.lang.String topic)
Sets the default Kafka topic to write to. Use ProducerRecords to set the topic name per published record.
-
withKeySerializer
public KafkaIO.WriteRecords<K,V> withKeySerializer(java.lang.Class<? extends org.apache.kafka.common.serialization.Serializer<K>> keySerializer)
Sets a Serializer for serializing the key (if any) to bytes. A key is optional when writing to Kafka. Note that when a key is set, its hash is used to determine the partition in Kafka (see ProducerRecord for more details).
-
withValueSerializer
public KafkaIO.WriteRecords<K,V> withValueSerializer(java.lang.Class<? extends org.apache.kafka.common.serialization.Serializer<V>> valueSerializer)
Sets a Serializer for serializing the value to bytes.
-
updateProducerProperties
@Deprecated public KafkaIO.WriteRecords<K,V> updateProducerProperties(java.util.Map<java.lang.String,java.lang.Object> configUpdates)
Deprecated. As of version 2.13, use withProducerConfigUpdates(Map) instead. Adds the given producer properties, overriding old values of properties with the same key.
-
withProducerConfigUpdates
public KafkaIO.WriteRecords<K,V> withProducerConfigUpdates(java.util.Map<java.lang.String,java.lang.Object> configUpdates)
Updates configuration for the producer. Note that the default producer properties are not completely overridden; this method only replaces values whose keys are given. By default, the producer uses the configuration from DEFAULT_PRODUCER_PROPERTIES.
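For instance, a caller might overlay standard Kafka ProducerConfig settings on the defaults. A minimal sketch, where the broker, topic, and the chosen config values are illustrative rather than recommendations:

```java
// Sketch: overlay producer settings on DEFAULT_PRODUCER_PROPERTIES.
// Only the keys given here are replaced; other defaults are kept.
KafkaIO.WriteRecords<String, String> write =
    KafkaIO.<String, String>writeRecords()
        .withBootstrapServers("broker:9092")            // placeholder address
        .withTopic("events")
        .withKeySerializer(StringSerializer.class)
        .withValueSerializer(StringSerializer.class)
        .withProducerConfigUpdates(Map.<String, Object>of(
            ProducerConfig.ACKS_CONFIG, "all",          // wait for all in-sync replicas
            ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy"));
```

The explicit `Map.<String, Object>of(...)` type witness is needed because the method takes Map<String, Object>, not Map<String, String>.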
-
withProducerFactoryFn
public KafkaIO.WriteRecords<K,V> withProducerFactoryFn(org.apache.beam.sdk.transforms.SerializableFunction<java.util.Map<java.lang.String,java.lang.Object>,org.apache.kafka.clients.producer.Producer<K,V>> producerFactoryFn)
Sets a custom function to create the Kafka producer. Primarily used for tests. The default is KafkaProducer.
-
withInputTimestamp
public KafkaIO.WriteRecords<K,V> withInputTimestamp()
The timestamp of each record being published is set to the timestamp of its element in the pipeline. This is equivalent to withPublishTimestampFunction((e, ts) -> ts).
NOTE: Kafka's retention policies are based on message timestamps. If the pipeline is processing messages from the past, they might be deleted immediately by Kafka after being published if the timestamps are older than the Kafka cluster's log.retention.hours.
-
withPublishTimestampFunction
@Deprecated public KafkaIO.WriteRecords<K,V> withPublishTimestampFunction(KafkaPublishTimestampFunction<org.apache.kafka.clients.producer.ProducerRecord<K,V>> timestampFunction)
Deprecated. Use ProducerRecords to set the publish timestamp. A function to provide the timestamp for records being published.
NOTE: Kafka's retention policies are based on message timestamps. If the pipeline is processing messages from the past, they might be deleted immediately by Kafka after being published if the timestamps are older than the Kafka cluster's log.retention.hours.
-
withEOS
public KafkaIO.WriteRecords<K,V> withEOS(int numShards, java.lang.String sinkGroupId)
Provides exactly-once semantics while writing to Kafka, which enables applications with end-to-end exactly-once guarantees on top of exactly-once semantics within Beam pipelines. It ensures that records written to the sink are committed to Kafka exactly once, even when some processing is retried during pipeline execution. Retries typically occur when workers restart (as in failure recovery), or when the work is redistributed (as in an autoscaling event).
Beam runners typically provide exactly-once semantics for the results of a pipeline, but not for side effects from user code in a transform. If a transform such as the Kafka sink writes to an external system, those writes might occur more than once. When EOS is enabled here, the sink transform ties checkpointing semantics in compatible Beam runners to transactions in Kafka (version 0.11+) to ensure that a record is written only once. Because the implementation relies on the runner's checkpoint semantics, not all runners are compatible; the sink throws an exception during initialization if the runner is not explicitly allowed. The Dataflow, Flink, and Spark runners are compatible.
Note on performance: the exactly-once sink involves two shuffles of the records. In addition to the cost of shuffling the records among workers, the records go through two serialization-deserialization cycles. Depending on volume and the cost of serialization, the CPU cost might be noticeable. The CPU cost can be reduced by writing byte arrays (i.e. serializing them to bytes before writing to the Kafka sink).
- Parameters:
numShards - Sets sink parallelism. The state metadata stored on Kafka is spread across this many virtual partitions using sinkGroupId. A good rule of thumb is to set this to around the number of partitions in the Kafka topic.
sinkGroupId - The group id used to store a small amount of state as metadata on Kafka. It is similar to the consumer group id used with a KafkaConsumer. Each job should use a unique group id so that restarts/updates of the job preserve the state and thus exactly-once semantics. The state is committed atomically with sink transactions on Kafka. See KafkaProducer.sendOffsetsToTransaction(Map, String) for more information. The sink performs multiple sanity checks during initialization to catch common mistakes so that it does not end up using state that does not appear to have been written by the same job.
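Putting the two parameters together, an exactly-once sink might be configured as in this sketch; the shard count, group id, broker, topic, and the `records` PCollection are illustrative assumptions:

```java
// Sketch: exactly-once writes (requires Kafka 0.11+ and a compatible runner).
records.apply(KafkaIO.<String, String>writeRecords()
    .withBootstrapServers("broker:9092")        // placeholder address
    .withTopic("events")
    .withKeySerializer(StringSerializer.class)
    .withValueSerializer(StringSerializer.class)
    .withEOS(10, "my-job-sink-group"));         // ~topic partition count; group id unique per job
```

Reusing the same sinkGroupId across a job restart is intentional (it is how state is recovered); reusing it across unrelated jobs is the mistake the sanity checks guard against.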
-
withConsumerFactoryFn
public KafkaIO.WriteRecords<K,V> withConsumerFactoryFn(org.apache.beam.sdk.transforms.SerializableFunction<java.util.Map<java.lang.String,java.lang.Object>,? extends org.apache.kafka.clients.consumer.Consumer<?,?>> consumerFactoryFn)
When exactly-once semantics are enabled (see withEOS(int, String)), the sink needs to fetch previously stored state from the Kafka topic. Fetching the metadata requires a consumer. Similar to KafkaIO.Read.withConsumerFactoryFn(SerializableFunction), a factory function can be supplied if required in a specific case. The default is KafkaConsumer.
-
withBadRecordErrorHandler
public KafkaIO.WriteRecords<K,V> withBadRecordErrorHandler(org.apache.beam.sdk.transforms.errorhandling.ErrorHandler<org.apache.beam.sdk.transforms.errorhandling.BadRecord,?> badRecordErrorHandler)
-
expand
public org.apache.beam.sdk.values.PDone expand(org.apache.beam.sdk.values.PCollection<org.apache.kafka.clients.producer.ProducerRecord<K,V>> input)
-
validate
public void validate(@Nullable org.apache.beam.sdk.options.PipelineOptions options)
-
populateDisplayData
public void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
- Specified by:
populateDisplayData in interface org.apache.beam.sdk.transforms.display.HasDisplayData
- Overrides:
populateDisplayData in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<org.apache.kafka.clients.producer.ProducerRecord<K,V>>,org.apache.beam.sdk.values.PDone>
-
-