Class KafkaIO.Read<K,V>

- java.lang.Object
  - org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<KafkaRecord<K,V>>>
    - org.apache.beam.sdk.io.kafka.KafkaIO.Read<K,V>

- All Implemented Interfaces:
  java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData
- Enclosing class:
  KafkaIO

public abstract static class KafkaIO.Read<K,V> extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<KafkaRecord<K,V>>>

A PTransform to read from Kafka topics. See KafkaIO for more information on usage and configuration.

- See Also:
  Serialized Form
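To make the configuration surface concrete, here is a minimal usage sketch. The bootstrap address, topic name, and element types are illustrative assumptions, not values from this page:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaReadSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Read KafkaRecord<Long, String> elements from a single topic.
    PCollection<KafkaRecord<Long, String>> records =
        p.apply(
            KafkaIO.<Long, String>read()
                .withBootstrapServers("localhost:9092") // placeholder address
                .withTopic("my_topic")                  // placeholder topic
                .withKeyDeserializer(LongDeserializer.class)
                .withValueDeserializer(StringDeserializer.class));

    p.run().waitUntilFinish();
  }
}
```

Running this requires the beam-sdks-java-io-kafka dependency and a reachable broker; the snippet only illustrates how the builder methods below compose.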
-
-
Nested Class Summary
Nested Classes:
- static class KafkaIO.Read.External: Exposes KafkaIO.TypedWithoutMetadata as an external transform for cross-language usage.
- static interface KafkaIO.Read.FakeFlinkPipelineOptions
-
Field Summary
Fields:
- static java.lang.Class<org.apache.beam.sdk.io.kafka.AutoValue_KafkaIO_Read> AUTOVALUE_CLASS
- static org.apache.beam.sdk.runners.PTransformOverride KAFKA_READ_OVERRIDE: A PTransformOverride for runners to swap KafkaIO.Read.ReadFromKafkaViaSDF to the legacy Kafka read if the runner does not have good support for executing unbounded Splittable DoFn.
-
Constructor Summary
Constructors:
- Read()
-
Method Summary
All Methods: Instance Methods, Abstract Methods, Concrete Methods, Deprecated Methods. (Type names are shown unqualified here; fully qualified signatures appear under Method Detail.)

- KafkaIO.Read<K,V> commitOffsetsInFinalize(): Finalized offsets are committed to Kafka.
- PCollection<KafkaRecord<K,V>> expand(PBegin input)
- PTransform<PBegin,PCollection<Row>> externalWithMetadata()
- abstract @Nullable ErrorHandler<BadRecord,?> getBadRecordErrorHandler()
- abstract @Nullable CheckStopReadingFn getCheckStopReadingFn()
- abstract Map<String,Object> getConsumerConfig()
- abstract SerializableFunction<Map<String,Object>,Consumer<byte[],byte[]>> getConsumerFactoryFn()
- abstract long getConsumerPollingTimeout()
- abstract @Nullable Coder<K> getKeyCoder()
- abstract @Nullable DeserializerProvider<K> getKeyDeserializerProvider()
- abstract long getMaxNumRecords()
- abstract @Nullable Duration getMaxReadTime()
- abstract @Nullable Map<String,Object> getOffsetConsumerConfig()
- abstract int getRedistributeNumKeys()
- abstract @Nullable Instant getStartReadTime()
- abstract @Nullable Instant getStopReadTime()
- abstract TimestampPolicyFactory<K,V> getTimestampPolicyFactory()
- abstract @Nullable List<TopicPartition> getTopicPartitions()
- abstract @Nullable Pattern getTopicPattern()
- abstract @Nullable List<String> getTopics()
- abstract @Nullable Coder<V> getValueCoder()
- abstract @Nullable DeserializerProvider<V> getValueDeserializerProvider()
- abstract @Nullable Duration getWatchTopicPartitionDuration()
- abstract @Nullable SerializableFunction<KafkaRecord<K,V>,Instant> getWatermarkFn()
- abstract boolean isAllowDuplicates()
- abstract boolean isCommitOffsetsInFinalizeEnabled()
- abstract boolean isDynamicRead()
- abstract boolean isRedistributed()
- void populateDisplayData(DisplayData.Builder builder)
- KafkaIO.Read<K,V> updateConsumerProperties(Map<String,Object> configUpdates): Deprecated as of version 2.13.
- KafkaIO.Read<K,V> withAllowDuplicates(Boolean allowDuplicates)
- KafkaIO.Read<K,V> withBadRecordErrorHandler(ErrorHandler<BadRecord,?> badRecordErrorHandler)
- KafkaIO.Read<K,V> withBootstrapServers(String bootstrapServers): Sets the bootstrap servers for the Kafka consumer.
- KafkaIO.Read<K,V> withCheckStopReadingFn(CheckStopReadingFn checkStopReadingFn): A custom CheckStopReadingFn that determines whether the ReadFromKafkaDoFn should stop reading from the given TopicPartition.
- KafkaIO.Read<K,V> withCheckStopReadingFn(SerializableFunction<TopicPartition,Boolean> checkStopReadingFn): A custom SerializableFunction that determines whether the ReadFromKafkaDoFn should stop reading from the given TopicPartition.
- KafkaIO.Read<K,V> withConsumerConfigUpdates(Map<String,Object> configUpdates): Updates configuration for the backend main consumer.
- KafkaIO.Read<K,V> withConsumerFactoryFn(SerializableFunction<Map<String,Object>,Consumer<byte[],byte[]>> consumerFactoryFn): A factory to create a Kafka Consumer from consumer configuration.
- KafkaIO.Read<K,V> withConsumerPollingTimeout(long duration): Sets the timeout, in seconds, for the Kafka consumer polling request in the ReadFromKafkaDoFn.
- KafkaIO.Read<K,V> withCreateTime(Duration maxDelay): Sets the timestamp policy based on the KafkaTimestampType.CREATE_TIME timestamp of the records.
- KafkaIO.Read<K,V> withDynamicRead(Duration duration): Configures the KafkaIO to use WatchForKafkaTopicPartitions to detect and emit any newly available TopicPartition for ReadFromKafkaDoFn to consume during pipeline execution.
- KafkaIO.Read<K,V> withGCPApplicationDefaultCredentials(): Creates and sets the Application Default Credentials for a Kafka consumer.
- KafkaIO.Read<K,V> withKeyDeserializer(Class<? extends Deserializer<K>> keyDeserializer): Sets a Kafka Deserializer to interpret key bytes read from Kafka.
- KafkaIO.Read<K,V> withKeyDeserializer(DeserializerProvider<K> deserializerProvider)
- KafkaIO.Read<K,V> withKeyDeserializerAndCoder(Class<? extends Deserializer<K>> keyDeserializer, Coder<K> keyCoder): Sets a Kafka Deserializer for interpreting key bytes read from Kafka, along with a Coder that helps the Beam runner materialize key objects at runtime if necessary.
- KafkaIO.Read<K,V> withKeyDeserializerProviderAndCoder(DeserializerProvider<K> deserializerProvider, Coder<K> keyCoder)
- KafkaIO.Read<K,V> withLogAppendTime()
- KafkaIO.Read<K,V> withMaxNumRecords(long maxNumRecords): Similar to Read.Unbounded.withMaxNumRecords(long).
- KafkaIO.Read<K,V> withMaxReadTime(Duration maxReadTime): Similar to Read.Unbounded.withMaxReadTime(Duration).
- KafkaIO.Read<K,V> withOffsetConsumerConfigOverrides(Map<String,Object> offsetConsumerConfig): Sets additional configuration for the backend offset consumer.
- PTransform<PBegin,PCollection<KV<K,V>>> withoutMetadata(): Returns a PTransform for a PCollection of KV, dropping Kafka metadata.
- KafkaIO.Read<K,V> withProcessingTime()
- KafkaIO.Read<K,V> withReadCommitted(): Sets "isolation_level" to "read_committed" in the Kafka consumer configuration.
- KafkaIO.Read<K,V> withRedistribute(): Sets a redistribute transform that hints to the runner to try to redistribute the work evenly.
- KafkaIO.Read<K,V> withRedistributeNumKeys(int redistributeNumKeys)
- KafkaIO.Read<K,V> withStartReadTime(Instant startReadTime): Uses a timestamp to set up the start offset.
- KafkaIO.Read<K,V> withStopReadTime(Instant stopReadTime): Uses a timestamp to set up the stop offset.
- KafkaIO.Read<K,V> withTimestampFn(SerializableFunction<KV<K,V>,Instant> timestampFn): Deprecated as of version 2.4.
- KafkaIO.Read<K,V> withTimestampFn2(SerializableFunction<KafkaRecord<K,V>,Instant> timestampFn): Deprecated as of version 2.4.
- KafkaIO.Read<K,V> withTimestampPolicyFactory(TimestampPolicyFactory<K,V> timestampPolicyFactory): Provides a custom TimestampPolicyFactory to set event times and the watermark for each partition.
- KafkaIO.Read<K,V> withTopic(String topic): Sets the topic to read from.
- KafkaIO.Read<K,V> withTopicPartitions(List<TopicPartition> topicPartitions): Sets a list of partitions to read from.
- KafkaIO.Read<K,V> withTopicPattern(String topicPattern): Internally sets a Pattern of topics to read from.
- KafkaIO.Read<K,V> withTopics(List<String> topics): Sets a list of topics to read from.
- KafkaIO.Read<K,V> withValueDeserializer(Class<? extends Deserializer<V>> valueDeserializer): Sets a Kafka Deserializer to interpret value bytes read from Kafka.
- KafkaIO.Read<K,V> withValueDeserializer(DeserializerProvider<V> deserializerProvider)
- KafkaIO.Read<K,V> withValueDeserializerAndCoder(Class<? extends Deserializer<V>> valueDeserializer, Coder<V> valueCoder): Sets a Kafka Deserializer for interpreting value bytes read from Kafka, along with a Coder that helps the Beam runner materialize value objects at runtime if necessary.
- KafkaIO.Read<K,V> withValueDeserializerProviderAndCoder(DeserializerProvider<V> deserializerProvider, Coder<V> valueCoder)
- KafkaIO.Read<K,V> withWatermarkFn(SerializableFunction<KV<K,V>,Instant> watermarkFn): Deprecated as of version 2.4.
- KafkaIO.Read<K,V> withWatermarkFn2(SerializableFunction<KafkaRecord<K,V>,Instant> watermarkFn): Deprecated as of version 2.4.
Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate, validate
-
-
-
-
Field Detail
-
AUTOVALUE_CLASS
public static final java.lang.Class<org.apache.beam.sdk.io.kafka.AutoValue_KafkaIO_Read> AUTOVALUE_CLASS
-
KAFKA_READ_OVERRIDE
@Internal public static final org.apache.beam.sdk.runners.PTransformOverride KAFKA_READ_OVERRIDE
A PTransformOverride for runners to swap KafkaIO.Read.ReadFromKafkaViaSDF to the legacy Kafka read if the runner does not have good support for executing unbounded Splittable DoFn.
-
-
Method Detail
-
getConsumerConfig
@Pure public abstract java.util.Map<java.lang.String,java.lang.Object> getConsumerConfig()
-
getTopics
@Pure public abstract @Nullable java.util.List<java.lang.String> getTopics()
-
getTopicPartitions
@Pure public abstract @Nullable java.util.List<org.apache.kafka.common.TopicPartition> getTopicPartitions()
-
getTopicPattern
@Pure public abstract @Nullable java.util.regex.Pattern getTopicPattern()
-
getKeyCoder
@Pure public abstract @Nullable org.apache.beam.sdk.coders.Coder<K> getKeyCoder()
-
getValueCoder
@Pure public abstract @Nullable org.apache.beam.sdk.coders.Coder<V> getValueCoder()
-
getConsumerFactoryFn
@Pure public abstract org.apache.beam.sdk.transforms.SerializableFunction<java.util.Map<java.lang.String,java.lang.Object>,org.apache.kafka.clients.consumer.Consumer<byte[],byte[]>> getConsumerFactoryFn()
-
getWatermarkFn
@Pure public abstract @Nullable org.apache.beam.sdk.transforms.SerializableFunction<KafkaRecord<K,V>,org.joda.time.Instant> getWatermarkFn()
-
getMaxNumRecords
@Pure public abstract long getMaxNumRecords()
-
getMaxReadTime
@Pure public abstract @Nullable org.joda.time.Duration getMaxReadTime()
-
getStartReadTime
@Pure public abstract @Nullable org.joda.time.Instant getStartReadTime()
-
getStopReadTime
@Pure public abstract @Nullable org.joda.time.Instant getStopReadTime()
-
isCommitOffsetsInFinalizeEnabled
@Pure public abstract boolean isCommitOffsetsInFinalizeEnabled()
-
isDynamicRead
@Pure public abstract boolean isDynamicRead()
-
isRedistributed
@Pure public abstract boolean isRedistributed()
-
isAllowDuplicates
@Pure public abstract boolean isAllowDuplicates()
-
getRedistributeNumKeys
@Pure public abstract int getRedistributeNumKeys()
-
getWatchTopicPartitionDuration
@Pure public abstract @Nullable org.joda.time.Duration getWatchTopicPartitionDuration()
-
getTimestampPolicyFactory
@Pure public abstract TimestampPolicyFactory<K,V> getTimestampPolicyFactory()
-
getOffsetConsumerConfig
@Pure public abstract @Nullable java.util.Map<java.lang.String,java.lang.Object> getOffsetConsumerConfig()
-
getKeyDeserializerProvider
@Pure public abstract @Nullable DeserializerProvider<K> getKeyDeserializerProvider()
-
getValueDeserializerProvider
@Pure public abstract @Nullable DeserializerProvider<V> getValueDeserializerProvider()
-
getCheckStopReadingFn
@Pure public abstract @Nullable CheckStopReadingFn getCheckStopReadingFn()
-
getBadRecordErrorHandler
@Pure public abstract @Nullable org.apache.beam.sdk.transforms.errorhandling.ErrorHandler<org.apache.beam.sdk.transforms.errorhandling.BadRecord,?> getBadRecordErrorHandler()
-
getConsumerPollingTimeout
@Pure public abstract long getConsumerPollingTimeout()
-
withBootstrapServers
public KafkaIO.Read<K,V> withBootstrapServers(java.lang.String bootstrapServers)
Sets the bootstrap servers for the Kafka consumer.
-
withTopic
public KafkaIO.Read<K,V> withTopic(java.lang.String topic)
Sets the topic to read from. See UnboundedSource.split(int, PipelineOptions) for a description of how the partitions are distributed among the splits.
-
withTopics
public KafkaIO.Read<K,V> withTopics(java.util.List<java.lang.String> topics)
Sets a list of topics to read from. All the partitions from each of the topics are read. See UnboundedSource.split(int, PipelineOptions) for a description of how the partitions are distributed among the splits.
-
withTopicPartitions
public KafkaIO.Read<K,V> withTopicPartitions(java.util.List<org.apache.kafka.common.TopicPartition> topicPartitions)
Sets a list of partitions to read from. This allows reading only a subset of partitions for one or more topics when (if ever) needed. See UnboundedSource.split(int, PipelineOptions) for a description of how the partitions are distributed among the splits.
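A sketch of restricting the read to explicit partitions; the broker address, topic name, and partition numbers are illustrative:

```java
import java.util.Arrays;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

class PartitionReadSketch {
  // Read only partitions 0 and 1 of "my_topic" (placeholder values).
  static KafkaIO.Read<Long, String> build() {
    return KafkaIO.<Long, String>read()
        .withBootstrapServers("localhost:9092")
        .withTopicPartitions(Arrays.asList(
            new TopicPartition("my_topic", 0),
            new TopicPartition("my_topic", 1)))
        .withKeyDeserializer(LongDeserializer.class)
        .withValueDeserializer(StringDeserializer.class);
  }
}
```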
-
withRedistribute
public KafkaIO.Read<K,V> withRedistribute()
Sets a redistribute transform that hints to the runner to try to redistribute the work evenly.
-
withAllowDuplicates
public KafkaIO.Read<K,V> withAllowDuplicates(java.lang.Boolean allowDuplicates)
-
withRedistributeNumKeys
public KafkaIO.Read<K,V> withRedistributeNumKeys(int redistributeNumKeys)
-
withTopicPattern
public KafkaIO.Read<K,V> withTopicPattern(java.lang.String topicPattern)
Internally sets a Pattern of topics to read from. All the partitions from each of the matching topics are read. See UnboundedSource.split(int, PipelineOptions) for a description of how the partitions are distributed among the splits.
-
withKeyDeserializer
public KafkaIO.Read<K,V> withKeyDeserializer(java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<K>> keyDeserializer)
Sets a Kafka Deserializer to interpret key bytes read from Kafka. In addition, Beam also needs a Coder to serialize and deserialize key objects at runtime. KafkaIO tries to infer a coder for the key based on the Deserializer class; in case that fails, you can use withKeyDeserializerAndCoder(Class, Coder) to provide the key coder explicitly.
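When coder inference from the Deserializer class fails, the coder can be supplied explicitly; a sketch (address, topic, and the chosen coders are illustrative):

```java
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarLongCoder;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

class ExplicitCoderSketch {
  static KafkaIO.Read<Long, String> build() {
    return KafkaIO.<Long, String>read()
        .withBootstrapServers("localhost:9092") // placeholder
        .withTopic("my_topic")                  // placeholder
        // Pair each Deserializer with an explicit Coder:
        .withKeyDeserializerAndCoder(LongDeserializer.class, VarLongCoder.of())
        .withValueDeserializerAndCoder(StringDeserializer.class, StringUtf8Coder.of());
  }
}
```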
-
withKeyDeserializerAndCoder
public KafkaIO.Read<K,V> withKeyDeserializerAndCoder(java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<K>> keyDeserializer, org.apache.beam.sdk.coders.Coder<K> keyCoder)
Sets a Kafka Deserializer for interpreting key bytes read from Kafka, along with a Coder that helps the Beam runner materialize key objects at runtime if necessary. Use this method only if your pipeline doesn't work with plain withKeyDeserializer(Class).
-
withKeyDeserializer
public KafkaIO.Read<K,V> withKeyDeserializer(DeserializerProvider<K> deserializerProvider)
-
withKeyDeserializerProviderAndCoder
public KafkaIO.Read<K,V> withKeyDeserializerProviderAndCoder(DeserializerProvider<K> deserializerProvider, org.apache.beam.sdk.coders.Coder<K> keyCoder)
-
withValueDeserializer
public KafkaIO.Read<K,V> withValueDeserializer(java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<V>> valueDeserializer)
Sets a Kafka Deserializer to interpret value bytes read from Kafka. In addition, Beam also needs a Coder to serialize and deserialize value objects at runtime. KafkaIO tries to infer a coder for the value based on the Deserializer class; in case that fails, you can use withValueDeserializerAndCoder(Class, Coder) to provide the value coder explicitly.
-
withValueDeserializerAndCoder
public KafkaIO.Read<K,V> withValueDeserializerAndCoder(java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<V>> valueDeserializer, org.apache.beam.sdk.coders.Coder<V> valueCoder)
Sets a Kafka Deserializer for interpreting value bytes read from Kafka, along with a Coder that helps the Beam runner materialize value objects at runtime if necessary. Use this method only if your pipeline doesn't work with plain withValueDeserializer(Class).
-
withValueDeserializer
public KafkaIO.Read<K,V> withValueDeserializer(DeserializerProvider<V> deserializerProvider)
-
withValueDeserializerProviderAndCoder
public KafkaIO.Read<K,V> withValueDeserializerProviderAndCoder(DeserializerProvider<V> deserializerProvider, org.apache.beam.sdk.coders.Coder<V> valueCoder)
-
withConsumerFactoryFn
public KafkaIO.Read<K,V> withConsumerFactoryFn(org.apache.beam.sdk.transforms.SerializableFunction<java.util.Map<java.lang.String,java.lang.Object>,org.apache.kafka.clients.consumer.Consumer<byte[],byte[]>> consumerFactoryFn)
A factory to create a Kafka Consumer from consumer configuration. This is useful for supporting another version of the Kafka consumer. Default is KafkaConsumer.
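A sketch of a custom factory; this one simply constructs the standard KafkaConsumer, whereas a real factory might adjust the config or return a different Consumer implementation (address and topic are placeholders):

```java
import java.util.Map;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

class ConsumerFactorySketch {
  static KafkaIO.Read<Long, String> build() {
    // Build the raw byte[] consumer from the supplied configuration map.
    SerializableFunction<Map<String, Object>, Consumer<byte[], byte[]>> factoryFn =
        config -> new KafkaConsumer<>(config);

    return KafkaIO.<Long, String>read()
        .withBootstrapServers("localhost:9092") // placeholder
        .withTopic("my_topic")                  // placeholder
        .withKeyDeserializer(LongDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        .withConsumerFactoryFn(factoryFn);
  }
}
```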
-
updateConsumerProperties
@Deprecated public KafkaIO.Read<K,V> updateConsumerProperties(java.util.Map<java.lang.String,java.lang.Object> configUpdates)
Deprecated. As of version 2.13, use withConsumerConfigUpdates(Map) instead.
Updates consumer configuration with new properties.
-
withMaxNumRecords
public KafkaIO.Read<K,V> withMaxNumRecords(long maxNumRecords)
Similar to Read.Unbounded.withMaxNumRecords(long). Mainly used for tests and demo applications.
-
withStartReadTime
public KafkaIO.Read<K,V> withStartReadTime(org.joda.time.Instant startReadTime)
Uses a timestamp to set up the start offset. It is only supported by Kafka client 0.10.1.0 onwards, and requires message format version 0.10.0 or later. Note that this takes priority over the start offset configuration ConsumerConfig.AUTO_OFFSET_RESET_CONFIG and any auto-committed offsets. This results in hard failures in either of the following two cases: 1. If one or more partitions do not contain any messages with a timestamp larger than or equal to the desired timestamp. 2. If the message format version in a partition is before 0.10.0, i.e. the messages do not have timestamps.
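A sketch of a time-bounded read combining this with withStopReadTime; the one-hour window is illustrative:

```java
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;
import org.joda.time.Instant;

class TimeBoundedReadSketch {
  static KafkaIO.Read<Long, String> build() {
    Instant now = Instant.now();
    return KafkaIO.<Long, String>read()
        .withBootstrapServers("localhost:9092") // placeholder
        .withTopic("my_topic")                  // placeholder
        .withKeyDeserializer(LongDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        // Only records with timestamps inside the last hour:
        .withStartReadTime(now.minus(Duration.standardHours(1)))
        .withStopReadTime(now);
  }
}
```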
-
withStopReadTime
public KafkaIO.Read<K,V> withStopReadTime(org.joda.time.Instant stopReadTime)
Uses a timestamp to set up the stop offset. It is only supported by Kafka client 0.10.1.0 onwards, and requires message format version 0.10.0 or later. This results in hard failures in either of the following two cases: 1. If one or more partitions do not contain any messages with a timestamp larger than or equal to the desired timestamp. 2. If the message format version in a partition is before 0.10.0, i.e. the messages do not have timestamps.
-
withMaxReadTime
public KafkaIO.Read<K,V> withMaxReadTime(org.joda.time.Duration maxReadTime)
Similar to Read.Unbounded.withMaxReadTime(Duration). Mainly used for tests and demo applications.
-
withLogAppendTime
public KafkaIO.Read<K,V> withLogAppendTime()
Sets the TimestampPolicy to TimestampPolicyFactory.LogAppendTimePolicy. The policy assigns Kafka's log append time (server-side ingestion time) to each record. The watermark for each Kafka partition is the timestamp of the last record read. If a partition is idle, the watermark advances to a couple of seconds behind wall time. Every record consumed from Kafka is expected to have its timestamp type set to 'LOG_APPEND_TIME'.

In Kafka, log append time needs to be enabled for each topic, and all subsequent records will have their timestamp set to log append time. If a record does not have its timestamp type set to 'LOG_APPEND_TIME' for any reason, its timestamp is set to the previous record timestamp or the latest watermark, whichever is larger.

The watermark for the entire source is the oldest of the partitions' watermarks. If one of the readers falls behind, possibly due to an uneven distribution of records among Kafka partitions, it ends up holding back the watermark for the entire source.
-
withProcessingTime
public KafkaIO.Read<K,V> withProcessingTime()
Sets the TimestampPolicy to TimestampPolicyFactory.ProcessingTimePolicy. This is the default timestamp policy. It assigns processing time to each record: specifically, the timestamp when the record becomes 'current' in the reader. The watermark always advances to current time. If server-side time (log append time) is enabled in Kafka, withLogAppendTime() is recommended over this.
-
withCreateTime
public KafkaIO.Read<K,V> withCreateTime(org.joda.time.Duration maxDelay)
Sets the timestamp policy based on the KafkaTimestampType.CREATE_TIME timestamp of the records. It is an error if a record's timestamp type is not KafkaTimestampType.CREATE_TIME. The timestamps within a partition are expected to be roughly monotonically increasing, with a cap on out-of-order delays (e.g. a 'max delay' of 1 minute). The watermark at any time is '(Min(now(), Max(event timestamp so far)) - max delay)'. However, the watermark is never set in the future and is capped to 'now - max delay'. In addition, the watermark advances to 'now - max delay' when a partition is idle.

- Parameters:
  maxDelay - For any record in the Kafka partition, the timestamp of any subsequent record is expected to be after 'current record timestamp - maxDelay'.
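A sketch of the CREATE_TIME policy with a one-minute out-of-order allowance; the maxDelay value, address, and topic are illustrative:

```java
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;

class CreateTimeSketch {
  static KafkaIO.Read<Long, String> build() {
    return KafkaIO.<Long, String>read()
        .withBootstrapServers("localhost:9092") // placeholder
        .withTopic("my_topic")                  // placeholder
        .withKeyDeserializer(LongDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        // Event time from each record's CREATE_TIME, tolerating up to
        // 1 minute of out-of-order delivery within a partition:
        .withCreateTime(Duration.standardMinutes(1));
  }
}
```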
-
withTimestampPolicyFactory
public KafkaIO.Read<K,V> withTimestampPolicyFactory(TimestampPolicyFactory<K,V> timestampPolicyFactory)
Provides a custom TimestampPolicyFactory to set event times and the watermark for each partition. TimestampPolicyFactory.createTimestampPolicy(TopicPartition, Optional) is invoked for each partition when the reader starts.
-
withTimestampFn2
@Deprecated public KafkaIO.Read<K,V> withTimestampFn2(org.apache.beam.sdk.transforms.SerializableFunction<KafkaRecord<K,V>,org.joda.time.Instant> timestampFn)
Deprecated. As of version 2.4, use withTimestampPolicyFactory(TimestampPolicyFactory) instead.
A function to assign a timestamp to a record. Default is the processing timestamp.
-
withWatermarkFn2
@Deprecated public KafkaIO.Read<K,V> withWatermarkFn2(org.apache.beam.sdk.transforms.SerializableFunction<KafkaRecord<K,V>,org.joda.time.Instant> watermarkFn)
Deprecated. As of version 2.4, use withTimestampPolicyFactory(TimestampPolicyFactory) instead.
A function to calculate the watermark after a record. Default is the last record timestamp.

- See Also:
  withTimestampFn(SerializableFunction)
-
withTimestampFn
@Deprecated public KafkaIO.Read<K,V> withTimestampFn(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.beam.sdk.values.KV<K,V>,org.joda.time.Instant> timestampFn)
Deprecated. As of version 2.4, use withTimestampPolicyFactory(TimestampPolicyFactory) instead.
A function to assign a timestamp to a record. Default is the processing timestamp.
-
withWatermarkFn
@Deprecated public KafkaIO.Read<K,V> withWatermarkFn(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.beam.sdk.values.KV<K,V>,org.joda.time.Instant> watermarkFn)
Deprecated. As of version 2.4, use withTimestampPolicyFactory(TimestampPolicyFactory) instead.
A function to calculate the watermark after a record. Default is the last record timestamp.

- See Also:
  withTimestampFn(SerializableFunction)
-
withReadCommitted
public KafkaIO.Read<K,V> withReadCommitted()
Sets "isolation_level" to "read_committed" in the Kafka consumer configuration. This ensures that the consumer does not read uncommitted messages. Kafka version 0.11 introduced transactional writes. Applications requiring end-to-end exactly-once semantics should only read committed messages. See the JavaDoc for KafkaConsumer for more details.
-
commitOffsetsInFinalize
public KafkaIO.Read<K,V> commitOffsetsInFinalize()
Finalized offsets are committed to Kafka. See UnboundedSource.CheckpointMark.finalizeCheckpoint(). It helps minimize gaps or duplicate processing of records while restarting a pipeline from scratch, but it does not provide hard processing guarantees. There could be a short delay in committing after UnboundedSource.CheckpointMark.finalizeCheckpoint() is invoked, as the reader might be blocked on reading from Kafka. Note that it is independent of the 'AUTO_COMMIT' Kafka consumer configuration. Usually either this or AUTO_COMMIT in the Kafka consumer is enabled, but not both.
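A sketch of enabling offset commits; note that, as an assumption not stated on this page, committing offsets generally requires a consumer group id in the consumer config (group name, address, and topic are placeholders):

```java
import java.util.Collections;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

class CommitOffsetsSketch {
  static KafkaIO.Read<Long, String> build() {
    return KafkaIO.<Long, String>read()
        .withBootstrapServers("localhost:9092") // placeholder
        .withTopic("my_topic")                  // placeholder
        .withKeyDeserializer(LongDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        // group.id is assumed to be needed for committing offsets:
        .withConsumerConfigUpdates(
            Collections.singletonMap(ConsumerConfig.GROUP_ID_CONFIG, "my-group"))
        .commitOffsetsInFinalize();
  }
}
```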
-
withDynamicRead
public KafkaIO.Read<K,V> withDynamicRead(org.joda.time.Duration duration)
Configures the KafkaIO to use WatchForKafkaTopicPartitions to detect and emit any newly available TopicPartition for ReadFromKafkaDoFn to consume during pipeline execution. The KafkaIO regularly checks availability at the given interval. If the duration is not specified (i.e. null), the default duration is 1 hour.
-
withOffsetConsumerConfigOverrides
public KafkaIO.Read<K,V> withOffsetConsumerConfigOverrides(java.util.Map<java.lang.String,java.lang.Object> offsetConsumerConfig)
Sets additional configuration for the backend offset consumer. It may be required for a secured Kafka cluster, especially when you see a WARN log message similar to 'exception while fetching latest offset for partition {}. will be retried'.

In KafkaIO.read(), two consumers actually run in the backend:
1. the main consumer, which reads data from Kafka;
2. the secondary offset consumer, which is used to estimate the backlog by fetching the latest offset.

By default, the offset consumer inherits the configuration from the main consumer, with an auto-generated ConsumerConfig.GROUP_ID_CONFIG. This may not work in a secured Kafka cluster that requires more configuration.
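A sketch of forwarding security settings to the offset consumer on a secured cluster; the property values, address, and topic are placeholders:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

class OffsetConsumerSketch {
  static KafkaIO.Read<Long, String> build() {
    // Security settings the offset consumer would otherwise lack:
    Map<String, Object> offsetConsumerConfig = new HashMap<>();
    offsetConsumerConfig.put("security.protocol", "SASL_SSL"); // placeholder
    offsetConsumerConfig.put("sasl.mechanism", "PLAIN");       // placeholder

    return KafkaIO.<Long, String>read()
        .withBootstrapServers("broker:9093") // placeholder
        .withTopic("my_topic")               // placeholder
        .withKeyDeserializer(LongDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        .withOffsetConsumerConfigOverrides(offsetConsumerConfig);
  }
}
```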
-
withConsumerConfigUpdates
public KafkaIO.Read<K,V> withConsumerConfigUpdates(java.util.Map<java.lang.String,java.lang.Object> configUpdates)
Updates configuration for the backend main consumer. Note that the default consumer properties are not completely overridden; this method only updates values that have the same key.

In KafkaIO.read(), two consumers actually run in the backend:
1. the main consumer, which reads data from Kafka;
2. the secondary offset consumer, which is used to estimate the backlog by fetching the latest offset.

By default, the main consumer uses the configuration from KafkaIOUtils.DEFAULT_CONSUMER_PROPERTIES.
-
withCheckStopReadingFn
public KafkaIO.Read<K,V> withCheckStopReadingFn(CheckStopReadingFn checkStopReadingFn)
A custom CheckStopReadingFn that determines whether the ReadFromKafkaDoFn should stop reading from the given TopicPartition.
-
withCheckStopReadingFn
public KafkaIO.Read<K,V> withCheckStopReadingFn(org.apache.beam.sdk.transforms.SerializableFunction<org.apache.kafka.common.TopicPartition,java.lang.Boolean> checkStopReadingFn)
A custom SerializableFunction that determines whether the ReadFromKafkaDoFn should stop reading from the given TopicPartition.
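A sketch of the SerializableFunction form; the stop condition, pattern, and address are purely illustrative:

```java
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

class StopReadingSketch {
  static KafkaIO.Read<Long, String> build() {
    return KafkaIO.<Long, String>read()
        .withBootstrapServers("localhost:9092") // placeholder
        .withTopicPattern("events_.*")          // placeholder pattern
        .withKeyDeserializer(LongDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        // Stop reading any partition whose topic is marked as retired:
        .withCheckStopReadingFn(
            (TopicPartition tp) -> tp.topic().endsWith("_retired"));
  }
}
```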
-
withBadRecordErrorHandler
public KafkaIO.Read<K,V> withBadRecordErrorHandler(org.apache.beam.sdk.transforms.errorhandling.ErrorHandler<org.apache.beam.sdk.transforms.errorhandling.BadRecord,?> badRecordErrorHandler)
-
withConsumerPollingTimeout
public KafkaIO.Read<K,V> withConsumerPollingTimeout(long duration)
Sets the timeout, in seconds, for the Kafka consumer polling request in the ReadFromKafkaDoFn. A lower timeout optimizes for latency. Increase the timeout if the consumer is not fetching any records. The default is 2 seconds.
-
withGCPApplicationDefaultCredentials
public KafkaIO.Read<K,V> withGCPApplicationDefaultCredentials()
Creates and sets the Application Default Credentials for a Kafka consumer. This allows the consumer to be authenticated with a Google Kafka Server using OAuth.
-
withoutMetadata
public org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.KV<K,V>>> withoutMetadata()
Returns a PTransform for a PCollection of KV, dropping Kafka metadata.
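A sketch of dropping metadata to get KV elements; the address and topic name are placeholders:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

class WithoutMetadataSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    PCollection<KV<Long, String>> kvs =
        p.apply(
            KafkaIO.<Long, String>read()
                .withBootstrapServers("localhost:9092") // placeholder
                .withTopic("my_topic")                  // placeholder
                .withKeyDeserializer(LongDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withoutMetadata()); // yields KV<K,V> instead of KafkaRecord<K,V>

    p.run().waitUntilFinish();
  }
}
```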
-
externalWithMetadata
public org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row>> externalWithMetadata()
-
expand
public org.apache.beam.sdk.values.PCollection<KafkaRecord<K,V>> expand(org.apache.beam.sdk.values.PBegin input)
- Specified by:
expandin classorg.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<KafkaRecord<K,V>>>
-
populateDisplayData
public void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
- Specified by:
populateDisplayDatain interfaceorg.apache.beam.sdk.transforms.display.HasDisplayData- Overrides:
populateDisplayDatain classorg.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<KafkaRecord<K,V>>>
-
-