Class KafkaIO.ReadSourceDescriptors<K,​V>

  • All Implemented Interfaces:
    java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData
  • Enclosing class:
    KafkaIO

    public abstract static class KafkaIO.ReadSourceDescriptors<K,​V>
    extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<KafkaSourceDescriptor>,​org.apache.beam.sdk.values.PCollection<KafkaRecord<K,​V>>>
    A PTransform to read from KafkaSourceDescriptor. See KafkaIO for more information on usage and configuration. See ReadFromKafkaDoFn for more implementation details.

    During expansion, if isCommitOffsetEnabled() is true, the transform will expand to:

    
     PCollection<KafkaSourceDescriptor> --> ParDo(ReadFromKafkaDoFn<KafkaSourceDescriptor, KV<KafkaSourceDescriptor, KafkaRecord>>) --> Map(output KafkaRecord)
                                                                                                              |
                                                                                                              --> KafkaCommitOffset
     
    Note that this expansion is not supported when running with x-lang on Dataflow.
    See Also:
    Serialized Form
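    As a usage sketch (hedged: the topic, partition, and broker addresses below are placeholders, and the trailing nulls stand in for the optional start/stop read positions accepted by KafkaSourceDescriptor.of):

```java
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.io.kafka.KafkaSourceDescriptor;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

Pipeline p = Pipeline.create();

// Describe which topic/partition to read; start/stop offsets and times are left unset.
PCollection<KafkaSourceDescriptor> descriptors =
    p.apply(Create.of(
        KafkaSourceDescriptor.of(
            new TopicPartition("my-topic", 0),
            null, null, null, null,
            List.of("broker1:9092", "broker2:9092"))));

// Read the described partitions into typed KafkaRecords.
PCollection<KafkaRecord<Long, String>> records =
    descriptors.apply(
        KafkaIO.<Long, String>readSourceDescriptors()
            .withKeyDeserializer(LongDeserializer.class)
            .withValueDeserializer(StringDeserializer.class));
```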
    • Constructor Detail

      • ReadSourceDescriptors

        public ReadSourceDescriptors()
    • Method Detail

      • withBootstrapServers

        public KafkaIO.ReadSourceDescriptors<K,​V> withBootstrapServers​(java.lang.String bootstrapServers)
        Sets the bootstrap servers for the Kafka consumer, used when none are specified via KafkaSourceDescriptor#getBootStrapServers().
      • withKeyDeserializer

        public KafkaIO.ReadSourceDescriptors<K,​V> withKeyDeserializer​(java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<K>> keyDeserializer)
        Sets a Kafka Deserializer to interpret key bytes read from Kafka.

        Beam also needs a Coder to serialize and deserialize key objects at runtime. KafkaIO tries to infer a coder for the key based on the Deserializer class; if that inference fails, use withKeyDeserializerAndCoder(Class, Coder) to provide the key coder explicitly.

      • withValueDeserializer

        public KafkaIO.ReadSourceDescriptors<K,​V> withValueDeserializer​(java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<V>> valueDeserializer)
        Sets a Kafka Deserializer to interpret value bytes read from Kafka.

        Beam also needs a Coder to serialize and deserialize value objects at runtime. KafkaIO tries to infer a coder for the value based on the Deserializer class; if that inference fails, use withValueDeserializerAndCoder(Class, Coder) to provide the value coder explicitly.

      • withKeyDeserializerAndCoder

        public KafkaIO.ReadSourceDescriptors<K,​V> withKeyDeserializerAndCoder​(java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<K>> keyDeserializer,
                                                                                    org.apache.beam.sdk.coders.Coder<K> keyCoder)
        Sets a Kafka Deserializer for interpreting key bytes read from Kafka along with a Coder for helping the Beam runner materialize key objects at runtime if necessary.

        Use this method to override the coder inference performed within withKeyDeserializer(Class).

      • withValueDeserializerAndCoder

        public KafkaIO.ReadSourceDescriptors<K,​V> withValueDeserializerAndCoder​(java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<V>> valueDeserializer,
                                                                                      org.apache.beam.sdk.coders.Coder<V> valueCoder)
        Sets a Kafka Deserializer for interpreting value bytes read from Kafka along with a Coder for helping the Beam runner materialize value objects at runtime if necessary.

        Use this method to override the coder inference performed within withValueDeserializer(Class).
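        When coder inference fails (for example, with a custom Deserializer), both coders can be supplied explicitly. A hedged sketch using standard deserializers and coders:

```java
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarLongCoder;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

// Explicit coders bypass the inference done by withKeyDeserializer/withValueDeserializer.
KafkaIO.<Long, String>readSourceDescriptors()
    .withKeyDeserializerAndCoder(LongDeserializer.class, VarLongCoder.of())
    .withValueDeserializerAndCoder(StringDeserializer.class, StringUtf8Coder.of());
```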

      • withConsumerFactoryFn

        public KafkaIO.ReadSourceDescriptors<K,​V> withConsumerFactoryFn​(org.apache.beam.sdk.transforms.SerializableFunction<java.util.Map<java.lang.String,​java.lang.Object>,​org.apache.kafka.clients.consumer.Consumer<byte[],​byte[]>> consumerFactoryFn)
        Sets a factory to create a Kafka Consumer from the consumer configuration. This is useful for supporting a different version of the Kafka consumer. The default is KafkaConsumer.
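        A hedged sketch of a custom factory: the function receives the effective consumer configuration and must return a Consumer<byte[], byte[]> (the client-id tweak below is purely illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

KafkaIO.<Long, String>readSourceDescriptors()
    .withConsumerFactoryFn(
        (Map<String, Object> config) -> {
          // Copy the supplied configuration and adjust it before building the consumer.
          Map<String, Object> copy = new HashMap<>(config);
          copy.put(ConsumerConfig.CLIENT_ID_CONFIG, "my-custom-client");
          return new KafkaConsumer<>(
              copy, new ByteArrayDeserializer(), new ByteArrayDeserializer());
        });
```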
      • withCheckStopReadingFn

        public KafkaIO.ReadSourceDescriptors<K,​V> withCheckStopReadingFn​(@Nullable org.apache.beam.sdk.transforms.SerializableFunction<org.apache.kafka.common.TopicPartition,​java.lang.Boolean> checkStopReadingFn)
        A custom SerializableFunction that determines whether the ReadFromKafkaDoFn should stop reading from the given TopicPartition.
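        A hedged sketch with a hypothetical predicate that stops reading partitions of topics flagged for retirement (the topic name is illustrative; since the function must be serializable, capture only serializable state):

```java
import java.util.Set;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.TopicPartition;

// Illustrative set of topics whose partitions should no longer be read.
Set<String> retiredTopics = Set.of("old-events");

KafkaIO.<Long, String>readSourceDescriptors()
    .withCheckStopReadingFn(tp -> retiredTopics.contains(tp.topic()));
```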
      • withCreatWatermarkEstimatorFn

        public KafkaIO.ReadSourceDescriptors<K,​V> withCreatWatermarkEstimatorFn​(org.apache.beam.sdk.transforms.SerializableFunction<org.joda.time.Instant,​org.apache.beam.sdk.transforms.splittabledofn.WatermarkEstimator<org.joda.time.Instant>> fn)
        A function to create a WatermarkEstimator. The default value is WatermarkEstimators.MonotonicallyIncreasing.
      • withWallTimeWatermarkEstimator

        public KafkaIO.ReadSourceDescriptors<K,​V> withWallTimeWatermarkEstimator()
        Use the WatermarkEstimators.WallTime as the watermark estimator.
      • withMonotonicallyIncreasingWatermarkEstimator

        public KafkaIO.ReadSourceDescriptors<K,​V> withMonotonicallyIncreasingWatermarkEstimator()
        Use the WatermarkEstimators.MonotonicallyIncreasing as the watermark estimator.
      • withManualWatermarkEstimator

        public KafkaIO.ReadSourceDescriptors<K,​V> withManualWatermarkEstimator()
        Use the WatermarkEstimators.Manual as the watermark estimator.
      • withReadCommitted

        public KafkaIO.ReadSourceDescriptors<K,​V> withReadCommitted()
        Sets "isolation_level" to "read_committed" in the Kafka consumer configuration. This ensures that the consumer does not read uncommitted messages. Kafka version 0.11 introduced transactional writes; applications requiring end-to-end exactly-once semantics should read only committed messages. See the KafkaConsumer JavaDoc for details.
      • withOffsetConsumerConfigOverrides

        public KafkaIO.ReadSourceDescriptors<K,​V> withOffsetConsumerConfigOverrides​(@Nullable java.util.Map<java.lang.String,​java.lang.Object> offsetConsumerConfig)
        Sets additional configuration for the offset consumer. It may be required for a secured Kafka cluster, especially when you see a WARN log message similar to "exception while fetching latest offset for partition {}. will be retried".

        In ReadFromKafkaDoFn, there are two consumers running in the backend:

        1. the main consumer, which reads data from Kafka;
        2. the secondary offset consumer, which estimates the backlog by fetching the latest offset.

        By default, the offset consumer inherits its configuration from the main consumer, with an auto-generated ConsumerConfig.GROUP_ID_CONFIG. This may not work in a secured Kafka cluster that requires additional configuration.

        See withConsumerConfigUpdates(java.util.Map<java.lang.String, java.lang.Object>) for configuring the main consumer.
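        A hedged sketch for a SASL-secured cluster: the security settings are placeholders, and the same map is applied to both consumers so the offset consumer can authenticate as well:

```java
import java.util.Map;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.common.config.SaslConfigs;

// Placeholder SASL settings; real values depend on the deployment.
Map<String, Object> securityConfig = Map.of(
    CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL",
    SaslConfigs.SASL_MECHANISM, "PLAIN",
    SaslConfigs.SASL_JAAS_CONFIG,
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"user\" password=\"secret\";");

KafkaIO.<Long, String>readSourceDescriptors()
    .withConsumerConfigUpdates(securityConfig)          // main consumer
    .withOffsetConsumerConfigOverrides(securityConfig); // offset consumer
```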

      • withConsumerConfigOverrides

        public KafkaIO.ReadSourceDescriptors<K,​V> withConsumerConfigOverrides​(java.util.Map<java.lang.String,​java.lang.Object> consumerConfig)
        Replaces the configuration for the main consumer.

        In ReadFromKafkaDoFn, there are two consumers running in the backend:

        1. the main consumer, which reads data from Kafka;
        2. the secondary offset consumer, which estimates the backlog by fetching the latest offset.

        By default, the main consumer uses the configuration from KafkaIOUtils.DEFAULT_CONSUMER_PROPERTIES.

        See withConsumerConfigUpdates(java.util.Map<java.lang.String, java.lang.Object>) for updating the configuration instead of overriding it.
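        A hedged sketch contrasting the two methods (property values are illustrative):

```java
import java.util.Map;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.clients.consumer.ConsumerConfig;

// Merging: withConsumerConfigUpdates keeps the defaults and applies these on top.
KafkaIO.<Long, String>readSourceDescriptors()
    .withConsumerConfigUpdates(
        Map.of(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"));

// Replacing: withConsumerConfigOverrides discards the defaults from
// KafkaIOUtils.DEFAULT_CONSUMER_PROPERTIES, so required settings (e.g. bootstrap
// servers, if not provided per descriptor) must be supplied explicitly.
KafkaIO.<Long, String>readSourceDescriptors()
    .withConsumerConfigOverrides(
        Map.of(
            ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092",
            ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"));
```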

      • withBadRecordErrorHandler

        public KafkaIO.ReadSourceDescriptors<K,V> withBadRecordErrorHandler​(org.apache.beam.sdk.transforms.errorhandling.ErrorHandler<org.apache.beam.sdk.transforms.errorhandling.BadRecord,?> errorHandler)
        Sets an ErrorHandler that receives records which fail to be read or deserialized, rather than failing the pipeline.
      • withConsumerPollingTimeout

        public KafkaIO.ReadSourceDescriptors<K,​V> withConsumerPollingTimeout​(long duration)
        Sets the timeout, in seconds, for Kafka consumer polling requests in the ReadFromKafkaDoFn. A lower timeout optimizes for latency; increase the timeout if the consumer is not fetching any records. The default is 2 seconds.
      • expand

        public org.apache.beam.sdk.values.PCollection<KafkaRecord<K,​V>> expand​(org.apache.beam.sdk.values.PCollection<KafkaSourceDescriptor> input)
        Specified by:
        expand in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<KafkaSourceDescriptor>,​org.apache.beam.sdk.values.PCollection<KafkaRecord<K,​V>>>