Class KafkaToPubsub


  • public class KafkaToPubsub
    extends java.lang.Object
    The KafkaToPubsub pipeline is a streaming pipeline that ingests JSON-formatted data from Kafka and writes the resulting records to Pub/Sub. The input topics, output topic, and bootstrap servers are specified by the user as template parameters.
    Kafka may be configured with the SASL/SCRAM security mechanism; in that case, a HashiCorp Vault secret storage with the credentials should be provided. The URL to the credentials and the Vault token are specified by the user as template parameters.

    Pipeline Requirements

    • The Kafka bootstrap server(s) are reachable.
    • The Kafka input topic(s) exist.
    • The Pub/Sub output topic exists.
    • (Optional) An existing HashiCorp Vault secret storage.
    • (Optional) A configured secure SSL connection for Kafka.

    Example Usage

     # Gradle preparation
    
     To run this example your build.gradle file should contain the following task
     to execute the pipeline:
     
     task execute (type:JavaExec) {
         mainClass = System.getProperty("mainClass")
         classpath = sourceSets.main.runtimeClasspath
         systemProperties System.getProperties()
         args System.getProperty("exec.args", "").split()
     }
     
    
     This task allows you to run the pipeline via the following command:
     
     gradle clean execute -DmainClass=org.apache.beam.examples.complete.kafkatopubsub.KafkaToPubsub \
          -Dexec.args="--<argument>=<value> --<argument>=<value>"
     
    
     # Running the pipeline
     To execute this pipeline, specify the parameters:
    
     - Kafka Bootstrap servers
     - Kafka input topics
     - Pub/Sub output topic
     - Output format
    
     in the following format:
     
     --bootstrapServers=host:port \
     --inputTopics=your-input-topic \
     --outputTopic=projects/your-project-id/topics/your-topic-name \
     --outputFormat=AVRO|PUBSUB
     
    
     Optionally, to retrieve Kafka credentials for SASL/SCRAM,
     specify a URL to the credentials in HashiCorp Vault and the Vault access token:
     
     --secretStoreUrl=http(s)://host:port/path/to/credentials
     --vaultToken=your-token
     
    
     Optionally, to configure secure SSL connection between the Beam pipeline and Kafka,
     specify the parameters:
     - A path to a truststore file (it can be a local path or a GCS path, which should start with `gs://`)
     - A path to a keystore file (it can be a local path or a GCS path, which should start with `gs://`)
     - Truststore password
     - Keystore password
     - Key password
     
     --truststorePath=path/to/kafka.truststore.jks
     --keystorePath=path/to/kafka.keystore.jks
     --truststorePassword=your-truststore-password
     --keystorePassword=your-keystore-password
     --keyPassword=your-key-password
     
     By default, the pipeline runs locally with the DirectRunner. To change the runner, specify:
     
     --runner=YOUR_SELECTED_RUNNER
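 Putting the pieces above together, a full invocation might look like the following. All concrete values (broker addresses, topic names, project ID, runner) are placeholders for illustration; multiple bootstrap servers and input topics are assumed to be comma-separated lists:

 ```shell
 gradle clean execute -DmainClass=org.apache.beam.examples.complete.kafkatopubsub.KafkaToPubsub \
     -Dexec.args="--bootstrapServers=broker-1:9092,broker-2:9092 \
                  --inputTopics=topic-1,topic-2 \
                  --outputTopic=projects/your-project-id/topics/your-topic-name \
                  --outputFormat=PUBSUB \
                  --runner=DataflowRunner"
 ```

 The optional Vault and SSL flags described above can be appended to `-Dexec.args` in the same way.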
     
     

    Example Avro usage

     This template contains an example class that deserializes Avro from Kafka and serializes it back to Avro for Pub/Sub.
    
     To adapt this example to your specific case, follow these steps:
     
    • Create your own class describing the Avro schema; see AvroDataClass as an example. Define only the necessary fields.
    • Create your own Avro deserializer class; see AvroDataClassKafkaAvroDeserializer as an example. Rename it, and substitute your own schema class for the relevant types.
    • Modify FormatTransform, passing your schema class and deserializer in the related parameters.
    • Modify the write step in KafkaToPubsub, passing your schema class to the "writeAvrosToPubSub" step.
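     The first two steps above can be sketched as follows. This is a minimal illustration, not the template's actual source: the class and field names (MyDataClass, id, value) are hypothetical, and the deserializer assumes Confluent's AbstractKafkaAvroDeserializer is on the classpath, as in the example's AvroDataClassKafkaAvroDeserializer.

     ```java
     import java.util.Map;
     import org.apache.kafka.common.serialization.Deserializer;
     import io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer;
     import io.confluent.kafka.serializers.KafkaAvroDeserializerConfig;

     /** Hypothetical schema class; define only the fields your records need. */
     public class MyDataClass implements java.io.Serializable {
       private final String id;
       private final Double value;

       public MyDataClass(String id, Double value) {
         this.id = id;
         this.value = value;
       }

       public String getId() { return id; }
       public Double getValue() { return value; }
     }

     /** Deserializer that reads Avro-encoded Kafka record bytes into MyDataClass. */
     class MyDataClassKafkaAvroDeserializer
         extends AbstractKafkaAvroDeserializer implements Deserializer<MyDataClass> {

       @Override
       public void configure(Map<String, ?> configs, boolean isKey) {
         // Delegate to the Confluent deserializer configuration (schema registry URL, etc.).
         configure(new KafkaAvroDeserializerConfig(configs));
       }

       @Override
       public MyDataClass deserialize(String topic, byte[] bytes) {
         // The parent class resolves the writer schema and decodes the payload.
         return (MyDataClass) this.deserialize(bytes);
       }

       @Override
       public void close() {}
     }
     ```

     The schema class and this deserializer are then the two types to plug into FormatTransform and the "writeAvrosToPubSub" step, per the remaining steps above.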
    • Constructor Summary

      Constructors 
      Constructor Description
      KafkaToPubsub()  
    • Method Summary

      Modifier and Type Method Description
      static void main​(java.lang.String[] args)
      Main entry point for pipeline execution.
      static org.apache.beam.sdk.PipelineResult run​(org.apache.beam.sdk.Pipeline pipeline, KafkaToPubsubOptions options)
      Runs a pipeline which reads messages from Kafka and writes them to Pub/Sub.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • KafkaToPubsub

        public KafkaToPubsub()
    • Method Detail

      • main

        public static void main​(java.lang.String[] args)
        Main entry point for pipeline execution.
        Parameters:
        args - Command line arguments to the pipeline.
      • run

        public static org.apache.beam.sdk.PipelineResult run​(org.apache.beam.sdk.Pipeline pipeline,
                                                             KafkaToPubsubOptions options)
        Runs a pipeline which reads messages from Kafka and writes them to Pub/Sub.
        Parameters:
        pipeline - the pipeline to run
        options - arguments to the pipeline
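        For orientation, wiring main and run together typically follows the standard Beam launcher pattern shown below. This is a hedged sketch of how such a main method is usually structured, not the template's exact source:

        ```java
        import org.apache.beam.sdk.Pipeline;
        import org.apache.beam.sdk.PipelineResult;
        import org.apache.beam.sdk.options.PipelineOptionsFactory;

        public class Launcher {
          public static void main(String[] args) {
            // Parse command-line flags into the template's options interface.
            KafkaToPubsubOptions options =
                PipelineOptionsFactory.fromArgs(args)
                    .withValidation()
                    .as(KafkaToPubsubOptions.class);

            // Build a pipeline from the options and hand both to run().
            Pipeline pipeline = Pipeline.create(options);
            PipelineResult result = KafkaToPubsub.run(pipeline, options);

            // Block until the streaming pipeline finishes or is cancelled.
            result.waitUntilFinish();
          }
        }
        ```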