Class KafkaToPubsub
- java.lang.Object
  - org.apache.beam.examples.complete.kafkatopubsub.KafkaToPubsub
public class KafkaToPubsub extends java.lang.Object

The KafkaToPubsub pipeline is a streaming pipeline that ingests data in JSON format from Kafka and outputs the resulting records to Pub/Sub. The input topics, output topic, and bootstrap servers are specified by the user as template parameters.
Kafka may be configured with the SASL/SCRAM security mechanism; in this case, a Vault secret storage with credentials should be provided. The URL to the credentials and the Vault token are specified by the user as template parameters.

Pipeline Requirements
- Kafka bootstrap server(s)
- Existing Kafka topic(s)
- An existing Pub/Sub output topic
- (Optional) An existing HashiCorp Vault secret storage
- (Optional) A configured secure SSL connection for Kafka
Example Usage
# Gradle preparation

To run this example, your build.gradle file should contain the following task to execute the pipeline:

    task execute(type: JavaExec) {
        mainClass = System.getProperty("mainClass")
        classpath = sourceSets.main.runtimeClasspath
        systemProperties System.getProperties()
        args System.getProperty("exec.args", "").split()
    }

This task allows you to run the pipeline via the following command:

    gradle clean execute -DmainClass=org.apache.beam.examples.complete.kafkatopubsub.KafkaToPubsub \
        -Dexec.args="--<argument>=<value> --<argument>=<value>"

# Running the pipeline

To execute this pipeline, specify the parameters:

- Kafka bootstrap servers
- Kafka input topics
- Pub/Sub output topic
- Output format

in the following format:

    --bootstrapServers=host:port \
    --inputTopics=your-input-topic \
    --outputTopic=projects/your-project-id/topics/your-topic-name \
    --outputFormat=AVRO|PUBSUB

Optionally, to retrieve Kafka credentials for SASL/SCRAM, specify a URL to the credentials in HashiCorp Vault and the Vault access token:

    --secretStoreUrl=http(s)://host:port/path/to/credentials
    --vaultToken=your-token

Optionally, to configure a secure SSL connection between the Beam pipeline and Kafka, specify the parameters:

- A path to a truststore file (it can be a local path or a GCS path, which should start with `gs://`)
- A path to a keystore file (it can be a local path or a GCS path, which should start with `gs://`)
- Truststore password
- Keystore password
- Key password

    --truststorePath=path/to/kafka.truststore.jks
    --keystorePath=path/to/kafka.keystore.jks
    --truststorePassword=your-truststore-password
    --keystorePassword=your-keystore-password
    --keyPassword=your-key-password

By default, this will run the pipeline locally with the DirectRunner. To change the runner, specify:

    --runner=YOUR_SELECTED_RUNNER

Example Avro usage
This template contains an example class to deserialize AVRO from Kafka and serialize it to AVRO in Pub/Sub. To use this example in your specific case, follow these steps:

- Create your own class to describe the AVRO schema. As an example, use AvroDataClass. Just define the necessary fields.
- Create your own Avro deserializer class. As an example, use AvroDataClassKafkaAvroDeserializer. Just rename it, and put your own schema class as the necessary types.
- Modify FormatTransform. Put your schema class and deserializer into the related parameter.
- Modify the write step in KafkaToPubsub by putting your schema class into the "writeAvrosToPubSub" step.
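The first step above can be sketched as a plain Java value class. This is a minimal sketch in the spirit of the template's AvroDataClass, not the actual class from the repository; the class name and fields (`id`, `name`) are hypothetical examples to be replaced with your own schema's fields.

```java
import java.io.Serializable;
import java.util.Objects;

/**
 * A minimal sketch of an Avro schema class, in the spirit of the template's
 * AvroDataClass. The fields (id, name) are hypothetical; replace them with
 * the fields of your own Avro schema.
 */
public class MyAvroDataClass implements Serializable {
    private String id;
    private String name;

    // A no-argument constructor is typically required for reflection-based
    // Avro coders.
    public MyAvroDataClass() {}

    public MyAvroDataClass(String id, String name) {
        this.id = id;
        this.name = name;
    }

    public String getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    // equals/hashCode let Beam and tests compare deserialized records by value.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof MyAvroDataClass)) {
            return false;
        }
        MyAvroDataClass other = (MyAvroDataClass) o;
        return Objects.equals(id, other.id) && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name);
    }
}
```

A custom deserializer (step two) would then implement Kafka's Deserializer interface for this type, analogous to the template's AvroDataClassKafkaAvroDeserializer.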
Constructor Summary
- KafkaToPubsub()
Method Summary
- static void main(java.lang.String[] args)
  Main entry point for pipeline execution.
- static org.apache.beam.sdk.PipelineResult run(org.apache.beam.sdk.Pipeline pipeline, KafkaToPubsubOptions options)
  Runs a pipeline which reads messages from Kafka and writes them to Pub/Sub.
Method Detail
main
public static void main(java.lang.String[] args)
Main entry point for pipeline execution.

Parameters:
args - Command line arguments to the pipeline.
run
public static org.apache.beam.sdk.PipelineResult run(org.apache.beam.sdk.Pipeline pipeline, KafkaToPubsubOptions options)

Runs a pipeline which reads messages from Kafka and writes them to Pub/Sub.

Parameters:
options - arguments to the pipeline