public class KafkaToPubsub
extends java.lang.Object

The KafkaToPubsub pipeline is a streaming pipeline which ingests data in JSON format from Kafka and outputs the resulting records to Pub/Sub. The input topics, the output topic, and the bootstrap servers are specified by the user as template parameters.

Pipeline Requirements
Example Usage
# Gradle preparation

To run this example, your `build.gradle` file should contain the following task to execute the pipeline:

```
task execute (type: JavaExec) {
    mainClass = System.getProperty("mainClass")
    classpath = sourceSets.main.runtimeClasspath
    systemProperties System.getProperties()
    args System.getProperty("exec.args", "").split()
}
```

This task allows you to run the pipeline via the following command:

```
gradle clean execute -DmainClass=org.apache.beam.examples.complete.kafkatopubsub.KafkaToPubsub \
    -Dexec.args="--<argument>=<value> --<argument>=<value>"
```

# Running the pipeline

To execute this pipeline, specify the parameters:

- Kafka Bootstrap servers
- Kafka input topics
- Pub/Sub output topic
- Output format

in the following format:

```
--bootstrapServers=host:port \
--inputTopics=your-input-topic \
--outputTopic=projects/your-project-id/topics/your-topic-name \
--outputFormat=AVRO|PUBSUB
```

Optionally, to retrieve Kafka credentials for SASL/SCRAM, specify a URL to the credentials in HashiCorp Vault and the Vault access token:

```
--secretStoreUrl=http(s)://host:port/path/to/credentials
--vaultToken=your-token
```

Optionally, to configure a secure SSL connection between the Beam pipeline and Kafka, specify the parameters:

- A path to a truststore file (a local path or a GCS path, which should start with `gs://`)
- A path to a keystore file (a local path or a GCS path, which should start with `gs://`)
- Truststore password
- Keystore password
- Key password

```
--truststorePath=path/to/kafka.truststore.jks
--keystorePath=path/to/kafka.keystore.jks
--truststorePassword=your-truststore-password
--keystorePassword=your-keystore-password
--keyPassword=your-key-password
```

By default this will run the pipeline locally with the DirectRunner. To change the runner, specify:

```
--runner=YOUR_SELECTED_RUNNER
```
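All of the parameters above follow the same `--name=value` shape and are parsed by Beam's `PipelineOptionsFactory` at startup. As a plain-Java illustration of that convention only (this sketch is not the actual Beam parser, and `ArgParseSketch` is a hypothetical name):

```java
import java.util.HashMap;
import java.util.Map;

class ArgParseSketch {
    // Parse "--name=value" flags into a map, mirroring the template-parameter shape.
    static Map<String, String> parse(String[] args) {
        Map<String, String> result = new HashMap<>();
        for (String arg : args) {
            if (arg.startsWith("--") && arg.contains("=")) {
                int eq = arg.indexOf('=');
                result.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> opts = parse(new String[] {
            "--bootstrapServers=host:port",
            "--outputFormat=AVRO"
        });
        System.out.println(opts.get("bootstrapServers")); // host:port
    }
}
```

The real options interface additionally validates required parameters and exposes them as typed getters.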
Example Avro usage
This template contains an example class to deserialize AVRO from Kafka and serialize it to AVRO in Pub/Sub. To adapt this example to your specific case, follow these steps:

1. Define the necessary fields of your schema in a class like AvroDataClass.
2. Create your own deserializer based on AvroDataClassKafkaAvroDeserializer: rename it, and use your own schema class as the necessary types.
3. In FormatTransform, pass your schema class and deserializer to the related parameters.
4. In KafkaToPubsub, pass your schema class to the "writeAvrosToPubSub" step.
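As a rough sketch of step 1: an Avro-compatible data class simply declares the schema's fields, with a no-arg constructor and accessors. The field names below (`id`, `message`) are illustrative assumptions, not the actual fields of the example's AvroDataClass:

```java
// Hypothetical stand-in for the example's AvroDataClass.
// Replace the fields with whatever your Avro schema needs.
class AvroDataClass {
    private String id;
    private String message;

    // Avro reflection-based (de)serialization needs a no-arg constructor.
    public AvroDataClass() {}

    public AvroDataClass(String id, String message) {
        this.id = id;
        this.message = message;
    }

    public String getId() { return id; }

    public String getMessage() { return message; }
}
```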
| Constructor and Description |
|---|
| `KafkaToPubsub()` |
| Modifier and Type | Method and Description |
|---|---|
| `static void` | `main(java.lang.String[] args)` Main entry point for pipeline execution. |
| `static org.apache.beam.sdk.PipelineResult` | `run(org.apache.beam.sdk.Pipeline pipeline, KafkaToPubsubOptions options)` Runs a pipeline which reads messages from Kafka and writes them to Pub/Sub. |
public static void main(java.lang.String[] args)

Parameters:
args - Command line arguments to the pipeline.

public static org.apache.beam.sdk.PipelineResult run(org.apache.beam.sdk.Pipeline pipeline, KafkaToPubsubOptions options)

Parameters:
options - arguments to the pipeline
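A hypothetical driver showing how `run` could be invoked programmatically. The driver class name is invented, it is assumed to live in the same package as KafkaToPubsub and KafkaToPubsubOptions, and the use of `waitUntilFinish()` is an assumption rather than something stated on this page:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class KafkaToPubsubDriver {
    public static void main(String[] args) {
        // Parse the template parameters (--bootstrapServers, --inputTopics, ...)
        // into the pipeline's options interface.
        KafkaToPubsubOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().as(KafkaToPubsubOptions.class);

        Pipeline pipeline = Pipeline.create(options);

        // run() attaches the read-from-Kafka and write-to-Pub/Sub steps
        // and starts the pipeline on the configured runner.
        PipelineResult result = KafkaToPubsub.run(pipeline, options);
        result.waitUntilFinish();
    }
}
```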