Class SparkStructuredStreamingRunner
- java.lang.Object
  - org.apache.beam.sdk.PipelineRunner<SparkStructuredStreamingPipelineResult>
    - org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner
public final class SparkStructuredStreamingRunner extends org.apache.beam.sdk.PipelineRunner<SparkStructuredStreamingPipelineResult>
A Spark runner built on top of Spark's SQL engine (the Structured Streaming framework). This runner is experimental, and its coverage of the Beam model is still partial. Due to limitations of the Structured Streaming framework (e.g. lack of support for multiple stateful operators), this runner does not yet support streaming mode.
The runner translates the transforms defined on a Beam pipeline into Spark `Dataset` transformations (using the high-level Dataset API) and then submits them to Spark for execution.
To run a Beam pipeline with the default options using Spark's local mode, we would do the following:
    Pipeline p = [logic for pipeline creation]
    PipelineResult result = p.run();
To create a pipeline runner that runs against a different Spark cluster with a custom master URL, we would do the following:
    Pipeline p = [logic for pipeline creation]
    SparkCommonPipelineOptions options = p.getOptions().as(SparkCommonPipelineOptions.class);
    options.setSparkMaster("spark://host:port");
    PipelineResult result = p.run();
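The snippets above leave out how the pipeline is created and how this runner is selected. The following is a minimal sketch of one way to do it, assuming the Beam Java SDK and the Spark runner module are on the classpath. The class name `StructuredStreamingRunnerExample`, the helper `buildOptions`, the `local[2]` master and the sample elements are illustrative assumptions, not part of this API.

```java
import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingPipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class StructuredStreamingRunnerExample {

  // Hypothetical helper: builds options that select this runner
  // and point it at the given Spark master URL.
  public static SparkStructuredStreamingPipelineOptions buildOptions(String sparkMaster) {
    SparkStructuredStreamingPipelineOptions options =
        PipelineOptionsFactory.as(SparkStructuredStreamingPipelineOptions.class);
    options.setRunner(SparkStructuredStreamingRunner.class);
    options.setSparkMaster(sparkMaster);
    return options;
  }

  public static void main(String[] args) {
    // "local[2]" is an assumed local-mode master; replace it with
    // "spark://host:port" to target a standalone cluster.
    Pipeline p = Pipeline.create(buildOptions("local[2]"));

    // A small bounded (batch) input, since streaming mode is not supported.
    p.apply(Create.of("a", "b", "c"));

    PipelineResult result = p.run();
    result.waitUntilFinish();
  }
}
```

Setting the runner class on the options lets `p.run()` pick the runner, so the code does not need to call `create` or `fromOptions` directly.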
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| static SparkStructuredStreamingRunner | create() | Creates and returns a new SparkStructuredStreamingRunner with default options. |
| static SparkStructuredStreamingRunner | create(SparkStructuredStreamingPipelineOptions options) | Creates and returns a new SparkStructuredStreamingRunner with specified options. |
| static SparkStructuredStreamingRunner | fromOptions(org.apache.beam.sdk.options.PipelineOptions options) | Creates and returns a new SparkStructuredStreamingRunner with specified options. |
| SparkStructuredStreamingPipelineResult | run(org.apache.beam.sdk.Pipeline pipeline) | |
-
Method Detail
-
create
public static SparkStructuredStreamingRunner create()
Creates and returns a new SparkStructuredStreamingRunner with default options; in particular, it runs against a Spark instance in local mode.
- Returns:
A pipeline runner with default options.
-
create
public static SparkStructuredStreamingRunner create(SparkStructuredStreamingPipelineOptions options)
Creates and returns a new SparkStructuredStreamingRunner with specified options.
- Parameters:
options - The SparkStructuredStreamingPipelineOptions to use when executing the job.
- Returns:
A pipeline runner that will execute with specified options.
-
fromOptions
public static SparkStructuredStreamingRunner fromOptions(org.apache.beam.sdk.options.PipelineOptions options)
Creates and returns a new SparkStructuredStreamingRunner with specified options.
- Parameters:
options - The PipelineOptions to use when executing the job.
- Returns:
A pipeline runner that will execute with specified options.
-
run
public SparkStructuredStreamingPipelineResult run(org.apache.beam.sdk.Pipeline pipeline)
- Specified by:
run in class org.apache.beam.sdk.PipelineRunner<SparkStructuredStreamingPipelineResult>