Class SparkStructuredStreamingRunner


  • public final class SparkStructuredStreamingRunner
    extends org.apache.beam.sdk.PipelineRunner<SparkStructuredStreamingPipelineResult>
    A Spark runner build on top of Spark's SQL Engine (Structured Streaming framework).

    This runner is experimental, its coverage of the Beam model is still partial. Due to limitations of the Structured Streaming framework (e.g. lack of support for multiple stateful operators), streaming mode is not yet supported by this runner.

    The runner translates transforms defined on a Beam pipeline to Spark `Dataset` transformations (leveraging the high level Dataset API) and then submits these to Spark to be executed.

    To run a Beam pipeline with the default options using Spark's local mode, we would do the following:

    
     Pipeline p = [logic for pipeline creation]
     PipelineResult result = p.run();
     

    To create a pipeline runner to run against a different spark cluster, with a custom master url we would do the following:

    
     Pipeline p = [logic for pipeline creation]
     SparkCommonPipelineOptions options = p.getOptions.as(SparkCommonPipelineOptions.class);
     options.setSparkMaster("spark://host:port");
     PipelineResult result = p.run();
     
    • Method Detail

      • create

        public static SparkStructuredStreamingRunner create()
        Creates and returns a new SparkStructuredStreamingRunner with default options. In particular, against a spark instance running in local mode.
        Returns:
        A pipeline runner with default options.
      • create

        public static SparkStructuredStreamingRunner create​(SparkStructuredStreamingPipelineOptions options)
        Creates and returns a new SparkStructuredStreamingRunner with specified options.
        Parameters:
        options - The SparkStructuredStreamingPipelineOptions to use when executing the job.
        Returns:
        A pipeline runner that will execute with specified options.
      • fromOptions

        public static SparkStructuredStreamingRunner fromOptions​(org.apache.beam.sdk.options.PipelineOptions options)
        Creates and returns a new SparkStructuredStreamingRunner with specified options.
        Parameters:
        options - The PipelineOptions to use when executing the job.
        Returns:
        A pipeline runner that will execute with specified options.