Package org.apache.beam.runners.spark
Class SparkRunner
- java.lang.Object
  - org.apache.beam.sdk.PipelineRunner<SparkPipelineResult>
    - org.apache.beam.runners.spark.SparkRunner
public final class SparkRunner extends org.apache.beam.sdk.PipelineRunner<SparkPipelineResult>
The SparkRunner translates operations defined on a pipeline to a representation executable by Spark, and then submits the job to Spark to be executed. To run a Beam pipeline with the default options, against a single-threaded Spark instance in local mode:

    Pipeline p = [logic for pipeline creation]
    SparkPipelineResult result = (SparkPipelineResult) p.run();

To run against a different Spark cluster with a custom master URL, set the master on the options and create the pipeline with those options:

    SparkPipelineOptions options = SparkPipelineOptionsFactory.create();
    options.setSparkMaster("spark://host:port");
    Pipeline p = [logic for pipeline creation, using options]
    SparkPipelineResult result = (SparkPipelineResult) p.run();
-
-
Nested Class Summary
| Modifier and Type | Class | Description |
|---|---|---|
| static class | SparkRunner.Evaluator | Evaluator on the pipeline. |
-
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| static SparkRunner | create() | Creates and returns a new SparkRunner with default options. |
| static SparkRunner | create(SparkPipelineOptions options) | Creates and returns a new SparkRunner with specified options. |
| static SparkRunner | fromOptions(org.apache.beam.sdk.options.PipelineOptions options) | Creates and returns a new SparkRunner with specified options. |
| static void | initAccumulators(SparkPipelineOptions opts, org.apache.spark.api.java.JavaSparkContext jsc) | Initializes the Metrics/Aggregators accumulators. |
| SparkPipelineResult | run(org.apache.beam.sdk.Pipeline pipeline) | |
| static void | updateCacheCandidates(org.apache.beam.sdk.Pipeline pipeline, SparkPipelineTranslator translator, EvaluationContext evaluationContext) | Evaluator that updates/populates the cache candidates. |
-
-
-
Method Detail
-
create
public static SparkRunner create()
Creates and returns a new SparkRunner with default options. With these defaults, the runner executes against a Spark instance running in local mode.
- Returns: A pipeline runner with default options.
-
create
public static SparkRunner create(SparkPipelineOptions options)
Creates and returns a new SparkRunner with specified options.
- Parameters: options - The SparkPipelineOptions to use when executing the job.
- Returns: A pipeline runner that will execute with specified options.
-
fromOptions
public static SparkRunner fromOptions(org.apache.beam.sdk.options.PipelineOptions options)
Creates and returns a new SparkRunner with specified options.
- Parameters: options - The PipelineOptions to use when executing the job.
- Returns: A pipeline runner that will execute with specified options.
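A public static fromOptions factory lets calling code build a runner from a runner class and an options object without calling a constructor directly. The sketch below illustrates that lookup pattern with plain Java reflection. DemoOptions, DemoRunner, and RunnerFactoryDemo are hypothetical stand-ins, not Beam types, and the code makes no claim about how Beam itself resolves the method.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a pipeline options object; not a Beam type.
interface DemoOptions {
    String master();
}

// Hypothetical runner that exposes a public static fromOptions factory.
final class DemoRunner {
    private final String master;

    private DemoRunner(String master) {
        this.master = master;
    }

    // Factory entry point that calling code can locate by name.
    public static DemoRunner fromOptions(DemoOptions options) {
        return new DemoRunner(options.master());
    }

    String master() {
        return master;
    }
}

final class RunnerFactoryDemo {
    // Finds the static fromOptions(DemoOptions) method on the class and invokes it.
    static Object instantiate(Class<?> runnerClass, DemoOptions options) {
        try {
            Method factory = runnerClass.getMethod("fromOptions", DemoOptions.class);
            return factory.invoke(null, options); // null receiver: the method is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No usable fromOptions factory on " + runnerClass, e);
        }
    }

    public static void main(String[] args) {
        DemoOptions options = () -> "local[2]";
        DemoRunner runner = (DemoRunner) instantiate(DemoRunner.class, options);
        System.out.println(runner.master());
    }
}
```

The private constructor keeps fromOptions as the only construction path, so any validation of the options would live in one place.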
-
run
public SparkPipelineResult run(org.apache.beam.sdk.Pipeline pipeline)
- Specified by: run in class org.apache.beam.sdk.PipelineRunner<SparkPipelineResult>
-
initAccumulators
public static void initAccumulators(SparkPipelineOptions opts, org.apache.spark.api.java.JavaSparkContext jsc)
Initializes the Metrics/Aggregators accumulators. This method is idempotent.
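Idempotent here means that calling the method more than once has the same effect as calling it once. A minimal sketch of that property follows, assuming a single lazily created holder object. DemoAccumulators and its members are hypothetical names for illustration and do not reflect the runner's actual implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical accumulator holder; illustrates idempotent initialization only.
final class DemoAccumulators {
    private static volatile DemoAccumulators instance;

    // Counts how many times the holder was actually constructed.
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    private DemoAccumulators() {
        INIT_COUNT.incrementAndGet();
    }

    // Idempotent: the first call creates the holder, later calls change nothing.
    static void init() {
        if (instance == null) {
            synchronized (DemoAccumulators.class) {
                if (instance == null) {
                    instance = new DemoAccumulators();
                }
            }
        }
    }

    static DemoAccumulators get() {
        return instance;
    }
}
```

Callers that cannot know whether initialization has already happened can call init() unconditionally, which is the practical benefit of an idempotent initializer.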
-
updateCacheCandidates
public static void updateCacheCandidates(org.apache.beam.sdk.Pipeline pipeline, SparkPipelineTranslator translator, EvaluationContext evaluationContext)
Evaluator that updates/populates the cache candidates.
-
-