Class PipelineTranslator
- java.lang.Object
  - org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator

Direct Known Subclasses: PipelineTranslatorBatch
@Internal public abstract class PipelineTranslator extends java.lang.Object

The pipeline translator translates a Beam Pipeline into its Spark correspondence, which can then be evaluated.

The translation involves traversing the hierarchy of a pipeline multiple times:
- Detect if streaming mode is required.
- Identify datasets that are repeatedly used as input and should be cached.
- And finally, translate each primitive or composite PTransform that is known and supported into its Spark correspondence. If a composite is not supported, it is expanded further into its parts, which are then translated.
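
The last two traversal passes described above can be pictured with a small, self-contained model. This is only a sketch and does not use Beam or Spark classes: `Node`, `TraversalSketch`, `cacheCandidates`, and `translate` are hypothetical names introduced for illustration, and the runner's actual visitor logic may differ. It counts how often each dataset is consumed to find cache candidates, and it translates supported transforms while expanding unsupported composites into their parts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a node in the pipeline hierarchy (not a Beam class).
final class Node {
  final String name;          // transform name
  final List<String> inputs;  // names of the datasets this transform reads
  final List<Node> parts;     // sub-transforms; empty for a primitive

  Node(String name, List<String> inputs, List<Node> parts) {
    this.name = name;
    this.inputs = inputs;
    this.parts = parts;
  }
}

final class TraversalSketch {
  // Caching pass (sketch): datasets read by more than one leaf transform are cache candidates.
  static Set<String> cacheCandidates(Node root) {
    Map<String, Integer> uses = new HashMap<>();
    countUses(root, uses);
    Set<String> result = new TreeSet<>();
    for (Map.Entry<String, Integer> e : uses.entrySet()) {
      if (e.getValue() > 1) {
        result.add(e.getKey());
      }
    }
    return result;
  }

  private static void countUses(Node node, Map<String, Integer> uses) {
    if (node.parts.isEmpty()) {
      for (String in : node.inputs) {
        uses.merge(in, 1, Integer::sum);
      }
    } else {
      for (Node part : node.parts) {
        countUses(part, uses);
      }
    }
  }

  // Translation pass (sketch): translate supported nodes, expand unsupported composites.
  static List<String> translate(Node root, Set<String> supported) {
    List<String> translated = new ArrayList<>();
    translateInto(root, supported, translated);
    return translated;
  }

  private static void translateInto(Node node, Set<String> supported, List<String> out) {
    if (supported.contains(node.name)) {
      out.add(node.name); // translated as a whole; its parts are not visited
    } else if (!node.parts.isEmpty()) {
      for (Node part : node.parts) {
        translateInto(part, supported, out);
      }
    } else {
      throw new UnsupportedOperationException("No translation for primitive " + node.name);
    }
  }
}
```

In this model, a composite listed in `supported` is translated in one step, while an unlisted composite contributes only the translations of its parts.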
Nested Class Summary
Nested Classes (Modifier and Type, Class: Description)
- static interface PipelineTranslator.TranslationState: Shared, mutable state used during the translation of a pipeline and discarded afterwards.
- static interface PipelineTranslator.UnresolvedTranslation<InT,T>: Unresolved translation, allowing the generated Spark DAG to be optimized.
Constructor Summary
Constructors (Constructor: Description)
- PipelineTranslator()
Method Summary
Methods (Modifier and Type, Method: Description)
- static void detectStreamingMode(org.apache.beam.sdk.Pipeline pipeline, org.apache.beam.sdk.options.StreamingOptions options): Analyse the pipeline to determine if we have to switch to streaming mode for the pipeline translation and update StreamingOptions accordingly.
- protected abstract <InT extends org.apache.beam.sdk.values.PInput, OutT extends org.apache.beam.sdk.values.POutput, TransformT extends org.apache.beam.sdk.transforms.PTransform<InT,OutT>> TransformTranslator<InT,OutT,TransformT> getTransformTranslator(TransformT transform): Returns a TransformTranslator for the given PTransform if known.
- static void replaceTransforms(org.apache.beam.sdk.Pipeline pipeline, org.apache.beam.sdk.options.StreamingOptions options)
- EvaluationContext translate(org.apache.beam.sdk.Pipeline pipeline, org.apache.spark.sql.SparkSession session, SparkCommonPipelineOptions options): Translates a Beam pipeline into its Spark correspondence using the Spark SQL / Dataset API.
Method Detail
replaceTransforms
public static void replaceTransforms(org.apache.beam.sdk.Pipeline pipeline, org.apache.beam.sdk.options.StreamingOptions options)
detectStreamingMode
public static void detectStreamingMode(org.apache.beam.sdk.Pipeline pipeline, org.apache.beam.sdk.options.StreamingOptions options)
Analyse the pipeline to determine if we have to switch to streaming mode for the pipeline translation and update StreamingOptions accordingly.
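
As a rough illustration of such an analysis, the following self-contained sketch switches the options to streaming as soon as any collection in the pipeline is unbounded. Treating unboundedness as the deciding criterion is an assumption made here, and `SketchOptions`, `DatasetInfo`, and `StreamingModeSketch` are hypothetical stand-ins rather than Beam's `StreamingOptions` or `PCollection` types.

```java
import java.util.List;

// Hypothetical stand-in for streaming-related options (not Beam's StreamingOptions).
final class SketchOptions {
  private boolean streaming;

  boolean isStreaming() {
    return streaming;
  }

  void setStreaming(boolean streaming) {
    this.streaming = streaming;
  }
}

// Hypothetical stand-in for a collection produced somewhere in the pipeline.
final class DatasetInfo {
  final String name;
  final boolean bounded;

  DatasetInfo(String name, boolean bounded) {
    this.name = name;
    this.bounded = bounded;
  }
}

final class StreamingModeSketch {
  // Assumption: a single unbounded collection is enough to require streaming mode.
  // The flag is only ever switched on; an existing streaming setting is never reset.
  static void detectStreamingMode(List<DatasetInfo> collections, SketchOptions options) {
    for (DatasetInfo c : collections) {
      if (!c.bounded) {
        options.setStreaming(true);
        return;
      }
    }
  }
}
```

Because the sketch mutates the options in place and returns nothing, it mirrors the `void` signature above: callers inspect the options afterwards to learn the result.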
getTransformTranslator
@Nullable protected abstract <InT extends org.apache.beam.sdk.values.PInput, OutT extends org.apache.beam.sdk.values.POutput, TransformT extends org.apache.beam.sdk.transforms.PTransform<InT,OutT>> TransformTranslator<InT,OutT,TransformT> getTransformTranslator(TransformT transform)
Returns a TransformTranslator for the given PTransform if known.
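
One way a subclass could implement such a lookup is a registry keyed by transform class that returns null for unknown transforms, consistent with the `@Nullable` annotation on the method. This is only a sketch under that assumption: `Transform`, `Translator`, `TranslatorRegistry`, `ReadSketch`, and `UnknownSketch` are hypothetical types, not Beam's `PTransform` or the runner's `TransformTranslator`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for PTransform and TransformTranslator.
interface Transform {}

interface Translator<T extends Transform> {
  String translate(T transform);
}

// Two sample transforms used only to exercise the registry.
final class ReadSketch implements Transform {}

final class UnknownSketch implements Transform {}

final class TranslatorRegistry {
  private final Map<Class<? extends Transform>, Translator<?>> translators = new HashMap<>();

  <T extends Transform> void register(Class<T> type, Translator<T> translator) {
    translators.put(type, translator);
  }

  // Returns the translator registered for the transform's exact class, or null if unknown.
  @SuppressWarnings("unchecked")
  <T extends Transform> Translator<T> getTransformTranslator(T transform) {
    // The cast is safe because register() only stores a Translator<T> under Class<T>.
    return (Translator<T>) translators.get(transform.getClass());
  }
}
```

A caller would check the result for null and, for a composite, fall back to expanding it into its parts.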
translate
public EvaluationContext translate(org.apache.beam.sdk.Pipeline pipeline, org.apache.spark.sql.SparkSession session, SparkCommonPipelineOptions options)
Translates a Beam pipeline into its Spark correspondence using the Spark SQL / Dataset API.

Note that in some cases this involves the early evaluation of some parts of the pipeline. For example, in order to use a side-input PCollectionView in a translation, the corresponding Spark Dataset might have to be collected and broadcast to be able to continue with the translation.

Returns:
The result of the translation is an EvaluationContext that can trigger the evaluation of the Spark pipeline.