Class SparkTranslationContext

  • Direct Known Subclasses:
    SparkStreamingTranslationContext

    public class SparkTranslationContext
    extends java.lang.Object
    Translation context used to lazily store Spark data sets during portable pipeline translation and compute them after translation.
    • Constructor Summary

      Constructors 
      Constructor Description
      SparkTranslationContext(org.apache.spark.api.java.JavaSparkContext jsc, org.apache.beam.sdk.options.PipelineOptions options, org.apache.beam.runners.fnexecution.provisioning.JobInfo jobInfo)
    • Method Summary

      Modifier and Type Method Description
      void computeOutputs()
      Compute the outputs for all RDDs that are leaves in the DAG.
      org.apache.beam.runners.core.construction.SerializablePipelineOptions getSerializableOptions()  
      org.apache.spark.api.java.JavaSparkContext getSparkContext()  
      int nextSinkId()
      Generate a unique pCollection id number to identify runner-generated sinks.
      Dataset popDataset(java.lang.String pCollectionId)
      Retrieve the dataset for the pCollection id and remove it from the DAG's leaves.
      void pushDataset(java.lang.String pCollectionId, Dataset dataset)
      Add the output of a transform to the context under the given pCollection id.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • SparkTranslationContext

        public SparkTranslationContext(org.apache.spark.api.java.JavaSparkContext jsc,
                                       org.apache.beam.sdk.options.PipelineOptions options,
                                       org.apache.beam.runners.fnexecution.provisioning.JobInfo jobInfo)
    • Method Detail

      • getSparkContext

        public org.apache.spark.api.java.JavaSparkContext getSparkContext()
      • getSerializableOptions

        public org.apache.beam.runners.core.construction.SerializablePipelineOptions getSerializableOptions()
      • pushDataset

        public void pushDataset(java.lang.String pCollectionId,
                                Dataset dataset)
        Add the output of a transform to the context under the given pCollection id.
      • popDataset

        public Dataset popDataset(java.lang.String pCollectionId)
        Retrieve the dataset for the pCollection id and remove it from the DAG's leaves.
      • computeOutputs

        public void computeOutputs()
        Compute the outputs for all RDDs that are leaves in the DAG.
      • nextSinkId

        public int nextSinkId()
        Generate a unique pCollection id number to identify runner-generated sinks.
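
The interplay between pushDataset, popDataset, and computeOutputs described above can be illustrated with a small stand-alone model. This is a minimal sketch and not the Beam source: SketchDataset, SketchContext, and demo are hypothetical names, the data set is reduced to a single action() callback, and starting sink ids at 0 is an assumption. It only shows the idea that a pushed data set is a leaf until a downstream transform pops it, and that only the remaining leaves are forced at the end.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified model of the leaf bookkeeping; hypothetical, not the Beam source. */
final class LeafTrackingSketch {

  /** Hypothetical stand-in for the runner's Dataset type. */
  interface SketchDataset {
    void action(); // forces evaluation of a lazily defined data set
  }

  /** Hypothetical context that mirrors the documented method names. */
  static final class SketchContext {
    // Every data set pushed so far, keyed by pCollection id.
    private final Map<String, SketchDataset> datasets = new LinkedHashMap<>();
    // Data sets that no downstream transform has consumed yet.
    private final Set<SketchDataset> leaves = new LinkedHashSet<>();
    // Assumption: ids start at 0 and increase by one per call.
    private int sinkId = 0;

    void pushDataset(String pCollectionId, SketchDataset dataset) {
      datasets.put(pCollectionId, dataset);
      leaves.add(dataset); // a new output is a leaf until it is popped
    }

    SketchDataset popDataset(String pCollectionId) {
      SketchDataset dataset = datasets.get(pCollectionId);
      leaves.remove(dataset); // consumed downstream, so no longer a leaf
      return dataset;
    }

    void computeOutputs() {
      for (SketchDataset leaf : leaves) {
        leaf.action(); // only unconsumed outputs are forced
      }
    }

    int nextSinkId() {
      return sinkId++;
    }
  }

  /** Translates a two-step pipeline and returns the ids that were forced. */
  static List<String> demo() {
    List<String> computed = new ArrayList<>();
    SketchContext ctx = new SketchContext();
    ctx.pushDataset("read", () -> computed.add("read"));
    ctx.popDataset("read"); // the "map" step consumes the "read" output
    ctx.pushDataset("map", () -> computed.add("map"));
    ctx.computeOutputs();
    return computed;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [map]
  }
}
```

In this model the "read" output is never forced directly because it was popped before computeOutputs ran; only the final "map" output triggers evaluation, which matches the "leaves in the DAG" wording in the method descriptions.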