Class EvaluationContext


  • public class EvaluationContext
    extends java.lang.Object
    The EvaluationContext allows us to define pipeline instructions and translate between PObject<T>s or PCollection<T>s and Ts or DStreams/RDDs of Ts.
    • Constructor Summary

      Constructors 
      Constructor Description
      EvaluationContext​(org.apache.spark.api.java.JavaSparkContext jsc, org.apache.beam.sdk.Pipeline pipeline, org.apache.beam.sdk.options.PipelineOptions options)  
      EvaluationContext​(org.apache.spark.api.java.JavaSparkContext jsc, org.apache.beam.sdk.Pipeline pipeline, org.apache.beam.sdk.options.PipelineOptions options, org.apache.spark.streaming.api.java.JavaStreamingContext jssc)  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      Dataset borrowDataset​(org.apache.beam.sdk.transforms.PTransform<? extends org.apache.beam.sdk.values.PValue,​?> transform)  
      Dataset borrowDataset​(org.apache.beam.sdk.values.PValue pvalue)  
      void computeOutputs()
      Computes the outputs for all RDDs that are leaves in the DAG and do not have any actions (like saving to a file) registered on them (i.e. they are performed for side effects).
      <T> T get​(org.apache.beam.sdk.values.PValue value)
      Retrieve an object of type T associated with the PValue passed in.
      java.util.Map<org.apache.beam.sdk.values.PCollection,​java.lang.Long> getCacheCandidates()
      Get the map of cache candidates held by the evaluation context.
      org.apache.beam.sdk.runners.AppliedPTransform<?,​?,​?> getCurrentTransform()  
      <T extends org.apache.beam.sdk.values.PValue>
      T
      getInput​(org.apache.beam.sdk.transforms.PTransform<T,​?> transform)  
      <T> java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.values.PCollection<?>> getInputs​(org.apache.beam.sdk.transforms.PTransform<?,​?> transform)  
      org.apache.beam.sdk.options.PipelineOptions getOptions()  
      <T extends org.apache.beam.sdk.values.PValue>
      T
      getOutput​(org.apache.beam.sdk.transforms.PTransform<?,​T> transform)  
      java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.coders.Coder<?>> getOutputCoders()  
      java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.values.PCollection<?>> getOutputs​(org.apache.beam.sdk.transforms.PTransform<?,​?> transform)  
      org.apache.beam.sdk.Pipeline getPipeline()  
      SparkPCollectionView getPViews()
      Return the current views created in the pipeline.
      org.apache.beam.runners.core.construction.SerializablePipelineOptions getSerializableOptions()  
      org.apache.spark.api.java.JavaSparkContext getSparkContext()  
      org.apache.spark.streaming.api.java.JavaStreamingContext getStreamingContext()  
      void putDataset​(org.apache.beam.sdk.transforms.PTransform<?,​? extends org.apache.beam.sdk.values.PValue> transform, Dataset dataset)
      Add the single output of a transform to the context map, caching it if it satisfies shouldCache(PTransform, PValue).
      void putDataset​(org.apache.beam.sdk.values.PValue pvalue, Dataset dataset)
      Add one output of a transform to the context map, caching it if it satisfies shouldCache(PTransform, PValue).
      void putPView​(org.apache.beam.sdk.values.PCollectionView<?> view, java.lang.Iterable<org.apache.beam.sdk.util.WindowedValue<?>> value, org.apache.beam.sdk.coders.Coder<java.lang.Iterable<org.apache.beam.sdk.util.WindowedValue<?>>> coder)
      Adds or replaces a view among the current views created in the pipeline.
      void setCurrentTransform​(org.apache.beam.sdk.runners.AppliedPTransform<?,​?,​?> transform)  
      boolean shouldCache​(org.apache.beam.sdk.transforms.PTransform<?,​? extends org.apache.beam.sdk.values.PValue> transform, org.apache.beam.sdk.values.PValue pvalue)
      Cache a PCollection if SparkPipelineOptions.isCacheDisabled is false, the transform isn't a GroupByKey transformation, and the PCollection is used more than once in the Pipeline.
      java.lang.String storageLevel()  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • EvaluationContext

        public EvaluationContext​(org.apache.spark.api.java.JavaSparkContext jsc,
                                 org.apache.beam.sdk.Pipeline pipeline,
                                 org.apache.beam.sdk.options.PipelineOptions options)
      • EvaluationContext

        public EvaluationContext​(org.apache.spark.api.java.JavaSparkContext jsc,
                                 org.apache.beam.sdk.Pipeline pipeline,
                                 org.apache.beam.sdk.options.PipelineOptions options,
                                 org.apache.spark.streaming.api.java.JavaStreamingContext jssc)
    • Method Detail

      • getSparkContext

        public org.apache.spark.api.java.JavaSparkContext getSparkContext()
      • getStreamingContext

        public org.apache.spark.streaming.api.java.JavaStreamingContext getStreamingContext()
      • getPipeline

        public org.apache.beam.sdk.Pipeline getPipeline()
      • getOptions

        public org.apache.beam.sdk.options.PipelineOptions getOptions()
      • getSerializableOptions

        public org.apache.beam.runners.core.construction.SerializablePipelineOptions getSerializableOptions()
      • setCurrentTransform

        public void setCurrentTransform​(org.apache.beam.sdk.runners.AppliedPTransform<?,​?,​?> transform)
      • getCurrentTransform

        public org.apache.beam.sdk.runners.AppliedPTransform<?,​?,​?> getCurrentTransform()
      • getInput

        public <T extends org.apache.beam.sdk.values.PValue> T getInput​(org.apache.beam.sdk.transforms.PTransform<T,​?> transform)
      • getInputs

        public <T> java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.values.PCollection<?>> getInputs​(org.apache.beam.sdk.transforms.PTransform<?,​?> transform)
      • getOutput

        public <T extends org.apache.beam.sdk.values.PValue> T getOutput​(org.apache.beam.sdk.transforms.PTransform<?,​T> transform)
      • getOutputs

        public java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.values.PCollection<?>> getOutputs​(org.apache.beam.sdk.transforms.PTransform<?,​?> transform)
      • getOutputCoders

        public java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.coders.Coder<?>> getOutputCoders()
      • shouldCache

        public boolean shouldCache​(org.apache.beam.sdk.transforms.PTransform<?,​? extends org.apache.beam.sdk.values.PValue> transform,
                                   org.apache.beam.sdk.values.PValue pvalue)
        Cache a PCollection if SparkPipelineOptions.isCacheDisabled is false, the transform isn't a GroupByKey transformation, and the PCollection is used more than once in the Pipeline.

        The PCollection is not cached for a GroupByKey transformation, because Spark automatically persists some intermediate data in shuffle operations, even without users calling persist.

        Parameters:
        transform - the transform to check
        pvalue - the output of the transform
        Returns:
        true if the PCollection will be cached
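The caching rule above can be sketched in plain Java. The helper below is hypothetical (it is not the runner's actual code) and only mirrors the documented decision: cache only when caching is enabled, the producing transform is not a GroupByKey, and the collection is consumed more than once.

```java
// Hypothetical sketch of the documented caching rule, not the runner's code.
public class CachingRuleSketch {

    /**
     * Returns true when a collection should be cached: caching must be
     * enabled, the producing transform must not be a GroupByKey (Spark
     * already persists shuffle data), and the collection must be consumed
     * more than once in the pipeline.
     */
    static boolean shouldCache(boolean cacheDisabled, boolean isGroupByKey, long consumerCount) {
        if (cacheDisabled || isGroupByKey) {
            return false;
        }
        return consumerCount > 1;
    }

    public static void main(String[] args) {
        System.out.println(shouldCache(false, false, 2)); // reused collection: cache
        System.out.println(shouldCache(false, true, 2));  // GroupByKey output: skip
        System.out.println(shouldCache(true, false, 2));  // caching disabled: skip
        System.out.println(shouldCache(false, false, 1)); // used once: skip
    }
}
```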
      • putDataset

        public void putDataset​(org.apache.beam.sdk.transforms.PTransform<?,​? extends org.apache.beam.sdk.values.PValue> transform,
                               Dataset dataset)
        Add the single output of a transform to the context map, caching it if it satisfies shouldCache(PTransform, PValue).
        Parameters:
        transform - the transform from which the Dataset was created
        dataset - the Dataset created from the transform
      • putDataset

        public void putDataset​(org.apache.beam.sdk.values.PValue pvalue,
                               Dataset dataset)
        Add one output of a transform to the context map, caching it if it satisfies shouldCache(PTransform, PValue). Used when the PTransform has multiple outputs.
        Parameters:
        pvalue - one of the multiple outputs of the transform
        dataset - the Dataset created from the transform
      • borrowDataset

        public Dataset borrowDataset​(org.apache.beam.sdk.transforms.PTransform<? extends org.apache.beam.sdk.values.PValue,​?> transform)
      • borrowDataset

        public Dataset borrowDataset​(org.apache.beam.sdk.values.PValue pvalue)
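The putDataset/borrowDataset pair implements a simple registry keyed by a transform's output value. A minimal, self-contained sketch of that pattern follows; String stands in for the real PValue and Dataset types, which is purely an assumption of this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the put/borrow registry pattern. String stands in
// for the real PValue keys and Dataset values; this is not Beam code.
public class DatasetRegistrySketch {
    private final Map<String, String> datasets = new HashMap<>();

    /** Registers the dataset produced for the given value. */
    void putDataset(String pvalue, String dataset) {
        datasets.put(pvalue, dataset);
    }

    /** Looks up the dataset previously registered for the given value. */
    String borrowDataset(String pvalue) {
        String dataset = datasets.get(pvalue);
        if (dataset == null) {
            throw new IllegalStateException("No dataset registered for " + pvalue);
        }
        return dataset;
    }
}
```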
      • computeOutputs

        public void computeOutputs()
        Computes the outputs for all RDDs that are leaves in the DAG and do not have any actions (like saving to a file) registered on them (i.e. they are performed for side effects).
      • get

        public <T> T get​(org.apache.beam.sdk.values.PValue value)
        Retrieve an object of type T associated with the PValue passed in.
        Type Parameters:
        T - Type of object to return.
        Parameters:
        value - PValue to retrieve associated data for.
        Returns:
        Native object.
      • getPViews

        public SparkPCollectionView getPViews()
        Return the current views created in the pipeline.
        Returns:
        SparkPCollectionView
      • putPView

        public void putPView​(org.apache.beam.sdk.values.PCollectionView<?> view,
                             java.lang.Iterable<org.apache.beam.sdk.util.WindowedValue<?>> value,
                             org.apache.beam.sdk.coders.Coder<java.lang.Iterable<org.apache.beam.sdk.util.WindowedValue<?>>> coder)
        Adds or replaces a view among the current views created in the pipeline.
        Parameters:
        view - the identifier of the view
        value - the actual value of the view
        coder - the coder of the value
      • getCacheCandidates

        public java.util.Map<org.apache.beam.sdk.values.PCollection,​java.lang.Long> getCacheCandidates()
        Get the map of cache candidates held by the evaluation context.
        Returns:
        The current Map of cache candidates.
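Cache candidates are tracked as usage counts per collection: a collection consumed by more than one transform is a candidate for caching. A hypothetical sketch of building such a count map (the method name and String keys are assumptions of this illustration):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: count how many transforms consume each collection.
// Collections consumed more than once become cache candidates.
public class CacheCandidatesSketch {

    static Map<String, Long> countConsumers(List<String> consumedCollections) {
        Map<String, Long> counts = new HashMap<>();
        for (String name : consumedCollections) {
            counts.merge(name, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts =
            countConsumers(List.of("words", "words", "counts"));
        // "words" is consumed twice, so it is a cache candidate.
        System.out.println(counts.get("words") > 1); // true
    }
}
```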
      • storageLevel

        public java.lang.String storageLevel()