Class EvaluationContext
- java.lang.Object
  - org.apache.beam.runners.spark.translation.EvaluationContext
public class EvaluationContext extends java.lang.Object
The EvaluationContext allows us to define pipeline instructions and translate between PObject<T>s or PCollection<T>s and Ts or DStreams/RDDs of Ts.
-
-
Constructor Summary
Constructors
- EvaluationContext(org.apache.spark.api.java.JavaSparkContext jsc, org.apache.beam.sdk.Pipeline pipeline, org.apache.beam.sdk.options.PipelineOptions options)
- EvaluationContext(org.apache.spark.api.java.JavaSparkContext jsc, org.apache.beam.sdk.Pipeline pipeline, org.apache.beam.sdk.options.PipelineOptions options, org.apache.spark.streaming.api.java.JavaStreamingContext jssc)
-
Method Summary
Modifier and Type, Method, Description
- Dataset borrowDataset(org.apache.beam.sdk.transforms.PTransform<? extends org.apache.beam.sdk.values.PValue,?> transform)
- Dataset borrowDataset(org.apache.beam.sdk.values.PValue pvalue)
- void computeOutputs()
  Computes the outputs for all RDDs that are leaves in the DAG and do not have any actions (like saving to a file) registered on them (i.e. they are performed for side effects).
- <T> T get(org.apache.beam.sdk.values.PValue value)
  Retrieve an object of Type T associated with the PValue passed in.
- java.util.Map<org.apache.beam.sdk.values.PCollection,java.lang.Long> getCacheCandidates()
  Get the map of cache candidates held by the evaluation context.
- org.apache.beam.sdk.runners.AppliedPTransform<?,?,?> getCurrentTransform()
- <T extends org.apache.beam.sdk.values.PValue> T getInput(org.apache.beam.sdk.transforms.PTransform<T,?> transform)
- <T> java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.values.PCollection<?>> getInputs(org.apache.beam.sdk.transforms.PTransform<?,?> transform)
- org.apache.beam.sdk.options.PipelineOptions getOptions()
- <T extends org.apache.beam.sdk.values.PValue> T getOutput(org.apache.beam.sdk.transforms.PTransform<?,T> transform)
- java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.coders.Coder<?>> getOutputCoders()
- java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.values.PCollection<?>> getOutputs(org.apache.beam.sdk.transforms.PTransform<?,?> transform)
- org.apache.beam.sdk.Pipeline getPipeline()
- SparkPCollectionView getPViews()
  Return the current views created in the pipeline.
- org.apache.beam.runners.core.construction.SerializablePipelineOptions getSerializableOptions()
- org.apache.spark.api.java.JavaSparkContext getSparkContext()
- org.apache.spark.streaming.api.java.JavaStreamingContext getStreamingContext()
- void putDataset(org.apache.beam.sdk.transforms.PTransform<?,? extends org.apache.beam.sdk.values.PValue> transform, Dataset dataset)
  Add the single output of a transform to the context map and possibly cache it if it conforms to shouldCache(PTransform, PValue).
- void putDataset(org.apache.beam.sdk.values.PValue pvalue, Dataset dataset)
  Add an output of a transform to the context map and possibly cache it if it conforms to shouldCache(PTransform, PValue).
- void putPView(org.apache.beam.sdk.values.PCollectionView<?> view, java.lang.Iterable<org.apache.beam.sdk.util.WindowedValue<?>> value, org.apache.beam.sdk.coders.Coder<java.lang.Iterable<org.apache.beam.sdk.util.WindowedValue<?>>> coder)
  Adds or replaces a view in the current views created in the pipeline.
- void setCurrentTransform(org.apache.beam.sdk.runners.AppliedPTransform<?,?,?> transform)
- boolean shouldCache(org.apache.beam.sdk.transforms.PTransform<?,? extends org.apache.beam.sdk.values.PValue> transform, org.apache.beam.sdk.values.PValue pvalue)
  Cache the PCollection if SparkPipelineOptions.isCacheDisabled is false, the transform is not a GroupByKey transformation, and the PCollection is used more than once in the pipeline.
- java.lang.String storageLevel()
-
-
-
Constructor Detail
-
EvaluationContext
public EvaluationContext(org.apache.spark.api.java.JavaSparkContext jsc, org.apache.beam.sdk.Pipeline pipeline, org.apache.beam.sdk.options.PipelineOptions options)
-
EvaluationContext
public EvaluationContext(org.apache.spark.api.java.JavaSparkContext jsc, org.apache.beam.sdk.Pipeline pipeline, org.apache.beam.sdk.options.PipelineOptions options, org.apache.spark.streaming.api.java.JavaStreamingContext jssc)
-
-
Method Detail
-
getSparkContext
public org.apache.spark.api.java.JavaSparkContext getSparkContext()
-
getStreamingContext
public org.apache.spark.streaming.api.java.JavaStreamingContext getStreamingContext()
-
getPipeline
public org.apache.beam.sdk.Pipeline getPipeline()
-
getOptions
public org.apache.beam.sdk.options.PipelineOptions getOptions()
-
getSerializableOptions
public org.apache.beam.runners.core.construction.SerializablePipelineOptions getSerializableOptions()
-
setCurrentTransform
public void setCurrentTransform(org.apache.beam.sdk.runners.AppliedPTransform<?,?,?> transform)
-
getCurrentTransform
public org.apache.beam.sdk.runners.AppliedPTransform<?,?,?> getCurrentTransform()
-
getInput
public <T extends org.apache.beam.sdk.values.PValue> T getInput(org.apache.beam.sdk.transforms.PTransform<T,?> transform)
-
getInputs
public <T> java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.values.PCollection<?>> getInputs(org.apache.beam.sdk.transforms.PTransform<?,?> transform)
-
getOutput
public <T extends org.apache.beam.sdk.values.PValue> T getOutput(org.apache.beam.sdk.transforms.PTransform<?,T> transform)
-
getOutputs
public java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.values.PCollection<?>> getOutputs(org.apache.beam.sdk.transforms.PTransform<?,?> transform)
-
getOutputCoders
public java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.coders.Coder<?>> getOutputCoders()
-
shouldCache
public boolean shouldCache(org.apache.beam.sdk.transforms.PTransform<?,? extends org.apache.beam.sdk.values.PValue> transform, org.apache.beam.sdk.values.PValue pvalue)
Cache the PCollection if SparkPipelineOptions.isCacheDisabled is false, the transform is not a GroupByKey transformation, and the PCollection is used more than once in the pipeline. The output of a GroupByKey transformation is not cached, because Spark automatically persists some intermediate data in shuffle operations, even without users calling persist.
- Parameters:
pvalue - output of transform
transform - the transform to check
- Returns:
- whether the PCollection will be cached
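The caching rule described above can be illustrated with a minimal sketch. It is one reading of the description, not the Beam implementation: the class `ShouldCacheSketch` and its three parameters are hypothetical stand-ins for the pipeline option, the transform type check, and the usage count of the PCollection.

```java
// Hypothetical, simplified model of the caching decision; it does not use Beam or Spark types.
final class ShouldCacheSketch {

    /**
     * @param cacheDisabled stand-in for SparkPipelineOptions.isCacheDisabled
     * @param isGroupByKey  whether the producing transform is a GroupByKey
     * @param useCount      how many times the PCollection is consumed in the pipeline
     * @return true if, under this reading, the PCollection would be cached
     */
    static boolean shouldCache(boolean cacheDisabled, boolean isGroupByKey, long useCount) {
        // Caching switched off, or a shuffle whose intermediate data Spark already persists.
        if (cacheDisabled || isGroupByKey) {
            return false;
        }
        // Only a PCollection that is read more than once benefits from caching.
        return useCount > 1;
    }

    public static void main(String[] args) {
        System.out.println(shouldCache(false, false, 2)); // true
        System.out.println(shouldCache(true, false, 2));  // false
        System.out.println(shouldCache(false, true, 2));  // false
        System.out.println(shouldCache(false, false, 1)); // false
    }
}
```

Under this reading, a PCollection consumed only once is never cached, regardless of the option value.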
-
putDataset
public void putDataset(org.apache.beam.sdk.transforms.PTransform<?,? extends org.apache.beam.sdk.values.PValue> transform, Dataset dataset)
Add the single output of a transform to the context map and possibly cache it if it conforms to shouldCache(PTransform, PValue).
- Parameters:
transform - from which Dataset was created
dataset - created Dataset from transform
-
putDataset
public void putDataset(org.apache.beam.sdk.values.PValue pvalue, Dataset dataset)
Add an output of a transform to the context map and possibly cache it if it conforms to shouldCache(PTransform, PValue). Used when the PTransform has multiple outputs.
- Parameters:
pvalue - one of multiple outputs of transform
dataset - created Dataset from transform
-
borrowDataset
public Dataset borrowDataset(org.apache.beam.sdk.transforms.PTransform<? extends org.apache.beam.sdk.values.PValue,?> transform)
-
borrowDataset
public Dataset borrowDataset(org.apache.beam.sdk.values.PValue pvalue)
-
computeOutputs
public void computeOutputs()
Computes the outputs for all RDDs that are leaves in the DAG and do not have any actions (like saving to a file) registered on them (i.e. they are performed for side effects).
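How putDataset, borrowDataset and computeOutputs might fit together can be pictured with a small model. This is a sketch under stated assumptions, not the runner's code: it assumes that every stored dataset starts out as a leaf, that borrowing a dataset as an input removes it from the leaves, and that computeOutputs forces an action on whatever leaves remain. `LeafTrackingSketch`, `FakeDataset` and its `action()` method are hypothetical names, and PValues are represented by plain strings.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of leaf tracking; it does not use Beam or Spark types.
final class LeafTrackingSketch {

    /** Stand-in for a Dataset: it only records whether an action was forced on it. */
    static final class FakeDataset {
        final String name;
        boolean actionRan = false;

        FakeDataset(String name) {
            this.name = name;
        }

        void action() {
            actionRan = true;
        }
    }

    private final Map<String, FakeDataset> datasets = new LinkedHashMap<>();
    private final Set<FakeDataset> leaves = new LinkedHashSet<>();

    /** Register the dataset for a PValue (here a plain name); assume it is a leaf for now. */
    void putDataset(String pvalueName, FakeDataset dataset) {
        datasets.put(pvalueName, dataset);
        leaves.add(dataset);
    }

    /** Hand out a dataset as the input of another transform; it is then no longer a leaf. */
    FakeDataset borrowDataset(String pvalueName) {
        FakeDataset dataset = datasets.get(pvalueName);
        if (dataset == null) {
            throw new IllegalStateException("No dataset registered for " + pvalueName);
        }
        leaves.remove(dataset);
        return dataset;
    }

    /** Force an action on every dataset that nothing else has consumed. */
    void computeOutputs() {
        for (FakeDataset leaf : leaves) {
            leaf.action();
        }
    }

    public static void main(String[] args) {
        LeafTrackingSketch context = new LeafTrackingSketch();
        FakeDataset a = new FakeDataset("a");
        FakeDataset b = new FakeDataset("b");
        context.putDataset("pa", a);
        context.putDataset("pb", b);
        context.borrowDataset("pa"); // "a" feeds another transform, so it is not a leaf
        context.computeOutputs();
        System.out.println("a forced: " + a.actionRan); // a forced: false
        System.out.println("b forced: " + b.actionRan); // b forced: true
    }
}
```

In this model only datasets that no later transform reads are evaluated explicitly, which matches the "leaves in the DAG" wording above.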
-
get
public <T> T get(org.apache.beam.sdk.values.PValue value)
Retrieve an object of Type T associated with the PValue passed in.
- Type Parameters:
T - Type of object to return.
- Parameters:
value - PValue to retrieve associated data for.
- Returns:
- Native object.
-
getPViews
public SparkPCollectionView getPViews()
Return the current views created in the pipeline.
- Returns:
- SparkPCollectionView
-
putPView
public void putPView(org.apache.beam.sdk.values.PCollectionView<?> view, java.lang.Iterable<org.apache.beam.sdk.util.WindowedValue<?>> value, org.apache.beam.sdk.coders.Coder<java.lang.Iterable<org.apache.beam.sdk.util.WindowedValue<?>>> coder)
Adds or replaces a view in the current views created in the pipeline.
- Parameters:
view - Identifier of the view
value - Actual value of the view
coder - Coder of the value
-
getCacheCandidates
public java.util.Map<org.apache.beam.sdk.values.PCollection,java.lang.Long> getCacheCandidates()
Get the map of cache candidates held by the evaluation context.
- Returns:
- The current Map of cache candidates.
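The returned map pairs each PCollection with a Long. A plausible interpretation, assumed here and not stated on this page, is that the value counts how many transforms consume the PCollection. The sketch below builds such a map from a list of per-transform inputs; `CacheCandidatesSketch`, `countUses` and the string names standing in for PCollections are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a cache-candidate map; PCollections are represented by names.
final class CacheCandidatesSketch {

    /**
     * Count how often each collection appears as an input.
     *
     * @param transformInputs one inner list per transform, naming the collections it reads
     */
    static Map<String, Long> countUses(List<List<String>> transformInputs) {
        Map<String, Long> candidates = new HashMap<>();
        for (List<String> inputs : transformInputs) {
            for (String input : inputs) {
                // Start at 1 on first sight, otherwise add 1 to the running count.
                candidates.merge(input, 1L, Long::sum);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<List<String>> inputs = Arrays.asList(
                Arrays.asList("words"),  // first consumer of "words"
                Arrays.asList("words"),  // second consumer of "words"
                Arrays.asList("lines")); // only consumer of "lines"
        Map<String, Long> counts = countUses(inputs);
        System.out.println(counts.get("words")); // 2
        System.out.println(counts.get("lines")); // 1
    }
}
```

A collection with a count above one is the kind of entry that a check such as shouldCache(PTransform, PValue) could treat as worth caching.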
-
storageLevel
public java.lang.String storageLevel()
-
-