Class TranslationUtils
- java.lang.Object
-
- org.apache.beam.runners.spark.translation.TranslationUtils
-
public final class TranslationUtils extends java.lang.Object
A set of utilities to help translate Beam transformations into Spark transformations.
-
-
Nested Class Summary
Nested Classes
- static class TranslationUtils.CombineGroupedValues<K,InputT,OutputT>
  A SparkCombineFn function applied to grouped KVs.
- static class TranslationUtils.TupleTagFilter<V>
  A utility class to filter TupleTags.
-
Method Summary
- static <T1,T2> org.apache.spark.streaming.api.java.JavaDStream<T2> dStreamValues(org.apache.spark.streaming.api.java.JavaPairDStream<T1,T2> pairDStream)
  Transform a pair stream into a value stream.
- static <T> org.apache.spark.api.java.function.VoidFunction<T> emptyVoidFunction()
- static <InputT,OutputT> org.apache.spark.api.java.function.FlatMapFunction<java.util.Iterator<InputT>,OutputT> functionToFlatMapFunction(org.apache.spark.api.java.function.Function<InputT,OutputT> func)
  A utility method that adapts Function to a FlatMapFunction with an Iterator input.
- static java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.values.KV<org.apache.beam.sdk.values.WindowingStrategy<?,?>,SideInputBroadcast<?>>> getSideInputs(java.lang.Iterable<org.apache.beam.sdk.values.PCollectionView<?>> views, org.apache.spark.api.java.JavaSparkContext context, SparkPCollectionView pviews)
  Create SideInputs as Broadcast variables.
- static java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.coders.Coder<org.apache.beam.sdk.util.WindowedValue<?>>> getTupleTagCoders(java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.values.PCollection<?>> outputs)
  Utility to get mapping between TupleTag and a coder.
- static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<org.apache.beam.sdk.values.TupleTag<?>,ValueAndCoderLazySerializable<org.apache.beam.sdk.util.WindowedValue<?>>>,org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.util.WindowedValue<?>> getTupleTagDecodeFunction(java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.coders.Coder<org.apache.beam.sdk.util.WindowedValue<?>>> coderMap)
  Returns a pair function to convert bytes to value via coder.
- static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.util.WindowedValue<?>>,org.apache.beam.sdk.values.TupleTag<?>,ValueAndCoderLazySerializable<org.apache.beam.sdk.util.WindowedValue<?>>> getTupleTagEncodeFunction(java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.coders.Coder<org.apache.beam.sdk.util.WindowedValue<?>>> coderMap)
  Returns a pair function to convert value to bytes via coder.
- static <T,K,V> org.apache.spark.api.java.function.PairFlatMapFunction<java.util.Iterator<T>,K,V> pairFunctionToPairFlatMapFunction(org.apache.spark.api.java.function.PairFunction<T,K,V> pairFunction)
  A utility method that adapts PairFunction to a PairFlatMapFunction with an Iterator input.
- static void rejectStateAndTimers(org.apache.beam.sdk.transforms.DoFn<?,?> doFn)
  Reject state and timers DoFn.
- static <T,W extends org.apache.beam.sdk.transforms.windowing.BoundedWindow> boolean skipAssignWindows(org.apache.beam.sdk.transforms.windowing.Window.Assign<T> transform, EvaluationContext context)
  Checks if the window transformation should be applied or skipped.
- static <K,V> org.apache.spark.api.java.function.PairFunction<org.apache.beam.sdk.util.WindowedValue<org.apache.beam.sdk.values.KV<K,V>>,ByteArray,org.apache.beam.sdk.util.WindowedValue<org.apache.beam.sdk.values.KV<K,V>>> toPairByKeyInWindowedValue(org.apache.beam.sdk.coders.Coder<K> keyCoder)
  Extract key from a WindowedValue KV into a pair.
- static <K,V> org.apache.spark.api.java.function.PairFlatMapFunction<java.util.Iterator<org.apache.beam.sdk.values.KV<K,V>>,K,V> toPairFlatMapFunction()
  KV to pair flatmap function.
- static <K,V> org.apache.spark.api.java.function.PairFunction<org.apache.beam.sdk.values.KV<K,V>,K,V> toPairFunction()
  KV to pair function.
-
-
-
Method Detail
-
skipAssignWindows
public static <T,W extends org.apache.beam.sdk.transforms.windowing.BoundedWindow> boolean skipAssignWindows(org.apache.beam.sdk.transforms.windowing.Window.Assign<T> transform, EvaluationContext context)
Checks if the window transformation should be applied or skipped.
Avoids running assign windows if both source and destination use the global window, or if the user has not specified the WindowFn (meaning they are only changing triggering or allowed lateness).
- Type Parameters:
  T - PCollection type.
  W - BoundedWindow type.
- Parameters:
  transform - The Window.Assign transformation.
  context - The EvaluationContext.
- Returns:
  true if the window assignment should be skipped; false if it should be applied.
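The two skip conditions described above can be sketched in isolation. This is a minimal illustration, not the runner's code: `WindowFnLike`, `GlobalWindowsLike`, and `FixedWindowsLike` are hypothetical stand-ins for Beam's window function types, and a `null` requested function stands for "the user has not specified the WindowFn".

```java
// Hypothetical stand-in types; none of these are Beam or Spark classes.
final class SkipAssignSketch {
  interface WindowFnLike {}

  static final class GlobalWindowsLike implements WindowFnLike {}

  static final class FixedWindowsLike implements WindowFnLike {}

  /**
   * Returns true when assigning windows would be a no-op: either no window
   * function was requested, or both the input and the request are global.
   */
  static boolean skipAssign(WindowFnLike requested, WindowFnLike inputFn) {
    return requested == null
        || (inputFn instanceof GlobalWindowsLike && requested instanceof GlobalWindowsLike);
  }

  public static void main(String[] args) {
    WindowFnLike global = new GlobalWindowsLike();
    WindowFnLike fixed = new FixedWindowsLike();
    System.out.println(skipAssign(null, global));   // no WindowFn requested
    System.out.println(skipAssign(global, global)); // global to global
    System.out.println(skipAssign(fixed, global));  // a real re-windowing
  }
}
```

Under these assumptions the first two calls report that the assignment can be skipped and the third reports that it must run.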
-
dStreamValues
public static <T1,T2> org.apache.spark.streaming.api.java.JavaDStream<T2> dStreamValues(org.apache.spark.streaming.api.java.JavaPairDStream<T1,T2> pairDStream)
Transform a pair stream into a value stream.
-
toPairFunction
public static <K,V> org.apache.spark.api.java.function.PairFunction<org.apache.beam.sdk.values.KV<K,V>,K,V> toPairFunction()
KV to pair function.
-
toPairFlatMapFunction
public static <K,V> org.apache.spark.api.java.function.PairFlatMapFunction<java.util.Iterator<org.apache.beam.sdk.values.KV<K,V>>,K,V> toPairFlatMapFunction()
KV to pair flatmap function.
-
toPairByKeyInWindowedValue
public static <K,V> org.apache.spark.api.java.function.PairFunction<org.apache.beam.sdk.util.WindowedValue<org.apache.beam.sdk.values.KV<K,V>>,ByteArray,org.apache.beam.sdk.util.WindowedValue<org.apache.beam.sdk.values.KV<K,V>>> toPairByKeyInWindowedValue(org.apache.beam.sdk.coders.Coder<K> keyCoder)
Extract key from a WindowedValue KV into a pair.
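The signature suggests that the key is encoded with keyCoder and carried as a ByteArray. The sketch below illustrates why an encoded key would need a wrapper type at all, assuming that reading is right: a raw `byte[]` compares by identity, so equal encodings would not group together. The `Bytes` class and the UTF-8 encoding are hypothetical stand-ins, not the runner's ByteArray or a Beam Coder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; Bytes is not the Spark runner's ByteArray class.
final class EncodedKeySketch {
  /** Wraps a byte array so that equality and hashing are based on content. */
  static final class Bytes {
    private final byte[] value;

    Bytes(byte[] value) {
      this.value = value.clone();
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Bytes && Arrays.equals(value, ((Bytes) o).value);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(value);
    }
  }

  /** Stand-in for encoding a key with a coder (here: plain UTF-8). */
  static Bytes encodeKey(String key) {
    return new Bytes(key.getBytes(StandardCharsets.UTF_8));
  }

  /** Counts occurrences per encoded key, relying on content-based equality. */
  static Map<Bytes, Integer> countByKey(String... keys) {
    Map<Bytes, Integer> counts = new HashMap<>();
    for (String key : keys) {
      counts.merge(encodeKey(key), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<Bytes, Integer> counts = countByKey("a", "b", "a");
    System.out.println(counts.get(encodeKey("a"))); // prints 2
  }
}
```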
-
getSideInputs
public static java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.values.KV<org.apache.beam.sdk.values.WindowingStrategy<?,?>,SideInputBroadcast<?>>> getSideInputs(java.lang.Iterable<org.apache.beam.sdk.values.PCollectionView<?>> views, org.apache.spark.api.java.JavaSparkContext context, SparkPCollectionView pviews)
Create SideInputs as Broadcast variables.
- Parameters:
  views - The PCollectionViews.
  context - The JavaSparkContext.
  pviews - The SparkPCollectionView.
- Returns:
  a map of tagged SideInputBroadcasts and their WindowingStrategy.
-
rejectStateAndTimers
public static void rejectStateAndTimers(org.apache.beam.sdk.transforms.DoFn<?,?> doFn)
Reject state and timers DoFn.
- Parameters:
  doFn - the DoFn to possibly reject.
-
emptyVoidFunction
public static <T> org.apache.spark.api.java.function.VoidFunction<T> emptyVoidFunction()
-
pairFunctionToPairFlatMapFunction
public static <T,K,V> org.apache.spark.api.java.function.PairFlatMapFunction<java.util.Iterator<T>,K,V> pairFunctionToPairFlatMapFunction(org.apache.spark.api.java.function.PairFunction<T,K,V> pairFunction)
A utility method that adapts PairFunction to a PairFlatMapFunction with an Iterator input. This is particularly useful because it lets functions written for mapToPair be reused in flatMapToPair.
- Type Parameters:
  T - the input type.
  K - the output key type.
  V - the output value type.
- Parameters:
  pairFunction - the PairFunction to adapt.
- Returns:
  a PairFlatMapFunction that accepts an Iterator as an input and applies the PairFunction on every element.
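The adapter idea can be sketched with plain JDK types. This is an illustration under stated assumptions rather than the runner's implementation: `java.util.function.Function` stands in for Spark's PairFunction and PairFlatMapFunction, `Map.Entry` stands in for `scala.Tuple2`, and the names `adapt` and `drain` are hypothetical.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch using JDK types only; not the Spark or Beam API.
final class PairFlatMapAdapterSketch {
  /** Lifts a per-element pair function to one that consumes a whole iterator. */
  static <T, K, V> Function<Iterator<T>, Iterator<Map.Entry<K, V>>> adapt(
      Function<T, Map.Entry<K, V>> pairFunction) {
    return input -> new Iterator<Map.Entry<K, V>>() {
      @Override
      public boolean hasNext() {
        return input.hasNext();
      }

      @Override
      public Map.Entry<K, V> next() {
        // Each element is converted only when the consumer asks for it.
        return pairFunction.apply(input.next());
      }
    };
  }

  /** Collects an iterator into a list so the result can be inspected. */
  static <E> List<E> drain(Iterator<E> it) {
    List<E> out = new ArrayList<>();
    it.forEachRemaining(out::add);
    return out;
  }

  public static void main(String[] args) {
    // Key each word by its first letter.
    Function<String, Map.Entry<String, String>> byInitial =
        w -> new SimpleImmutableEntry<>(w.substring(0, 1), w);
    Iterator<String> words = Arrays.asList("beam", "spark").iterator();
    System.out.println(drain(adapt(byInitial).apply(words))); // prints [b=beam, s=spark]
  }
}
```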
-
functionToFlatMapFunction
public static <InputT,OutputT> org.apache.spark.api.java.function.FlatMapFunction<java.util.Iterator<InputT>,OutputT> functionToFlatMapFunction(org.apache.spark.api.java.function.Function<InputT,OutputT> func)
A utility method that adapts Function to a FlatMapFunction with an Iterator input. This is particularly useful because it lets functions written for map be reused in flatMap.
- Type Parameters:
  InputT - the input type.
  OutputT - the output type.
- Parameters:
  func - the Function to adapt.
- Returns:
  a FlatMapFunction that accepts an Iterator as an input and applies the Function on every element.
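A minimal sketch of this kind of adapter, using only JDK types: `java.util.function.Function` stands in for both Spark's Function and FlatMapFunction, and `adapt` and `drain` are hypothetical names. It shows one plausible way to apply a per-element function across an Iterator, and is not taken from the runner's source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch using JDK types only; not the Spark or Beam API.
final class FlatMapAdapterSketch {
  /** Wraps a per-element function so it can consume an iterator of inputs. */
  static <InputT, OutputT> Function<Iterator<InputT>, Iterator<OutputT>> adapt(
      Function<InputT, OutputT> func) {
    return input -> new Iterator<OutputT>() {
      @Override
      public boolean hasNext() {
        return input.hasNext();
      }

      @Override
      public OutputT next() {
        // Apply the wrapped function to each element as it is pulled.
        return func.apply(input.next());
      }
    };
  }

  /** Collects an iterator into a list so the result can be inspected. */
  static <E> List<E> drain(Iterator<E> it) {
    List<E> out = new ArrayList<>();
    it.forEachRemaining(out::add);
    return out;
  }

  public static void main(String[] args) {
    Function<Iterator<String>, Iterator<Integer>> lengths = adapt(String::length);
    Iterator<String> words = Arrays.asList("beam", "spark").iterator();
    System.out.println(drain(lengths.apply(words))); // prints [4, 5]
  }
}
```

The returned iterator is lazy, so no intermediate collection is built for the whole input; whether the real method behaves this way is not stated on this page.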
-
getTupleTagCoders
public static java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.coders.Coder<org.apache.beam.sdk.util.WindowedValue<?>>> getTupleTagCoders(java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.values.PCollection<?>> outputs)
Utility to get mapping between TupleTag and a coder.
- Parameters:
  outputs - A map of tuple tags and pcollections
- Returns:
  mapping between TupleTag and a coder
-
getTupleTagEncodeFunction
public static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.util.WindowedValue<?>>,org.apache.beam.sdk.values.TupleTag<?>,ValueAndCoderLazySerializable<org.apache.beam.sdk.util.WindowedValue<?>>> getTupleTagEncodeFunction(java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.coders.Coder<org.apache.beam.sdk.util.WindowedValue<?>>> coderMap)
Returns a pair function to convert value to bytes via coder.
- Parameters:
  coderMap - mapping between TupleTag and a coder
- Returns:
  a pair function to convert value to bytes via coder
-
getTupleTagDecodeFunction
public static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<org.apache.beam.sdk.values.TupleTag<?>,ValueAndCoderLazySerializable<org.apache.beam.sdk.util.WindowedValue<?>>>,org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.util.WindowedValue<?>> getTupleTagDecodeFunction(java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,org.apache.beam.sdk.coders.Coder<org.apache.beam.sdk.util.WindowedValue<?>>> coderMap)
Returns a pair function to convert bytes to value via coder.
- Parameters:
  coderMap - mapping between TupleTag and a coder
- Returns:
  a pair function to convert bytes to value via coder
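The encode and decode functions are described as inverses driven by the same coderMap. The round trip can be sketched as follows, with loud assumptions: `SimpleCoder` is a hypothetical two-method interface rather than Beam's Coder, plain String tags stand in for TupleTag, and a raw `byte[]` stands in for ValueAndCoderLazySerializable.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; SimpleCoder and the String tags are illustrative stand-ins.
final class TagCoderRoundTripSketch {
  /** Minimal coder: turns a value into bytes and back. */
  interface SimpleCoder<T> {
    byte[] encode(T value);

    T decode(byte[] bytes);
  }

  static final SimpleCoder<String> UTF8 = new SimpleCoder<String>() {
    @Override
    public byte[] encode(String value) {
      return value.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String decode(byte[] bytes) {
      return new String(bytes, StandardCharsets.UTF_8);
    }
  };

  static final SimpleCoder<Integer> INT = new SimpleCoder<Integer>() {
    @Override
    public byte[] encode(Integer value) {
      return ByteBuffer.allocate(4).putInt(value).array();
    }

    @Override
    public Integer decode(byte[] bytes) {
      return ByteBuffer.wrap(bytes).getInt();
    }
  };

  /** Looks up the coder registered for the tag and serializes the value with it. */
  @SuppressWarnings("unchecked")
  static byte[] encode(Map<String, SimpleCoder<?>> coderMap, String tag, Object value) {
    SimpleCoder<Object> coder = (SimpleCoder<Object>) coderMap.get(tag);
    return coder.encode(value);
  }

  /** Uses the same tag-to-coder mapping to turn the bytes back into a value. */
  static Object decode(Map<String, SimpleCoder<?>> coderMap, String tag, byte[] bytes) {
    return coderMap.get(tag).decode(bytes);
  }

  public static void main(String[] args) {
    Map<String, SimpleCoder<?>> coderMap = new HashMap<>();
    coderMap.put("words", UTF8);
    coderMap.put("counts", INT);
    byte[] bytes = encode(coderMap, "counts", 42);
    System.out.println(decode(coderMap, "counts", bytes)); // prints 42
  }
}
```

Keying the coders by tag lets a single pair function handle outputs of different element types, which is presumably why both methods take the whole map instead of a single coder.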
-
-