Class TranslationUtils


  • public final class TranslationUtils
    extends java.lang.Object
    A set of utilities that help translate Beam transformations into Spark transformations.
    • Method Summary

      Modifier and Type Method Description
      static <T1,​T2>
      org.apache.spark.streaming.api.java.JavaDStream<T2>
      dStreamValues​(org.apache.spark.streaming.api.java.JavaPairDStream<T1,​T2> pairDStream)
      Transform a pair stream into a value stream.
      static <T> org.apache.spark.api.java.function.VoidFunction<T> emptyVoidFunction()  
      static <InputT,​OutputT>
      org.apache.spark.api.java.function.FlatMapFunction<java.util.Iterator<InputT>,​OutputT>
      functionToFlatMapFunction​(org.apache.spark.api.java.function.Function<InputT,​OutputT> func)
      A utility method that adapts Function to a FlatMapFunction with an Iterator input.
      static java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.values.KV<org.apache.beam.sdk.values.WindowingStrategy<?,​?>,​SideInputBroadcast<?>>> getSideInputs​(java.lang.Iterable<org.apache.beam.sdk.values.PCollectionView<?>> views, org.apache.spark.api.java.JavaSparkContext context, SparkPCollectionView pviews)
      Create SideInputs as Broadcast variables.
      static java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.coders.Coder<org.apache.beam.sdk.util.WindowedValue<?>>> getTupleTagCoders​(java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.values.PCollection<?>> outputs)
      Utility to get mapping between TupleTag and a coder.
      static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<org.apache.beam.sdk.values.TupleTag<?>,​ValueAndCoderLazySerializable<org.apache.beam.sdk.util.WindowedValue<?>>>,​org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.util.WindowedValue<?>> getTupleTagDecodeFunction​(java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.coders.Coder<org.apache.beam.sdk.util.WindowedValue<?>>> coderMap)
      Returns a pair function to convert bytes to value via coder.
      static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.util.WindowedValue<?>>,​org.apache.beam.sdk.values.TupleTag<?>,​ValueAndCoderLazySerializable<org.apache.beam.sdk.util.WindowedValue<?>>> getTupleTagEncodeFunction​(java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.coders.Coder<org.apache.beam.sdk.util.WindowedValue<?>>> coderMap)
      Returns a pair function to convert value to bytes via coder.
      static <T,​K,​V>
      org.apache.spark.api.java.function.PairFlatMapFunction<java.util.Iterator<T>,​K,​V>
      pairFunctionToPairFlatMapFunction​(org.apache.spark.api.java.function.PairFunction<T,​K,​V> pairFunction)
      A utility method that adapts PairFunction to a PairFlatMapFunction with an Iterator input.
      static void rejectStateAndTimers​(org.apache.beam.sdk.transforms.DoFn<?,​?> doFn)
      Rejects a DoFn that uses state or timers.
      static <T,​W extends org.apache.beam.sdk.transforms.windowing.BoundedWindow>
      boolean
      skipAssignWindows​(org.apache.beam.sdk.transforms.windowing.Window.Assign<T> transform, EvaluationContext context)
      Checks if the window transformation should be applied or skipped.
      static <K,​V>
      org.apache.spark.api.java.function.PairFunction<org.apache.beam.sdk.util.WindowedValue<org.apache.beam.sdk.values.KV<K,​V>>,​ByteArray,​org.apache.beam.sdk.util.WindowedValue<org.apache.beam.sdk.values.KV<K,​V>>>
      toPairByKeyInWindowedValue​(org.apache.beam.sdk.coders.Coder<K> keyCoder)
      Extract key from a WindowedValue KV into a pair.
      static <K,​V>
      org.apache.spark.api.java.function.PairFlatMapFunction<java.util.Iterator<org.apache.beam.sdk.values.KV<K,​V>>,​K,​V>
      toPairFlatMapFunction()
      KV to pair flatmap function.
      static <K,​V>
      org.apache.spark.api.java.function.PairFunction<org.apache.beam.sdk.values.KV<K,​V>,​K,​V>
      toPairFunction()
      KV to pair function.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • skipAssignWindows

        public static <T,​W extends org.apache.beam.sdk.transforms.windowing.BoundedWindow> boolean skipAssignWindows​(org.apache.beam.sdk.transforms.windowing.Window.Assign<T> transform,
                                                                                                                           EvaluationContext context)
        Checks if the window transformation should be applied or skipped.

        Window assignment is skipped if both the source and the destination use the global window, or if the user has not specified a WindowFn (meaning that only triggering or allowed lateness is being changed).

        Type Parameters:
        T - PCollection type.
        W - BoundedWindow type.
        Parameters:
        transform - The Window.Assign transformation.
        context - The EvaluationContext.
        Returns:
        true if the window assignment should be skipped, false if it should be applied.
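
        The rule described above can be sketched with stand-in types. This is a minimal illustration, not the runner's implementation: SketchWindowFn, SketchGlobalWindows and SketchFixedWindows are hypothetical placeholders for Beam's window functions, and an unspecified WindowFn is modeled as null, which is an assumption of this sketch.

```java
/** Hypothetical stand-in for a window function; not Beam's WindowFn. */
interface SketchWindowFn {}

/** Hypothetical stand-in for the global window function. */
final class SketchGlobalWindows implements SketchWindowFn {}

/** Hypothetical stand-in for any non-global window function. */
final class SketchFixedWindows implements SketchWindowFn {}

final class SkipAssignSketch {
  private SkipAssignSketch() {}

  /**
   * Returns true when assigning windows would change nothing:
   * either no WindowFn was requested (modeled as null in this sketch),
   * or both the input and the requested WindowFn are global windows.
   */
  static boolean skipAssignWindows(SketchWindowFn inputFn, SketchWindowFn requestedFn) {
    if (requestedFn == null) {
      // Only triggering or allowed lateness was changed.
      return true;
    }
    return inputFn instanceof SketchGlobalWindows
        && requestedFn instanceof SketchGlobalWindows;
  }
}
```

        In this sketch, moving from global windows to a non-global WindowFn is the case that still requires the assignment to run.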
      • dStreamValues

        public static <T1,​T2> org.apache.spark.streaming.api.java.JavaDStream<T2> dStreamValues​(org.apache.spark.streaming.api.java.JavaPairDStream<T1,​T2> pairDStream)
        Transform a pair stream into a value stream.
      • toPairFunction

        public static <K,​V> org.apache.spark.api.java.function.PairFunction<org.apache.beam.sdk.values.KV<K,​V>,​K,​V> toPairFunction()
        KV to pair function.
      • toPairFlatMapFunction

        public static <K,​V> org.apache.spark.api.java.function.PairFlatMapFunction<java.util.Iterator<org.apache.beam.sdk.values.KV<K,​V>>,​K,​V> toPairFlatMapFunction()
        KV to pair flatmap function.
      • toPairByKeyInWindowedValue

        public static <K,​V> org.apache.spark.api.java.function.PairFunction<org.apache.beam.sdk.util.WindowedValue<org.apache.beam.sdk.values.KV<K,​V>>,​ByteArray,​org.apache.beam.sdk.util.WindowedValue<org.apache.beam.sdk.values.KV<K,​V>>> toPairByKeyInWindowedValue​(org.apache.beam.sdk.coders.Coder<K> keyCoder)
        Extract key from a WindowedValue KV into a pair.
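
        The signature suggests that the key is encoded with keyCoder and wrapped in a ByteArray, so that pairs can be grouped by the encoded key. The sketch below illustrates that idea only. It omits the WindowedValue wrapper, uses java.util.Map.Entry in place of KV and Tuple2, and SketchByteArray and toPairByEncodedKey are hypothetical names introduced here. A wrapper is needed because a raw byte[] compares by identity rather than by content.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical wrapper that gives a byte[] content-based equality. */
final class SketchByteArray {
  private final byte[] bytes;

  SketchByteArray(byte[] bytes) {
    this.bytes = bytes.clone();
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof SketchByteArray
        && Arrays.equals(bytes, ((SketchByteArray) other).bytes);
  }

  @Override
  public int hashCode() {
    return Arrays.hashCode(bytes);
  }
}

final class KeyByBytesSketch {
  private KeyByBytesSketch() {}

  /**
   * Builds a function that pairs each key-value entry with its encoded key.
   * The keyEncoder argument stands in for a key coder.
   */
  static <K, V> Function<Map.Entry<K, V>, Map.Entry<SketchByteArray, Map.Entry<K, V>>>
      toPairByEncodedKey(Function<K, byte[]> keyEncoder) {
    return kv ->
        new SimpleImmutableEntry<>(new SketchByteArray(keyEncoder.apply(kv.getKey())), kv);
  }
}
```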
      • getSideInputs

        public static java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.values.KV<org.apache.beam.sdk.values.WindowingStrategy<?,​?>,​SideInputBroadcast<?>>> getSideInputs​(java.lang.Iterable<org.apache.beam.sdk.values.PCollectionView<?>> views,
                                                                                                                                                                                                                   org.apache.spark.api.java.JavaSparkContext context,
                                                                                                                                                                                                                   SparkPCollectionView pviews)
        Create SideInputs as Broadcast variables.
        Parameters:
        views - The PCollectionViews.
        context - The JavaSparkContext.
        pviews - The SparkPCollectionView.
        Returns:
        a map of tagged SideInputBroadcasts and their WindowingStrategy.
      • rejectStateAndTimers

        public static void rejectStateAndTimers​(org.apache.beam.sdk.transforms.DoFn<?,​?> doFn)
        Rejects a DoFn that uses state or timers.
        Parameters:
        doFn - the DoFn to possibly reject.
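
        A check of this kind can be sketched with reflection. Everything below is hypothetical: the annotations SketchStateId and SketchTimerId merely stand in for whatever marks state and timer declarations on a DoFn, and throwing UnsupportedOperationException is this sketch's choice rather than documented behavior.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

/** Hypothetical marker for a state declaration. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchStateId {}

/** Hypothetical marker for a timer declaration. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchTimerId {}

final class RejectSketch {
  private RejectSketch() {}

  /** Throws if the given function object declares state or timers. */
  static void rejectStateAndTimers(Object fn) {
    for (Field field : fn.getClass().getDeclaredFields()) {
      if (field.isAnnotationPresent(SketchStateId.class)
          || field.isAnnotationPresent(SketchTimerId.class)) {
        throw new UnsupportedOperationException(
            fn.getClass().getSimpleName() + " declares state or timers on field "
                + field.getName());
      }
    }
  }
}

/** A function object without state or timers; accepted by the check. */
final class StatelessFn {}

/** A function object with a state declaration; rejected by the check. */
final class StatefulFn {
  @SketchStateId private final String counter = "counter";
}
```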
      • emptyVoidFunction

        public static <T> org.apache.spark.api.java.function.VoidFunction<T> emptyVoidFunction()
      • pairFunctionToPairFlatMapFunction

        public static <T,​K,​V> org.apache.spark.api.java.function.PairFlatMapFunction<java.util.Iterator<T>,​K,​V> pairFunctionToPairFlatMapFunction​(org.apache.spark.api.java.function.PairFunction<T,​K,​V> pairFunction)
        A utility method that adapts PairFunction to a PairFlatMapFunction with an Iterator input. This is useful because it allows functions written for mapToPair operations to be reused in flatMapToPair operations.
        Type Parameters:
        T - the input type.
        K - the output key type.
        V - the output value type.
        Parameters:
        pairFunction - the PairFunction to adapt.
        Returns:
        a PairFlatMapFunction that accepts an Iterator as an input and applies the PairFunction on every element.
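
        The adaptation can be sketched without Spark on the classpath. In the sketch below, SketchPairFunction and SketchPairFlatMapFunction are simplified, hypothetical stand-ins for the Spark interfaces (they declare no checked exceptions), and java.util.Map.Entry replaces scala.Tuple2. The lazy iterator is one possible design, not necessarily the one used by this class.

```java
import java.util.Iterator;
import java.util.Map;

/** Hypothetical stand-in for Spark's PairFunction. */
interface SketchPairFunction<T, K, V> {
  Map.Entry<K, V> call(T element);
}

/** Hypothetical stand-in for Spark's PairFlatMapFunction. */
interface SketchPairFlatMapFunction<T, K, V> {
  Iterator<Map.Entry<K, V>> call(T input);
}

final class PairAdapterSketch {
  private PairAdapterSketch() {}

  /** Wraps a per-element pair function so that it consumes a whole iterator. */
  static <T, K, V> SketchPairFlatMapFunction<Iterator<T>, K, V> adapt(
      SketchPairFunction<T, K, V> pairFunction) {
    return input ->
        new Iterator<Map.Entry<K, V>>() {
          @Override
          public boolean hasNext() {
            return input.hasNext();
          }

          @Override
          public Map.Entry<K, V> next() {
            // Apply the per-element function lazily, one element at a time.
            return pairFunction.call(input.next());
          }
        };
  }
}
```

        Because the returned iterator pulls from the input on demand, no intermediate collection is built for the partition.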
      • functionToFlatMapFunction

        public static <InputT,​OutputT> org.apache.spark.api.java.function.FlatMapFunction<java.util.Iterator<InputT>,​OutputT> functionToFlatMapFunction​(org.apache.spark.api.java.function.Function<InputT,​OutputT> func)
        A utility method that adapts Function to a FlatMapFunction with an Iterator input. This is useful because it allows functions written for map operations to be reused in flatMap operations.
        Type Parameters:
        InputT - the input type.
        OutputT - the output type.
        Parameters:
        func - the Function to adapt.
        Returns:
        a FlatMapFunction that accepts an Iterator as an input and applies the Function on every element.
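
        As an illustration of the reuse this enables, the sketch below applies one per-element function to a whole partition. It uses java.util.function.Function as a stand-in for both Spark interfaces, and mapPartition is a hypothetical helper that imitates a per-partition operation over a List. None of these names belong to Spark or Beam.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

final class FlatMapAdapterSketch {
  private FlatMapAdapterSketch() {}

  /** Wraps a per-element function so that it maps every element of an iterator. */
  static <I, O> Function<Iterator<I>, Iterator<O>> adapt(Function<I, O> func) {
    return input ->
        new Iterator<O>() {
          @Override
          public boolean hasNext() {
            return input.hasNext();
          }

          @Override
          public O next() {
            return func.apply(input.next());
          }
        };
  }

  /** Hypothetical helper: runs an iterator-to-iterator function over one partition. */
  static <I, O> List<O> mapPartition(
      List<I> partition, Function<Iterator<I>, Iterator<O>> partitionFunction) {
    List<O> output = new ArrayList<>();
    partitionFunction.apply(partition.iterator()).forEachRemaining(output::add);
    return output;
  }
}
```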
      • getTupleTagCoders

        public static java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.coders.Coder<org.apache.beam.sdk.util.WindowedValue<?>>> getTupleTagCoders​(java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.values.PCollection<?>> outputs)
        Utility to get mapping between TupleTag and a coder.
        Parameters:
        outputs - a map of tuple tags and PCollections.
        Returns:
        mapping between TupleTag and a coder
      • getTupleTagEncodeFunction

        public static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.util.WindowedValue<?>>,​org.apache.beam.sdk.values.TupleTag<?>,​ValueAndCoderLazySerializable<org.apache.beam.sdk.util.WindowedValue<?>>> getTupleTagEncodeFunction​(java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.coders.Coder<org.apache.beam.sdk.util.WindowedValue<?>>> coderMap)
        Returns a pair function to convert value to bytes via coder.
        Parameters:
        coderMap - mapping between TupleTag and a coder
        Returns:
        a pair function to convert value to bytes via coder
      • getTupleTagDecodeFunction

        public static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<org.apache.beam.sdk.values.TupleTag<?>,​ValueAndCoderLazySerializable<org.apache.beam.sdk.util.WindowedValue<?>>>,​org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.util.WindowedValue<?>> getTupleTagDecodeFunction​(java.util.Map<org.apache.beam.sdk.values.TupleTag<?>,​org.apache.beam.sdk.coders.Coder<org.apache.beam.sdk.util.WindowedValue<?>>> coderMap)
        Returns a pair function to convert bytes to value via coder.
        Parameters:
        coderMap - mapping between TupleTag and a coder
        Returns:
        a pair function to convert bytes to value via coder
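
        The encode and decode functions are meant to be used as a pair: each looks up the coder for a pair's tag and converts the value to or from bytes. The round trip can be sketched as follows. This is a simplified illustration under several assumptions: tags are plain strings, SketchCoder is a hypothetical stand-in for a Beam Coder, java.util.Map.Entry replaces scala.Tuple2, and values are encoded eagerly to byte[] rather than wrapped in ValueAndCoderLazySerializable.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a coder. */
interface SketchCoder {
  byte[] encode(Object value);

  Object decode(byte[] bytes);
}

final class TagCoderSketch {
  private TagCoderSketch() {}

  /** Encodes strings as UTF-8. */
  static final SketchCoder STRING_CODER =
      new SketchCoder() {
        @Override
        public byte[] encode(Object value) {
          return ((String) value).getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public Object decode(byte[] bytes) {
          return new String(bytes, StandardCharsets.UTF_8);
        }
      };

  /** Encodes integers as four big-endian bytes. */
  static final SketchCoder INT_CODER =
      new SketchCoder() {
        @Override
        public byte[] encode(Object value) {
          return ByteBuffer.allocate(4).putInt((Integer) value).array();
        }

        @Override
        public Object decode(byte[] bytes) {
          return ByteBuffer.wrap(bytes).getInt();
        }
      };

  /** Returns a function that encodes each value with the coder registered for its tag. */
  static Function<Map.Entry<String, Object>, Map.Entry<String, byte[]>> encodeFunction(
      Map<String, SketchCoder> coderMap) {
    return pair ->
        new SimpleImmutableEntry<>(
            pair.getKey(), coderMap.get(pair.getKey()).encode(pair.getValue()));
  }

  /** Returns a function that decodes each value with the coder registered for its tag. */
  static Function<Map.Entry<String, byte[]>, Map.Entry<String, Object>> decodeFunction(
      Map<String, SketchCoder> coderMap) {
    return pair ->
        new SimpleImmutableEntry<>(
            pair.getKey(), coderMap.get(pair.getKey()).decode(pair.getValue()));
  }
}
```

        Keying the coders by tag lets one function handle pairs from several outputs whose value types differ.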