Class StateSpecFunctions


  • public class StateSpecFunctions
    extends java.lang.Object
    A class containing StateSpec mappingFunctions.
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static <T,​CheckpointMarkT extends org.apache.beam.sdk.io.UnboundedSource.CheckpointMark>
      scala.Function3<org.apache.beam.sdk.io.Source<T>,​scala.Option<CheckpointMarkT>,​org.apache.spark.streaming.State<scala.Tuple2<byte[],​org.joda.time.Instant>>,​scala.Tuple2<java.lang.Iterable<byte[]>,​SparkUnboundedSource.Metadata>>
      mapSourceFunction​(org.apache.beam.runners.core.construction.SerializablePipelineOptions options, java.lang.String stepName)
      A StateSpec function to support reading from an UnboundedSource.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • StateSpecFunctions

        public StateSpecFunctions()
    • Method Detail

      • mapSourceFunction

        public static <T,​CheckpointMarkT extends org.apache.beam.sdk.io.UnboundedSource.CheckpointMark> scala.Function3<org.apache.beam.sdk.io.Source<T>,​scala.Option<CheckpointMarkT>,​org.apache.spark.streaming.State<scala.Tuple2<byte[],​org.joda.time.Instant>>,​scala.Tuple2<java.lang.Iterable<byte[]>,​SparkUnboundedSource.Metadata>> mapSourceFunction​(org.apache.beam.runners.core.construction.SerializablePipelineOptions options,
                                                                                                                                                                                                                                                                                                                                                                                                  java.lang.String stepName)
        A StateSpec function to support reading from an UnboundedSource.

        This StateSpec function expects the following:

        • Key: The (partitioned) Source to read from.
        • Value: An optional UnboundedSource.CheckpointMark to start from.
        • State: A byte representation of the (previously) persisted CheckpointMark.
        And returns an iterator over all read values (for the micro-batch).

        This stateful operation could be described as a flatMap over a single-element stream, which outputs all the elements read from the UnboundedSource for this micro-batch. Since micro-batches are bounded, the provided UnboundedSource is wrapped by a MicrobatchSource that applies bounds in the form of duration and max records (per micro-batch).

        In order to avoid using Spark Guava's classes which pollute the classpath, we use the StateSpec.function(scala.Function3) signature which employs scala's native Option, instead of the StateSpec.function(org.apache.spark.api.java.function.Function3) signature, which employs Guava's Optional.

        See also SPARK-4819.

        Type Parameters:
        T - The type of the input stream elements.
        CheckpointMarkT - The type of the UnboundedSource.CheckpointMark.
        Parameters:
        options - A serializable SerializablePipelineOptions.
        Returns:
        The appropriate StateSpec function.