Class CreateStream<T>
- java.lang.Object
  - org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<T>>
    - org.apache.beam.runners.spark.io.CreateStream<T>
- Type Parameters:
T - The type of the element in this stream.
- All Implemented Interfaces:
java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData
public final class CreateStream<T> extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<T>>
Creates an input stream from a Queue. For SparkRunner tests only.
To properly compose a stream of micro-batches with their Watermarks, keep in mind that there are in effect two queues here: one for batches and another for Watermarks.
While both queues advance according to Spark's batch interval, data and Watermarks are pushed into the stream slightly differently: Watermarks advance on the onBatchCompleted hook call, so if you want the watermark to advance for a specific batch, the advancement should be called before that batch. Also keep in mind that, because each queue is polled once per batch interval, if the same Watermark needs to be "held" without advancing, this must be stated explicitly; otherwise the Watermark will advance as soon as it can (in the next onBatchCompleted hook).
Example 1:
    CreateStream.of(StringUtf8Coder.of(), batchDuration)
        .nextBatch(
            TimestampedValue.of("foo", endOfGlobalWindow),
            TimestampedValue.of("bar", endOfGlobalWindow))
        .advanceNextBatchWatermarkToInfinity();
The first batch will see the default start-of-time WM of BoundedWindow.TIMESTAMP_MIN_VALUE, and any following batch will see the end-of-time WM BoundedWindow.TIMESTAMP_MAX_VALUE.
Example 2:
    CreateStream.of(VarIntCoder.of(), batchDuration)
        .nextBatch(
            TimestampedValue.of(1, instant))
        .advanceWatermarkForNextBatch(instant.plus(Duration.standardMinutes(20)))
        .nextBatch(
            TimestampedValue.of(2, instant))
        .nextBatch(
            TimestampedValue.of(3, instant))
        .advanceWatermarkForNextBatch(instant.plus(Duration.standardMinutes(30)));
The first batch will see the start-of-time WM and the second will see the advanced (+20 min.) WM. The third batch will see the WM advanced to +30 min., because this is the next advancement of the WM, regardless of where it was called in the construction of the CreateStream.
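The queue semantics in the two examples can be reproduced without Beam or Spark. The stdlib-only Java sketch below is a hypothetical, simplified model (the class TwoQueueModel and its methods are illustrative names, not CreateStream's implementation): watermarks are plain long values measured in minutes after instant, Long.MIN_VALUE and Long.MAX_VALUE stand in for BoundedWindow.TIMESTAMP_MIN_VALUE and TIMESTAMP_MAX_VALUE, and each simulated batch completion applies at most one queued advancement.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical, simplified model of the batch and watermark queues; NOT Beam's
// CreateStream. Watermarks are longs (minutes after "instant"); Long.MIN_VALUE and
// Long.MAX_VALUE stand in for BoundedWindow.TIMESTAMP_MIN_VALUE / TIMESTAMP_MAX_VALUE.
class TwoQueueModel {
  private final Queue<List<Integer>> batches = new ArrayDeque<>();
  private final Queue<Long> watermarks = new ArrayDeque<>();

  TwoQueueModel nextBatch(Integer... elements) {
    batches.offer(Arrays.asList(elements));
    return this;
  }

  TwoQueueModel advanceWatermarkForNextBatch(long newWatermark) {
    watermarks.offer(newWatermark);
    return this;
  }

  TwoQueueModel advanceNextBatchWatermarkToInfinity() {
    return advanceWatermarkForNextBatch(Long.MAX_VALUE);
  }

  /** Returns the watermark observed by each batch, in batch order. */
  List<Long> run() {
    List<Long> seen = new ArrayList<>();
    long current = Long.MIN_VALUE; // the first batch sees start-of-time
    while (batches.poll() != null) { // one micro-batch per simulated batch interval
      seen.add(current);
      // Simulated onBatchCompleted: apply the next queued advancement, if any.
      // With nothing queued, the watermark stays where it is.
      Long next = watermarks.poll();
      if (next != null) {
        current = next;
      }
    }
    return seen;
  }

  public static void main(String[] args) {
    // Example 2: the +30 advancement reaches the third batch even though it is
    // declared after that batch.
    System.out.println(new TwoQueueModel()
        .nextBatch(1).advanceWatermarkForNextBatch(20)
        .nextBatch(2).nextBatch(3).advanceWatermarkForNextBatch(30)
        .run()); // [-9223372036854775808, 20, 30]

    // Holding +20 for the third batch has to be stated explicitly.
    System.out.println(new TwoQueueModel()
        .nextBatch(1).advanceWatermarkForNextBatch(20)
        .nextBatch(2).advanceWatermarkForNextBatch(20)
        .nextBatch(3).advanceWatermarkForNextBatch(30)
        .run()); // [-9223372036854775808, 20, 20]

    // Example 1 shape: after the end-of-time advancement, later batches see the maximum.
    System.out.println(new TwoQueueModel()
        .nextBatch(1).advanceNextBatchWatermarkToInfinity()
        .nextBatch(2).nextBatch(3)
        .run()); // [-9223372036854775808, 9223372036854775807, 9223372036854775807]
  }
}
```

In this model the watermark queue is never aligned to batch positions; it is simply drained one entry per completed batch, which is why the explicit repeated advancement is the only way to keep a watermark in place.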
- See Also:
- Serialized Form
Field Summary
Modifier and Type          Field
static java.lang.String    TRANSFORM_URN
-
Method Summary
- CreateStream<T> advanceNextBatchWatermarkToInfinity(): Advances the watermark in the next batch to the end-of-time.
- CreateStream<T> advanceWatermarkForNextBatch(org.joda.time.Instant newWatermark): Advances the watermark in the next batch.
- CreateStream<T> emptyBatch(): Adds an empty batch.
- org.apache.beam.sdk.values.PCollection<T> expand(org.apache.beam.sdk.values.PBegin input)
- long getBatchDuration()
- java.util.Queue<java.lang.Iterable<org.apache.beam.sdk.values.TimestampedValue<T>>> getBatches(): Get the underlying queue representing the mock stream of micro-batches.
- java.util.Queue<GlobalWatermarkHolder.SparkWatermarks> getTimes(): Get times so they can be pushed into the GlobalWatermarkHolder.
- CreateStream<T> initialSystemTimeAt(org.joda.time.Instant initialSystemTime): Set the initial synchronized processing time.
- boolean isForceWatermarkSync()
- CreateStream<T> nextBatch(org.apache.beam.sdk.values.TimestampedValue<T>... batchElements): Enqueue next micro-batch elements.
- CreateStream<T> nextBatch(T... batchElements): For non-timestamped elements.
- static <T> CreateStream<T> of(org.apache.beam.sdk.coders.Coder<T> coder, org.joda.time.Duration batchDuration): Creates a new Spark-based stream without forced watermark sync, intended for test purposes.
- static <T> CreateStream<T> of(org.apache.beam.sdk.coders.Coder<T> coder, org.joda.time.Duration batchDuration, boolean forceWatermarkSync): Creates a new Spark-based stream intended for test purposes.
Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setDisplayData, setResourceHints, toString, validate, validate
Field Detail
-
TRANSFORM_URN
public static final java.lang.String TRANSFORM_URN
- See Also:
- Constant Field Values
-
-
Method Detail
-
of
public static <T> CreateStream<T> of(org.apache.beam.sdk.coders.Coder<T> coder, org.joda.time.Duration batchDuration, boolean forceWatermarkSync)
Creates a new Spark-based stream intended for test purposes.
- Parameters:
coder - the coder to be used for this stream.
batchDuration - the batch duration (interval) to be used for creating this stream.
forceWatermarkSync - whether this stream should be synced with the advancement of the watermark maintained by the GlobalWatermarkHolder.
-
of
public static <T> CreateStream<T> of(org.apache.beam.sdk.coders.Coder<T> coder, org.joda.time.Duration batchDuration)
Creates a new Spark-based stream without forced watermark sync, intended for test purposes. See also of(Coder, Duration, boolean).
-
nextBatch
@SafeVarargs public final CreateStream<T> nextBatch(org.apache.beam.sdk.values.TimestampedValue<T>... batchElements)
Enqueue next micro-batch elements. This is backed by a Queue, so the stream input order preserves the population order (FIFO).
-
nextBatch
@SafeVarargs public final CreateStream<T> nextBatch(T... batchElements)
Enqueue next micro-batch elements; variant of nextBatch(TimestampedValue...) for non-timestamped elements.
-
emptyBatch
public CreateStream<T> emptyBatch()
Adds an empty batch.
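The batch-enqueueing methods above (nextBatch and emptyBatch) rely on FIFO queue order. The stdlib-only Java sketch below illustrates that ordering; it is a hypothetical stand-in (BatchQueueSketch and pollBatch are illustrative names, not Beam or Spark APIs), where each call to pollBatch plays the role of one batch interval taking the oldest remaining micro-batch.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of a FIFO micro-batch queue; NOT Beam's CreateStream.
class BatchQueueSketch {
  private final Queue<List<Integer>> batches = new ArrayDeque<>();

  // Enqueue one micro-batch; queue order means batches leave in insertion order.
  BatchQueueSketch nextBatch(Integer... elements) {
    batches.offer(Arrays.asList(elements));
    return this;
  }

  // Enqueue a micro-batch that contains no elements.
  BatchQueueSketch emptyBatch() {
    batches.offer(Collections.emptyList());
    return this;
  }

  // One simulated batch interval: returns the oldest remaining micro-batch,
  // or null once every enqueued batch has been consumed.
  List<Integer> pollBatch() {
    return batches.poll();
  }

  public static void main(String[] args) {
    BatchQueueSketch stream = new BatchQueueSketch()
        .nextBatch(1, 2)
        .emptyBatch()
        .nextBatch(3);
    System.out.println(stream.pollBatch()); // [1, 2]
    System.out.println(stream.pollBatch()); // []
    System.out.println(stream.pollBatch()); // [3]
    System.out.println(stream.pollBatch()); // null
  }
}
```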
-
initialSystemTimeAt
public CreateStream<T> initialSystemTimeAt(org.joda.time.Instant initialSystemTime)
Set the initial synchronized processing time.
-
advanceWatermarkForNextBatch
public CreateStream<T> advanceWatermarkForNextBatch(org.joda.time.Instant newWatermark)
Advances the watermark in the next batch.
-
advanceNextBatchWatermarkToInfinity
public CreateStream<T> advanceNextBatchWatermarkToInfinity()
Advances the watermark in the next batch to the end-of-time.
-
getBatchDuration
public long getBatchDuration()
-
getBatches
public java.util.Queue<java.lang.Iterable<org.apache.beam.sdk.values.TimestampedValue<T>>> getBatches()
Get the underlying queue representing the mock stream of micro-batches.
-
getTimes
public java.util.Queue<GlobalWatermarkHolder.SparkWatermarks> getTimes()
Get times so they can be pushed into the GlobalWatermarkHolder.
-
isForceWatermarkSync
public boolean isForceWatermarkSync()