package window
Type Members
-
class
AlwaysFireOnElementTrigger extends Trigger[Map[String, Any], TimeWindow]
Custom Flink Trigger that fires on every event received.
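A minimal sketch of the behavior, modeled in plain Scala rather than the actual Flink `Trigger` API (the `TriggerResult` and `onElement` names here are illustrative): the trigger returns FIRE for every element, so the window emits an updated result per event instead of once per window.

```scala
// Hypothetical plain-Scala model of the per-element firing decision;
// not the real Flink Trigger interface.
sealed trait TriggerResult
case object Fire extends TriggerResult
case object Continue extends TriggerResult

// Fire on every event received, unconditionally.
def onElement(event: Map[String, Any]): TriggerResult = Fire

val events = Seq(Map("id" -> "A"), Map("id" -> "B"), Map("id" -> "C"))
// One FIRE per event, i.e. one downstream emission per event.
val decisions = events.map(onElement)
```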
-
class
BufferedProcessingTimeTrigger extends Trigger[Map[String, Any], TimeWindow]
BufferedProcessingTimeTrigger is a custom Trigger that fires at most every 'bufferSizeMillis' within a window. It is intended for incremental window aggregations using event-time semantics.
Purpose: This trigger exists as an optimization to reduce the number of writes to our online store and better handle contention that arises from having hot keys.
Details:
- The buffer timers are NOT aligned with the UNIX epoch; they can fire at any timestamp. E.g., if the first event arrives at 14 ms and the buffer size is 100 ms, the timer will fire at 114 ms.
- Buffer timers are only scheduled when events come in. If there's a gap in events, this trigger won't fire.
Edge cases handled:
- If the (event-time) window closes before the last (processing-time) buffer fires, this trigger will fire the remaining buffered elements before closing.
Example: Window size = 300,000 ms (5 minutes), bufferSizeMillis = 100 ms. Assume we are using this trigger on a GroupBy that counts the number of unique IDs seen. For simplicity, assume event time and processing time are synchronized (although in practice this is never true).
Event 1: ts = 14 ms, ID = A. preAggregate (a Set that keeps track of all unique IDs seen) = [A]. This causes a timer to be set for timestamp = 114 ms.
Event 2: ts = 38 ms, ID = B. preAggregate = [A, B]
Event 3: ts = 77 ms, ID = B. preAggregate = [A, B]
Timer set for 114 ms fires. We emit the preAggregate [A, B].
Event 4: ts = 400 ms, ID = C. preAggregate = [A, B, C] (we don't purge the previous events when the timer fires!). This causes a timer to be set for timestamp = 500 ms.
Timer set for 500 ms fires. We emit the preAggregate [A, B, C].
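The walkthrough above can be simulated in plain Scala (a sketch independent of Flink; `Emission` and `simulate` are hypothetical names, not part of the trigger). Timers are set relative to each event's arrival time rather than aligned to the epoch, timers are only scheduled when events come in, and the preAggregate is not purged when a timer fires:

```scala
// Hypothetical simulation of the buffer-timer scheduling described above.
case class Emission(atMs: Long, preAggregate: Set[String])

def simulate(events: Seq[(Long, String)], bufferSizeMillis: Long): Seq[Emission] = {
  var pendingTimer: Option[Long] = None
  var preAggregate = Set.empty[String]
  val emissions = scala.collection.mutable.ArrayBuffer.empty[Emission]
  for ((ts, id) <- events) {
    // Fire any timer that elapsed before this event arrived.
    pendingTimer.filter(_ <= ts).foreach { t =>
      emissions += Emission(t, preAggregate)
      pendingTimer = None
    }
    preAggregate += id // the preAggregate is never purged mid-window
    // Timers are only scheduled when an event comes in and none is pending.
    if (pendingTimer.isEmpty) pendingTimer = Some(ts + bufferSizeMillis)
  }
  // Window close: flush the remaining buffered elements before closing.
  pendingTimer.foreach(t => emissions += Emission(t, preAggregate))
  emissions.toSeq
}

val out = simulate(Seq((14L, "A"), (38L, "B"), (77L, "B"), (400L, "C")), 100L)
// out: Emission(114, [A, B]) then Emission(500, [A, B, C]), as in the example.
```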
-
class
FlinkRowAggProcessFunction extends ProcessWindowFunction[TimestampedIR, TimestampedTile, List[Any], TimeWindow]
-
class
FlinkRowAggregationFunction extends AggregateFunction[Map[String, Any], TimestampedIR, TimestampedIR]
Wrapper Flink aggregator around Chronon's RowAggregator. Relies on Flink to pass in the correct set of events for the tile. As the aggregates produced by this function are used on the serving side along with other pre-aggregates, we don't 'finalize' the Chronon RowAggregator and instead return the intermediate representation.
(This cannot be a RichAggregateFunction because Flink does not support Rich functions in windows.)
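A sketch of the "don't finalize" pattern in plain Scala (the `MeanIR`/`MeanAgg` names are illustrative, not Chronon's RowAggregator): the accumulator keeps an intermediate representation, such as (count, sum) for a mean, and the result step hands back the IR so the serving side can merge it with other pre-aggregates and finalize later.

```scala
// Hypothetical sketch mirroring Flink's AggregateFunction contract
// (createAccumulator / add / merge / getResult) without the Flink dependency.
case class MeanIR(count: Long, sum: Double)

object MeanAgg {
  def createAccumulator: MeanIR = MeanIR(0L, 0.0)
  def add(value: Double, acc: MeanIR): MeanIR =
    MeanIR(acc.count + 1, acc.sum + value)
  def merge(a: MeanIR, b: MeanIR): MeanIR =
    MeanIR(a.count + b.count, a.sum + b.sum)
  // Key point from the doc: return the IR, NOT the finalized value.
  def getResult(acc: MeanIR): MeanIR = acc
}

val tile = Seq(1.0, 2.0, 3.0)
  .foldLeft(MeanAgg.createAccumulator)((acc, v) => MeanAgg.add(v, acc))
// tile is MeanIR(3, 6.0); finalization (6.0 / 3) happens at serving time.
```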
-
case class
TimestampedIR(ir: Array[Any], latestTsMillis: Option[Long]) extends Product with Serializable
TimestampedIR combines the current Intermediate Result with the timestamp of the event being processed. We need to keep track of the timestamp of the event processed so we can calculate processing lag down the line.
Example: for a GroupBy with 2 windows, we'd have TimestampedIR( [IR for window 1, IR for window 2], timestamp ).
- ir
the array of partial aggregates
- latestTsMillis
timestamp of the current event being processed
-
case class
TimestampedTile(keys: List[Any], tileBytes: Array[Byte], latestTsMillis: Long) extends Product with Serializable
TimestampedTile combines the entity keys, the encoded Intermediate Result, and the timestamp of the event being processed.
We need the timestamp of the event processed so we can calculate processing lag down the line.
- keys
the GroupBy entity keys
- tileBytes
encoded tile IR
- latestTsMillis
timestamp of the current event being processed
Value Members
-
object
KeySelector
A KeySelector is what Flink uses to determine how to partition a DataStream.
A KeySelector is what Flink uses to determine how to partition a DataStream. In a distributed environment, the KeySelector guarantees that events with the same key always end up on the same machine. If invoked multiple times on the same object, the returned key must be the same.
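The determinism requirement can be sketched in plain Scala (a hypothetical `getKey` over the event `Map`, not the actual KeySelector implementation): the key is derived purely from the event's GroupBy entity key columns, so repeated invocations on the same event yield the same key.

```scala
// Hypothetical key extraction: project the entity key columns out of the
// event map. Pure function of the event, so the key is stable across calls.
def getKey(event: Map[String, Any], keyColumns: List[String]): List[Any] =
  keyColumns.map(col => event.getOrElse(col, null))

val event = Map[String, Any]("user_id" -> "u1", "amount" -> 10)
val k1 = getKey(event, List("user_id"))
val k2 = getKey(event, List("user_id"))
// Invoked twice on the same object, the returned key is identical,
// so both events land on the same partition.
```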