public class TriggerExample
extends java.lang.Object
windows to be processed, and demonstrates using various kinds of
triggers to control when the results for
each window are emitted.
This example uses a portion of real traffic data from San Diego freeways. It contains readings from sensor stations set up along each freeway. Each sensor reading includes a calculation of the 'total flow' across all lanes in that freeway direction.
Concepts:
1. The default triggering behavior 2. Late data with the default trigger 3. How to get speculative estimates 4. Combining late data and speculative estimates
Before running this example, it will be useful to familiarize yourself with Beam triggers and understand the concept of 'late data', See: https://beam.apache.org/documentation/programming-guide/#triggers
The example is configured to use the default BigQuery table from the example common package
(there are no defaults for a general Beam pipeline). You can override them by using the --bigQueryDataset, and --bigQueryTable options. If the BigQuery table do not exist, the
example will try to create them.
The pipeline outputs its results to a BigQuery table. Here are some queries you can use to see
interesting results: Replace <enter_table_name> in the query below with the name of the
BigQuery table. Replace <enter_window_interval> in the query below with the window
interval.
To see the results of the default trigger, Note: When you start up your pipeline, you'll
initially see results from 'late' data. Wait after the window duration, until the first pane of
non-late data has been emitted, to see more interesting results. SELECT * FROM
enter_table_name WHERE trigger_type = "default" ORDER BY window DESC
To see the late data i.e. dropped by the default trigger, SELECT * FROM
<enter_table_name> WHERE trigger_type = "withAllowedLateness" and (timing = "LATE" or timing =
"ON_TIME") and freeway = "5" ORDER BY window DESC, processing_time
To see the difference between accumulation mode and discarding mode, SELECT * FROM
<enter_table_name> WHERE (timing = "LATE" or timing = "ON_TIME") AND (trigger_type =
"withAllowedLateness" or trigger_type = "sequential") and freeway = "5" ORDER BY window DESC,
processing_time
To see speculative results every minute, SELECT * FROM <enter_table_name> WHERE
trigger_type = "speculative" and freeway = "5" ORDER BY window DESC, processing_time
To see speculative results every five minutes after the end of the window SELECT * FROM
<enter_table_name> WHERE trigger_type = "sequential" and timing != "EARLY" and freeway = "5"
ORDER BY window DESC, processing_time
To see the first and the last pane for a freeway in a window for all the trigger types, SELECT * FROM <enter_table_name> WHERE (isFirst = true or isLast = true) ORDER BY window
To reduce the number of results for each query we can add additional where clauses. For
examples, To see the results of the default trigger, SELECT * FROM <enter_table_name>
WHERE trigger_type = "default" AND freeway = "5" AND window = "<enter_window_interval>"
The example will try to cancel the pipelines on the signal to terminate the process (CTRL-C) and then exits.
| Modifier and Type | Class and Description |
|---|---|
static class |
TriggerExample.InsertDelays
Add current time to each record.
|
static interface |
TriggerExample.TrafficFlowOptions
Inherits standard configuration options.
|
| Modifier and Type | Field and Description |
|---|---|
static org.joda.time.Duration |
FIVE_MINUTES |
static org.joda.time.Duration |
ONE_DAY |
static org.joda.time.Duration |
ONE_MINUTE |
static int |
WINDOW_DURATION |
| Constructor and Description |
|---|
TriggerExample() |
| Modifier and Type | Method and Description |
|---|---|
static void |
main(java.lang.String[] args) |
public static final int WINDOW_DURATION
public static final org.joda.time.Duration ONE_MINUTE
public static final org.joda.time.Duration FIVE_MINUTES
public static final org.joda.time.Duration ONE_DAY