public class HourlyTeamScore extends UserScore
This pipeline follows UserScore. In addition to the concepts introduced in UserScore, new
concepts include windowing and element timestamps, and the use of Filter.by().
This pipeline processes data collected from gaming events in batch, building on UserScore but using fixed windows. It calculates the sum of scores per team for each window.
Optionally, two timestamps can be specified: data timestamped before the first or after the second is filtered out.
This supports a model where data that was collected late, after the intended analysis window, can still be included,
and any late-arriving data from before the beginning of the analysis window can be removed.
By using windowing and adding element timestamps, we can do finer-grained analysis than with the
UserScore pipeline. However, our batch processing is high-latency, in that we don't get
results from plays at the beginning of the batch's time period until the batch is processed.
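The filter-then-window computation described above can be illustrated without Beam. The following is a minimal plain-Java sketch, not the pipeline's actual code: the class `WindowedTeamSumSketch`, the `Event` type, and the `sumPerTeamAndWindow` helper are hypothetical names, and the strict boundary comparisons and epoch-aligned windows are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WindowedTeamSumSketch {
    /** Hypothetical stand-in for a parsed game event. */
    static final class Event {
        final String team;
        final int score;
        final long timestampMillis;

        Event(String team, int score, long timestampMillis) {
            this.team = team;
            this.score = score;
            this.timestampMillis = timestampMillis;
        }
    }

    /**
     * Drops events at or outside the (start, stop) bounds, assigns each remaining
     * event to a fixed window aligned to the epoch, and sums scores per window and team.
     * Keys have the form "windowStartMillis/team".
     */
    static Map<String, Integer> sumPerTeamAndWindow(
            List<Event> events, long startMillis, long stopMillis, long windowMillis) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Event e : events) {
            // Assumption: both bounds are exclusive.
            if (e.timestampMillis <= startMillis || e.timestampMillis >= stopMillis) {
                continue;
            }
            // Start of the fixed window that contains this event's timestamp.
            long windowStart = e.timestampMillis - Math.floorMod(e.timestampMillis, windowMillis);
            sums.merge(windowStart + "/" + e.team, e.score, Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        List<Event> events = Arrays.asList(
                new Event("red", 5, 1_000L),
                new Event("red", 7, 2_000L),
                new Event("blue", 3, hour + 500L),
                new Event("blue", 9, 10_000_000L)); // after the stop bound, so dropped
        System.out.println(sumPerTeamAndWindow(events, 0L, 2 * hour, hour));
    }
}
```

In the sketch, each event's own timestamp decides its window, so the arrival order of records in the batch does not affect the sums.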
To execute this pipeline, specify the pipeline configuration like this:
--tempLocation=YOUR_TEMP_DIRECTORY
--runner=YOUR_RUNNER
--output=YOUR_OUTPUT_DIRECTORY
(possibly options specific to your runner or permissions for your temp/output locations)
Optionally include --input to specify the batch input file path. To indicate a time
after which the data should be filtered out, include the --stopMin arg. E.g., --stopMin=2015-10-18-23-59 indicates that any data timestamped after 23:59 PST on 2015-10-18
should not be included in the analysis. To indicate a time before which data should be filtered
out, include the --startMin arg. If you're using the default input specified in UserScore, "gs://apache-beam-samples/game/gaming_data*.csv", then --startMin=2015-11-16-16-10 --stopMin=2015-11-17-16-10 are good values.
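To make the argument format concrete, here is a small sketch of how a value such as 2015-11-16-16-10 could be turned into an instant. It uses java.time rather than whatever the pipeline itself uses; the class `MinuteArgSketch`, the helper names, the `yyyy-MM-dd-HH-mm` pattern (inferred from the example values above), and the `America/Los_Angeles` zone ID are assumptions made for illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

class MinuteArgSketch {
    // Assumed pattern, inferred from values such as 2015-10-18-23-59.
    private static final DateTimeFormatter MINUTE_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH-mm");
    // Assumed zone ID for the Pacific time mentioned in the description.
    private static final ZoneId PACIFIC = ZoneId.of("America/Los_Angeles");

    /** Parses a minute-resolution argument value into an instant. */
    static Instant parseMinute(String value) {
        return LocalDateTime.parse(value, MINUTE_FORMAT).atZone(PACIFIC).toInstant();
    }

    /** Returns true if ts lies strictly between start and stop. */
    static boolean inRange(Instant ts, Instant start, Instant stop) {
        return ts.isAfter(start) && ts.isBefore(stop);
    }

    public static void main(String[] args) {
        Instant start = parseMinute("2015-11-16-16-10");
        Instant stop = parseMinute("2015-11-17-16-10");
        System.out.println(start + " .. " + stop);
    }
}
```

With the suggested default values, the two bounds in this sketch are exactly 24 hours apart.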
| Modifier and Type | Class and Description |
|---|---|
| static interface | HourlyTeamScore.Options: Options supported by HourlyTeamScore. |

Nested classes inherited from UserScore: UserScore.ExtractAndSumScore

| Constructor and Description |
|---|
| HourlyTeamScore() |

| Modifier and Type | Method and Description |
|---|---|
| protected static java.util.Map<java.lang.String,WriteToText.FieldFn<org.apache.beam.sdk.values.KV<java.lang.String,java.lang.Integer>>> | configureOutput(): Create a map of information that describes how to write pipeline output to text. |
| static void | main(java.lang.String[] args): Run a batch pipeline to do windowed analysis of the data. |

protected static java.util.Map<java.lang.String,WriteToText.FieldFn<org.apache.beam.sdk.values.KV<java.lang.String,java.lang.Integer>>> configureOutput()
Create a map of information that describes how to write pipeline output to text. The map is passed to the WriteToText constructor to write team score sums and includes information about window start time.

public static void main(java.lang.String[] args) throws java.lang.Exception
Run a batch pipeline to do windowed analysis of the data.
Throws: java.lang.Exception
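
As an illustration of the kind of map that configureOutput() describes, the following plain-Java sketch maps column names to functions that extract a field from one result row. It does not use the real WriteToText.FieldFn type: the `FieldFn` interface, the `TeamWindowSum` row type, the column names, and `formatLine` are all hypothetical and exist only for this example.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class OutputConfigSketch {
    /** Hypothetical result row: one team's score sum for one window. */
    static final class TeamWindowSum {
        final String team;
        final int totalScore;
        final long windowStartMillis;

        TeamWindowSum(String team, int totalScore, long windowStartMillis) {
            this.team = team;
            this.totalScore = totalScore;
            this.windowStartMillis = windowStartMillis;
        }
    }

    /** Hypothetical field extractor; not the Beam example's FieldFn. */
    interface FieldFn<T> {
        Object apply(T row);
    }

    /** Builds the column-name-to-extractor map, in output column order. */
    static Map<String, FieldFn<TeamWindowSum>> configureOutput() {
        Map<String, FieldFn<TeamWindowSum>> config = new LinkedHashMap<>();
        config.put("team", row -> row.team);
        config.put("total_score", row -> row.totalScore);
        // Window start is rendered as an ISO-8601 instant in this sketch.
        config.put("window_start", row -> Instant.ofEpochMilli(row.windowStartMillis).toString());
        return config;
    }

    /** Formats one row as "name: value" pairs using the configured extractors. */
    static String formatLine(TeamWindowSum row, Map<String, FieldFn<TeamWindowSum>> config) {
        StringJoiner line = new StringJoiner(", ");
        for (Map.Entry<String, FieldFn<TeamWindowSum>> entry : config.entrySet()) {
            line.add(entry.getKey() + ": " + entry.getValue().apply(row));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(formatLine(new TeamWindowSum("red", 12, 0L), configureOutput()));
    }
}
```

Keeping the column definitions in a map separates what is written from how it is written, which is presumably why the configuration is passed to the writer rather than hard-coded in it.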