Class WindowedWordCount
- java.lang.Object
  - org.apache.beam.examples.WindowedWordCount
public class WindowedWordCount
extends java.lang.Object

An example that counts words in text, and can run over either unbounded or bounded input collections.

This class, WindowedWordCount, is the last in a series of four successively more detailed 'word count' examples. First take a look at MinimalWordCount, WordCount, and DebuggingWordCount.

Basic concepts, also in the MinimalWordCount, WordCount, and DebuggingWordCount examples: reading text files; counting a PCollection; writing to GCS; executing a Pipeline both locally and using a selected runner; defining DoFns; user-defined PTransforms; defining PipelineOptions.
New Concepts:
1. Unbounded and bounded pipeline input modes
2. Adding timestamps to data
3. Windowing
4. Re-using PTransforms over windowed PCollections
5. Accessing the window of an element
6. Writing data to per-window text files
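Concept 6 can be illustrated with a small, hypothetical plain-Java sketch (no Beam dependency) of how each fixed window might map to its own output file name. The class PerWindowFileName, the method fileNameFor, and the "prefix-start-end" naming scheme are assumptions for illustration only; the actual example may name its per-window files differently, and this sketch only builds the names rather than writing any files.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: derive one output file name per fixed window.
// The "prefix-start-end" scheme is an illustrative assumption, not Beam's format.
public class PerWindowFileName {
    // Window bounds rendered in UTC, e.g. 19700101-1200 (date included to avoid clashes across days).
    private static final DateTimeFormatter WINDOW_BOUND =
        DateTimeFormatter.ofPattern("yyyyMMdd-HHmm").withZone(ZoneOffset.UTC);

    /** Builds a file name from the window's [start, end) bounds given in epoch millis. */
    public static String fileNameFor(String prefix, long windowStartMillis, long windowEndMillis) {
        String start = WINDOW_BOUND.format(Instant.ofEpochMilli(windowStartMillis));
        String end = WINDOW_BOUND.format(Instant.ofEpochMilli(windowEndMillis));
        return prefix + "-" + start + "-" + end;
    }

    public static void main(String[] args) {
        long tenMinutes = 10L * 60L * 1000L;
        long noon = 12L * 60L * 60L * 1000L; // 12:00 UTC on 1970-01-01
        // Two consecutive 10-minute windows get two distinct file names.
        System.out.println(fileNameFor("counts", noon, noon + tenMinutes));
        System.out.println(fileNameFor("counts", noon + tenMinutes, noon + 2 * tenMinutes));
    }
}
```

Putting both window bounds in the name keeps the output of each window in its own file, which is the property concept 6 relies on.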
By default, the examples will run with the DirectRunner. To change the runner, specify:

  --runner=YOUR_SELECTED_RUNNER

See examples/java/README.md for instructions about how to configure different runners.

To execute this pipeline locally, specify a local output file (if using the DirectRunner) or an output prefix on a supported distributed file system.

  --output=[YOUR_LOCAL_FILE | YOUR_OUTPUT_PREFIX]

The input file defaults to a public data set containing the text of King Lear, by William Shakespeare. You can override it and choose your own input with --inputFile.

By default, the pipeline does fixed windowing with 10-minute windows. You can change this interval by setting the --windowSize parameter, e.g. --windowSize=15 for 15-minute windows.

The example will try to cancel the pipeline on the signal to terminate the process (CTRL-C).
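The fixed windowing described above can be sketched in plain Java, without the Beam SDK, to show the arithmetic. Each timestamped line falls into the window [start, start + size) that contains its timestamp, and words are then counted separately inside each window; this mirrors the idea of re-using a counting transform over a windowed PCollection. The class FixedWindowCounts, its methods, and the tokenizer (splitting on runs of non-letters) are hypothetical; in Beam itself the assignment is performed by the SDK's windowing transforms, not by hand.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration (not the Beam API) of fixed windows plus per-window word counts.
public class FixedWindowCounts {

    /** Start of the fixed window of the given size that contains timestampMillis. */
    public static long windowStart(long timestampMillis, long windowSizeMillis) {
        // floorMod keeps the result correct for negative timestamps as well.
        return timestampMillis - Math.floorMod(timestampMillis, windowSizeMillis);
    }

    /** Counts words per window; the outer key is the window start in epoch millis. */
    public static Map<Long, Map<String, Integer>> countPerWindow(
            long[] timestamps, String[] lines, int windowSizeMinutes) {
        long sizeMillis = windowSizeMinutes * 60_000L;
        Map<Long, Map<String, Integer>> result = new TreeMap<>();
        for (int i = 0; i < lines.length; i++) {
            long start = windowStart(timestamps[i], sizeMillis);
            Map<String, Integer> counts = result.computeIfAbsent(start, k -> new TreeMap<>());
            // Same counting logic for every window, applied only to that window's lines.
            for (String word : lines[i].split("[^\\p{L}]+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        long[] timestamps = {3 * minute, 8 * minute, 12 * minute};
        String[] lines = {"king lear", "the king", "lear"};
        // Default 10-minute windows: [0, 10) holds two lines, [10, 20) holds one.
        System.out.println(countPerWindow(timestamps, lines, 10));
        // 15-minute windows (as with --windowSize=15): all three lines share [0, 15).
        System.out.println(countPerWindow(timestamps, lines, 15));
    }
}
```

Changing only the window size regroups the same timestamped lines, which is why the window length can be exposed as a single --windowSize option.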
Nested Class Summary
Nested Classes
- static class WindowedWordCount.DefaultToCurrentSystemTime: A DefaultValueFactory that returns the current system time.
- static class WindowedWordCount.DefaultToMinTimestampPlusOneHour: A DefaultValueFactory that returns the minimum timestamp plus one hour.
- static interface WindowedWordCount.Options: Options for WindowedWordCount.
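The two nested factories compute option defaults at run time, and the second one depends on the value of another option. A hypothetical plain-Java sketch of that relationship follows. The Options holder, its field names, and the use of java.util.function.Function in place of DefaultValueFactory are assumptions for illustration, not Beam's interfaces; the sketch only mirrors what the descriptions state: one default is the current system time, and the other is the minimum timestamp plus one hour.

```java
import java.util.function.Function;

// Hypothetical sketch of run-time option defaults; Function stands in for DefaultValueFactory.
public class TimestampDefaults {
    static final long ONE_HOUR_MILLIS = 60L * 60L * 1000L;

    /** Hypothetical option holder; null means "not set by the user". */
    public static final class Options {
        public Long minTimestampMillis;
        public Long maxTimestampMillis;
    }

    /** Plays the role of DefaultToCurrentSystemTime. */
    static final Function<Options, Long> CURRENT_SYSTEM_TIME =
        options -> System.currentTimeMillis();

    /** Plays the role of DefaultToMinTimestampPlusOneHour: it reads another option. */
    static final Function<Options, Long> MIN_TIMESTAMP_PLUS_ONE_HOUR =
        options -> getMinTimestampMillis(options) + ONE_HOUR_MILLIS;

    /** Returns the user's value, or computes the default once and remembers it. */
    public static long getMinTimestampMillis(Options options) {
        if (options.minTimestampMillis == null) {
            options.minTimestampMillis = CURRENT_SYSTEM_TIME.apply(options);
        }
        return options.minTimestampMillis;
    }

    /** Returns the user's value, or derives the default from the minimum timestamp. */
    public static long getMaxTimestampMillis(Options options) {
        if (options.maxTimestampMillis == null) {
            options.maxTimestampMillis = MIN_TIMESTAMP_PLUS_ONE_HOUR.apply(options);
        }
        return options.maxTimestampMillis;
    }

    public static void main(String[] args) {
        Options options = new Options();
        options.minTimestampMillis = 0L; // as if set explicitly on the command line
        // The max default follows the explicitly set minimum: 0 + 1 hour.
        System.out.println(getMaxTimestampMillis(options)); // prints 3600000
    }
}
```

Deriving the maximum from the minimum means a user who overrides only the minimum still gets a consistent one-hour range.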
Constructor Summary
Constructors
- WindowedWordCount()
Method Summary
Methods
- static void main(java.lang.String[] args)