public class WordCount
extends java.lang.Object
This class, WordCount, is the second in a series of four successively more detailed
'word count' examples. You may first want to take a look at MinimalWordCount. After
you've looked at this example, see the DebuggingWordCount pipeline, which introduces
additional concepts.
For a detailed walkthrough of this example, see https://beam.apache.org/get-started/wordcount-example/
Basic concepts, also in the MinimalWordCount example: Reading text files; counting a PCollection; writing to text files
New Concepts:
1. Executing a Pipeline both locally and using the selected runner
2. Using ParDo with static DoFns defined out-of-line
3. Building a composite transform
4. Defining your own pipeline options
Concept #1: you can execute this pipeline either locally or by selecting another runner. The runner and the input and output locations are now command-line options and are not hard-coded as they were in the MinimalWordCount example.
To change the runner, specify:
--runner=YOUR_SELECTED_RUNNER
To execute this pipeline, specify a local output file (if using the DirectRunner) or
output prefix on a supported distributed file system.
--output=[YOUR_LOCAL_FILE | YOUR_OUTPUT_PREFIX]
The input file defaults to a public data set containing the text of King Lear, by William
Shakespeare. You can override it and choose your own input with --inputFile.
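The flag behaviour described above can be sketched in plain Java. This is only an illustration of how `--name=value` arguments might map onto option values with a default for the input file; it does not use Beam's own options machinery. The class name `OptionsSketch`, the placeholder default `kinglear.txt`, and the choice to treat `--output` as required are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class OptionsSketch {
  // Placeholder default (assumption): the real default input location is not given here.
  static final String DEFAULT_INPUT_FILE = "kinglear.txt";

  // Parses arguments of the form --name=value into a map of option values.
  static Map<String, String> parse(String[] args) {
    Map<String, String> options = new HashMap<>();
    // The input file has a default and can be overridden with --inputFile.
    options.put("inputFile", DEFAULT_INPUT_FILE);
    for (String arg : args) {
      int eq = arg.indexOf('=');
      // Require the "--" prefix and at least one character before '='.
      if (!arg.startsWith("--") || eq < 3) {
        throw new IllegalArgumentException("Expected --name=value, got: " + arg);
      }
      options.put(arg.substring(2, eq), arg.substring(eq + 1));
    }
    // Assumption for this sketch: an output location must always be supplied.
    if (!options.containsKey("output")) {
      throw new IllegalArgumentException("Missing required option --output");
    }
    return options;
  }

  public static void main(String[] args) {
    Map<String, String> options =
        parse(new String[] {"--runner=DirectRunner", "--output=counts"});
    // prints "DirectRunner kinglear.txt counts"
    System.out.println(
        options.get("runner") + " " + options.get("inputFile") + " " + options.get("output"));
  }
}
```

Passing `--inputFile=my.txt` in addition would replace the default value, which mirrors the override described above.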
| Modifier and Type | Class and Description |
|---|---|
| static class | WordCount.CountWords: A PTransform that converts a PCollection containing lines of text into a PCollection of formatted word counts. |
| static class | WordCount.FormatAsTextFn: A SimpleFunction that converts a Word and Count into a printable string. |
| static interface | WordCount.WordCountOptions: Options supported by WordCount. |
| Constructor and Description |
|---|
| WordCount() |
| Modifier and Type | Method and Description |
|---|---|
| static void | main(java.lang.String[] args) |
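The nested class summary describes CountWords as turning lines of text into formatted word counts, with FormatAsTextFn producing the printable string. A rough plain-Java analogue of that data flow, operating on in-memory lists rather than PCollections, might look like the sketch below. The class and method names, the tokenization rule (split on runs of non-letter characters), and the `word: count` output format are assumptions for illustration and are not taken from this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CountWordsSketch {
  // Splits each line into words and counts occurrences of each word.
  // A TreeMap keeps the words sorted so the output order is deterministic.
  static Map<String, Long> countWords(List<String> lines) {
    Map<String, Long> counts = new TreeMap<>();
    for (String line : lines) {
      // Assumed tokenization: any run of non-letter characters separates words.
      for (String word : line.split("[^\\p{L}]+")) {
        if (!word.isEmpty()) {
          counts.merge(word, 1L, Long::sum);
        }
      }
    }
    return counts;
  }

  // Converts a word and its count into a printable string (assumed format).
  static String formatAsText(String word, long count) {
    return word + ": " + count;
  }

  // Composes the two steps, in the spirit of a composite transform.
  static List<String> formattedCounts(List<String> lines) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Long> e : countWords(lines).entrySet()) {
      out.add(formatAsText(e.getKey(), e.getValue()));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> lines = new ArrayList<>();
    lines.add("to be or");
    lines.add("not to be");
    // prints [be: 2, not: 1, or: 1, to: 2]
    System.out.println(formattedCounts(lines));
  }
}
```

Unlike this sketch, a real pipeline processes elements in a distributed, unordered fashion, so sorted output should not be expected there.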