Class WordCount


  • public class WordCount
    extends java.lang.Object
    An example that counts words in Shakespeare and includes Beam best practices.

    This class, WordCount, is the second in a series of four successively more detailed 'word count' examples. You may want to look at MinimalWordCount first. After this example, see the DebuggingWordCount pipeline, which introduces additional concepts.

    For a detailed walkthrough of this example, see https://beam.apache.org/get-started/wordcount-example/

    Basic concepts, also covered in the MinimalWordCount example: reading text files, counting a PCollection, and writing to text files.

    New Concepts:

       1. Executing a Pipeline both locally and using the selected runner
       2. Using ParDo with static DoFns defined out-of-line
       3. Building a composite transform
       4. Defining your own pipeline options
     

    Concept #1: you can execute this pipeline either locally or by selecting another runner. These settings are now command-line options instead of being hard-coded as they were in the MinimalWordCount example.

    To change the runner, specify:

    
     --runner=YOUR_SELECTED_RUNNER
     

    To execute this pipeline, specify a local output file (if using the DirectRunner) or an output prefix on a supported distributed file system.

    
     --output=[YOUR_LOCAL_FILE | YOUR_OUTPUT_PREFIX]
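
    Because --output is a prefix rather than a single file name, a pipeline that writes several output shards produces one file per shard. The following plain-Java sketch uses no Beam classes, and the class name ShardNames is hypothetical. It shows the resulting file names, assuming Beam's default shard-name template "-SSSSS-of-NNNNN".

```java
// Plain-Java sketch (no Beam classes; ShardNames is a hypothetical name) of how an
// output prefix expands into per-shard file names, assuming the default
// shard-name template "-SSSSS-of-NNNNN".
final class ShardNames {
    private ShardNames() {}

    // Returns the file name for shard `index` out of `numShards`.
    static String shardName(String prefix, int index, int numShards) {
        if (numShards <= 0 || index < 0 || index >= numShards) {
            throw new IllegalArgumentException("index must be in [0, numShards)");
        }
        // Both the shard index and the shard count are zero-padded to at least five digits.
        return String.format("%s-%05d-of-%05d", prefix, index, numShards);
    }
}
```

    For example, with --output=/tmp/counts and three shards, this naming scheme yields /tmp/counts-00000-of-00003 through /tmp/counts-00002-of-00003.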
     

    The input file defaults to a public data set containing the text of King Lear, by William Shakespeare. You can override it and choose your own input with --inputFile.
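
    In the example itself, these flags are parsed by Beam's PipelineOptionsFactory into the WordCountOptions interface. The hypothetical WordCountFlags class below is a minimal plain-Java sketch of the resulting behavior only. It reads --name=value arguments, treats --output as required, and falls back to defaults for --runner and --inputFile. The DirectRunner fallback and the King Lear path are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of how the --runner, --inputFile, and --output flags resolve.
// This is NOT Beam's PipelineOptionsFactory; the class name is hypothetical.
final class WordCountFlags {
    // Assumed defaults, chosen for illustration.
    static final String DEFAULT_RUNNER = "DirectRunner";
    static final String DEFAULT_INPUT = "gs://apache-beam-samples/shakespeare/kinglear.txt";

    final String runner;
    final String inputFile;
    final String output;

    private WordCountFlags(String runner, String inputFile, String output) {
        this.runner = runner;
        this.inputFile = inputFile;
        this.output = output;
    }

    static WordCountFlags parse(String[] args) {
        Map<String, String> flags = new HashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (!arg.startsWith("--") || eq < 0) {
                throw new IllegalArgumentException("Expected --name=value, got: " + arg);
            }
            // Key is the text between "--" and "="; value is everything after "=".
            flags.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        // --output has no sensible default, so this sketch treats it as required.
        String output = flags.get("output");
        if (output == null || output.isEmpty()) {
            throw new IllegalArgumentException("Missing required flag --output");
        }
        return new WordCountFlags(
                flags.getOrDefault("runner", DEFAULT_RUNNER),
                flags.getOrDefault("inputFile", DEFAULT_INPUT),
                output);
    }
}
```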

    • Nested Class Summary

      Nested Classes
        static class WordCount.CountWords
            A PTransform that converts a PCollection containing lines of text into a PCollection of formatted word counts.
        static class WordCount.FormatAsTextFn
            A SimpleFunction that converts a Word and Count into a printable string.
        static interface WordCount.WordCountOptions
            Options supported by WordCount.
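
    The nested classes above split the work into small, reusable pieces. The sketch below is a plain-Java analogue with no Beam dependency, and all of its names are hypothetical. ExtractWords plays the role of a static, out-of-line DoFn. countWords composes extraction and counting in the spirit of the CountWords composite transform. FormatAsText mirrors FormatAsTextFn. Splitting on runs of non-letter characters and the "word: count" output format are assumptions about the example's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java analogue of WordCount.CountWords and WordCount.FormatAsTextFn.
// No Beam classes are used; all names here are hypothetical.
final class WordCountSketch {
    private WordCountSketch() {}

    // Analogous to a static, out-of-line DoFn: one line in, zero or more words out.
    static final class ExtractWords {
        // Assumed tokenizer: split on runs of characters that are not Unicode letters.
        private static final String TOKENIZER_PATTERN = "[^\\p{L}]+";

        static List<String> apply(String line) {
            List<String> words = new ArrayList<>();
            for (String word : line.split(TOKENIZER_PATTERN)) {
                // Empty lines and lines starting with a non-letter yield an empty token; skip it.
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
            return words;
        }
    }

    // Analogous to FormatAsTextFn: turns one (word, count) pair into a printable string.
    static final class FormatAsText {
        static String apply(String word, long count) {
            return word + ": " + count;
        }
    }

    // Composite step in the spirit of CountWords: extract words, then count each distinct word.
    static Map<String, Long> countWords(List<String> lines) {
        // TreeMap keeps keys sorted so results are deterministic; a PCollection has no such ordering.
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : ExtractWords.apply(line)) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Applies FormatAsText to every entry, in key order.
    static List<String> formatAll(Map<String, Long> counts) {
        List<String> formatted = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            formatted.add(FormatAsText.apply(e.getKey(), e.getValue()));
        }
        return formatted;
    }
}
```

    Keeping the per-element functions in static nested classes, rather than anonymous inner classes, reflects concept #2. An anonymous inner class declared in an instance context captures the enclosing instance, which would then also need to be serializable when a runner ships the function to workers.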
    • Constructor Summary

      Constructors
        WordCount()
    • Method Summary

      Methods
        static void main(java.lang.String[] args)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • WordCount

        public WordCount()
    • Method Detail

      • main

        public static void main(java.lang.String[] args)