Class TfIdf


  • public class TfIdf
    extends java.lang.Object
    An example that computes a basic TF-IDF search table for a directory or GCS prefix.

    Concepts: joining data; side inputs; logging

    To execute this pipeline locally, specify a local output file or output prefix on GCS:

        --output=[YOUR_LOCAL_FILE | gs://YOUR_OUTPUT_PREFIX]

    To change the runner, specify:

        --runner=YOUR_SELECTED_RUNNER

    See examples/java/README.md for instructions about how to configure different runners.

    The default input is gs://apache-beam-samples/shakespeare/ and can be overridden with --input.
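
    As a rough illustration of what a TF-IDF table holds, the following in-memory sketch scores each term per document. It is not the Beam pipeline: the class TfIdfSketch, its compute method, and the sample documents are hypothetical, and the weighting used here (term frequency multiplied by the natural log of total documents over documents containing the term) is one common formulation that may differ in detail from what this example computes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory sketch of a TF-IDF table; not part of the Beam example. */
class TfIdfSketch {

  /**
   * Returns term -> (document name -> score), assuming
   * tf  = occurrences of the term in the document / total terms in the document
   * idf = ln(total documents / documents containing the term).
   */
  static Map<String, Map<String, Double>> compute(Map<String, List<String>> docs) {
    // Document frequency: how many documents contain each term at least once.
    Map<String, Integer> docCount = new HashMap<>();
    for (List<String> words : docs.values()) {
      for (String w : new HashSet<>(words)) {
        docCount.merge(w, 1, Integer::sum);
      }
    }

    int totalDocs = docs.size();
    Map<String, Map<String, Double>> table = new HashMap<>();
    for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
      List<String> words = doc.getValue();

      // Raw term counts within this document.
      Map<String, Integer> counts = new HashMap<>();
      for (String w : words) {
        counts.merge(w, 1, Integer::sum);
      }

      for (Map.Entry<String, Integer> c : counts.entrySet()) {
        double tf = (double) c.getValue() / words.size();
        double idf = Math.log((double) totalDocs / docCount.get(c.getKey()));
        table.computeIfAbsent(c.getKey(), k -> new HashMap<>())
            .put(doc.getKey(), tf * idf);
      }
    }
    return table;
  }

  public static void main(String[] args) {
    // Sample documents are made up for illustration.
    Map<String, List<String>> docs = new HashMap<>();
    docs.put("a.txt", Arrays.asList("to", "be", "or", "not", "to", "be"));
    docs.put("b.txt", Arrays.asList("to", "sleep"));
    System.out.println(compute(docs));
  }
}
```

    Under this weighting, a term that occurs in every document scores zero, because ln(1) = 0. That is why very common words contribute little to a search table built this way.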

    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  TfIdf.ComputeTfIdf
      A transform containing a basic TF-IDF pipeline.
      static interface  TfIdf.Options
      Options supported by TfIdf.
      static class  TfIdf.ReadDocuments
      Reads the documents at the provided URIs and returns every line, tagged with the document it came from.
      static class  TfIdf.WriteTfIdf
      A PTransform to write, in CSV format, a mapping from term and URI to score.
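
      A minimal sketch of the kind of record a CSV writer for this table might emit, assuming one comma-separated line per (term, URI) pair. The helper name toCsvLine, the field order, the number format, and the bucket name in the usage note are assumptions for illustration, not the transform's documented output format.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helper; not part of TfIdf.WriteTfIdf. */
class CsvLineSketch {

  /** Formats one table entry as "term,uri,score" with six decimal places. */
  static String toCsvLine(String term, URI uri, double score) {
    // Locale.ROOT keeps the decimal separator a '.' regardless of the JVM locale.
    return String.format(Locale.ROOT, "%s,%s,%f", term, uri, score);
  }
}
```

      For example, toCsvLine("love", URI.create("gs://my-bucket/kinglear.txt"), 0.25) yields love,gs://my-bucket/kinglear.txt,0.250000. This sketch does not quote or escape fields, so it only suits terms and URIs that contain no commas.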
    • Constructor Summary

      Constructors 
      Constructor Description
      TfIdf()  
    • Method Summary

      Modifier and Type Method Description
      static java.util.Set<java.net.URI> listInputDocuments(TfIdf.Options options)
      Lists documents contained beneath the options.input prefix/directory.
      static void main(java.lang.String[] args)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • TfIdf

        public TfIdf()
    • Method Detail

      • listInputDocuments

        public static java.util.Set<java.net.URI> listInputDocuments(TfIdf.Options options)
                                                              throws java.net.URISyntaxException,
                                                                     java.io.IOException
        Lists documents contained beneath the options.input prefix/directory.
        Throws:
        java.net.URISyntaxException
        java.io.IOException
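
        To make the return type concrete, here is a sketch of listing the documents in a local directory as a Set of URIs. It covers only the local-filesystem case and is not the method's actual implementation: the class ListLocalDocuments and the method listLocal are hypothetical, and listing a GCS prefix would need Beam's own file-system utilities, which are not shown.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

/** Hypothetical local-only sketch; not the implementation of TfIdf.listInputDocuments. */
class ListLocalDocuments {

  /** Returns the URIs of the regular files directly inside dir (non-recursive). */
  static Set<URI> listLocal(Path dir) throws IOException {
    Set<URI> uris = new TreeSet<>(); // sorted for a stable iteration order
    // Files.list holds an open directory handle, so close the stream when done.
    try (Stream<Path> entries = Files.list(dir)) {
      entries.filter(Files::isRegularFile).forEach(p -> uris.add(p.toUri()));
    }
    return uris;
  }
}
```

        Subdirectories are skipped rather than descended into. Whether the real method recurses is not stated in this reference.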
      • main

        public static void main(java.lang.String[] args)
                         throws java.lang.Exception
        Throws:
        java.lang.Exception