Class TfIdf
- java.lang.Object
  - org.apache.beam.examples.complete.TfIdf
public class TfIdf extends java.lang.Object

An example that computes a basic TF-IDF search table for a directory or GCS prefix.

Concepts: joining data; side inputs; logging

To execute this pipeline locally, specify a local output file or output prefix on GCS:

    --output=[YOUR_LOCAL_FILE | gs://YOUR_OUTPUT_PREFIX]

To change the runner, specify:

    --runner=YOUR_SELECTED_RUNNER

See examples/java/README.md for instructions about how to configure different runners.

The default input is gs://apache-beam-samples/shakespeare/ and can be overridden with --input.
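The description above does not spell out the scoring formula. As a minimal sketch, assuming one common definition (term frequency is the number of occurrences of a term divided by the number of words in the document, and inverse document frequency is the natural log of the total document count over the number of documents containing the term), the computation can be illustrated in plain Java without Beam. The class name TfIdfSketch and its score method are hypothetical and are not part of org.apache.beam.examples.complete.TfIdf; the real example may weight or normalize differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not the Beam pipeline itself.
final class TfIdfSketch {

  /** Returns document name -> (term -> TF-IDF score) under the assumed definition. */
  static Map<String, Map<String, Double>> score(Map<String, List<String>> docs) {
    // Document frequency: how many documents contain each term at least once.
    Map<String, Integer> docFreq = new HashMap<>();
    for (List<String> words : docs.values()) {
      for (String term : new HashSet<>(words)) {
        docFreq.merge(term, 1, Integer::sum);
      }
    }

    Map<String, Map<String, Double>> scores = new HashMap<>();
    for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
      List<String> words = doc.getValue();

      // Raw count of each term within this document.
      Map<String, Integer> counts = new HashMap<>();
      for (String term : words) {
        counts.merge(term, 1, Integer::sum);
      }

      Map<String, Double> perTerm = new HashMap<>();
      for (Map.Entry<String, Integer> c : counts.entrySet()) {
        double tf = c.getValue() / (double) words.size();
        double idf = Math.log(docs.size() / (double) docFreq.get(c.getKey()));
        perTerm.put(c.getKey(), tf * idf);
      }
      scores.put(doc.getKey(), perTerm);
    }
    return scores;
  }

  public static void main(String[] args) {
    // Illustrative input only; the document names are made up.
    Map<String, List<String>> docs = new HashMap<>();
    docs.put("a.txt", Arrays.asList("to", "be", "or", "not", "to", "be"));
    docs.put("b.txt", Arrays.asList("to", "sleep"));
    System.out.println(score(docs));
  }
}
```

Under this definition a term that appears in every document scores 0, because its inverse document frequency is ln(1). In a distributed pipeline the total document count has to reach every per-term computation, which is one place the "side inputs" concept listed above is typically useful.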
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | TfIdf.ComputeTfIdf | A transform containing a basic TF-IDF pipeline.
static interface | TfIdf.Options | Options supported by TfIdf.
static class | TfIdf.ReadDocuments | Reads the documents at the provided uris and returns all lines from the documents tagged with which document they are from.
static class | TfIdf.WriteTfIdf | A PTransform to write, in CSV format, a mapping from term and URI to score.
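The exact column layout that TfIdf.WriteTfIdf emits is not documented here. As a rough sketch, assuming one row per (term, URI) pair with the columns in the order term, URI, score, a row could be formatted in plain Java as follows. CsvRowSketch, quote, and toCsvRow are hypothetical names, and the quoting rule (wrap a field in double quotes when it contains a comma, quote, or line break) is an assumption borrowed from common CSV practice, not something the Beam example is known to do.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of the Beam TfIdf example.
final class CsvRowSketch {

  /** Quotes a field only if it contains a character that would break a CSV row. */
  static String quote(String field) {
    boolean needsQuoting =
        field.contains(",") || field.contains("\"")
            || field.contains("\n") || field.contains("\r");
    if (!needsQuoting) {
      return field;
    }
    // Embedded double quotes are escaped by doubling them.
    return "\"" + field.replace("\"", "\"\"") + "\"";
  }

  /** Formats one (term, URI, score) mapping as a single CSV row without a trailing newline. */
  static String toCsvRow(String term, URI uri, double score) {
    return quote(term) + "," + quote(uri.toString()) + "," + score;
  }
}
```

Keeping the quoting in one small function makes it easy to swap in a different convention if the consumer of the file expects one.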
Constructor Summary
Constructors
Constructor | Description
TfIdf() |
Method Summary
Methods
Modifier and Type | Method | Description
static java.util.Set<java.net.URI> | listInputDocuments(TfIdf.Options options) | Lists documents contained beneath the options.input prefix/directory.
static void | main(java.lang.String[] args) |
Method Detail
listInputDocuments
public static java.util.Set<java.net.URI> listInputDocuments(TfIdf.Options options) throws java.net.URISyntaxException, java.io.IOException
Lists documents contained beneath the options.input prefix/directory.
- Throws:
  java.net.URISyntaxException
  java.io.IOException
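How listInputDocuments resolves a prefix is not shown on this page. For the local-directory case only, a minimal sketch using java.nio.file might look like the following. ListDocumentsSketch and listLocalDocuments are hypothetical names; the sketch takes a Path rather than TfIdf.Options, does not recurse into subdirectories, and does not handle gs:// prefixes, all of which are simplifying assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical local-only sketch; the real method also accepts GCS prefixes.
final class ListDocumentsSketch {

  /** Returns the URIs of the regular files directly inside the given directory. */
  static Set<URI> listLocalDocuments(Path directory) throws IOException {
    Set<URI> uris = new TreeSet<>(); // sorted so the result order is deterministic
    // Files.list holds an open directory handle, so close it with try-with-resources.
    try (Stream<Path> entries = Files.list(directory)) {
      entries.filter(Files::isRegularFile).forEach(p -> uris.add(p.toUri()));
    }
    return uris;
  }
}
```

Returning a Set of URI values mirrors the documented return type, so each document is listed once however it is discovered.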
main
public static void main(java.lang.String[] args) throws java.lang.Exception
- Throws:
  java.lang.Exception