public final class CollocDriver extends AbstractJob
| Modifier and Type | Field and Description |
|---|---|
| static boolean | DEFAULT_EMIT_UNIGRAMS |
| static String | EMIT_UNIGRAMS |
| static String | NGRAM_OUTPUT_DIRECTORY |
| static String | SUBGRAM_OUTPUT_DIRECTORY |

Fields inherited from class AbstractJob: argMap, inputFile, inputPath, outputFile, outputPath, tempPath

| Constructor and Description |
|---|
| CollocDriver() |

| Modifier and Type | Method and Description |
|---|---|
| static void | generateAllGrams(org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, org.apache.hadoop.conf.Configuration baseConf, int maxNGramSize, int minSupport, float minLLRValue, int reduceTasks): Generate all ngrams for the DictionaryVectorizer job |
| static void | main(String[] args) |
| int | run(String[] args) |

Methods inherited from class AbstractJob: addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase

public static final String SUBGRAM_OUTPUT_DIRECTORY
public static final String NGRAM_OUTPUT_DIRECTORY
public static final String EMIT_UNIGRAMS
public static final boolean DEFAULT_EMIT_UNIGRAMS
public static void generateAllGrams(org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
org.apache.hadoop.conf.Configuration baseConf,
int maxNGramSize,
int minSupport,
float minLLRValue,
int reduceTasks)
throws IOException,
InterruptedException,
ClassNotFoundException

Generate all ngrams for the DictionaryVectorizer job

Parameters:
input - input path containing tokenized documents
output - output path where ngrams are generated including unigrams
baseConf - job configuration
maxNGramSize - minValue = 2.
minSupport - minimum support to prune ngrams including unigrams
minLLRValue - minimum threshold to prune ngrams
reduceTasks - number of reducers used

Throws:
IOException
InterruptedException
ClassNotFoundException

Copyright © 2008–2017 The Apache Software Foundation. All rights reserved.
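
To make the maxNGramSize and minSupport parameters of generateAllGrams more concrete, here is a minimal, in-memory sketch of counting ngrams up to a maximum size and pruning those below a support threshold. It is a conceptual illustration only, not Mahout's MapReduce implementation: the class NGramPruningSketch and its methods countNGrams and pruneBySupport are hypothetical names, the "count >= minSupport" rule is an assumption about what "minimum support" means, and the LLR-based pruning controlled by minLLRValue is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; it does not use or reproduce Mahout code.
final class NGramPruningSketch {

    /** Counts every ngram of size 1..maxNGramSize (unigrams included) over tokenized documents. */
    static Map<String, Integer> countNGrams(List<List<String>> docs, int maxNGramSize) {
        // The parameter description above gives 2 as the minimum value.
        if (maxNGramSize < 2) {
            throw new IllegalArgumentException("maxNGramSize must be at least 2");
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> tokens : docs) {
            for (int start = 0; start < tokens.size(); start++) {
                for (int n = 1; n <= maxNGramSize && start + n <= tokens.size(); n++) {
                    String gram = String.join(" ", tokens.subList(start, start + n));
                    counts.merge(gram, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Keeps only ngrams seen at least minSupport times (assumed meaning of "minimum support"). */
    static Map<String, Integer> pruneBySupport(Map<String, Integer> counts, int minSupport) {
        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minSupport) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("new", "york", "city"),
                Arrays.asList("new", "york", "times"));
        Map<String, Integer> kept = pruneBySupport(countNGrams(docs, 2), 2);
        System.out.println(kept); // prints {new=2, new york=2, york=2}
    }
}
```

In this toy input, "york city" and "york times" each occur once and are dropped at minSupport = 2, while the unigrams "new" and "york" survive alongside the bigram "new york", which mirrors the note that pruning applies to ngrams including unigrams.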