public class ClueCollectionPathConstants extends Object

| Modifier and Type | Method and Description |
|---|---|
| static void | addEnglishCollectionPart(org.apache.hadoop.mapred.JobConf conf, String base, int i): Adds a part (segment) of the Clue Web English collection to a Hadoop JobConf object. |
| static void | addEnglishCompleteCollection(org.apache.hadoop.mapred.JobConf conf, String base): Adds the complete Clue Web English collection to a Hadoop JobConf object. |
| static void | addEnglishSmallCollection(org.apache.hadoop.mapred.JobConf conf, String base): Adds the first part (segment) of the Clue Web English collection to a Hadoop JobConf object. |
| static void | addEnglishTestFile(org.apache.hadoop.mapred.JobConf conf, String base): Adds a sample compressed WARC archive to a Hadoop JobConf object. |
| static void | addEnglishTinyCollection(org.apache.hadoop.mapred.JobConf conf, String base): Adds the first section of the Clue Web English collection to a Hadoop JobConf object. |
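
The five methods differ only in how much of the collection they cover. As a rough illustration, the sketch below builds the path strings that the method descriptions in this document name, relative to a base path. It is a hypothetical, Hadoop-free helper (the class `CluePathSketch` and all of its methods are assumptions made for this example, not part of `ClueCollectionPathConstants`), and it does not claim to show how the real methods register input paths with the `JobConf`. The range check on the part number is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only: it builds the path strings named
// in the method descriptions and never touches Hadoop or the file system.
final class CluePathSketch {
    // The descriptions mention parts 1 through 10.
    static final int NUM_PARTS = 10;

    // Joins a base path and a relative path with exactly one '/' between them.
    static String join(String base, String relative) {
        return base.endsWith("/") ? base + relative : base + "/" + relative;
    }

    // Scope of addEnglishTestFile: a single compressed WARC archive.
    static String testFile(String base) {
        return join(base, "ClueWeb09_English_1/en0000/00.warc.gz");
    }

    // Scope of addEnglishTinyCollection: the first section of part 1.
    static String tinyCollection(String base) {
        return join(base, "ClueWeb09_English_1/en0000/");
    }

    // Scope of addEnglishCollectionPart: one of the ten parts.
    // Rejecting out-of-range values is an assumption of this sketch.
    static String part(String base, int i) {
        if (i < 1 || i > NUM_PARTS) {
            throw new IllegalArgumentException("part must be in 1.." + NUM_PARTS + ": " + i);
        }
        return join(base, "ClueWeb09_English_" + i + "/");
    }

    // Scope of addEnglishSmallCollection: part 1 only.
    static String smallCollection(String base) {
        return part(base, 1);
    }

    // Scope of addEnglishCompleteCollection: parts 1 through 10.
    static List<String> completeCollection(String base) {
        List<String> paths = new ArrayList<>();
        for (int i = 1; i <= NUM_PARTS; i++) {
            paths.add(part(base, i));
        }
        return paths;
    }

    public static void main(String[] args) {
        String base = "/data/clue"; // assumed base path, for illustration only
        System.out.println(testFile(base));
        System.out.println(tinyCollection(base));
        System.out.println(smallCollection(base));
        System.out.println(completeCollection(base).size() + " parts in the complete collection");
    }
}
```

In this sketch the "small" collection is simply part 1, and the complete collection is the list of all ten parts, which mirrors the documented equivalence between adding every part and adding the complete English collection.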

public static void addEnglishTestFile(org.apache.hadoop.mapred.JobConf conf, String base)

Adds a sample compressed WARC archive to a Hadoop JobConf object. The specific archive is ClueWeb09_English_1/en0000/00.warc.gz, which contains 35,582 Web pages.

conf - Hadoop JobConf
base - base path for the Clue Web collection

public static void addEnglishTinyCollection(org.apache.hadoop.mapred.JobConf conf, String base)

Adds the first section of the Clue Web English collection to a Hadoop JobConf object. Specifically, this method adds the contents of ClueWeb09_English_1/en0000/, which contains 3,382,356 pages.

conf - Hadoop JobConf
base - base path for the Clue Web collection

public static void addEnglishSmallCollection(org.apache.hadoop.mapred.JobConf conf, String base)

Adds the first part (segment) of the Clue Web English collection to a Hadoop JobConf object. Specifically, this method adds the contents of ClueWeb09_English_1/, which contains 50,220,423 pages.

conf - Hadoop JobConf
base - base path for the Clue Web collection

public static void addEnglishCompleteCollection(org.apache.hadoop.mapred.JobConf conf, String base)

Adds the complete Clue Web English collection to a Hadoop JobConf object. Specifically, this method adds the contents of ClueWeb09_English_1/ through ClueWeb09_English_10/, which together contain 503,903,810 pages.

conf - Hadoop JobConf
base - base path for the Clue Web collection

public static void addEnglishCollectionPart(org.apache.hadoop.mapred.JobConf conf, String base, int i)

Adds a part (segment) of the Clue Web English collection to a Hadoop JobConf object. Part 1 corresponds to the contents of ClueWeb09_English_1/ (i.e., the "small" collection), and so on through part 10. Note that adding all ten parts is equivalent to adding the complete English collection.

conf - Hadoop JobConf
base - base path for the Clue Web collection
i - number of the part to add, from 1 to 10

Copyright © 2015. All rights reserved.