| Package | Description |
|---|---|
| `edu.umd.cloud9.collection.clue` | Provides classes for working with the ClueWeb09 collection. |
| `edu.umd.cloud9.webgraph` | |


| Modifier and Type | Method and Description |
|---|---|
| `ClueWarcRecord` | `ClueWarcInputFormat.ClueWarcRecordReader.createValue()` |
| `ClueWarcRecord` | `ClueWarcForwardIndex.getDocument(int docno)` |
| `ClueWarcRecord` | `ClueWarcForwardIndex.getDocument(String docid)` |
| `static ClueWarcRecord` | `ClueWarcRecord.readNextWarcRecord(DataInputStream in)` Reads in a WARC record from a data input stream. |

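
The stream-reading pattern behind `readNextWarcRecord(DataInputStream in)` can be illustrated with a minimal, self-contained sketch. It is not Cloud9's implementation: `SimpleRecord`, `readNext`, and `contentsOf` are hypothetical stand-ins, and the sketch assumes a simplified WARC layout (a `WARC/` version line, ASCII `Name: value` header lines, a blank line, then exactly `Content-Length` bytes of content). Returning `null` at end of stream is also an assumption of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WarcReadSketch {
    /** Hypothetical stand-in for a WARC record: header fields plus raw content bytes. */
    static final class SimpleRecord {
        final Map<String, String> headers = new LinkedHashMap<>();
        byte[] content = new byte[0];
    }

    /** Reads one '\n'-terminated ASCII line (dropping a trailing '\r'); null at end of stream. */
    static String readLine(DataInputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAny = true;
            if (b == '\n') break;
            sb.append((char) b);
        }
        if (!sawAny) return null;
        int n = sb.length();
        if (n > 0 && sb.charAt(n - 1) == '\r') sb.setLength(n - 1);
        return sb.toString();
    }

    /** Reads the next record, or returns null when the stream is exhausted. */
    static SimpleRecord readNext(DataInputStream in) throws IOException {
        String line;
        // Skip blank separator lines until the next version line.
        while ((line = readLine(in)) != null && !line.startsWith("WARC/")) { }
        if (line == null) return null;

        SimpleRecord rec = new SimpleRecord();
        // Header block: "Name: value" lines, terminated by an empty line.
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                rec.headers.put(line.substring(0, colon).trim(),
                                line.substring(colon + 1).trim());
            }
        }
        // Content block: exactly Content-Length bytes.
        int length = Integer.parseInt(rec.headers.getOrDefault("Content-Length", "0"));
        rec.content = new byte[length];
        in.readFully(rec.content);
        return rec;
    }

    /** Reads every record from the given text and returns each record's content as a string. */
    static List<String> contentsOf(String warcText) throws IOException {
        byte[] data = warcText.getBytes(StandardCharsets.US_ASCII);
        List<String> out = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            SimpleRecord rec;
            while ((rec = readNext(in)) != null) {
                out.add(new String(rec.content, StandardCharsets.US_ASCII));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String sample = "WARC/0.18\nWARC-Type: response\nContent-Length: 5\n\nhello\n\n"
                      + "WARC/0.18\nContent-Length: 3\n\nabc";
        System.out.println(contentsOf(sample)); // prints [hello, abc]
    }
}
```

With the real class, the equivalent loop would presumably call `ClueWarcRecord.readNextWarcRecord(in)` repeatedly on a `DataInputStream` opened over a collection file; how that method signals end of input should be checked against its own documentation.
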

| Modifier and Type | Method and Description |
|---|---|
| `org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,ClueWarcRecord>` | `ClueWarcInputFormat.getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.mapred.Reporter reporter)` Just return the record reader. |


| Modifier and Type | Method and Description |
|---|---|
| `boolean` | `ClueWarcInputFormat.ClueWarcRecordReader.next(org.apache.hadoop.io.LongWritable key, ClueWarcRecord value)` |
| `void` | `ClueWarcRecord.set(ClueWarcRecord o)` Sets the record content (copy). |


| Constructor and Description |
|---|
| `ClueWarcRecord(ClueWarcRecord o)` Copy constructor. |

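
A copy constructor and a copying `set` method matter when a reader refills a single value object on each call to `next`, as readers in the older Hadoop `mapred` API generally do. The sketch below illustrates that situation with a hypothetical stand-in class `Rec`. It does not use `ClueWarcRecord` itself, and the names `Rec`, `setContent`, and `collect` are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordCopySketch {
    /** Hypothetical mutable record with a copy constructor and a copying set() method. */
    static final class Rec {
        private byte[] content = new byte[0];

        Rec() { }

        /** Copy constructor: delegates to set(). */
        Rec(Rec o) { set(o); }

        /** Copies the other record's content so the two objects no longer share state. */
        void set(Rec o) { this.content = Arrays.copyOf(o.content, o.content.length); }

        void setContent(String s) { this.content = s.getBytes(StandardCharsets.UTF_8); }

        String contentAsString() { return new String(content, StandardCharsets.UTF_8); }
    }

    /**
     * Simulates a reader that overwrites one reusable value object per input,
     * keeping either a defensive copy or the shared reference each time.
     */
    static List<String> collect(String[] inputs, boolean copy) {
        Rec reused = new Rec();
        List<Rec> kept = new ArrayList<>();
        for (String s : inputs) {
            reused.setContent(s);                      // the "reader" refills the same object
            kept.add(copy ? new Rec(reused) : reused); // copy, or keep the shared reference
        }
        List<String> out = new ArrayList<>();
        for (Rec r : kept) out.add(r.contentAsString());
        return out;
    }

    public static void main(String[] args) {
        String[] inputs = {"a", "b"};
        System.out.println(collect(inputs, false)); // prints [b, b]
        System.out.println(collect(inputs, true));  // prints [a, b]
    }
}
```

Without the copy, every retained reference ends up showing the last value written, which is why code that buffers records across calls typically copies them first.
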

| Modifier and Type | Method and Description |
|---|---|
| `void` | `ClueExtractLinks.Map.map(org.apache.hadoop.io.IntWritable key, ClueWarcRecord doc, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,tl.lin.data.array.ArrayListWritable<AnchorText>> output, org.apache.hadoop.mapred.Reporter reporter)` |

Copyright © 2015. All rights reserved.