public class TextDocumentInputFormat extends org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.LongWritable,TextDocument> implements org.apache.hadoop.mapred.JobConfigurable
InputFormat for processing a simple collection. Each
document of the collection consists of a single line of text: the docid,
followed by a tab, followed by the document contents. Note that the document
content cannot contain embedded tabs or newlines.

| Modifier and Type | Class and Description |
|---|---|
| static class | TextDocumentInputFormat.TextDocumentLineRecordReader |
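The following sketch illustrates the one-line record format described above by parsing and re-serializing a single record. `SimpleDocument` is a hypothetical stand-in for `TextDocument`, whose actual API is not documented on this page. Splitting at the first tab and rejecting tabs, carriage returns, and newlines in either field are choices made for this sketch, not confirmed behavior of the class.

```java
/**
 * Hypothetical stand-in for TextDocument: one document stored as
 * "docid<TAB>contents" on a single line.
 */
final class SimpleDocument {
    private final String docid;
    private final String contents;

    SimpleDocument(String docid, String contents) {
        // The line format has no escaping, so separators cannot appear in either field.
        requireSingleField("docid", docid);
        requireSingleField("contents", contents);
        this.docid = docid;
        this.contents = contents;
    }

    private static void requireSingleField(String name, String value) {
        if (value == null) {
            throw new IllegalArgumentException(name + " is null");
        }
        if (value.indexOf('\t') >= 0 || value.indexOf('\n') >= 0 || value.indexOf('\r') >= 0) {
            throw new IllegalArgumentException(name + " contains a tab or line break");
        }
    }

    /** Parses one line; everything after the first tab is treated as the contents. */
    static SimpleDocument parse(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("missing tab separator");
        }
        // A second tab ends up in contents, where the constructor rejects it.
        return new SimpleDocument(line.substring(0, tab), line.substring(tab + 1));
    }

    String docid() { return docid; }

    String contents() { return contents; }

    /** Serializes back to the single-line form. */
    String toLine() {
        return docid + '\t' + contents;
    }
}
```

For example, `SimpleDocument.parse("doc42\tsome text")` yields docid `doc42` and contents `some text`. By contrast, `parse("a\tb\tc")` is rejected because the contents would contain an embedded tab.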

| Constructor and Description |
|---|
| TextDocumentInputFormat() |

| Modifier and Type | Method and Description |
|---|---|
| void | configure(org.apache.hadoop.mapred.JobConf conf) |
| org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,TextDocument> | getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter) |
| protected boolean | isSplitable(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path file) |
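This page does not state what the `LongWritable` key holds. If it follows the convention of Hadoop's `TextInputFormat`, where the key is the byte offset at which a line starts, then a record reader over this format could behave roughly like the stdlib-only sketch below. `OffsetRecord` and `LineDocumentReader` are hypothetical names. Unlike a real `RecordReader`, the sketch reads a whole in-memory buffer and ignores split boundaries.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one (LongWritable key, TextDocument value) pair. */
final class OffsetRecord {
    final long offset;      // byte offset of the line start (assumed key semantics)
    final String docid;
    final String contents;

    OffsetRecord(long offset, String docid, String contents) {
        this.offset = offset;
        this.docid = docid;
        this.contents = contents;
    }
}

/** Hypothetical whole-buffer reader; a real RecordReader would also honor split boundaries. */
final class LineDocumentReader {
    static List<OffsetRecord> read(byte[] data) {
        List<OffsetRecord> records = new ArrayList<>();
        int start = 0;
        while (start < data.length) {
            // Find the end of the current line (or the end of the buffer).
            int end = start;
            while (end < data.length && data[end] != '\n') {
                end++;
            }
            // Drop a trailing '\r' so CRLF input parses like LF input.
            int lineEnd = (end > start && data[end - 1] == '\r') ? end - 1 : end;
            String line = new String(data, start, lineEnd - start, StandardCharsets.UTF_8);
            if (!line.isEmpty()) { // this sketch skips blank lines
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    throw new IllegalArgumentException("no tab in line at byte " + start);
                }
                records.add(new OffsetRecord(start, line.substring(0, tab), line.substring(tab + 1)));
            }
            start = end + 1; // move past the '\n'
        }
        return records;
    }
}
```

Scanning raw bytes for `'\n'` is safe for UTF-8 input because no multi-byte UTF-8 sequence contains the byte 0x0A. As a result, the offsets stay byte-accurate even when the contents contain non-ASCII characters.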
public void configure(org.apache.hadoop.mapred.JobConf conf)
Specified by: configure in interface org.apache.hadoop.mapred.JobConfigurable

protected boolean isSplitable(org.apache.hadoop.fs.FileSystem fs,
org.apache.hadoop.fs.Path file)
Overrides: isSplitable in class org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.LongWritable,TextDocument>

public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,TextDocument> getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter) throws IOException
Specified by: getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.LongWritable,TextDocument>
Specified by: getRecordReader in class org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.LongWritable,TextDocument>
Throws: IOException

Copyright © 2015. All rights reserved.