- Type Parameters:
K - The key type of the sequence file.
V - The proto message type stored in the sequence file. Present only to keep the Java compiler happy.
- All Implemented Interfaces:
- org.apache.hadoop.mapred.InputFormat<K,ProtoMessageWritable<V>>
public class ProtobufMessageInputFormat<K,V extends com.google.protobuf.MessageLite>
extends org.apache.hadoop.mapred.SequenceFileInputFormat<K,ProtoMessageWritable<V>>
InputFormat that supports reading ProtoMessageWritable values stored in a sequence file. You cannot
use the plain sequence file input format directly, since the createValue method uses the default
constructor, whereas ProtoMessageWritable has a package-private constructor which takes a parser.
By reading the proto class name from the job conf, which Hive copies from the table properties, this
class provides a generic implementation: you only need to set proto.class in the table
properties and load the file.
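The lookup described above can be sketched with plain JDK reflection. This is an illustration under stated assumptions, not the actual implementation: a Map stands in for the Hadoop job conf, DemoMessage is a hypothetical stand-in for a generated protobuf class, and the static parseFrom(byte[]) method is used in place of the parser that the real class passes to ProtoMessageWritable.

```java
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Sketch only: the Map stands in for the job conf, and DemoMessage is a
// hypothetical stand-in for a protoc-generated message class.
class ProtoClassLookupSketch {

    /** Hypothetical message type with the parseFrom(byte[]) shape of generated classes. */
    public static class DemoMessage {
        public final String payload;

        DemoMessage(String payload) {
            this.payload = payload;
        }

        public static DemoMessage parseFrom(byte[] bytes) {
            return new DemoMessage(new String(bytes, StandardCharsets.UTF_8));
        }
    }

    /** Loads the class named by "proto.class" and returns its static parse method. */
    static Method resolveParseMethod(Map<String, String> conf) {
        String className = conf.get("proto.class");
        if (className == null) {
            throw new IllegalStateException("proto.class is not set");
        }
        try {
            Class<?> messageClass = Class.forName(className);
            return messageClass.getMethod("parseFrom", byte[].class);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load proto class " + className, e);
        }
    }

    /** Parses one serialized value with the reflectively resolved method. */
    static Object parseValue(Map<String, String> conf, byte[] bytes) {
        try {
            // The cast keeps the byte[] from being spread as varargs.
            return resolveParseMethod(conf).invoke(null, (Object) bytes);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("proto.class", DemoMessage.class.getName());
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        DemoMessage message = (DemoMessage) parseValue(conf, bytes);
        System.out.println(message.payload);
    }
}
```

Because the message type is resolved by name at run time, the type parameter V carries no information the code can use, which is consistent with the note that it exists only for the compiler.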
It also ignores the EOFException thrown while opening a file, so that 0-byte files
in the table are skipped. Maybe we should allow this to be configured.
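A minimal sketch of the EOF handling, using only the JDK rather than the Hadoop reader: it assumes that opening a file means reading a fixed-size header, and that an empty file surfaces as an EOFException during that read. The class name, the helper methods, and the 4-byte header length are hypothetical choices for illustration.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Sketch only: mimics "ignore EOF while opening" without any Hadoop classes.
class EmptyFileSkipSketch {

    /** Reads a fixed-size header; returns empty if the file ends before the header is complete. */
    static Optional<byte[]> openHeader(Path file, int headerLength) {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            byte[] header = new byte[headerLength];
            in.readFully(header); // throws EOFException on a 0-byte or truncated file
            return Optional.of(header);
        } catch (EOFException e) {
            // Treat the file as holding no records instead of failing the whole read.
            return Optional.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical helper: writes the given bytes to a fresh temporary file. */
    static Path tempFileWith(byte[] content) {
        try {
            Path file = Files.createTempFile("sketch", ".seq");
            Files.write(file, content);
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path empty = tempFileWith(new byte[0]);
        Path nonEmpty = tempFileWith(new byte[] {'S', 'E', 'Q', 6});
        System.out.println("empty file opened: " + openHeader(empty, 4).isPresent());
        System.out.println("non-empty file opened: " + openHeader(nonEmpty, 4).isPresent());
    }
}
```

Note that this sketch also swallows the exception for a file that is truncated partway through the header, which is one reason a configuration switch for the behavior could be useful.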