public class HoodieCombineHiveInputFormat<K extends org.apache.hadoop.io.WritableComparable,V extends org.apache.hadoop.io.Writable>
extends org.apache.hadoop.hive.ql.io.HiveInputFormat<K,V>
CombineHiveInputFormat is a parameterized InputFormat that looks at the path name and determines the correct InputFormat for that path from mapredPlan.pathToPartitionInfo(). It can be used to read files with different input formats in the same map-reduce job. NOTE: This class is implemented to work with Hive 2.x+.
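The path-to-InputFormat dispatch described above can be illustrated with a minimal, self-contained sketch. This is not Hudi's actual implementation (the real class consults mapredPlan.pathToPartitionInfo() from the Hive query plan); the class, paths, and format names below are hypothetical stand-ins:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the dispatch CombineHiveInputFormat performs: each input path is
// resolved to the InputFormat class name recorded in the query plan. A plain
// Map stands in for mapredPlan.pathToPartitionInfo(); all names are hypothetical.
public class PathToInputFormatSketch {
    private final Map<String, String> pathToInputFormat = new HashMap<>();

    public void register(String pathPrefix, String inputFormatClassName) {
        pathToInputFormat.put(pathPrefix, inputFormatClassName);
    }

    // Resolve the InputFormat for a path by longest matching registered prefix;
    // returns null when no registered prefix matches.
    public String resolve(String path) {
        String best = null;
        int bestLen = -1;
        for (Map.Entry<String, String> e : pathToInputFormat.entrySet()) {
            if (path.startsWith(e.getKey()) && e.getKey().length() > bestLen) {
                best = e.getValue();
                bestLen = e.getKey().length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        PathToInputFormatSketch plan = new PathToInputFormatSketch();
        plan.register("/warehouse/hudi_table", "org.apache.hudi.hadoop.HoodieParquetInputFormat");
        plan.register("/warehouse/text_table", "org.apache.hadoop.mapred.TextInputFormat");
        System.out.println(plan.resolve("/warehouse/hudi_table/2022/01/f.parquet"));
    }
}
```

Because different paths resolve to different formats, a single map-reduce job can read Hudi parquet tables and plain text tables side by side, which is the point of the parameterized design.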
| Modifier and Type | Class and Description |
|---|---|
| static interface | HoodieCombineHiveInputFormat.AvoidSplitCombination: A marker interface used to identify the formats for which combine split generation is not applicable. |
| static class | HoodieCombineHiveInputFormat.CombineHiveInputSplit: Encapsulates an InputSplit with its corresponding inputFormatClassName. |
| static class | HoodieCombineHiveInputFormat.HoodieCombineFileInputFormatShim<K,V>: An implementation of CombineFileInputFormat, copied from org.apache.hadoop.hive.shims.HadoopShimsSecure.CombineFileInputFormatShim with changes in listStatus. |
| Modifier and Type | Field and Description |
|---|---|
| static org.apache.log4j.Logger | LOG |

| Constructor and Description |
|---|
| HoodieCombineHiveInputFormat() |
| Modifier and Type | Method and Description |
|---|---|
| protected HoodieCombineHiveInputFormat.HoodieCombineFileInputFormatShim | createInputFormatShim() |
| Set<Integer> | getNonCombinablePathIndices(org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.fs.Path[] paths, int numThreads): Gets all the path indices that should not be combined. |
| protected String | getParquetInputFormatClassName() |
| protected String | getParquetRealtimeInputFormatClassName() |
| org.apache.hadoop.mapred.RecordReader | getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter): Creates a generic Hive RecordReader that can iterate over all chunks in a CombineFileSplit. |
| org.apache.hadoop.mapred.InputSplit[] | getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits): Creates Hive splits based on CombineFileSplit. |
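As a rough illustration of what split combination in getSplits achieves, small files can be grouped into combined splits up to a size threshold. This is a simplified, self-contained sketch, not Hudi's or Hadoop's actual algorithm (the real CombineFileInputFormat also considers rack/node locality, pools, and min/max split sizes); the class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of combine-split generation: pack file sizes into splits
// whose total size stays within maxSplitSize. Illustrative only; the real
// CombineFileInputFormat also accounts for locality and split-size limits.
public class CombineSplitSketch {
    public static List<List<Long>> combine(long[] fileSizes, long maxSplitSize) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long size : fileSizes) {
            // Start a new split when adding this file would exceed the threshold.
            if (!current.isEmpty() && currentSize + size > maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(size);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Five 40-byte files combined under a 100-byte threshold.
        List<List<Long>> splits = combine(new long[]{40, 40, 40, 40, 40}, 100);
        System.out.println(splits.size()); // 3 combined splits
    }
}
```

Combining many small files into fewer splits reduces the number of map tasks, which is why paths flagged by getNonCombinablePathIndices are excluded: formats implementing AvoidSplitCombination must be split per file instead.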
protected String getParquetInputFormatClassName()

protected String getParquetRealtimeInputFormatClassName()

protected HoodieCombineHiveInputFormat.HoodieCombineFileInputFormatShim createInputFormatShim()

public Set<Integer> getNonCombinablePathIndices(org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.fs.Path[] paths, int numThreads) throws ExecutionException, InterruptedException

public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits) throws IOException

Specified by: getSplits in interface org.apache.hadoop.mapred.InputFormat<K extends org.apache.hadoop.io.WritableComparable,V extends org.apache.hadoop.io.Writable>
Overrides: getSplits in class org.apache.hadoop.hive.ql.io.HiveInputFormat<K extends org.apache.hadoop.io.WritableComparable,V extends org.apache.hadoop.io.Writable>
Throws: IOException

public org.apache.hadoop.mapred.RecordReader getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter) throws IOException

Specified by: getRecordReader in interface org.apache.hadoop.mapred.InputFormat<K extends org.apache.hadoop.io.WritableComparable,V extends org.apache.hadoop.io.Writable>
Overrides: getRecordReader in class org.apache.hadoop.hive.ql.io.HiveInputFormat<K extends org.apache.hadoop.io.WritableComparable,V extends org.apache.hadoop.io.Writable>
Throws: IOException

Copyright © 2022 The Apache Software Foundation. All rights reserved.