package hive
Type Members
- class CaseInsensitiveMap[T] extends Map[String, T] with Serializable
Builds a map in which keys are case insensitive. The input map can still be accessed for cases where case-sensitive information is required. The primary constructor is marked private to avoid creating a case-insensitive map nested inside another one; otherwise, the keys in the original map would also become case-insensitive.
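The lookup behavior can be illustrated with a minimal Scala sketch. It is hypothetical and simplified: SimpleCaseInsensitiveMap wraps a Map instead of extending Map[String, T], which sidesteps the differences between the Scala 2.12 and 2.13 collection APIs, and lower-casing keys with Locale.ROOT is an assumed normalization rule.

```scala
import java.util.Locale

// Hypothetical, simplified sketch: wraps (rather than extends) Map[String, T].
// Lookups ignore key case; the untouched input map stays available.
final class SimpleCaseInsensitiveMap[T] private (val originalMap: Map[String, T]) {
  // Assumed normalization: lower-case with Locale.ROOT so results do not
  // depend on the JVM's default locale.
  private def normalize(key: String): String = key.toLowerCase(Locale.ROOT)

  private val keyLowerCasedMap: Map[String, T] =
    originalMap.map { case (k, v) => normalize(k) -> v }

  def get(key: String): Option[T] = keyLowerCasedMap.get(normalize(key))

  def contains(key: String): Boolean = keyLowerCasedMap.contains(normalize(key))
}

object SimpleCaseInsensitiveMap {
  // The constructor is private, so instances are created only through this
  // factory, mirroring the private primary constructor described above.
  def apply[T](params: Map[String, T]): SimpleCaseInsensitiveMap[T] =
    new SimpleCaseInsensitiveMap(params)
}

val options = SimpleCaseInsensitiveMap(Map("Path" -> "/data/events"))
println(options.get("PATH"))             // Some(/data/events)
println(options.originalMap.get("path")) // None: the original map is case-sensitive
```

Keeping originalMap as a separate field is what lets callers recover the exact key spelling when it matters.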
- class DeltaInputFormat extends FileInputFormat[NullWritable, ArrayWritable]
A special InputFormat that wraps ParquetInputFormat to read a Delta table.
The underlying files in a Delta table are in Parquet format. However, we cannot use the existing ParquetInputFormat to read them directly, because the files store only the data columns. The values of partition columns live in Delta's metadata. Hence, we need to read them from Delta's metadata and reassemble each row from the partition values and the data values in the raw Parquet files.
Note: We cannot use the file name to infer partition values, because the Delta Transaction Log Protocol requires that "Actual partition values for a file must be read from the transaction log".
In the current implementation, when listing files, we also read the partition values and put them into an Array[PartitionColumnInfo], then create a temporary Map that maps each file path to its PartitionColumnInfos. When creating an InputSplit, we create a special FileSplit called DeltaInputSplit to carry over the PartitionColumnInfos. For each reader created from a DeltaInputSplit, we can get the types of all partition columns, the position of each partition column in the schema, and their string values. The reader can then build an org.apache.hadoop.io.Writable for each partition value and insert it into the raw row returned by org.apache.parquet.hadoop.ParquetRecordReader.
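The listing-time bookkeeping can be sketched as follows. This is a simplified, hypothetical version rather than the connector's code: PartitionColumnInfo here is a plain case class without Writable support, and LoggedFile and buildPartitionInfoMap are illustrative names standing in for the file entries read from the Delta log.

```scala
// Simplified stand-in for PartitionColumnInfo (no Hadoop Writable support).
final case class PartitionColumnInfo(index: Int, tpe: String, value: String)

// Hypothetical view of one data file listed from the Delta transaction log:
// its path plus the partition values recorded for it in the log.
final case class LoggedFile(path: String, partitionValues: Map[String, String])

// Builds the temporary map from file path to the partition information that a
// split for that file would carry. `schema` lists (column name, Hive type)
// pairs in table order; `partitionColumns` names the partition columns.
def buildPartitionInfoMap(
    files: Seq[LoggedFile],
    schema: Seq[(String, String)],
    partitionColumns: Seq[String]): Map[String, Array[PartitionColumnInfo]] = {
  // Resolve each partition column's position in the schema once.
  val positions: Seq[(String, Int)] = partitionColumns.map { col =>
    val idx = schema.indexWhere { case (name, _) => name == col }
    require(idx >= 0, s"Partition column $col is not in the schema")
    col -> idx
  }
  files.map { file =>
    val infos = positions.map { case (col, idx) =>
      // Values come from the log entry, never from the file name.
      PartitionColumnInfo(idx, schema(idx)._2, file.partitionValues(col))
    }.toArray
    file.path -> infos
  }.toMap
}
```

Resolving the column positions once per table keeps the per-file work in this sketch to building a small array.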
- class DeltaInputSplit extends FileSplit
A special FileSplit that holds the partition information of the file it covers.
This class is written in Java because it needs to call two different constructors of FileSplit, which Scala does not allow: every auxiliary constructor of a Scala class must delegate to another constructor of the same class, so only the primary constructor can invoke a superclass constructor.
- class DeltaOutputFormat extends OutputFormat[NullWritable, ArrayWritable]
This class is not a real implementation. We use it to prevent writing to a Delta table from Hive until Hive writes are supported.
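A write-blocking output format can be as small as a class whose writer factory always fails. The sketch below is hypothetical: RecordWriterFactory is a made-up trait standing in for Hadoop's OutputFormat, and the exception type and message are assumptions, not what DeltaOutputFormat actually throws.

```scala
// Hypothetical stand-in for the writer-creating part of an output format.
trait RecordWriterFactory[K, V] {
  def getRecordWriter(path: String): (K, V) => Unit
}

// Every request for a writer fails, so no data can ever be written.
final class ReadOnlyTableOutputFormat[K, V] extends RecordWriterFactory[K, V] {
  override def getRecordWriter(path: String): (K, V) => Unit =
    throw new UnsupportedOperationException(
      s"Writing to the table at $path is not supported")
}
```

In this sketch the failure happens when a writer is requested, before any record is produced, so a rejected write leaves no partial output behind.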
- class DeltaRecordReaderWrapper extends ParquetRecordReaderWrapper
A record reader that reads data from the underlying Parquet reader and inserts the partition values that are not stored in the Parquet files.
Because we have verified that the Hive schema in the metastore is consistent with the Delta schema, the row returned by the underlying Parquet reader matches the Delta schema, except that every partition column is null since partition columns are not in the raw Parquet files. Hence, for the missing partition values, we use the partition information in DeltaInputSplit to create the corresponding Writables and insert them at the corresponding positions when reading a row.
- class DeltaStorageHandler extends DefaultStorageHandler with HiveMetaHook with HiveStoragePredicateHandler
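The row patching described for DeltaRecordReaderWrapper can be sketched with plain arrays. This is a hypothetical simplification: a row is an Array[Any] instead of an ArrayWritable, PartitionColumnInfo is a plain case class, and the caller-supplied toValue function stands in for building the Writable of each partition value.

```scala
// Simplified stand-in for PartitionColumnInfo (no Hadoop Writable support).
final case class PartitionColumnInfo(index: Int, tpe: String, value: String)

// Fills the partition columns, which the Parquet reader leaves as null, with
// values built from the split's partition information. `toValue` stands in for
// creating the Writable for each partition value. The row is updated in place.
def insertPartitionValues(
    row: Array[Any],
    partitions: Seq[PartitionColumnInfo])(toValue: PartitionColumnInfo => Any): Array[Any] = {
  partitions.foreach { p =>
    require(p.index >= 0 && p.index < row.length, s"Index ${p.index} is outside the row")
    require(row(p.index) == null, s"Column ${p.index} should be a null partition column")
    row(p.index) = toValue(p)
  }
  row
}
```

Updating the row in place mirrors the idea of patching the reader's output rather than copying it; the null checks encode the schema-consistency assumption stated above.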
- class HiveInputFormat extends org.apache.hadoop.hive.ql.io.HiveInputFormat[Nothing, Nothing]
- class IndexPredicateAnalyzer extends AnyRef
Copied from Hive's org.apache.hadoop.hive.ql.index.IndexPredicateAnalyzer. IndexPredicateAnalyzer decomposes predicates, separating the parts that can be satisfied by an index from the parts that cannot. Currently, it only supports pure conjunctions over binary expressions that compare a column reference with a constant value. It assumes that all column aliases encountered refer to the same table.
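The decomposition of a pure conjunction can be sketched over a tiny expression tree. This is a hypothetical illustration, not Hive's code: the Expr classes and SimplePredicateAnalyzer are made-up names used instead of Hive's expression classes, and comparisons are assumed to always have the column on the left and the constant on the right.

```scala
// Hypothetical, simplified expression tree (not Hive's expression classes).
sealed trait Expr
final case class And(left: Expr, right: Expr) extends Expr
final case class Compare(column: String, op: String, constant: Any) extends Expr
final case class Opaque(description: String) extends Expr

// Splits a predicate into the comparisons an index could evaluate and a
// residual predicate that must still be evaluated on each row.
final class SimplePredicateAnalyzer(allowedColumns: Set[String], allowedOps: Set[String]) {

  // Flattens nested ANDs into a list of conjuncts; anything else is one conjunct.
  private def conjuncts(e: Expr): List[Expr] = e match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case other     => List(other)
  }

  def analyze(predicate: Expr): (List[Compare], Option[Expr]) = {
    val (pushed, residual) = conjuncts(predicate).partition {
      case Compare(col, op, _) => allowedColumns(col) && allowedOps(op)
      case _                   => false
    }
    // Only Compare nodes can pass the test above, so this keeps all of them.
    val searchConditions = pushed.collect { case c: Compare => c }
    // Rebuild the leftover conjuncts into one AND tree (None if nothing is left).
    (searchConditions, residual.reduceOption[Expr]((l, r) => And(l, r)))
  }
}
```

For example, analyzing `date = '2020-01-01' AND udf(x) AND id > 5` with only `date` indexable yields one search condition on `date`, while `udf(x) AND id > 5` remains as the residual predicate.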
- case class PartitionColumnInfo(index: Int, tpe: String, value: String) extends Writable with Product with Serializable
- index: the index of a partition column in the schema.
- tpe: the Hive type of a partition column.
- value: the string value of a partition column. The actual partition value should be parsed according to its type.
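Parsing the string value according to its Hive type might look like the sketch below. It is hypothetical: parsePartitionValue is an illustrative name, only a handful of Hive primitive types are covered, and dates are assumed to use the ISO yyyy-MM-dd form accepted by LocalDate.parse.

```scala
import java.time.LocalDate
import java.util.Locale

// Simplified stand-in for PartitionColumnInfo (no Hadoop Writable support).
final case class PartitionColumnInfo(index: Int, tpe: String, value: String)

// Hypothetical parser from the stored string to a typed JVM value.
// A null string (a null partition value) stays null.
def parsePartitionValue(info: PartitionColumnInfo): Any =
  if (info.value == null) null
  else info.tpe.toLowerCase(Locale.ROOT) match {
    case "tinyint"  => info.value.toByte
    case "smallint" => info.value.toShort
    case "int"      => info.value.toInt
    case "bigint"   => info.value.toLong
    case "float"    => info.value.toFloat
    case "double"   => info.value.toDouble
    case "boolean"  => info.value.toBoolean
    case "string"   => info.value
    case "date"     => LocalDate.parse(info.value) // assumes yyyy-MM-dd
    case other      => throw new IllegalArgumentException(s"Unsupported partition type: $other")
  }
```

Failing loudly on an unknown type, rather than falling back to the raw string, keeps a type mismatch from silently producing wrongly typed values in this sketch.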
Value Members
- object CaseInsensitiveMap extends Serializable
- object DeltaHelper
- object DeltaStorageHandler
- object SchemaUtils