package orc
Type Members
-
class
OrcArrayColumnVector extends OrcColumnVector
A column vector implementation for Spark's ArrayType.
-
class
OrcAtomicColumnVector extends OrcColumnVector
A column vector implementation for Spark's AtomicType.
-
class
OrcColumnStatistics extends AnyRef
Column statistics interface wrapping ORC ColumnStatistics.
Because ORC ColumnStatistics are stored as a flattened array in the ORC file footer, this class converts them from that array into a nested tree structure that follows the data types. The flattened array stores all data types (including nested types) in tree pre-order. This is used for aggregate push down in ORC.
For nested data types (array, map and struct), the sub-field statistics are stored recursively inside the parent column's children field. Here is an example of OrcColumnStatistics:
Data schema:
    c1: int
    c2: struct<f1: int, f2: float>
    c3: map<key: int, value: string>
    c4: array<int>

                       OrcColumnStatistics
                               | (children)
            ---------------------------------------------
           /                  |              \            \
          c1                 c2              c3           c4
      (integer)           (struct)         (map)        (array)
      (min:1,                | (children)    | (children)  | (children)
       max:10)             -----           -----        element
                          /     \         /     \      (integer)
                       c2.f1   c2.f2    key    value
                     (integer) (float) (integer) (string)
                              (min:0.1,          (min:"a",
                              max:100.5)         max:"zzz")
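The array-to-tree conversion described above can be sketched in a few lines of Scala. This is only an illustration of rebuilding a tree from a pre-order array, not Spark's actual implementation. Every name in it (`DataType`, `StatsNode`, `build`, `PreOrderStatsSketch`) is a hypothetical stand-in, and a plain string label takes the place of the real statistics payload. The sketch also assumes that entry 0 of the flattened array describes the root struct.

```scala
object PreOrderStatsSketch {
  // Hypothetical, simplified stand-ins for the schema types (not Spark's classes).
  sealed trait DataType
  case object IntType extends DataType
  case object FloatType extends DataType
  case object StringType extends DataType
  final case class ArrayType(element: DataType) extends DataType
  final case class MapType(key: DataType, value: DataType) extends DataType
  final case class StructType(fields: List[(String, DataType)]) extends DataType

  // One tree node: a label standing in for the statistics, plus child nodes.
  final case class StatsNode(label: String, children: Vector[StatsNode])

  // Builds the node for `dt` from flat(pos) and returns it together with the
  // index of the next unread entry. Children are read in schema order, which
  // matches the pre-order layout of the flattened array.
  def build(flat: IndexedSeq[String], dt: DataType, pos: Int): (StatsNode, Int) = {
    val childTypes: List[DataType] = dt match {
      case StructType(fields) => fields.map(_._2)
      case MapType(k, v)      => List(k, v)
      case ArrayType(e)       => List(e)
      case _                  => Nil
    }
    val (children, next) =
      childTypes.foldLeft((Vector.empty[StatsNode], pos + 1)) {
        case ((acc, p), childType) =>
          val (child, p2) = build(flat, childType, p)
          (acc :+ child, p2)
      }
    (StatsNode(flat(pos), children), next)
  }

  // The schema from the example above, wrapped in a root struct.
  val schema: DataType = StructType(List(
    "c1" -> IntType,
    "c2" -> StructType(List("f1" -> IntType, "f2" -> FloatType)),
    "c3" -> MapType(IntType, StringType),
    "c4" -> ArrayType(IntType)))

  // Flattened entries in tree pre-order (labels only, for illustration).
  val flat: Vector[String] = Vector(
    "root", "c1", "c2", "c2.f1", "c2.f2", "c3", "key", "value", "c4", "element")

  val (tree, consumed) = build(flat, schema, 0)
}
```

In this sketch, `tree.children` holds the nodes for c1 through c4, and the node for c2 in turn holds c2.f1 and c2.f2, mirroring the diagram. Returning the next unread index from each call lets the recursion walk the array once without any shared mutable state.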
-
abstract
class
OrcColumnVector extends ColumnVector
A column vector interface wrapping Hive's ColumnVector.
Because Spark's ColumnarBatch only accepts Spark's vectorized.ColumnVector, this column vector is used to adapt Hive's ColumnVector to Spark's ColumnVector.
-
class
OrcColumnarBatchReader extends RecordReader[Void, ColumnarBatch]
To support vectorization in WholeStageCodeGen, this reader returns ColumnarBatch. After creation, initialize and initBatch should be called sequentially.
-
class
OrcDeserializer extends AnyRef
A deserializer to deserialize ORC structs to Spark rows.
-
class
OrcFileFormat extends FileFormat with DataSourceRegister with Serializable
New ORC File Format based on Apache ORC.
-
trait
OrcFiltersBase extends AnyRef
Methods that can be shared when upgrading the built-in Hive.
-
class
OrcFooterReader extends AnyRef
OrcFooterReader is a utility class that encapsulates helper methods for reading the ORC file footer.
-
class
OrcMapColumnVector extends OrcColumnVector
A column vector implementation for Spark's MapType.
-
class
OrcOptions extends FileSourceOptions
Options for the ORC data source.
-
class
OrcSerializer extends AnyRef
A serializer to serialize Spark rows to ORC structs.
-
class
OrcStructColumnVector extends OrcColumnVector
A column vector implementation for Spark's StructType.
Value Members
- object OrcOptions extends DataSourceOptions with Serializable
- object OrcUtils extends Logging