class ParquetToSparkSchemaConverter extends AnyRef
This converter class converts a Parquet MessageType to a Spark SQL StructType
(via the convert method) as well as to a ParquetColumn (via the convertParquetColumn
method). The latter contains richer information about the Parquet type, including its
associated repetition and definition levels, column path, column descriptor, etc.
Parquet format backwards-compatibility rules are respected when converting Parquet MessageType schemas.
- See also
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md
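As a minimal sketch of the convert entry point, assuming Spark's internal org.apache.spark.sql.execution.datasources.parquet package and parquet-column are on the classpath (the schema and its field names below are hypothetical):

```scala
import org.apache.parquet.schema.MessageTypeParser
import org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter

// Hypothetical Parquet schema, parsed from its textual representation.
val parquetSchema = MessageTypeParser.parseMessageType(
  """message spark_schema {
    |  required int32 id;
    |  optional binary name (UTF8);
    |}""".stripMargin)

// All constructor parameters have defaults, so a no-arg construction works.
val converter = new ParquetToSparkSchemaConverter()

// required int32 maps to a non-nullable integer field; the UTF8-annotated
// binary maps to a nullable string field.
val structType = converter.convert(parquetSchema)
```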
Inheritance: ParquetToSparkSchemaConverter → AnyRef → Any
Instance Constructors
- new ParquetToSparkSchemaConverter(conf: Configuration)
- new ParquetToSparkSchemaConverter(conf: SQLConf)
- new ParquetToSparkSchemaConverter(assumeBinaryIsString: Boolean = SQLConf.PARQUET_BINARY_AS_STRING.defaultValue.get, assumeInt96IsTimestamp: Boolean = SQLConf.PARQUET_INT96_AS_TIMESTAMP.defaultValue.get, caseSensitive: Boolean = SQLConf.CASE_SENSITIVE.defaultValue.get, nanosAsLong: Boolean = SQLConf.LEGACY_PARQUET_NANOS_AS_LONG.defaultValue.get)
- assumeBinaryIsString
Whether unannotated BINARY fields should be assumed to be Spark SQL StringType fields.
- assumeInt96IsTimestamp
Whether unannotated INT96 fields should be assumed to be Spark SQL TimestampType fields.
- caseSensitive
Whether to use case-sensitive analysis when comparing the Spark catalyst read schema with the Parquet schema.
- nanosAsLong
Whether timestamps with nanosecond precision are converted to long.
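The flag-based constructor can be sketched as follows; the values shown mirror the usual SQLConf defaults, but this is an assumption rather than a guarantee for every Spark version:

```scala
import org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter

// Explicit flags; each one otherwise falls back to its SQLConf default.
val converter = new ParquetToSparkSchemaConverter(
  assumeBinaryIsString = true,   // unannotated BINARY read as StringType
  assumeInt96IsTimestamp = true, // unannotated INT96 read as TimestampType
  caseSensitive = false,         // field-name matching is case-insensitive
  nanosAsLong = false)           // nanosecond timestamps not read as long
```

The Configuration- and SQLConf-taking constructors derive these same four flags from the corresponding spark.sql.* settings instead of taking them directly.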
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def convert(parquetSchema: MessageType): StructType
Converts a Parquet MessageType parquetSchema to a Spark SQL StructType.
- def convertField(field: ColumnIO, sparkReadType: Option[DataType] = None): ParquetColumn
Converts a Parquet Type to a ParquetColumn which wraps a Spark SQL DataType with additional information such as the Parquet column's repetition & definition level, column path, column descriptor etc.
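A ColumnIO argument for convertField is not usually built by hand; a sketch, assuming Parquet's ColumnIOFactory is used to derive it from a MessageType (the schema below is illustrative):

```scala
import org.apache.parquet.io.ColumnIOFactory
import org.apache.parquet.schema.MessageTypeParser
import org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter

// Hypothetical single-column schema.
val parquetSchema = MessageTypeParser.parseMessageType(
  "message spark_schema { optional binary name (UTF8); }")

// Build the ColumnIO tree for the schema; the root is a group node.
val columnIO = new ColumnIOFactory().getColumnIO(parquetSchema)

val converter = new ParquetToSparkSchemaConverter()
// Converting the root group yields a ParquetColumn wrapping a StructType;
// its children carry repetition/definition levels and column descriptors.
val root = converter.convertField(columnIO)
```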
- def convertParquetColumn(parquetSchema: MessageType, sparkReadSchema: Option[StructType] = None): ParquetColumn
Converts parquetSchema into a ParquetColumn which contains its corresponding Spark SQL StructType along with other information such as the maximum repetition and definition level of each node, the column descriptors for the leaf nodes, etc.
If sparkReadSchema is not empty, when deriving a Spark SQL type from a Parquet field this will check whether the same field also exists in the schema. If so, it will use the Spark SQL type instead. This is necessary since conversion from Parquet to Spark could cause precision loss. For instance, the Spark read schema may be smallint/tinyint while Parquet supports only int.
- final def eq(arg0: AnyRef): Boolean
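The precision-preserving behaviour described above can be sketched as follows, assuming a hypothetical int32 column named age that the caller knows should be read as smallint:

```scala
import org.apache.parquet.schema.MessageTypeParser
import org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter
import org.apache.spark.sql.types.{ShortType, StructField, StructType}

// Parquet has no smallint, so the file stores the column as int32.
val parquetSchema = MessageTypeParser.parseMessageType(
  "message spark_schema { required int32 age; }")

// The Spark read schema declares the narrower type the caller expects.
val readSchema = StructType(Seq(StructField("age", ShortType)))

val converter = new ParquetToSparkSchemaConverter()
// With no read schema the field would map to IntegerType; passing the read
// schema makes the matching field keep ShortType instead.
val column = converter.convertParquetColumn(parquetSchema, Some(readSchema))
```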
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()