class ParquetToSparkSchemaConverter extends AnyRef
This converter class converts a Parquet MessageType to a Spark SQL StructType
(via the convert method) as well as to a ParquetColumn (via the convertParquetColumn
method). The latter carries richer information about the Parquet type, including its
associated repetition and definition levels, column path, column descriptor, etc.
Parquet format backwards-compatibility rules are respected when converting Parquet MessageType schemas.
- See also
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md
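As an illustration of the basic flow, a minimal sketch follows. It assumes the class is reachable from your code; in practice it is internal to Spark (package org.apache.spark.sql.execution.datasources.parquet), so outside of Spark's own sources you may need to place the calling code under that package:

```scala
import org.apache.parquet.schema.MessageTypeParser
import org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.StructType

// A raw Parquet schema: the UTF8 annotation marks `name` as a string,
// while `ts` relies on the INT96-as-timestamp compatibility rule.
val parquetSchema = MessageTypeParser.parseMessageType(
  """message spark_schema {
    |  required int32 id;
    |  optional binary name (UTF8);
    |  optional int96 ts;
    |}
    |""".stripMargin)

// Flags such as assumeBinaryIsString are read from the SQLConf.
val converter = new ParquetToSparkSchemaConverter(new SQLConf)
val sparkSchema: StructType = converter.convert(parquetSchema)
```

With the default settings this yields a StructType whose fields mirror the Parquet columns, applying the backwards-compatibility rules linked above.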
Instance Constructors
- new ParquetToSparkSchemaConverter(conf: Configuration)
- new ParquetToSparkSchemaConverter(conf: SQLConf)
- new ParquetToSparkSchemaConverter(assumeBinaryIsString: Boolean = ..., assumeInt96IsTimestamp: Boolean = ..., caseSensitive: Boolean = ..., nanosAsLong: Boolean = ...)
- assumeBinaryIsString
Whether unannotated BINARY fields should be assumed to be Spark SQL StringType fields.
- assumeInt96IsTimestamp
Whether unannotated INT96 fields should be assumed to be Spark SQL TimestampType fields.
- caseSensitive
Whether to use case-sensitive analysis when comparing the Spark Catalyst read schema with the Parquet schema.
- nanosAsLong
Whether timestamps with nanosecond precision are converted to LongType.
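The three constructors above can be exercised as follows. This is a sketch: the explicit flag values shown are illustrative defaults, and the Hadoop-Configuration route assumes the converter reads the standard Spark SQL keys (e.g. SQLConf.PARQUET_BINARY_AS_STRING.key) from that configuration:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter
import org.apache.spark.sql.internal.SQLConf

// 1. Explicit flags.
val explicit = new ParquetToSparkSchemaConverter(
  assumeBinaryIsString = true,    // unannotated BINARY -> StringType
  assumeInt96IsTimestamp = true,  // unannotated INT96  -> TimestampType
  caseSensitive = false,          // case-insensitive schema matching
  nanosAsLong = false)            // keep nanosecond timestamps as timestamps

// 2. From a SQLConf: flags come from the session's SQL settings.
val fromSqlConf = new ParquetToSparkSchemaConverter(new SQLConf)

// 3. From a Hadoop Configuration carrying the corresponding keys.
val hadoopConf = new Configuration()
hadoopConf.setBoolean(SQLConf.PARQUET_BINARY_AS_STRING.key, true)
hadoopConf.setBoolean(SQLConf.PARQUET_INT96_AS_TIMESTAMP.key, true)
val fromHadoopConf = new ParquetToSparkSchemaConverter(hadoopConf)
```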
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- def convert(parquetSchema: MessageType): StructType
Converts a Parquet MessageType parquetSchema to a Spark SQL StructType.
- def convertField(field: ColumnIO, sparkReadType: Option[DataType] = None): ParquetColumn
Converts a Parquet Type to a ParquetColumn, which wraps a Spark SQL DataType with additional information such as the Parquet column's repetition and definition levels, column path, column descriptor, etc.
- def convertParquetColumn(parquetSchema: MessageType, sparkReadSchema: Option[StructType] = None): ParquetColumn
Converts parquetSchema into a ParquetColumn, which contains its corresponding Spark SQL StructType along with other information such as the maximum repetition and definition level of each node, column descriptors for the leaf nodes, etc.
If sparkReadSchema is non-empty, then when deriving a Spark SQL type from a Parquet field this will check whether the same field also exists in that schema; if so, it will use the Spark SQL type instead. This is necessary since conversion from Parquet to Spark could cause precision loss. For instance, the Spark read schema may use smallint/tinyint while Parquet only supports int.
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
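The precision-loss behaviour of convertParquetColumn described above can be sketched as follows. This assumes the ParquetColumn result exposes a sparkType field; Parquet has no 16-bit integer type, so a smallint column is stored as int32 and would otherwise widen on conversion:

```scala
import org.apache.parquet.schema.MessageTypeParser
import org.apache.spark.sql.execution.datasources.parquet.{ParquetColumn, ParquetToSparkSchemaConverter}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{ShortType, StructType}

val converter = new ParquetToSparkSchemaConverter(new SQLConf)

// An unannotated int32 column; by itself this converts to IntegerType.
val parquetSchema = MessageTypeParser.parseMessageType(
  "message spark_schema { required int32 id; }")

// Supplying the Spark read schema keeps the narrower ShortType for `id`
// instead of widening it to IntegerType.
val readSchema = new StructType().add("id", ShortType, nullable = false)
val column: ParquetColumn = converter.convertParquetColumn(parquetSchema, Some(readSchema))
// column.sparkType now reflects the read schema's ShortType for `id`.
```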