object SchemaMergingUtils
Utilities for merging a table schema with a data schema. This was split out of SchemaUtils because finalSchema is introduced into DeltaMergeInto, and resolving the final schema is now part of ResolveDeltaMergeInto.resolveReferencesAndSchema.
Value Members
- val DELTA_COL_RESOLVER: (String, String) ⇒ Boolean
- def checkColumnNameDuplication(schema: StructType, colType: String, caseSensitive: Boolean = false): Unit
Checks whether the input column names contain duplicate identifiers, and throws an exception if any duplicates exist.
- schema
the schema to check for duplicates
- colType
column type name, used in an exception message
- caseSensitive
Whether to throw an exception when two columns conflict only in casing. This should default to false for Delta.
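The check above can be sketched in plain Scala. This is a hypothetical simplification, not the actual Delta implementation; comparison is case-insensitive by default, matching Delta's behaviour.

```scala
// Hypothetical sketch of a duplicate-column-name check (not the real Delta code).
def checkDuplicates(names: Seq[String], caseSensitive: Boolean = false): Unit = {
  val normalized = if (caseSensitive) names else names.map(_.toLowerCase)
  val dups = normalized.groupBy(identity).collect { case (n, vs) if vs.size > 1 => n }
  if (dups.nonEmpty) {
    throw new IllegalArgumentException(s"Found duplicate column(s): ${dups.mkString(", ")}")
  }
}
```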
- def equalsIgnoreCaseAndCompatibleNullability(from: DataType, to: DataType): Boolean
Taken from DataType.
Compares two types, ignoring compatible nullability of ArrayType, MapType, StructType, and ignoring case sensitivity of field names in StructType.
Compatible nullability is defined as follows:
- If from and to are ArrayTypes, from has a compatible nullability with to if and only if to.containsNull is true, or both from.containsNull and to.containsNull are false.
- If from and to are MapTypes, from has a compatible nullability with to if and only if to.valueContainsNull is true, or both from.valueContainsNull and to.valueContainsNull are false.
- If from and to are StructTypes, from has a compatible nullability with to if and only if, for every pair of corresponding fields, toField.nullable is true, or both fromField.nullable and toField.nullable are false.
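The ArrayType rule can be modelled with a toy type. Arr here is a hypothetical stand-in for Spark's ArrayType, used only for illustration: writing from into to is nullability-compatible iff to accepts nulls, or neither side does.

```scala
// Toy model of the ArrayType compatible-nullability rule (Arr is hypothetical).
case class Arr(containsNull: Boolean)

def compatibleNullability(from: Arr, to: Arr): Boolean =
  to.containsNull || (!from.containsNull && !to.containsNull)
```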
- def explode(schema: StructType): Seq[(Seq[String], StructField)]
Returns pairs of (full column name path, field) in this schema as a list. For example, a schema like:

| - a
| | - 1
| | - 2
| - b
| - c
| | - foo.bar
| | | - 3

will return:

[ ([a], <field a>), ([a, 1], <field 1>), ([a, 2], <field 2>), ([b], <field b>), ([c], <field c>), ([c, foo.bar], <field foo.bar>), ([c, foo.bar, 3], <field 3>) ]
- def explodeNestedFieldNames(schema: StructType): Seq[String]
Returns all column names in this schema as a flat list. For example, a schema like:

| - a
| | - 1
| | - 2
| - b
| - c
| | - nest
| | | - 3

will get flattened to: "a", "a.1", "a.2", "b", "c", "c.nest", "c.nest.3"
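The flattening described above can be sketched on a toy field tree. Field here is a hypothetical stand-in for StructField, not the real implementation (which also quotes names containing dots).

```scala
// Simplified sketch of flattening nested field names (Field is hypothetical).
case class Field(name: String, children: Seq[Field] = Nil)

def flattenNames(fields: Seq[Field], prefix: Seq[String] = Nil): Seq[String] =
  fields.flatMap { f =>
    val path = prefix :+ f.name
    // emit the full dotted path for this field, then recurse into its children
    path.mkString(".") +: flattenNames(f.children, path)
  }
```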
- def mergeDataTypes(current: DataType, update: DataType, allowImplicitConversions: Boolean, keepExistingType: Boolean, allowTypeWidening: Boolean, caseSensitive: Boolean, allowOverride: Boolean): DataType
- current
The current data type.
- update
The data type of the new data being written.
- allowImplicitConversions
Whether to allow Spark SQL implicit conversions. By default, we merge according to Parquet write compatibility - for example, an integer type data field will throw when merged to a string type table field, because int and string aren't stored the same way in Parquet files. With this flag enabled, the merge will succeed, because once we get to write time Spark SQL will support implicitly converting the int to a string.
- keepExistingType
Whether to keep existing types instead of trying to merge types.
- caseSensitive
Whether field names should be matched case-sensitively. This should default to false for Delta, which is case insensitive.
- allowOverride
Whether to let incoming type override the existing type if unmatched.
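How the keepExistingType and allowOverride flags interact can be illustrated for two mismatched atomic types. This is a hypothetical simplification, not Delta's actual merge rules: types are plain strings here instead of Spark DataTypes, and the real code also handles implicit conversions and type widening.

```scala
// Illustrative (hypothetical) decision logic for two mismatched atomic types.
def mergeAtomic(current: String, update: String,
                keepExistingType: Boolean, allowOverride: Boolean): String =
  if (current == update) current
  else if (allowOverride) update        // incoming type overrides the existing one
  else if (keepExistingType) current    // existing type is kept as-is
  else throw new IllegalArgumentException(s"Cannot merge $current with $update")
```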
- def mergeSchemas(tableSchema: StructType, dataSchema: StructType, allowImplicitConversions: Boolean = false, keepExistingType: Boolean = false, allowTypeWidening: Boolean = false, caseSensitive: Boolean = false): StructType
A variant of mergeDataTypes with common default values that enforces struct types as inputs, for Delta table operations.
Checks whether we can write to the Delta table, which has tableSchema, using a query that has dataSchema. Our rules are:
- dataSchema may be missing columns or have additional columns.
- We don't trust the nullability in dataSchema; fields are assumed nullable.
- We only allow nested StructType expansions. For all other complex types, we check for strict equality.
- dataSchema can't have duplicate column names. Columns that only differ by case are also not allowed.
The following merging strategy is applied:
- The name of the current field is used.
- The data types are merged by calling this function recursively.
- We respect the current field's nullability.
- The metadata is the current field's metadata.
Schema merging occurs in a case-insensitive manner. Hence, column names that only differ by case are not accepted in the dataSchema.
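The merging strategy above can be sketched in plain Scala. Col is a hypothetical stand-in for StructField; the real mergeSchemas also handles nesting, nullability and metadata. Matching is case-insensitive, as in Delta.

```scala
// Simplified, hypothetical sketch of the flat-schema merging strategy.
case class Col(name: String, dataType: String)

def mergeCols(table: Seq[Col], data: Seq[Col]): Seq[Col] = {
  val existing = table.map(_.name.toLowerCase).toSet
  val merged = table.map { t =>
    data.find(_.name.equalsIgnoreCase(t.name)) match {
      case Some(d) if d.dataType != t.dataType =>
        throw new IllegalArgumentException(s"Type mismatch for column ${t.name}")
      case _ => t // keep the current field's name, type and metadata
    }
  }
  // dataSchema may have additional columns: append them after the table's columns
  merged ++ data.filterNot(d => existing.contains(d.name.toLowerCase))
}
```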
- def pruneEmptyStructs(dataType: DataType): Option[DataType]
Prunes all nested empty structs from the schema, returning None if the top-level struct is itself empty.
- dataType
the data type to prune.
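The pruning can be sketched on a miniature, hypothetical DataType ADT. Atomic and Struct stand in for Spark's types; this is illustrative only, not the real implementation.

```scala
// Toy sketch of pruning empty structs on a hypothetical miniature type ADT.
sealed trait DT
case object Atomic extends DT
case class Struct(fields: Map[String, DT]) extends DT

def prune(dt: DT): Option[DT] = dt match {
  case Struct(fields) =>
    // recursively drop children that prune away to nothing
    val kept = fields.flatMap { case (name, t) => prune(t).map(name -> _) }
    if (kept.isEmpty) None else Some(Struct(kept))
  case atomic => Some(atomic)
}
```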
- def toFieldMap(fields: Seq[StructField], caseSensitive: Boolean = false): Map[String, StructField]
- def transformColumns(schema: StructType, other: StructType)(tf: (Seq[String], StructField, Option[StructField], Resolver) ⇒ StructField): StructType
Transforms (nested) columns in schema by walking down schema and other simultaneously. This allows comparing the two schemas and transforming schema based on the comparison. Columns or fields present only in other are ignored, while None is passed to the transform function for columns or fields missing in other.
- schema
Schema to transform.
- other
Schema to compare with.
- tf
Function to apply. The function arguments are the full path of the current field to transform, the current field in schema and, if present, the corresponding field in other.
- def transformColumns[T <: DataType](schema: T)(tf: (Seq[String], StructField, Resolver) ⇒ StructField): T
Transforms (nested) columns in a schema by applying the given function to every field.
- schema
the schema to transform.
- tf
the function to apply to each field.
- returns
the transformed schema.
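The traversal can be sketched on a toy field tree. F is a hypothetical stand-in for StructField; the real method also threads a Resolver through for name matching.

```scala
// Minimal, hypothetical sketch of a recursive column transform on a toy tree.
case class F(name: String, children: Seq[F] = Nil)

def transformFields(fields: Seq[F], path: Seq[String] = Nil)(tf: (Seq[String], F) => F): Seq[F] =
  fields.map { f =>
    val transformed = tf(path :+ f.name, f)
    // recurse into children, with this field's (original) name appended to the path
    transformed.copy(children = transformFields(transformed.children, path :+ f.name)(tf))
  }
```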