org.apache.spark.sql.delta.schema

object SchemaMergingUtils

Utilities for merging a table schema with a data schema. This is split out from SchemaUtils because finalSchema is introduced into DeltaMergeInto, and resolving the final schema is now part of ResolveDeltaMergeInto.resolveReferencesAndSchema.

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val DELTA_COL_RESOLVER: (String, String) ⇒ Boolean
  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. def checkColumnNameDuplication(schema: StructType, colType: String, caseSensitive: Boolean = false): Unit

    Checks if input column names have duplicate identifiers. This throws an exception if duplication exists.

    schema

    the schema to check for duplicates

    colType

    column type name, used in the exception message

    caseSensitive

    Whether to throw an exception if two columns have casing conflicts. This should default to false for Delta, which is case insensitive.
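The duplicate check can be sketched in plain Scala. This is an illustration only, operating on a flat list of column names rather than a StructType; it is not Delta's actual implementation:

```scala
// Simplified sketch of duplicate detection: operates on plain name strings
// instead of a StructType. Illustration only, not Delta's implementation.
def checkColumnNameDuplication(
    names: Seq[String],
    colType: String,
    caseSensitive: Boolean = false): Unit = {
  // With caseSensitive = false (the Delta default), "a" and "A" conflict.
  val keys = if (caseSensitive) names else names.map(_.toLowerCase)
  val dups = keys.groupBy(identity).collect { case (k, vs) if vs.size > 1 => k }
  if (dups.nonEmpty) {
    throw new IllegalArgumentException(
      s"Found duplicate column(s) in the $colType: ${dups.mkString(", ")}")
  }
}
```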

  7. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  8. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  10. def equalsIgnoreCaseAndCompatibleNullability(from: DataType, to: DataType): Boolean

    Taken from DataType.

    Compares two types, ignoring compatible nullability of ArrayType, MapType, StructType, and ignoring case sensitivity of field names in StructType.

    Compatible nullability is defined as follows:

    • If from and to are ArrayTypes, from has a compatible nullability with to if and only if to.containsNull is true, or both of from.containsNull and to.containsNull are false.
    • If from and to are MapTypes, from has a compatible nullability with to if and only if to.valueContainsNull is true, or both of from.valueContainsNull and to.valueContainsNull are false.
    • If from and to are StructTypes, from has a compatible nullability with to if and only if, for every pair of corresponding fields, toField.nullable is true, or both of fromField.nullable and toField.nullable are false.
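These rules can be sketched with simplified stand-ins for Spark's type hierarchy. The names IntT, ArrayT, MapT, FieldT and StructT below are hypothetical stand-ins, not Spark classes, and the function is an illustration of the rules, not Delta's code:

```scala
// Simplified stand-ins for Spark's DataType hierarchy, for illustration only.
sealed trait DT
case object IntT extends DT
case class ArrayT(elementType: DT, containsNull: Boolean) extends DT
case class MapT(keyType: DT, valueType: DT, valueContainsNull: Boolean) extends DT
case class FieldT(name: String, dataType: DT, nullable: Boolean)
case class StructT(fields: Seq[FieldT]) extends DT

// Sketch of the compatible-nullability rules described above.
def compatible(from: DT, to: DT): Boolean = (from, to) match {
  case (ArrayT(fe, fn), ArrayT(te, tn)) =>
    (tn || !fn) && compatible(fe, te)
  case (MapT(fk, fv, fn), MapT(tk, tv, tn)) =>
    (tn || !fn) && compatible(fk, tk) && compatible(fv, tv)
  case (StructT(ff), StructT(tf)) =>
    ff.length == tf.length && ff.zip(tf).forall { case (f, t) =>
      // Field names compare case-insensitively; nullability may only relax.
      f.name.equalsIgnoreCase(t.name) && (t.nullable || !f.nullable) &&
        compatible(f.dataType, t.dataType)
    }
  case _ => from == to
}
```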
  11. def explode(schema: StructType): Seq[(Seq[String], StructField)]

    Returns pairs of (full column name path, field) in this schema as a list. For example, a schema like:

      <field a>        | - a
      <field 1>        | | - 1
      <field 2>        | | - 2
      <field b>        | - b
      <field c>        | - c
      <field foo.bar>  | | - foo.bar
      <field 3>        | | | - 3

    will return:

      [ ([a], <field a>),
        ([a, 1], <field 1>),
        ([a, 2], <field 2>),
        ([b], <field b>),
        ([c], <field c>),
        ([c, foo.bar], <field foo.bar>),
        ([c, foo.bar, 3], <field 3>) ]

  12. def explodeNestedFieldNames(schema: StructType): Seq[String]

    Returns all column names in this schema as a flat list. For example, a schema like:

      | - a
      | | - 1
      | | - 2
      | - b
      | - c
      | | - nest
      | | | - 3

    will get flattened to: "a", "a.1", "a.2", "b", "c", "c.nest", "c.nest.3"
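Both flattening methods can be sketched with simplified stand-ins for StructType. This is an illustration only; it ignores the quoting the real code would need for names that contain dots:

```scala
// Simplified stand-ins for Spark's StructField/StructType, for illustration only.
sealed trait DataT
case object LeafT extends DataT
case class StructT(fields: Seq[(String, DataT)]) extends DataT

// Sketch of explode: pairs of (full column name path, field type).
def explode(schema: StructT): Seq[(Seq[String], DataT)] =
  schema.fields.flatMap { case (name, t) =>
    (Seq(name), t) +: (t match {
      case s: StructT => explode(s).map { case (path, f) => (name +: path, f) }
      case _          => Nil
    })
  }

// Sketch of explodeNestedFieldNames: dotted full names.
def explodeNestedFieldNames(schema: StructT): Seq[String] =
  explode(schema).map { case (path, _) => path.mkString(".") }
```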

  13. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  15. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  16. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  17. def mergeDataTypes(current: DataType, update: DataType, allowImplicitConversions: Boolean, keepExistingType: Boolean, allowTypeWidening: Boolean, caseSensitive: Boolean, allowOverride: Boolean): DataType

    current

    The current data type.

    update

    The data type of the new data being written.

    allowImplicitConversions

    Whether to allow Spark SQL implicit conversions. By default, we merge according to Parquet write compatibility - for example, an integer type data field will throw when merged to a string type table field, because int and string aren't stored the same way in Parquet files. With this flag enabled, the merge will succeed, because once we get to write time Spark SQL will support implicitly converting the int to a string.

    keepExistingType

    Whether to keep existing types instead of trying to merge types.

    allowTypeWidening

    Whether to allow widening the current type to a wider incoming type (for example, IntegerType to LongType).

    caseSensitive

    Whether field names should be matched case-sensitively. This should default to false for Delta, which is case insensitive.

    allowOverride

    Whether to let the incoming type override the existing type if they don't match.

  18. def mergeSchemas(tableSchema: StructType, dataSchema: StructType, allowImplicitConversions: Boolean = false, keepExistingType: Boolean = false, allowTypeWidening: Boolean = false, caseSensitive: Boolean = false): StructType

    A variant of mergeDataTypes with common default values that enforces struct types as inputs, for Delta table operations.

    Checks whether we can write to the Delta table, which has tableSchema, using a query that has dataSchema. Our rules are:

    • dataSchema may be missing columns or have additional columns
    • We don't trust the nullability in dataSchema; fields are assumed to be nullable.
    • We only allow nested StructType expansions. For all other complex types, we check for strict equality.
    • dataSchema can't have duplicate column names. Columns that only differ by case are also not allowed.

    The following merging strategy is applied:

    • The name of the current field is used.
    • The data types are merged by calling this function.
    • We respect the current field's nullability.
    • The metadata is the current field's metadata.

    Schema merging occurs in a case insensitive manner. Hence, column names that only differ by case are not accepted in the dataSchema.
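The merging strategy above can be sketched on simplified stand-in types. This is an illustration only; the real method recursively merges full Spark DataTypes and carries field metadata:

```scala
// Simplified stand-ins: a flat schema of (name, type, nullable) fields.
case class Field(name: String, dataType: String, nullable: Boolean)
case class Schema(fields: Seq[Field])

// Sketch of the merge rules: match columns case-insensitively, keep the
// current field's name/nullability, fail on type conflicts, append new columns.
def mergeSchemas(tableSchema: Schema, dataSchema: Schema): Schema = {
  val dataByName = dataSchema.fields.map(f => f.name.toLowerCase -> f).toMap
  val merged = tableSchema.fields.map { cur =>
    dataByName.get(cur.name.toLowerCase) match {
      case Some(upd) if upd.dataType != cur.dataType =>
        throw new IllegalArgumentException(
          s"Failed to merge incompatible types for column ${cur.name}")
      case _ => cur // keep the current field's name, type, nullability, metadata
    }
  }
  val existing = tableSchema.fields.map(_.name.toLowerCase).toSet
  val added = dataSchema.fields.filterNot(f => existing.contains(f.name.toLowerCase))
  Schema(merged ++ added)
}
```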

  19. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  20. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  21. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  22. def pruneEmptyStructs(dataType: DataType): Option[DataType]

    Prunes all nested empty structs from the schema. Returns None if the top level struct is also empty.

    dataType

    the data type to prune.
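The pruning can be sketched on simplified stand-in types. This is an illustration only; the real method also descends into array and map types:

```scala
// Simplified stand-ins for Spark's DataType, for illustration only.
sealed trait DT
case object LeafT extends DT
case class StructT(fields: Seq[(String, DT)]) extends DT

// Sketch: drop nested structs that end up empty; None if the root is empty.
def pruneEmptyStructs(dataType: DT): Option[DT] = dataType match {
  case StructT(fields) =>
    val pruned = fields.flatMap { case (name, t) => pruneEmptyStructs(t).map(name -> _) }
    if (pruned.isEmpty) None else Some(StructT(pruned))
  case other => Some(other)
}
```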

  23. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  24. def toFieldMap(fields: Seq[StructField], caseSensitive: Boolean = false): Map[String, StructField]
  25. def toString(): String
    Definition Classes
    AnyRef → Any
  26. def transformColumns(schema: StructType, other: StructType)(tf: (Seq[String], StructField, Option[StructField], Resolver) ⇒ StructField): StructType

    Transforms (nested) columns in schema by walking down schema and other simultaneously. This allows comparing the two schemas and transforming schema based on the comparison. Columns or fields present only in other are ignored, while None is passed to the transform function for columns or fields missing in other.

    schema

    Schema to transform.

    other

    Schema to compare with.

    tf

    Function to apply. The function arguments are the full path of the current field to transform, the current field in schema and, if present, the corresponding field in other.

  27. def transformColumns[T <: DataType](schema: T)(tf: (Seq[String], StructField, Resolver) ⇒ StructField): T

    Transforms (nested) columns in a schema.

    schema

    the schema to transform.

    tf

    the function to apply.

    returns

    the transformed schema.
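The single-schema variant can be sketched on simplified stand-in types. This is an illustration only: the real function also passes a name Resolver, descends into array and map types, and the exact path passed to tf may differ from this sketch (here tf receives the path of the enclosing struct):

```scala
// Simplified stand-ins for Spark's StructField/StructType, for illustration only.
sealed trait DT
case object LeafT extends DT
case class Field(name: String, dataType: DT)
case class StructT(fields: Seq[Field]) extends DT

// Sketch: apply tf to every (possibly nested) field, tracking the parent path.
def transformColumns(schema: StructT)(tf: (Seq[String], Field) => Field): StructT = {
  def go(prefix: Seq[String], s: StructT): StructT =
    StructT(s.fields.map { f =>
      val t = tf(prefix, f)
      t.dataType match {
        case inner: StructT => t.copy(dataType = go(prefix :+ t.name, inner))
        case _              => t
      }
    })
  go(Nil, schema)
}

// Example: upper-case every field name, however deeply nested.
val s = StructT(Seq(Field("a", StructT(Seq(Field("x", LeafT)))), Field("b", LeafT)))
val out = transformColumns(s)((_, f) => f.copy(name = f.name.toUpperCase))
```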

  28. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  30. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
