
object SchemaUtils

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val DELTA_COL_RESOLVER: (String, String) ⇒ Boolean
  5. def addColumn(schema: StructType, column: StructField, position: Seq[Int]): StructType

    Add a column at the specified position in the schema.

    position

    A Seq of ordinals giving the position where this column should go. It is a Seq to denote positions in nested columns (0-based). For example, given tableSchema: <a:STRUCT<a1,a2,a3>, b, c:STRUCT<c1,c3>>, column: c2 and position: Seq(2, 1), the result is <a:STRUCT<a1,a2,a3>, b, c:STRUCT<c1,c2,c3>>.
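
    The position semantics can be sketched with a toy schema model (Node, Leaf, Struct and insertAt below are illustrative stand-ins, not Delta's API, which operates on Spark StructTypes):

```scala
// Toy model: a schema node is either a leaf column or a struct of children.
sealed trait Node
case class Leaf(name: String) extends Node
case class Struct(name: String, children: Vector[Node]) extends Node

// Walk the ordinals in `position`; insert `column` at the final ordinal.
def insertAt(node: Struct, column: Node, position: Seq[Int]): Struct =
  position match {
    case Seq(i) =>
      node.copy(children = node.children.patch(i, Seq(column), 0))
    case i +: rest =>
      val inner = node.children(i).asInstanceOf[Struct]
      node.copy(children = node.children.updated(i, insertAt(inner, column, rest)))
  }

// tableSchema: <a:STRUCT<a1,a2,a3>, b, c:STRUCT<c1,c3>>
val table = Struct("root", Vector(
  Struct("a", Vector(Leaf("a1"), Leaf("a2"), Leaf("a3"))),
  Leaf("b"),
  Struct("c", Vector(Leaf("c1"), Leaf("c3")))))

// Seq(2, 1): descend into field #2 ("c"), insert at ordinal 1 -> <c1,c2,c3>
val result = insertAt(table, Leaf("c2"), Seq(2, 1))
```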

  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def canChangeDataType(from: DataType, to: DataType, resolver: Resolver, columnPath: Seq[String] = Seq.empty): Option[String]

    Check whether the data type from can be changed to the data type to.

    returns

    None if the data types can be changed, otherwise Some(err) containing the reason.

  8. def changeDataType(from: DataType, to: DataType, resolver: Resolver): DataType

    Copy the nested data type between two data types.

  9. def checkColumnNameDuplication(schema: StructType, colType: String): Unit

    Checks if the input column names have duplicate identifiers. This throws an exception if any duplication exists.

    schema

    the schema to check for duplicates

    colType

    column type name, used in an exception message
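
    A minimal sketch of case-insensitive duplicate detection (duplicateColumns and the IllegalArgumentException below are illustrative assumptions; the real method raises a Spark analysis error):

```scala
// Group names case-insensitively; any group of size > 1 is a duplicate.
def duplicateColumns(names: Seq[String]): Seq[String] =
  names.groupBy(_.toLowerCase)
       .collect { case (_, vs) if vs.size > 1 => vs.head }
       .toSeq

def checkColumnNameDuplication(names: Seq[String], colType: String): Unit = {
  val dups = duplicateColumns(names)
  if (dups.nonEmpty)
    throw new IllegalArgumentException(
      s"Found duplicate column(s) in $colType: ${dups.mkString(", ")}")
}

duplicateColumns(Seq("a", "B", "b", "c"))  // -> Seq("B"): "B" and "b" collide
```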

  10. def checkFieldNames(names: Seq[String]): Unit

    Verifies that the column names are acceptable to Parquet, and hence to Delta. Parquet doesn't accept the characters ' ,;{}()\n\t', so we ensure that neither the data columns nor the partition columns contain these characters.
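
    The character check can be sketched as follows (invalidFieldNames is a hypothetical helper that reports offenders instead of throwing):

```scala
// Characters Parquet rejects in field names, per the description above.
val invalidChars = " ,;{}()\n\t".toSet

// Return the offending names rather than throwing, for illustration.
def invalidFieldNames(names: Seq[String]): Seq[String] =
  names.filter(name => name.exists(invalidChars.contains))

invalidFieldNames(Seq("id", "event time", "tags;raw"))
// -> Seq("event time", "tags;raw")
```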

  11. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  12. def dropColumn(schema: StructType, position: Seq[Int]): (StructType, StructField)

    Drop the column at the specified position in the schema and return the new schema along with the dropped column.

    position

    A Seq of ordinals locating the column to drop. It is a Seq to denote positions in nested columns (0-based). For example, given tableSchema: <a:STRUCT<a1,a2,a3>, b, c:STRUCT<c1,c2,c3>> and position: Seq(2, 1), the result is <a:STRUCT<a1,a2,a3>, b, c:STRUCT<c1,c3>>.

  13. def dropNullTypeColumns(df: DataFrame): DataFrame

    Drops null types from the DataFrame if they exist. We don't have easy ways of generating types such as MapType and ArrayType, therefore if these types contain NullType in their elements, we will throw an AnalysisException.

  14. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  15. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  16. def explodeNestedFieldNames(schema: StructType): Seq[String]

    Returns all column names in this schema as a flat list. For example, a schema like:

      |-- a
      |   |-- 1
      |   |-- 2
      |-- b
      |-- c
      |   |-- nest
      |   |   |-- 3

    will get flattened to: "a", "a.1", "a.2", "b", "c", "c.nest", "c.nest.3"
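
    The flattening can be sketched on a toy schema model (DType, StructT and flatten below are illustrative, not the Spark types the real method uses):

```scala
// Toy model: a type is either atomic or a struct of named fields.
sealed trait DType
case object Atomic extends DType
case class StructT(fields: Seq[(String, DType)]) extends DType

// Emit each field's dotted path, then recurse into nested structs.
def flatten(t: StructT, prefix: Seq[String] = Nil): Seq[String] =
  t.fields.flatMap { case (name, dt) =>
    val path = prefix :+ name
    path.mkString(".") +: (dt match {
      case s: StructT => flatten(s, path)
      case Atomic     => Nil
    })
  }

val schema = StructT(Seq(
  "a" -> StructT(Seq("1" -> Atomic, "2" -> Atomic)),
  "b" -> Atomic,
  "c" -> StructT(Seq("nest" -> StructT(Seq("3" -> Atomic))))))

flatten(schema)
// -> Seq("a", "a.1", "a.2", "b", "c", "c.nest", "c.nest.3")
```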

  17. def filterRecursively(schema: StructType, checkComplexTypes: Boolean)(f: (StructField) ⇒ Boolean): Seq[(Seq[String], StructField)]

    Finds StructFields that match a given check f. Returns the path to the column, and the field.

    checkComplexTypes

    While StructType is also a complex type, since we're returning StructFields, we definitely recurse into StructTypes. This flag defines whether we should recurse into ArrayType and MapType.

  18. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  19. def findColumnPosition(column: Seq[String], schema: StructType, resolver: Resolver = DELTA_COL_RESOLVER): (Seq[Int], Int)

    Returns the given column's ordinal within the given schema, together with the number of fields in the innermost struct that was searched. The length of the returned position list reflects how deeply nested the column is.

    For ArrayType: accessing the array's element adds a position 0 to the position list. e.g. accessing a.element.y would have the result -> Seq(..., positionOfA, 0, positionOfY)

    For MapType: accessing the map's key adds a position 0 to the position list. e.g. accessing m.key.y would have the result -> Seq(..., positionOfM, 0, positionOfY)

    For MapType: accessing the map's value adds a position 1 to the position list. e.g. accessing m.value.y would have the result -> Seq(..., positionOfM, 1, positionOfY)

    column

    The column to search for in the given struct. If the length of column is greater than 1, we expect to enter a nested field.

    schema

    The current struct we are looking at.

    resolver

    The resolver to find the column.
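
    A struct-only sketch of the resolution (findPosition, DType and StructT are illustrative; the real method also assigns the array/map ordinals 0 and 1 described above, and uses the resolver for name matching):

```scala
sealed trait DType
case object Atomic extends DType
case class StructT(fields: Vector[(String, DType)]) extends DType

// Resolve a nested column path to ordinals, plus the field count of the
// innermost struct that was searched.
def findPosition(column: Seq[String], schema: StructT): (Seq[Int], Int) =
  column match {
    case Seq(name) =>
      (Seq(schema.fields.indexWhere(_._1.equalsIgnoreCase(name))), schema.fields.size)
    case name +: rest =>
      val i = schema.fields.indexWhere(_._1.equalsIgnoreCase(name))
      val (innerPos, size) = findPosition(rest, schema.fields(i)._2.asInstanceOf[StructT])
      (i +: innerPos, size)
  }

val schema = StructT(Vector(
  "a" -> StructT(Vector("a1" -> Atomic, "a2" -> Atomic, "a3" -> Atomic)),
  "b" -> Atomic))

// a.a2 lives at Seq(0, 1); the innermost struct searched has 3 fields
findPosition(Seq("a", "a2"), schema)  // -> (Seq(0, 1), 3)
```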

  20. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  21. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  22. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  23. def isReadCompatible(existingSchema: StructType, readSchema: StructType): Boolean

    As the Delta snapshots update, the schema may change as well. This method defines whether the new schema of a Delta table can be used with a previously analyzed LogicalPlan. Our rules are to return false if the new schema:

    • Drops any column that was present in the DataFrame schema
    • Converts nullable=false to nullable=true for any column
    • Changes any datatype
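
    The three rules can be sketched on flat schemas (Col and readCompatible are simplified stand-ins for the real StructType-based logic):

```scala
case class Col(name: String, dataType: String, nullable: Boolean)

// `read` is the previously analyzed schema; `existing` is the new snapshot.
def readCompatible(existing: Seq[Col], read: Seq[Col]): Boolean =
  read.forall { r =>
    existing.find(_.name == r.name) match {
      case None    => false                 // column was dropped
      case Some(e) =>
        e.dataType == r.dataType &&         // no datatype change
        !(e.nullable && !r.nullable)        // no nullable=false -> true flip
    }
  }

readCompatible(Seq(Col("a", "int", nullable = true)),
               Seq(Col("a", "int", nullable = false)))  // false: nullability widened
```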
  24. def mergeSchemas(tableSchema: StructType, dataSchema: StructType): StructType

    Check whether we can write to the Delta table, which has tableSchema, using a query that has dataSchema. Our rules are that:

    • dataSchema may be missing columns or have additional columns
    • We don't trust the nullability in dataSchema; fields are assumed to be nullable
    • We only allow nested StructType expansions; for all other complex types, we check for strict equality
    • dataSchema can't have duplicate column names; columns that only differ by case are also not allowed

    The following merging strategy is applied:

    • The name of the current field is used
    • The data types are merged by calling this function
    • We respect the current field's nullability
    • The metadata is the current field's metadata

    Schema merging occurs in a case-insensitive manner, so column names that only differ by case are not accepted in the dataSchema.
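
    A simplified sketch of the merge on flat schemas (Col and mergeFlat are toy stand-ins; the real method recurses into nested StructTypes and raises analysis errors rather than require failures):

```scala
case class Col(name: String, dataType: String, nullable: Boolean)

def mergeFlat(table: Seq[Col], data: Seq[Col]): Seq[Col] = {
  val dataByName = data.map(c => c.name.toLowerCase -> c).toMap
  require(dataByName.size == data.size,
    "duplicate column names (case-insensitive) in dataSchema")
  // Keep each table column's name, nullability and metadata; types must match.
  table.foreach { t =>
    dataByName.get(t.name.toLowerCase).foreach { d =>
      require(d.dataType == t.dataType, s"cannot change type of ${t.name}")
    }
  }
  // Columns only present in dataSchema are appended as new columns.
  val extras = data.filterNot(d => table.exists(_.name.equalsIgnoreCase(d.name)))
  table ++ extras
}

mergeFlat(
  Seq(Col("a", "int", nullable = false)),
  Seq(Col("A", "int", nullable = true), Col("b", "string", nullable = true)))
// -> Seq(Col("a", "int", false), Col("b", "string", true))
```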

  25. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  26. def normalizeColumnNames(baseSchema: StructType, data: Dataset[_]): DataFrame

    Rewrite the query field names according to the table schema. This method assumes that all schema validation checks have been made and that this is the last operation before writing into Delta.

  27. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  28. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  29. def prettyFieldName(columnPath: Seq[String]): String

    Pretty print the column path passed in.

  30. def reportDifferences(existingSchema: StructType, specifiedSchema: StructType): Seq[String]

    Compare an existing schema to a specified new schema and return a message describing the first difference found, if any:

    • different field name or datatype
    • different metadata
  31. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  32. def toString(): String
    Definition Classes
    AnyRef → Any
  33. def transformColumns[E](schema: StructType, input: Seq[(Seq[String], E)])(tf: (Seq[String], StructField, Seq[(Seq[String], E)]) ⇒ StructField): StructType

    Transform (nested) columns in a schema using the given path and parameter pairs. The transform function is only invoked when a field's path matches one of the input paths.

    E

    the type of the payload used for transforming fields.

    schema

    to transform

    input

    paths and parameter pairs. The paths point to fields we want to transform. The parameters will be passed to the transform function for a matching field.

    tf

    function to apply per matched field. This function takes the field path, the field itself and the input names and payload pairs that matched the field name. It should return a new field.

    returns

    the transformed schema.
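
    A flat-schema sketch of the path-matched transform (FieldT, StructT and this transformColumns are simplified stand-ins; the real method recurses into nested columns):

```scala
case class FieldT(name: String, comment: String)
case class StructT(fields: Seq[FieldT])

// input: (path, payload) pairs; tf runs only on fields whose path matches.
def transformColumns[E](schema: StructT, input: Seq[(Seq[String], E)])(
    tf: (Seq[String], FieldT, E) => FieldT): StructT =
  StructT(schema.fields.map { f =>
    input.find { case (path, _) => path == Seq(f.name) } match {
      case Some((path, payload)) => tf(path, f, payload)
      case None                  => f
    }
  })

val schema = StructT(Seq(FieldT("a", ""), FieldT("b", "")))
transformColumns(schema, Seq(Seq("a") -> "id column")) {
  (_, field, comment) => field.copy(comment = comment)
}
// -> StructT(Seq(FieldT("a", "id column"), FieldT("b", "")))
```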

  34. def transformColumns(schema: StructType)(tf: (Seq[String], StructField, Resolver) ⇒ StructField): StructType

    Transform (nested) columns in a schema.

    schema

    to transform.

    tf

    function to apply.

    returns

    the transformed schema.

  35. def transformColumnsStructs(schema: StructType, colName: String)(tf: (Seq[String], StructType, Resolver) ⇒ Seq[StructField]): StructType

    Transform (nested) columns in a schema. Runs the transform function on all nested StructTypes.

    schema

    to transform.

    tf

    function to apply on the StructType.

    returns

    the transformed schema.

  36. def typeAsNullable(dt: DataType): DataType

    Recursively marks the data type, including all nested column types, as nullable.
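
    The recursion can be sketched on a toy type model (DType, FieldT, StructT, ArrayT and asNullable are illustrative, not the Spark types the real method operates on):

```scala
sealed trait DType
case object IntT extends DType
case class FieldT(name: String, dataType: DType, nullable: Boolean)
case class StructT(fields: Seq[FieldT]) extends DType
case class ArrayT(element: DType, containsNull: Boolean) extends DType

// Force nullability at every nesting level.
def asNullable(dt: DType): DType = dt match {
  case StructT(fields) =>
    StructT(fields.map(f => f.copy(dataType = asNullable(f.dataType), nullable = true)))
  case ArrayT(elem, _) => ArrayT(asNullable(elem), containsNull = true)
  case other           => other
}

asNullable(StructT(Seq(FieldT("a", ArrayT(IntT, containsNull = false), nullable = false))))
// -> StructT(Seq(FieldT("a", ArrayT(IntT, true), true)))
```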

  37. def typeExistsRecursively(dt: DataType)(f: (DataType) ⇒ Boolean): Boolean

    Copied over from DataType for visibility reasons.

  38. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  39. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  40. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
