
object SchemaUtils extends DeltaLogging

Linear Supertypes
DeltaLogging, DatabricksLogging, DeltaProgressReporter, LoggingShims, Logging, AnyRef, Any

Type Members

  1. implicit class LogStringContext extends AnyRef
    Definition Classes
    LoggingShims

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val DELTA_COL_RESOLVER: (String, String) ⇒ Boolean
  5. def addColumn[T <: DataType](parent: T, column: StructField, position: Seq[Int]): T

Add a column to the parent data type at the given nested position.

    parent

    The parent data type.

    column

    The column to add.

    position

    The position to add the column.
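A minimal sketch of the position-path convention, using hypothetical toy types rather than Spark's StructField/StructType (the real addColumn also handles arrays and maps):

```scala
// Sketch only: toy stand-ins for Spark's StructField/StructType, illustrating
// how a position path addresses nested fields. Not the Delta implementation.
object AddColumnSketch {
  case class Field(name: String, dataType: Any)
  case class Struct(fields: Vector[Field])

  // Insert `column` into `parent` at the nested `position` path.
  def addColumn(parent: Struct, column: Field, position: Seq[Int]): Struct =
    position match {
      case Seq(i) =>
        Struct((parent.fields.take(i) :+ column) ++ parent.fields.drop(i))
      case i +: rest =>
        val child = parent.fields(i).dataType.asInstanceOf[Struct]
        val updated = parent.fields(i).copy(dataType = addColumn(child, column, rest))
        Struct(parent.fields.updated(i, updated))
    }

  def main(args: Array[String]): Unit = {
    val schema = Struct(Vector(
      Field("a", "int"),
      Field("b", Struct(Vector(Field("c", "int"))))))
    // Add field "d" inside struct b, before c: position Seq(1, 0).
    val result = addColumn(schema, Field("d", "long"), Seq(1, 0))
    val b = result.fields(1).dataType.asInstanceOf[Struct]
    assert(b.fields.map(_.name) == Vector("d", "c"))
  }
}
```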

  6. def areLogicalNamesEqual(col1: Seq[String], col2: Seq[String]): Boolean
  7. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  8. def canChangeDataType(from: DataType, to: DataType, resolver: Resolver, columnMappingMode: DeltaColumnMappingMode, columnPath: Seq[String] = Nil, failOnAmbiguousChanges: Boolean = false, allowTypeWidening: Boolean = false): Option[String]

Checks whether the data type can be changed from from to to.

    failOnAmbiguousChanges

    Throw an error if a StructField both has columns dropped and new columns added. These are ambiguous changes, because we don't know if a column needs to be renamed, dropped, or added.

    allowTypeWidening

    Whether widening type changes as defined in TypeWidening can be applied.

    returns

    None if the data types can be changed, otherwise Some(err) containing the reason.

  9. def changeDataType(from: DataType, to: DataType, resolver: Resolver): DataType

    Copy the nested data type between two data types.

  10. def checkFieldNames(names: Seq[String]): Unit

Verifies that the column names are acceptable by Parquet and hence by Delta. Parquet doesn't accept the characters ' ,;{}()\n\t='. We ensure that neither the data columns nor the partition columns contain these characters.
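The character rule above can be sketched on plain strings (the real check walks a whole schema and throws a Delta-specific exception):

```scala
// Sketch only: the invalid-character rule described above, applied to plain
// strings rather than a full StructType.
object FieldNameCheckSketch {
  private val invalidChars: Set[Char] = " ,;{}()\n\t=".toSet

  def isValidFieldName(name: String): Boolean = !name.exists(invalidChars)

  def main(args: Array[String]): Unit = {
    assert(isValidFieldName("customer_id"))
    assert(!isValidFieldName("customer id")) // space
    assert(!isValidFieldName("a=b"))         // '='
  }
}
```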

  11. def checkForTimestampNTZColumnsRecursively(schema: StructType): Boolean

Returns true if any TimestampNTZ column exists in the table schema.

  12. def checkForVariantTypeColumnsRecursively(schema: StructType): Boolean

Returns true if any VariantType exists in the table schema.

  13. def checkSchemaFieldNames(schema: StructType, columnMappingMode: DeltaColumnMappingMode): Unit

    Check if the schema contains invalid char in the column names depending on the mode.

  14. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  15. def containsDependentExpression(spark: SparkSession, columnToChange: Seq[String], exprString: String, schema: StructType, resolver: Resolver): Boolean

Returns whether a column change (e.g., a rename) needs to be propagated to the expression. This is true when the column to change, or any of its descendant columns, is referenced by the expression. For example:

• a, length(a) -> true
• b, (b.c + 1) -> true, because renaming b to b1 requires changing the expression to (b1.c + 1).
• b.c, (cast b as string) -> true, because changing b.c to b.c1 affects the result of (cast b as string).
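The examples above reduce to a path-overlap rule, sketched here on column paths (hypothetical helper; the real method parses exprString against the schema with the resolver):

```scala
// Sketch only: a change to path `change` can affect an expression referencing
// path `referenced` when one path is a prefix of the other. Case-insensitive
// comparison stands in for the resolver.
object DependentExprSketch {
  def pathsOverlap(change: Seq[String], referenced: Seq[String]): Boolean = {
    val n = math.min(change.length, referenced.length)
    change.take(n).map(_.toLowerCase) == referenced.take(n).map(_.toLowerCase)
  }

  def main(args: Array[String]): Unit = {
    assert(pathsOverlap(Seq("a"), Seq("a")))      // length(a)
    assert(pathsOverlap(Seq("b"), Seq("b", "c"))) // (b.c + 1)
    assert(pathsOverlap(Seq("b", "c"), Seq("b"))) // (cast b as string)
    assert(!pathsOverlap(Seq("a"), Seq("b")))
  }
}
```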
  16. def deltaAssert(check: ⇒ Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit

Helper method to check invariants in Delta code. Fails when running in tests; otherwise records a delta assertion event and logs a warning.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  17. def dropColumn[T <: DataType](parent: T, position: Seq[Int]): (T, StructField)

Drop a column from the parent data type at the given nested position, returning the updated data type together with the dropped field.

    parent

    The parent data type.

    position

    The position to drop the column.

  18. def dropNullTypeColumns(schema: StructType): StructType

Drops null types from the schema if they exist. We do not recurse into Array and Map types, because we do not expect null types to exist in those columns, as Delta doesn't allow it during writes.

  19. def dropNullTypeColumns(df: DataFrame): DataFrame

Drops null types from the DataFrame if they exist. We don't have easy ways of generating types such as MapType and ArrayType, therefore if these types contain NullType in their elements, we will throw an AnalysisException.

  20. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  21. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  22. def fieldNameToColumn(field: String): Column

Converts a field name to a Column, quoting it with back-ticks.
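The quoting likely follows Spark's identifier-quoting convention: wrap the name in back-ticks and double any embedded back-tick. Treat the escaping rule as an assumption, not the exact Delta implementation:

```scala
// Sketch only: Spark-style identifier quoting. Embedded back-ticks are
// escaped by doubling them; dots stay literal inside the quotes.
object QuoteSketch {
  def quoteIdentifier(part: String): String =
    "`" + part.replace("`", "``") + "`"

  def main(args: Array[String]): Unit = {
    assert(quoteIdentifier("a.b") == "`a.b`")
    assert(quoteIdentifier("we`ird") == "`we``ird`")
  }
}
```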

  23. def fieldToColumn(field: StructField): Column
  24. def filterRecursively(schema: DataType, checkComplexTypes: Boolean)(f: (StructField) ⇒ Boolean): Seq[(Seq[String], StructField)]

Finds StructFields that match a given check f. Returns the path to the column, and the field.

    checkComplexTypes

    While StructType is also a complex type, since we're returning StructFields, we definitely recurse into StructTypes. This flag defines whether we should recurse into ArrayType and MapType.
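A minimal sketch of the recursion on a toy struct model (hypothetical types, not Spark's; the real method can also descend into arrays and maps when checkComplexTypes is set):

```scala
// Sketch only: recursive field filtering returning (path, field) pairs for
// every field where f holds, at any nesting depth.
object FilterRecursivelySketch {
  case class Field(name: String, dataType: Any)
  case class Struct(fields: Seq[Field])

  def filterRecursively(schema: Struct)(f: Field => Boolean): Seq[(Seq[String], Field)] =
    schema.fields.flatMap { field =>
      val here = if (f(field)) Seq((Seq(field.name), field)) else Nil
      val below = field.dataType match {
        case s: Struct =>
          filterRecursively(s)(f).map { case (path, hit) => (field.name +: path, hit) }
        case _ => Nil
      }
      here ++ below
    }

  def main(args: Array[String]): Unit = {
    val schema = Struct(Seq(
      Field("a", "int"),
      Field("b", Struct(Seq(Field("c", "long"))))))
    val longs = filterRecursively(schema)(_.dataType == "long")
    assert(longs.map(_._1) == Seq(Seq("b", "c")))
  }
}
```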

  25. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  26. def findAnyTypeRecursively(dt: DataType)(f: (DataType) ⇒ Boolean): Option[DataType]
  27. def findColumnPosition(column: Seq[String], schema: DataType, resolver: Resolver = DELTA_COL_RESOLVER): Seq[Int]

Returns the path of the given column in schema as a list of ordinals (0-based), each value representing the position at the current nesting level starting from the root.

For ArrayType: accessing the array's element adds a position 0 to the position list. e.g. accessing a.element.y would have the result -> Seq(..., positionOfA, 0, positionOfY)

For MapType: accessing the map's key adds a position 0 to the position list. e.g. accessing m.key.y would have the result -> Seq(..., positionOfM, 0, positionOfY)

For MapType: accessing the map's value adds a position 1 to the position list. e.g. accessing m.value.y would have the result -> Seq(..., positionOfM, 1, positionOfY)

    column

    The column to search for in the given struct. If the length of column is greater than 1, we expect to enter a nested field.

    schema

    The current struct we are looking at.

    resolver

    The resolver to find the column.
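The ordinal-path convention can be sketched on a struct-only toy model (hypothetical types, not Spark's; arrays and maps would add the extra 0/1 ordinals described above):

```scala
// Sketch only: resolve a multi-part column name to its ordinal path.
object FindPositionSketch {
  case class PathField(name: String, dataType: Any)
  case class PathStruct(fields: Seq[PathField])

  def findColumnPosition(column: Seq[String], schema: PathStruct): Seq[Int] =
    column match {
      case Nil => Nil
      case head +: rest =>
        // Case-insensitive match stands in for the resolver.
        val idx = schema.fields.indexWhere(_.name.equalsIgnoreCase(head))
        require(idx >= 0, s"Column $head not found")
        rest match {
          case Nil => Seq(idx)
          case _ =>
            val child = schema.fields(idx).dataType.asInstanceOf[PathStruct]
            idx +: findColumnPosition(rest, child)
        }
    }

  def main(args: Array[String]): Unit = {
    val schema = PathStruct(Seq(
      PathField("a", "int"),
      PathField("b", PathStruct(Seq(PathField("c", "int"), PathField("d", "int"))))))
    assert(findColumnPosition(Seq("b", "d"), schema) == Seq(1, 1))
  }
}
```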

  28. def findDependentGeneratedColumns(sparkSession: SparkSession, targetColumn: Seq[String], protocol: Protocol, schema: StructType): Map[String, String]

Find all the generated columns that depend on the given target column. Returns a map of generated column names to their corresponding expressions.

  29. def findInvalidColumnNamesInSchema(schema: StructType): Seq[String]

Finds columns with invalid names, i.e. names containing any of the ' ,;{}()\n\t=' characters.

  30. def findNestedFieldIgnoreCase(schema: StructType, fieldNames: Seq[String], includeCollections: Boolean = false): Option[StructField]

Copied verbatim from Apache Spark.

    Returns a field in this struct and its child structs, case insensitively. This is slightly less performant than the case sensitive version.

    If includeCollections is true, this will return fields that are nested in maps and arrays.

    fieldNames

    The path to the field, in order from the root. For example, the column nested.a.b.c would be Seq("nested", "a", "b", "c").

  31. def findNullTypeColumn(schema: StructType): Option[String]

    Returns the name of the first column/field that has null type (void).

  32. def findUndefinedTypes(dt: DataType): Seq[DataType]

    Recursively find all types not defined in Delta protocol but used in dt

  33. def findUnsupportedDataTypes(schema: StructType): Seq[UnsupportedDataTypeInfo]

Find the unsupported data types in a table schema. Returns all columns that are using unsupported data types. For example, findUnsupportedDataTypes(struct<a: struct<b: unsupported_type>>) returns the unsupported type together with its column path "a.b".

  34. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  35. def getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
    Definition Classes
    DeltaLogging
  36. def getErrorData(e: Throwable): Map[String, Any]
    Definition Classes
    DeltaLogging
  37. def getNestedFieldFromPosition(parent: StructField, position: Seq[Int]): StructField

Returns the nested field at the given position in parent. See findColumnPosition for the representation used for position.

    parent

    The field used for the lookup.

    position

    A list of ordinals (0-based) representing the path to the nested field in parent.

  38. def getNestedTypeFromPosition(schema: DataType, position: Seq[Int]): DataType

Returns the nested type at the given position in schema. See findColumnPosition for the representation used for position.

    position

A list of ordinals (0-based) representing the path to the nested type in schema.

  39. def getRawSchemaWithoutCharVarcharMetadata(schema: StructType): StructType

    Converts StringType to CHAR/VARCHAR if that is the true type as per the metadata and also strips this metadata from fields.

  40. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  41. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  42. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  43. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  44. def isPartitionCompatible(newPartitionColumns: Seq[String] = Seq.empty, oldPartitionColumns: Seq[String] = Seq.empty): Boolean

A helper function to check whether two sets of partition columns are the same. This function only compares partition column names; use it together with other schema checks to detect type changes and similar differences.

  45. def isReadCompatible(existingSchema: StructType, readSchema: StructType, forbidTightenNullability: Boolean = false, allowMissingColumns: Boolean = false, allowTypeWidening: Boolean = false, newPartitionColumns: Seq[String] = Seq.empty, oldPartitionColumns: Seq[String] = Seq.empty): Boolean

As the Delta snapshots update, the schema may change as well. This method defines whether the new schema of a Delta table can be used with a previously analyzed LogicalPlan. Our rules are to return false if:

• Dropping any column that was present in the existing schema, if not allowMissingColumns.
• Any change of datatype, if not allowTypeWidening; any non-widening change of datatype otherwise.
• Change of partition columns. Although the analyzed LogicalPlan is not changed, the physical structure of the data is changed and thus is considered not read compatible.
• If forbidTightenNullability = true:
  • Forbids tightening the nullability (existing nullable=true -> read nullable=false).
  • Typically used when the existing schema refers to the schema of written data, such as when a Delta streaming source reads a schema change (existingSchema) which has nullable=true, using the latest schema which has nullable=false; we should not project nulls from the data into the non-nullable read schema.
• Otherwise:
  • Forbids relaxing the nullability (existing nullable=false -> read nullable=true).
  • Typically used when the read schema refers to the schema of written data, such as during a Delta scan, where the latest schema during execution (readSchema) has nullable=true but the schema at analysis time (existingSchema) was nullable=false; we should not project nulls from the later data onto a non-nullable schema analyzed in the past.
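The nullability rule in isolation can be sketched as follows (hypothetical helper; the real check also compares field names, data types and partition columns):

```scala
// Sketch only: the isReadCompatible nullability rule for a single field pair.
object NullabilitySketch {
  def nullabilityCompatible(existingNullable: Boolean, readNullable: Boolean,
                            forbidTightenNullability: Boolean): Boolean =
    if (forbidTightenNullability) {
      // Forbid tightening: existing nullable=true -> read nullable=false.
      !(existingNullable && !readNullable)
    } else {
      // Forbid relaxing: existing nullable=false -> read nullable=true.
      !(!existingNullable && readNullable)
    }

  def main(args: Array[String]): Unit = {
    assert(!nullabilityCompatible(existingNullable = true, readNullable = false,
      forbidTightenNullability = true))
    assert(!nullabilityCompatible(existingNullable = false, readNullable = true,
      forbidTightenNullability = false))
    assert(nullabilityCompatible(existingNullable = true, readNullable = true,
      forbidTightenNullability = false))
  }
}
```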
  46. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  47. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  48. def logConsole(line: String): Unit
    Definition Classes
    DatabricksLogging
  49. def logDebug(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  50. def logDebug(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  51. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  52. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  53. def logError(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  54. def logError(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  55. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  56. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  57. def logInfo(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  58. def logInfo(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  59. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  60. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  61. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  62. def logTrace(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  63. def logTrace(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  64. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  65. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  66. def logWarning(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  67. def logWarning(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  68. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  69. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  70. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  71. def normalizeColumnNames(deltaLog: DeltaLog, baseSchema: StructType, data: Dataset[_]): DataFrame

Rewrite the query field names according to the table schema. This method assumes that all schema validation checks have been made and this is the last operation before writing into Delta.

  72. def normalizeColumnNamesInDataType(deltaLog: DeltaLog, sourceDataType: DataType, tableDataType: DataType, sourceParentFields: Seq[String], tableSchema: StructType): DataType

Recursively rewrite the query field names according to the table schema within nested data types.

    The same assumptions as in normalizeColumnNames are made.

    sourceDataType

    The data type that needs normalizing.

    tableDataType

    The normalization template from the table's schema.

    sourceParentFields

    The path (starting from the top level) to the nested field with sourceDataType.

    tableSchema

    The entire schema of the table.

    returns

    A normalized version of sourceDataType.

  73. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  74. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  75. def prettyFieldName(columnPath: Seq[String]): String

    Pretty print the column path passed in.

  76. def quoteIdentifier(part: String): String
  77. def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit

Used to record the occurrence of a single event or report detailed, operation specific statistics.

    path

    Used to log the path of the delta table when deltaLog is null.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  78. def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A

Used to report the duration as well as the success or failure of an operation on a deltaLog.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  79. def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A

Used to report the duration as well as the success or failure of an operation on a tahoePath.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  80. def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  81. def recordFrameProfile[T](group: String, name: String)(thunk: ⇒ T): T
    Attributes
    protected
    Definition Classes
    DeltaLogging
  82. def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: ⇒ S): S
    Definition Classes
    DatabricksLogging
  83. def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  84. def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  85. def recordUndefinedTypes(deltaLog: DeltaLog, schema: StructType): Unit

    Record all types not defined in Delta protocol but used in the schema.

  86. def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  87. def removeUnenforceableNotNullConstraints(schema: StructType, conf: SQLConf): StructType

Go through the schema to look for unenforceable NOT NULL constraints. By default we'll throw when they're encountered, but if this is suppressed through SQLConf they'll just be silently removed.

    Note that this should only be applied to schemas created from explicit user DDL - in other scenarios, the nullability information may be inaccurate and Delta should always coerce the nullability flag to true.

  88. def reportDifferences(existingSchema: StructType, specifiedSchema: StructType): Seq[String]

Compare an existing schema to a specified new schema and return a message describing the first difference found, if any:

    • different field name or datatype
    • different metadata
  89. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  90. def toString(): String
    Definition Classes
    AnyRef → Any
  91. def transformColumns[E](schema: StructType, input: Seq[(Seq[String], E)])(tf: (Seq[String], StructField, Seq[(Seq[String], E)]) ⇒ StructField): StructType

Transform (nested) columns in a schema using the given path and parameter pairs. The transform function is only invoked when a field's path matches one of the input paths.

    E

    the type of the payload used for transforming fields.

    schema

    to transform

    input

    paths and parameter pairs. The paths point to fields we want to transform. The parameters will be passed to the transform function for a matching field.

    tf

    function to apply per matched field. This function takes the field path, the field itself and the input names and payload pairs that matched the field name. It should return a new field.

    returns

    the transformed schema.
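The mechanics can be sketched on a toy struct model (hypothetical types; the sketch also simplifies tf's third parameter to just the matching payloads, whereas the real function passes path/payload pairs):

```scala
// Sketch only: path-targeted field transformation. tf runs only for fields
// whose full path matches one of the input paths.
object TransformColumnsSketch {
  case class TField(name: String, dataType: Any)
  case class TStruct(fields: Seq[TField])

  def transformColumns[E](schema: TStruct, input: Seq[(Seq[String], E)])(
      tf: (Seq[String], TField, Seq[E]) => TField): TStruct = {
    def go(struct: TStruct, prefix: Seq[String]): TStruct =
      TStruct(struct.fields.map { field =>
        val path = prefix :+ field.name
        val payloads = input.collect { case (p, e) if p == path => e }
        val transformed = if (payloads.nonEmpty) tf(path, field, payloads) else field
        transformed.dataType match {
          case s: TStruct => transformed.copy(dataType = go(s, path))
          case _          => transformed
        }
      })
    go(schema, Nil)
  }

  def main(args: Array[String]): Unit = {
    val schema = TStruct(Seq(
      TField("a", "int"),
      TField("b", TStruct(Seq(TField("c", "int"))))))
    // Rename nested field b.c to c2 via the payload.
    val renamed = transformColumns(schema, Seq((Seq("b", "c"), "c2")))(
      (_, field, names) => field.copy(name = names.head))
    assert(renamed.fields(1).dataType.asInstanceOf[TStruct].fields.head.name == "c2")
  }
}
```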

  92. def transformSchema(schema: StructType, colName: Option[String] = None)(tf: (Seq[String], DataType, Resolver) ⇒ DataType): StructType

Runs the transform function tf on all nested StructTypes, MapTypes and ArrayTypes in the schema. If colName is defined, the transform function is only applied to fields with the given name. There may be multiple matches if nested fields with the same name exist in the schema; it is the responsibility of the caller to check the full field path before transforming a field.

    schema

    to transform.

    colName

    Optional name to match for

    tf

    function to apply on the StructType.

    returns

    the transformed schema.

  93. def typeAsNullable(dt: DataType): DataType

Recursively turns data types nullable, including nested columns.

  94. def typeExistsRecursively(dt: DataType)(f: (DataType) ⇒ Boolean): Boolean

    Copied over from DataType for visibility reasons.

  95. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  96. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  97. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  98. def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T

Report a log to indicate some command is running.

    Definition Classes
    DeltaProgressReporter
