object SchemaUtils extends DeltaLogging
Linear Supertypes: DeltaLogging, DatabricksLogging, DeltaProgressReporter, LoggingShims, Logging, AnyRef, Any
Type Members
- implicit class LogStringContext extends AnyRef (Definition Classes: LoggingShims)
Value Members
- final def !=(arg0: Any): Boolean (Definition Classes: AnyRef → Any)
- final def ##(): Int (Definition Classes: AnyRef → Any)
- final def ==(arg0: Any): Boolean (Definition Classes: AnyRef → Any)
- val DELTA_COL_RESOLVER: (String, String) ⇒ Boolean
- def addColumn[T <: DataType](parent: T, column: StructField, position: Seq[Int]): T
  Adds a column inside the given parent data type at the specified position.
  - parent: The parent data type.
  - column: The column to add.
  - position: The position at which to add the column, as a list of ordinals.
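As an illustrative sketch (assuming the delta-spark artifact is on the classpath; the example schema is invented), the position list addresses nested fields by ordinal:

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.delta.schema.SchemaUtils

val schema = StructType(Seq(
  StructField("a", StructType(Seq(StructField("x", IntegerType)))),
  StructField("b", StringType)))

// Position Seq(0, 1) targets the struct at top-level ordinal 0 ("a")
// and inserts the new field at ordinal 1 inside it, i.e. after a.x.
val updated = SchemaUtils.addColumn(schema, StructField("y", LongType), Seq(0, 1))
```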
- def areLogicalNamesEqual(col1: Seq[String], col2: Seq[String]): Boolean
- final def asInstanceOf[T0]: T0 (Definition Classes: Any)
- def canChangeDataType(from: DataType, to: DataType, resolver: Resolver, columnMappingMode: DeltaColumnMappingMode, columnPath: Seq[String] = Nil, failOnAmbiguousChanges: Boolean = false, allowTypeWidening: Boolean = false): Option[String]
  Checks whether the data type from can be changed to the data type to.
  - failOnAmbiguousChanges: Throw an error if a StructField both has columns dropped and new columns added. These are ambiguous changes, because we don't know whether a column needs to be renamed, dropped, or added.
  - allowTypeWidening: Whether widening type changes, as defined in TypeWidening, can be applied.
  - returns: None if the data type can be changed, otherwise Some(err) containing the reason.
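A hedged sketch of a call (the Resolver here is a plain case-insensitive comparison, and NoMapping is assumed to be the default column mapping mode object in the org.apache.spark.sql.delta package; both are assumptions about the runtime environment):

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.delta.NoMapping
import org.apache.spark.sql.delta.schema.SchemaUtils

// Case-insensitive resolver, matching Spark's default analysis behavior.
val resolver: (String, String) => Boolean = _ equalsIgnoreCase _

// int -> long is a widening change: expected to be rejected with
// Some(reason) unless allowTypeWidening is set.
val verdict = SchemaUtils.canChangeDataType(
  IntegerType, LongType, resolver, NoMapping, allowTypeWidening = true)
```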
- def changeDataType(from: DataType, to: DataType, resolver: Resolver): DataType
  Copies the nested data type between two data types.
- def checkFieldNames(names: Seq[String]): Unit
  Verifies that the column names are acceptable to Parquet, and hence to Delta. Parquet doesn't accept the characters ' ,;{}()\n\t='. We ensure that neither the data columns nor the partition columns contain these characters.
- def checkForTimestampNTZColumnsRecursively(schema: StructType): Boolean
  Finds TimestampNTZ columns in the table schema.
- def checkForVariantTypeColumnsRecursively(schema: StructType): Boolean
  Returns true if any VariantType exists in the table schema.
- def checkSchemaFieldNames(schema: StructType, columnMappingMode: DeltaColumnMappingMode): Unit
  Checks whether the schema contains invalid characters in column names, depending on the column mapping mode.
- protected[lang] def clone(): AnyRef (Definition Classes: AnyRef; Annotations: @throws( ... ) @native())
- def containsDependentExpression(spark: SparkSession, columnToChange: Seq[String], exprString: String, schema: StructType, resolver: Resolver): Boolean
  Whether a column change (e.g. a rename) needs to be propagated to the expression. This is true when the column to change, or any of its descendant columns, is referenced by the expression. For example:
  - a, length(a) -> true
  - b, (b.c + 1) -> true, because renaming b to b1 will need to change the expr to (b1.c + 1).
  - b.c, (cast b as string) -> true, because changing b.c to b.c1 affects the (cast b as string) result.
- protected def deltaAssert(check: ⇒ Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit
  Helper method to check invariants in Delta code. Fails when running in tests; otherwise records a delta assertion event and logs a warning. (Definition Classes: DeltaLogging)
- def dropColumn[T <: DataType](parent: T, position: Seq[Int]): (T, StructField)
  Drops the column at the given position from the parent data type, returning the updated parent and the dropped column.
  - parent: The parent data type.
  - position: The position of the column to drop.
- def dropNullTypeColumns(schema: StructType): StructType
  Drops null types from the schema if they exist. We do not recurse into Array and Map types, because we do not expect null types to exist in those columns, as Delta doesn't allow it during writes.
- def dropNullTypeColumns(df: DataFrame): DataFrame
  Drops null types from the DataFrame if they exist. We don't have easy ways of generating types such as MapType and ArrayType, so if these types contain NullType in their elements, we throw an AnalysisException.
- final def eq(arg0: AnyRef): Boolean (Definition Classes: AnyRef)
- def equals(arg0: Any): Boolean (Definition Classes: AnyRef → Any)
- def fieldNameToColumn(field: String): Column
  Converts a field name to a Column, quoting it with backticks.
- def fieldToColumn(field: StructField): Column
- def filterRecursively(schema: DataType, checkComplexTypes: Boolean)(f: (StructField) ⇒ Boolean): Seq[(Seq[String], StructField)]
  Finds StructFields that match a given check f. Returns the path to the column and the field.
  - checkComplexTypes: While StructType is also a complex type, since we're returning StructFields we always recurse into StructTypes. This flag defines whether we should also recurse into ArrayType and MapType.
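For example, collecting every string-typed field in a nested schema might look like this (a sketch; the schema is invented):

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.delta.schema.SchemaUtils

val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("nested", StructType(Seq(StructField("tag", StringType))))))

// Recurse into arrays and maps too, and keep only StringType fields.
val stringFields = SchemaUtils.filterRecursively(schema, checkComplexTypes = true) {
  field => field.dataType == StringType
}
// Each result pairs the path to the column with the matching StructField.
```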
- protected[lang] def finalize(): Unit (Definition Classes: AnyRef; Annotations: @throws( classOf[java.lang.Throwable] ))
- def findAnyTypeRecursively(dt: DataType)(f: (DataType) ⇒ Boolean): Option[DataType]
- def findColumnPosition(column: Seq[String], schema: DataType, resolver: Resolver = DELTA_COL_RESOLVER): Seq[Int]
  Returns the path of the given column in schema as a list of ordinals (0-based), each value representing the position at the current nesting level, starting from the root.
  - For ArrayType: accessing the array's element adds a position 0 to the position list, e.g. accessing a.element.y yields Seq(..., positionOfA, 0, positionOfY).
  - For MapType: accessing the map's key adds a position 0 to the position list, e.g. accessing m.key.y yields Seq(..., positionOfM, 0, positionOfY).
  - For MapType: accessing the map's value adds a position 1 to the position list, e.g. accessing m.value.y yields Seq(..., positionOfM, 1, positionOfY).
  - column: The column to search for in the given struct. If the length of column is greater than 1, we expect to enter a nested field.
  - schema: The current struct we are looking at.
  - resolver: The resolver used to find the column.
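Following the map-value convention above, a lookup through a map's value could be sketched as (schema invented):

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.delta.schema.SchemaUtils

val schema = StructType(Seq(
  StructField("m", MapType(StringType,
    StructType(Seq(StructField("y", IntegerType)))))))

// m is at top-level ordinal 0, the map's value contributes ordinal 1,
// and y is at ordinal 0 inside the value struct, so the documented
// convention suggests Seq(0, 1, 0).
val pos = SchemaUtils.findColumnPosition(Seq("m", "value", "y"), schema)
```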
- def findDependentGeneratedColumns(sparkSession: SparkSession, targetColumn: Seq[String], protocol: Protocol, schema: StructType): Map[String, String]
  Finds all the generated columns that depend on the given target column. Returns a map from generated column names to their corresponding expressions.
- def findInvalidColumnNamesInSchema(schema: StructType): Seq[String]
  Finds columns with invalid names, i.e. names containing any of the ' ,;{}()\n\t=' characters.
- def findNestedFieldIgnoreCase(schema: StructType, fieldNames: Seq[String], includeCollections: Boolean = false): Option[StructField]
  Copied verbatim from Apache Spark. Returns a field in this struct and its child structs, case insensitively. This is slightly less performant than the case-sensitive version. If includeCollections is true, this will also return fields that are nested in maps and arrays.
  - fieldNames: The path to the field, in order from the root. For example, the column nested.a.b.c would be Seq("nested", "a", "b", "c").
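A minimal sketch of a case-insensitive lookup (schema invented):

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.delta.schema.SchemaUtils

val schema = StructType(Seq(
  StructField("Nested", StructType(Seq(StructField("A", IntegerType))))))

// Resolves despite the case mismatch between "nested"/"Nested" and "a"/"A".
val field = SchemaUtils.findNestedFieldIgnoreCase(schema, Seq("nested", "a"))
```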
- def findNullTypeColumn(schema: StructType): Option[String]
  Returns the name of the first column/field that has null type (void).
- def findUndefinedTypes(dt: DataType): Seq[DataType]
  Recursively finds all types not defined in the Delta protocol but used in dt.
- def findUnsupportedDataTypes(schema: StructType): Seq[UnsupportedDataTypeInfo]
  Finds the unsupported data types in a table schema and returns all columns that use them. For example, findUnsupportedDataTypes(struct<a: struct<b: unsupported_type>>) will return Some(unsupported_type, Some("a.b")).
- final def getClass(): Class[_] (Definition Classes: AnyRef → Any; Annotations: @native())
- def getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String] (Definition Classes: DeltaLogging)
- def getErrorData(e: Throwable): Map[String, Any] (Definition Classes: DeltaLogging)
- def getNestedFieldFromPosition(parent: StructField, position: Seq[Int]): StructField
  Returns the nested field at the given position in parent. See findColumnPosition for the representation used for position.
  - parent: The field used for the lookup.
  - position: A list of ordinals (0-based) representing the path to the nested field in parent.
- def getNestedTypeFromPosition(schema: DataType, position: Seq[Int]): DataType
  Returns the nested type at the given position in schema. See findColumnPosition for the representation used for position.
  - position: A list of ordinals (0-based) representing the path to the nested type in schema.
- def getRawSchemaWithoutCharVarcharMetadata(schema: StructType): StructType
  Converts StringType to CHAR/VARCHAR if that is the true type as per the metadata, and also strips this metadata from fields.
- def hashCode(): Int (Definition Classes: AnyRef → Any; Annotations: @native())
- protected def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean (Definition Classes: Logging)
- protected def initializeLogIfNecessary(isInterpreter: Boolean): Unit (Definition Classes: Logging)
- final def isInstanceOf[T0]: Boolean (Definition Classes: Any)
- def isPartitionCompatible(newPartitionColumns: Seq[String] = Seq.empty, oldPartitionColumns: Seq[String] = Seq.empty): Boolean
  A helper function to check whether the partition columns are the same. This function only checks partition column names; use it together with other schema checks to detect type changes, etc.
- def isReadCompatible(existingSchema: StructType, readSchema: StructType, forbidTightenNullability: Boolean = false, allowMissingColumns: Boolean = false, allowTypeWidening: Boolean = false, newPartitionColumns: Seq[String] = Seq.empty, oldPartitionColumns: Seq[String] = Seq.empty): Boolean
  As Delta snapshots update, the schema may change as well. This method defines whether the new schema of a Delta table can be used with a previously analyzed LogicalPlan. Our rules are to return false on:
  - Dropping any column that was present in the existing schema, unless allowMissingColumns is set.
  - Any change of data type if allowTypeWidening is not set; any non-widening change of data type otherwise.
  - Any change of partition columns. Although the analyzed LogicalPlan is not changed, the physical structure of the data changes and is thus considered not read compatible.
  - If forbidTightenNullability = true: forbids tightening the nullability (existing nullable = true -> read nullable = false). Typically used when the existing schema refers to the schema of written data, such as when a Delta streaming source reads a schema change (existingSchema) that has nullable = true using the latest schema that has nullable = false, so we should not project nulls from the data into the non-nullable read schema.
  - Otherwise: forbids relaxing the nullability (existing nullable = false -> read nullable = true). Typically used when the read schema refers to the schema of written data, such as during a Delta scan: the latest schema during execution (readSchema) has nullable = true, but during the analysis phase the schema (existingSchema) was nullable = false, so we should not project nulls from the later data onto a non-nullable schema analyzed in the past.
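The type-widening rule can be sketched as follows (schemas invented; whether int -> long passes with allowTypeWidening depends on the TypeWidening rules of the running Delta version):

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.delta.schema.SchemaUtils

val existing = StructType(Seq(
  StructField("id", IntegerType),
  StructField("v", StringType)))
val widened = StructType(Seq(
  StructField("id", LongType),
  StructField("v", StringType)))

// A data type change is read-incompatible by default...
val strict = SchemaUtils.isReadCompatible(existing, widened)
// ...but a widening change may be accepted when explicitly allowed.
val relaxed = SchemaUtils.isReadCompatible(existing, widened, allowTypeWidening = true)
```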
- protected def isTraceEnabled(): Boolean (Definition Classes: Logging)
- protected def log: Logger (Definition Classes: Logging)
- def logConsole(line: String): Unit (Definition Classes: DatabricksLogging)
- protected def logDebug(entry: LogEntry, throwable: Throwable): Unit (Definition Classes: LoggingShims)
- protected def logDebug(entry: LogEntry): Unit (Definition Classes: LoggingShims)
- protected def logDebug(msg: ⇒ String, throwable: Throwable): Unit (Definition Classes: Logging)
- protected def logDebug(msg: ⇒ String): Unit (Definition Classes: Logging)
- protected def logError(entry: LogEntry, throwable: Throwable): Unit (Definition Classes: LoggingShims)
- protected def logError(entry: LogEntry): Unit (Definition Classes: LoggingShims)
- protected def logError(msg: ⇒ String, throwable: Throwable): Unit (Definition Classes: Logging)
- protected def logError(msg: ⇒ String): Unit (Definition Classes: Logging)
- protected def logInfo(entry: LogEntry, throwable: Throwable): Unit (Definition Classes: LoggingShims)
- protected def logInfo(entry: LogEntry): Unit (Definition Classes: LoggingShims)
- protected def logInfo(msg: ⇒ String, throwable: Throwable): Unit (Definition Classes: Logging)
- protected def logInfo(msg: ⇒ String): Unit (Definition Classes: Logging)
- protected def logName: String (Definition Classes: Logging)
- protected def logTrace(entry: LogEntry, throwable: Throwable): Unit (Definition Classes: LoggingShims)
- protected def logTrace(entry: LogEntry): Unit (Definition Classes: LoggingShims)
- protected def logTrace(msg: ⇒ String, throwable: Throwable): Unit (Definition Classes: Logging)
- protected def logTrace(msg: ⇒ String): Unit (Definition Classes: Logging)
- protected def logWarning(entry: LogEntry, throwable: Throwable): Unit (Definition Classes: LoggingShims)
- protected def logWarning(entry: LogEntry): Unit (Definition Classes: LoggingShims)
- protected def logWarning(msg: ⇒ String, throwable: Throwable): Unit (Definition Classes: Logging)
- protected def logWarning(msg: ⇒ String): Unit (Definition Classes: Logging)
- final def ne(arg0: AnyRef): Boolean (Definition Classes: AnyRef)
- def normalizeColumnNames(deltaLog: DeltaLog, baseSchema: StructType, data: Dataset[_]): DataFrame
  Rewrites the query field names according to the table schema. This method assumes that all schema validation checks have been made and that this is the last operation before writing into Delta.
- def normalizeColumnNamesInDataType(deltaLog: DeltaLog, sourceDataType: DataType, tableDataType: DataType, sourceParentFields: Seq[String], tableSchema: StructType): DataType
  Recursively rewrites the query field names according to the table schema within nested data types. The same assumptions as in normalizeColumnNames are made.
  - sourceDataType: The data type that needs normalizing.
  - tableDataType: The normalization template from the table's schema.
  - sourceParentFields: The path (starting from the top level) to the nested field with sourceDataType.
  - tableSchema: The entire schema of the table.
  - returns: A normalized version of sourceDataType.
- final def notify(): Unit (Definition Classes: AnyRef; Annotations: @native())
- final def notifyAll(): Unit (Definition Classes: AnyRef; Annotations: @native())
- def prettyFieldName(columnPath: Seq[String]): String
  Pretty-prints the column path passed in.
- def quoteIdentifier(part: String): String
- protected def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
  Used to record the occurrence of a single event or to report detailed, operation-specific statistics. (Definition Classes: DeltaLogging)
  - path: Used to log the path of the delta table when deltaLog is null.
- protected def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
  Used to report the duration, as well as the success or failure, of an operation on a deltaLog. (Definition Classes: DeltaLogging)
- protected def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
  Used to report the duration, as well as the success or failure, of an operation on a tahoePath. (Definition Classes: DeltaLogging)
- def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit (Definition Classes: DatabricksLogging)
- protected def recordFrameProfile[T](group: String, name: String)(thunk: ⇒ T): T (Definition Classes: DeltaLogging)
- def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: ⇒ S): S (Definition Classes: DatabricksLogging)
- def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit (Definition Classes: DatabricksLogging)
- def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit (Definition Classes: DatabricksLogging)
- def recordUndefinedTypes(deltaLog: DeltaLog, schema: StructType): Unit
  Records all types not defined in the Delta protocol but used in the schema.
- def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit (Definition Classes: DatabricksLogging)
- def removeUnenforceableNotNullConstraints(schema: StructType, conf: SQLConf): StructType
  Goes through the schema to look for unenforceable NOT NULL constraints. By default we throw when they are encountered, but if this is suppressed through SQLConf they are silently removed instead.
  Note that this should only be applied to schemas created from explicit user DDL; in other scenarios, the nullability information may be inaccurate, and Delta should always coerce the nullability flag to true.
- def reportDifferences(existingSchema: StructType, specifiedSchema: StructType): Seq[String]
  Compares an existing schema to a specified new schema and returns a message describing the first difference found, if any:
  - a different field name or data type
  - different metadata
- final def synchronized[T0](arg0: ⇒ T0): T0 (Definition Classes: AnyRef)
- def toString(): String (Definition Classes: AnyRef → Any)
- def transformColumns[E](schema: StructType, input: Seq[(Seq[String], E)])(tf: (Seq[String], StructField, Seq[(Seq[String], E)]) ⇒ StructField): StructType
  Transforms (nested) columns in a schema using the given path and parameter pairs. The transform function is only invoked when a field's path matches one of the input paths.
  - E: The type of the payload used for transforming fields.
  - schema: The schema to transform.
  - input: Paths and parameter pairs. The paths point to the fields we want to transform; the parameters are passed to the transform function for a matching field.
  - tf: The function to apply per matched field. It takes the field path, the field itself, and the input paths and payload pairs that matched the field name, and returns a new field.
  - returns: The transformed schema.
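As a sketch, attaching a comment (carried as the payload, with E = String) to a single nested field might look like this (schema and payload invented):

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.delta.schema.SchemaUtils

val schema = StructType(Seq(
  StructField("a", StructType(Seq(StructField("x", IntegerType))))))

// One target path, a.x, paired with a String payload.
val input = Seq((Seq("a", "x"), "measured in seconds"))

val result = SchemaUtils.transformColumns(schema, input) { (path, field, matched) =>
  // matched holds the input pairs that matched this field; use the payload.
  field.withComment(matched.head._2)
}
```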
- def transformSchema(schema: StructType, colName: Option[String] = None)(tf: (Seq[String], DataType, Resolver) ⇒ DataType): StructType
  Runs the transform function tf on all nested StructTypes, MapTypes and ArrayTypes in the schema. If colName is defined, the transform function is only applied to fields with the given name. There may be multiple matches if nested fields with the same name exist in the schema; it is the responsibility of the caller to check the full field path before transforming a field.
  - schema: The schema to transform.
  - colName: Optional field name to match.
  - tf: The function to apply on each StructType.
  - returns: The transformed schema.
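For instance, upper-casing every struct field name in a schema could be sketched as follows (schema invented; non-struct types are passed through unchanged):

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.delta.schema.SchemaUtils

val schema = StructType(Seq(
  StructField("a", StructType(Seq(StructField("x", IntegerType))))))

val result = SchemaUtils.transformSchema(schema) { (path, dataType, resolver) =>
  dataType match {
    case struct: StructType =>
      // Rename every immediate field of this struct.
      StructType(struct.fields.map(f => f.copy(name = f.name.toUpperCase)))
    case other => other
  }
}
```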
- def typeAsNullable(dt: DataType): DataType
  Recursively turns the data types of nested columns into nullable ones.
- def typeExistsRecursively(dt: DataType)(f: (DataType) ⇒ Boolean): Boolean
  Copied over from DataType for visibility reasons.
- final def wait(): Unit (Definition Classes: AnyRef; Annotations: @throws( ... ))
- final def wait(arg0: Long, arg1: Int): Unit (Definition Classes: AnyRef; Annotations: @throws( ... ))
- final def wait(arg0: Long): Unit (Definition Classes: AnyRef; Annotations: @throws( ... ) @native())
- def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T
  Reports a log to indicate that some command is running. (Definition Classes: DeltaProgressReporter)