object SchemaUtils
Linear Supertypes: AnyRef, Any
Value Members
- final def !=(arg0: Any): Boolean (Definition Classes: AnyRef → Any)
- final def ##(): Int (Definition Classes: AnyRef → Any)
- final def ==(arg0: Any): Boolean (Definition Classes: AnyRef → Any)
- val DELTA_COL_RESOLVER: (String, String) ⇒ Boolean
- def addColumn(schema: StructType, column: StructField, position: Seq[Int]): StructType
Add column to the specified position in schema.
- position
A Seq of ordinals (0-based) giving where the column should go; it is a Seq so that positions in nested columns can be expressed. For example, given tableSchema: <a:STRUCT<a1,a2,a3>, b, c:STRUCT<c1,c3>>, column: c2, and position: Seq(2, 1), the result is <a:STRUCT<a1,a2,a3>, b, c:STRUCT<c1,c2,c3>>.
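The position semantics can be sketched on a toy model. Here `Field` is a hypothetical stand-in for Spark's StructField/StructType, not the real Delta API:

```scala
// Toy stand-in for StructField: a name plus optional nested fields.
case class Field(name: String, children: Vector[Field] = Vector.empty)

def addColumn(fields: Vector[Field], col: Field, position: Seq[Int]): Vector[Field] =
  position match {
    case Seq(i)   => fields.patch(i, Vector(col), 0)  // insert at ordinal i
    case i +: rest =>                                 // descend into the nested struct at i
      fields.updated(i, fields(i).copy(children = addColumn(fields(i).children, col, rest)))
    case _        => fields :+ col                    // empty position: append at the end
  }

// tableSchema: <a:STRUCT<a1,a2,a3>, b, c:STRUCT<c1,c3>>; insert c2 at Seq(2, 1)
val table = Vector(
  Field("a", Vector(Field("a1"), Field("a2"), Field("a3"))),
  Field("b"),
  Field("c", Vector(Field("c1"), Field("c3"))))
val withC2 = addColumn(table, Field("c2"), Seq(2, 1))
```

The head of the position list selects the field at the current level; the tail is resolved recursively inside that field's struct.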
- final def asInstanceOf[T0]: T0 (Definition Classes: Any)
- def canChangeDataType(from: DataType, to: DataType, resolver: Resolver, columnPath: Seq[String] = Seq.empty): Option[String]
Check whether the data type from can be changed to the data type to.
- returns
None if the data type can be changed, otherwise Some(err) containing the reason.
- def changeDataType(from: DataType, to: DataType, resolver: Resolver): DataType
Copy the nested data type between two data types.
- def checkColumnNameDuplication(schema: StructType, colType: String): Unit
Checks if input column names have duplicate identifiers, and throws an exception if any duplication exists.
- schema
the schema to check for duplicates
- colType
column type name, used in the exception message
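A hedged sketch of such a duplicate check on bare column names. Case-insensitive matching is an assumption here, and a plain IllegalArgumentException stands in for the Spark exception the real method throws:

```scala
// Toy duplicate-name check: group names case-insensitively and fail on repeats.
def checkColumnNameDuplication(names: Seq[String], colType: String): Unit = {
  val dups = names.groupBy(_.toLowerCase).collect { case (n, vs) if vs.size > 1 => n }
  if (dups.nonEmpty)
    throw new IllegalArgumentException(
      s"Found duplicate column(s) $colType: ${dups.mkString(", ")}")
}
```

The colType argument only feeds the error message, matching the parameter description above.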
- def checkFieldNames(names: Seq[String]): Unit
Verifies that the column names are acceptable to Parquet and hence to Delta. Parquet doesn't accept the characters ' ,;{}()\n\t'. We ensure that neither the data columns nor the partition columns contain these characters.
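The character rule above can be sketched directly; an IllegalArgumentException stands in for the Spark exception the real method throws:

```scala
// Reject any name containing a character Parquet does not accept.
def checkFieldNames(names: Seq[String]): Unit = {
  val invalidChars = " ,;{}()\n\t".toSet
  names.find(_.exists(invalidChars.contains)).foreach { bad =>
    throw new IllegalArgumentException(s"Attribute name '$bad' contains an invalid character")
  }
}
```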
- def clone(): AnyRef (protected[lang]; Definition Classes: AnyRef; Annotations: @throws( ... ) @native())
- def dropColumn(schema: StructType, position: Seq[Int]): (StructType, StructField)
Drop the column at the specified position in schema and return the new schema together with the original column.
- position
A Seq of ordinals (0-based) identifying the column to drop; it is a Seq so that positions in nested columns can be expressed. For example, given tableSchema: <a:STRUCT<a1,a2,a3>, b, c:STRUCT<c1,c2,c3>> and position: Seq(2, 1), the result is <a:STRUCT<a1,a2,a3>, b, c:STRUCT<c1,c3>>.
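A toy sketch of these position semantics, with a hypothetical `Field` type standing in for StructField (not the real API):

```scala
case class Field(name: String, children: Vector[Field] = Vector.empty)

def dropColumn(fields: Vector[Field], position: Seq[Int]): (Vector[Field], Field) =
  position match {
    case Seq(i)   => (fields.patch(i, Nil, 1), fields(i))  // remove and return ordinal i
    case i +: rest =>                                      // descend into the nested struct at i
      val (kept, dropped) = dropColumn(fields(i).children, rest)
      (fields.updated(i, fields(i).copy(children = kept)), dropped)
  }

// tableSchema: <a:STRUCT<a1,a2,a3>, b, c:STRUCT<c1,c2,c3>>; drop position Seq(2, 1)
val table = Vector(
  Field("a", Vector(Field("a1"), Field("a2"), Field("a3"))),
  Field("b"),
  Field("c", Vector(Field("c1"), Field("c2"), Field("c3"))))
val (newSchema, dropped) = dropColumn(table, Seq(2, 1))
```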
- def dropNullTypeColumns(df: DataFrame): DataFrame
Drops null types from the DataFrame if they exist. We don't have easy ways of generating types such as MapType and ArrayType, therefore if these types contain NullType in their elements, we will throw an AnalysisException.
- final def eq(arg0: AnyRef): Boolean (Definition Classes: AnyRef)
- def equals(arg0: Any): Boolean (Definition Classes: AnyRef → Any)
- def explodeNestedFieldNames(schema: StructType): Seq[String]
Returns all column names in this schema as a flat list. For example, a schema like:
- a
  - 1
  - 2
- b
- c
  - nest
    - 3
gets flattened to: "a", "a.1", "a.2", "b", "c", "c.nest", "c.nest.3"
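The flattening can be sketched on a toy model, with `Field` as a hypothetical stand-in for StructType's fields:

```scala
case class Field(name: String, children: Vector[Field] = Vector.empty)

// Depth-first walk: each field contributes its dotted path, then its children.
def explodeNestedFieldNames(fields: Vector[Field], prefix: String = ""): Seq[String] =
  fields.flatMap { f =>
    val path = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
    path +: explodeNestedFieldNames(f.children, path)
  }

val schema = Vector(
  Field("a", Vector(Field("1"), Field("2"))),
  Field("b"),
  Field("c", Vector(Field("nest", Vector(Field("3"))))))
val flat = explodeNestedFieldNames(schema)
```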
- def filterRecursively(schema: StructType, checkComplexTypes: Boolean)(f: (StructField) ⇒ Boolean): Seq[(Seq[String], StructField)]
Finds StructFields that match a given check f. Returns the path to the column, and the field.
- checkComplexTypes
While StructType is also a complex type, since we're returning StructFields, we always recurse into StructTypes. This flag defines whether we should also recurse into ArrayType and MapType.
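A struct-only sketch on a toy `Field` model (the real method can additionally descend into ArrayType and MapType when checkComplexTypes is set):

```scala
case class Field(name: String, children: Vector[Field] = Vector.empty)

// Collect every field matching f, paired with the path to its parent.
def filterRecursively(fields: Vector[Field])(f: Field => Boolean): Seq[(Seq[String], Field)] = {
  def go(path: Seq[String], fs: Vector[Field]): Seq[(Seq[String], Field)] =
    fs.flatMap { fld =>
      val hit = if (f(fld)) Seq((path, fld)) else Seq.empty
      hit ++ go(path :+ fld.name, fld.children)
    }
  go(Seq.empty, fields)
}

val schema = Vector(
  Field("a", Vector(Field("a1"), Field("a2"))),
  Field("b"))
// Find every field whose name ends in "1", with the path to its parent.
val hits = filterRecursively(schema)(_.name.endsWith("1"))
```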
- def finalize(): Unit (protected[lang]; Definition Classes: AnyRef; Annotations: @throws( classOf[java.lang.Throwable] ))
- def findColumnPosition(column: Seq[String], schema: StructType, resolver: Resolver = DELTA_COL_RESOLVER): (Seq[Int], Int)
Returns the given column's ordinals within the given schema, and the size of the last (innermost) schema visited. The returned position is as long as the column is deeply nested.
For ArrayType: accessing the array's element adds a position 0 to the position list. E.g. accessing a.element.y gives the result Seq(..., positionOfA, 0, positionOfY).
For MapType: accessing the map's key adds a position 0 to the position list. E.g. accessing m.key.y gives the result Seq(..., positionOfM, 0, positionOfY).
For MapType: accessing the map's value adds a position 1 to the position list. E.g. accessing m.value.y gives the result Seq(..., positionOfM, 1, positionOfY).
- column
The column to search for in the given struct. If the length of column is greater than 1, we expect to enter a nested field.
- schema
The current struct we are looking at.
- resolver
The resolver used to match column names.
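A struct-only sketch of the ordinal lookup on a toy model; the real method also handles the ArrayType/MapType 0/1 ordinals described above and returns the innermost struct's size alongside the position:

```scala
case class Field(name: String, children: Vector[Field] = Vector.empty)

def findColumnPosition(column: Seq[String], fields: Vector[Field]): Seq[Int] =
  column match {
    case Seq()      => Seq.empty
    case name +: rest =>
      // Case-insensitive matching, mirroring the default DELTA_COL_RESOLVER.
      val i = fields.indexWhere(_.name.equalsIgnoreCase(name))
      require(i >= 0, s"Couldn't find column $name")
      i +: findColumnPosition(rest, fields(i).children)
  }

val schema = Vector(
  Field("a", Vector(Field("a1"), Field("a2"), Field("a3"))),
  Field("b"),
  Field("c", Vector(Field("c1"), Field("c2"), Field("c3"))))
val pos = findColumnPosition(Seq("c", "C2"), schema)  // case-insensitive lookup
```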
- final def getClass(): Class[_] (Definition Classes: AnyRef → Any; Annotations: @native())
- def hashCode(): Int (Definition Classes: AnyRef → Any; Annotations: @native())
- final def isInstanceOf[T0]: Boolean (Definition Classes: Any)
- def isReadCompatible(existingSchema: StructType, readSchema: StructType): Boolean
As Delta snapshots update, the schema may change as well. This method defines whether the new schema of a Delta table can be used with a previously analyzed LogicalPlan. Our rules are to return false on:
- dropping any column that was present in the DataFrame schema
- converting nullable=false to nullable=true for any column
- any change of datatype
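A flat (non-nested) sketch of the three rules on a toy column model; the direction of the nullability check is an assumption here, and real schemas are nested:

```scala
case class Col(name: String, dataType: String, nullable: Boolean)

def isReadCompatible(existing: Seq[Col], read: Seq[Col]): Boolean =
  read.forall { r =>
    existing.find(_.name == r.name) match {
      case None    => false                      // column was dropped
      case Some(e) =>
        e.dataType == r.dataType &&              // no datatype change
          !(e.nullable && !r.nullable)           // no nullable=false -> nullable=true flip
    }
  }

val read = Seq(Col("id", "long", false), Col("name", "string", true))
```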
- def mergeSchemas(tableSchema: StructType, dataSchema: StructType): StructType
Check whether we can write to the Delta table, which has tableSchema, using a query that has dataSchema, and return the merged schema. Our rules are that:
- dataSchema may be missing columns or have additional columns
- We don't trust the nullability in dataSchema; fields are assumed nullable
- We only allow nested StructType expansions; for all other complex types, we check for strict equality
- dataSchema can't have duplicate column names; columns that only differ by case are also not allowed
The following merging strategy is applied:
- The name of the current field is used.
- The data types are merged by calling this function recursively.
- We respect the current field's nullability.
- The metadata is the current field's metadata.
Schema merging occurs in a case-insensitive manner; hence, column names that only differ by case are not accepted in dataSchema.
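The merging strategy can be sketched flatly on a toy column model (no nested struct recursion, no metadata, and a plain require in place of Delta's real errors):

```scala
case class Col(name: String, dataType: String, nullable: Boolean)

def mergeSchemas(table: Seq[Col], data: Seq[Col]): Seq[Col] = {
  // dataSchema can't have duplicate column names, case-insensitively.
  require(data.map(_.name.toLowerCase).distinct.size == data.size,
    "dataSchema can't have duplicate column names (case-insensitive)")
  // Keep the current (table) field's name and nullability; types must match.
  val merged = table.map { t =>
    data.find(_.name.equalsIgnoreCase(t.name)) match {
      case Some(d) => require(t.dataType == d.dataType, s"type mismatch: ${t.name}"); t
      case None    => t                          // data may be missing columns
    }
  }
  // New columns from the data are appended, forced nullable.
  val extra = data.filterNot(d => table.exists(_.name.equalsIgnoreCase(d.name)))
  merged ++ extra.map(_.copy(nullable = true))
}

val tableSchema = Seq(Col("id", "long", false))
val dataSchema  = Seq(Col("ID", "long", true), Col("note", "string", false))
val merged = mergeSchemas(tableSchema, dataSchema)
```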
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def normalizeColumnNames(baseSchema: StructType, data: Dataset[_]): DataFrame
Rewrite the query field names according to the table schema. This method assumes that all schema validation checks have been made and this is the last operation before writing into Delta.
- final def notify(): Unit (Definition Classes: AnyRef; Annotations: @native())
- final def notifyAll(): Unit (Definition Classes: AnyRef; Annotations: @native())
- def prettyFieldName(columnPath: Seq[String]): String
Pretty-print the column path passed in.
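One plausible sketch of such pretty-printing; the joining and quoting rules here are assumptions, not the real implementation:

```scala
// Assumed behavior: join the path with dots, backquoting any part that itself
// contains a dot so the result stays unambiguous.
def prettyFieldName(columnPath: Seq[String]): String =
  columnPath.map(p => if (p.contains(".")) s"`$p`" else p).mkString(".")

val pretty = prettyFieldName(Seq("a", "b.c", "d"))
```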
- def reportDifferences(existingSchema: StructType, specifiedSchema: StructType): Seq[String]
Compare an existing schema to a specified new schema and return messages describing the differences found, if any:
- different field name or datatype
- different metadata
- final def synchronized[T0](arg0: ⇒ T0): T0 (Definition Classes: AnyRef)
- def toString(): String (Definition Classes: AnyRef → Any)
- def transformColumns[E](schema: StructType, input: Seq[(Seq[String], E)])(tf: (Seq[String], StructField, Seq[(Seq[String], E)]) ⇒ StructField): StructType
Transform (nested) columns in a schema using the given path and parameter pairs. The transform function is only invoked when a field's path matches one of the input paths.
- E
the type of the payload used for transforming fields
- schema
the schema to transform
- input
paths and parameter pairs. The paths point to fields we want to transform. The parameters will be passed to the transform function for a matching field.
- tf
function to apply per matched field. This function takes the field path, the field itself, and the input names and payload pairs that matched the field name. It should return a new field.
- returns
the transformed schema
- def transformColumns(schema: StructType)(tf: (Seq[String], StructField, Resolver) ⇒ StructField): StructType
Transform (nested) columns in a schema.
- schema
the schema to transform
- tf
the function to apply to each field
- returns
the transformed schema
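A sketch of such a recursive transform on a toy `Field` model; as in the real API, tf receives the path to the field's parent along with the field, though the Resolver argument is omitted here:

```scala
case class Field(name: String, children: Vector[Field] = Vector.empty)

def transformColumns(fields: Vector[Field])(tf: (Seq[String], Field) => Field): Vector[Field] = {
  def go(path: Seq[String], fs: Vector[Field]): Vector[Field] =
    fs.map { f =>
      val out = tf(path, f)                       // transform this field first...
      out.copy(children = go(path :+ out.name, out.children))  // ...then recurse
    }
  go(Seq.empty, fields)
}

val schema = Vector(Field("a", Vector(Field("a1"))), Field("b"))
// Upper-case every column name, at every nesting level.
val upper = transformColumns(schema)((_, f) => f.copy(name = f.name.toUpperCase))
```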
- def transformColumnsStructs(schema: StructType, colName: String)(tf: (Seq[String], StructType, Resolver) ⇒ Seq[StructField]): StructType
Transform (nested) columns in a schema by running the transform function on all nested StructTypes.
- schema
the schema to transform
- tf
the function to apply to each StructType
- returns
the transformed schema
- def typeAsNullable(dt: DataType): DataType
Recursively marks the data type, including all nested column types, as nullable.
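The recursion can be sketched on a toy data-type ADT (stand-ins for Spark's DataType/StructType, not the real classes):

```scala
sealed trait DType
case object IntType extends DType
// Each struct field is (name, type, nullable).
case class Struct(fields: Vector[(String, DType, Boolean)]) extends DType

// Force every field nullable, at every nesting level.
def typeAsNullable(dt: DType): DType = dt match {
  case Struct(fs) => Struct(fs.map { case (n, t, _) => (n, typeAsNullable(t), true) })
  case leaf       => leaf
}

val t = Struct(Vector(
  ("a", IntType, false),
  ("b", Struct(Vector(("b1", IntType, false))), false)))
val nullable = typeAsNullable(t)
```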
- def typeExistsRecursively(dt: DataType)(f: (DataType) ⇒ Boolean): Boolean
Copied over from DataType for visibility reasons.
- final def wait(): Unit (Definition Classes: AnyRef; Annotations: @throws( ... ))
- final def wait(arg0: Long, arg1: Int): Unit (Definition Classes: AnyRef; Annotations: @throws( ... ))
- final def wait(arg0: Long): Unit (Definition Classes: AnyRef; Annotations: @throws( ... ) @native())