object DeltaColumnMapping extends DeltaColumnMappingBase
- DeltaColumnMapping
- DeltaColumnMappingBase
- DeltaLogging
- DatabricksLogging
- DeltaProgressReporter
- LoggingShims
- Logging
- AnyRef
- Any
Type Members
-
implicit
class
LogStringContext extends AnyRef
- Definition Classes
- LoggingShims
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
val
COLUMN_MAPPING_METADATA_ID_KEY: String
- Definition Classes
- DeltaColumnMappingBase
-
val
COLUMN_MAPPING_METADATA_KEYS: Set[String]
The list of column mapping metadata for each column in the schema.
- Definition Classes
- DeltaColumnMappingBase
-
val
COLUMN_MAPPING_METADATA_NESTED_IDS_KEY: String
- Definition Classes
- DeltaColumnMappingBase
-
val
COLUMN_MAPPING_METADATA_PREFIX: String
- Definition Classes
- DeltaColumnMappingBase
-
val
COLUMN_MAPPING_PHYSICAL_NAME_KEY: String
- Definition Classes
- DeltaColumnMappingBase
-
val
DELTA_INTERNAL_COLUMNS: Set[String]
This list of internal columns (and only this list) is allowed to have missing column mapping metadata such as field id and physical name because they might not be present in user's table schema.
These fields, if materialized to parquet, will always be matched by their display name in the downstream parquet reader even under column mapping modes.
For future developers who want to utilize additional internal columns without generating column mapping metadata, please add them here.
This list is case-insensitive.
- Attributes
- protected
- Definition Classes
- DeltaColumnMappingBase
-
val
PARQUET_FIELD_ID_METADATA_KEY: String
- Definition Classes
- DeltaColumnMappingBase
-
val
PARQUET_FIELD_NESTED_IDS_METADATA_KEY: String
- Definition Classes
- DeltaColumnMappingBase
-
val
PARQUET_LIST_ELEMENT_FIELD_NAME: String
- Definition Classes
- DeltaColumnMappingBase
-
val
PARQUET_MAP_KEY_FIELD_NAME: String
- Definition Classes
- DeltaColumnMappingBase
-
val
PARQUET_MAP_VALUE_FIELD_NAME: String
- Definition Classes
- DeltaColumnMappingBase
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
assignColumnIdAndPhysicalName(newMetadata: Metadata, oldMetadata: Metadata, isChangingModeOnExistingTable: Boolean, isOverwritingSchema: Boolean): Metadata
For each column/field in a Metadata's schema, assign an id using the current maximum id as the basis and incrementing from there, and assign a physical name using a UUID.
- newMetadata
The new metadata to assign Ids and physical names
- oldMetadata
The old metadata
- isChangingModeOnExistingTable
whether this is part of a commit that changes the mapping mode on an existing table
- returns
new metadata with Ids and physical names assigned
- Definition Classes
- DeltaColumnMappingBase
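The assignment rule described above can be illustrated with a minimal, Spark-free sketch. Everything here is hypothetical: `Field` is a flat stand-in for a `StructField` with column mapping metadata, `assign` is not the library method, and the `col-` prefix on the generated physical name is an assumption of the sketch.

```scala
import java.util.UUID

object AssignIdsSketch {
  // Hypothetical flat stand-in for a schema field; the real method works on
  // StructField metadata inside a Metadata action.
  final case class Field(
      name: String,
      id: Option[Long] = None,
      physicalName: Option[String] = None)

  // Fields that already carry an id or physical name keep them. Every other
  // field gets the next id after the running maximum and a UUID-based name.
  // Returns the updated fields together with the new maximum id.
  def assign(fields: Seq[Field], currentMaxId: Long): (Seq[Field], Long) =
    fields.foldLeft((Vector.empty[Field], currentMaxId)) { case ((acc, maxId), f) =>
      val id = f.id.getOrElse(maxId + 1)
      // The "col-" prefix is an assumption made for readability.
      val physical = f.physicalName.getOrElse("col-" + UUID.randomUUID().toString)
      (acc :+ f.copy(id = Some(id), physicalName = Some(physical)), math.max(maxId, id))
    }
}
```

Threading the maximum id through the fold keeps the function pure, so the caller decides where the new maximum is persisted.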
-
def
assignPhysicalName(field: StructField, physicalName: String): StructField
- Definition Classes
- DeltaColumnMappingBase
-
def
assignPhysicalNames(schema: StructType, reuseLogicalName: Boolean = false): StructType
- Definition Classes
- DeltaColumnMappingBase
-
def
checkColumnIdAndPhysicalNameAssignments(metadata: Metadata): Unit
Verify the metadata for valid column mapping metadata assignment. This is triggered for every commit as a last defense.
1. Ensure column mapping metadata is set for the appropriate mode
2. Ensure no duplicate column id/physical names set
3. Ensure max column id is in a good state (set, and greater than all field ids available)
- Definition Classes
- DeltaColumnMappingBase
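The three checks listed above can be sketched as follows. This is a simplified model, not the library code: `Field` and `check` are hypothetical, the sketch requires both an id and a physical name on every field (whereas the documented check depends on the mapping mode), and treating a maximum id equal to the largest field id as valid is an assumption.

```scala
object CheckAssignmentsSketch {
  // Hypothetical flat stand-in for a schema field with column mapping metadata.
  final case class Field(name: String, id: Option[Long], physicalName: Option[String])

  // Returns a description of every violated invariant; an empty result means valid.
  def check(fields: Seq[Field], maxColumnId: Option[Long]): Seq[String] = {
    // 1. Metadata must be set (simplified: both id and physical name are required).
    val missing = fields.collect {
      case f if f.id.isEmpty || f.physicalName.isEmpty =>
        s"missing column mapping metadata: ${f.name}"
    }
    // 2. No duplicate ids or physical names.
    val ids = fields.flatMap(_.id)
    val names = fields.flatMap(_.physicalName)
    val dupIds = if (ids.distinct.size != ids.size) Seq("duplicate column ids") else Nil
    val dupNames =
      if (names.distinct.size != names.size) Seq("duplicate physical names") else Nil
    // 3. The max column id must be set and must not be below any field id.
    val maxProblem = maxColumnId match {
      case None => Seq("max column id is not set")
      case Some(m) if ids.exists(_ > m) => Seq("max column id is smaller than a field id")
      case _ => Nil
    }
    missing ++ dupIds ++ dupNames ++ maxProblem
  }
}
```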
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
createPhysicalAttributes(output: Seq[Attribute], referenceSchema: StructType, columnMappingMode: DeltaColumnMappingMode): Seq[Attribute]
Create a list of physical attributes for the given attributes using the table schema as a reference.
- output
the list of attributes (potentially without any metadata)
- referenceSchema
the table schema with all the metadata
- columnMappingMode
column mapping mode of the delta table, which determines which metadata to fill in
- Definition Classes
- DeltaColumnMappingBase
-
def
createPhysicalSchema(schema: StructType, referenceSchema: StructType, columnMappingMode: DeltaColumnMappingMode, checkSupportedMode: Boolean = true): StructType
Create a physical schema for the given schema using the Delta table schema as a reference.
- schema
the given logical schema (potentially without any metadata)
- referenceSchema
the schema from the delta log, which has all the metadata
- columnMappingMode
column mapping mode of the delta table, which determines which metadata to fill in
- checkSupportedMode
whether we should check if the column mapping mode is supported
- Definition Classes
- DeltaColumnMappingBase
-
def
deltaAssert(check: ⇒ Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit
Helper method to check invariants in Delta code. Fails when running in tests, records a delta assertion event and logs a warning otherwise.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
dropColumnMappingMetadata(schema: StructType): StructType
- Definition Classes
- DeltaColumnMappingBase
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
filterColumnMappingProperties(properties: Map[String, String]): Map[String, String]
- Definition Classes
- DeltaColumnMappingBase
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
findMaxColumnId(schema: StructType): Long
- Definition Classes
- DeltaColumnMappingBase
-
def
generatePhysicalName: String
- Definition Classes
- DeltaColumnMappingBase
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getColumnId(field: StructField): Int
- Definition Classes
- DeltaColumnMappingBase
-
def
getColumnMappingMetadata(field: StructField, mode: DeltaColumnMappingMode): Metadata
Gets the required column metadata for each column based on the column mapping mode.
- Definition Classes
- DeltaColumnMappingBase
-
def
getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
- Definition Classes
- DeltaLogging
-
def
getErrorData(e: Throwable): Map[String, Any]
- Definition Classes
- DeltaLogging
-
def
getLogicalNameToPhysicalNameMap(schema: StructType): Map[Seq[String], Seq[String]]
Returns a map from the logical name paths to the physical name paths for the given schema. The logical name path is the result of splitting a multi-part identifier, and the physical name path is result of replacing all names in the logical name path with their physical names.
- Definition Classes
- DeltaColumnMappingBase
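To make the path mapping concrete, here is a small sketch on a hypothetical nested `Field` model (not Spark's `StructType`). Each logical path maps to the path obtained by replacing every segment with that field's physical name.

```scala
object NamePathSketch {
  // Hypothetical nested field: logical name, physical name, and child fields
  // (non-empty only for struct-like fields).
  final case class Field(logical: String, physical: String, children: Seq[Field] = Nil)

  // Walks the schema depth-first and pairs every logical name path with the
  // corresponding physical name path.
  def logicalToPhysical(fields: Seq[Field]): Map[Seq[String], Seq[String]] = {
    def walk(
        fs: Seq[Field],
        logicalPrefix: Seq[String],
        physicalPrefix: Seq[String]): Seq[(Seq[String], Seq[String])] =
      fs.flatMap { f =>
        val l = logicalPrefix :+ f.logical
        val p = physicalPrefix :+ f.physical
        (l -> p) +: walk(f.children, l, p)
      }
    walk(fields, Vector.empty, Vector.empty).toMap
  }
}
```

For a struct `s` with physical name `col-1` and a child `x` with physical name `col-2`, the logical path `Seq("s", "x")` maps to `Seq("col-1", "col-2")`.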
-
def
getNestedColumnIds(field: StructField): Metadata
- Definition Classes
- DeltaColumnMappingBase
-
def
getNestedColumnIdsAsLong(field: StructField): Iterable[Long]
- Definition Classes
- DeltaColumnMappingBase
-
def
getPhysicalName(field: StructField): String
- Definition Classes
- DeltaColumnMappingBase
-
def
getPhysicalNameFieldMap(schema: StructType): Map[Seq[String], StructField]
Returns a map of physicalNamePath -> field for the given schema, where physicalNamePath is the [$parentPhysicalName, ..., $fieldPhysicalName] list of physical names for every field (including nested) in the schema.
Must be called after checkColumnIdAndPhysicalNameAssignments, so that we know the schema is valid.
- Definition Classes
- DeltaColumnMappingBase
-
def
hasColumnId(field: StructField): Boolean
- Definition Classes
- DeltaColumnMappingBase
-
def
hasNestedColumnIds(field: StructField): Boolean
- Definition Classes
- DeltaColumnMappingBase
-
def
hasNoColumnMappingSchemaChanges(newMetadata: Metadata, oldMetadata: Metadata, allowUnsafeReadOnPartitionChanges: Boolean = false): Boolean
Compare the old metadata's schema with new metadata's schema for column mapping schema changes. Also check for repartition because we need to fail fast when repartition detected.
newMetadata's snapshot version must be >= oldMetadata's snapshot version so we could reliably detect the difference between ADD COLUMN and DROP COLUMN.
As of now, newMetadata is column mapping read compatible with oldMetadata if no rename column or drop column has happened in-between.
- Definition Classes
- DeltaColumnMappingBase
-
def
hasPhysicalName(field: StructField): Boolean
- Definition Classes
- DeltaColumnMappingBase
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
isColumnMappingUpgrade(oldMode: DeltaColumnMappingMode, newMode: DeltaColumnMappingMode): Boolean
- Definition Classes
- DeltaColumnMappingBase
-
def
isDropColumnOperation(newSchema: StructType, currentSchema: StructType): Boolean
- Definition Classes
- DeltaColumnMappingBase
-
def
isDropColumnOperation(newMetadata: Metadata, currentMetadata: Metadata): Boolean
Returns true if Column Mapping mode is enabled and the newMetadata's schema, when compared to the currentMetadata's schema, is indicative of a DROP COLUMN operation.
We detect DROP COLUMNS by checking if any physical name in currentSchema is missing in newSchema.
- Definition Classes
- DeltaColumnMappingBase
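The drop-detection rule can be sketched on a flat, hypothetical `Field` model. The sketch ignores nested fields and the check that column mapping is enabled, both of which the documented method takes into account.

```scala
object DropDetectionSketch {
  // Hypothetical flat stand-in for a field with its logical and physical names.
  final case class Field(logicalName: String, physicalName: String)

  // A drop is indicated when some physical name of the current schema
  // no longer appears in the new schema.
  def isDrop(newSchema: Seq[Field], currentSchema: Seq[Field]): Boolean = {
    val newPhysical = newSchema.map(_.physicalName).toSet
    currentSchema.exists(f => !newPhysical.contains(f.physicalName))
  }
}
```

Because the comparison uses physical names only, renaming a column or adding a new one does not register as a drop.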
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
isInternalField(field: StructField): Boolean
- Definition Classes
- DeltaColumnMappingBase
-
def
isRenameColumnOperation(newSchema: StructType, currentSchema: StructType): Boolean
- Definition Classes
- DeltaColumnMappingBase
-
def
isRenameColumnOperation(newMetadata: Metadata, currentMetadata: Metadata): Boolean
Returns true if Column Mapping mode is enabled and the newMetadata's schema, when compared to the currentMetadata's schema, is indicative of a RENAME COLUMN operation.
We detect RENAME COLUMNS by checking if any two columns with the same physical name have different logical names
- Definition Classes
- DeltaColumnMappingBase
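A matching sketch for rename detection, again on a flat, hypothetical `Field` model that leaves out nested fields and the column mapping mode check:

```scala
object RenameDetectionSketch {
  // Hypothetical flat stand-in for a field with its logical and physical names.
  final case class Field(logicalName: String, physicalName: String)

  // A rename is indicated when a physical name present in both schemas is
  // attached to different logical names.
  def isRename(newSchema: Seq[Field], currentSchema: Seq[Field]): Boolean = {
    val currentByPhysical = currentSchema.map(f => f.physicalName -> f.logicalName).toMap
    newSchema.exists { f =>
      currentByPhysical.get(f.physicalName).exists(_ != f.logicalName)
    }
  }
}
```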
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
-
def
logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
Used to record the occurrence of a single event or report detailed, operation specific statistics.
- path
Used to log the path of the delta table when deltaLog is null.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
Used to report the duration as well as the success or failure of an operation on a deltaLog.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
Used to report the duration as well as the success or failure of an operation on a tahoePath.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
-
def
recordFrameProfile[T](group: String, name: String)(thunk: ⇒ T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: ⇒ S): S
- Definition Classes
- DatabricksLogging
-
def
recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
-
def
recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
-
def
recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
-
def
renameColumns(schema: StructType): StructType
Recursively renames columns in the given schema to their physical names.
- Definition Classes
- DeltaColumnMappingBase
-
def
rewriteFieldIdsForIceberg(schema: StructType, startId: Long): (StructType, Long)
Adds the nested field IDs required by Iceberg.
In parquet, list-type columns have a nested, implicitly defined element field and map-type columns have implicitly defined key and value fields. By default, Spark does not write field IDs for these fields in the parquet files. However, Iceberg requires these *nested* field IDs to be present. This method rewrites the specified Spark schema to add those nested field IDs.
As list and map types are not StructFields themselves, nested field IDs are stored in a map as part of the metadata of the *nearest* parent StructField. For example, consider the following schema:
col1 ARRAY(INT)
col2 MAP(INT, INT)
col3 STRUCT(a INT, b ARRAY(STRUCT(c INT, d MAP(INT, INT))))
col1 is a list and so requires one nested field ID for the element field in parquet. This nested field ID will be stored in a map that is part of col1's StructField.metadata. The same applies to the nested field IDs for col2's implicit key and value fields. col3 itself is a Struct, consisting of an integer field and a list field named 'b'. The nested field ID for the list of 'b' is stored in b's StructField metadata. Finally, the list type itself is again a struct consisting of an integer field and a map field named 'd'. The nested field IDs for the map of 'd' are stored in d's StructField metadata.
- schema
The schema to which nested field IDs should be added
- startId
The first field ID to use for the nested field IDs
- Definition Classes
- DeltaColumnMappingBase
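The "nearest parent StructField" rule can be illustrated with a small, Spark-free model. All types here (`IntT`, `ArrayT`, `MapT`, `StructT`, `Field`) are hypothetical stand-ins, and the key format used in the nested-ID map (`<field>.element`, `<field>.key`, `<field>.value`) is an assumption of this sketch rather than the documented storage format.

```scala
object NestedIdSketch {
  // Hypothetical stand-ins for Spark's data types.
  sealed trait DataType
  case object IntT extends DataType
  final case class ArrayT(element: DataType) extends DataType
  final case class MapT(key: DataType, value: DataType) extends DataType
  final case class StructT(fields: Seq[Field]) extends DataType
  final case class Field(
      name: String,
      dataType: DataType,
      nestedIds: Map[String, Long] = Map.empty)

  // Rewrites every field of the struct and returns the next unused id.
  def rewrite(schema: StructT, startId: Long): (StructT, Long) = {
    val (fields, next) = schema.fields.foldLeft((Vector.empty[Field], startId)) {
      case ((acc, id), f) =>
        val (dt, ids, nextId) = walk(f.dataType, f.name, id)
        (acc :+ f.copy(dataType = dt, nestedIds = ids), nextId)
    }
    (StructT(fields), next)
  }

  // Returns the rewritten type, the nested ids that belong to the nearest
  // parent field, and the next unused id.
  private def walk(dt: DataType, path: String, id: Long): (DataType, Map[String, Long], Long) =
    dt match {
      case ArrayT(e) =>
        val (e2, ids, next) = walk(e, s"$path.element", id + 1)
        (ArrayT(e2), ids + (s"$path.element" -> id), next)
      case MapT(k, v) =>
        val (k2, kIds, afterKey) = walk(k, s"$path.key", id + 2)
        val (v2, vIds, next) = walk(v, s"$path.value", afterKey)
        (MapT(k2, v2), kIds ++ vIds + (s"$path.key" -> id) + (s"$path.value" -> (id + 1)), next)
      case s: StructT =>
        // A struct's fields own their nested ids, so nothing bubbles up from here.
        val (s2, next) = rewrite(s, id)
        (s2, Map.empty, next)
      case other => (other, Map.empty, id)
    }
}
```

Applied to the example schema above, `col1` would hold one ID for its element, `col2` would hold IDs for its key and value, `col3` itself would hold none, and the IDs for `b`'s list and `d`'s map would sit on `b` and `d` respectively.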
-
def
schemaHasColumnMappingMetadata(schema: StructType): Boolean
Returns whether the schema contains any metadata reserved for column mapping.
- Definition Classes
- DeltaColumnMappingBase
-
def
setPhysicalNames(schema: StructType, fieldPathToPhysicalName: Map[Seq[String], String]): StructType
Sets the physical name of each field based on its field path; fields whose path is not found in the map are skipped.
- Definition Classes
- DeltaColumnMappingBase
-
val
supportedModes: Set[DeltaColumnMappingMode]
- Definition Classes
- DeltaColumnMappingBase
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
def
verifyAndUpdateMetadataChange(spark: SparkSession, deltaLog: DeltaLog, oldProtocol: Protocol, oldMetadata: Metadata, newMetadata: Metadata, isCreatingNewTable: Boolean, isOverwriteSchema: Boolean): Metadata
If the table is already on the column mapping protocol, we block:
- changing column mapping config
otherwise, we block:
- upgrading to the column mapping Protocol through configurations
- Definition Classes
- DeltaColumnMappingBase
-
def
verifyInternalProperties(one: Map[String, String], two: Map[String, String]): Boolean
- Definition Classes
- DeltaColumnMappingBase
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T
Report a log to indicate some command is running.
- Definition Classes
- DeltaProgressReporter