
org.apache.spark.sql.delta

DeltaColumnMappingBase

trait DeltaColumnMappingBase extends DeltaLogging

Linear Supertypes

DeltaColumnMappingBase → DeltaLogging → DatabricksLogging → DeltaProgressReporter → LoggingShims → Logging → AnyRef → Any

Type Members

  1. implicit class LogStringContext extends AnyRef
    Definition Classes
    LoggingShims

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val COLUMN_MAPPING_METADATA_ID_KEY: String
  5. val COLUMN_MAPPING_METADATA_KEYS: Set[String]

    The list of column mapping metadata for each column in the schema.

  6. val COLUMN_MAPPING_METADATA_NESTED_IDS_KEY: String
  7. val COLUMN_MAPPING_METADATA_PREFIX: String
  8. val COLUMN_MAPPING_PHYSICAL_NAME_KEY: String
  9. val DELTA_INTERNAL_COLUMNS: Set[String]

    This list of internal columns (and only this list) is allowed to have missing column mapping metadata, such as field id and physical name, because they might not be present in the user's table schema.

    These fields, if materialized to parquet, will always be matched by their display name in the downstream parquet reader even under column mapping modes.

    For future developers who want to utilize additional internal columns without generating column mapping metadata, please add them here.

    This list is case-insensitive.

    Attributes
    protected
  10. val PARQUET_FIELD_ID_METADATA_KEY: String
  11. val PARQUET_FIELD_NESTED_IDS_METADATA_KEY: String
  12. val PARQUET_LIST_ELEMENT_FIELD_NAME: String
  13. val PARQUET_MAP_KEY_FIELD_NAME: String
  14. val PARQUET_MAP_VALUE_FIELD_NAME: String
  15. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  16. def assignColumnIdAndPhysicalName(newMetadata: Metadata, oldMetadata: Metadata, isChangingModeOnExistingTable: Boolean, isOverwritingSchema: Boolean): Metadata

    For each column/field in a Metadata's schema, assign an id using the current maximum id as the basis, incrementing from there, and assign a physical name using a UUID.

    newMetadata

    The new metadata to assign Ids and physical names

    oldMetadata

    The old metadata

    isChangingModeOnExistingTable

    whether this is part of a commit that changes the mapping mode on an existing table

    returns

    new metadata with Ids and physical names assigned
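
    The assignment scheme described above can be illustrated with a self-contained sketch (not the Delta implementation; FieldInfo and assignIdsAndPhysicalNames are hypothetical names, and the real method works on a full StructType): ids continue past the current maximum, and physical names are fresh UUIDs.

```scala
import java.util.UUID

final case class FieldInfo(logicalName: String, id: Int, physicalName: String)

// Assign ids by incrementing past the current maximum, and physical names
// via UUID; fields already present in the old metadata keep both.
def assignIdsAndPhysicalNames(newFields: Seq[String], existing: Seq[FieldInfo]): Seq[FieldInfo] = {
  val byName = existing.map(f => f.logicalName -> f).toMap
  var maxId = (0 +: existing.map(_.id)).max
  newFields.map { name =>
    byName.getOrElse(name, {
      maxId += 1
      FieldInfo(name, maxId, s"col-${UUID.randomUUID()}")
    })
  }
}

val assigned = assignIdsAndPhysicalNames(Seq("a", "b"), Seq(FieldInfo("a", 1, "col-a")))
// "a" keeps id 1 and "col-a"; "b" gets id 2 and a fresh UUID-based name.
```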

  17. def assignPhysicalName(field: StructField, physicalName: String): StructField
  18. def assignPhysicalNames(schema: StructType, reuseLogicalName: Boolean = false): StructType
  19. def checkColumnIdAndPhysicalNameAssignments(metadata: Metadata): Unit

    Verify the metadata for valid column mapping metadata assignment. This is triggered for every commit as a last line of defense.

    1. Ensure column mapping metadata is set for the appropriate mode.
    2. Ensure no duplicate column ids or physical names are set.
    3. Ensure the max column id is in a good state (set, and greater than all field ids available).
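
    The duplicate and max-id checks reduce to a few set comparisons. A hypothetical sketch (ColMeta and assignmentsValid are illustrative names, not the Delta API):

```scala
// Hypothetical sketch of the checks: unique ids, unique physical names,
// and a max column id covering every assigned field id.
final case class ColMeta(id: Int, physicalName: String)

def assignmentsValid(fields: Seq[ColMeta], maxColumnId: Int): Boolean = {
  val idsUnique   = fields.map(_.id).distinct.size == fields.size
  val namesUnique = fields.map(_.physicalName).distinct.size == fields.size
  val maxIdOk     = fields.forall(_.id <= maxColumnId)
  idsUnique && namesUnique && maxIdOk
}

val good = assignmentsValid(Seq(ColMeta(1, "p1"), ColMeta(2, "p2")), maxColumnId = 2)
val dup  = assignmentsValid(Seq(ColMeta(1, "p1"), ColMeta(1, "p2")), maxColumnId = 2)
// good == true, dup == false
```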

  20. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  21. def createPhysicalAttributes(output: Seq[Attribute], referenceSchema: StructType, columnMappingMode: DeltaColumnMappingMode): Seq[Attribute]

    Create a list of physical attributes for the given attributes using the table schema as a reference.

    output

    the list of attributes (potentially without any metadata)

    referenceSchema

    the table schema with all the metadata

    columnMappingMode

    column mapping mode of the delta table, which determines which metadata to fill in

  22. def createPhysicalSchema(schema: StructType, referenceSchema: StructType, columnMappingMode: DeltaColumnMappingMode, checkSupportedMode: Boolean = true): StructType

    Create a physical schema for the given schema using the Delta table schema as a reference.

    schema

    the given logical schema (potentially without any metadata)

    referenceSchema

    the schema from the delta log, which has all the metadata

    columnMappingMode

    column mapping mode of the delta table, which determines which metadata to fill in

    checkSupportedMode

    whether we should check if the column mapping mode is supported
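
    At its core this is a lookup of each logical name in the reference schema's metadata. A minimal sketch for a flat schema (toPhysicalNames is a hypothetical name; the real method handles nested StructTypes and metadata):

```scala
// Hypothetical sketch: replace each logical name with the physical name
// recorded for it in the reference schema (flat schema only).
def toPhysicalNames(logical: Seq[String], reference: Map[String, String]): Seq[String] =
  logical.map(n => reference.getOrElse(n, sys.error(s"$n not found in reference schema")))

val physical = toPhysicalNames(Seq("id", "value"), Map("id" -> "col-1", "value" -> "col-2"))
// physical == Seq("col-1", "col-2")
```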

  23. def deltaAssert(check: ⇒ Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit

    Helper method to check invariants in Delta code. Fails when running in tests, records a delta assertion event and logs a warning otherwise.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  24. def dropColumnMappingMetadata(schema: StructType): StructType
  25. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  26. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  27. def filterColumnMappingProperties(properties: Map[String, String]): Map[String, String]
  28. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  29. def findMaxColumnId(schema: StructType): Long
  30. def generatePhysicalName: String
  31. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  32. def getColumnId(field: StructField): Int
  33. def getColumnMappingMetadata(field: StructField, mode: DeltaColumnMappingMode): Metadata

    Gets the required column metadata for each column based on the column mapping mode.

  34. def getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
    Definition Classes
    DeltaLogging
  35. def getErrorData(e: Throwable): Map[String, Any]
    Definition Classes
    DeltaLogging
  36. def getLogicalNameToPhysicalNameMap(schema: StructType): Map[Seq[String], Seq[String]]

    Returns a map from the logical name paths to the physical name paths for the given schema. The logical name path is the result of splitting a multi-part identifier, and the physical name path is the result of replacing all names in the logical name path with their physical names.
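
    The path-to-path mapping can be sketched over a toy nested schema (Node, Leaf, Branch, and nameMap are hypothetical names standing in for StructType recursion):

```scala
// Toy nested "schema": each node carries a logical and a physical name.
sealed trait Node
final case class Leaf(name: String, physical: String) extends Node
final case class Branch(name: String, physical: String, children: Seq[Node]) extends Node

// Recursively pair each logical name path with its physical name path.
def nameMap(nodes: Seq[Node],
            logicalPrefix: Seq[String] = Nil,
            physicalPrefix: Seq[String] = Nil): Map[Seq[String], Seq[String]] =
  nodes.flatMap {
    case Leaf(n, p) =>
      Map((logicalPrefix :+ n) -> (physicalPrefix :+ p))
    case Branch(n, p, children) =>
      Map((logicalPrefix :+ n) -> (physicalPrefix :+ p)) ++
        nameMap(children, logicalPrefix :+ n, physicalPrefix :+ p)
  }.toMap

val m = nameMap(Seq(Branch("a", "pa", Seq(Leaf("b", "pb")))))
// m == Map(Seq("a") -> Seq("pa"), Seq("a", "b") -> Seq("pa", "pb"))
```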

  37. def getNestedColumnIds(field: StructField): Metadata
  38. def getNestedColumnIdsAsLong(field: StructField): Iterable[Long]
  39. def getPhysicalName(field: StructField): String
  40. def getPhysicalNameFieldMap(schema: StructType): Map[Seq[String], StructField]

    Returns a map of physicalNamePath -> field for the given schema, where physicalNamePath is the [$parentPhysicalName, ..., $fieldPhysicalName] list of physical names for every field (including nested) in the schema.

    Must be called after checkColumnIdAndPhysicalNameAssignments, so that we know the schema is valid.

  41. def hasColumnId(field: StructField): Boolean
  42. def hasNestedColumnIds(field: StructField): Boolean
  43. def hasNoColumnMappingSchemaChanges(newMetadata: Metadata, oldMetadata: Metadata, allowUnsafeReadOnPartitionChanges: Boolean = false): Boolean

    Compare the old metadata's schema with the new metadata's schema for column mapping schema changes. Also check for repartitioning, because we need to fail fast when a repartition is detected.

    newMetadata's snapshot version must be >= oldMetadata's snapshot version so that we can reliably detect the difference between ADD COLUMN and DROP COLUMN.

    As of now, newMetadata is column mapping read-compatible with oldMetadata if no column rename or drop has happened in between.

  44. def hasPhysicalName(field: StructField): Boolean
  45. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  46. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  47. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  48. def isColumnMappingUpgrade(oldMode: DeltaColumnMappingMode, newMode: DeltaColumnMappingMode): Boolean
  49. def isDropColumnOperation(newSchema: StructType, currentSchema: StructType): Boolean
  50. def isDropColumnOperation(newMetadata: Metadata, currentMetadata: Metadata): Boolean

    Returns true if Column Mapping mode is enabled and the newMetadata's schema, when compared to the currentMetadata's schema, is indicative of a DROP COLUMN operation.

    We detect DROP COLUMN by checking whether any physical name in currentSchema is missing in newSchema.
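
    The detection rule above is a set difference over physical names. A hedged sketch (looksLikeDropColumn is a hypothetical name; the real method extracts physical names from the metadata schemas):

```scala
// A column was dropped iff some physical name present in the current
// schema no longer appears in the new schema.
def looksLikeDropColumn(newPhysicalNames: Set[String], currentPhysicalNames: Set[String]): Boolean =
  (currentPhysicalNames -- newPhysicalNames).nonEmpty

val dropped   = looksLikeDropColumn(Set("p1"), Set("p1", "p2"))        // true: p2 is gone
val unchanged = looksLikeDropColumn(Set("p1", "p2"), Set("p1", "p2"))  // false
```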

  51. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  52. def isInternalField(field: StructField): Boolean
  53. def isRenameColumnOperation(newSchema: StructType, currentSchema: StructType): Boolean
  54. def isRenameColumnOperation(newMetadata: Metadata, currentMetadata: Metadata): Boolean

    Returns true if Column Mapping mode is enabled and the newMetadata's schema, when compared to the currentMetadata's schema, is indicative of a RENAME COLUMN operation.

    We detect RENAME COLUMN by checking whether any two columns with the same physical name have different logical names.
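
    Keyed by physical name, a rename shows up as the same physical name mapping to two different logical names. A hedged sketch (looksLikeRenameColumn is a hypothetical name):

```scala
// Both maps are physicalName -> logicalName; a rename is the same
// physical name carrying a different logical name in the new schema.
def looksLikeRenameColumn(newCols: Map[String, String],
                          currentCols: Map[String, String]): Boolean =
  currentCols.exists { case (physical, logical) =>
    newCols.get(physical).exists(_ != logical)
  }

val renamed = looksLikeRenameColumn(Map("p1" -> "b"), Map("p1" -> "a"))  // true
val same    = looksLikeRenameColumn(Map("p1" -> "a"), Map("p1" -> "a"))  // false
```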

  55. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  56. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  57. def logConsole(line: String): Unit
    Definition Classes
    DatabricksLogging
  58. def logDebug(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  59. def logDebug(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  60. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  61. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  62. def logError(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  63. def logError(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  64. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  65. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  66. def logInfo(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  67. def logInfo(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  68. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  69. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  70. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  71. def logTrace(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  72. def logTrace(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  73. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  74. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  75. def logWarning(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  76. def logWarning(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  77. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  78. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  79. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  80. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  81. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  82. def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit

    Used to record the occurrence of a single event or to report detailed, operation-specific statistics.

    path

    Used to log the path of the delta table when deltaLog is null.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  83. def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A

    Used to report the duration as well as the success or failure of an operation on a deltaLog.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  84. def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A

    Used to report the duration as well as the success or failure of an operation on a tahoePath.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  85. def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  86. def recordFrameProfile[T](group: String, name: String)(thunk: ⇒ T): T
    Attributes
    protected
    Definition Classes
    DeltaLogging
  87. def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: ⇒ S): S
    Definition Classes
    DatabricksLogging
  88. def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  89. def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  90. def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  91. def renameColumns(schema: StructType): StructType

    Recursively renames columns in the given schema with their physical schema.

  92. def rewriteFieldIdsForIceberg(schema: StructType, startId: Long): (StructType, Long)

    Adds the nested field IDs required by Iceberg.

    In parquet, list-type columns have a nested, implicitly defined element field and map-type columns have implicitly defined key and value fields. By default, Spark does not write field IDs for these fields in the parquet files. However, Iceberg requires these *nested* field IDs to be present. This method rewrites the specified Spark schema to add those nested field IDs.

    As list and map types are not StructFields themselves, nested field IDs are stored in a map as part of the metadata of the *nearest* parent StructField. For example, consider the following schema:

    col1 ARRAY(INT)
    col2 MAP(INT, INT)
    col3 STRUCT(a INT, b ARRAY(STRUCT(c INT, d MAP(INT, INT))))

    col1 is a list and so requires one nested field ID for its element field in parquet. This nested field ID is stored in a map that is part of col1's StructField.metadata. The same applies to the nested field IDs for col2's implicit key and value fields. col3 itself is a struct, consisting of an integer field and a list field named 'b'. The nested field ID for the list element of 'b' is stored in b's StructField metadata. Finally, the element type of that list is again a struct, consisting of an integer field and a map field named 'd'. The nested field IDs for the map of 'd' are stored in d's StructField metadata.

    schema

    The schema to which nested field IDs should be added

    startId

    The first field ID to use for the nested field IDs
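
    The id assignment over implicit parquet fields can be sketched on a toy type tree (DType, ArrayT, MapT, and nestedFieldIds are hypothetical names standing in for Spark's DataType recursion; ids here are keyed by a dotted path rather than stored in StructField metadata):

```scala
// Toy type tree standing in for Spark's DataType.
sealed trait DType
case object IntT extends DType
final case class ArrayT(element: DType) extends DType
final case class MapT(key: DType, value: DType) extends DType

// Walk the type, handing out consecutive ids for the implicit parquet
// fields ("element" for lists, "key"/"value" for maps); returns the
// id map and the next unused id.
def nestedFieldIds(t: DType, path: String, next: Long): (Map[String, Long], Long) = t match {
  case IntT => (Map.empty[String, Long], next)
  case ArrayT(e) =>
    val (m, n) = nestedFieldIds(e, s"$path.element", next + 1)
    (m + (s"$path.element" -> next), n)
  case MapT(k, v) =>
    val (mk, n1) = nestedFieldIds(k, s"$path.key", next + 2)
    val (mv, n2) = nestedFieldIds(v, s"$path.value", n1)
    (mk ++ mv + (s"$path.key" -> next) + (s"$path.value" -> (next + 1)), n2)
}

val (ids, nextId) = nestedFieldIds(ArrayT(MapT(IntT, IntT)), "col1", 5L)
// ids: col1.element -> 5, col1.element.key -> 6, col1.element.value -> 7; nextId == 8
```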

  93. def schemaHasColumnMappingMetadata(schema: StructType): Boolean

    Returns whether the schema contains any metadata reserved for column mapping.

  94. def setPhysicalNames(schema: StructType, fieldPathToPhysicalName: Map[Seq[String], String]): StructType

    Sets the physical name based on the field path, skipping fields whose path is not found in the map.

  95. val supportedModes: Set[DeltaColumnMappingMode]
  96. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  97. def toString(): String
    Definition Classes
    AnyRef → Any
  98. def verifyAndUpdateMetadataChange(spark: SparkSession, deltaLog: DeltaLog, oldProtocol: Protocol, oldMetadata: Metadata, newMetadata: Metadata, isCreatingNewTable: Boolean, isOverwriteSchema: Boolean): Metadata

    If the table is already on the column mapping protocol, we block:

    • changing the column mapping configuration

    Otherwise, we block:

    • upgrading to the column mapping protocol through configurations
  99. def verifyInternalProperties(one: Map[String, String], two: Map[String, String]): Boolean
  100. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  101. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  102. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  103. def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T

    Report a log to indicate some command is running.

    Definition Classes
    DeltaProgressReporter
