object GeneratedColumn extends DeltaLogging with AnalysisHelper
Provide utility methods to implement Generated Columns for Delta. Users can use the following SQL syntax to create a table with generated columns.
CREATE TABLE table_identifier(
column_name column_type,
column_name column_type GENERATED ALWAYS AS ( generation_expr ),
...
)
USING delta
[ PARTITIONED BY (partition_column_name, ...) ]
This is an example:
CREATE TABLE foo(
id bigint,
type string,
subType string GENERATED ALWAYS AS ( SUBSTRING(type FROM 0 FOR 4) ),
data string,
eventTime timestamp,
day date GENERATED ALWAYS AS ( days(eventTime) )
)
USING delta
PARTITIONED BY (type, day)
When writing to a table, generated columns are handled as follows:
- If the output is missing a generated column, we add an expression to generate it.
- If the output already contains a generated column, we add a constraint to ensure the given value doesn't violate the generation expression.
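The two write-time rules can be modeled with plain Scala collections. This is a minimal sketch, not Delta's implementation. It assumes a row is a Map[String, Any] and each generation expression is an ordinary Scala function over the row. The names GenCol and prepareRow, and the error text, are hypothetical.

```scala
// Hypothetical model: a generated column is a name plus a function that
// computes its value from the other columns of the same row.
final case class GenCol(name: String, expr: Map[String, Any] => Any)

// Applies the two write-time rules for every generated column, in order.
def prepareRow(row: Map[String, Any], genCols: Seq[GenCol]): Either[String, Map[String, Any]] =
  genCols.foldLeft[Either[String, Map[String, Any]]](Right(row)) {
    case (Left(err), _) => Left(err) // stop at the first violation
    case (Right(r), gc) =>
      val expected = gc.expr(r)
      r.get(gc.name) match {
        // Rule 1: the column is missing from the output, so generate it.
        case None => Right(r + (gc.name -> expected))
        // Rule 2: the column is present, so it must agree with the expression.
        case Some(provided) if provided == expected => Right(r)
        case Some(provided) =>
          Left(s"${gc.name}: value $provided violates generation expression (expected $expected)")
      }
  }

// Loosely modeled on subType = SUBSTRING(type FROM 0 FOR 4) from the example above.
val subType = GenCol("subType", r => r("type").asInstanceOf[String].take(4))
```

Calling prepareRow(Map("type" -> "clickstream"), Seq(subType)) fills in subType as "clic". The same call with a conflicting subType value returns a Left.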
Type Members
- implicit class LogStringContext extends AnyRef
- Definition Classes
- LoggingShims
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def deltaAssert(check: => Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit
Helper method to check invariants in Delta code. Fails when running in tests, records a delta assertion event and logs a warning otherwise.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def enforcesGeneratedColumns(protocol: Protocol, metadata: Metadata): Boolean
Whether the table has generated columns. A table has generated columns only if its protocol satisfies Generated Column (listed in Table Features or supported implicitly) and some of the columns in the table schema contain generation expressions.
As Spark will propagate column metadata storing the generation expression through the entire plan, old versions that don't support generated columns may create tables whose schema contains generation expressions. However, since these old versions have a lower writer version, we can use the table's minWriterVersion to identify such tables and treat them as normal tables.
- protocol
the table protocol.
- metadata
the table metadata.
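The check described above combines a protocol condition with a schema condition. It can be sketched with simplified stand-ins for Protocol and the schema. The following are assumptions to verify against the Delta protocol specification:
- the field-metadata key is delta.generationExpression;
- writer version 4 is the lowest legacy version that implicitly supports generated columns;
- at writer version 7, Table Features apply and the feature must be listed as generatedColumns.

SimpleProtocol and SimpleField are hypothetical types, not Delta classes.

```scala
// Hypothetical, simplified stand-ins for Delta's Protocol and StructField.
final case class SimpleProtocol(minWriterVersion: Int, writerFeatures: Set[String] = Set.empty)
final case class SimpleField(name: String, metadata: Map[String, String] = Map.empty)

// Assumed constants (see the lead-in); not copied from Delta's source.
val GenerationExpressionKey = "delta.generationExpression"
val GeneratedColumnsMinWriterVersion = 4
val TableFeaturesWriterVersion = 7

// Supported either explicitly (Table Features) or implicitly by a high enough legacy version.
def satisfiesGeneratedColumnProtocol(p: SimpleProtocol): Boolean =
  if (p.minWriterVersion >= TableFeaturesWriterVersion) p.writerFeatures.contains("generatedColumns")
  else p.minWriterVersion >= GeneratedColumnsMinWriterVersion

// Schema-only check: some field carries a generation expression.
def hasGeneratedColumns(schema: Seq[SimpleField]): Boolean =
  schema.exists(_.metadata.contains(GenerationExpressionKey))

// The table enforces generated columns only if both conditions hold.
def enforcesGeneratedColumns(p: SimpleProtocol, schema: Seq[SimpleField]): Boolean =
  satisfiesGeneratedColumnProtocol(p) && hasGeneratedColumns(schema)
```

Spark propagates column metadata, so a table created by an old writer (for example, minWriterVersion = 2) can still carry the key. In that case hasGeneratedColumns is true while enforcesGeneratedColumns is false.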
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def generatePartitionFilters(spark: SparkSession, snapshot: SnapshotDescriptor, dataFilters: Seq[Expression], delta: LogicalPlan): Seq[Expression]
Try to generate partition filters from data filters if possible.
- delta
the logical plan that outputs the same attributes as the table schema. This will be used to resolve auto generated expressions.
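The sketch below illustrates how partition filters can be derived from data filters. It assumes a partition column day that is generated as the date of eventTime. That mapping never decreases as eventTime grows, so a filter on eventTime implies a weaker filter on day, and that weaker filter can prune partitions. The Filter types and derivePartitionFilter are hypothetical; Delta itself works on Catalyst expressions.

```scala
import java.time.{LocalDate, LocalDateTime}

// Hypothetical comparison filters on a single column.
sealed trait Filter
final case class Gt(col: String, v: Any) extends Filter
final case class GtEq(col: String, v: Any) extends Filter
final case class Lt(col: String, v: Any) extends Filter
final case class LtEq(col: String, v: Any) extends Filter
final case class EqTo(col: String, v: Any) extends Filter

// Assumed generation expression for the partition column: day = date of eventTime.
def toDay(ts: LocalDateTime): LocalDate = ts.toLocalDate

// Translate a data filter on eventTime into an implied filter on day.
// Strict bounds become inclusive: eventTime > t only guarantees day >= toDay(t).
def derivePartitionFilter(f: Filter): Option[Filter] = f match {
  case Gt("eventTime", t: LocalDateTime)   => Some(GtEq("day", toDay(t)))
  case GtEq("eventTime", t: LocalDateTime) => Some(GtEq("day", toDay(t)))
  case Lt("eventTime", t: LocalDateTime)   => Some(LtEq("day", toDay(t)))
  case LtEq("eventTime", t: LocalDateTime) => Some(LtEq("day", toDay(t)))
  case EqTo("eventTime", t: LocalDateTime) => Some(EqTo("day", toDay(t)))
  case _                                   => None // no safe translation
}
```

The derived filters are safe but not always tight. For example, eventTime < 2024-01-02T00:00 yields day <= 2024-01-02, which keeps one partition more than necessary.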
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
- Definition Classes
- DeltaLogging
- def getErrorData(e: Throwable): Map[String, Any]
- Definition Classes
- DeltaLogging
- def getGeneratedColumns(snapshot: SnapshotDescriptor): Seq[StructField]
Returns the generated columns of a table. A column is a generated column only if the table writer protocol >= GeneratedColumn.MIN_WRITER_VERSION and the column has a generation expression in its column metadata.
- def getGeneratedColumnsAndColumnsUsedByGeneratedColumns(schema: StructType): Set[String]
- def getGenerationExpression(field: StructField): Option[Expression]
Return the generation expression from a field if any. This method doesn't check the protocol. The caller should make sure the table writer protocol meets satisfyGeneratedColumnProtocol before calling this method.
- def getGenerationExpressionStr(metadata: Metadata): Option[String]
Return the generation expression from a field metadata if any.
- def getOptimizablePartitionExpressions(schema: StructType, partitionSchema: StructType): Map[String, Seq[OptimizablePartitionExpression]]
Try to get OptimizablePartitionExpressions of a data column when a partition column is defined as a generated column and refers to this data column.
- schema
the table schema
- partitionSchema
the partition schema. If a partition column is defined as a generated column, its column metadata should contain the generation expression.
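The result is a map from a data column to the partition expressions that reference it, and its shape can be sketched as follows. The sketch assumes each generation expression has already been parsed into a small, hypothetical PartitionExpr type with exactly one source column. Real Delta code works on parsed Catalyst expressions and recognizes only certain expression shapes. DateOf, YearOf, Prefix, PartitionField and optimizablePartitionExpressions are illustrative names.

```scala
// Hypothetical parsed generation expressions, each referring to one data column.
sealed trait PartitionExpr { def sourceColumn: String }
final case class DateOf(sourceColumn: String) extends PartitionExpr
final case class YearOf(sourceColumn: String) extends PartitionExpr
final case class Prefix(sourceColumn: String, length: Int) extends PartitionExpr

// A partition column; generation is None when it is not a generated column.
final case class PartitionField(name: String, generation: Option[PartitionExpr])

// Group (partition column, expression) pairs by the data column they refer to.
// Non-generated partition columns are skipped.
def optimizablePartitionExpressions(
    partitionSchema: Seq[PartitionField]): Map[String, Seq[(String, PartitionExpr)]] =
  partitionSchema
    .collect { case PartitionField(name, Some(expr)) => (name, expr) }
    .groupBy { case (_, expr) => expr.sourceColumn }
```

With partition columns day = DateOf("eventTime") and year = YearOf("eventTime"), a filter on eventTime can be checked against both entries under the "eventTime" key.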
- def hasGeneratedColumns(schema: StructType): Boolean
Whether any generation expressions exist in the schema. Note: this doesn't mean the table contains generated columns. A table has generated columns only if its protocol satisfies Generated Column (listed in Table Features or supported implicitly) and some of the columns in the table schema contain generation expressions. Use enforcesGeneratedColumns to check for generated column tables instead.
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def improveUnsupportedOpError(f: => Unit): Unit
- Attributes
- protected
- Definition Classes
- AnalysisHelper
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def isGeneratedColumn(protocol: Protocol, field: StructField): Boolean
Whether a column is a generated column.
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
- def logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def partitionFilterOptimizationEnabled(spark: SparkSession): Boolean
- def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
Used to record the occurrence of a single event or report detailed, operation-specific statistics.
- path
Used to log the path of the delta table when deltaLog is null.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a deltaLog.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a tahoePath.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordFrameProfile[T](group: String, name: String)(thunk: => T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: => S): S
- Definition Classes
- DatabricksLogging
- def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def resolveReferencesForExpressions(sparkSession: SparkSession, exprs: Seq[Expression], planProvidingAttrs: LogicalPlan): Seq[Expression]
Resolve expressions using the attributes provided by planProvidingAttrs. Throw an error if any expression fails to resolve.
- Attributes
- protected
- Definition Classes
- AnalysisHelper
- def satisfyGeneratedColumnProtocol(protocol: Protocol): Boolean
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toDataset(sparkSession: SparkSession, logicalPlan: LogicalPlan): Dataset[Row]
- Attributes
- protected
- Definition Classes
- AnalysisHelper
- def toString(): String
- Definition Classes
- AnyRef → Any
- def tryResolveReferences(sparkSession: SparkSession)(expr: Expression, planContainingExpr: LogicalPlan): Expression
- Attributes
- protected
- Definition Classes
- AnalysisHelper
- def tryResolveReferencesForExpressions(sparkSession: SparkSession)(exprs: Seq[Expression], plansProvidingAttrs: Seq[LogicalPlan]): Seq[Expression]
Resolve expressions using the attributes provided by plansProvidingAttrs, ignoring errors.
- Attributes
- protected
- Definition Classes
- AnalysisHelper
- def tryResolveReferencesForExpressions(sparkSession: SparkSession, exprs: Seq[Expression], planContainingExpr: LogicalPlan): Seq[Expression]
- Attributes
- protected
- Definition Classes
- AnalysisHelper
- def validateColumnReferences(spark: SparkSession, fieldName: String, expression: Expression, schema: StructType): Unit
SPARK-27561 added support for lateral column alias. This means generation expressions that reference other generated columns no longer fail analysis in validateGeneratedColumns. This method therefore throws an error if a generated column references itself or references another generated column.
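The two reference checks can be sketched over the set of column names that a generation expression references. The sketch assumes those references have already been extracted from the parsed expression. It compares names case-sensitively for simplicity, although Spark's resolution may be case-insensitive. The function checkReferences and its error strings are hypothetical.

```scala
// Report the first unsupported reference of one generated column, if any.
// refs: column names referenced by fieldName's generation expression.
// generatedColumns: names of all generated columns in the schema.
def checkReferences(fieldName: String, refs: Set[String], generatedColumns: Set[String]): Either[String, Unit] =
  if (refs.contains(fieldName)) Left(s"$fieldName references itself")
  else (refs intersect generatedColumns).headOption match {
    case Some(other) => Left(s"$fieldName references generated column $other")
    case None        => Right(())
  }
```

The self-reference test runs first, because fieldName is itself in generatedColumns and would otherwise be reported as "another" generated column.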
- def validateGeneratedColumns(spark: SparkSession, schema: StructType): Unit
If the schema contains generated columns, check for the following unsupported cases: a generation expression that refers to a non-existent column or to another generated column; a generation expression that uses an unsupported expression; and a generation expression whose type is not the same as the column type.
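Two of these cases, references to non-existent columns and type mismatches, can be sketched as a check that collects all violations. The sketch makes several assumptions: the schema is a map from column name to a type name string; references and the expression's result type have already been computed; and the unsupported-expression check is omitted. GeneratedSpec and validate are hypothetical names.

```scala
// Hypothetical description of one generated column: the names its expression
// references and the type the expression evaluates to.
final case class GeneratedSpec(column: String, columnType: String, refs: Set[String], exprType: String)

// Collect every violation of the two modeled rules, column by column.
def validate(schema: Map[String, String], specs: Seq[GeneratedSpec]): Seq[String] =
  specs.flatMap { s =>
    // Rule: every referenced column must exist in the schema.
    val missing = (s.refs -- schema.keySet).toSeq.sorted.map(c => s"${s.column}: unknown column $c")
    // Rule: the expression type must equal the declared column type.
    val typeMismatch =
      if (s.exprType != s.columnType) Seq(s"${s.column}: expression type ${s.exprType} != column type ${s.columnType}")
      else Seq.empty[String]
    missing ++ typeMismatch
  }
```

Collecting all violations instead of stopping at the first one is a design choice for the sketch. The source only says these cases are unsupported, not how they are reported.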
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: => T): T
Report a log to indicate some command is running.
- Definition Classes
- DeltaProgressReporter