object GeneratedColumn extends DeltaLogging with AnalysisHelper
Provide utility methods to implement Generated Columns for Delta. Users can use the following SQL syntax to create a table with generated columns.
CREATE TABLE table_identifier(
column_name column_type,
column_name column_type GENERATED ALWAYS AS ( generation_expr ),
...
)
USING delta
[ PARTITIONED BY (partition_column_name, ...) ]
This is an example:
CREATE TABLE foo(
id bigint,
type string,
subType string GENERATED ALWAYS AS ( SUBSTRING(type FROM 0 FOR 4) ),
data string,
eventTime timestamp,
day date GENERATED ALWAYS AS ( days(eventTime) )
)
USING delta
PARTITIONED BY (type, day)
When writing to a table, for these generated columns:
- If the output is missing a generated column, we will add an expression to generate it.
- If a generated column exists in the output, in other words, the user provides its value manually, we will add a constraint to ensure the given value doesn't violate the generation expression.
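The two write-path cases above can be sketched in plain Scala. This is an illustrative model only: `GenCol`, `WriteAction`, and `planActions` are hypothetical names, not Delta APIs, and expressions are represented as strings for brevity.

```scala
// Hypothetical sketch of the write-path decision: for each generated column,
// either synthesize its value or constrain a user-supplied one.
final case class GenCol(name: String, generationExpr: String)

sealed trait WriteAction
case class GenerateValue(col: String, expr: String) extends WriteAction
case class AddConstraint(col: String, check: String) extends WriteAction

def planActions(generated: Seq[GenCol], outputCols: Set[String]): Seq[WriteAction] =
  generated.map { g =>
    if (outputCols.contains(g.name))
      // The user supplied a value: verify it matches the generation expression.
      AddConstraint(g.name, s"${g.name} <=> ${g.generationExpr}")
    else
      // The column is missing from the output: compute it from the expression.
      GenerateValue(g.name, g.generationExpr)
  }
```

For the `foo` table above, an insert that omits `day` would get a `GenerateValue("day", "days(eventTime)")` action, while an insert that supplies `day` explicitly would get a constraint instead.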
Linear Supertypes: AnalysisHelper, DeltaLogging, DatabricksLogging, DeltaProgressReporter, LoggingShims, Logging, AnyRef, Any
Type Members
- implicit class LogStringContext extends AnyRef
  Definition Classes: LoggingShims
Value Members
- final def !=(arg0: Any): Boolean
  Definition Classes: AnyRef → Any
- final def ##(): Int
  Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  Definition Classes: AnyRef → Any
- final def asInstanceOf[T0]: T0
  Definition Classes: Any
- protected[lang] def clone(): AnyRef
  Definition Classes: AnyRef
  Annotations: @throws( ... ) @native()
- protected def deltaAssert(check: ⇒ Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit
  Helper method to check invariants in Delta code. Fails when running in tests; otherwise records a delta assertion event and logs a warning.
  Definition Classes: DeltaLogging
- def enforcesGeneratedColumns(protocol: Protocol, metadata: Metadata): Boolean
  Whether the table has generated columns. A table has generated columns only if its protocol satisfies Generated Columns (listed in Table Features or supported implicitly) and some columns in the table schema contain generation expressions.
  As Spark propagates the column metadata storing the generation expression through the entire plan, old versions that don't support generated columns may create tables whose schema contains generation expressions. However, since these old versions have a lower writer version, we can use the table's minWriterVersion to identify such tables and treat them as normal tables.
  - protocol: the table protocol.
  - metadata: the table metadata.
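The two-part check described above (protocol support AND generation expressions present) can be modeled in a few lines. The `Protocol` and `Field` classes here are simplified stand-ins for the real Delta classes, and the writer-version constant is an assumption standing in for `GeneratedColumn.MIN_WRITER_VERSION`.

```scala
// Simplified model of the enforcesGeneratedColumns predicate.
case class Protocol(minWriterVersion: Int, writerFeatures: Set[String])
case class Field(name: String, generationExpr: Option[String])

// Stand-in for GeneratedColumn.MIN_WRITER_VERSION (assumed value).
val MinWriterVersion = 4

def satisfiesProtocol(p: Protocol): Boolean =
  // Generated columns may be listed as a table feature or supported implicitly
  // via a sufficiently high writer version.
  p.writerFeatures.contains("generatedColumns") || p.minWriterVersion >= MinWriterVersion

def enforcesGeneratedColumns(p: Protocol, schema: Seq[Field]): Boolean =
  satisfiesProtocol(p) && schema.exists(_.generationExpr.isDefined)
```

A table written by an old client (low writer version, no feature) is treated as a normal table even if its schema carries generation expressions.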
- final def eq(arg0: AnyRef): Boolean
  Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  Definition Classes: AnyRef → Any
- protected[lang] def finalize(): Unit
  Definition Classes: AnyRef
  Annotations: @throws( classOf[java.lang.Throwable] )
- def generatePartitionFilters(spark: SparkSession, snapshot: SnapshotDescriptor, dataFilters: Seq[Expression], delta: LogicalPlan): Seq[Expression]
  Try to generate partition filters from data filters if possible.
  - delta: the logical plan that outputs the same attributes as the table schema. This will be used to resolve auto generated expressions.
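A toy version of this optimization: when a partition column is generated from a data column via a monotone function, a filter on the data column implies a filter on the partition column. Here the pattern `day = date(eventTime)` from the example table is hard-coded; the filter representation and function names are illustrative, not Delta's.

```scala
import java.time.{Instant, ZoneOffset}

// Illustrative filter AST: only >= comparisons on a named column.
sealed trait Filter
case class GtEq(col: String, value: String) extends Filter

// Since date() is monotone non-decreasing, eventTime >= t implies
// day >= date(t), so the derived filter is safe for partition pruning.
def derivePartitionFilter(f: Filter): Option[Filter] = f match {
  case GtEq("eventTime", ts) =>
    val day = Instant.parse(ts).atZone(ZoneOffset.UTC).toLocalDate
    Some(GtEq("day", day.toString))
  case _ => None // other filter shapes: no safe partition filter can be derived
}
```

The real implementation resolves the generation expressions against the `delta` plan and supports a family of `OptimizablePartitionExpression`s rather than a single hard-coded pattern.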
- final def getClass(): Class[_]
  Definition Classes: AnyRef → Any
  Annotations: @native()
- def getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
  Definition Classes: DeltaLogging
- def getErrorData(e: Throwable): Map[String, Any]
  Definition Classes: DeltaLogging
- def getGeneratedColumns(snapshot: SnapshotDescriptor): Seq[StructField]
  Returns the generated columns of a table. A column is a generated column if:
  - the table writer protocol >= GeneratedColumn.MIN_WRITER_VERSION;
  - it has a generation expression in the column metadata.
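The metadata-based filter can be sketched with a plain map standing in for Spark's `StructField` metadata. `"delta.generationExpression"` is, to my understanding, the metadata key Delta uses to store the expression; `SimpleField` and the function name are illustrative.

```scala
// Simplified stand-in for StructField with string metadata.
case class SimpleField(name: String, metadata: Map[String, String])

// Assumed metadata key under which Delta stores generation expressions.
val GenerationExprKey = "delta.generationExpression"

// Keep only the fields that carry a generation expression.
def getGeneratedColumns(schema: Seq[SimpleField]): Seq[SimpleField] =
  schema.filter(_.metadata.contains(GenerationExprKey))
```

The real method additionally requires the protocol check above, so schemas written by old clients are not misread as generated-column tables.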
- def getGeneratedColumnsAndColumnsUsedByGeneratedColumns(schema: StructType): Set[String]
- def getGenerationExpression(field: StructField): Option[Expression]
  Return the generation expression from a field if any. This method doesn't check the protocol. The caller should make sure the table writer protocol meets satisfyGeneratedColumnProtocol before calling this method.
- def getGenerationExpressionStr(metadata: Metadata): Option[String]
  Return the generation expression from a field metadata if any.
- def getOptimizablePartitionExpressions(schema: StructType, partitionSchema: StructType): Map[String, Seq[OptimizablePartitionExpression]]
  Try to get OptimizablePartitionExpressions of a data column when a partition column is defined as a generated column and refers to this data column.
  - schema: the table schema
  - partitionSchema: the partition schema. If a partition column is defined as a generated column, its column metadata should contain the generation expression.
- def hasGeneratedColumns(schema: StructType): Boolean
  Whether any generation expressions exist in the schema. Note: this doesn't mean the table contains generated columns. A table has generated columns only if its protocol satisfies Generated Columns (listed in Table Features or supported implicitly) and some columns in the table schema contain generation expressions. Use enforcesGeneratedColumns to check generated column tables instead.
- def hashCode(): Int
  Definition Classes: AnyRef → Any
  Annotations: @native()
- protected def improveUnsupportedOpError(f: ⇒ Unit): Unit
  Definition Classes: AnalysisHelper
- protected def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
  Definition Classes: Logging
- protected def initializeLogIfNecessary(isInterpreter: Boolean): Unit
  Definition Classes: Logging
- def isGeneratedColumn(protocol: Protocol, field: StructField): Boolean
  Whether a column is a generated column.
- final def isInstanceOf[T0]: Boolean
  Definition Classes: Any
- protected def isTraceEnabled(): Boolean
  Definition Classes: Logging
- protected def log: Logger
  Definition Classes: Logging
- def logConsole(line: String): Unit
  Definition Classes: DatabricksLogging
- protected def logDebug(entry: LogEntry, throwable: Throwable): Unit
  Definition Classes: LoggingShims
- protected def logDebug(entry: LogEntry): Unit
  Definition Classes: LoggingShims
- protected def logDebug(msg: ⇒ String, throwable: Throwable): Unit
  Definition Classes: Logging
- protected def logDebug(msg: ⇒ String): Unit
  Definition Classes: Logging
- protected def logError(entry: LogEntry, throwable: Throwable): Unit
  Definition Classes: LoggingShims
- protected def logError(entry: LogEntry): Unit
  Definition Classes: LoggingShims
- protected def logError(msg: ⇒ String, throwable: Throwable): Unit
  Definition Classes: Logging
- protected def logError(msg: ⇒ String): Unit
  Definition Classes: Logging
- protected def logInfo(entry: LogEntry, throwable: Throwable): Unit
  Definition Classes: LoggingShims
- protected def logInfo(entry: LogEntry): Unit
  Definition Classes: LoggingShims
- protected def logInfo(msg: ⇒ String, throwable: Throwable): Unit
  Definition Classes: Logging
- protected def logInfo(msg: ⇒ String): Unit
  Definition Classes: Logging
- protected def logName: String
  Definition Classes: Logging
- protected def logTrace(entry: LogEntry, throwable: Throwable): Unit
  Definition Classes: LoggingShims
- protected def logTrace(entry: LogEntry): Unit
  Definition Classes: LoggingShims
- protected def logTrace(msg: ⇒ String, throwable: Throwable): Unit
  Definition Classes: Logging
- protected def logTrace(msg: ⇒ String): Unit
  Definition Classes: Logging
- protected def logWarning(entry: LogEntry, throwable: Throwable): Unit
  Definition Classes: LoggingShims
- protected def logWarning(entry: LogEntry): Unit
  Definition Classes: LoggingShims
- protected def logWarning(msg: ⇒ String, throwable: Throwable): Unit
  Definition Classes: Logging
- protected def logWarning(msg: ⇒ String): Unit
  Definition Classes: Logging
- final def ne(arg0: AnyRef): Boolean
  Definition Classes: AnyRef
- final def notify(): Unit
  Definition Classes: AnyRef
  Annotations: @native()
- final def notifyAll(): Unit
  Definition Classes: AnyRef
  Annotations: @native()
- def partitionFilterOptimizationEnabled(spark: SparkSession): Boolean
- protected def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
  Used to record the occurrence of a single event or report detailed, operation specific statistics.
  - path: Used to log the path of the delta table when deltaLog is null.
  Definition Classes: DeltaLogging
- protected def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
  Used to report the duration as well as the success or failure of an operation on a deltaLog.
  Definition Classes: DeltaLogging
- protected def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
  Used to report the duration as well as the success or failure of an operation on a tahoePath.
  Definition Classes: DeltaLogging
- def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
  Definition Classes: DatabricksLogging
- protected def recordFrameProfile[T](group: String, name: String)(thunk: ⇒ T): T
  Definition Classes: DeltaLogging
- def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: ⇒ S): S
  Definition Classes: DatabricksLogging
- def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
  Definition Classes: DatabricksLogging
- def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
  Definition Classes: DatabricksLogging
- def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
  Definition Classes: DatabricksLogging
- protected def resolveReferencesForExpressions(sparkSession: SparkSession, exprs: Seq[Expression], planProvidingAttrs: LogicalPlan): Seq[Expression]
  Resolve expressions using the attributes provided by planProvidingAttrs. Throw an error if failing to resolve any expressions.
  Definition Classes: AnalysisHelper
- def satisfyGeneratedColumnProtocol(protocol: Protocol): Boolean
- final def synchronized[T0](arg0: ⇒ T0): T0
  Definition Classes: AnyRef
- protected def toDataset(sparkSession: SparkSession, logicalPlan: LogicalPlan): Dataset[Row]
  Definition Classes: AnalysisHelper
- def toString(): String
  Definition Classes: AnyRef → Any
- protected def tryResolveReferences(sparkSession: SparkSession)(expr: Expression, planContainingExpr: LogicalPlan): Expression
  Definition Classes: AnalysisHelper
- protected def tryResolveReferencesForExpressions(sparkSession: SparkSession)(exprs: Seq[Expression], plansProvidingAttrs: Seq[LogicalPlan]): Seq[Expression]
  Resolve expressions using the attributes provided by plansProvidingAttrs, ignoring errors.
  Definition Classes: AnalysisHelper
- protected def tryResolveReferencesForExpressions(sparkSession: SparkSession, exprs: Seq[Expression], planContainingExpr: LogicalPlan): Seq[Expression]
  Definition Classes: AnalysisHelper
- def validateColumnReferences(spark: SparkSession, fieldName: String, expression: Expression, schema: StructType): Unit
  SPARK-27561 added support for lateral column alias. This means generation expressions that reference other generated columns no longer fail analysis in validateGeneratedColumns. This method checks for and throws an error if:
  - a generated column references itself
  - a generated column references another generated column
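The two checks can be sketched by modeling an expression as the set of column names it references. This is a hypothetical simplification: the real method walks a resolved Spark expression tree, and the names below are illustrative.

```scala
// Reject a generated column whose expression references itself or another
// generated column; otherwise accept. Errors are returned as Left for clarity.
def validateColumnReferences(
    field: String,
    referenced: Set[String],
    generatedColumns: Set[String]): Either[String, Unit] =
  if (referenced.contains(field))
    Left(s"Generated column '$field' references itself")
  else {
    val genRefs = referenced.intersect(generatedColumns)
    if (genRefs.nonEmpty)
      Left(s"Generated column '$field' references generated column(s): ${genRefs.mkString(", ")}")
    else
      Right(()) // references only ordinary data columns
  }
```

Without this check, lateral column alias resolution would let such expressions pass analysis even though Delta cannot evaluate them in a well-defined order.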
- def validateGeneratedColumns(spark: SparkSession, schema: StructType): Unit
  If the schema contains generated columns, check the following unsupported cases:
  - Refer to a non-existent column or another generated column.
  - Use an unsupported expression.
  - The expression type is not the same as the column type.
- final def wait(): Unit
  Definition Classes: AnyRef
  Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  Definition Classes: AnyRef
  Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  Definition Classes: AnyRef
  Annotations: @throws( ... ) @native()
- def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T
  Report a log to indicate some command is running.
  Definition Classes: DeltaProgressReporter