case class MergeIntoCommand(source: LogicalPlan, target: LogicalPlan, catalogTable: Option[CatalogTable], targetFileIndex: TahoeFileIndex, condition: Expression, matchedClauses: Seq[DeltaMergeIntoMatchedClause], notMatchedClauses: Seq[DeltaMergeIntoNotMatchedClause], notMatchedBySourceClauses: Seq[DeltaMergeIntoNotMatchedBySourceClause], migratedSchema: Option[StructType], trackHighWaterMarks: Set[String] = Set.empty, schemaEvolutionEnabled: Boolean = false) extends LogicalPlan with MergeIntoCommandBase with InsertOnlyMergeExecutor with ClassicMergeExecutor with Product with Serializable
Performs a merge of a source query/table into a Delta table.
Issues an error message when the ON search_condition of the MERGE statement can match a single row from the target table with multiple rows of the source table-reference.
Algorithm:
Phase 1: Find the input files in target that are touched by the rows that satisfy the condition and verify that no two source rows match with the same target row. This is implemented as an inner-join using the given condition. See ClassicMergeExecutor for more details.
Phase 2: Read the touched files again and write new files with updated and/or inserted rows.
Phase 3: Use the Delta protocol to atomically remove the touched files and add the new files.
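The three phases can be sketched over plain in-memory collections. This is a deliberately simplified model (invented Row/DataFile types, update-only matched clauses, no insert handling), not the actual Delta implementation:

```scala
// Simplified model of the three merge phases over in-memory data.
// Types and names here are illustrative stand-ins, not Delta's real API.
case class Row(id: Int, value: String)
case class DataFile(name: String, rows: Seq[Row])

def mergePhases(targetFiles: Seq[DataFile], source: Seq[Row]): Seq[DataFile] = {
  // Phase 1: inner-join on the merge condition (here, id equality) to
  // find the target files touched by at least one matching source row.
  val sourceIds = source.map(_.id).toSet
  val (touched, untouched) =
    targetFiles.partition(_.rows.exists(r => sourceIds.contains(r.id)))

  // Phase 2: re-read the touched files and write new files containing
  // updated rows (matched) plus copied-over rows (not matched).
  val bySourceId = source.map(r => r.id -> r).toMap
  val rewritten = touched.map { f =>
    DataFile(f.name + "-v2", f.rows.map(r => bySourceId.getOrElse(r.id, r)))
  }

  // Phase 3: atomic swap - remove the touched files, add the new ones.
  untouched ++ rewritten
}
```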
- source
Source data to merge from
- target
Target table to merge into
- targetFileIndex
TahoeFileIndex of the target table
- condition
Condition for a source row to match with a target row
- matchedClauses
All info related to matched clauses.
- notMatchedClauses
All info related to not matched clauses.
- notMatchedBySourceClauses
All info related to not matched by source clauses.
- migratedSchema
The final schema of the target - may be changed by schema evolution.
- trackHighWaterMarks
The column names for which we will track IDENTITY high water marks.
- MergeIntoCommand
- Serializable
- Serializable
- ClassicMergeExecutor
- InsertOnlyMergeExecutor
- MergeOutputGeneration
- MergeIntoCommandBase
- SupportsNonDeterministicExpression
- UpdateExpressionsSupport
- AnalysisHelper
- MergeIntoMaterializeSource
- DeltaSparkPlanUtils
- ImplicitMetadataOperation
- PredicateHelper
- AliasHelper
- DeltaCommand
- DeltaLogging
- DatabricksLogging
- DeltaProgressReporter
- LoggingShims
- LeafRunnableCommand
- LeafLike
- RunnableCommand
- Command
- LogicalPlan
- Logging
- QueryPlanConstraints
- ConstraintHelper
- LogicalPlanDistinctKeys
- LogicalPlanStats
- AnalysisHelper
- QueryPlan
- SQLConfHelper
- TreeNode
- WithOrigin
- TreePatternBits
- Product
- Equals
- AnyRef
- Any
Instance Constructors
-
new
MergeIntoCommand(source: LogicalPlan, target: LogicalPlan, catalogTable: Option[CatalogTable], targetFileIndex: TahoeFileIndex, condition: Expression, matchedClauses: Seq[DeltaMergeIntoMatchedClause], notMatchedClauses: Seq[DeltaMergeIntoNotMatchedClause], notMatchedBySourceClauses: Seq[DeltaMergeIntoNotMatchedBySourceClause], migratedSchema: Option[StructType], trackHighWaterMarks: Set[String] = Set.empty, schemaEvolutionEnabled: Boolean = false)
- source
Source data to merge from
- target
Target table to merge into
- targetFileIndex
TahoeFileIndex of the target table
- condition
Condition for a source row to match with a target row
- matchedClauses
All info related to matched clauses.
- notMatchedClauses
All info related to not matched clauses.
- notMatchedBySourceClauses
All info related to not matched by source clauses.
- migratedSchema
The final schema of the target - may be changed by schema evolution.
- trackHighWaterMarks
The column names for which we will track IDENTITY high water marks.
Type Members
-
implicit
class
LogStringContext extends AnyRef
- Definition Classes
- LoggingShims
-
case class
UpdateOperation(targetColNameParts: Seq[String], updateExpr: Expression) extends Product with Serializable
Specifies an operation that updates a target column with the given expression. The target column may or may not be a nested field and it is specified as a full quoted name or as a sequence of split into parts.
- Definition Classes
- UpdateExpressionsSupport
-
type
PlanOrExpression = Either[LogicalPlan, Expression]
- Definition Classes
- DeltaSparkPlanUtils
-
case class
ProcessedClause(condition: Option[Expression], actions: Seq[Expression]) extends Product with Serializable
Represents a merge clause after its condition and action expressions have been processed before generating the final output expression.
- condition
Optional precomputed condition.
- actions
List of output expressions generated from every action of the clause.
- Attributes
- protected
- Definition Classes
- MergeOutputGeneration
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
lazy val
allAttributes: AttributeSeq
- Definition Classes
- QueryPlan
-
def
allowNonDeterministicExpression: Boolean
Returns whether it allows non-deterministic expressions.
- Definition Classes
- MergeIntoCommandBase → SupportsNonDeterministicExpression
-
def
analyzed: Boolean
- Definition Classes
- AnalysisHelper
-
def
apply(number: Int): TreeNode[_]
- Definition Classes
- TreeNode
-
def
argString(maxFields: Int): String
- Definition Classes
- TreeNode
-
def
asCode: String
- Definition Classes
- TreeNode
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
assertNotAnalysisRule(): Unit
- Attributes
- protected
- Definition Classes
- AnalysisHelper
-
val
attempt: Int
Track which attempt or retry it is in runWithMaterializedSourceAndRetries
- Attributes
- protected
- Definition Classes
- MergeIntoMaterializeSource
-
lazy val
baseMetrics: Map[String, SQLMetric]
- Definition Classes
- MergeIntoCommandBase
-
def
buildBalancedPredicate(expressions: Seq[Expression], op: (Expression, Expression) ⇒ Expression): Expression
- Attributes
- protected
- Definition Classes
- PredicateHelper
-
def
buildBaseRelation(spark: SparkSession, txn: OptimisticTransaction, actionType: String, rootPath: Path, inputLeafFiles: Seq[String], nameToAddFileMap: Map[String, AddFile]): HadoopFsRelation
Build a base relation of files that need to be rewritten as part of an update/delete/merge operation.
- Attributes
- protected
- Definition Classes
- DeltaCommand
-
def
buildTargetPlanWithFiles(spark: SparkSession, deltaTxn: OptimisticTransaction, files: Seq[AddFile], columnsToDrop: Seq[String]): LogicalPlan
Builds a new logical plan to read the given files instead of the whole target table. The plan returned has the same output columns (exprIds) as the target logical plan, so that existing update/insert expressions can be applied on this new plan. Unneeded non-partition columns may be dropped.
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
buildTargetPlanWithIndex(spark: SparkSession, fileIndex: TahoeFileIndex, columnsToDrop: Seq[String]): LogicalPlan
Builds a new logical plan to read the target table using the given fileIndex. The plan returned has the same output columns (exprIds) as the target logical plan, so that existing update/insert expressions can be applied on this new plan.
- columnsToDrop
unneeded non-partition columns to be dropped
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
canEvaluate(expr: Expression, plan: LogicalPlan): Boolean
- Attributes
- protected
- Definition Classes
- PredicateHelper
-
def
canEvaluateWithinJoin(expr: Expression): Boolean
- Attributes
- protected
- Definition Classes
- PredicateHelper
-
val
canMergeSchema: Boolean
- Definition Classes
- MergeIntoCommandBase → ImplicitMetadataOperation
-
val
canOverwriteSchema: Boolean
- Definition Classes
- MergeIntoCommandBase → ImplicitMetadataOperation
-
final
lazy val
canonicalized: LogicalPlan
- Definition Classes
- QueryPlan
- Annotations
- @transient()
-
def
castIfNeeded(fromExpression: Expression, dataType: DataType, allowStructEvolution: Boolean, columnName: String): Expression
Add a cast to the child expression if it differs from the specified data type.
Add a cast to the child expression if it differs from the specified data type. Note that structs here are cast by name, rather than the Spark SQL default of casting by position.
- fromExpression
the expression to cast
- dataType
The data type to cast to.
- allowStructEvolution
Whether to allow structs to evolve. When this is false (default), struct casting will throw an error if the target struct type contains more fields than the expression to cast.
- columnName
The name of the column written to. It is used for the error message.
- Attributes
- protected
- Definition Classes
- UpdateExpressionsSupport
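The by-name (rather than by-position) struct casting behavior can be illustrated with a small sketch. The Field/Struct types and the castStructByName helper are invented for illustration; the real implementation operates on Catalyst expressions:

```scala
// Sketch of casting a struct value to a target struct schema by field
// NAME rather than by position. Types are illustrative stand-ins.
case class Field(name: String)
type Struct = Map[String, Any] // field name -> value

def castStructByName(
    value: Struct,
    targetFields: Seq[Field],
    allowStructEvolution: Boolean): Struct = {
  // Without evolution, the expression to cast may not carry fewer
  // fields than the target struct (mirrors the error described above).
  val missing = targetFields.map(_.name).filterNot(value.contains)
  if (missing.nonEmpty && !allowStructEvolution)
    sys.error(s"cannot cast struct: missing fields $missing")
  // Select fields by name: their position in the source is irrelevant.
  targetFields.flatMap(f => value.get(f.name).map(f.name -> _)).toMap
}
```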
- val catalogTable: Option[CatalogTable]
-
def
checkIdentityColumnHighWaterMarks(deltaTxn: OptimisticTransaction): Unit
Verify that the high water marks used by the identity column generators still match the high water marks in the version of the table read by the current transaction. These high water marks were determined during analysis in PreprocessTableMerge, which runs outside of the current transaction, so they may no longer be valid.
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
checkNonDeterministicSource(spark: SparkSession): Unit
Throws an exception if merge metrics indicate that the source table changed between the first and the second source table scans.
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
final
def
children: Seq[LogicalPlan]
- Definition Classes
- LeafLike
-
def
childrenResolved: Boolean
- Definition Classes
- LogicalPlan
-
def
clauseDisjunction(clauses: Seq[DeltaMergeIntoClause]): Expression
Helper function that produces an expression by combining a sequence of clauses with OR. Requires the sequence to be non-empty.
- Attributes
- protected
- Definition Classes
- ClassicMergeExecutor
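A minimal sketch of this combination, using a tiny stand-in ADT in place of Catalyst expressions (a clause without a condition is unconditional, so it contributes TRUE to the disjunction):

```scala
// Tiny stand-in for Catalyst expressions, for illustration only.
sealed trait Expr
case object TrueLit extends Expr
case class Col(name: String) extends Expr
case class Or(left: Expr, right: Expr) extends Expr

case class Clause(condition: Option[Expr])

// Combine the clause conditions with OR. A clause without an explicit
// condition always applies, so it becomes a TRUE operand.
def clauseDisjunction(clauses: Seq[Clause]): Expr = {
  require(clauses.nonEmpty, "requires a non-empty sequence of clauses")
  clauses
    .map(c => c.condition.getOrElse(TrueLit))
    .reduceLeft(Or(_, _))
}
```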
-
def
clone(): LogicalPlan
- Definition Classes
- AnalysisHelper → TreeNode → AnyRef
-
def
collect[B](pf: PartialFunction[LogicalPlan, B]): Seq[B]
- Definition Classes
- TreeNode
-
def
collectFirst[In, Out](input: Iterable[In], recurse: (In) ⇒ Option[Out]): Option[Out]
- Attributes
- protected
- Definition Classes
- DeltaSparkPlanUtils
-
def
collectFirst[B](pf: PartialFunction[LogicalPlan, B]): Option[B]
- Definition Classes
- TreeNode
-
def
collectLeaves(): Seq[LogicalPlan]
- Definition Classes
- TreeNode
-
def
collectMergeStats(deltaTxn: OptimisticTransaction, materializeSourceReason: MergeIntoMaterializeSourceReason, commitVersion: Option[Long], numRecordsStats: NumRecordsStats): MergeStats
Collects the merge operation stats and metrics into a MergeStats object that can be recorded with recordDeltaEvent. Merge stats should be collected after committing all new actions as metrics may still be updated during commit.
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
collectWithSubqueries[B](f: PartialFunction[LogicalPlan, B]): Seq[B]
- Definition Classes
- QueryPlan
-
val
condition: Expression
- Definition Classes
- MergeIntoCommand → MergeIntoCommandBase
-
def
conf: SQLConf
- Definition Classes
- SQLConfHelper
-
lazy val
constraints: ExpressionSet
- Definition Classes
- QueryPlanConstraints
-
def
constructIsNotNullConstraints(constraints: ExpressionSet, output: Seq[Attribute]): ExpressionSet
- Definition Classes
- ConstraintHelper
-
final
def
containsAllPatterns(patterns: TreePattern*): Boolean
- Definition Classes
- TreePatternBits
-
final
def
containsAnyPattern(patterns: TreePattern*): Boolean
- Definition Classes
- TreePatternBits
-
lazy val
containsChild: Set[TreeNode[_]]
- Definition Classes
- TreeNode
-
def
containsDeterministicUDF(expr: Expression): Boolean
Returns whether an expression contains any deterministic UDFs.
- Definition Classes
- DeltaSparkPlanUtils
-
def
containsDeterministicUDF(predicates: Seq[DeltaTableReadPredicate], partitionedOnly: Boolean): Boolean
Returns whether the read predicates of a transaction contain any deterministic UDFs.
- Definition Classes
- DeltaSparkPlanUtils
-
final
def
containsPattern(t: TreePattern): Boolean
- Definition Classes
- TreePatternBits
- Annotations
- @inline()
-
def
copyTagsFrom(other: LogicalPlan): Unit
- Definition Classes
- TreeNode
-
def
createSetTransaction(sparkSession: SparkSession, deltaLog: DeltaLog, options: Option[DeltaOptions] = None): Option[SetTransaction]
Returns SetTransaction if a valid app ID and version are present. Otherwise returns an empty list.
- Attributes
- protected
- Definition Classes
- DeltaCommand
-
def
deltaAssert(check: ⇒ Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit
Helper method to check invariants in Delta code. Fails when running in tests, records a delta assertion event and logs a warning otherwise.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
lazy val
deterministic: Boolean
- Definition Classes
- QueryPlan
-
lazy val
distinctKeys: Set[ExpressionSet]
- Definition Classes
- LogicalPlanDistinctKeys
-
def
doCanonicalize(): LogicalPlan
- Attributes
- protected
- Definition Classes
- QueryPlan
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
exists(f: (LogicalPlan) ⇒ Boolean): Boolean
- Definition Classes
- TreeNode
-
final
def
expressions: Seq[Expression]
- Definition Classes
- QueryPlan
-
def
extractPredicatesWithinOutputSet(condition: Expression, outputSet: AttributeSet): Option[Expression]
- Attributes
- protected
- Definition Classes
- PredicateHelper
-
def
fastEquals(other: TreeNode[_]): Boolean
- Definition Classes
- TreeNode
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
find(f: (LogicalPlan) ⇒ Boolean): Option[LogicalPlan]
- Definition Classes
- TreeNode
-
def
findExpressionAndTrackLineageDown(exp: Expression, plan: LogicalPlan): Option[(Expression, LogicalPlan)]
- Definition Classes
- PredicateHelper
-
def
findFirstNonDeltaScan(source: LogicalPlan): Option[LogicalPlan]
- Attributes
- protected
- Definition Classes
- DeltaSparkPlanUtils
-
def
findFirstNonDeterministicChildNode(children: Seq[Expression], checkDeterministicOptions: CheckDeterministicOptions): Option[PlanOrExpression]
- Attributes
- protected
- Definition Classes
- DeltaSparkPlanUtils
-
def
findFirstNonDeterministicNode(child: Expression, checkDeterministicOptions: CheckDeterministicOptions): Option[PlanOrExpression]
- Attributes
- protected
- Definition Classes
- DeltaSparkPlanUtils
-
def
findFirstNonDeterministicNode(plan: LogicalPlan, checkDeterministicOptions: CheckDeterministicOptions): Option[PlanOrExpression]
Returns a part of the plan that does not have a safe level of determinism. This is a conservative approximation of the plan being a truly deterministic query.
- Attributes
- protected
- Definition Classes
- DeltaSparkPlanUtils
-
def
findTouchedFiles(spark: SparkSession, deltaTxn: OptimisticTransaction): (Seq[AddFile], DeduplicateCDFDeletes)
Find the target table files that contain the rows that satisfy the merge condition. This is implemented as an inner-join between the source query/table and the target table using the merge condition.
- Attributes
- protected
- Definition Classes
- ClassicMergeExecutor
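The inner-join and the accompanying cardinality check from Phase 1 (a single target row must not match multiple source rows) can be sketched over plain collections; SourceRow/TargetRow are invented stand-ins:

```scala
// Simplified sketch: inner-join source and target on the merge
// condition, collect the distinct files containing matched target
// rows, and fail if any target row matched more than one source row.
case class SourceRow(key: Int)
case class TargetRow(key: Int, file: String)

def findTouchedFiles(
    source: Seq[SourceRow],
    target: Seq[TargetRow]): Seq[String] = {
  val matches = for {
    s <- source
    t <- target
    if s.key == t.key // the merge condition
  } yield t
  // A target row matched by multiple source rows is ambiguous: the
  // same row would be modified more than once. Mirror that check here.
  val multiMatched = matches.groupBy(identity).filter(_._2.size > 1)
  if (multiMatched.nonEmpty)
    sys.error("multiple source rows matched the same target row")
  matches.map(_.file).distinct
}
```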
-
def
flatMap[A](f: (LogicalPlan) ⇒ TraversableOnce[A]): Seq[A]
- Definition Classes
- TreeNode
-
def
foreach(f: (LogicalPlan) ⇒ Unit): Unit
- Definition Classes
- TreeNode
-
def
foreachUp(f: (LogicalPlan) ⇒ Unit): Unit
- Definition Classes
- TreeNode
-
def
formattedNodeName: String
- Attributes
- protected
- Definition Classes
- QueryPlan
-
def
generateAllActionExprs(targetWriteCols: Seq[Expression], rowIdColumnExpressionOpt: Option[NamedExpression], rowCommitVersionColumnExpressionOpt: Option[NamedExpression], clausesWithPrecompConditions: Seq[DeltaMergeIntoClause], cdcEnabled: Boolean, shouldCountDeletedRows: Boolean): Seq[ProcessedClause]
Generate expressions for every output column and every merge clause based on the corresponding UPDATE, DELETE and/or INSERT action(s).
- targetWriteCols
List of output column expressions from the target table. Used to generate CDC data for DELETE.
- rowIdColumnExpressionOpt
The optional Row ID preservation column with the physical Row ID name; it stores stable Row IDs of the table.
- rowCommitVersionColumnExpressionOpt
The optional Row Commit Version preservation column with the physical Row Commit Version name; it stores stable Row Commit Versions.
- clausesWithPrecompConditions
List of merge clauses with precomputed conditions. Action expressions are generated for each of these clauses.
- cdcEnabled
Whether the generated expressions should include CDC information.
- shouldCountDeletedRows
Whether metrics for number of deleted rows should be incremented here.
- returns
For each merge clause, a list of ProcessedClause each with a precomputed condition and N+2 action expressions (N output columns + ROW_DROPPED_COL + CDC_TYPE_COLUMN_NAME) to apply on a row when that clause matches.
- Attributes
- protected
- Definition Classes
- MergeOutputGeneration
-
def
generateCandidateFileMap(basePath: Path, candidateFiles: Seq[AddFile]): Map[String, AddFile]
Generates a map of file names to add file entries for operations where we will need to rewrite files such as delete, merge, update. We expect file names to be unique, because each file contains a UUID.
- Definition Classes
- DeltaCommand
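A simplified sketch of this mapping, with a bare-bones AddFile stand-in (the real method resolves absolute paths against the table root):

```scala
// Simplified stand-in for Delta's AddFile action.
case class AddFile(path: String)

// Map each file's fully-qualified name to its AddFile entry. File
// names are assumed unique because each contains a UUID.
def generateCandidateFileMap(
    basePath: String,
    candidateFiles: Seq[AddFile]): Map[String, AddFile] = {
  val map = candidateFiles
    .map(f => s"$basePath/${f.path}" -> f)
    .toMap
  // If names were not unique, toMap would silently drop entries.
  assert(map.size == candidateFiles.size, "file names must be unique")
  map
}
```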
-
def
generateCdcAndOutputRows(sourceDf: DataFrame, outputCols: Seq[Column], outputColNames: Seq[String], noopCopyExprs: Seq[Expression], rowIdColumnNameOpt: Option[String], rowCommitVersionColumnNameOpt: Option[String], deduplicateDeletes: DeduplicateCDFDeletes): DataFrame
Build the full output as an array of packed rows, then explode into the final result. Based on the CDC type as originally marked, we produce both rows for the CDC_TYPE_NOT_CDC partition to be written to the main table and rows for the CDC partitions to be written as CDC files.
See CDCReader for general details on how partitioning on the CDC type column works.
- Attributes
- protected
- Definition Classes
- MergeOutputGeneration
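The pack-then-explode pattern can be sketched as a flatMap: each joined row yields one main-table row plus its CDC rows. OutRow and the CDC type tags here are illustrative stand-ins for the real CDC column values:

```scala
// Sketch of the pack-then-explode pattern: each updated row yields an
// array of output rows (one main-table row plus CDC pre/post-image
// rows), which is then flattened into the final result.
case class OutRow(value: String, cdcType: String)

val CDC_TYPE_NOT_CDC = "null" // marker for main-table partition rows

def cdcAndOutputRows(
    updates: Seq[(String, String)] // (preImage, postImage) pairs
): Seq[OutRow] =
  updates.flatMap { case (pre, post) =>
    Seq(
      OutRow(post, CDC_TYPE_NOT_CDC), // written to the main table
      OutRow(pre, "update_preimage"), // written as CDC files
      OutRow(post, "update_postimage")
    )
  }
```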
-
def
generateClauseOutputExprs(numOutputCols: Int, clauses: Seq[ProcessedClause], noopExprs: Seq[Expression]): Seq[Expression]
Generate the output expression for each output column to apply the correct action for a type of merge clause. For each output column, the resulting expression dispatches the correct action based on all clause conditions.
- numOutputCols
Number of output columns.
- clauses
List of preprocessed merge clauses to bind together.
- noopExprs
Default expression to apply when no condition holds.
- returns
A list of one expression per output column to apply for a type of merge clause.
- Attributes
- protected
- Definition Classes
- MergeOutputGeneration
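For a single output column, this dispatch can be sketched as a first-match scan over clause conditions with a no-op fallback. Conditions and actions are modeled as plain functions here; the real code builds Catalyst expressions:

```scala
// Sketch: for one output column, pick the action of the first clause
// whose condition holds, falling back to a no-op copy of the old value.
type Row = Map[String, Int]
case class Clause(condition: Row => Boolean, action: Row => Int)

def outputExpr(clauses: Seq[Clause], noop: Row => Int): Row => Int =
  row => clauses.find(_.condition(row)) match {
    case Some(c) => c.action(row) // first matching clause wins
    case None    => noop(row)     // no condition holds: default
  }
```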
-
def
generateFilterForModifiedRows(): Expression
Returns the expression that can be used for selecting the modified rows generated by the merge operation. The expression is designed to work irrespective of the join type used between the source and target tables.
The expression consists of two parts, one for each of the action clause types that produce row modifications: MATCHED, NOT MATCHED BY SOURCE. All actions of the same clause type form a disjunctive clause. The result is then conjoined with an expression that filters the rows of the particular action clause type. For example:
MERGE INTO t USING s ON s.id = t.id WHEN MATCHED AND id < 5 THEN ... WHEN MATCHED AND id > 10 THEN ... WHEN NOT MATCHED BY SOURCE AND id > 20 THEN ...
Produces the following expression:
((s.id = t.id) AND (id < 5 OR id > 10)) OR ((SOURCE TABLE IS NULL) AND (id > 20))
- Attributes
- protected
- Definition Classes
- ClassicMergeExecutor
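A string-based sketch of assembling this filter, reproducing the shape of the example above (assuming s as the source alias); the string handling is purely illustrative, the real code builds Catalyst expressions:

```scala
// Sketch: OR the conditions within each clause type, AND each
// disjunction with its clause-type guard, then OR the two parts.
def filterForModifiedRows(
    mergeCondition: String,
    matchedConds: Seq[String],
    notMatchedBySourceConds: Seq[String]): String = {
  def disj(cs: Seq[String]) = cs.mkString("(", " OR ", ")")
  val matchedPart =
    if (matchedConds.isEmpty) None
    else Some(s"(($mergeCondition) AND ${disj(matchedConds)})")
  val nmbsPart =
    if (notMatchedBySourceConds.isEmpty) None
    else Some(s"((SOURCE TABLE IS NULL) AND ${disj(notMatchedBySourceConds)})")
  Seq(matchedPart, nmbsPart).flatten.mkString(" OR ")
}
```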
-
def
generateFilterForNewRows(): Expression
Returns the expression that can be used for selecting the new rows generated by the merge operation.
- Attributes
- protected
- Definition Classes
- ClassicMergeExecutor
-
def
generatePrecomputedConditionsAndDF(sourceDF: DataFrame, clauses: Seq[DeltaMergeIntoClause]): (DataFrame, Seq[DeltaMergeIntoClause])
Precompute conditions in MATCHED and NOT MATCHED clauses and generate the source data frame with precomputed boolean columns.
- sourceDF
the source DataFrame.
- clauses
the merge clauses to precompute.
- returns
The generated sourceDF with precomputed boolean columns, the matched clauses with possibly rewritten clause conditions, and the insert clauses with possibly rewritten clause conditions.
- Attributes
- protected
- Definition Classes
- MergeOutputGeneration
-
def
generateTreeString(depth: Int, lastChildren: ArrayList[Boolean], append: (String) ⇒ Unit, verbose: Boolean, prefix: String, addSuffix: Boolean, maxFields: Int, printNodeId: Boolean, indent: Int): Unit
- Definition Classes
- TreeNode
-
def
generateUpdateExpressions(targetSchema: StructType, defaultExprs: Seq[NamedExpression], nameParts: Seq[Seq[String]], updateExprs: Seq[Expression], resolver: Resolver, generatedColumns: Seq[StructField]): Seq[Option[Expression]]
See docs on overloaded method.
- Attributes
- protected
- Definition Classes
- UpdateExpressionsSupport
-
def
generateUpdateExpressions(targetSchema: StructType, updateOps: Seq[UpdateOperation], defaultExprs: Seq[NamedExpression], resolver: Resolver, pathPrefix: Seq[String] = Nil, allowSchemaEvolution: Boolean = false, generatedColumns: Seq[StructField] = Nil): Seq[Option[Expression]]
Given a target schema and a set of update operations, generate a list of update expressions, which are aligned with the given schema.
For update operations on nested struct fields, this method recursively walks down the schema tree and applies the update expressions along the way. For example, assume table target has the following schema:

s1 struct<a: int, b: int, c: int>, s2 struct<a: int, b: int>, z int

Given an update command:

UPDATE target SET s1.a = 1, s1.b = 2, z = 3

this method works as follows:

generateUpdateExpressions(targetSchema=[s1,s2,z], defaultExprs=[s1,s2,z], updateOps=[(s1.a, 1), (s1.b, 2), (z, 3)])
  -> generates expression for s1 - built recursively from child assignments:
     generateUpdateExpressions(targetSchema=[a,b,c], defaultExprs=[a,b,c], updateOps=[(a, 1), (b, 2)], pathPrefix=["s1"])
     end-of-recursion -> returns (1, 2, a.c)
  -> generates expression for s2 - no child assignment and no update expression: use default expression s2
  -> generates expression for z - use available update expression 3
  -> returns ((1, 2, a.c), s2, 3)
- targetSchema
schema to follow to generate update expressions. Due to schema evolution, it may contain additional columns or fields not present in the original table schema.
- updateOps
a set of update operations.
- defaultExprs
the expressions to use when no update operation is provided for a column or field. This is typically the output from the base table.
- pathPrefix
the path from root to the current (nested) column. Only used for printing out full column path in error messages.
- allowSchemaEvolution
Whether to allow generating expressions for new columns or fields added by schema evolution.
- generatedColumns
the list of the generated columns in the table. When a column is a generated column and the user doesn't provide an update expression, its update expression in the return result will be None. If generatedColumns is empty, all of the options in the return result must be non-empty.
- returns
a sequence of expression options. The elements in the sequence are options because when a column is a generated column but the user doesn't provide an update expression for this column, we need to generate the update expression according to the generated column definition. But this method doesn't have enough context to do that. Hence, we return a None for this case so that the caller knows it should generate the update expression for such a column. For other cases, we will always return Some(expr).
- Attributes
- protected
- Definition Classes
- UpdateExpressionsSupport
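The recursive walk above can be modeled on a toy schema tree. Everything here (FieldType, string expressions, the align helper) is invented for illustration and mirrors only the alignment logic described in the example:

```scala
// Toy model of aligning update operations with a schema tree.
// A schema is a list of named fields; struct fields recurse.
// Expressions are strings; a field's default expression is its path.
sealed trait FieldType
case object Leaf extends FieldType
case class Struct(fields: Seq[(String, FieldType)]) extends FieldType

def align(
    schema: Seq[(String, FieldType)],
    updates: Map[Seq[String], String], // column path -> update expr
    prefix: Seq[String] = Nil): Seq[String] =
  schema.map { case (name, tpe) =>
    val path = prefix :+ name
    (updates.get(path), tpe) match {
      case (Some(expr), _) => expr // direct assignment
      case (None, Struct(children))
          if updates.keys.exists(_.startsWith(path)) =>
        // Partial struct update: recurse, keeping untouched subfields.
        align(children, updates, path).mkString("struct(", ", ", ")")
      case _ => path.mkString(".") // no assignment: copy old value
    }
  }
```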
-
def
generateUpdateExprsForGeneratedColumns(updateTarget: LogicalPlan, generatedColumns: Seq[StructField], updateExprs: Seq[Option[Expression]], postEvolutionTargetSchema: Option[StructType] = None): Seq[Expression]
Generate update expressions for generated columns for which the user doesn't provide an update expression. For each item in updateExprs that's None, we will find its generation expression from generatedColumns. In order to resolve this generation expression, we will create a fake Project which contains all update expressions and resolve the generation expression with this project. Source columns of a generation expression will also be replaced with their corresponding update expressions.

For example, given a table that has a generated column g defined as c1 + 10, for the following update command:

UPDATE target SET c1 = c2 + 100, c2 = 1000

we will generate the update expression (c2 + 100) + 10 for column g. Note: in this update expression, we should use the old c2 attribute rather than its new value 1000.
- updateTarget
The logical plan of the table to be updated.
- generatedColumns
A list of generated columns.
- updateExprs
The aligned (with postEvolutionTargetSchema if not None, or updateTarget.output otherwise) update actions.
- postEvolutionTargetSchema
In case of UPDATE in MERGE when schema evolution happened, this is the final schema of the target table. This might not be the same as the output of updateTarget.
- returns
a sequence of update expressions for all columns in the table.
- Attributes
- protected
- Definition Classes
- UpdateExpressionsSupport
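The substitution step can be sketched with string expressions: each column reference in the generation expression is replaced by its update expression, scanning only the original expression so that an update's right-hand side is never itself rewritten (the old c2 is used, not 1000). This is a toy stand-in for the attribute substitution the real code performs:

```scala
// Toy substitution: rewrite a generation expression by replacing each
// referenced column with its update expression. All columns are
// substituted simultaneously against the ORIGINAL expression, so old
// attributes inside an update's right-hand side stay untouched.
def fillGeneratedColumn(
    generationExpr: String,          // e.g. "c1 + 10"
    updateExprs: Map[String, String] // column -> update expression
): String = {
  val token = "\\b\\w+\\b".r
  token.replaceAllIn(generationExpr, m =>
    updateExprs.get(m.matched)
      .map(u => java.util.regex.Matcher.quoteReplacement(s"($u)"))
      .getOrElse(m.matched))
}
```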
-
def
generateWriteAllChangesOutputCols(targetWriteCols: Seq[Expression], rowIdColumnExpressionOpt: Option[NamedExpression], rowCommitVersionColumnExpressionOpt: Option[NamedExpression], targetWriteColNames: Seq[String], noopCopyExprs: Seq[Expression], clausesWithPrecompConditions: Seq[DeltaMergeIntoClause], cdcEnabled: Boolean, shouldCountDeletedRows: Boolean = true): IndexedSeq[Column]
Generate the expressions to process full-outer join output and generate target rows.
To generate these N + 2 columns, we generate N + 2 expressions and apply them on the joinedDF. The CDC column will be either used for CDC generation or dropped before performing the final write, and the other column will always be dropped after executing the increment metric expression and filtering on ROW_DROPPED_COL.
- Attributes
- protected
- Definition Classes
- MergeOutputGeneration
-
def
getAliasMap(exprs: Seq[NamedExpression]): AttributeMap[Alias]
- Attributes
- protected
- Definition Classes
- AliasHelper
-
def
getAliasMap(plan: Aggregate): AttributeMap[Alias]
- Attributes
- protected
- Definition Classes
- AliasHelper
-
def
getAliasMap(plan: Project): AttributeMap[Alias]
- Attributes
- protected
- Definition Classes
- AliasHelper
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
- Definition Classes
- DeltaLogging
-
def
getDefaultTreePatternBits: BitSet
- Attributes
- protected
- Definition Classes
- TreeNode
-
def
getDeltaLog(spark: SparkSession, path: Option[String], tableIdentifier: Option[TableIdentifier], operationName: String, hadoopConf: Map[String, String] = Map.empty): DeltaLog
Utility method to return the DeltaLog of an existing Delta table referred by either the given path or tableIdentifier.
- spark
SparkSession reference to use.
- path
Table location. Expects a non-empty tableIdentifier or path.
- tableIdentifier
Table identifier. Expects a non-empty tableIdentifier or path.
- operationName
Operation that is getting the DeltaLog, used in error messages.
- hadoopConf
Hadoop file system options used to build DeltaLog.
- returns
DeltaLog of the table
- Attributes
- protected
- Definition Classes
- DeltaCommand
- Exceptions thrown
AnalysisException if no Delta table exists at the given path/identifier, or if neither a path nor a tableIdentifier is provided.
-
def
getDeltaTable(target: LogicalPlan, cmd: String): DeltaTableV2
Extracts the DeltaTableV2 from a LogicalPlan iff the LogicalPlan is a ResolvedTable with either a DeltaTableV2 or a V1Table that is referencing a Delta table. In all other cases this method will throw a "Table not found" exception.
- Definition Classes
- DeltaCommand
-
def
getDeltaTablePathOrIdentifier(target: LogicalPlan, cmd: String): (Option[TableIdentifier], Option[String])
Helper method to extract the table id or path from a LogicalPlan representing a Delta table. This uses DeltaCommand.getDeltaTable to convert the LogicalPlan to a DeltaTableV2 and then extracts either the path or identifier from it. If the DeltaTableV2 has a CatalogTable, the table identifier will be returned. Otherwise, the table's path will be returned. Throws an exception if the LogicalPlan does not represent a Delta table.
- Definition Classes
- DeltaCommand
-
def
getErrorData(e: Throwable): Map[String, Any]
- Definition Classes
- DeltaLogging
-
def
getMergeSource: MergeSource
Returns the prepared merge source.
- Attributes
- protected
- Definition Classes
- MergeIntoMaterializeSource
-
def
getMetadataAttributeByName(name: String): AttributeReference
- Definition Classes
- LogicalPlan
-
def
getMetadataAttributeByNameOpt(name: String): Option[AttributeReference]
- Definition Classes
- LogicalPlan
-
final
def
getNewDomainMetadata(txn: OptimisticTransaction, canUpdateMetadata: Boolean, isReplacingTable: Boolean, clusterBySpecOpt: Option[ClusterBySpec] = None): Seq[DomainMetadata]
Returns a sequence of new DomainMetadata if canUpdateMetadata is true and the operation either creates a table or replaces the whole table (not a replaceWhere operation). This is because we only update domain metadata when creating or replacing a table; replace table for DDL and DataFrameWriterV2 is already handled in CreateDeltaTableCommand, in which case canUpdateMetadata is false, so we don't update again.
- txn
OptimisticTransaction being used to create or replace table.
- canUpdateMetadata
true if the metadata is not updated yet.
- isReplacingTable
true if the operation is replace table without replaceWhere option.
- clusterBySpecOpt
optional ClusterBySpec containing user-specified clustering columns.
- Attributes
- protected
- Definition Classes
- ImplicitMetadataOperation
-
def
getTableCatalogTable(target: LogicalPlan, cmd: String): Option[CatalogTable]
Extracts CatalogTable metadata from a LogicalPlan if the plan is a ResolvedTable. The table can be a non-Delta table.
- Definition Classes
- DeltaCommand
-
def
getTablePathOrIdentifier(target: LogicalPlan, cmd: String): (Option[TableIdentifier], Option[String])
Helper method to extract the table id or path from a LogicalPlan representing a resolved table or path. This calls getDeltaTablePathOrIdentifier if the resolved table is a Delta table. For a non-Delta table with an identifier, we extract its identifier. For a non-Delta table with a path, it expects the path to be wrapped in a ResolvedPathBasedNonDeltaTable and extracts it from there.
- Definition Classes
- DeltaCommand
-
def
getTagValue[T](tag: TreeNodeTag[T]): Option[T]
- Definition Classes
- TreeNode
-
def
getTargetOnlyPredicates(spark: SparkSession): Seq[Expression]
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
getTouchedFile(basePath: Path, escapedFilePath: String, nameToAddFileMap: Map[String, AddFile]): AddFile
Find the AddFile record corresponding to the file that was read as part of a delete/update/merge operation.
- basePath
The path of the table. Must not be escaped.
- escapedFilePath
The path to a file that can be either absolute or relative. All special chars in this path must be already escaped by URI standards.
- nameToAddFileMap
Map generated through generateCandidateFileMap().
- Definition Classes
- DeltaCommand
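The lookup described above can be sketched in plain Scala. This is a minimal, hypothetical sketch using a stub AddFile type: it undoes the URI escaping applied by input_file_name and resolves the path against a map keyed by table-relative paths, as generateCandidateFileMap() keys it. The real implementation in DeltaCommand handles more cases (e.g. files stored with absolute paths).

```scala
import java.net.URI

// Stub standing in for Delta's AddFile action (hypothetical, for illustration only).
case class AddFile(path: String)

// Resolve the AddFile for a path returned by input_file_name(), which is URI-escaped.
def getTouchedFile(
    basePath: String,
    escapedFilePath: String,
    nameToAddFileMap: Map[String, AddFile]): AddFile = {
  // Undo URI escaping, then strip the table's base path if the path is absolute.
  val absolutePath = new URI(escapedFilePath).getPath
  val relativePath =
    if (absolutePath.startsWith(basePath)) {
      absolutePath.stripPrefix(basePath).stripPrefix("/")
    } else {
      absolutePath
    }
  nameToAddFileMap.getOrElse(relativePath,
    throw new IllegalStateException(s"File $relativePath not found among candidate files"))
}

val files = Map("part-00000.parquet" -> AddFile("part-00000.parquet"))
val found = getTouchedFile("/tmp/table", "/tmp/table/part-00000.parquet", files)
```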
-
def
hasBeenExecuted(txn: OptimisticTransaction, sparkSession: SparkSession, options: Option[DeltaOptions] = None): Boolean
Returns true if there is information in the spark session that indicates that this write has already been successfully written.
- Attributes
- protected
- Definition Classes
- DeltaCommand
-
def
hashCode(): Int
- Definition Classes
- TreeNode → AnyRef → Any
-
def
improveUnsupportedOpError(f: ⇒ Unit): Unit
- Attributes
- protected
- Definition Classes
- AnalysisHelper
-
def
includesDeletes: Boolean
Whether this merge statement includes delete statements.
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
includesInserts: Boolean
Whether this merge statement includes insert statements.
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
incrementMetricAndReturnBool(name: String, valueToReturn: Boolean): Expression
- returns
An Expression to increment a SQL metric.
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
incrementMetricsAndReturnBool(names: Seq[String], valueToReturn: Boolean): Expression
- returns
An Expression to increment SQL metrics.
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
inferAdditionalConstraints(constraints: ExpressionSet): ExpressionSet
- Definition Classes
- ConstraintHelper
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
innerChildren: Seq[QueryPlan[_]]
- Definition Classes
- QueryPlan → TreeNode
-
def
inputSet: AttributeSet
- Definition Classes
- QueryPlan
-
final
def
invalidateStatsCache(): Unit
- Definition Classes
- LogicalPlanStats
-
def
isCanonicalizedPlan: Boolean
- Attributes
- protected
- Definition Classes
- QueryPlan
-
def
isCatalogTable(analyzer: Analyzer, tableIdent: TableIdentifier): Boolean
Use the analyzer to see whether the provided TableIdentifier is for a path-based table or not.
- analyzer
The session state analyzer to call
- tableIdent
Table identifier to determine whether it is path based or not
- returns
Boolean where true means the table is a table in a metastore and false means the table is a path-based table.
- Definition Classes
- DeltaCommand
-
def
isCdcEnabled(deltaTxn: OptimisticTransaction): Boolean
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
isInsertOnly: Boolean
Whether this merge statement has only insert (NOT MATCHED) clauses.
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
isLikelySelective(e: Expression): Boolean
- Definition Classes
- PredicateHelper
-
def
isMatchedOnly: Boolean
Whether this merge statement has only MATCHED clauses.
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
isNullIntolerant(expr: Expression): Boolean
- Attributes
- protected
- Definition Classes
- PredicateHelper
-
val
isOnlyOneUnconditionalDelete: Boolean
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
isPathIdentifier(tableIdent: TableIdentifier): Boolean
Checks if the given identifier can be for a Delta table's path.
- tableIdent
Table Identifier for which to check
- Attributes
- protected
- Definition Classes
- DeltaCommand
-
def
isRuleIneffective(ruleId: RuleId): Boolean
- Attributes
- protected
- Definition Classes
- TreeNode
-
def
isStreaming: Boolean
- Definition Classes
- LogicalPlan
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
jsonFields: List[JField]
- Attributes
- protected
- Definition Classes
- TreeNode
-
final
def
legacyWithNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan
- Attributes
- protected
- Definition Classes
- TreeNode
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
-
def
logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
makeCopy(newArgs: Array[AnyRef]): LogicalPlan
- Definition Classes
- TreeNode
-
def
map[A](f: (LogicalPlan) ⇒ A): Seq[A]
- Definition Classes
- TreeNode
-
final
def
mapChildren(f: (LogicalPlan) ⇒ LogicalPlan): LogicalPlan
- Definition Classes
- LeafLike
-
def
mapExpressions(f: (Expression) ⇒ Expression): MergeIntoCommand.this.type
- Definition Classes
- QueryPlan
-
def
mapProductIterator[B](f: (Any) ⇒ B)(implicit arg0: ClassTag[B]): Array[B]
- Attributes
- protected
- Definition Classes
- TreeNode
-
def
markRuleAsIneffective(ruleId: RuleId): Unit
- Attributes
- protected
- Definition Classes
- TreeNode
-
val
matchedClauses: Seq[DeltaMergeIntoMatchedClause]
- Definition Classes
- MergeIntoCommand → MergeIntoCommandBase
-
val
materializedSourceRDD: Option[RDD[InternalRow]]
If the source was materialized, reference to the checkpointed RDD.
- Attributes
- protected
- Definition Classes
- MergeIntoMaterializeSource
-
def
maxRows: Option[Long]
- Definition Classes
- LogicalPlan
-
def
maxRowsPerPartition: Option[Long]
- Definition Classes
- LogicalPlan
-
def
metadataOutput: Seq[Attribute]
- Definition Classes
- LogicalPlan
-
lazy val
metrics: Map[String, SQLMetric]
- Definition Classes
- MergeIntoCommandBase → RunnableCommand
-
val
migratedSchema: Option[StructType]
- Definition Classes
- MergeIntoCommand → MergeIntoCommandBase
-
final
def
missingInput: AttributeSet
- Definition Classes
- QueryPlan
-
def
multiTransformDown(rule: PartialFunction[LogicalPlan, Seq[LogicalPlan]]): Stream[LogicalPlan]
- Definition Classes
- TreeNode
-
def
multiTransformDownWithPruning(cond: (TreePatternBits) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[LogicalPlan, Seq[LogicalPlan]]): Stream[LogicalPlan]
- Definition Classes
- TreeNode
-
val
multipleMatchDeleteOnlyOvercount: Option[Long]
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
nodeName: String
- Definition Classes
- TreeNode
-
final
val
nodePatterns: Seq[TreePattern]
- Definition Classes
- Command → TreeNode
-
val
notMatchedBySourceClauses: Seq[DeltaMergeIntoNotMatchedBySourceClause]
- Definition Classes
- MergeIntoCommand → MergeIntoCommandBase
-
val
notMatchedClauses: Seq[DeltaMergeIntoNotMatchedClause]
- Definition Classes
- MergeIntoCommand → MergeIntoCommandBase
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
numberedTreeString: String
- Definition Classes
- TreeNode
-
val
origin: Origin
- Definition Classes
- TreeNode → WithOrigin
-
def
otherCopyArgs: Seq[AnyRef]
- Attributes
- protected
- Definition Classes
- TreeNode
-
val
output: Seq[Attribute]
- Definition Classes
- MergeIntoCommand → Command → QueryPlan
-
def
outputOrdering: Seq[SortOrder]
- Definition Classes
- QueryPlan
-
lazy val
outputSet: AttributeSet
- Definition Classes
- QueryPlan
- Annotations
- @transient()
-
def
outputWithNullability(output: Seq[Attribute], nonNullAttrExprIds: Seq[ExprId]): Seq[Attribute]
- Attributes
- protected
- Definition Classes
- PredicateHelper
-
def
p(number: Int): LogicalPlan
- Definition Classes
- TreeNode
-
def
parsePredicates(spark: SparkSession, predicate: String): Seq[Expression]
Converts string predicates into Expressions relative to a transaction.
- Attributes
- protected
- Definition Classes
- DeltaCommand
- Exceptions thrown
AnalysisException if a non-partition column is referenced.
-
val
performedSecondSourceScan: Boolean
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
planContainsOnlyDeltaScans(source: LogicalPlan): Boolean
- Attributes
- protected
- Definition Classes
- DeltaSparkPlanUtils
-
def
planContainsUdf(plan: LogicalPlan): Boolean
- Attributes
- protected
- Definition Classes
- DeltaSparkPlanUtils
-
def
planIsDeterministic(plan: LogicalPlan, checkDeterministicOptions: CheckDeterministicOptions): Boolean
Returns true if plan has a safe level of determinism. This is a conservative approximation of plan being a truly deterministic query.
- Attributes
- protected
- Definition Classes
- DeltaSparkPlanUtils
-
def
postEvolutionTargetExpressions(makeNullable: Boolean = false): Seq[NamedExpression]
Expressions to convert from a pre-evolution target row to the post-evolution target row. These expressions are used for columns that are not modified in updated rows or to copy rows that are not modified. There are two kinds of expressions here:
- References to existing columns in the target dataframe. Note that these references may have a different data type than they originally did due to schema evolution, so we add a cast that supports schema evolution. The references will be marked as nullable if makeNullable is set to true, which allows the attributes to reference the output of an outer join.
- Literal nulls, for new columns which are being added to the target table as part of this transaction, since new columns will have a value of null for all existing rows.
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
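The two expression kinds described above can be illustrated with a small sketch. The types here (TargetExpr, ColumnRef, NullLiteral) are hypothetical stand-ins; the real code emits Catalyst expressions with evolution-aware casts rather than simple name references.

```scala
// Hypothetical stand-ins for the two kinds of post-evolution target expressions.
sealed trait TargetExpr
case class ColumnRef(name: String, nullable: Boolean) extends TargetExpr
case class NullLiteral(name: String) extends TargetExpr

// For each column of the post-evolution schema: reference it if it already existed
// in the pre-evolution target, otherwise emit a literal null for the new column.
def postEvolutionTargetExprs(
    preEvolutionColumns: Set[String],
    postEvolutionColumns: Seq[String],
    makeNullable: Boolean): Seq[TargetExpr] =
  postEvolutionColumns.map { col =>
    if (preEvolutionColumns.contains(col)) {
      // Existing column: may be marked nullable to survive an outer join.
      ColumnRef(col, nullable = makeNullable)
    } else {
      // New column added by schema evolution: null for all existing rows.
      NullLiteral(col)
    }
  }

val exprs = postEvolutionTargetExprs(
  preEvolutionColumns = Set("id", "value"),
  postEvolutionColumns = Seq("id", "value", "extra"),
  makeNullable = true)
```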
-
def
prepareMergeSource(spark: SparkSession, source: LogicalPlan, condition: Expression, matchedClauses: Seq[DeltaMergeIntoMatchedClause], notMatchedClauses: Seq[DeltaMergeIntoNotMatchedClause], isInsertOnly: Boolean): Unit
If the source needs to be materialized, prepares the materialized dataframe in sourceDF. Otherwise, prepares a regular dataframe.
- returns
the source materialization reason
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase → MergeIntoMaterializeSource
-
def
prettyJson: String
- Definition Classes
- TreeNode
-
def
printSchema(): Unit
- Definition Classes
- QueryPlan
-
def
producedAttributes: AttributeSet
- Definition Classes
- Command → QueryPlan
-
def
recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
Used to record the occurrence of a single event or report detailed, operation-specific statistics.
- path
Used to log the path of the Delta table when deltaLog is null.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
Used to report the duration as well as the success or failure of an operation on a deltaLog.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
Used to report the duration as well as the success or failure of an operation on a tahoePath.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
-
def
recordFrameProfile[T](group: String, name: String)(thunk: ⇒ T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordMergeOperation[A](extraOpType: String = "", status: String = null, sqlMetricName: String = null)(thunk: ⇒ A): A
Execute the given thunk and return its result while recording the time taken to do it and setting additional local properties for better UI visibility.
- extraOpType
extra operation name recorded in the logs
- status
human readable status string describing what the thunk is doing
- sqlMetricName
name of SQL metric to update with the time taken by the thunk
- thunk
the code to execute
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
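The behaviour described above amounts to a timing wrapper around a thunk. A minimal sketch, assuming a mutable map stands in for the real SQLMetric registry (the actual implementation also sets Spark local properties and records Delta usage logs):

```scala
import scala.collection.mutable

// Stand-in for the command's SQL metrics map (hypothetical).
val metrics = mutable.Map.empty[String, Long].withDefaultValue(0L)

// Run `thunk`, measure elapsed wall-clock time, and add it to the named metric.
def recordMergeOperation[A](sqlMetricName: String = null)(thunk: => A): A = {
  val start = System.nanoTime()
  val result = thunk
  val elapsedMs = (System.nanoTime() - start) / 1000000L
  if (sqlMetricName != null) metrics(sqlMetricName) += elapsedMs
  result
}

val answer = recordMergeOperation(sqlMetricName = "scanTimeMs") { 21 * 2 }
```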
-
def
recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: ⇒ S): S
- Definition Classes
- DatabricksLogging
-
def
recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
-
def
recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
-
def
recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
-
lazy val
references: AttributeSet
- Definition Classes
- QueryPlan
- Annotations
- @transient()
-
def
refresh(): Unit
- Definition Classes
- LogicalPlan
-
def
removeFilesFromPaths(deltaLog: DeltaLog, nameToAddFileMap: Map[String, AddFile], filesToRewrite: Seq[String], operationTimestamp: Long): Seq[RemoveFile]
This method provides the RemoveFile actions that are necessary for files that are touched and need to be rewritten in methods like Delete, Update, and Merge.
- deltaLog
The DeltaLog of the table that is being operated on
- nameToAddFileMap
A map generated using generateCandidateFileMap.
- filesToRewrite
Absolute paths of the files that were touched. We will search for these in candidateFiles. Obtained as the output of the input_file_name function.
- operationTimestamp
The timestamp of the operation
- Attributes
- protected
- Definition Classes
- DeltaCommand
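The transformation described above can be sketched with stub action types. This is a simplified, hypothetical sketch: the real AddFile and RemoveFile actions carry many more fields (partition values, size, deletion vectors), and path resolution goes through getTouchedFile.

```scala
// Stubs standing in for Delta's log actions (hypothetical, for illustration only).
case class AddFile(path: String)
case class RemoveFile(path: String, deletionTimestamp: Long)

// For each touched path, look up the AddFile that was read and emit a matching
// RemoveFile stamped with the operation's timestamp.
def removeFilesFromPaths(
    nameToAddFileMap: Map[String, AddFile],
    filesToRewrite: Seq[String],
    operationTimestamp: Long): Seq[RemoveFile] =
  filesToRewrite.map { p =>
    val add = nameToAddFileMap.getOrElse(p,
      throw new IllegalStateException(s"File $p not found among candidate files"))
    RemoveFile(add.path, operationTimestamp)
  }

val removes = removeFilesFromPaths(
  Map("a.parquet" -> AddFile("a.parquet")),
  Seq("a.parquet"),
  operationTimestamp = 1700000000000L)
```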
-
def
replaceAlias(expr: Expression, aliasMap: AttributeMap[Alias]): Expression
- Attributes
- protected
- Definition Classes
- AliasHelper
-
def
replaceAliasButKeepName(expr: NamedExpression, aliasMap: AttributeMap[Alias]): NamedExpression
- Attributes
- protected
- Definition Classes
- AliasHelper
-
def
resolve(nameParts: Seq[String], resolver: Resolver): Option[NamedExpression]
- Definition Classes
- LogicalPlan
-
def
resolve(schema: StructType, resolver: Resolver): Seq[Attribute]
- Definition Classes
- LogicalPlan
-
def
resolveChildren(nameParts: Seq[String], resolver: Resolver): Option[NamedExpression]
- Definition Classes
- LogicalPlan
-
def
resolveExpressions(r: PartialFunction[Expression, Expression]): LogicalPlan
- Definition Classes
- AnalysisHelper
-
def
resolveExpressionsWithPruning(cond: (TreePatternBits) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[Expression, Expression]): LogicalPlan
- Definition Classes
- AnalysisHelper
-
def
resolveIdentifier(analyzer: Analyzer, identifier: TableIdentifier): LogicalPlan
Use the analyzer to resolve the provided identifier.
- analyzer
The session state analyzer to call
- identifier
Table identifier to resolve
- Attributes
- protected
- Definition Classes
- DeltaCommand
-
def
resolveOperators(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
- Definition Classes
- AnalysisHelper
-
def
resolveOperatorsDown(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
- Definition Classes
- AnalysisHelper
-
def
resolveOperatorsDownWithPruning(cond: (TreePatternBits) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
- Definition Classes
- AnalysisHelper
-
def
resolveOperatorsUp(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
- Definition Classes
- AnalysisHelper
-
def
resolveOperatorsUpWithNewOutput(rule: PartialFunction[LogicalPlan, (LogicalPlan, Seq[(Attribute, Attribute)])]): LogicalPlan
- Definition Classes
- AnalysisHelper
-
def
resolveOperatorsUpWithPruning(cond: (TreePatternBits) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
- Definition Classes
- AnalysisHelper
-
def
resolveOperatorsWithPruning(cond: (TreePatternBits) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
- Definition Classes
- AnalysisHelper
-
def
resolveQuoted(name: String, resolver: Resolver): Option[NamedExpression]
- Definition Classes
- LogicalPlan
-
def
resolveReferencesForExpressions(sparkSession: SparkSession, exprs: Seq[Expression], planProvidingAttrs: LogicalPlan): Seq[Expression]
Resolve expressions using the attributes provided by planProvidingAttrs. Throw an error if failing to resolve any expressions.
- Attributes
- protected
- Definition Classes
- AnalysisHelper
-
lazy val
resolved: Boolean
- Definition Classes
- LogicalPlan
-
def
rewriteAttrs(attrMap: AttributeMap[Attribute]): LogicalPlan
- Definition Classes
- QueryPlan
-
def
run(spark: SparkSession): Seq[Row]
- Definition Classes
- MergeIntoCommandBase → RunnableCommand
-
def
runMerge(spark: SparkSession): Seq[Row]
- Attributes
- protected
- Definition Classes
- MergeIntoCommand → MergeIntoCommandBase
-
def
runWithMaterializedSourceLostRetries(spark: SparkSession, deltaLog: DeltaLog, metrics: Map[String, SQLMetric], runMergeFunc: (SparkSession) ⇒ Seq[Row]): Seq[Row]
Run the merge with retries in case it detects an RDD block lost error of the materialized source RDD. It will also record an out-of-disk error if one happens, possibly because of increased disk pressure from the materialized source RDD.
- Attributes
- protected
- Definition Classes
- MergeIntoMaterializeSource
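The retry behaviour described above can be sketched as a loop that reruns the merge function when the failure looks like a lost materialized-source block. This is a minimal sketch: isMaterializedSourceLost is a hypothetical stand-in for the real error classification, and the actual implementation also enforces a retry limit from configuration and logs each attempt.

```scala
// Rerun `runMerge` while the failure is classified as a lost materialized source,
// up to `maxAttempts` attempts in total.
def runWithRetries[A](maxAttempts: Int)(isMaterializedSourceLost: Throwable => Boolean)(
    runMerge: () => A): A = {
  var attempt = 1
  while (true) {
    try {
      return runMerge()
    } catch {
      case e: Throwable if isMaterializedSourceLost(e) && attempt < maxAttempts =>
        attempt += 1 // materialized source was lost; retry the merge from scratch
    }
  }
  throw new IllegalStateException("unreachable")
}

var calls = 0
val rows = runWithRetries(maxAttempts = 4)(_.isInstanceOf[RuntimeException]) { () =>
  calls += 1
  if (calls < 3) throw new RuntimeException("RDD block lost") else Seq("ok")
}
```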
-
def
sameOutput(other: LogicalPlan): Boolean
- Definition Classes
- LogicalPlan
-
final
def
sameResult(other: LogicalPlan): Boolean
- Definition Classes
- QueryPlan
-
lazy val
sc: SparkContext
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
- Annotations
- @transient()
-
lazy val
schema: StructType
- Definition Classes
- QueryPlan
-
val
schemaEvolutionEnabled: Boolean
- Definition Classes
- MergeIntoCommand → MergeIntoCommandBase
-
def
schemaString: String
- Definition Classes
- QueryPlan
-
final
def
semanticHash(): Int
- Definition Classes
- QueryPlan
-
def
sendDriverMetrics(spark: SparkSession, metrics: Map[String, SQLMetric]): Unit
Send the driver-side metrics.
This is needed to make the SQL metrics visible in the Spark UI. All metrics are default initialized with 0 so that's what we're reporting in case we skip an already executed action.
- Attributes
- protected
- Definition Classes
- DeltaCommand
-
def
seqToString(exprs: Seq[Expression]): String
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
setTagValue[T](tag: TreeNodeTag[T], value: T): Unit
- Definition Classes
- TreeNode
-
def
shouldMaterializeSource(spark: SparkSession, source: LogicalPlan, isInsertOnly: Boolean): (Boolean, MergeIntoMaterializeSourceReason)
- returns
A pair of a boolean indicating whether the source should be materialized and the source materialization reason.
- Attributes
- protected
- Definition Classes
- MergeIntoMaterializeSource
-
def
shouldOptimizeMatchedOnlyMerge(spark: SparkSession): Boolean
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
shouldWritePersistentDeletionVectors(spark: SparkSession, txn: OptimisticTransaction): Boolean
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
simpleString(maxFields: Int): String
- Definition Classes
- QueryPlan → TreeNode
-
def
simpleStringWithNodeId(): String
- Definition Classes
- QueryPlan → TreeNode
-
val
source: LogicalPlan
- Definition Classes
- MergeIntoCommand → MergeIntoCommandBase
-
def
splitConjunctivePredicates(condition: Expression): Seq[Expression]
- Attributes
- protected
- Definition Classes
- PredicateHelper
-
def
splitDisjunctivePredicates(condition: Expression): Seq[Expression]
- Attributes
- protected
- Definition Classes
- PredicateHelper
-
def
statePrefix: String
- Attributes
- protected
- Definition Classes
- LogicalPlan → QueryPlan
-
def
stats: Statistics
- Definition Classes
- Command → LogicalPlanStats
-
val
statsCache: Option[Statistics]
- Attributes
- protected
- Definition Classes
- LogicalPlanStats
-
def
stringArgs: Iterator[Any]
- Attributes
- protected
- Definition Classes
- TreeNode
-
lazy val
subqueries: Seq[LogicalPlan]
- Definition Classes
- QueryPlan
- Annotations
- @transient()
-
def
subqueriesAll: Seq[LogicalPlan]
- Definition Classes
- QueryPlan
-
val
supportMergeAndUpdateLegacyCastBehavior: Boolean
Whether casting behavior can revert to following 'spark.sql.ansi.enabled' instead of 'spark.sql.storeAssignmentPolicy' to preserve legacy behavior for UPDATE and MERGE. Legacy behavior is applied only if 'spark.databricks.delta.updateAndMergeCastingFollowsAnsiEnabledFlag' is set to true.
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase → UpdateExpressionsSupport
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
val
target: LogicalPlan
- Definition Classes
- MergeIntoCommand → MergeIntoCommandBase
-
lazy val
targetDeltaLog: DeltaLog
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
- Annotations
- @transient()
-
val
targetFileIndex: TahoeFileIndex
- Definition Classes
- MergeIntoCommand → MergeIntoCommandBase
-
def
throwErrorOnMultipleMatches(hasMultipleMatches: Boolean, spark: SparkSession): Unit
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
-
def
toDataset(sparkSession: SparkSession, logicalPlan: LogicalPlan): Dataset[Row]
- Attributes
- protected
- Definition Classes
- AnalysisHelper
-
def
toJSON: String
- Definition Classes
- TreeNode
-
def
toString(): String
- Definition Classes
- TreeNode → AnyRef → Any
-
val
trackHighWaterMarks: Set[String]
-
def
transform(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
- Definition Classes
- TreeNode
-
def
transformAllExpressions(rule: PartialFunction[Expression, Expression]): MergeIntoCommand.this.type
- Definition Classes
- QueryPlan
-
def
transformAllExpressionsWithPruning(cond: (TreePatternBits) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[Expression, Expression]): MergeIntoCommand.this.type
- Definition Classes
- AnalysisHelper → QueryPlan
-
def
transformAllExpressionsWithSubqueries(rule: PartialFunction[Expression, Expression]): MergeIntoCommand.this.type
- Definition Classes
- QueryPlan
-
def
transformDown(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
- Definition Classes
- TreeNode
-
def
transformDownWithPruning(cond: (TreePatternBits) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
- Definition Classes
- AnalysisHelper → TreeNode
-
def
transformDownWithSubqueries(f: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
- Definition Classes
- QueryPlan
-
def
transformDownWithSubqueriesAndPruning(cond: (TreePatternBits) ⇒ Boolean, ruleId: RuleId)(f: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
- Definition Classes
- QueryPlan
-
def
transformExpressions(rule: PartialFunction[Expression, Expression]): MergeIntoCommand.this.type
- Definition Classes
- QueryPlan
-
def
transformExpressionsDown(rule: PartialFunction[Expression, Expression]): MergeIntoCommand.this.type
- Definition Classes
- QueryPlan
-
def
transformExpressionsDownWithPruning(cond: (TreePatternBits) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[Expression, Expression]): MergeIntoCommand.this.type
- Definition Classes
- QueryPlan
-
def
transformExpressionsUp(rule: PartialFunction[Expression, Expression]): MergeIntoCommand.this.type
- Definition Classes
- QueryPlan
-
def
transformExpressionsUpWithPruning(cond: (TreePatternBits) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[Expression, Expression]): MergeIntoCommand.this.type
- Definition Classes
- QueryPlan
-
def
transformExpressionsWithPruning(cond: (TreePatternBits) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[Expression, Expression]): MergeIntoCommand.this.type
- Definition Classes
- QueryPlan
-
def
transformUp(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
- Definition Classes
- TreeNode
-
def
transformUpWithBeforeAndAfterRuleOnChildren(cond: (LogicalPlan) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[(LogicalPlan, LogicalPlan), LogicalPlan]): LogicalPlan
- Definition Classes
- TreeNode
-
def
transformUpWithNewOutput(rule: PartialFunction[LogicalPlan, (LogicalPlan, Seq[(Attribute, Attribute)])], skipCond: (LogicalPlan) ⇒ Boolean, canGetOutput: (LogicalPlan) ⇒ Boolean): LogicalPlan
- Definition Classes
- AnalysisHelper → QueryPlan
-
def
transformUpWithPruning(cond: (TreePatternBits) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
- Definition Classes
- AnalysisHelper → TreeNode
-
def
transformUpWithSubqueries(f: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
- Definition Classes
- QueryPlan
-
def
transformWithPruning(cond: (TreePatternBits) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
- Definition Classes
- TreeNode
-
def
transformWithSubqueries(f: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
- Definition Classes
- QueryPlan
-
lazy val
treePatternBits: BitSet
- Definition Classes
- QueryPlan → TreeNode → TreePatternBits
-
def
treeString(append: (String) ⇒ Unit, verbose: Boolean, addSuffix: Boolean, maxFields: Int, printOperatorId: Boolean): Unit
- Definition Classes
- TreeNode
-
final
def
treeString(verbose: Boolean, addSuffix: Boolean, maxFields: Int, printOperatorId: Boolean): String
- Definition Classes
- TreeNode
-
final
def
treeString: String
- Definition Classes
- TreeNode
-
def
trimAliases(e: Expression): Expression
- Attributes
- protected
- Definition Classes
- AliasHelper
-
def
trimNonTopLevelAliases[T <: Expression](e: T): T
- Attributes
- protected
- Definition Classes
- AliasHelper
-
def
tryResolveReferences(sparkSession: SparkSession)(expr: Expression, planContainingExpr: LogicalPlan): Expression
- Attributes
- protected
- Definition Classes
- AnalysisHelper
-
def
tryResolveReferencesForExpressions(sparkSession: SparkSession)(exprs: Seq[Expression], plansProvidingAttrs: Seq[LogicalPlan]): Seq[Expression]
Resolve expressions using the attributes provided by plansProvidingAttrs, ignoring errors.
- Attributes
- protected
- Definition Classes
- AnalysisHelper
-
def
tryResolveReferencesForExpressions(sparkSession: SparkSession, exprs: Seq[Expression], planContainingExpr: LogicalPlan): Seq[Expression]
- Attributes
- protected
- Definition Classes
- AnalysisHelper
-
def
unsetTagValue[T](tag: TreeNodeTag[T]): Unit
- Definition Classes
- TreeNode
-
final
def
updateMetadata(spark: SparkSession, txn: OptimisticTransaction, schema: StructType, partitionColumns: Seq[String], configuration: Map[String, String], isOverwriteMode: Boolean, rearrangeOnly: Boolean): Unit
- Attributes
- protected
- Definition Classes
- ImplicitMetadataOperation
-
def
updateOuterReferencesInSubquery(plan: LogicalPlan, attrMap: AttributeMap[Attribute]): LogicalPlan
- Definition Classes
- AnalysisHelper → QueryPlan
-
lazy val
validConstraints: ExpressionSet
- Attributes
- protected
- Definition Classes
- QueryPlanConstraints
-
def
verboseString(maxFields: Int): String
- Definition Classes
- QueryPlan → TreeNode
-
def
verboseStringWithOperatorId(): String
- Definition Classes
- QueryPlan
-
def
verboseStringWithSuffix(maxFields: Int): String
- Definition Classes
- LogicalPlan → TreeNode
-
def
verifyPartitionPredicates(spark: SparkSession, partitionColumns: Seq[String], predicates: Seq[Expression]): Unit
- Definition Classes
- DeltaCommand
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan
- Definition Classes
- TreeNode
-
def
withNewChildrenInternal(newChildren: IndexedSeq[LogicalPlan]): LogicalPlan
- Definition Classes
- LeafLike
-
def
withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T
Report a log message to indicate that a command is running.
- Definition Classes
- DeltaProgressReporter
-
def
writeAllChanges(spark: SparkSession, deltaTxn: OptimisticTransaction, filesToRewrite: Seq[AddFile], deduplicateCDFDeletes: DeduplicateCDFDeletes, writeUnmodifiedRows: Boolean): Seq[FileAction]
Write new files by reading the touched files and updating/inserting data using the source query/table. This is implemented as a full-outer-join on the merge condition.
Note that unlike the insert-only code paths, which use just one control column (ROW_DROPPED_COL), this method uses a second control column (CDC_TYPE_COL_NAME) to handle CDC when it is enabled.
- Attributes
- protected
- Definition Classes
- ClassicMergeExecutor
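The full-outer-join strategy described above can be sketched as follows. This is an illustrative sketch, not the Delta implementation; the function name and the control-column names `_row_dropped_` and `_change_type_` are invented here and merely mirror the ROW_DROPPED_COL and CDC_TYPE_COL_NAME concepts.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.lit

// Illustrative sketch only: pair every row of the touched target files with
// its matching source row (if any) via a full outer join on the merge
// condition, then tag each output row with two control columns. The literal
// values are placeholders; the real executor sets them per clause.
def sketchWriteAllChanges(
    touchedTargetRows: DataFrame,
    source: DataFrame,
    mergeCondition: Column): DataFrame = {
  val joined = touchedTargetRows.join(source, mergeCondition, "full_outer")
  joined
    // true when the row must not be written back (e.g. a DELETE clause fired)
    .withColumn("_row_dropped_", lit(false))
    // CDC action for the row ("insert", "update_postimage", "delete", or null)
    .withColumn("_change_type_", lit(null).cast("string"))
}
```

Rows flagged as dropped are filtered out before writing, while the CDC column, when change data feed is enabled, routes each surviving row into the table data and/or the change-data files.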
-
def
writeDVs(spark: SparkSession, deltaTxn: OptimisticTransaction, filesToRewrite: Seq[AddFile]): Seq[FileAction]
Writes Deletion Vectors for rows modified by the merge operation.
- Attributes
- protected
- Definition Classes
- ClassicMergeExecutor
-
def
writeFiles(spark: SparkSession, txn: OptimisticTransaction, outputDF: DataFrame): Seq[FileAction]
Write the output data to files, repartitioning the output DataFrame by the partition columns if the table is partitioned and merge.repartitionBeforeWrite.enabled is set to true.
- Attributes
- protected
- Definition Classes
- MergeIntoCommandBase
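The repartition-before-write behavior described above can be sketched as below. This is an illustrative sketch, not the Delta source; the boolean parameter stands in for the merge.repartitionBeforeWrite.enabled setting, and the function name is invented for the example.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Illustrative sketch: shuffle the output by the table's partition columns
// before writing, so each task writes to few table partitions and produces
// fewer, larger files instead of one small file per task per partition.
def maybeRepartition(
    output: DataFrame,
    partitionColumns: Seq[String],
    repartitionEnabled: Boolean): DataFrame =
  if (repartitionEnabled && partitionColumns.nonEmpty)
    output.repartition(partitionColumns.map(col): _*)
  else output
```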
-
def
writeOnlyInserts(spark: SparkSession, deltaTxn: OptimisticTransaction, filterMatchedRows: Boolean, numSourceRowsMetric: String): Seq[FileAction]
Optimization to write new files by inserting only new data.
When the merge command has no matched clauses, files are skipped based on the merge condition and a left anti join is performed on the source data against the target to find the rows to be inserted.
When the merge command has matched clauses but no target rows actually match, the entire source table is used to perform the insert.
- spark
The spark session.
- deltaTxn
The existing transaction.
- filterMatchedRows
Whether to filter away matched data or not.
- numSourceRowsMetric
The name of the metric in which to record the number of source rows.
- Attributes
- protected
- Definition Classes
- InsertOnlyMergeExecutor
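The left-anti-join fast path described above can be sketched as follows; this is illustrative only, not the Delta source, and the function name is invented for the example.

```scala
import org.apache.spark.sql.{Column, DataFrame}

// Illustrative sketch of the insert-only fast path: a left anti join keeps
// exactly the source rows that match no target row under the merge condition.
// Those rows are the new data to write out as inserted files; no target file
// needs to be rewritten.
def sketchInsertOnlyRows(
    source: DataFrame,
    target: DataFrame,
    mergeCondition: Column): DataFrame =
  source.join(target, mergeCondition, "left_anti")
```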
-
object
RetryHandling extends Enumeration
- Definition Classes
- MergeIntoMaterializeSource
-
object
SubqueryExpression
Extractor object for the subquery plan of expressions that contain subqueries.
- Definition Classes
- DeltaSparkPlanUtils
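The extractor-object idiom that SubqueryExpression relies on can be shown with a small self-contained sketch. The `Expr`, `Subquery`, `Literal`, and `SubqueryPlan` names below are invented for illustration and are not Spark classes.

```scala
// Minimal illustration of a Scala extractor object.
sealed trait Expr
final case class Subquery(plan: String) extends Expr
final case class Literal(value: Int) extends Expr

// Defining unapply makes `case SubqueryPlan(p)` usable in pattern matches,
// pulling the contained plan out of an expression that has one.
object SubqueryPlan {
  def unapply(e: Expr): Option[String] = e match {
    case Subquery(plan) => Some(plan)
    case _              => None
  }
}

def describe(e: Expr): String = e match {
  case SubqueryPlan(p) => s"subquery over $p"
  case _               => "no subquery"
}
```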