org.apache.spark.sql.delta.commands.merge
MergeOutputGeneration
Companion object MergeOutputGeneration
trait MergeOutputGeneration extends AnyRef
Contains logic to transform the merge clauses into expressions that can be evaluated to obtain the output of the merge operation.
- Self Type
- MergeOutputGeneration with MergeIntoCommandBase
Type Members
- case class ProcessedClause(condition: Option[Expression], actions: Seq[Expression]) extends Product with Serializable
Represents a merge clause after its condition and action expressions have been processed before generating the final output expression.
- condition
Optional precomputed condition.
- actions
List of output expressions generated from every action of the clause.
- Attributes
- protected
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def generateAllActionExprs(targetWriteCols: Seq[Expression], rowIdColumnExpressionOpt: Option[NamedExpression], rowCommitVersionColumnExpressionOpt: Option[NamedExpression], clausesWithPrecompConditions: Seq[DeltaMergeIntoClause], cdcEnabled: Boolean, shouldCountDeletedRows: Boolean): Seq[(MergeOutputGeneration.this)#ProcessedClause]
Generate expressions for every output column and every merge clause based on the corresponding UPDATE, DELETE and/or INSERT action(s).
- targetWriteCols
List of output column expressions from the target table. Used to generate CDC data for DELETE.
- rowIdColumnExpressionOpt
The optional Row ID preservation column, named with the physical Row ID name. It stores the stable Row IDs of the table.
- rowCommitVersionColumnExpressionOpt
The optional Row Commit Version preservation column, named with the physical Row Commit Version name. It stores the stable Row Commit Versions.
- clausesWithPrecompConditions
List of merge clauses with precomputed conditions. Action expressions are generated for each of these clauses.
- cdcEnabled
Whether the generated expressions should include CDC information.
- shouldCountDeletedRows
Whether metrics for number of deleted rows should be incremented here.
- returns
One ProcessedClause per merge clause, each with a precomputed condition and N + 2 action expressions (N output columns + ROW_DROPPED_COL + CDC_TYPE_COLUMN_NAME) to apply to a row when that clause matches.
- Attributes
- protected
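The N + 2 layout described above can be illustrated with a minimal plain-Scala model. This is only a sketch under stated assumptions: rows are maps, expressions are functions, and the names `Action`, `Update`, `Delete`, `Insert` and `actionExprs` are hypothetical stand-ins, not Delta or Spark APIs. The CDC type is modelled as an `Option[String]`, with `None` standing in for CDC_TYPE_NOT_CDC.

```scala
object ActionExprsSketch {
  // Simplified stand-ins: a row is a Map, an expression is a function of a row.
  type Row = Map[String, Any]
  type Expr = Row => Any

  // Hypothetical action model; not the Delta clause classes.
  sealed trait Action
  final case class Update(assignments: Seq[Expr]) extends Action
  case object Delete extends Action
  final case class Insert(values: Seq[Expr]) extends Action

  // Build N + 2 expressions: N output columns, then the row-dropped flag, then the CDC type.
  def actionExprs(action: Action, targetCols: Seq[Expr], cdcEnabled: Boolean): Seq[Expr] = {
    def cdc(kind: String): Expr = _ => if (cdcEnabled) Some(kind) else None
    action match {
      case Update(assignments) => assignments ++ Seq[Expr](_ => false, cdc("update_postimage"))
      // A deleted row keeps the target columns so that CDC data can still be produced for it.
      case Delete => targetCols ++ Seq[Expr](_ => true, cdc("delete"))
      case Insert(values) => values ++ Seq[Expr](_ => false, cdc("insert"))
    }
  }

  val target: Seq[Expr] = Seq[Expr](r => r("id"), r => r("value"))
  val row: Row = Map("id" -> 1, "value" -> "a")

  val deleteOut: Seq[Any] = actionExprs(Delete, target, cdcEnabled = true).map(e => e(row))
  val updateOut: Seq[Any] =
    actionExprs(Update(Seq[Expr](r => r("id"), _ => "b")), target, cdcEnabled = false).map(e => e(row))
}
```

In this model a DELETE sets the row-dropped flag while still carrying the target columns, which mirrors the note that targetWriteCols is used to generate CDC data for DELETE.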
- def generateCdcAndOutputRows(sourceDf: DataFrame, outputCols: Seq[Column], outputColNames: Seq[String], noopCopyExprs: Seq[Expression], rowIdColumnNameOpt: Option[String], rowCommitVersionColumnNameOpt: Option[String], deduplicateDeletes: DeduplicateCDFDeletes): DataFrame
Build the full output as an array of packed rows, then explode into the final result. Based on the CDC type as originally marked, we produce both rows for the CDC_TYPE_NOT_CDC partition to be written to the main table and rows for the CDC partitions to be written as CDC files.
See CDCReader for general details on how partitioning on the CDC type column works.
- Attributes
- protected
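The pack-then-explode idea can be sketched in plain Scala, with `flatMap` playing the role of the explode step. The row model `OutRow`, the helper `pack`, and the exact set of rows packed per CDC type are assumptions made for illustration; they are not taken from the implementation. `None` stands in for CDC_TYPE_NOT_CDC.

```scala
object PackAndExplodeSketch {
  // Hypothetical row model: the data plus a CDC type (None = not a CDC row).
  final case class OutRow(id: Int, value: String, cdcType: Option[String])

  // Pack every row that one marked row gives rise to into a single sequence.
  def pack(original: OutRow, preImageValue: String): Seq[OutRow] = original.cdcType match {
    case Some("update_postimage") =>
      Seq(
        original.copy(value = preImageValue, cdcType = Some("update_preimage")),
        original,
        original.copy(cdcType = None)) // copy destined for the main table
    case Some("insert") => Seq(original, original.copy(cdcType = None))
    case Some("delete") => Seq(original) // only a CDC row, nothing for the main table
    case _ => Seq(original) // unchanged row, copied through without CDC output
  }

  // Each entry pairs a marked row with the value it had before the merge.
  val marked: Seq[(OutRow, String)] = Seq(
    (OutRow(1, "new", Some("update_postimage")), "old"),
    (OutRow(2, "gone", Some("delete")), "gone"),
    (OutRow(3, "same", None), "same"))

  // flatMap stands in for exploding the packed arrays into individual rows.
  val exploded: Seq[OutRow] = marked.flatMap { case (row, pre) => pack(row, pre) }

  // Partition on the CDC type: main-table rows versus rows for CDC files.
  val mainTableRows: Seq[OutRow] = exploded.filter(_.cdcType.isEmpty)
  val cdcRows: Seq[OutRow] = exploded.filter(_.cdcType.nonEmpty)
}
```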
- def generateClauseOutputExprs(numOutputCols: Int, clauses: Seq[(MergeOutputGeneration.this)#ProcessedClause], noopExprs: Seq[Expression]): Seq[Expression]
Generate the output expression for each output column to apply the correct action for a type of merge clause. For each output column, the resulting expression dispatches the correct action based on all clause conditions.
- numOutputCols
Number of output columns.
- clauses
List of preprocessed merge clauses to bind together.
- noopExprs
Default expression to apply when no condition holds.
- returns
A list of one expression per output column to apply for a type of merge clause.
- Attributes
- protected
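The per-column dispatch can be pictured as a CASE WHEN over the clause conditions. The following is a minimal plain-Scala sketch of that idea, not the actual implementation: `Clause` is a hypothetical analogue of ProcessedClause, expressions are plain functions, and first-match-wins ordering is an assumption of the model.

```scala
object ClauseDispatchSketch {
  // Simplified stand-ins: a row is a Map, an expression is a function of a row.
  type Row = Map[String, Any]
  type Expr = Row => Any

  // Hypothetical analogue of ProcessedClause: optional condition, one action per output column.
  final case class Clause(condition: Option[Row => Boolean], actions: Seq[Expr])

  // For each output column, build one expression that applies the action of the
  // first clause whose condition holds, falling back to the no-op expression.
  def clauseOutputExprs(numOutputCols: Int, clauses: Seq[Clause], noopExprs: Seq[Expr]): Seq[Expr] = {
    require(noopExprs.length == numOutputCols)
    require(clauses.forall(_.actions.length == numOutputCols))
    (0 until numOutputCols).map { i =>
      (row: Row) =>
        clauses
          .find(c => c.condition.forall(cond => cond(row))) // a clause without a condition always matches
          .map(c => c.actions(i)(row))
          .getOrElse(noopExprs(i)(row))
    }
  }

  val updateWhenBig: Clause = Clause(
    condition = Some((r: Row) => r("amount").asInstanceOf[Int] > 100),
    actions = Seq[Expr](r => r("id"), _ => "flagged"))
  val noop: Seq[Expr] = Seq[Expr](r => r("id"), r => r("status"))

  val exprs: Seq[Expr] = clauseOutputExprs(2, Seq(updateWhenBig), noop)
  val big: Row = Map("id" -> 1, "amount" -> 500, "status" -> "ok")
  val small: Row = Map("id" -> 2, "amount" -> 5, "status" -> "ok")
}
```

Applying `exprs` to `big` takes the clause's actions, while `small` falls through to the no-op expressions and keeps its original status.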
- def generatePrecomputedConditionsAndDF(sourceDF: DataFrame, clauses: Seq[DeltaMergeIntoClause]): (DataFrame, Seq[DeltaMergeIntoClause])
Precompute conditions in MATCHED and NOT MATCHED clauses and generate the source data frame with precomputed boolean columns.
- sourceDF
the source DataFrame.
- clauses
the merge clauses to precompute.
- returns
The source DataFrame with the precomputed boolean columns added, together with the merge clauses (matched and insert), whose conditions may have been rewritten.
- Attributes
- protected
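As a rough illustration of precomputing clause conditions, the sketch below evaluates each condition once per row, stores the result in a boolean column, and rewrites the clause to read that column. It is a plain-Scala model under stated assumptions: `Clause`, `precompute` and the generated column names are hypothetical, and rows are maps rather than a DataFrame.

```scala
object PrecomputeConditionsSketch {
  type Row = Map[String, Any]

  // Hypothetical clause model: a name and an optional condition over a row.
  final case class Clause(name: String, condition: Option[Row => Boolean])

  // Evaluate each clause condition once per row, store the result in a boolean
  // column, and rewrite the clause so that its condition only reads that column.
  def precompute(rows: Seq[Row], clauses: Seq[Clause]): (Seq[Row], Seq[Clause]) = {
    val conditional = clauses.zipWithIndex.collect {
      case (Clause(name, Some(cond)), i) => (i, s"_${name}_condition_$i", cond)
    }
    val newRows = rows.map { row =>
      conditional.foldLeft(row) { case (acc, (_, col, cond)) => acc + (col -> cond(row)) }
    }
    val colByIndex = conditional.map { case (i, col, _) => i -> col }.toMap
    val newClauses = clauses.zipWithIndex.map { case (clause, i) =>
      colByIndex.get(i) match {
        case Some(col) => clause.copy(condition = Some((r: Row) => r(col).asInstanceOf[Boolean]))
        case None => clause // unconditional clauses are left as they are
      }
    }
    (newRows, newClauses)
  }

  val source: Seq[Row] = Seq(Map("id" -> 1, "amount" -> 500), Map("id" -> 2, "amount" -> 5))
  val clauses: Seq[Clause] = Seq(
    Clause("update", Some((r: Row) => r("amount").asInstanceOf[Int] > 100)),
    Clause("insert", None))

  val (rowsWithFlags, rewritten) = precompute(source, clauses)
}
```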
- def generateWriteAllChangesOutputCols(targetWriteCols: Seq[Expression], rowIdColumnExpressionOpt: Option[NamedExpression], rowCommitVersionColumnExpressionOpt: Option[NamedExpression], targetWriteColNames: Seq[String], noopCopyExprs: Seq[Expression], clausesWithPrecompConditions: Seq[DeltaMergeIntoClause], cdcEnabled: Boolean, shouldCountDeletedRows: Boolean = true): IndexedSeq[Column]
Generate the expressions to process full-outer join output and generate target rows.
The output has N + 2 columns: the N output columns plus ROW_DROPPED_COL and CDC_TYPE_COLUMN_NAME. To generate them, we generate N + 2 expressions and apply them on the joinedDF. The CDC type column is either used for CDC generation or dropped before the final write. ROW_DROPPED_COL is always dropped, after the increment metric expression has been executed and the rows have been filtered on it.
- Attributes
- protected
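For the non-CDC case, the handling of the two helper columns can be sketched as a filter followed by a projection. This is a plain-Scala model only; `Packed` and `finalRows` are hypothetical names, and metric updates are left out.

```scala
object DropHelperColumnsSketch {
  // Hypothetical packed row: N output values followed by the row-dropped flag and the CDC type.
  final case class Packed(values: Seq[Any], rowDropped: Boolean, cdcType: Option[String])

  // Keep only rows that were not dropped, then strip both helper columns before the write.
  def finalRows(rows: Seq[Packed]): Seq[Seq[Any]] =
    rows.filterNot(_.rowDropped).map(_.values)

  val joinedOutput: Seq[Packed] = Seq(
    Packed(Seq(1, "updated"), rowDropped = false, cdcType = None),
    Packed(Seq(2, "deleted"), rowDropped = true, cdcType = None))

  val written: Seq[Seq[Any]] = finalRows(joinedOutput)
}
```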
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()