trait InsertOnlyMergeExecutor extends MergeOutputGeneration
Trait with optimized execution for merges that only insert new data. There are two insert-only cases: when the merge command has no matched clauses, and when the merge command matches nothing even though it has matched clauses.
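A minimal, self-contained Scala sketch of the two cases follows. It uses neither Spark nor Delta: `Clause`, `MatchedClause`, `NotMatchedInsertClause`, `InsertOnlyCases`, and `classify` are hypothetical names. Mapping each case to the `filterMatchedRows` flag of `writeOnlyInserts` is an assumption based on that method's description.

```scala
// Hypothetical clause model; not Delta's DeltaMergeIntoClause hierarchy.
sealed trait Clause
final case class MatchedClause(name: String) extends Clause
final case class NotMatchedInsertClause(name: String) extends Clause

object InsertOnlyCases {
  sealed trait InsertOnlyCase
  // Case 1: the merge has no matched clauses at all.
  case object NoMatchedClauses extends InsertOnlyCase
  // Case 2: matched clauses exist, but the merge matched no target rows.
  case object NothingMatched extends InsertOnlyCase

  // Returns the insert-only case that applies, or None for a general merge.
  // numTargetRowsMatched is only consulted when matched clauses exist.
  def classify(clauses: Seq[Clause], numTargetRowsMatched: Long): Option[InsertOnlyCase] = {
    val hasInserts = clauses.exists(_.isInstanceOf[NotMatchedInsertClause])
    val hasMatched = clauses.exists(_.isInstanceOf[MatchedClause])
    if (!hasInserts) None // nothing to insert, so not an insert-only merge
    else if (!hasMatched) Some(NoMatchedClauses)
    else if (numTargetRowsMatched == 0L) Some(NothingMatched)
    else None
  }

  // Assumed mapping: only case 1 must filter away source rows that match the target.
  def filterMatchedRows(c: InsertOnlyCase): Boolean = c == NoMatchedClauses
}
```

In this model, case 1 can be decided from the clauses alone, while case 2 needs to know that the join found no matches.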
- Self Type
- InsertOnlyMergeExecutor with MergeIntoCommandBase
Type Members
- case class ProcessedClause(condition: Option[Expression], actions: Seq[Expression]) extends Product with Serializable
Represents a merge clause after its condition and action expressions have been processed before generating the final output expression.
- condition
Optional precomputed condition.
- actions
List of output expressions generated from every action of the clause.
- Attributes
- protected
- Definition Classes
- MergeOutputGeneration
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- def generateAllActionExprs(targetWriteCols: Seq[Expression], rowIdColumnExpressionOpt: Option[NamedExpression], rowCommitVersionColumnExpressionOpt: Option[NamedExpression], clausesWithPrecompConditions: Seq[DeltaMergeIntoClause], cdcEnabled: Boolean, shouldCountDeletedRows: Boolean): Seq[(InsertOnlyMergeExecutor.this)#ProcessedClause]
Generate expressions for every output column and every merge clause based on the corresponding UPDATE, DELETE and/or INSERT action(s).
- targetWriteCols
List of output column expressions from the target table. Used to generate CDC data for DELETE.
- rowIdColumnExpressionOpt
The optional Row ID preservation column with the physical Row ID name; it stores the table's stable Row IDs.
- rowCommitVersionColumnExpressionOpt
The optional Row Commit Version preservation column with the physical Row Commit Version name; it stores stable Row Commit Versions.
- clausesWithPrecompConditions
List of merge clauses with precomputed conditions. Action expressions are generated for each of these clauses.
- cdcEnabled
Whether the generated expressions should include CDC information.
- shouldCountDeletedRows
Whether the metric for the number of deleted rows should be incremented here.
- returns
A list with one ProcessedClause per merge clause, each holding a precomputed condition and N+2 action expressions (N output columns + ROW_DROPPED_COL + CDC_TYPE_COLUMN_NAME) to apply to a row when that clause matches.
- Attributes
- protected
- Definition Classes
- MergeOutputGeneration
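The N + 2 layout can be illustrated with a simplified, hypothetical model in plain Scala. `ActionExprsSketch` and its clause types are illustrative names. The CDC markings reuse the change data feed change types ("insert", "delete", "update_postimage"), but how this sketch assigns them, and the use of `None` for non-CDC rows, are assumptions; Row ID tracking and the deleted-rows metric are omitted.

```scala
object ActionExprsSketch {
  type Row = Vector[String]

  // None stands in for "not a CDC row"; Delta's actual CDC_TYPE_NOT_CDC value is not modeled.
  val NotCdc: Option[String] = None

  // The N + 2 values produced for one clause: N outputs, ROW_DROPPED_COL, CDC type.
  final case class Action(output: Row, rowDropped: Boolean, cdcType: Option[String])

  // Hypothetical clause kinds with simplified payloads.
  sealed trait MergeClause
  final case class UpdateClause(set: Map[Int, String]) extends MergeClause
  case object DeleteClause extends MergeClause
  final case class InsertClause(values: Row) extends MergeClause

  def actionFor(clause: MergeClause, target: Row, cdcEnabled: Boolean): Action = clause match {
    case UpdateClause(set) =>
      // Overwrite the assigned columns of the target row.
      val updated = set.foldLeft(target) { case (row, (i, v)) => row.updated(i, v) }
      Action(updated, rowDropped = false, if (cdcEnabled) Some("update_postimage") else NotCdc)
    case DeleteClause =>
      // With CDC the row survives, marked "delete", so a change record can be produced
      // later; without CDC it is simply dropped.
      if (cdcEnabled) Action(target, rowDropped = false, Some("delete"))
      else Action(target, rowDropped = true, NotCdc)
    case InsertClause(values) =>
      Action(values, rowDropped = false, if (cdcEnabled) Some("insert") else NotCdc)
  }

  // Flatten an action into the documented N + 2 layout.
  def flatten(a: Action): Vector[Any] = a.output ++ Vector(a.rowDropped, a.cdcType)
}
```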
- def generateCdcAndOutputRows(sourceDf: DataFrame, outputCols: Seq[Column], outputColNames: Seq[String], noopCopyExprs: Seq[Expression], rowIdColumnNameOpt: Option[String], rowCommitVersionColumnNameOpt: Option[String], deduplicateDeletes: DeduplicateCDFDeletes): DataFrame
Build the full output as an array of packed rows, then explode into the final result. Based on the CDC type as originally marked, we produce both rows for the CDC_TYPE_NOT_CDC partition to be written to the main table and rows for the CDC partitions to be written as CDC files.
See CDCReader for general details on how partitioning on the CDC type column works.
- Attributes
- protected
- Definition Classes
- MergeOutputGeneration
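The packing and explode step can be sketched with plain collections instead of DataFrames. The sketch assumes that an "insert" row yields a main-table row plus an insert change record, that an "update_postimage" row yields a main-table row plus pre-image and post-image records (the pre-image taken from the no-op copy of the target), and that a "delete" row yields only a change record. `CdcExplodeSketch`, `Marked`, and `OutRow` are hypothetical; Row ID and Row Commit Version columns and `deduplicateDeletes` are not modeled.

```scala
object CdcExplodeSketch {
  type Row = Vector[String]

  // A processed row: new values, original target values (no-op copy), CDC marking.
  final case class Marked(newValues: Row, noopCopy: Row, cdcType: Option[String])

  // One row after the explode; cdcType None means "write to the main table".
  final case class OutRow(values: Row, cdcType: Option[String])

  // Build the packed array for one row, based on its original CDC marking.
  def pack(m: Marked): Seq[OutRow] = m.cdcType match {
    case None => Seq(OutRow(m.newValues, None))
    case Some("insert") =>
      Seq(OutRow(m.newValues, None), OutRow(m.newValues, Some("insert")))
    case Some("update_postimage") =>
      Seq(
        OutRow(m.newValues, None),
        OutRow(m.noopCopy, Some("update_preimage")),
        OutRow(m.newValues, Some("update_postimage")))
    case Some("delete") => Seq(OutRow(m.noopCopy, Some("delete")))
    case Some(other) => throw new IllegalArgumentException(s"unknown CDC type: $other")
  }

  // Explode every packed array, then split main-table rows from CDC rows.
  def explodeAndSplit(rows: Seq[Marked]): (Seq[Row], Seq[OutRow]) = {
    val exploded = rows.flatMap(pack)
    val (main, cdc) = exploded.partition(_.cdcType.isEmpty)
    (main.map(_.values), cdc)
  }
}
```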
- def generateClauseOutputExprs(numOutputCols: Int, clauses: Seq[(InsertOnlyMergeExecutor.this)#ProcessedClause], noopExprs: Seq[Expression]): Seq[Expression]
Generate the output expression for each output column to apply the correct action for a type of merge clause. For each output column, the resulting expression dispatches the correct action based on all clause conditions.
- numOutputCols
Number of output columns.
- clauses
List of preprocessed merge clauses to bind together.
- noopExprs
Default expressions to apply when no condition holds.
- returns
A list of one expression per output column to apply for a type of merge clause.
- Attributes
- protected
- Definition Classes
- MergeOutputGeneration
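The per-column dispatch can be illustrated with a minimal sketch in plain Scala. Catalyst expressions are replaced by functions over a `Map`-based row (an assumption of this sketch), and clauses are assumed to be tried in order, as in SQL MERGE, so the first clause whose condition holds wins and the no-op expression is the fallback. `ClauseDispatchSketch` and its `ProcessedClause` are hypothetical stand-ins that only mirror the documented shape.

```scala
object ClauseDispatchSketch {
  type Row = Map[String, Int]
  // Hypothetical stand-ins for Catalyst expressions: functions evaluated against a row.
  type Expr = Row => Int
  type Cond = Row => Boolean

  // Mirrors the documented shape: an optional condition plus one action per output column.
  final case class ProcessedClause(condition: Option[Cond], actions: Seq[Expr])

  // For each output column i: CASE WHEN c1 THEN a1(i) WHEN c2 THEN a2(i) ... ELSE noop(i).
  def generateClauseOutputExprs(
      numOutputCols: Int,
      clauses: Seq[ProcessedClause],
      noopExprs: Seq[Expr]): Seq[Expr] = {
    require(noopExprs.size == numOutputCols, "one no-op expression per output column")
    (0 until numOutputCols).map { i => (row: Row) =>
      clauses
        // First clause whose condition holds; a missing condition always holds.
        .find(_.condition.forall(cond => cond(row)))
        .map(_.actions(i)(row))
        .getOrElse(noopExprs(i)(row))
    }
  }
}
```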
- def generatePrecomputedConditionsAndDF(sourceDF: DataFrame, clauses: Seq[DeltaMergeIntoClause]): (DataFrame, Seq[DeltaMergeIntoClause])
Precompute conditions in MATCHED and NOT MATCHED clauses and generate the source data frame with precomputed boolean columns.
- sourceDF
The source DataFrame.
- clauses
The merge clauses to precompute.
- returns
The generated sourceDF with precomputed boolean columns, and the merge clauses (matched and insert) with possibly rewritten clause conditions.
- Attributes
- protected
- Definition Classes
- MergeOutputGeneration
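Precomputing conditions can be sketched without Spark: each clause condition is evaluated once per source row into a boolean column, and the clause is rewritten to read that column. `PrecomputeSketch`, its `Clause` type, and the `_precomputed_cond_<i>` column names are hypothetical; clauses without a condition are left unchanged.

```scala
object PrecomputeSketch {
  type Row = Map[String, Any]

  // Hypothetical clause: a name plus an optional condition evaluated against a source row.
  final case class Clause(name: String, condition: Option[Row => Boolean])

  def precompute(source: Seq[Row], clauses: Seq[Clause]): (Seq[Row], Seq[Clause]) = {
    // Assign each clause a (made-up) column name for its precomputed condition.
    val named = clauses.zipWithIndex.map { case (c, i) => (c, s"_precomputed_cond_$i") }

    // Add one boolean column per conditional clause to every source row.
    val newSource = source.map { row =>
      named.foldLeft(row) {
        case (acc, (Clause(_, Some(cond)), col)) => acc + (col -> cond(row))
        case (acc, _) => acc // unconditional clause: nothing to precompute
      }
    }

    // Rewrite each conditional clause so its condition only reads the new column.
    val rewritten = named.map {
      case (Clause(n, Some(_)), col) => Clause(n, Some((r: Row) => r(col).asInstanceOf[Boolean]))
      case (c, _) => c
    }
    (newSource, rewritten)
  }
}
```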
- def generateWriteAllChangesOutputCols(targetWriteCols: Seq[Expression], rowIdColumnExpressionOpt: Option[NamedExpression], rowCommitVersionColumnExpressionOpt: Option[NamedExpression], targetWriteColNames: Seq[String], noopCopyExprs: Seq[Expression], clausesWithPrecompConditions: Seq[DeltaMergeIntoClause], cdcEnabled: Boolean, shouldCountDeletedRows: Boolean = true): IndexedSeq[Column]
Generate the expressions to process full-outer join output and generate target rows.
The target rows have N + 2 columns: the N output columns plus ROW_DROPPED_COL and CDC_TYPE_COLUMN_NAME. To produce them, we generate N + 2 expressions and apply them to the joined DataFrame (joinedDF). The CDC column is either used for CDC generation or dropped before the final write, while ROW_DROPPED_COL is always dropped after the increment-metric expression has executed and the rows have been filtered on it.
- Attributes
- protected
- Definition Classes
- MergeOutputGeneration
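What happens to the N + 2 columns after they are computed can be sketched as follows, under the assumptions stated in the comments. `WriteAllChangesSketch`, `Processed`, and `finish` are hypothetical names, and counting dropped rows merely stands in for the increment-metric expression, whose exact metric is not specified here.

```scala
object WriteAllChangesSketch {
  // One joined row after applying the N + 2 expressions:
  // N output values, then ROW_DROPPED_COL, then the CDC type column.
  final case class Processed(output: Vector[String], rowDropped: Boolean, cdcType: Option[String])

  // Returns the surviving rows plus a count of dropped rows (a stand-in for the
  // increment-metric expression). ROW_DROPPED_COL is consumed by the filter and not
  // carried forward; the CDC column is kept as a trailing value only when CDC is enabled.
  def finish(rows: Seq[Processed], cdcEnabled: Boolean): (Seq[Vector[Any]], Long) = {
    val droppedCount = rows.count(_.rowDropped).toLong
    val kept = rows.filterNot(_.rowDropped)
    val out: Seq[Vector[Any]] =
      kept.map(r => if (cdcEnabled) r.output :+ r.cdcType else r.output)
    (out, droppedCount)
  }
}
```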
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- def writeOnlyInserts(spark: SparkSession, deltaTxn: OptimisticTransaction, filterMatchedRows: Boolean, numSourceRowsMetric: String): Seq[FileAction]
Optimization to write new files by inserting only new data.
When the merge command has no matched clauses, data is skipped based on the merge condition, and a left anti join is performed on the source data to find the rows to insert.
When the merge command matches nothing even though it has matched clauses, the source table is used directly for the insert.
- spark
The Spark session.
- deltaTxn
The existing transaction.
- filterMatchedRows
Whether to filter away matched data or not.
- numSourceRowsMetric
The name of the metric in which to record the number of source rows.
- Attributes
- protected
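The two paths of `writeOnlyInserts` can be sketched with in-memory collections. This is a hypothetical model, not Delta's implementation: `SourceRow`, `rowsToInsert`, and treating the merge condition as key equality are assumptions, a `filterNot` over a key set plays the role of the left anti join, and data skipping on the target is not modeled.

```scala
object InsertOnlySketch {
  final case class SourceRow(key: Int, value: String)

  // Returns the rows to insert and the number of source rows read
  // (a stand-in for the metric named by numSourceRowsMetric).
  def rowsToInsert(
      source: Seq[SourceRow],
      targetKeys: Set[Int],
      filterMatchedRows: Boolean): (Seq[SourceRow], Long) = {
    val numSourceRows = source.size.toLong
    val toInsert =
      if (filterMatchedRows) source.filterNot(r => targetKeys.contains(r.key)) // left anti join on key
      else source // nothing matched, so every source row is inserted
    (toInsert, numSourceRows)
  }
}
```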