
org.apache.spark.sql.delta.commands.merge

InsertOnlyMergeExecutor

trait InsertOnlyMergeExecutor extends MergeOutputGeneration

Trait with optimized execution for merges that only insert new data. There are two insert-only cases: when the merge command has no matched clauses, and when the merge command has matched clauses but no rows actually match.

Self Type
InsertOnlyMergeExecutor with MergeIntoCommandBase

Type Members

  1. case class ProcessedClause(condition: Option[Expression], actions: Seq[Expression]) extends Product with Serializable

    Represents a merge clause after its condition and action expressions have been processed before generating the final output expression.

    condition

    Optional precomputed condition.

    actions

    List of output expressions generated from every action of the clause.

    Attributes
    protected
    Definition Classes
    MergeOutputGeneration

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. def generateAllActionExprs(targetWriteCols: Seq[Expression], rowIdColumnExpressionOpt: Option[NamedExpression], rowCommitVersionColumnExpressionOpt: Option[NamedExpression], clausesWithPrecompConditions: Seq[DeltaMergeIntoClause], cdcEnabled: Boolean, shouldCountDeletedRows: Boolean): Seq[(InsertOnlyMergeExecutor.this)#ProcessedClause]

    Generate expressions for every output column and every merge clause based on the corresponding UPDATE, DELETE and/or INSERT action(s).

    targetWriteCols

    List of output column expressions from the target table. Used to generate CDC data for DELETE.

    rowIdColumnExpressionOpt

    The optional Row ID preservation column, named with the physical Row ID name. It stores the stable Row IDs of the table.

    rowCommitVersionColumnExpressionOpt

    The optional Row Commit Version preservation column, named with the physical Row Commit Version name. It stores the stable Row Commit Versions.

    clausesWithPrecompConditions

    List of merge clauses with precomputed conditions. Action expressions are generated for each of these clauses.

    cdcEnabled

    Whether the generated expressions should include CDC information.

    shouldCountDeletedRows

    Whether metrics for number of deleted rows should be incremented here.

    returns

    A list with one ProcessedClause per merge clause, each holding a precomputed condition and N+2 action expressions (N output columns + ROW_DROPPED_COL + CDC_TYPE_COLUMN_NAME) to apply to a row when that clause matches.

    Attributes
    protected
    Definition Classes
    MergeOutputGeneration
  10. def generateCdcAndOutputRows(sourceDf: DataFrame, outputCols: Seq[Column], outputColNames: Seq[String], noopCopyExprs: Seq[Expression], rowIdColumnNameOpt: Option[String], rowCommitVersionColumnNameOpt: Option[String], deduplicateDeletes: DeduplicateCDFDeletes): DataFrame

    Build the full output as an array of packed rows, then explode into the final result. Based on the CDC type as originally marked, we produce both rows for the CDC_TYPE_NOT_CDC partition to be written to the main table and rows for the CDC partitions to be written as CDC files.

    See CDCReader for general details on how partitioning on the CDC type column works.

    Attributes
    protected
    Definition Classes
    MergeOutputGeneration
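
The pack-then-explode step described for generateCdcAndOutputRows can be pictured with plain Scala collections. This is only a minimal sketch, not Delta's implementation: the `Out` type, the `explode` helper, and the assumption about which CDC types yield a main-table row are all hypothetical and chosen for illustration.

```scala
object PackExplodeSketch {
  // Simplified output row: a payload plus its CDC type (None = main-table row).
  final case class Out(value: String, cdcType: Option[String])

  // Pack every output row derived from one input row into a sequence,
  // then flatten ("explode") the sequences into the final result.
  def explode(inputs: Seq[(String, Option[String])]): Seq[Out] =
    inputs.map { case (value, cdcType) =>
      cdcType match {
        // Assumption: an inserted row goes to the main table and is also recorded as CDC.
        case Some("insert") => Seq(Out(value, None), Out(value, Some("insert")))
        // Assumption: a deleted row only produces a CDC record.
        case Some("delete") => Seq(Out(value, Some("delete")))
        // Rows without a CDC type are written to the main table only.
        case _ => Seq(Out(value, None))
      }
    }.flatten
}
```

In this model, rows with `cdcType = None` play the role of the CDC_TYPE_NOT_CDC partition, and the remaining rows stand in for the CDC partitions.
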
  11. def generateClauseOutputExprs(numOutputCols: Int, clauses: Seq[(InsertOnlyMergeExecutor.this)#ProcessedClause], noopExprs: Seq[Expression]): Seq[Expression]

    Generate the output expression for each output column to apply the correct action for a type of merge clause. For each output column, the resulting expression dispatches the correct action based on all clause conditions.

    numOutputCols

    Number of output columns.

    clauses

    List of preprocessed merge clauses to bind together.

    noopExprs

    Default expression to apply when no condition holds.

    returns

    A list of one expression per output column to apply for a type of merge clause.

    Attributes
    protected
    Definition Classes
    MergeOutputGeneration
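
The per-column dispatch performed by generateClauseOutputExprs can be sketched without Spark. The example below is a simplified model under stated assumptions: plain functions replace Catalyst expressions, `Clause` is a hypothetical stand-in for ProcessedClause, and the first clause whose condition holds is assumed to win.

```scala
object ClauseDispatchSketch {
  type Row = Map[String, Any]

  // Simplified stand-in for ProcessedClause: an optional condition plus one
  // action per output column. Functions replace Catalyst expressions here.
  final case class Clause(condition: Option[Row => Boolean], actions: Seq[Row => Any])

  // Build one function per output column. Each evaluates the clauses in order,
  // applies the action of the first clause whose condition holds (a missing
  // condition always holds), and falls back to the no-op otherwise.
  def clauseOutputs(
      numOutputCols: Int,
      clauses: Seq[Clause],
      noops: Seq[Row => Any]): Seq[Row => Any] =
    (0 until numOutputCols).map { i =>
      (row: Row) =>
        clauses
          .find(c => c.condition.forall(cond => cond(row)))
          .map(c => c.actions(i)(row))
          .getOrElse(noops(i)(row))
    }
}
```

The result has one entry per output column, mirroring the documented return value; the no-op entries correspond to noopExprs.
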
  12. def generatePrecomputedConditionsAndDF(sourceDF: DataFrame, clauses: Seq[DeltaMergeIntoClause]): (DataFrame, Seq[DeltaMergeIntoClause])

    Precompute conditions in MATCHED and NOT MATCHED clauses and generate the source data frame with precomputed boolean columns.

    sourceDF

    The source DataFrame.

    clauses

    The merge clauses to precompute.

    returns

    The generated sourceDF with precomputed boolean columns, together with the merge clauses (matched and insert), whose conditions may have been rewritten.

    Attributes
    protected
    Definition Classes
    MergeOutputGeneration
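
The idea behind generatePrecomputedConditionsAndDF, evaluating each clause condition once and storing it as a boolean column, can be sketched with in-memory rows. This is an illustrative model only: the `Clause` type, the `precompute` helper, and the `_cond_N_` column names are hypothetical and do not reflect Delta's actual naming or rewrite rules.

```scala
object PrecomputeSketch {
  type Row = Map[String, Any]

  // Simplified clause: a name plus an optional condition over a source row.
  final case class Clause(name: String, condition: Option[Row => Boolean])

  // Evaluate each clause condition once per row, store the result in a boolean
  // column, and rewrite the clause so its condition only reads that column.
  def precompute(rows: Seq[Row], clauses: Seq[Clause]): (Seq[Row], Seq[Clause]) = {
    // Hypothetical column-naming scheme, chosen for this example only.
    val named = clauses.zipWithIndex.map { case (c, i) => (c, s"_cond_${i}_") }
    val newRows = rows.map { row =>
      named.foldLeft(row) {
        case (acc, (Clause(_, Some(cond)), col)) => acc.updated(col, cond(row))
        case (acc, _) => acc
      }
    }
    val newClauses = named.map {
      case (c @ Clause(_, Some(_)), col) =>
        c.copy(condition = Some((r: Row) => r(col).asInstanceOf[Boolean]))
      case (c, _) => c
    }
    (newRows, newClauses)
  }
}
```

As in the documented signature, the sketch returns a pair: the augmented rows and the clauses with rewritten conditions. Clauses without a condition are passed through unchanged.
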
  13. def generateWriteAllChangesOutputCols(targetWriteCols: Seq[Expression], rowIdColumnExpressionOpt: Option[NamedExpression], rowCommitVersionColumnExpressionOpt: Option[NamedExpression], targetWriteColNames: Seq[String], noopCopyExprs: Seq[Expression], clausesWithPrecompConditions: Seq[DeltaMergeIntoClause], cdcEnabled: Boolean, shouldCountDeletedRows: Boolean = true): IndexedSeq[Column]

    Generate the expressions to process full-outer join output and generate target rows.

    The output has N + 2 columns: the N output columns plus ROW_DROPPED_COL and CDC_TYPE_COLUMN_NAME. To generate them, we build N + 2 expressions and apply them to the joinedDF. The CDC column is either used for CDC generation or dropped before the final write. ROW_DROPPED_COL is always dropped after the increment-metric expression has been executed and the rows have been filtered on it.

    Attributes
    protected
    Definition Classes
    MergeOutputGeneration
  14. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  15. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  16. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  17. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  18. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  19. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  20. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  21. def toString(): String
    Definition Classes
    AnyRef → Any
  22. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  25. def writeOnlyInserts(spark: SparkSession, deltaTxn: OptimisticTransaction, filterMatchedRows: Boolean, numSourceRowsMetric: String): Seq[FileAction]

    Optimization to write new files by inserting only new data.

    When there are no matched clauses for the merge command, data is skipped based on the merge condition and a left anti join is performed on the source data to find the rows to be inserted.

    When there are matched clauses but nothing actually matches, the source table is used to perform the insert.

    spark

    The Spark session.

    deltaTxn

    The existing transaction.

    filterMatchedRows

    Whether to filter away matched data or not.

    numSourceRowsMetric

    The name of the metric in which to record the number of source rows.

    Attributes
    protected
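
The two cases handled by writeOnlyInserts can be modeled with plain Scala collections. The sketch below is not the Delta implementation: `Row`, `leftAntiJoin`, and `rowsToInsert` are hypothetical names, and it assumes that `filterMatchedRows = true` corresponds to the left-anti-join case while `false` means the source is inserted as is.

```scala
// Illustrative model only: plain Scala collections stand in for DataFrames.
object InsertOnlySketch {
  final case class Row(id: Int, value: String)

  // Left anti join: keep the source rows whose key has no match in the target.
  def leftAntiJoin(source: Seq[Row], target: Seq[Row]): Seq[Row] = {
    val targetKeys = target.map(_.id).toSet
    source.filterNot(row => targetKeys.contains(row.id))
  }

  // Hypothetical stand-in for the two insert-only cases: filter out matched
  // rows when requested, otherwise insert every source row.
  def rowsToInsert(source: Seq[Row], target: Seq[Row], filterMatchedRows: Boolean): Seq[Row] =
    if (filterMatchedRows) leftAntiJoin(source, target) else source
}
```

With real DataFrames the same filtering would be expressed as a join with the `left_anti` join type on the merge condition rather than a key-set lookup.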

Inherited from MergeOutputGeneration

Inherited from AnyRef

Inherited from Any
