trait MergeIntoMaterializeSource extends DeltaLogging with DeltaSparkPlanUtils

Trait with logic and utilities used for materializing a snapshot of the MERGE source when we cannot guarantee deterministic repeated reads from it.

We materialize the source if it is not safe to assume that it is deterministic (override with MERGE_SOURCE_MATERIALIZATION). Otherwise, if the source changes between the phases of the MERGE, it can produce wrong results. We use local checkpointing for the materialization, which saves the source as a materialized RDD[InternalRow] on the executors' local disks.

The first concern is that if an executor is lost, this data can be lost with it. When the Spark executor decommissioning API is used, it should attempt to safely move the materialized data off the executor before it is removed.

The second concern is that if an executor is lost for another reason (e.g. a spot kill), we will still lose that data. To mitigate this, we implement a retry loop: the whole MERGE operation needs to be restarted from the beginning in this case. When we retry, we increase the replication level of the materialized data from 1 to 2 (override with MERGE_SOURCE_MATERIALIZATION_RDD_STORAGE_LEVEL_RETRY). If it still fails after the maximum number of attempts (MERGE_MATERIALIZE_SOURCE_MAX_ATTEMPTS), we record the failure for tracking purposes.

The third concern is that executors may run out of disk space due to the extra materialization. We record such failures for tracking purposes.
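The retry loop described above can be sketched in simplified form. Everything here (the `RetrySketch` object, the two storage-level objects, the `MaterializedSourceLost` exception) is illustrative and not Delta's actual API; the real code escalates Spark `StorageLevel`s and classifies real Spark errors:

```scala
// Simplified sketch of the retry loop: the whole MERGE is restarted when the
// materialized source is lost, and retries use a higher replication level.
// All names here are illustrative stand-ins, not Delta's actual API.
object RetrySketch {
  sealed trait StorageLevel
  case object DiskOnly            extends StorageLevel // replication 1, first attempt
  case object DiskOnlyReplicated2 extends StorageLevel // replication 2, used on retries

  final case class MaterializedSourceLost(message: String)
    extends RuntimeException(message)

  def runWithRetries[A](maxAttempts: Int)(merge: (Int, StorageLevel) => A): A = {
    def attemptRun(attempt: Int): A = {
      val level = if (attempt == 1) DiskOnly else DiskOnlyReplicated2
      try merge(attempt, level)
      catch {
        // Only a lost materialized source triggers a restart from the
        // beginning; any other error propagates to the caller.
        case _: MaterializedSourceLost if attempt < maxAttempts =>
          attemptRun(attempt + 1)
      }
    }
    attemptRun(1)
  }
}
```

The key design point mirrored here is that the retry restarts the entire MERGE, not just the failed stage, because intermediate state derived from the lost source blocks cannot be trusted.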


Type Members

  1. implicit class LogStringContext extends AnyRef
    Definition Classes
    LoggingShims
  2. type PlanOrExpression = Either[LogicalPlan, Expression]
    Definition Classes
    DeltaSparkPlanUtils

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. val attempt: Int

    Track which attempt or retry it is in runWithMaterializedSourceAndRetries.

    Attributes
    protected
  6. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  7. def collectFirst[In, Out](input: Iterable[In], recurse: (In) ⇒ Option[Out]): Option[Out]
    Attributes
    protected
    Definition Classes
    DeltaSparkPlanUtils
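The signature of `collectFirst` suggests a generic "first match wins" traversal: apply `recurse` to each element and return the first `Some`, if any. A plausible stand-alone equivalent (a sketch from the signature only, not the actual Delta implementation) is:

```scala
// Plausible stand-alone equivalent of the collectFirst signature above:
// apply `recurse` to each element lazily and return the first Some, if any.
// This is a sketch inferred from the signature, not Delta's implementation.
object CollectFirstSketch {
  def collectFirst[In, Out](input: Iterable[In], recurse: In => Option[Out]): Option[Out] =
    input.iterator.map(recurse).collectFirst { case Some(out) => out }
}
```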
  8. def containsDeterministicUDF(expr: Expression): Boolean

    Returns whether an expression contains any deterministic UDFs.

    Definition Classes
    DeltaSparkPlanUtils
  9. def containsDeterministicUDF(predicates: Seq[DeltaTableReadPredicate], partitionedOnly: Boolean): Boolean

    Returns whether the read predicates of a transaction contain any deterministic UDFs.

    Definition Classes
    DeltaSparkPlanUtils
  10. def deltaAssert(check: ⇒ Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit

    Helper method to check invariants in Delta code. Fails when running in tests; records a delta assertion event and logs a warning otherwise.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  11. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  13. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. def findFirstNonDeltaScan(source: LogicalPlan): Option[LogicalPlan]
    Attributes
    protected
    Definition Classes
    DeltaSparkPlanUtils
  15. def findFirstNonDeterministicChildNode(children: Seq[Expression], checkDeterministicOptions: CheckDeterministicOptions): Option[PlanOrExpression]
    Attributes
    protected
    Definition Classes
    DeltaSparkPlanUtils
  16. def findFirstNonDeterministicNode(child: Expression, checkDeterministicOptions: CheckDeterministicOptions): Option[PlanOrExpression]
    Attributes
    protected
    Definition Classes
    DeltaSparkPlanUtils
  17. def findFirstNonDeterministicNode(plan: LogicalPlan, checkDeterministicOptions: CheckDeterministicOptions): Option[PlanOrExpression]

    Returns a part of the plan that does not have a safe level of determinism. This is a conservative approximation of the plan being a truly deterministic query.

    Attributes
    protected
    Definition Classes
    DeltaSparkPlanUtils
  18. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  19. def getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
    Definition Classes
    DeltaLogging
  20. def getErrorData(e: Throwable): Map[String, Any]
    Definition Classes
    DeltaLogging
  21. def getMergeSource: MergeSource

    Returns the prepared merge source.

    Attributes
    protected
  22. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  23. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  24. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  25. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  26. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  27. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  28. def logConsole(line: String): Unit
    Definition Classes
    DatabricksLogging
  29. def logDebug(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  30. def logDebug(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  31. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  32. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  33. def logError(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  34. def logError(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  35. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  36. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  37. def logInfo(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  38. def logInfo(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  39. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  40. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  41. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  42. def logTrace(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  43. def logTrace(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  44. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  45. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  46. def logWarning(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  47. def logWarning(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  48. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  49. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  50. val materializedSourceRDD: Option[RDD[InternalRow]]

    If the source was materialized, a reference to the checkpointed RDD.

    Attributes
    protected
  51. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  52. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  53. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  54. def planContainsOnlyDeltaScans(source: LogicalPlan): Boolean
    Attributes
    protected
    Definition Classes
    DeltaSparkPlanUtils
  55. def planContainsUdf(plan: LogicalPlan): Boolean
    Attributes
    protected
    Definition Classes
    DeltaSparkPlanUtils
  56. def planIsDeterministic(plan: LogicalPlan, checkDeterministicOptions: CheckDeterministicOptions): Boolean

    Returns true if the plan has a safe level of determinism. This is a conservative approximation of the plan being a truly deterministic query.

    Attributes
    protected
    Definition Classes
    DeltaSparkPlanUtils
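The "conservative approximation" above means a node is accepted only when it is positively known to be deterministic; anything unrecognized is rejected. A toy illustration of that principle (the `Expr` case classes below are stand-ins, not Spark's `LogicalPlan`/`Expression`):

```scala
// Toy illustration of a conservative determinism check in the spirit of
// planIsDeterministic: a node passes only if it is positively known to be
// deterministic, so unknown or non-deterministic nodes fail the check.
// These case classes are stand-ins, not Spark's LogicalPlan/Expression.
object DeterminismSketch {
  sealed trait Expr { def children: Seq[Expr] }
  final case class Literal(value: Int) extends Expr { val children = Seq.empty }
  final case class Add(left: Expr, right: Expr) extends Expr { val children = Seq(left, right) }
  final case class Rand() extends Expr { val children = Seq.empty } // known non-deterministic

  def isDeterministic(expr: Expr): Boolean = expr match {
    case _: Literal | _: Add => expr.children.forall(isDeterministic)
    case _                   => false // conservative default for everything else
  }
}
```

Erring on the side of `false` at worst materializes a source that was actually safe; erring the other way could silently produce wrong MERGE results.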
  57. def prepareMergeSource(spark: SparkSession, source: LogicalPlan, condition: Expression, matchedClauses: Seq[DeltaMergeIntoMatchedClause], notMatchedClauses: Seq[DeltaMergeIntoNotMatchedClause], isInsertOnly: Boolean): Unit

    If the source needs to be materialized, prepare the materialized dataframe in sourceDF; otherwise, prepare the regular dataframe.

    returns

    the source materialization reason

    Attributes
    protected
  58. def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit

    Used to record the occurrence of a single event or report detailed, operation-specific statistics.

    path

    Used to log the path of the delta table when deltaLog is null.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  59. def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A

    Used to report the duration as well as the success or failure of an operation on a deltaLog.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  60. def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A

    Used to report the duration as well as the success or failure of an operation on a tahoePath.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  61. def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  62. def recordFrameProfile[T](group: String, name: String)(thunk: ⇒ T): T
    Attributes
    protected
    Definition Classes
    DeltaLogging
  63. def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: ⇒ S): S
    Definition Classes
    DatabricksLogging
  64. def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  65. def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  66. def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  67. def runWithMaterializedSourceLostRetries(spark: SparkSession, deltaLog: DeltaLog, metrics: Map[String, SQLMetric], runMergeFunc: (SparkSession) ⇒ Seq[Row]): Seq[Row]

    Run the MERGE with retries in case it detects an RDD-block-lost error for the materialized source RDD. It also records out-of-disk errors if they occur, possibly caused by increased disk pressure from the materialized source RDD.

    Attributes
    protected
  68. def shouldMaterializeSource(spark: SparkSession, source: LogicalPlan, isInsertOnly: Boolean): (Boolean, MergeIntoMaterializeSourceReason)

    returns

    a pair of a boolean indicating whether the source should be materialized and the source materialization reason

    Attributes
    protected
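The shape of the returned pair can be illustrated with a simplified decision function. The reason names and the rules below are hypothetical stand-ins for Delta's actual `MergeIntoMaterializeSourceReason` enumeration and decision logic:

```scala
// Illustrative decision logic with the same (Boolean, reason) result shape as
// shouldMaterializeSource. Reason names and rules are simplified stand-ins,
// not Delta's actual enumeration or policy.
object MaterializeDecisionSketch {
  sealed trait Reason
  case object NotNeededInsertOnly    extends Reason // insert-only MERGE reads the source once
  case object NotNeededDeterministic extends Reason // plan is safely deterministic
  case object NonDeterministicPlan   extends Reason // must materialize for stable repeated reads

  def shouldMaterialize(isInsertOnly: Boolean, planIsDeterministic: Boolean): (Boolean, Reason) =
    if (isInsertOnly) (false, NotNeededInsertOnly)
    else if (planIsDeterministic) (false, NotNeededDeterministic)
    else (true, NonDeterministicPlan)
}
```

Returning the reason alongside the decision lets callers log why materialization happened (or was skipped), which matters when diagnosing the disk-pressure and retry behavior described in the trait overview.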
  69. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  70. def toString(): String
    Definition Classes
    AnyRef → Any
  71. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  72. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  73. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  74. def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T

    Report a log to indicate that some command is running.

    Definition Classes
    DeltaProgressReporter
  75. object RetryHandling extends Enumeration
  76. object SubqueryExpression

    Extractor object for the subquery plan of expressions that contain subqueries.

    Definition Classes
    DeltaSparkPlanUtils

Inherited from DeltaSparkPlanUtils

Inherited from DeltaLogging

Inherited from DatabricksLogging

Inherited from DeltaProgressReporter

Inherited from LoggingShims

Inherited from Logging

Inherited from AnyRef

Inherited from Any
