trait TransactionalWrite extends DeltaLogging
Adds the ability to write files out as part of a transaction. Checks are performed to ensure that the data being written matches either the current metadata or the new metadata being set by this transaction.
- Self Type
- OptimisticTransactionImpl
- Alphabetic
- By Inheritance
- TransactionalWrite
- DeltaLogging
- DatabricksLogging
- DeltaProgressReporter
- Logging
- AnyRef
- Any
- Hide All
- Show All
- Public
- Protected
Abstract Value Members
Concrete Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def checkPartitionColumns(partitionSchema: StructType, output: Seq[Attribute], colsDropped: Boolean): Unit
- Attributes
- protected
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def convertEmptyToNullIfNeeded(plan: SparkPlan, partCols: Seq[Attribute], constraints: Seq[Constraint]): SparkPlan
If there is any string partition column and there are constraints defined, add a projection to convert empty string to null for that column.
If there is any string partition column and there are constraints defined, add a projection to convert empty string to null for that column. The empty strings will be converted to null eventually even without this convert, but we want to do this earlier before check constraints so that empty strings are correctly rejected. Note that this should not cause the downstream logic in
FileFormatWriterto add duplicate conversions because the logic there checks the partition column using the original plan's output. When the plan is modified with additional projections, the partition column check won't match and will not add more conversion.- plan
The original SparkPlan.
- partCols
The partition columns.
- constraints
The defined constraints.
- returns
A SparkPlan potentially modified with an additional projection on top of
plan
- Attributes
- protected
- def deltaAssert(check: => Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit
Helper method to check invariants in Delta code.
Helper method to check invariants in Delta code. Fails when running in tests, records a delta assertion event and logs a warning otherwise.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getCommitter(outputPath: Path): DelayedCommitProtocol
- Attributes
- protected
- def getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
- Definition Classes
- DeltaLogging
- def getErrorData(e: Throwable): Map[String, Any]
- Definition Classes
- DeltaLogging
- def getOptionalStatsTrackerAndStatsCollection(output: Seq[Attribute], outputPath: Path, partitionSchema: StructType, data: DataFrame): (Option[DeltaJobStatisticsTracker], Option[StatisticsCollection])
Return the pair of optional stats tracker and stats collection class
Return the pair of optional stats tracker and stats collection class
- Attributes
- protected
- def getPartitioningColumns(partitionSchema: StructType, output: Seq[Attribute]): Seq[Attribute]
- Attributes
- protected
- def getStatsColExpr(statsDataSchema: Seq[Attribute], statsCollection: StatisticsCollection): Expression
- Attributes
- protected
- def getStatsSchema(dataFrameOutput: Seq[Attribute], partitionSchema: StructType): (Seq[Attribute], Seq[Attribute])
Return a tuple of (outputStatsCollectionSchema, statsCollectionSchema).
Return a tuple of (outputStatsCollectionSchema, statsCollectionSchema). outputStatsCollectionSchema is the data source schema from DataFrame used for stats collection. It contains the columns in the DataFrame output, excluding the partition columns. tableStatsCollectionSchema is the schema to collect stats for. It contains the columns in the table schema, excluding the partition columns. Note: We only collect NULL_COUNT stats (as the number of rows) for the columns in statsCollectionSchema but missing in outputStatsCollectionSchema
- Attributes
- protected
- val hasWritten: Boolean
- Attributes
- protected
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def makeOutputNullable(output: Seq[Attribute]): Seq[Attribute]
Makes the output attributes nullable, so that we don't write unreadable parquet files.
Makes the output attributes nullable, so that we don't write unreadable parquet files.
- Attributes
- protected
- def mapColumnAttributes(output: Seq[Attribute], mappingMode: DeltaColumnMappingMode): Seq[Attribute]
Replace the output attributes with the physical mapping information.
Replace the output attributes with the physical mapping information.
- Attributes
- protected
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def normalizeData(deltaLog: DeltaLog, options: Option[DeltaOptions], data: Dataset[_]): (QueryExecution, Seq[Attribute], Seq[Constraint], Set[String])
Normalize the schema of the query, and return the QueryExecution to execute.
Normalize the schema of the query, and return the QueryExecution to execute. If the table has generated columns and users provide these columns in the output, we will also return constraints that should be respected. If any constraints are returned, the caller should apply these constraints when writing data.
Note: The output attributes of the QueryExecution may not match the attributes we return as the output schema. This is because streaming queries create
IncrementalExecution, which cannot be further modified. We can however have the Parquet writer use the physical plan fromIncrementalExecutionand the output schema provided through the attributes.- Attributes
- protected
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def performCDCPartition(inputData: Dataset[_]): (DataFrame, StructType)
Returns a tuple of (data, partition schema).
Returns a tuple of (data, partition schema). For CDC writes, a
is_cdcis added to the front of the partition schema.column is added to the data andis_cdc=true/false- Attributes
- protected
- def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
Used to record the occurrence of a single event or report detailed, operation specific statistics.
Used to record the occurrence of a single event or report detailed, operation specific statistics.
- path
Used to log the path of the delta table when
deltaLogis null.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a
deltaLog.Used to report the duration as well as the success or failure of an operation on a
deltaLog.- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a
tahoePath.Used to report the duration as well as the success or failure of an operation on a
tahoePath.- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordFrameProfile[T](group: String, name: String)(thunk: => T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: => S): S
- Definition Classes
- DatabricksLogging
- def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: => T): T
Report a log to indicate some command is running.
Report a log to indicate some command is running.
- Definition Classes
- DeltaProgressReporter
- def writeFiles(inputData: Dataset[_], writeOptions: Option[DeltaOptions], isOptimize: Boolean, additionalConstraints: Seq[Constraint]): Seq[FileAction]
Writes out the dataframe after performing schema validation.
Writes out the dataframe after performing schema validation. Returns a list of actions to append these files to the reservoir.
- inputData
Data to write out.
- writeOptions
Options to decide how to write out the data.
- isOptimize
Whether the operation writing this is Optimize or not.
- additionalConstraints
Additional constraints on the write.
- def writeFiles(data: Dataset[_], deltaOptions: Option[DeltaOptions], additionalConstraints: Seq[Constraint]): Seq[FileAction]
- def writeFiles(data: Dataset[_]): Seq[FileAction]
- def writeFiles(data: Dataset[_], writeOptions: Option[DeltaOptions]): Seq[FileAction]
- def writeFiles(data: Dataset[_], additionalConstraints: Seq[Constraint]): Seq[FileAction]