class OptimizeExecutor extends DeltaCommand with SQLMetricsReporting with Serializable
Optimize job which compacts small files into larger files to reduce the number of files and potentially allow more efficient reads.
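The compaction this class implements is normally triggered through the public Delta Lake API rather than by constructing the executor directly. A minimal sketch, assuming an active SparkSession `spark` and a placeholder table path:

```scala
import io.delta.tables.DeltaTable

// Trigger the optimize job through the public API; the path is a placeholder.
val table = DeltaTable.forPath(spark, "/tmp/delta/events")
table.optimize().executeCompaction()          // plain small-file compaction
table.optimize().executeZOrderBy("eventType") // compaction with Z-order clustering
```

The equivalent SQL form is `OPTIMIZE delta.`/tmp/delta/events``, optionally with a `WHERE` partition filter and `ZORDER BY`.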
- OptimizeExecutor
- Serializable
- Serializable
- SQLMetricsReporting
- DeltaCommand
- DeltaLogging
- DatabricksLogging
- DeltaProgressReporter
- LoggingShims
- Logging
- AnyRef
- Any
Instance Constructors
-
new
OptimizeExecutor(sparkSession: SparkSession, snapshot: Snapshot, catalogTable: Option[CatalogTable], partitionPredicate: Seq[Expression], zOrderByColumns: Seq[String], isAutoCompact: Boolean, optimizeContext: DeltaOptimizeContext)
- sparkSession
Spark environment reference.
- snapshot
The snapshot of the table to optimize.
- partitionPredicate
List of partition predicates to select subset of files to optimize.
Type Members
-
implicit
class
LogStringContext extends AnyRef
- Definition Classes
- LoggingShims
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
buildBaseRelation(spark: SparkSession, txn: OptimisticTransaction, actionType: String, rootPath: Path, inputLeafFiles: Seq[String], nameToAddFileMap: Map[String, AddFile]): HadoopFsRelation
Build a base relation of files that need to be rewritten as part of an update/delete/merge operation.
- Attributes
- protected
- Definition Classes
- DeltaCommand
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
createSetTransaction(sparkSession: SparkSession, deltaLog: DeltaLog, options: Option[DeltaOptions] = None): Option[SetTransaction]
Returns SetTransaction if a valid app ID and version are present; otherwise returns None.
- Attributes
- protected
- Definition Classes
- DeltaCommand
-
def
deltaAssert(check: ⇒ Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit
Helper method to check invariants in Delta code. Fails when running in tests; otherwise records a delta assertion event and logs a warning.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
generateCandidateFileMap(basePath: Path, candidateFiles: Seq[AddFile]): Map[String, AddFile]
Generates a map of file names to add file entries for operations where we will need to rewrite files, such as delete, merge, and update. We expect file names to be unique, because each file contains a UUID.
- Definition Classes
- DeltaCommand
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
- Definition Classes
- DeltaLogging
-
def
getDeltaLog(spark: SparkSession, path: Option[String], tableIdentifier: Option[TableIdentifier], operationName: String, hadoopConf: Map[String, String] = Map.empty): DeltaLog
Utility method to return the DeltaLog of an existing Delta table referred to by either the given path or tableIdentifier.
- spark
SparkSession reference to use.
- path
Table location. Expects a non-empty tableIdentifier or path.
- tableIdentifier
Table identifier. Expects a non-empty tableIdentifier or path.
- operationName
Operation that is getting the DeltaLog, used in error messages.
- hadoopConf
Hadoop file system options used to build DeltaLog.
- returns
DeltaLog of the table
- Attributes
- protected
- Definition Classes
- DeltaCommand
- Exceptions thrown
AnalysisException if no Delta table exists at the given path/identifier, or if neither path nor tableIdentifier is provided.
-
def
getDeltaTable(target: LogicalPlan, cmd: String): DeltaTableV2
Extracts the DeltaTableV2 from a LogicalPlan iff the LogicalPlan is a ResolvedTable with either a DeltaTableV2 or a V1Table that is referencing a Delta table. In all other cases this method will throw a "Table not found" exception.
- Definition Classes
- DeltaCommand
-
def
getDeltaTablePathOrIdentifier(target: LogicalPlan, cmd: String): (Option[TableIdentifier], Option[String])
Helper method to extract the table id or path from a LogicalPlan representing a Delta table. This uses DeltaCommand.getDeltaTable to convert the LogicalPlan to a DeltaTableV2 and then extracts either the path or identifier from it. If the DeltaTableV2 has a CatalogTable, the table identifier will be returned. Otherwise, the table's path will be returned. Throws an exception if the LogicalPlan does not represent a Delta table.
- Definition Classes
- DeltaCommand
-
def
getErrorData(e: Throwable): Map[String, Any]
- Definition Classes
- DeltaLogging
-
def
getMetric(name: String): Option[SQLMetric]
Returns the metric with name registered for the given transaction, if it exists.
- Definition Classes
- SQLMetricsReporting
-
def
getMetricsForOperation(operation: Operation): Map[String, String]
Get the metrics for an operation from the collected SQL metrics, filtered according to the metric parameters for that operation.
- Definition Classes
- SQLMetricsReporting
-
def
getTableCatalogTable(target: LogicalPlan, cmd: String): Option[CatalogTable]
Extracts CatalogTable metadata from a LogicalPlan if the plan is a ResolvedTable. The table may be a non-Delta table.
- Definition Classes
- DeltaCommand
-
def
getTablePathOrIdentifier(target: LogicalPlan, cmd: String): (Option[TableIdentifier], Option[String])
Helper method to extract the table id or path from a LogicalPlan representing a resolved table or path. This calls getDeltaTablePathOrIdentifier if the resolved table is a Delta table. For a non-Delta table with an identifier, we extract its identifier. For a non-Delta table with a path, it expects the path to be wrapped in a ResolvedPathBasedNonDeltaTable and extracts it from there.
- Definition Classes
- DeltaCommand
-
def
getTouchedFile(basePath: Path, escapedFilePath: String, nameToAddFileMap: Map[String, AddFile]): AddFile
Find the AddFile record corresponding to the file that was read as part of a delete/update/merge operation.
- basePath
The path of the table. Must not be escaped.
- escapedFilePath
The path to a file that can be either absolute or relative. All special chars in this path must be already escaped by URI standards.
- nameToAddFileMap
Map generated through generateCandidateFileMap().
- Definition Classes
- DeltaCommand
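Together with generateCandidateFileMap, this gives the usual file-lookup pattern inside rewrite commands. A minimal sketch, assuming a deltaLog, a candidateFiles: Seq[AddFile], and a filePath from the enclosing command:

```scala
// Sketch of the lookup pattern used by delete/update/merge rewrites;
// `deltaLog`, `candidateFiles`, and `filePath` are assumed to exist in
// the surrounding command.
val nameToAddFileMap: Map[String, AddFile] =
  generateCandidateFileMap(deltaLog.dataPath, candidateFiles)

// `filePath` would come from Spark's input_file_name() and is already
// URI-escaped, as getTouchedFile expects.
val touched: AddFile =
  getTouchedFile(deltaLog.dataPath, filePath, nameToAddFileMap)
```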
-
def
hasBeenExecuted(txn: OptimisticTransaction, sparkSession: SparkSession, options: Option[DeltaOptions] = None): Boolean
Returns true if there is information in the spark session that indicates that this write has already been successfully written.
- Attributes
- protected
- Definition Classes
- DeltaCommand
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
isCatalogTable(analyzer: Analyzer, tableIdent: TableIdentifier): Boolean
Use the analyzer to determine whether the provided TableIdentifier refers to a path-based table or not.
- analyzer
The session state analyzer to call
- tableIdent
Table identifier to determine whether it is path-based or not
- returns
true if the table is defined in a metastore; false if it is a path-based table
- Definition Classes
- DeltaCommand
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
isPathIdentifier(tableIdent: TableIdentifier): Boolean
Checks whether the given identifier can refer to a Delta table's path.
- tableIdent
Table Identifier for which to check
- Attributes
- protected
- Definition Classes
- DeltaCommand
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
-
def
logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
optimize(): Seq[Row]
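optimize() is the entry point that runs the compaction and returns the command's result rows. A hypothetical sketch of constructing and driving the executor directly; the argument values are illustrative assumptions, and in practice the OPTIMIZE command builds this class itself:

```scala
// Hypothetical sketch: the OPTIMIZE command normally constructs this
// executor. Argument values below are illustrative assumptions.
val executor = new OptimizeExecutor(
  sparkSession = spark,
  snapshot = deltaLog.update(),     // latest snapshot of the table
  catalogTable = None,
  partitionPredicate = Seq.empty,   // no filter: optimize the whole table
  zOrderByColumns = Seq.empty,      // no Z-ordering, plain compaction
  isAutoCompact = false,
  optimizeContext = DeltaOptimizeContext())
val resultRows: Seq[Row] = executor.optimize()
```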
-
def
parsePredicates(spark: SparkSession, predicate: String): Seq[Expression]
Converts string predicates into Expressions relative to a transaction.
- Attributes
- protected
- Definition Classes
- DeltaCommand
- Exceptions thrown
AnalysisException if a non-partition column is referenced.
-
def
recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
Used to record the occurrence of a single event or report detailed, operation specific statistics.
- path
Used to log the path of the Delta table when deltaLog is null.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
Used to report the duration as well as the success or failure of an operation on a deltaLog.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
Used to report the duration as well as the success or failure of an operation on a tahoePath.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
-
def
recordFrameProfile[T](group: String, name: String)(thunk: ⇒ T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: ⇒ S): S
- Definition Classes
- DatabricksLogging
-
def
recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
-
def
recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
-
def
recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
-
def
registerSQLMetrics(spark: SparkSession, metrics: Map[String, SQLMetric]): Unit
Register SQL metrics for an operation by appending the supplied metrics map to the operationSQLMetrics map.
- Definition Classes
- SQLMetricsReporting
-
def
removeFilesFromPaths(deltaLog: DeltaLog, nameToAddFileMap: Map[String, AddFile], filesToRewrite: Seq[String], operationTimestamp: Long): Seq[RemoveFile]
This method provides the RemoveFile actions that are necessary for files that are touched and need to be rewritten in methods like Delete, Update, and Merge.
- deltaLog
The DeltaLog of the table that is being operated on
- nameToAddFileMap
A map generated using generateCandidateFileMap.
- filesToRewrite
Absolute paths of the files that were touched. We will search for these in candidateFiles. Obtained as the output of the input_file_name function.
- operationTimestamp
The timestamp of the operation
- Attributes
- protected
- Definition Classes
- DeltaCommand
-
def
resolveIdentifier(analyzer: Analyzer, identifier: TableIdentifier): LogicalPlan
Use the analyzer to resolve the provided identifier.
- analyzer
The session state analyzer to call
- identifier
Table identifier to resolve
- Attributes
- protected
- Definition Classes
- DeltaCommand
-
def
sendDriverMetrics(spark: SparkSession, metrics: Map[String, SQLMetric]): Unit
Send the driver-side metrics.
This is needed to make the SQL metrics visible in the Spark UI. All metrics are default initialized with 0 so that's what we're reporting in case we skip an already executed action.
- Attributes
- protected
- Definition Classes
- DeltaCommand
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
def
verifyPartitionPredicates(spark: SparkSession, partitionColumns: Seq[String], predicates: Seq[Expression]): Unit
- Definition Classes
- DeltaCommand
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T
Report a log to indicate some command is running.
- Definition Classes
- DeltaProgressReporter