package commands
Type Members
-
trait
AlterDeltaTableCommand extends DeltaCommand
A super trait for alter table commands that modify Delta tables.
-
case class
AlterTableAddColumnsDeltaCommand(table: DeltaTableV2, colsToAddWithPosition: Seq[QualifiedColType]) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
A command that adds columns to a Delta table. The syntax of using this command in SQL is:
ALTER TABLE table_identifier ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
-
case class
AlterTableAddConstraintDeltaCommand(table: DeltaTableV2, name: String, exprText: String) extends LogicalPlan with AlterTableConstraintDeltaCommand with Product with Serializable
Command to add a constraint to a Delta table. Currently only CHECK constraints are supported.
Adding a constraint will scan all data in the table to verify the constraint currently holds.
- table
The table to which the constraint should be added.
- name
The name of the new constraint.
- exprText
The contents of the new CHECK constraint, to be parsed and evaluated.
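As an illustrative sketch of adding and dropping a CHECK constraint (assumes a SparkSession `spark` with the Delta extensions enabled; the table and constraint names are hypothetical):

```scala
// Adding the constraint scans all existing data to verify it currently holds.
spark.sql("ALTER TABLE events ADD CONSTRAINT validDate CHECK (eventDate >= '2020-01-01')")

// Drop it again; IF EXISTS makes the drop a no-op for unknown constraint names.
spark.sql("ALTER TABLE events DROP CONSTRAINT IF EXISTS validDate")
```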
-
case class
AlterTableChangeColumnDeltaCommand(table: DeltaTableV2, columnPath: Seq[String], columnName: String, newColumn: StructField, colPosition: Option[ColumnPosition], syncIdentity: Boolean) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
A command to change a column of a Delta table, supporting changing the comment of a column and reordering columns.
The syntax of using this command in SQL is:
ALTER TABLE table_identifier CHANGE [COLUMN] column_old_name column_new_name column_dataType [COMMENT column_comment] [FIRST | AFTER column_name];
-
case class
AlterTableClusterByDeltaCommand(table: DeltaTableV2, clusteringColumns: Seq[Seq[String]]) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
Command for altering clustering columns for clustered tables.
Command for altering clustering columns for clustered tables. - ALTER TABLE .. CLUSTER BY (col1, col2, ...) - ALTER TABLE .. CLUSTER BY NONE
Note that the given clusteringColumns are empty when CLUSTER BY NONE is specified. Also, clusteringColumns are validated (e.g., duplication / existence check) in DeltaCatalog.alterTable().
- trait AlterTableConstraintDeltaCommand extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData
-
case class
AlterTableDropColumnsDeltaCommand(table: DeltaTableV2, columnsToDrop: Seq[Seq[String]]) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
A command that drops columns from a Delta table. The syntax of using this command in SQL is:
ALTER TABLE table_identifier DROP COLUMN(S) (col_name_1, col_name_2, ...);
-
case class
AlterTableDropConstraintDeltaCommand(table: DeltaTableV2, name: String, ifExists: Boolean) extends LogicalPlan with AlterTableConstraintDeltaCommand with Product with Serializable
Command to drop a constraint from a Delta table. No-op if a constraint with the given name doesn't exist.
Currently only CHECK constraints are supported.
- table
The table from which the constraint should be dropped
- name
The name of the constraint to drop
-
case class
AlterTableDropFeatureDeltaCommand(table: DeltaTableV2, featureName: String, truncateHistory: Boolean = false) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
A command that removes an existing feature from the table. The feature needs to implement the RemovableFeature trait.
The syntax of the command is:
ALTER TABLE t DROP FEATURE f [TRUNCATE HISTORY]
When dropping a feature, we remove the feature's traces from the latest version. However, the table history still contains feature traces. This creates two problems:
1) Reconstructing the state of the latest version may require replaying log records prior to feature removal. Log replay is based on checkpoints, which are used by clients as a starting point for replaying history. Any actions before the checkpoint do not need to be replayed. However, checkpoints may be deleted at any time, which can then expose readers to older log records.
2) Clients could create checkpoints at past versions. These could lead to incorrect behavior if the client that created the checkpoint did not support all features.
To address these issues, we currently provide two implementations:
1) DropFeatureWithHistoryTruncation. We truncate history at the version boundary of the dropped feature (when required). This requires two executions of the drop feature command, with a waiting time between the two executions.
2) executeDropFeatureWithCheckpointProtection, i.e. fast drop feature. We create barrier checkpoints to protect against log replay and checkpoint creation. The behavior is enforced with the aid of CheckpointProtectionTableFeature.
The config tableFeatures.fastDropFeature.enabled can be used to control which implementation is used. Furthermore, note that the [TRUNCATE HISTORY] option in the SQL syntax is only relevant for DropFeatureWithHistoryTruncation. When it is used, we always fall back to that implementation.
At a high level, dropping a feature consists of two stages (see RemovableFeature):
1) preDowngradeCommand. This command is responsible for removing any data and metadata related to the feature.
2) Protocol downgrade. Removes the feature from the current version's protocol. During this stage we also validate whether all traces of the feature-to-be-removed are gone.
For removing features with requiresHistoryProtection=false the two steps above are sufficient. For features that require history protection, we follow a different approach for each of the implementations listed above. Please see the corresponding functions for more details.
Note, legacy features can be removed as well. When removing a legacy feature from a legacy protocol, if the result cannot be represented with a legacy representation, we use the table features representation. For example, removing Invariants from (1, 3) results in (1, 7, None, [AppendOnly, CheckConstraints]). Adding Invariants back to the protocol is normalized back to (1, 3). This allows consistently transitioning back and forth between legacy protocols and table features protocols.
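As an illustrative sketch of the SQL above (assumes a SparkSession `spark`, a Delta release that supports DROP FEATURE, and a hypothetical table `t`; the feature name is an example):

```scala
// Stage 1 (preDowngradeCommand) runs, then the protocol downgrade is attempted.
spark.sql("ALTER TABLE t DROP FEATURE deletionVectors")

// For the legacy DropFeatureWithHistoryTruncation path, a second invocation after
// the retention period truncates the history containing the feature's traces:
spark.sql("ALTER TABLE t DROP FEATURE deletionVectors TRUNCATE HISTORY")
```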
-
case class
AlterTableReplaceColumnsDeltaCommand(table: DeltaTableV2, columns: Seq[StructField]) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
A command to replace columns of a Delta table, supporting changing the comment of a column, reordering columns, and loosening nullabilities.
The syntax of using this command in SQL is:
ALTER TABLE table_identifier REPLACE COLUMNS (col_spec[, col_spec ...]);
-
case class
AlterTableSetLocationDeltaCommand(table: DeltaTableV2, location: String) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
A command to change the location of a Delta table. Effectively, this only changes the symlink in the Hive Metastore from one Delta table to another.
This command errors out if the new location is not a Delta table. By default, the new Delta table must have the same schema as the old table, but we have a SQL conf that allows users to bypass this schema check.
The syntax of using this command in SQL is:
ALTER TABLE table_identifier SET LOCATION 'path/to/new/delta/table';
-
case class
AlterTableSetPropertiesDeltaCommand(table: DeltaTableV2, configuration: Map[String, String]) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
A command that sets Delta table configuration.
The syntax of this command is:
ALTER TABLE table1 SET TBLPROPERTIES ('key1' = 'val1', 'key2' = 'val2', ...);
-
case class
AlterTableUnsetPropertiesDeltaCommand(table: DeltaTableV2, propKeys: Seq[String], ifExists: Boolean, fromDropFeatureCommand: Boolean = false) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
A command that unsets Delta table configuration. If ifExists is false, each individual key will be checked for existence; it's a one-by-one operation, not an all-or-nothing check. Otherwise, non-existent keys will be ignored.
The syntax of this command is:
ALTER TABLE table1 UNSET TBLPROPERTIES [IF EXISTS] ('key1', 'key2', ...);
-
case class
Batch(bins: Seq[Bin]) extends Product with Serializable
A batch represents all the bins that will be processed and committed in a single transaction.
- bins
The set of bins to process in this transaction
-
case class
Bin(partitionValues: Map[String, String], files: Seq[AddFile]) extends Product with Serializable
A bin represents a single set of files that are being re-written in a single Spark job. For compaction, this represents a single file being written. For clustering, this is an entire partition for Z-ordering, or an entire ZCube for liquid clustering.
- partitionValues
The partition this set of files is in
- files
The list of files being re-written
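The grouping of files into bins can be pictured as a simple size-based packing pass. The following is an illustrative, self-contained sketch, not the actual OptimizeExecutor logic; `FileStub`, `packBins`, and `targetSize` are hypothetical names:

```scala
// Hypothetical stand-in for AddFile: just a path and a size in bytes.
case class FileStub(path: String, size: Long)

// Greedily pack files into bins so each bin's total size stays near targetSize.
def packBins(files: Seq[FileStub], targetSize: Long): Seq[Seq[FileStub]] = {
  val bins = scala.collection.mutable.ListBuffer.empty[Vector[FileStub]]
  var current = Vector.empty[FileStub]
  var currentSize = 0L
  for (f <- files.sortBy(_.size)) {
    if (currentSize + f.size > targetSize && current.nonEmpty) {
      bins += current            // close the current bin and start a new one
      current = Vector.empty
      currentSize = 0L
    }
    current = current :+ f
    currentSize += f.size
  }
  if (current.nonEmpty) bins += current
  bins.toList
}
```

Each resulting bin would then correspond to one rewrite job, and all bins committed together form a Batch.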
-
abstract
class
CloneConvertedSource extends CloneSource
A convertible non-delta table source to be cloned from
-
class
CloneDeltaSource extends CloneSource
A delta table source to be cloned from
-
case class
CloneIcebergSource(tableIdentifier: TableIdentifier, sparkTable: Option[Table], tableSchema: Option[StructType], spark: SparkSession) extends CloneConvertedSource with Product with Serializable
An Iceberg table source to be cloned from
-
case class
CloneParquetSource(tableIdentifier: TableIdentifier, catalogTable: Option[CatalogTable], spark: SparkSession) extends CloneConvertedSource with Product with Serializable
A parquet table source to be cloned from
-
trait
CloneSource extends Closeable
An interface of the source table to be cloned from.
- abstract class CloneTableBase extends LogicalPlan with LeafCommand with CloneTableBaseUtils with SQLConfHelper
- trait CloneTableBaseUtils extends DeltaLogging
-
case class
CloneTableCommand(sourceTable: CloneSource, targetIdent: TableIdentifier, tablePropertyOverrides: Map[String, String], targetPath: Path) extends CloneTableBase with Product with Serializable
Clones a Delta table to a new location with a new table id. The clone can be performed as a shallow clone (i.e. shallow = true), where we do not copy the files, but just point to them. If a table exists at the given targetPath, that table will be replaced.
- sourceTable
is the table to be cloned
- targetIdent
destination table identifier to clone to
- tablePropertyOverrides
user-defined table properties that should override any properties with the same key from the source table
- targetPath
the actual destination
-
case class
ClusteringStrategy(sparkSession: SparkSession, clusteringColumns: Seq[String], optimizeContext: DeltaOptimizeContext) extends OptimizeTableStrategy with Product with Serializable
Implements clustering strategy for clustered tables
-
case class
CompactionStrategy(sparkSession: SparkSession, optimizeContext: DeltaOptimizeContext) extends OptimizeTableStrategy with Product with Serializable
Implements compaction strategy
- case class ConvertToDeltaCommand(tableIdentifier: TableIdentifier, partitionSchema: Option[StructType], collectStats: Boolean, deltaPath: Option[String]) extends ConvertToDeltaCommandBase with Product with Serializable
-
abstract
class
ConvertToDeltaCommandBase extends LogicalPlan with LeafRunnableCommand with DeltaCommand
Convert an existing parquet table to a delta table by creating delta logs based on existing files. Here are the main components:
- File Listing: Launch a spark job to list files from a given directory in parallel.
- Schema Inference: Given an iterator on the file list result, we group the iterator into sequential batches and launch a spark job to infer schema for each batch, and finally merge schemas from all batches.
- Stats collection: Again, we group the iterator on file list results into sequential batches and launch a spark job to collect stats for each batch.
- Commit the files: We take the iterator of files with stats and write out a delta log file as the first commit. This bypasses the transaction protocol, but it's ok as this would be the very first commit.
-
case class
CreateDeltaTableCommand(table: CatalogTable, existingTableOpt: Option[CatalogTable], mode: SaveMode, query: Option[LogicalPlan], operation: CreationMode = TableCreationModes.Create, tableByPath: Boolean = false, output: Seq[Attribute] = Nil, protocol: Option[Protocol] = None, createTableFunc: Option[(CatalogTable) ⇒ Unit] = None) extends LogicalPlan with LeafRunnableCommand with DeltaCommand with DeltaLogging with Product with Serializable
Single entry point for all write or declaration operations for Delta tables accessed through the table name.
- table
The table identifier for the Delta table
- existingTableOpt
The existing table for the same identifier if exists
- mode
The save mode when writing data. Relevant when the query is empty or set to Ignore with CREATE TABLE IF NOT EXISTS.
- query
The query to commit into the Delta table if it exists. This can come from
- CTAS
- saveAsTable
- protocol
This is used to create a table with specific protocol version
- createTableFunc
If specified, call this function to create the table, instead of Spark's SessionCatalog#createTable, which is backed by the Hive Metastore.
-
case class
DeleteCommand(deltaLog: DeltaLog, catalogTable: Option[CatalogTable], target: LogicalPlan, condition: Option[Expression]) extends LogicalPlan with LeafRunnableCommand with DeltaCommand with DeleteCommandMetrics with Product with Serializable
Performs a Delete based on the search condition.
Algorithm:
1) Scan all the files and determine which files have the rows that need to be deleted.
2) Traverse the affected files and rebuild the touched files.
3) Use the Delta protocol to atomically write the remaining rows to new files and remove the affected files that are identified in step 1.
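A hedged usage sketch via the DeltaTable Scala API (assumes a SparkSession `spark` with Delta configured; the path and predicate are hypothetical):

```scala
import io.delta.tables.DeltaTable

// Delete rows matching a predicate; only files containing matching rows are rewritten
// (or receive deletion vectors, when that feature is enabled).
val deltaTable = DeltaTable.forPath(spark, "/tmp/delta/events")
deltaTable.delete("eventDate < '2020-01-01'")
```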
- trait DeleteCommandMetrics extends AnyRef
-
case class
DeleteMetric(condition: String, numFilesTotal: Long, numTouchedFiles: Long, numRewrittenFiles: Long, numRemovedFiles: Long, numAddedFiles: Long, numAddedChangeFiles: Long, numFilesBeforeSkipping: Long, numBytesBeforeSkipping: Long, numFilesAfterSkipping: Long, numBytesAfterSkipping: Long, numPartitionsAfterSkipping: Option[Long], numPartitionsAddedTo: Option[Long], numPartitionsRemovedFrom: Option[Long], numCopiedRows: Option[Long], numDeletedRows: Option[Long], numBytesAdded: Long, numBytesRemoved: Long, changeFileBytes: Long, scanTimeMs: Long, rewriteTimeMs: Long, numDeletionVectorsAdded: Long, numDeletionVectorsRemoved: Long, numDeletionVectorsUpdated: Long, commitVersion: Option[Long] = None, isWriteCommand: Boolean = false, numLogicalRecordsAdded: Option[Long] = None, numLogicalRecordsRemoved: Option[Long] = None) extends Product with Serializable
Used to report details about delete.
- Note
All the time units are milliseconds.
-
case class
DeletionVectorData(filePath: String, deletionVectorId: Option[String], deletedRowIndexSet: Array[Byte], deletedRowIndexCount: Long) extends Sizing with Product with Serializable
Row containing the file path and its new deletion vector bitmap in memory.
- filePath
Absolute path of the data file this DV result is generated for.
- deletionVectorId
Existing DeletionVectorDescriptor serialized in JSON format. This info is used to load the existing DV with the new DV.
- deletedRowIndexSet
In-memory Deletion vector bitmap generated containing the newly deleted row indexes from data file.
- deletedRowIndexCount
Count of rows marked as deleted using the deletedRowIndexSet.
-
case class
DeletionVectorResult(filePath: String, deletionVector: DeletionVectorDescriptor, matchedRowCount: Long) extends Product with Serializable
Final output for each file containing the file path, DeletionVectorDescriptor, and how many rows are marked as deleted in this file as part of this operation (doesn't include rows that are already marked as deleted).
- filePath
Absolute path of the data file this DV result is generated for.
- deletionVector
Deletion vector generated containing the newly deleted row indices from data file.
- matchedRowCount
Number of rows marked as deleted using the deletionVector.
- trait DeletionVectorUtils extends DeltaLogging
-
trait
DeltaCommand extends DeltaLogging
Helper trait for all delta commands.
- case class DeltaGenerateCommand(child: LogicalPlan, modeName: String) extends LogicalPlan with RunnableCommand with UnaryNode with DeltaCommand with Product with Serializable
-
case class
DeltaOptimizeContext(reorg: Option[DeltaReorgOperation] = None, minFileSize: Option[Long] = None, maxFileSize: Option[Long] = None, maxDeletedRowsRatio: Option[Double] = None, isFull: Boolean = false) extends Product with Serializable
Stores all runtime context information that can control the execution of optimize.
- reorg
The REORG operation that triggered the rewriting task, if any.
- minFileSize
Files which are smaller than this threshold will be selected for compaction. If not specified, DeltaSQLConf.DELTA_OPTIMIZE_MIN_FILE_SIZE will be used. This parameter must be set to 0 when reorg is set.
- maxDeletedRowsRatio
Files with a ratio of soft-deleted rows to total rows larger than this threshold will be rewritten by the OPTIMIZE command. If not specified, DeltaSQLConf.DELTA_OPTIMIZE_MAX_DELETED_ROWS_RATIO will be used. This parameter must be set to 0 when reorg is set.
- isFull
Whether OPTIMIZE FULL is run. This is only for clustered tables.
-
class
DeltaPurgeOperation extends DeltaReorgOperation with ReorgTableHelper
Reorg operation to purge files with soft-deleted rows. This operation will also try to find and remove dropped columns from parquet files, if any such column exists that is not present in the current table schema.
-
sealed
trait
DeltaReorgOperation extends AnyRef
Defines a Reorg operation to be applied during optimize.
- case class DeltaReorgTable(target: LogicalPlan, reorgTableSpec: DeltaReorgTableSpec = ...)(predicates: Seq[String]) extends LogicalPlan with UnaryCommand with Product with Serializable
-
case class
DeltaReorgTableCommand(target: LogicalPlan, reorgTableSpec: DeltaReorgTableSpec = ...)(predicates: Seq[String]) extends OptimizeTableCommandBase with ReorgTableForUpgradeUniformHelper with LeafCommand with IgnoreCachedData with Product with Serializable
The REORG TABLE command.
- case class DeltaReorgTableSpec(reorgTableMode: DeltaReorgTableMode.Value, icebergCompatVersionOpt: Option[Int]) extends Product with Serializable
-
class
DeltaRewriteTypeWideningOperation extends DeltaReorgOperation with ReorgTableHelper
Internal reorg operation to rewrite files to conform to the current table schema when dropping the type widening table feature.
-
class
DeltaUpgradeUniformOperation extends DeltaReorgOperation
Reorg operation to upgrade the iceberg compatibility version of a table.
- case class DeltaVacuumStats(isDryRun: Boolean, specifiedRetentionMillis: Option[Long], defaultRetentionMillis: Long, minRetainedTimestamp: Long, dirsPresentBeforeDelete: Long, filesAndDirsPresentBeforeDelete: Long, objectsDeleted: Long, sizeOfDataToDelete: Long, timeTakenToIdentifyEligibleFiles: Long, timeTakenForDelete: Long, vacuumStartTime: Long, vacuumEndTime: Long, numPartitionColumns: Long, latestCommitVersion: Long, eligibleStartCommitVersion: Option[Long], eligibleEndCommitVersion: Option[Long], typeOfVacuum: String) extends Product with Serializable
-
case class
DescribeDeltaDetailCommand(child: LogicalPlan, hadoopConf: Map[String, String]) extends LogicalPlan with RunnableCommand with UnaryNode with DeltaLogging with DeltaCommand with Product with Serializable
A command for describing the details of a table such as the format, name, and size.
-
case class
DescribeDeltaHistory(child: LogicalPlan, limit: Option[Int], output: Seq[Attribute] = ...) extends LogicalPlan with UnaryNode with MultiInstanceRelation with DeltaCommand with Product with Serializable
A logical placeholder for describing a Delta table's history, so that the history can be leveraged in subqueries. Replaced with DescribeDeltaHistoryCommand during planning.
-
case class
DescribeDeltaHistoryCommand(table: DeltaTableV2, limit: Option[Int], output: Seq[Attribute] = ...) extends LogicalPlan with LeafRunnableCommand with MultiInstanceRelation with DeltaLogging with Product with Serializable
A command for describing the history of a Delta table.
-
case class
FileToDvDescriptor(path: String, deletionVectorId: Option[String]) extends Product with Serializable
Holds a mapping from a file path (url-encoded) to an (optional) serialized Deletion Vector descriptor.
- case class LastVacuumInfo(latestCommitVersionOutsideOfRetentionWindow: Option[Long] = None) extends Product with Serializable
-
case class
MergeIntoCommand(source: LogicalPlan, target: LogicalPlan, catalogTable: Option[CatalogTable], targetFileIndex: TahoeFileIndex, condition: Expression, matchedClauses: Seq[DeltaMergeIntoMatchedClause], notMatchedClauses: Seq[DeltaMergeIntoNotMatchedClause], notMatchedBySourceClauses: Seq[DeltaMergeIntoNotMatchedBySourceClause], migratedSchema: Option[StructType], trackHighWaterMarks: Set[String] = Set.empty, schemaEvolutionEnabled: Boolean = false) extends LogicalPlan with MergeIntoCommandBase with InsertOnlyMergeExecutor with ClassicMergeExecutor with Product with Serializable
Performs a merge of a source query/table into a Delta table.
Issues an error message when the ON search_condition of the MERGE statement can match a single row from the target table with multiple rows of the source table-reference.
Algorithm:
Phase 1: Find the input files in target that are touched by the rows that satisfy the condition and verify that no two source rows match with the same target row. This is implemented as an inner-join using the given condition. See ClassicMergeExecutor for more details.
Phase 2: Read the touched files again and write new files with updated and/or inserted rows.
Phase 3: Use the Delta protocol to atomically remove the touched files and add the new files.
- source
Source data to merge from
- target
Target table to merge into
- targetFileIndex
TahoeFileIndex of the target table
- condition
Condition for a source row to match with a target row
- matchedClauses
All info related to matched clauses.
- notMatchedClauses
All info related to not matched clauses.
- notMatchedBySourceClauses
All info related to not matched by source clauses.
- migratedSchema
The final schema of the target - may be changed by schema evolution.
- trackHighWaterMarks
The column names for which we will track IDENTITY high water marks.
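The three phases above are typically driven through the DeltaTable Scala API. A hedged sketch (assumes a SparkSession `spark` with Delta configured; `sourceDF`, the path, and the join key are hypothetical):

```scala
import io.delta.tables.DeltaTable

// Upsert a source DataFrame into a target Delta table. The ON condition must not
// match one target row against multiple source rows, or the command errors out.
DeltaTable.forPath(spark, "/tmp/delta/target").as("t")
  .merge(sourceDF.as("s"), "t.id = s.id")
  .whenMatched().updateAll()
  .whenNotMatched().insertAll()
  .execute()
```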
- trait MergeIntoCommandBase extends LogicalPlan with LeafRunnableCommand with DeltaCommand with DeltaLogging with PredicateHelper with ImplicitMetadataOperation with MergeIntoMaterializeSource with UpdateExpressionsSupport with SupportsNonDeterministicExpression
-
class
OptimizeExecutor extends DeltaCommand with SQLMetricsReporting with Serializable
Optimize job which compacts small files into larger files to reduce the number of files and potentially allow more efficient reads.
-
case class
OptimizeTableCommand(child: LogicalPlan, userPartitionPredicates: Seq[String], optimizeContext: DeltaOptimizeContext)(zOrderBy: Seq[UnresolvedAttribute]) extends OptimizeTableCommandBase with UnaryNode with Product with Serializable
The optimize command implementation for Spark SQL. Example SQL:
OPTIMIZE ('/path/to/dir' | delta.table) [WHERE part = 25] [FULL];
Note that the FULL and WHERE clauses are mutually exclusive.
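The same operation is exposed on the DeltaTable Scala API. A hedged sketch (assumes a SparkSession `spark` with Delta configured; the table name, partition predicate, and Z-order column are hypothetical):

```scala
import io.delta.tables.DeltaTable

// Compact small files, optionally restricted to a partition predicate.
val table = DeltaTable.forName(spark, "events")
table.optimize().where("part = 25").executeCompaction()

// Z-ordering instead of plain compaction:
table.optimize().executeZOrderBy("eventDate")
```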
-
abstract
class
OptimizeTableCommandBase extends LogicalPlan with RunnableCommand with DeltaCommand
Base class defining abstract optimize command
-
trait
OptimizeTableStrategy extends AnyRef
Defines a set of utilities used in OptimizeTableCommand. The behavior of these utilities will change based on the OptimizeTableMode: COMPACTION, ZORDER, and CLUSTERING.
-
trait
ReorgTableForUpgradeUniformHelper extends DeltaLogging
Helper trait for ReorgTableCommand to rewrite the table to be Iceberg compatible.
- trait ReorgTableHelper extends Serializable
-
case class
RestoreTableCommand(sourceTable: DeltaTableV2) extends LogicalPlan with LeafRunnableCommand with DeltaCommand with RestoreTableCommandBase with Product with Serializable
Perform restore of a Delta table to a specified version or timestamp.
Algorithm:
1) Read the latest snapshot of the table.
2) Read the snapshot for the version or timestamp to restore.
3) Compute files available in the snapshot for restoring (files were removed by some commit) but missing in the latest. Add these files into the commit as AddFile actions.
4) Compute files available in the latest snapshot (files were added after the version to restore) but missing in the snapshot to restore. Add these files into the commit as RemoveFile actions.
5) If the SQLConf.IGNORE_MISSING_FILES option is false (the default value), check the availability of the AddFiles in the file system.
6) Commit the Metadata, Protocol, and all RemoveFile and AddFile actions into the delta log using commitLarge (the commit will fail in case of a parallel transaction).
7) If the table was modified in parallel, then ignore the restore and raise an exception.
-
trait
RestoreTableCommandBase extends AnyRef
Base trait for RESTORE. Defines the command output schema and metrics.
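The RESTORE flow can be driven from the DeltaTable Scala API. A hedged sketch (assumes a SparkSession `spark` with Delta configured; the path, version number, and timestamp are hypothetical):

```scala
import io.delta.tables.DeltaTable

// Restore a table to an earlier version or timestamp. The restore itself is
// committed as a new version, so it remains visible in the table history.
val table = DeltaTable.forPath(spark, "/tmp/delta/events")
table.restoreToVersion(5)
table.restoreToTimestamp("2024-01-01")
```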
-
case class
ShowDeltaTableColumnsCommand(child: LogicalPlan) extends LogicalPlan with RunnableCommand with UnaryNode with DeltaCommand with Product with Serializable
A command for listing all column names of a Delta table.
- child
The resolved Delta table
-
case class
SnapshotOverwriteOperationMetrics(sourceSnapshotSizeInBytes: Long, sourceSnapshotFileCount: Long, destSnapshotAddedFileCount: Long, destSnapshotAddedFilesSizeInBytes: Long) extends Product with Serializable
Metrics of a snapshot overwrite operation.
- sourceSnapshotSizeInBytes
Total size of the data in the source snapshot.
- sourceSnapshotFileCount
Number of data files in the source snapshot.
- destSnapshotAddedFileCount
Number of new data files added to the destination snapshot as part of the execution.
- destSnapshotAddedFilesSizeInBytes
Total size (in bytes) of the data files that were added to the destination snapshot.
-
case class
TableColumns(col_name: String) extends Product with Serializable
The column format of the result returned by the SHOW COLUMNS command.
-
case class
TableDetail(format: String, id: String, name: String, description: String, location: String, createdAt: Timestamp, lastModified: Timestamp, partitionColumns: Seq[String], clusteringColumns: Seq[String], numFiles: Long, sizeInBytes: Long, properties: Map[String, String], minReaderVersion: Integer, minWriterVersion: Integer, tableFeatures: Seq[String]) extends Product with Serializable
The result returned by the describe detail command.
- case class TouchedFileWithDV(inputFilePath: String, fileLogEntry: AddFile, newDeletionVector: DeletionVectorDescriptor, deletedRows: Long) extends Product with Serializable
-
case class
UpdateCommand(tahoeFileIndex: TahoeFileIndex, catalogTable: Option[CatalogTable], target: LogicalPlan, updateExpressions: Seq[Expression], condition: Option[Expression]) extends LogicalPlan with LeafRunnableCommand with DeltaCommand with Product with Serializable
Performs an Update using updateExpression on the rows that match condition.
Algorithm:
1) Identify the affected files, i.e., the files that may have the rows to be updated.
2) Scan affected files, apply the updates, and generate a new DF with updated rows.
3) Use the Delta protocol to atomically write the new DF as new files and remove the affected files that are identified in step 1.
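A hedged usage sketch via the DeltaTable Scala API (assumes a SparkSession `spark` with Delta configured; the table name, condition, and update expression are hypothetical):

```scala
import io.delta.tables.DeltaTable

// Update matching rows using SQL-expression strings: a condition plus a
// map of column name -> update expression.
DeltaTable.forName(spark, "events")
  .updateExpr(
    "eventType = 'click'",
    Map("count" -> "count + 1"))
```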
-
case class
UpdateMetric(condition: String, numFilesTotal: Long, numTouchedFiles: Long, numRewrittenFiles: Long, numAddedChangeFiles: Long, changeFileBytes: Long, scanTimeMs: Long, rewriteTimeMs: Long, numDeletionVectorsAdded: Long, numDeletionVectorsRemoved: Long, numDeletionVectorsUpdated: Long, commitVersion: Option[Long] = None, numLogicalRecordsAdded: Option[Long] = None, numLogicalRecordsRemoved: Option[Long] = None) extends Product with Serializable
Used to report details about update.
- Note
All the time units are milliseconds.
- trait VacuumCommandImpl extends DeltaCommand
-
case class
WriteIntoDelta(deltaLog: DeltaLog, mode: SaveMode, options: DeltaOptions, partitionColumns: Seq[String], configuration: Map[String, String], data: DataFrame, catalogTableOpt: Option[CatalogTable] = None, schemaInCatalog: Option[StructType] = None) extends LogicalPlan with LeafRunnableCommand with ImplicitMetadataOperation with DeltaCommand with WriteIntoDeltaLike with Product with Serializable
Used to write a DataFrame into a delta table.
New Table Semantics
- The schema of the DataFrame is used to initialize the table.
- The partition columns will be used to partition the table.
Existing Table Semantics
- The save mode will control how existing data is handled (i.e. overwrite, append, etc)
The schema of the DataFrame will be checked, and if there are new columns present, they will be added to the table's schema. Conflicting columns (i.e. an INT and a STRING) will result in an exception.
The partition columns, if present, are validated against the existing metadata. If not present, then the partitioning of the table is respected.
In combination with Overwrite, a replaceWhere option can be used to transactionally replace data that matches a predicate.
In combination with Overwrite, dynamic partition overwrite mode (option partitionOverwriteMode set to dynamic, or spark conf spark.sql.sources.partitionOverwriteMode set to dynamic) is also supported.
Dynamic partition overwrite mode conflicts with replaceWhere:
- If a replaceWhere option is provided, and dynamic partition overwrite mode is enabled in the DataFrameWriter options, an error will be thrown.
- If a replaceWhere option is provided, and dynamic partition overwrite mode is enabled in the spark conf, data will be overwritten according to the replaceWhere expression.
- catalogTableOpt
Should explicitly be set when table is accessed from catalog
- schemaInCatalog
The schema created in the catalog. We will use this schema to update metadata when it is set (in the CTAS code path), and otherwise use the schema from data.
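A hedged sketch of the overwrite variants described above (assumes a DataFrame `df` of new data and a Delta-configured session; the path and predicate are hypothetical):

```scala
// Transactionally replace only the data matching a predicate.
df.write.format("delta")
  .mode("overwrite")
  .option("replaceWhere", "date >= '2024-01-01' AND date < '2024-02-01'")
  .save("/tmp/delta/events")

// Alternatively, dynamic partition overwrite (must not be combined with replaceWhere):
df.write.format("delta")
  .mode("overwrite")
  .option("partitionOverwriteMode", "dynamic")
  .save("/tmp/delta/events")
```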
-
trait
WriteIntoDeltaLike extends AnyRef
An interface for writing data into Delta tables.
-
case class
ZOrderStrategy(sparkSession: SparkSession, zOrderColumns: Seq[String]) extends OptimizeTableStrategy with Product with Serializable
Implements ZOrder strategy
Value Members
- object CloneSourceFormat
- object CloneTableBase extends Logging
- object CloneTableCommand extends Serializable
- object ConvertToDeltaCommand extends DeltaLogging with Serializable
- object DMLUtils
-
object
DMLWithDeletionVectorsHelper extends DeltaCommand
Contains utility classes and methods for performing DML operations with Deletion Vectors.
- object DeleteCommand extends Serializable
- object DeletionVectorBitmapGenerator
- object DeletionVectorData extends Serializable
- object DeletionVectorResult extends Serializable
- object DeletionVectorUtils extends DeletionVectorUtils
-
object
DeletionVectorWriter extends DeltaLogging
Utility methods to write the deletion vector to storage. If a particular file already has an existing DV, it will be merged with the new deletion vector and written to storage.
- object DeltaGenerateCommand extends Serializable
- object DeltaReorgTableMode extends Enumeration
- object DescribeDeltaDetailCommand extends Serializable
- object DescribeDeltaHistory extends Serializable
- object FileToDvDescriptor extends Serializable
- object LastVacuumInfo extends DeltaCommand with Serializable
- object MergeIntoCommandBase
- object OptimizeTableCommand extends Serializable
- object OptimizeTableMode extends Enumeration
- object OptimizeTableStrategy
- object TableCreationModes
- object TableDetail extends Serializable
- object UpdateCommand extends Serializable
-
object
VacuumCommand extends VacuumCommandImpl with Serializable
Vacuums the table by clearing all untracked files and folders within this table. First lists all the files and directories in the table, and gets the relative paths with respect to the base of the table. Then it gets the list of all tracked files for this table, which may or may not be within the table base path, and gets the relative paths of all the tracked files with respect to the base of the table. Files outside of the table path will be ignored. Then we take a diff of the files and delete directories that were already empty, and all files that are within the table that are no longer tracked.
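A hedged usage sketch via the DeltaTable Scala API (assumes a SparkSession `spark` with Delta configured; the path and retention value are hypothetical):

```scala
import io.delta.tables.DeltaTable

// Remove untracked files older than the retention threshold.
val table = DeltaTable.forPath(spark, "/tmp/delta/events")
table.vacuum()     // uses the default retention (typically 7 days / 168 hours)
table.vacuum(240)  // or an explicit retention period, in hours
```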