package commands
Type Members
- case class ConvertToDeltaCommand(tableIdentifier: TableIdentifier, partitionSchema: Option[StructType], deltaPath: Option[String]) extends ConvertToDeltaCommandBase with Product with Serializable
- abstract class ConvertToDeltaCommandBase extends LogicalPlan with RunnableCommand with DeltaCommand
Convert an existing parquet table to a delta table by creating delta logs based on existing files. Here are the main components:
- File Listing: Launch a Spark job to list files from a given directory in parallel.
- Schema Inference: Given an iterator over the file list result, we group the iterator into sequential batches and launch a Spark job to infer the schema for each batch, then merge the schemas from all batches.
- Stats Collection: Again, we group the iterator over file list results into sequential batches and launch a Spark job to collect stats for each batch.
- Commit the files: We take the iterator of files with stats and write out a delta log file as the first commit. This bypasses the transaction protocol, which is safe because this is the very first commit.
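The conversion is typically triggered through the high-level DeltaTable API rather than by constructing the command directly. A minimal sketch, assuming Delta Lake's io.delta.tables.DeltaTable on the classpath and an active SparkSession; the paths are illustrative:

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("convert-example").getOrCreate()

// Convert an unpartitioned parquet directory in place: the command lists the
// files, infers the schema, collects stats, and writes the first delta commit.
DeltaTable.convertToDelta(spark, "parquet.`/data/events`")

// For a partitioned table, the partition schema must be supplied explicitly,
// since it cannot be inferred from the data files alone.
DeltaTable.convertToDelta(spark, "parquet.`/data/events_by_date`", "date DATE")
```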
- case class DeleteCommand(tahoeFileIndex: TahoeFileIndex, target: LogicalPlan, condition: Option[Expression]) extends LogicalPlan with RunnableCommand with DeltaCommand with Product with Serializable
Performs a Delete based on the search condition.
Algorithm:
1) Scan all the files and determine which files have the rows that need to be deleted.
2) Traverse the affected files and rebuild the touched files.
3) Use the Delta protocol to atomically write the remaining rows to new files and remove the affected files that are identified in step 1.
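The steps above are what runs when a delete is issued through the DeltaTable API. A minimal sketch, assuming an active SparkSession (`spark`) and an existing Delta table at an illustrative path:

```scala
import io.delta.tables.DeltaTable

val deltaTable = DeltaTable.forPath(spark, "/data/events")

// Only files containing rows that match the condition are rewritten;
// untouched files are carried over unchanged in the new commit.
deltaTable.delete("eventType = 'obsolete'")
```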
- case class DeleteMetric(condition: String, numFilesTotal: Long, numTouchedFiles: Long, numRewrittenFiles: Long, scanTimeMs: Long, rewriteTimeMs: Long) extends Product with Serializable
Used to report details about delete.
- Note
All the time units are milliseconds.
- trait DeltaCommand extends DeltaLogging
Helper trait for all delta commands.
- case class DeltaGenerateCommand(modeName: String, tableId: TableIdentifier) extends LogicalPlan with DeltaGenerateCommandBase with Product with Serializable
- trait DeltaGenerateCommandBase extends LogicalPlan with RunnableCommand
- case class DeltaVacuumStats(isDryRun: Boolean, specifiedRetentionMillis: Option[Long], defaultRetentionMillis: Long, minRetainedTimestamp: Long, dirsPresentBeforeDelete: Long, objectsDeleted: Long) extends Product with Serializable
- case class DescribeDeltaDetailCommand(path: Option[String], tableIdentifier: Option[TableIdentifier]) extends LogicalPlan with DescribeDeltaDetailCommandBase with Product with Serializable
- trait DescribeDeltaDetailCommandBase extends LogicalPlan with RunnableCommand with DeltaLogging
A command for describing the details of a table such as the format, name, and size.
- case class MergeDataFiles(files: Long) extends Product with Serializable
- case class MergeDataRows(rows: Long) extends Product with Serializable
- case class MergeIntoCommand(source: LogicalPlan, target: LogicalPlan, targetFileIndex: TahoeFileIndex, condition: Expression, matchedClauses: Seq[DeltaMergeIntoMatchedClause], notMatchedClause: Option[DeltaMergeIntoInsertClause], migratedSchema: Option[StructType]) extends LogicalPlan with RunnableCommand with DeltaCommand with PredicateHelper with AnalysisHelper with ImplicitMetadataOperation with Product with Serializable
Performs a merge of a source query/table into a Delta table.
Issues an error message when the ON search_condition of the MERGE statement can match a single row from the target table with multiple rows of the source table-reference.
Algorithm:
Phase 1: Find the input files in target that are touched by the rows that satisfy the condition and verify that no two source rows match with the same target row. This is implemented as an inner-join using the given condition. See findTouchedFiles for more details.
Phase 2: Read the touched files again and write new files with updated and/or inserted rows.
Phase 3: Use the Delta protocol to atomically remove the touched files and add the new files.
- source
Source data to merge from
- target
Target table to merge into
- targetFileIndex
TahoeFileIndex of the target table
- condition
Condition for a source row to match with a target row
- matchedClauses
All info related to matched clauses.
- notMatchedClause
All info related to not matched clause.
- migratedSchema
The final schema of the target - may be changed by schema evolution.
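These parameters map directly onto the builder calls of the high-level DeltaTable merge API, which is the usual way to invoke this command. A minimal sketch, assuming an active SparkSession (`spark`) and illustrative paths:

```scala
import io.delta.tables.DeltaTable

val target  = DeltaTable.forPath(spark, "/data/customers")
val updates = spark.read.format("parquet").load("/staging/customer_updates")

target.as("t")
  .merge(updates.as("s"), "t.id = s.id") // the ON condition
  .whenMatched().updateAll()             // a matched clause
  .whenNotMatched().insertAll()          // the not-matched (insert) clause
  .execute()
```

If a single target row matches multiple source rows under the ON condition, the command fails with an error rather than producing an ambiguous result, as described above.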
- case class MergeStats(conditionExpr: String, updateConditionExpr: String, updateExprs: Array[String], insertConditionExpr: String, insertExprs: Array[String], deleteConditionExpr: String, source: MergeDataRows, targetBeforeSkipping: MergeDataFiles, targetAfterSkipping: MergeDataFiles, targetFilesRemoved: Long, targetFilesAdded: Long, targetRowsCopied: Long, targetRowsUpdated: Long, targetRowsInserted: Long, targetRowsDeleted: Long) extends Product with Serializable
State for a merge operation
- case class TableDetail(format: String, id: String, name: String, description: String, location: String, createdAt: Timestamp, lastModified: Timestamp, partitionColumns: Seq[String], numFiles: Long, sizeInBytes: Long, properties: Map[String, String], minReaderVersion: Integer, minWriterVersion: Integer) extends Product with Serializable
The result returned by the describe detail command.
- case class UpdateCommand(tahoeFileIndex: TahoeFileIndex, target: LogicalPlan, updateExpressions: Seq[Expression], condition: Option[Expression]) extends LogicalPlan with RunnableCommand with DeltaCommand with Product with Serializable
Performs an Update using updateExpressions on the rows that match condition.
Algorithm:
1) Identify the affected files, i.e., the files that may have the rows to be updated.
2) Scan affected files, apply the updates, and generate a new DF with updated rows.
3) Use the Delta protocol to atomically write the new DF as new files and remove the affected files that are identified in step 1.
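A minimal sketch of invoking this command through the DeltaTable API, assuming an active SparkSession (`spark`) and an illustrative path; the condition becomes the command's condition and the SET map supplies the updateExpressions:

```scala
import io.delta.tables.DeltaTable

val deltaTable = DeltaTable.forPath(spark, "/data/events")

// Increment a counter only on rows matching the condition; files with no
// matching rows are left untouched by the rewrite.
deltaTable.updateExpr(
  "eventType = 'click'",
  Map("count" -> "count + 1"))
```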
- case class UpdateMetric(condition: String, numFilesTotal: Long, numTouchedFiles: Long, numRewrittenFiles: Long, scanTimeMs: Long, rewriteTimeMs: Long) extends Product with Serializable
Used to report details about update.
- Note
All the time units are milliseconds.
- trait VacuumCommandImpl extends DeltaCommand
- case class WriteIntoDelta(deltaLog: DeltaLog, mode: SaveMode, options: DeltaOptions, partitionColumns: Seq[String], configuration: Map[String, String], data: DataFrame) extends LogicalPlan with RunnableCommand with ImplicitMetadataOperation with DeltaCommand with Product with Serializable
Used to write a DataFrame into a delta table.
New Table Semantics
- The schema of the DataFrame is used to initialize the table.
- The partition columns will be used to partition the table.
Existing Table Semantics
- The save mode will control how existing data is handled (e.g. overwrite, append, etc.)
- The schema of the DataFrame will be checked, and if there are new columns present they will be added to the table's schema. Conflicting columns (e.g. an INT and a STRING) will result in an exception.
- The partition columns, if present, are validated against the existing metadata. If not present, then the partitioning of the table is respected.
In combination with Overwrite, a replaceWhere option can be used to transactionally replace data that matches a predicate.
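The semantics above are exercised through the ordinary DataFrameWriter path. A minimal sketch of the replaceWhere case, assuming an active SparkSession (`spark`) and illustrative paths:

```scala
// Transactionally replace only the rows for one date: existing files matching
// the predicate are removed and the new data is added in a single commit.
spark.read.format("parquet").load("/staging/2020-01-01")
  .write
  .format("delta")
  .mode("overwrite")
  .option("replaceWhere", "date = '2020-01-01'")
  .save("/data/events")
```

Without the replaceWhere option, mode("overwrite") would replace the entire table's contents instead of just the matching partition of rows.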
Value Members
- object DeleteCommand extends Serializable
- object DeltaGenerateCommand extends Serializable
- object MergeIntoCommand extends Serializable
- object UpdateCommand extends Serializable
- object VacuumCommand extends VacuumCommandImpl
Vacuums the table by clearing all untracked files and folders within this table. First lists all the files and directories in the table, and gets the relative paths with respect to the base of the table. Then it gets the list of all tracked files for this table, which may or may not be within the table base path, and gets the relative paths of all the tracked files with respect to the base of the table. Files outside of the table path will be ignored. Then we take a diff of the files and delete directories that were already empty, and all files that are within the table that are no longer tracked.
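A minimal sketch of running a vacuum through the DeltaTable API, assuming an active SparkSession (`spark`) and an illustrative path:

```scala
import io.delta.tables.DeltaTable

val deltaTable = DeltaTable.forPath(spark, "/data/events")

// Delete untracked files older than 168 hours (7 days, the default retention).
// Shorter retention windows are rejected unless the safety check is disabled,
// since they can break readers of in-flight or time-travel queries.
deltaTable.vacuum(168)
```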