Packages

package commands


Type Members

  1. case class ConvertToDeltaCommand(tableIdentifier: TableIdentifier, partitionSchema: Option[StructType], deltaPath: Option[String]) extends ConvertToDeltaCommandBase with Product with Serializable
  2. abstract class ConvertToDeltaCommandBase extends LogicalPlan with RunnableCommand with DeltaCommand

    Convert an existing parquet table to a delta table by creating delta logs based on existing files. Here are the main components:

    • File Listing: Launch a spark job to list files from a given directory in parallel.
    • Schema Inference: Given an iterator on the file list result, we group the iterator into sequential batches and launch a spark job to infer schema for each batch, and finally merge schemas from all batches.
    • Stats collection: Again, we group the iterator on file list results into sequential batches and launch a spark job to collect stats for each batch.
    • Commit the files: We take the iterator of files with stats and write out a delta log file as the first commit. This bypasses the transaction protocol, which is acceptable because this is the very first commit to the table.
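
    The batching and schema-merging steps above can be sketched without Spark. This is a minimal illustration, not the command's actual code: ParquetFile, Schema, mergeSchemas and inferSchema are hypothetical names, a schema is modelled as a plain map from column name to type name, and each batch is folded locally where the real command launches a Spark job.

```scala
object ConvertSketch {
  // Hypothetical stand-ins: a schema maps column name to type name.
  type Schema = Map[String, String]
  final case class ParquetFile(path: String, schema: Schema)

  // Merge two schemas; a column present in both must have the same type.
  def mergeSchemas(a: Schema, b: Schema): Schema =
    b.foldLeft(a) { case (acc, (name, tpe)) =>
      acc.get(name) match {
        case Some(existing) if existing != tpe =>
          throw new IllegalArgumentException(
            s"Column $name has conflicting types: $existing vs $tpe")
        case _ => acc + (name -> tpe)
      }
    }

  // Group the file iterator into sequential batches, infer one schema per
  // batch (a local fold here), then merge the per-batch schemas.
  def inferSchema(files: Iterator[ParquetFile], batchSize: Int): Schema =
    files
      .grouped(batchSize)
      .map(batch => batch.map(_.schema).reduce((x, y) => mergeSchemas(x, y)))
      .foldLeft(Map.empty[String, String])((x, y) => mergeSchemas(x, y))
}
```

    Because the input is an iterator, only one batch of file entries is held at a time, which is the point of grouping sequentially rather than collecting the whole listing.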
  3. case class DeleteCommand(tahoeFileIndex: TahoeFileIndex, target: LogicalPlan, condition: Option[Expression]) extends LogicalPlan with RunnableCommand with DeltaCommand with Product with Serializable

    Performs a Delete based on the search condition

    Algorithm:

    1) Scan all the files and determine which files have the rows that need to be deleted.
    2) Traverse the affected files and rebuild the touched files.
    3) Use the Delta protocol to atomically write the remaining rows to new files and remove the affected files that are identified in step 1.
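
    The three delete steps can be illustrated on in-memory data. This is a sketch under simplifying assumptions: DataFile, Commit and delete are hypothetical names, rows are plain integers, and the "commit" is just a value listing the paths to remove and the files to add rather than a real Delta log entry.

```scala
object DeleteSketch {
  // Hypothetical stand-ins for data files and a Delta commit.
  final case class DataFile(path: String, rows: Seq[Int])
  final case class Commit(remove: Seq[String], add: Seq[DataFile])

  def delete(files: Seq[DataFile], condition: Int => Boolean): Commit = {
    // Step 1: find the files that contain at least one row to delete.
    val touched = files.filter(_.rows.exists(condition))
    // Step 2: rebuild each touched file from the rows that survive;
    // a file left with no rows is not rewritten at all.
    val rewritten = touched
      .map(f => DataFile(f.path + ".rewritten", f.rows.filterNot(condition)))
      .filter(_.rows.nonEmpty)
    // Step 3: a single commit removes the touched files and adds the new ones.
    Commit(remove = touched.map(_.path), add = rewritten)
  }
}
```

    Files with no matching rows never appear in the commit, so they are neither read again nor rewritten.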

  4. case class DeleteMetric(condition: String, numFilesTotal: Long, numTouchedFiles: Long, numRewrittenFiles: Long, scanTimeMs: Long, rewriteTimeMs: Long) extends Product with Serializable

    Used to report details about delete.

    Note

    All the time units are milliseconds.

  5. trait DeltaCommand extends DeltaLogging

    Helper trait for all delta commands.

  6. case class DeltaGenerateCommand(modeName: String, tableId: TableIdentifier) extends LogicalPlan with DeltaGenerateCommandBase with Product with Serializable
  7. trait DeltaGenerateCommandBase extends LogicalPlan with RunnableCommand
  8. case class DeltaVacuumStats(isDryRun: Boolean, specifiedRetentionMillis: Option[Long], defaultRetentionMillis: Long, minRetainedTimestamp: Long, dirsPresentBeforeDelete: Long, objectsDeleted: Long) extends Product with Serializable
  9. case class DescribeDeltaDetailCommand(path: Option[String], tableIdentifier: Option[TableIdentifier]) extends LogicalPlan with DescribeDeltaDetailCommandBase with Product with Serializable
  10. trait DescribeDeltaDetailCommandBase extends LogicalPlan with RunnableCommand with DeltaLogging

    A command for describing the details of a table such as the format, name, and size.

  11. case class MergeDataFiles(files: Long) extends Product with Serializable
  12. case class MergeDataRows(rows: Long) extends Product with Serializable
  13. case class MergeIntoCommand(source: LogicalPlan, target: LogicalPlan, targetFileIndex: TahoeFileIndex, condition: Expression, matchedClauses: Seq[DeltaMergeIntoMatchedClause], notMatchedClause: Option[DeltaMergeIntoInsertClause], migratedSchema: Option[StructType]) extends LogicalPlan with RunnableCommand with DeltaCommand with PredicateHelper with AnalysisHelper with ImplicitMetadataOperation with Product with Serializable

    Performs a merge of a source query/table into a Delta table.

    Issues an error message when the ON search_condition of the MERGE statement can match a single row from the target table with multiple rows of the source table-reference.

    Algorithm:

    Phase 1: Find the input files in target that are touched by the rows that satisfy the condition and verify that no two source rows match with the same target row. This is implemented as an inner-join using the given condition. See findTouchedFiles for more details.

    Phase 2: Read the touched files again and write new files with updated and/or inserted rows.

    Phase 3: Use the Delta protocol to atomically remove the touched files and add the new files.
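
    The three phases can be sketched on in-memory data. The names below (Row, DataFile, merge) are hypothetical, the match condition is reduced to equality on an integer key, and only an "update when matched, insert when not matched" merge is modelled. The findTouchedFiles in this sketch is a simplified stand-in that only shares its name with the method mentioned above.

```scala
object MergeSketch {
  // Hypothetical stand-ins: rows are matched on an integer key.
  final case class Row(key: Int, value: String)
  final case class DataFile(path: String, rows: Seq[Row])

  // Phase 1: join source and target on the key, fail if one target row
  // matches several source rows, and return the target files that are touched.
  def findTouchedFiles(source: Seq[Row], target: Seq[DataFile]): Seq[DataFile] = {
    val sourceByKey = source.groupBy(_.key)
    val touched = target.filter(_.rows.exists(r => sourceByKey.contains(r.key)))
    touched.foreach(_.rows.foreach { r =>
      val matches = sourceByKey.getOrElse(r.key, Seq.empty).size
      require(matches <= 1, s"Target key ${r.key} matched $matches source rows")
    })
    touched
  }

  // Phases 2 and 3: rewrite the touched files with updated rows, collect the
  // unmatched source rows as inserts, and return (paths to remove, files to add).
  def merge(source: Seq[Row], target: Seq[DataFile]): (Seq[String], Seq[DataFile]) = {
    val touched = findTouchedFiles(source, target)
    val sourceByKey = source.map(r => r.key -> r).toMap
    val updated = touched.map(f =>
      DataFile(f.path + ".merged", f.rows.map(r => sourceByKey.getOrElse(r.key, r))))
    // Untouched files contain no matches, so touched keys are enough here.
    val touchedKeys = touched.flatMap(_.rows.map(_.key)).toSet
    val inserts = source.filterNot(r => touchedKeys.contains(r.key))
    val added = if (inserts.isEmpty) updated else updated :+ DataFile("inserted", inserts)
    (touched.map(_.path), added)
  }
}
```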

    source

    Source data to merge from

    target

    Target table to merge into

    targetFileIndex

    TahoeFileIndex of the target table

    condition

    Condition for a source row to match with a target row

    matchedClauses

    All info related to matched clauses.

    notMatchedClause

    All info related to not matched clause.

    migratedSchema

    The final schema of the target - may be changed by schema evolution.

  14. case class MergeStats(conditionExpr: String, updateConditionExpr: String, updateExprs: Array[String], insertConditionExpr: String, insertExprs: Array[String], deleteConditionExpr: String, source: MergeDataRows, targetBeforeSkipping: MergeDataFiles, targetAfterSkipping: MergeDataFiles, targetFilesRemoved: Long, targetFilesAdded: Long, targetRowsCopied: Long, targetRowsUpdated: Long, targetRowsInserted: Long, targetRowsDeleted: Long) extends Product with Serializable

    State for a merge operation

  15. case class TableDetail(format: String, id: String, name: String, description: String, location: String, createdAt: Timestamp, lastModified: Timestamp, partitionColumns: Seq[String], numFiles: Long, sizeInBytes: Long, properties: Map[String, String], minReaderVersion: Integer, minWriterVersion: Integer) extends Product with Serializable

    The result returned by the describe detail command.

  16. case class UpdateCommand(tahoeFileIndex: TahoeFileIndex, target: LogicalPlan, updateExpressions: Seq[Expression], condition: Option[Expression]) extends LogicalPlan with RunnableCommand with DeltaCommand with Product with Serializable

    Performs an Update using updateExpressions on the rows that match condition.

    Algorithm:

    1) Identify the affected files, i.e., the files that may have the rows to be updated.
    2) Scan the affected files, apply the updates, and generate a new DataFrame with the updated rows.
    3) Use the Delta protocol to atomically write the new DataFrame as new files and remove the affected files that are identified in step 1.
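
    A small in-memory sketch of the update steps follows. It is illustrative only: DataFile, Commit and update are hypothetical names, rows are plain integers, and step 1 inspects the rows directly instead of pruning candidate files the way a real table scan would.

```scala
object UpdateSketch {
  // Hypothetical stand-ins for data files and a Delta commit.
  final case class DataFile(path: String, rows: Seq[Int])
  final case class Commit(remove: Seq[String], add: Seq[DataFile])

  def update(
      files: Seq[DataFile],
      condition: Int => Boolean,
      updateExpr: Int => Int): Commit = {
    // Step 1: identify the affected files.
    val affected = files.filter(_.rows.exists(condition))
    // Step 2: scan them and apply the update to matching rows only;
    // non-matching rows are copied unchanged into the new file.
    val rewritten = affected.map { f =>
      DataFile(
        f.path + ".updated",
        f.rows.map(r => if (condition(r)) updateExpr(r) else r))
    }
    // Step 3: one commit swaps the affected files for the rewritten ones.
    Commit(remove = affected.map(_.path), add = rewritten)
  }
}
```

    Unlike the delete case, every affected file yields a rewritten file, because an update never reduces the number of rows.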

  17. case class UpdateMetric(condition: String, numFilesTotal: Long, numTouchedFiles: Long, numRewrittenFiles: Long, scanTimeMs: Long, rewriteTimeMs: Long) extends Product with Serializable

    Used to report details about update.

    Note

    All the time units are milliseconds.

  18. trait VacuumCommandImpl extends DeltaCommand
  19. case class WriteIntoDelta(deltaLog: DeltaLog, mode: SaveMode, options: DeltaOptions, partitionColumns: Seq[String], configuration: Map[String, String], data: DataFrame) extends LogicalPlan with RunnableCommand with ImplicitMetadataOperation with DeltaCommand with Product with Serializable

    Used to write a DataFrame into a delta table.

    New Table Semantics

    • The schema of the DataFrame is used to initialize the table.
    • The partition columns will be used to partition the table.

    Existing Table Semantics

    • The save mode controls how existing data is handled (e.g. overwrite, append).
    • The schema of the DataFrame is checked, and any new columns it contains are added to the table's schema. Conflicting columns (e.g. an INT and a STRING with the same name) result in an exception.
    • The partition columns, if present, are validated against the existing metadata. If not present, the existing partitioning of the table is respected.

    In combination with Overwrite, a replaceWhere option can be used to transactionally replace data that matches a predicate.
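
    The new-table and existing-table semantics above can be sketched as follows. Everything here is a hypothetical stand-in: Mode, Append and Overwrite mimic a save mode, a schema is an ordered list of (column, type) pairs, and rows are opaque strings. Partition validation and replaceWhere are left out.

```scala
object WriteSketch {
  // Hypothetical stand-ins for the save mode, schema, and table state.
  sealed trait Mode
  case object Append extends Mode
  case object Overwrite extends Mode
  type Schema = Seq[(String, String)]
  final case class Table(schema: Schema, rows: Seq[String])

  // New columns are appended to the table schema; a column that exists
  // with a different type raises an exception.
  def evolve(table: Schema, data: Schema): Schema = {
    val tableTypes = table.toMap
    data.foreach { case (name, tpe) =>
      tableTypes.get(name).foreach { existing =>
        if (existing != tpe)
          throw new IllegalArgumentException(
            s"Column $name: table has $existing, data has $tpe")
      }
    }
    table ++ data.filterNot { case (name, _) => tableTypes.contains(name) }
  }

  def write(table: Option[Table], mode: Mode, schema: Schema, rows: Seq[String]): Table =
    table match {
      // New table: the schema of the incoming data initializes it.
      case None => Table(schema, rows)
      // Existing table: check the schema, then let the mode decide the data.
      case Some(t) =>
        val merged = evolve(t.schema, schema)
        mode match {
          case Append => Table(merged, t.rows ++ rows)
          case Overwrite => Table(merged, rows)
        }
    }
}
```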

Value Members

  1. object DeleteCommand extends Serializable
  2. object DeltaGenerateCommand extends Serializable
  3. object MergeIntoCommand extends Serializable
  4. object UpdateCommand extends Serializable
  5. object VacuumCommand extends VacuumCommandImpl

    Vacuums the table by clearing all untracked files and folders within this table. It first lists all the files and directories in the table and computes their paths relative to the base of the table. It then gets the list of all files tracked by this table, which may or may not be within the table base path, and computes their paths relative to the base of the table as well; files outside of the table path are ignored. Finally, it takes a diff of the two lists and deletes the directories that were already empty and all files within the table that are no longer tracked.
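
    The diff at the heart of this description can be sketched with plain strings. This is an illustration under stated assumptions: relativize and untracked are hypothetical helpers, paths are '/'-separated strings, and the sketch only computes which files would be deleted. It does not model empty-directory cleanup or any retention period.

```scala
object VacuumSketch {
  // Path of `path` relative to `base`, or None if it lies outside the table.
  def relativize(base: String, path: String): Option[String] = {
    val prefix = if (base.endsWith("/")) base else base + "/"
    if (path.startsWith(prefix)) Some(path.stripPrefix(prefix)) else None
  }

  // Files listed under the table that are not tracked by it. Tracked files
  // outside the table path are ignored because they relativize to None.
  def untracked(base: String, listed: Seq[String], tracked: Seq[String]): Seq[String] = {
    val trackedRelative = tracked.flatMap(p => relativize(base, p)).toSet
    listed.flatMap(p => relativize(base, p)).filterNot(trackedRelative.contains)
  }
}
```

    Comparing relative paths rather than absolute ones is what lets the listing and the tracked-file set be diffed even when they spell the table location differently.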
