package delta
Package Members
- package actions
- package catalog
- package clustering
- package commands
- package constraints
- package deletionvectors
- package expressions
- package files
- package fuzzer
- package hooks
- package implicits
- package managedcommit
- package metering
- package metric
- package optimizer
- package perf
- package schema
- package skipping
- package sources
- package stats
- package storage
- package streaming
- package tablefeatures
- package util
- package zorder
Type Members
- case class CDCNameBased(functionArgs: Seq[Expression]) extends LogicalPlan with CDCStatementBase with Product with Serializable
Plan for the "table_changes" function
- case class CDCPathBased(functionArgs: Seq[Expression]) extends LogicalPlan with CDCStatementBase with Product with Serializable
Plan for the "table_changes_by_path" function
- trait CDCStatementBase extends LogicalPlan with DeltaTableValueFunction
Base trait for analyzing table_changes and table_changes_by_path. The resolution works as follows:
1. The TVF logical plan is resolved using the TableFunctionRegistry in the Analyzer. This uses reflection to create one of CDCNameBased or CDCPathBased by passing all the arguments.
2. DeltaAnalysis turns the plan into a TableChanges node to resolve the DeltaTable. This can be resolved by the DeltaCatalog for tables, or by DeltaAnalysis for the path-based use.
3. TableChanges then turns into a LogicalRelation that returns the CDC relation.
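For context, the two TVFs these plans resolve are invoked from SQL. A minimal sketch (the table name, path, and version range are illustrative, and Change Data Feed must be enabled on the table):

```scala
// Read CDC rows of a hypothetical table `events` between versions 1 and 5.
// Requires the table property delta.enableChangeDataFeed = true.
val changes = spark.sql("SELECT * FROM table_changes('events', 1, 5)")

// Path-based variant, resolved via CDCPathBased:
val changesByPath =
  spark.sql("SELECT * FROM table_changes_by_path('/tmp/delta/events', 1, 5)")
```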
- case class CapturedSnapshot(snapshot: Snapshot, updateTimestamp: Long) extends Product with Serializable
Wraps the most recently updated snapshot along with the timestamp the update was started. Defined outside the class since it's used in tests.
- case class CheckOverflowInTableWrite(child: Expression, columnName: String) extends UnaryExpression with Product with Serializable
- class CheckUnresolvedRelationTimeTravel extends (LogicalPlan) => Unit
Custom check rule that compensates for [SPARK-45383]. It checks the (unresolved) child relation of each RelationTimeTravel in the plan, in order to trigger a helpful table-not-found AnalysisException instead of the internal Spark error that would otherwise result.
- case class CheckpointInstance(version: Long, format: Format, fileName: Option[String] = None, numParts: Option[Int] = None) extends Ordered[CheckpointInstance] with Product with Serializable
A class to help with comparing checkpoints with each other, where we may have had concurrent writers that checkpoint with different numbers of parts. The numParts field will be present only for multipart checkpoints (represented by Format.WITH_PARTS). The fileName field is present only for V2 checkpoints (represented by Format.V2). These additional fields are used as a tie breaker when comparing multiple checkpoint instances of the same Format for the same version.
- trait CheckpointProvider extends UninitializedCheckpointProvider
A trait which provides information about a checkpoint to the Snapshot.
- trait Checkpoints extends DeltaLogging
- case class ColumnMappingException(msg: String, mode: DeltaColumnMappingMode) extends AnalysisException with Product with Serializable
- class ColumnMappingUnsupportedException extends UnsupportedOperationException
Errors thrown around column mapping.
- class CommitOwnerGetCommitsFailedException extends Exception
Exception thrown when TableCommitOwnerClient.getCommits fails for any reason.
- case class CommitStats(startVersion: Long, commitVersion: Long, readVersion: Long, txnDurationMs: Long, commitDurationMs: Long, fsWriteDurationMs: Long, stateReconstructionDurationMs: Long, numAdd: Int, numRemove: Int, numSetTransaction: Int, bytesNew: Long, numFilesTotal: Long, sizeInBytesTotal: Long, numCdcFiles: Long, cdcBytesNew: Long, protocol: Protocol, commitSizeBytes: Long, checkpointSizeBytes: Long, totalCommitsSizeSinceLastCheckpoint: Long, checkpointAttempt: Boolean, info: CommitInfo, newMetadata: Option[Metadata], numAbsolutePathsInAdd: Int, numDistinctPartitionsInAdd: Int, numPartitionColumnsInTable: Int, isolationLevel: String, fileSizeHistogram: Option[FileSizeHistogram] = None, addFilesHistogram: Option[FileSizeHistogram] = None, removeFilesHistogram: Option[FileSizeHistogram] = None, numOfDomainMetadatas: Long = 0, txnId: Option[String] = None) extends Product with Serializable
Record metrics about a successful commit.
- class ConcurrentAppendException extends io.delta.exceptions.DeltaConcurrentModificationException
This class is kept for backward compatibility. Use io.delta.exceptions.ConcurrentAppendException instead.
- class ConcurrentDeleteDeleteException extends io.delta.exceptions.DeltaConcurrentModificationException
This class is kept for backward compatibility. Use io.delta.exceptions.ConcurrentDeleteDeleteException instead.
- class ConcurrentDeleteReadException extends io.delta.exceptions.DeltaConcurrentModificationException
This class is kept for backward compatibility. Use io.delta.exceptions.ConcurrentDeleteReadException instead.
- class ConcurrentTransactionException extends io.delta.exceptions.DeltaConcurrentModificationException
This class is kept for backward compatibility. Use io.delta.exceptions.ConcurrentTransactionException instead.
- class ConcurrentWriteException extends io.delta.exceptions.DeltaConcurrentModificationException
This class is kept for backward compatibility. Use io.delta.exceptions.ConcurrentWriteException instead.
- case class DateFormatPartitionExpr(partitionColumn: String, format: String) extends OptimizablePartitionExpression with Product with Serializable
The rules for the generation expression DATE_FORMAT(col, format), such as DATE_FORMAT(timestamp, 'yyyy-MM') and DATE_FORMAT(timestamp, 'yyyy-MM-dd-HH').
- partitionColumn
the partition column name using DATE_FORMAT in its generation expression.
- format
the format parameter of DATE_FORMAT in the generation expression. Note that the behavior of unix_timestamp differs across time parser policies:
            unix_timestamp('12345-12', 'yyyy-MM') | unix_timestamp('+12345-12', 'yyyy-MM')
  EXCEPTION fail                                  | 327432240000
  CORRECTED null                                  | 327432240000
  LEGACY    327432240000                          | null
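A sketch of how such a generation expression might be declared so that this rule can kick in, using the DeltaTable builder API (table and column names are illustrative):

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.types.{StringType, TimestampType}

// Partition column `eventMonth` is generated with DATE_FORMAT, so data
// filters on `eventTime` can be converted into partition filters on it.
DeltaTable.create(spark)
  .tableName("events")
  .addColumn("id", StringType)
  .addColumn("eventTime", TimestampType)
  .addColumn(
    DeltaTable.columnBuilder("eventMonth")
      .dataType(StringType)
      .generatedAlwaysAs("DATE_FORMAT(eventTime, 'yyyy-MM')")
      .build())
  .partitionedBy("eventMonth")
  .execute()
```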
- case class DatePartitionExpr(partitionColumn: String) extends OptimizablePartitionExpression with Product with Serializable
The rules for the generation expression CAST(col AS DATE).
- case class DayPartitionExpr(dayPart: String) extends OptimizablePartitionExpression with Product with Serializable
This is a placeholder to catch day(col) so that we can merge YearPartitionExpr, MonthPartitionExpr and DayPartitionExpr into YearMonthDayPartitionExpr.
- dayPart
the day partition column name.
- class DeltaAnalysis extends Rule[LogicalPlan] with AnalysisHelper with DeltaLogging
Analysis rules for Delta. Currently, these rules enable schema enforcement / evolution with INSERT INTO.
- class DeltaAnalysisException extends AnalysisException with DeltaThrowable
- class DeltaArithmeticException extends ArithmeticException with DeltaThrowable
- sealed trait DeltaBatchCDFSchemaMode extends AnyRef
Definitions for the batch read schema mode for CDF
- class DeltaChecksumException extends ChecksumException with DeltaThrowable
- trait DeltaColumnMappingBase extends DeltaLogging
- sealed trait DeltaColumnMappingMode extends AnyRef
A trait for Delta column mapping modes.
- class DeltaColumnMappingUnsupportedException extends ColumnMappingUnsupportedException with DeltaThrowable
- class DeltaCommandUnsupportedWithDeletionVectorsException extends UnsupportedOperationException with DeltaThrowable
- sealed trait DeltaCommitTag extends AnyRef
Marker trait for a commit tag used by delta.
- abstract class DeltaConcurrentModificationException extends ConcurrentModificationException
The basic class for all Tahoe commit conflict exceptions.
- case class DeltaConfig[T](key: String, defaultValue: String, fromString: (String) => T, validationFunction: (T) => Boolean, helpMessage: String, editable: Boolean = true, alternateKeys: Seq[String] = Seq.empty) extends Product with Serializable
- trait DeltaConfigsBase extends DeltaLogging
Contains list of reservoir configs and validation checks.
- case class DeltaDynamicPartitionOverwriteCommand(table: NamedRelation, deltaTable: DeltaTableV2, query: LogicalPlan, writeOptions: Map[String, String], isByName: Boolean, analyzedQuery: Option[LogicalPlan] = None) extends LogicalPlan with RunnableCommand with V2WriteCommand with Product with Serializable
A RunnableCommand that will execute dynamic partition overwrite using WriteIntoDelta.
This is a workaround for Spark not supporting V1 fallback for dynamic partition overwrite. Note the following details:
- Extends V2WriteCommand so that Spark can transform this plan in the same way as other commands like AppendData.
- Exposes the query as a child so that the Spark optimizer can optimize it.
- trait DeltaErrorsBase extends DocsPath with DeltaLogging with QueryErrorsBase
A holder object for Delta errors.
IMPORTANT: Any time you add a test that references the docs, add to the Seq defined in DeltaErrorsSuite so that the doc links that are generated can be verified to work in docs.delta.io
- class DeltaFileAlreadyExistsException extends FileAlreadyExistsException with DeltaThrowable
- trait DeltaFileFormat extends AnyRef
- class DeltaFileNotFoundException extends FileNotFoundException with DeltaThrowable
- case class DeltaHistory(version: Option[Long], timestamp: Timestamp, userId: Option[String], userName: Option[String], operation: String, operationParameters: Map[String, String], job: Option[JobInfo], notebook: Option[NotebookInfo], clusterId: Option[String], readVersion: Option[Long], isolationLevel: Option[String], isBlindAppend: Option[Boolean], operationMetrics: Option[Map[String, String]], userMetadata: Option[String], engineInfo: Option[String]) extends CommitMarker with Product with Serializable
Class describing the output schema of org.apache.spark.sql.delta.commands.DescribeDeltaHistoryCommand.
- class DeltaHistoryManager extends DeltaLogging
This class keeps track of the versions of commits and their timestamps for a Delta table, to help with operations like describing the history of a table.
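The history it maintains surfaces through the public DeltaTable API; a minimal sketch (the path is illustrative):

```scala
import io.delta.tables.DeltaTable

// Show the last 10 commits of a table as a DataFrame.
DeltaTable.forPath(spark, "/tmp/delta/events")
  .history(10)
  .select("version", "timestamp", "operation", "operationParameters")
  .show(truncate = false)
```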
- class DeltaIOException extends IOException with DeltaThrowable
- class DeltaIllegalArgumentException extends IllegalArgumentException with DeltaThrowable
- class DeltaIllegalStateException extends IllegalStateException with DeltaThrowable
- class DeltaIndexOutOfBoundsException extends IndexOutOfBoundsException with DeltaThrowable
- class DeltaLog extends Checkpoints with MetadataCleanup with LogStoreProvider with SnapshotManagement with DeltaFileFormat with ProvidesUniFormConverters with ReadChecksum
Used to query the current state of the log as well as modify it by adding new atomic collections of actions.
Internally, this class implements an optimistic concurrency control algorithm to handle multiple readers or writers. Any single read is guaranteed to see a consistent snapshot of the table.
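DeltaLog is an internal API, but a common pattern in Delta's own tests looks roughly like this (the path is illustrative):

```scala
import org.apache.spark.sql.delta.DeltaLog

val log = DeltaLog.forTable(spark, "/tmp/delta/events")
// update() refreshes and returns a consistent Snapshot of the table state.
val snapshot = log.update()
println(s"version=${snapshot.version}, files=${snapshot.allFiles.count()}")
```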
- case class DeltaLogFileIndex extends FileIndex with Logging with Product with Serializable
A specialized file index for files found in the _delta_log directory. By using this file index, we avoid any additional file listing, partitioning inference, and file existence checks when computing the state of a Delta table.
- class DeltaNoSuchTableException extends AnalysisException with DeltaThrowable
- trait DeltaOptionParser extends AnyRef
- class DeltaOptions extends DeltaWriteOptions with DeltaReadOptions with Serializable
Options for the Delta data source.
- case class DeltaParquetFileFormat(protocol: Protocol, metadata: Metadata, nullableRowTrackingFields: Boolean = false, optimizationsEnabled: Boolean = true, tablePath: Option[String] = None, isCDCRead: Boolean = false) extends ParquetFileFormat with Product with Serializable
A thin wrapper over the Parquet file format to support:
- column names without restrictions.
- populating a column from the deletion vector of this file (if one exists) to indicate whether the row is deleted or not according to the deletion vector. Consumers of this scan can use the column values to filter out the deleted rows.
- class DeltaParquetWriteSupport extends ParquetWriteSupport
- class DeltaParseException extends ParseException with DeltaThrowable
- trait DeltaReadOptions extends DeltaOptionParser
- class DeltaRuntimeException extends RuntimeException with DeltaThrowable
- class DeltaSparkException extends SparkException with DeltaThrowable
- sealed trait DeltaStartingVersion extends AnyRef
Definitions for the starting version of a Delta stream.
- class DeltaStreamingColumnMappingSchemaIncompatibleException extends DeltaUnsupportedOperationException
Errors thrown when an operation is not supported with column mapping schema changes (rename / drop column).
To stay compatible with existing behavior for those who have accidentally already used this operation, users should always be able to use escapeConfigName to fall back at their own risk.
- class DeltaTableFeatureException extends DeltaRuntimeException
- case class DeltaTableIdentifier(path: Option[String] = None, table: Option[TableIdentifier] = None) extends Product with Serializable
An identifier for a Delta table containing either the path or the table identifier.
- class DeltaTablePropertyValidationFailedException extends RuntimeException with DeltaThrowable
- sealed trait DeltaTablePropertyValidationFailedSubClass extends AnyRef
- trait DeltaTableValueFunction extends LogicalPlan with UnresolvedLeafNode
Represents an unresolved Delta Table Value Function
- trait DeltaThrowable extends SparkThrowable
The trait for all exceptions of Delta code path.
- case class DeltaTimeTravelSpec(timestamp: Option[Expression], version: Option[Long], creationSource: Option[String]) extends DeltaLogging with Product with Serializable
The specification to time travel a Delta Table to the given timestamp or version.
- timestamp
An expression that can be evaluated into a timestamp. The expression cannot be a subquery.
- version
The version of the table to time travel to. Must be >= 0.
- creationSource
The API used to perform time travel, e.g.
atSyntax, dfReader or SQL.
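The dfReader and SQL creation sources correspond to the usual public entry points; a minimal sketch (path, table name, and values are illustrative):

```scala
// DataFrameReader ("dfReader") time travel by version:
val asOfVersion = spark.read.format("delta")
  .option("versionAsOf", 3)
  .load("/tmp/delta/events")

// ...or by timestamp:
val asOfTime = spark.read.format("delta")
  .option("timestampAsOf", "2024-01-01 00:00:00")
  .load("/tmp/delta/events")

// SQL variant:
val viaSql = spark.sql("SELECT * FROM events VERSION AS OF 3")
```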
- class DeltaUnsupportedOperationException extends UnsupportedOperationException with DeltaThrowable
- case class DeltaUnsupportedOperationsCheck(spark: SparkSession) extends (LogicalPlan) => Unit with DeltaLogging with Product with Serializable
A rule to add helpful error messages when Delta is being used with unsupported Hive operations, or if an unsupported operation is being made, e.g. a DML operation like INSERT/UPDATE/DELETE/MERGE when a table doesn't exist.
- case class DeltaUnsupportedTableFeatureException(errorClass: String, tableNameOrPath: String, unsupported: Iterable[String]) extends DeltaTableFeatureException with Product with Serializable
- trait DeltaWriteOptions extends DeltaWriteOptionsImpl with DeltaOptionParser
- trait DeltaWriteOptionsImpl extends DeltaOptionParser
- trait DocsPath extends AnyRef
- sealed trait FeatureAutomaticallyEnabledByMetadata extends AnyRef
A trait indicating this feature can be automatically enabled via a change in a table's metadata, e.g., through setting particular values of certain feature-specific table properties.
When the feature's metadata requirements are satisfied for new tables, or for existing tables when [[automaticallyUpdateProtocolOfExistingTables]] is set to `true`, the client will silently add the feature to the protocol's readerFeatures and/or writerFeatures. Otherwise, a proper protocol version bump must be present in the same transaction.
- case class HourPartitionExpr(hourPart: String) extends OptimizablePartitionExpression with Product with Serializable
This is a placeholder to catch hour(col) so that we can merge YearPartitionExpr, MonthPartitionExpr, DayPartitionExpr and HourPartitionExpr into YearMonthDayHourPartitionExpr.
- case class IcebergCompat(version: Integer, config: DeltaConfig[Option[Boolean]], requiredTableFeatures: Seq[TableFeature], requiredTableProperties: Seq[RequiredDeltaTableProperty[_]], checks: Seq[IcebergCompatCheck]) extends DeltaLogging with Product with Serializable
All IcebergCompatVx should extend from this base class.
- version
the compat version number
- config
the DeltaConfig for this IcebergCompat version
- requiredTableFeatures
a list of table features it relies on
- requiredTableProperties
a list of table properties it relies on. See RequiredDeltaTableProperty
- checks
a list of checks this IcebergCompatVx will perform.
- trait IcebergCompatCheck extends (IcebergCompatContext) => Unit
- case class IcebergCompatContext(prevSnapshot: Snapshot, newestProtocol: Protocol, newestMetadata: Metadata, isCreatingOrReorgTable: Boolean, actions: Seq[Action], tableId: String, version: Integer) extends Product with Serializable
- case class IdentityPartitionExpr(partitionColumn: String) extends OptimizablePartitionExpression with Product with Serializable
The rules for the generation of identity expressions, used for partitioning on a nested column. Note: writing an empty string to a partition column would become null (SPARK-24438), so generated partition filters always pick up the null partition for safety.
- partitionColumn
the partition column name used in the generation expression.
- case class InCommitTimestampsPreDowngradeCommand(table: DeltaTableV2) extends PreDowngradeTableFeatureCommand with DeltaLogging with Product with Serializable
- class InitialSnapshot extends Snapshot
An initial snapshot with only metadata specified. Useful for creating a DataFrame from an existing parquet table during its conversion to delta.
- case class InvalidProtocolVersionException(tableNameOrPath: String, readerRequiredVersion: Int, writerRequiredVersion: Int, supportedReaderVersions: Seq[Int], supportedWriterVersions: Seq[Int]) extends RuntimeException with DeltaThrowable with Product with Serializable
Thrown when the protocol version of a table is greater than supported by this client.
- sealed trait IsolationLevel extends AnyRef
Trait that defines the level of consistency guarantee that is going to be provided by OptimisticTransaction.commit(). Serializable is the most strict level and SnapshotIsolation is the least strict one.
- See also
IsolationLevel.allLevelsInDescOrder for all the levels in the descending order of strictness and IsolationLevel.DEFAULT for the default table isolation level.
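The default write isolation level of a table can be changed via the delta.isolationLevel table property; a minimal sketch (the table name is illustrative):

```scala
// Require fully serializable commits for this table.
spark.sql("""
  ALTER TABLE events
  SET TBLPROPERTIES ('delta.isolationLevel' = 'Serializable')
""")
```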
- trait JsonMetadataDomain[T] extends AnyRef
A trait for capturing metadata domain of type T.
- abstract class JsonMetadataDomainUtils[T] extends AnyRef
- case class LastCheckpointInfo(version: Long, size: Long, parts: Option[Int], sizeInBytes: Option[Long], numOfAddFiles: Option[Long], checkpointSchema: Option[StructType], v2Checkpoint: Option[LastCheckpointV2] = None, checksum: Option[String] = None) extends Product with Serializable
Records information about a checkpoint.
This class provides the checksum validation logic needed to ensure that the content of the LAST_CHECKPOINT file points to valid json. Readers might read some part from an old file and some part from a new file (if the file is read across multiple requests). In some rare scenarios the split read might produce valid json, and readers will be able to parse it into a LastCheckpointInfo object that contains invalid data. In order to prevent using it, we do a checksum match on the read json to validate that it is consistent.
For old Delta versions, which do not have the checksum logic, we want to make sure that the old fields (i.e. version, size, parts) are together at the beginning of the last_checkpoint json. All these fields together are less than 50 bytes, so even in a split read scenario, old Delta readers that do not have checksum validation logic get all 3 fields from one read request. For this reason, we use JsonPropertyOrder to force them together at the beginning.
- version
the version of this checkpoint
- size
the number of actions in the checkpoint, -1 if the information is unavailable.
- parts
the number of parts when the checkpoint has multiple parts. None if this is a singular checkpoint
- sizeInBytes
the number of bytes of the checkpoint
- numOfAddFiles
the number of AddFile actions in the checkpoint
- checkpointSchema
the schema of the underlying checkpoint files
- checksum
the checksum of the LastCheckpointInfo.
- Annotations
- @JsonPropertyOrder()
- case class LastCheckpointV2(path: String, sizeInBytes: Long, modificationTime: Long, nonFileActions: Option[Seq[SingleAction]], sidecarFiles: Option[Seq[SidecarFile]]) extends Product with Serializable
Information about the V2 Checkpoint in the LAST_CHECKPOINT file.
- path
file name corresponding to the uuid-named v2 checkpoint
- sizeInBytes
size in bytes for the uuid-named v2 checkpoint
- modificationTime
modification time for the uuid-named v2 checkpoint
- nonFileActions
all non file actions for the v2 checkpoint. This info may or may not be available. A None value means that info is missing. If it is not None, then it should have all the non-FileAction corresponding to the checkpoint.
- sidecarFiles
sidecar files corresponding to the v2 checkpoint. This info may or may not be available. A None value means that this info is missing. An empty list denotes that the v2 checkpoint has no sidecars.
- abstract class LazyCompleteCheckpointProvider extends CheckpointProvider
A wrapper implementation of CheckpointProvider which wraps underlyingCheckpointProviderFuture and uninitializedCheckpointProvider for implementing all the UninitializedCheckpointProvider and CheckpointProvider APIs.
- sealed trait LegacyFeatureType extends AnyRef
A trait to indicate a feature is legacy, i.e., released before Table Features.
- sealed abstract class LegacyReaderWriterFeature extends LegacyWriterFeature with ReaderWriterFeatureType
A base class for all legacy reader-writer table features.
- sealed abstract class LegacyWriterFeature extends TableFeature with LegacyFeatureType
A base class for all legacy writer-only table features.
- case class LogSegment(logPath: Path, version: Long, deltas: Seq[FileStatus], checkpointProvider: UninitializedCheckpointProvider, lastCommitFileModificationTimestamp: Long) extends Product with Serializable
Provides information around which files in the transaction log need to be read to create the given version of the log.
- logPath
The path to the _delta_log directory
- version
The Snapshot version to generate
- deltas
The delta commit files (.json) to read
- checkpointProvider
provider to give information about Checkpoint files.
- lastCommitFileModificationTimestamp
The "unadjusted" file modification timestamp of the last commit within this segment. By unadjusted, we mean that the commit timestamps may not necessarily be monotonically increasing for the commits within this segment.
- abstract class MaterializedRowTrackingColumn extends AnyRef
Represents a materialized row tracking column. Concrete implementations are MaterializedRowId and MaterializedRowCommitVersion.
- class MetadataChangedException extends io.delta.exceptions.DeltaConcurrentModificationException
This class is kept for backward compatibility. Use io.delta.exceptions.MetadataChangedException instead.
- trait MetadataCleanup extends DeltaLogging
Cleans up expired Delta table metadata.
- class MetadataMismatchErrorBuilder extends AnyRef
A helper class in building a helpful error message in case of metadata mismatches.
- case class MonthPartitionExpr(monthPart: String) extends OptimizablePartitionExpression with Product with Serializable
This is a placeholder to catch month(col) so that we can merge YearPartitionExpr and MonthPartitionExpr into YearMonthDayPartitionExpr.
- monthPart
the month partition column name.
- class OptimisticTransaction extends OptimisticTransactionImpl with DeltaLogging
Used to perform a set of reads in a transaction and then commit a set of updates to the state of the log. All reads from the DeltaLog MUST go through this instance rather than directly to the DeltaLog, otherwise they will not be checked for logical conflicts with concurrent updates.
This class is not thread-safe.
- trait OptimisticTransactionImpl extends TransactionalWrite with SQLMetricsReporting with DeltaScanGenerator with DeltaLogging
Used to perform a set of reads in a transaction and then commit a set of updates to the state of the log. All reads from the DeltaLog MUST go through this instance rather than directly to the DeltaLog, otherwise they will not be checked for logical conflicts with concurrent updates.
This trait is not thread-safe.
- sealed trait OptimizablePartitionExpression extends AnyRef
Defines rules to convert a data filter to a partition filter for a special generation expression of a partition column.
Note:
- This may be shared across multiple SparkSessions, so implementations should not store any state (such as expressions) referring to a specific SparkSession.
- Partition columns may have different behaviors than data columns. For example, writing an empty string to a partition column would become null (SPARK-24438). We need to pay attention to these slight behavior differences and make sure applying the auto-generated partition filters would still return the same result as if they were not applied.
- case class PostHocResolveUpCast(spark: SparkSession) extends Rule[LogicalPlan] with Product with Serializable
Post-hoc resolution rules PreprocessTableMerge and PreprocessTableUpdate may introduce new unresolved UpCast expressions that won't be resolved by ResolveUpCast, which ran in the previous resolution phase. This rule ensures these UpCast expressions get resolved in the Post-hoc resolution phase.
Note: we can't inject ResolveUpCast directly because we need an initialized analyzer instance for that which is not available at the time Delta rules are injected. PostHocResolveUpCast is delaying the access to the analyzer until after it's initialized.
- sealed abstract class PreDowngradeTableFeatureCommand extends AnyRef
A base class for implementing a preparation command for removing table features. Must implement a run method. Note, the run method must be implemented in a way that when it finishes, the table does not use the feature that is being removed, and nobody is allowed to start using it again implicitly. One way to achieve this is by disabling the feature on the table before proceeding to the actual removal. See RemovableFeature.preDowngradeCommand.
- case class PreloadedCheckpointProvider(topLevelFiles: Seq[FileStatus], lastCheckpointInfoOpt: Option[LastCheckpointInfo]) extends CheckpointProvider with DeltaLogging with Product with Serializable
An implementation of CheckpointProvider where the information about checkpoint files (i.e. Seq[FileStatus]) is already known in advance.
- topLevelFiles
- file statuses that describes the checkpoint
- lastCheckpointInfoOpt
- optional LastCheckpointInfo corresponding to this checkpoint. This comes from _last_checkpoint file
- case class PreprocessTableDelete(sqlConf: SQLConf) extends Rule[LogicalPlan] with Product with Serializable
Preprocess the DeltaDelete plan to convert to DeleteCommand.
- case class PreprocessTableMerge(conf: SQLConf) extends Rule[LogicalPlan] with UpdateExpressionsSupport with Product with Serializable
- case class PreprocessTableUpdate(sqlConf: SQLConf) extends Rule[LogicalPlan] with UpdateExpressionsSupport with Product with Serializable
Preprocesses the DeltaUpdateTable logical plan before converting it to UpdateCommand:
- Adjusts the column order, which could be out of order, based on the destination table.
- Generates expressions to compute the value of all target columns in the Delta table, while taking into account that the specified SET clause may only update some columns or nested fields of columns.
- trait PreprocessTableWithDVs extends SubqueryTransformerHelper
Plan transformer to inject a filter that removes the rows marked as deleted according to deletion vectors.
Plan transformer to inject a filter that removes the rows marked as deleted according to deletion vectors. For tables with no deletion vectors, this transformation has no effect.
It modifies the plan for tables with deletion vectors as follows:
Before rule: <Parent Node> -> Delta Scan (key, value)
- Here we are reading the key, value columns from the Delta table.
After rule: <Parent Node> -> Project(key, value) -> Filter (skip_row == 0) -> Delta Scan (key, value, skip_row)
- Here we insert a new column skip_row in the Delta scan. This value is populated by the Parquet reader using the DV corresponding to the Parquet file read (see DeltaParquetFileFormat), and it contains 0 if we want to keep the row.
- The Filter created keeps only rows with skip_row equal to 0, filtering out the deleted rows.
- And at the end we have a Project to keep the plan node output the same as before the rule is applied.
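The effect of the injected Filter and Project can be sketched as a toy model in plain Scala (not the actual Spark rule; ScanRow and the column names are illustrative):

```scala
object DvFilterSketch {
  // skip_row is 0 for live rows and 1 for rows marked deleted by a
  // deletion vector (types and names are illustrative, not Delta internals).
  final case class ScanRow(key: String, value: Int, skipRow: Byte)

  // Filter (skip_row == 0) followed by Project(key, value), mirroring the
  // shape of the rewritten plan described above.
  def filterAndProject(scanned: Seq[ScanRow]): Seq[(String, Int)] =
    scanned.filter(_.skipRow == 0).map(r => (r.key, r.value))

  def main(args: Array[String]): Unit = {
    val scanned = Seq(ScanRow("a", 1, 0), ScanRow("b", 2, 1), ScanRow("c", 3, 0))
    println(filterAndProject(scanned)) // List((a,1), (c,3))
  }
}
```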
- case class PreprocessTableWithDVsStrategy(session: SparkSession) extends Strategy with PreprocessTableWithDVs with Product with Serializable
Strategy to process tables with DVs and add the skip row column and filters.
Strategy to process tables with DVs and add the skip row column and filters.
This strategy will apply all transformations needed to tables with DVs and delegate to FileSourceStrategy to create the final plan. The DV filter will be the bottom-most filter in the plan and so it will be pushed down to the FileSourceScanExec at the beginning of the filter list.
- case class PreprocessTimeTravel(sparkSession: SparkSession) extends Rule[LogicalPlan] with Product with Serializable
Resolves the UnresolvedRelation in a command's child TimeTravel.
Resolves the UnresolvedRelation in a command's child TimeTravel. Currently Delta depends on Spark 3.2, which does not resolve the UnresolvedRelation in TimeTravel. Once Delta upgrades to Spark 3.3, this code can be removed.
TODO: refactor this analysis using Spark's native TimeTravelRelation logical plan
- class ProtocolChangedException extends io.delta.exceptions.DeltaConcurrentModificationException
This class is kept for backward compatibility.
This class is kept for backward compatibility. Use io.delta.exceptions.ProtocolChangedException instead.
- class ProtocolDowngradeException extends RuntimeException with DeltaThrowable
- trait ProvidesUniFormConverters extends AnyRef
- trait ReadChecksum extends DeltaLogging
Read checksum files.
- sealed abstract class ReaderWriterFeature extends WriterFeature with ReaderWriterFeatureType
A base class for all reader-writer table features that can only be explicitly supported.
- sealed trait ReaderWriterFeatureType extends AnyRef
A trait to indicate a feature applies to readers and writers.
- trait RecordChecksum extends DeltaLogging
Record the state of the table as a checksum file along with a commit.
- sealed trait RemovableFeature extends AnyRef
A trait indicating a feature can be removed.
A trait indicating a feature can be removed. Classes that extend the trait need to implement the following three functions:
a) preDowngradeCommand. This is where all required actions for removing the feature are implemented. For example, to remove the DVs feature we need to remove the metadata config and purge all DVs from the table. This action takes place before the protocol downgrade, in separate commit(s). Note, the command needs to be implemented in a way that concurrent transactions cannot nullify its effect. For example, disabling DVs on a table before purging will stop concurrent transactions from adding DVs. During the protocol downgrade we perform a validation in validateRemoval to make sure all invariants still hold.
b) validateRemoval. Add any feature-specific checks before proceeding to the protocol downgrade. This function is guaranteed to be called at the latest version before the protocol downgrade is committed to the table. When the protocol downgrade txn conflicts, the validation is repeated against the winning txn snapshot. As soon as the protocol downgrade succeeds, all subsequent interleaved txns are aborted.
c) actionUsesFeature. For reader+writer features we check whether past versions contain any traces of the removed feature. This is achieved by calling actionUsesFeature for every action of every reachable commit version in the log. Note, a feature may leave traces in both data and metadata. Depending on the feature, we need to check several types of actions such as Metadata, AddFile, RemoveFile etc. Writer features should directly return false.
WARNING: actionUsesFeature should not check Protocol actions for the feature being removed, because at the time actionUsesFeature is invoked the protocol downgrade has not yet happened; thus, the feature-to-remove is still active. As a result, any unrelated operation that produces a protocol action (while we are waiting for the retention period to expire) will "carry" the feature-to-remove. Checking the protocol for that feature would result in an unnecessary failure during the history validation of the next DROP FEATURE call. Note, while the feature-to-remove is supported in the protocol we cannot generate a legitimate protocol action that adds support for that feature, since it is already supported.
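The three hooks described above can be sketched as a minimal trait (the method names come from the text; the parameter types are simplified placeholders, not the real Delta signatures):

```scala
object RemovableFeatureSketch {
  // Simplified stand-ins for Delta's internal types (illustrative only).
  type DeltaTable = AnyRef
  type Snapshot = AnyRef
  type Action = AnyRef

  trait RemovableFeatureLike {
    // a) Remove all traces of the feature in separate commit(s) before the
    //    protocol downgrade, in a way concurrent txns cannot nullify.
    def preDowngradeCommand(table: DeltaTable): Unit
    // b) Feature-specific checks, re-validated against the winning txn's
    //    snapshot when the downgrade commit conflicts.
    def validateRemoval(snapshot: Snapshot): Boolean
    // c) History-scan hook, called per action per reachable commit version;
    //    writer-only features should simply return false.
    def actionUsesFeature(action: Action): Boolean
  }
}
```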
- case class RequiredDeltaTableProperty[T](deltaConfig: DeltaConfig[T], validator: (T) => Boolean, autoSetValue: String) extends Product with Serializable
Wrapper class for table property validation
Wrapper class for table property validation
- deltaConfig
DeltaConfig we are checking
- validator
A generic method to validate the given value
- autoSetValue
The value to set if we can auto-set this value (e.g. during table creation)
- case class ResolveDeltaPathTable(sparkSession: SparkSession) extends Rule[LogicalPlan] with Product with Serializable
Replaces UnresolvedTables if the plan is for a direct query on files.
- case class ResolvedPathBasedNonDeltaTable(path: String, options: Map[String, String], commandName: String) extends LogicalPlan with LeafNode with Product with Serializable
This operator is a placeholder that identifies a non-Delta path-based table.
This operator is a placeholder that identifies a non-Delta path-based table. Given that some Delta commands (e.g. DescribeDeltaDetail) support non-Delta tables, we introduced ResolvedPathBasedNonDeltaTable as the resolved placeholder after analysis of a non-Delta path from UnresolvedPathBasedTable.
- trait RowIndexFilter extends AnyRef
Provides filtering information for each row index within a given range.
Provides filtering information for each row index within a given range. Specific filters are implemented in subclasses.
- sealed final class RowIndexFilterType extends Enum[RowIndexFilterType]
Filter types corresponding to each row index filter implementation.
- case class SerializableFileStatus(path: String, length: Long, isDir: Boolean, modificationTime: Long) extends Product with Serializable
A serializable variant of HDFS's FileStatus.
- class Snapshot extends SnapshotDescriptor with SnapshotStateManager with StateCache with StatisticsCollection with DataSkippingReader with DeltaLogging
An immutable snapshot of the state of the log at some delta version.
An immutable snapshot of the state of the log at some delta version. Internally this class manages the replay of actions stored in checkpoint or delta files.
After resolving any new actions, it caches the result and collects the following basic information to the driver:
- Protocol Version
- Metadata
- Transaction state
- trait SnapshotDescriptor extends AnyRef
A description of a Delta Snapshot, including basic information such as its DeltaLog metadata, protocol, and version.
- trait SnapshotManagement extends AnyRef
Manages the creation, computation, and access of Snapshots for Delta tables.
Manages the creation, computation, and access of Snapshots for Delta tables. Responsibilities include:
- Figuring out the set of files that are required to compute a specific version of a table
- Updating and exposing the latest snapshot of the Delta table in a thread-safe manner
- case class SnapshotState(sizeInBytes: Long, numOfSetTransactions: Long, numOfFiles: Long, numOfRemoves: Long, numOfMetadata: Long, numOfProtocol: Long, setTransactions: Seq[SetTransaction], domainMetadata: Seq[DomainMetadata], metadata: Metadata, protocol: Protocol, fileSizeHistogram: Option[FileSizeHistogram] = None) extends Product with Serializable
Metrics and metadata computed around the Delta table.
Metrics and metadata computed around the Delta table.
- sizeInBytes
The total size of the table (of active files, not including tombstones).
- numOfSetTransactions
Number of streams writing to this table.
- numOfFiles
The number of files in this table.
- numOfRemoves
The number of tombstones in the state.
- numOfMetadata
The number of metadata actions in the state. Should be 1.
- numOfProtocol
The number of protocol actions in the state. Should be 1.
- setTransactions
The streaming queries writing to this table.
- metadata
The metadata of the table.
- protocol
The protocol version of the Delta table.
- fileSizeHistogram
A Histogram class tracking the file counts and total bytes in different size ranges.
- trait SnapshotStateManager extends DeltaLogging
A helper class that manages the SnapshotState for a given snapshot.
A helper class that manages the SnapshotState for a given snapshot. Will generate it only when necessary.
- case class StartingVersion(version: Long) extends DeltaStartingVersion with Product with Serializable
- trait SubqueryTransformerHelper extends AnyRef
Trait to allow processing a special transformation of SubqueryExpression instances in a query plan.
- case class SubstringPartitionExpr(partitionColumn: String, substringPos: Int, substringLen: Int) extends OptimizablePartitionExpression with Product with Serializable
The rules for the generation expression SUBSTRING(col, pos, len).
The rules for the generation expression SUBSTRING(col, pos, len). Note:
- Writing an empty string to a partition column would become null (SPARK-24438), so generated partition filters always pick up the null partition for safety.
- When pos is 0, we also support optimizations for comparison operators. When pos is not 0, we only support optimizations for EqualTo.
- partitionColumn
the partition column name using SUBSTRING in its generation expression.
- substringPos
the pos parameter of SUBSTRING in the generation expression.
- substringLen
the len parameter of SUBSTRING in the generation expression.
- case class TableChanges(child: LogicalPlan, fnName: String, cdcAttr: Seq[Attribute] = CDCReader.cdcAttributes) extends LogicalPlan with UnaryNode with Product with Serializable
- sealed abstract class TableFeature extends Serializable
A base class for all table features.
A base class for all table features.
A feature can be explicitly supported by a table's protocol when the protocol contains the feature's name. Writers (for writer-only features) or readers and writers (for reader-writer features) must recognize supported features and must handle them appropriately.
A table feature that was released before Delta Table Features (reader version 3 and writer version 7) is considered a legacy feature. Legacy features are implicitly supported when (a) the protocol does not support table features, i.e., has reader version less than 3 or writer version less than 7, and (b) the feature's minimum reader/writer version is less than or equal to the current protocol's reader/writer version.
Separately, a feature can be automatically supported by a table's metadata when certain feature-specific table properties are set. For example, changeDataFeed is automatically supported when there's a table property delta.enableChangeDataFeed=true. This is independent of the table's enabled features. When a feature is supported (explicitly or implicitly) by the table protocol but its metadata requirements are not satisfied, clients still have to understand the feature (at least to the extent that they can read and preserve the existing data in the table that uses the feature). See the documentation of FeatureAutomaticallyEnabledByMetadata for more information.
- case class TestLegacyReaderWriterFeaturePreDowngradeCommand(table: DeltaTableV2) extends PreDowngradeTableFeatureCommand with Product with Serializable
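The legacy-feature rule described under TableFeature above can be sketched as a pure predicate (implicitlySupported and its version parameters are illustrative names, not the real Delta API):

```scala
object LegacyFeatureSketch {
  // A legacy feature is implicitly supported when
  // (a) the protocol predates table features (reader < 3 or writer < 7), and
  // (b) the feature's minimum versions fit within the protocol's versions.
  def implicitlySupported(
      protoReader: Int, protoWriter: Int,
      minReader: Int, minWriter: Int): Boolean = {
    val protocolWithoutTableFeatures = protoReader < 3 || protoWriter < 7
    protocolWithoutTableFeatures &&
      minReader <= protoReader && minWriter <= protoWriter
  }

  def main(args: Array[String]): Unit = {
    println(implicitlySupported(1, 2, 1, 2)) // true: legacy protocol covers it
    println(implicitlySupported(3, 7, 1, 2)) // false: protocol uses table features
  }
}
```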
- case class TestLegacyWriterFeaturePreDowngradeCommand(table: DeltaTableV2) extends PreDowngradeTableFeatureCommand with Product with Serializable
- case class TestReaderWriterFeaturePreDowngradeCommand(table: DeltaTableV2) extends PreDowngradeTableFeatureCommand with DeltaLogging with Product with Serializable
- case class TestWriterFeaturePreDowngradeCommand(table: DeltaTableV2) extends PreDowngradeTableFeatureCommand with DeltaLogging with Product with Serializable
- case class TimestampTruncPartitionExpr(format: String, partitionColumn: String) extends OptimizablePartitionExpression with Product with Serializable
The rules for the generation expression
The rules for the generation expression date_trunc(field, col).
- trait TransactionExecutionObserver extends AnyRef
Track different stages of the execution of a transaction.
Track different stages of the execution of a transaction.
This is mostly meant for test instrumentation.
The default is a no-op implementation.
- case class TruncDatePartitionExpr(partitionColumn: String, format: String) extends OptimizablePartitionExpression with Product with Serializable
The rules for generation expressions that use the function trunc(col, format), such as trunc(timestamp, 'year'), trunc(date, 'week') and trunc(timestampStr, 'hour')
The rules for generation expressions that use the function trunc(col, format), such as trunc(timestamp, 'year'), trunc(date, 'week') and trunc(timestampStr, 'hour')
- partitionColumn
partition column using trunc function in the generation expression
- format
the format that specifies the unit of truncation applied to the partitionColumn
- case class TypeWideningPreDowngradeCommand(table: DeltaTableV2) extends PreDowngradeTableFeatureCommand with DeltaLogging with Product with Serializable
- trait UninitializedCheckpointProvider extends AnyRef
Represents basic information about a checkpoint.
Represents basic information about a checkpoint. This is the info we can always know about a checkpoint without doing any additional I/O.
- case class UninitializedV1OrV2ParquetCheckpointProvider(version: Long, fileStatus: FileStatus, logPath: Path, lastCheckpointInfoOpt: Option[LastCheckpointInfo]) extends UninitializedV2LikeCheckpointProvider with Product with Serializable
An implementation of UninitializedCheckpointProvider to represent a parquet checkpoint which could be either a v1 or a v2 checkpoint.
An implementation of UninitializedCheckpointProvider to represent a parquet checkpoint which could be either a v1 or a v2 checkpoint. This needs to be resolved into a PreloadedCheckpointProvider or a V2CheckpointProvider, depending on whether the CheckpointMetadata action is present in the underlying parquet file.
- case class UninitializedV2CheckpointProvider(version: Long, fileStatus: FileStatus, logPath: Path, hadoopConf: Configuration, deltaLogOptions: Map[String, String], logStore: LogStore, lastCheckpointInfoOpt: Option[LastCheckpointInfo]) extends UninitializedV2LikeCheckpointProvider with Product with Serializable
An implementation of UninitializedCheckpointProvider for v2 checkpoints.
An implementation of UninitializedCheckpointProvider for v2 checkpoints. This needs to be resolved into a V2CheckpointProvider. This class starts an I/O to fetch the V2 actions (CheckpointMetadata, SidecarFile) as soon as the class is initialized, so that the extra overhead can be parallelized with other operations like reading the CRC.
- trait UninitializedV2LikeCheckpointProvider extends UninitializedCheckpointProvider
A trait representing a v2 UninitializedCheckpointProvider
- abstract class UniversalFormatConverter extends AnyRef
Class to facilitate the conversion of Delta into other table formats.
- case class UnresolvedPathBasedDeltaTable(path: String, options: Map[String, String], commandName: String) extends UnresolvedPathBasedDeltaTableBase with Product with Serializable
Resolves to a ResolvedTable if the DeltaTable exists
- sealed abstract class UnresolvedPathBasedDeltaTableBase extends LogicalPlan with UnresolvedLeafNode
- case class UnresolvedPathBasedDeltaTableRelation(path: String, options: CaseInsensitiveStringMap) extends UnresolvedPathBasedDeltaTableBase with Product with Serializable
Resolves to a DataSourceV2Relation if the DeltaTable exists
- case class UnresolvedPathBasedTable(path: String, options: Map[String, String], commandName: String) extends LogicalPlan with LeafNode with Product with Serializable
This operator represents path-based tables in general, including both Delta and non-Delta tables.
This operator represents path-based tables in general, including both Delta and non-Delta tables. It resolves to a ResolvedTable if the path is for a Delta table, or a ResolvedPathBasedNonDeltaTable if the path is for a non-Delta table.
- trait UpdateExpressionsSupport extends SQLConfHelper with AnalysisHelper with DeltaLogging
Trait with helper functions to generate expressions to update target columns, even if they are nested fields.
- case class V2CheckpointPreDowngradeCommand(table: DeltaTableV2) extends PreDowngradeTableFeatureCommand with DeltaLogging with Product with Serializable
- case class V2CheckpointProvider(version: Long, v2CheckpointFile: FileStatus, v2CheckpointFormat: Format, checkpointMetadata: CheckpointMetadata, sidecarFiles: Seq[SidecarFile], lastCheckpointInfoOpt: Option[LastCheckpointInfo], logPath: Path) extends CheckpointProvider with DeltaLogging with Product with Serializable
CheckpointProvider implementation for Json/Parquet V2 checkpoints.
CheckpointProvider implementation for Json/Parquet V2 checkpoints.
- version
checkpoint version for the underlying checkpoint
- v2CheckpointFile
FileStatus for the json/parquet v2 checkpoint file
- v2CheckpointFormat
format (json/parquet) for the v2 checkpoint
- checkpointMetadata
CheckpointMetadata for the v2 checkpoint
- sidecarFiles
seq of SidecarFile for the v2 checkpoint
- lastCheckpointInfoOpt
optional last checkpoint info for the v2 checkpoint
- logPath
delta log path for the underlying delta table
- case class VersionChecksum(txnId: Option[String], tableSizeBytes: Long, numFiles: Long, numMetadata: Long, numProtocol: Long, inCommitTimestampOpt: Option[Long], setTransactions: Option[Seq[SetTransaction]], domainMetadata: Option[Seq[DomainMetadata]], metadata: Metadata, protocol: Protocol, histogramOpt: Option[FileSizeHistogram], allFiles: Option[Seq[AddFile]]) extends Product with Serializable
Stats calculated within a snapshot, which we store along with individual transactions for verification.
Stats calculated within a snapshot, which we store along with individual transactions for verification.
- txnId
Optional transaction identifier
- tableSizeBytes
The size of the table in bytes
- numFiles
Number of AddFile actions in the snapshot
- numMetadata
Number of Metadata actions in the snapshot
- numProtocol
Number of Protocol actions in the snapshot
- histogramOpt
Optional file size histogram
- case class VersionNotFoundException(userVersion: Long, earliest: Long, latest: Long) extends AnalysisException with Product with Serializable
Thrown when time travelling to a version that does not exist in the Delta Log.
Thrown when time travelling to a version that does not exist in the Delta Log.
- userVersion
the version being time travelled to
- earliest
the earliest version available in the Delta Log
- latest
the latest version available in the Delta Log
- sealed abstract class WriterFeature extends TableFeature
A base class for all writer-only table features that can only be explicitly supported.
- case class YearMonthDayHourPartitionExpr(yearPart: String, monthPart: String, dayPart: String, hourPart: String) extends OptimizablePartitionExpression with Product with Serializable
Optimize the case that four partition columns use YEAR, MONTH, DAY and HOUR on the same column, such as YEAR(eventTime), MONTH(eventTime), DAY(eventTime), HOUR(eventTime).
Optimize the case that four partition columns use YEAR, MONTH, DAY and HOUR on the same column, such as YEAR(eventTime), MONTH(eventTime), DAY(eventTime), HOUR(eventTime).
- yearPart
the year partition column name
- monthPart
the month partition column name
- dayPart
the day partition column name
- hourPart
the hour partition column name
- case class YearMonthDayPartitionExpr(yearPart: String, monthPart: String, dayPart: String) extends OptimizablePartitionExpression with Product with Serializable
Optimize the case that three partition columns use YEAR, MONTH and DAY on the same column, such as YEAR(eventTime), MONTH(eventTime) and DAY(eventTime).
Optimize the case that three partition columns use YEAR, MONTH and DAY on the same column, such as YEAR(eventTime), MONTH(eventTime) and DAY(eventTime).
- yearPart
the year partition column name
- monthPart
the month partition column name
- dayPart
the day partition column name
- case class YearMonthPartitionExpr(yearPart: String, monthPart: String) extends OptimizablePartitionExpression with Product with Serializable
Optimize the case that two partition columns use YEAR and MONTH on the same column, such as YEAR(eventTime) and MONTH(eventTime).
Optimize the case that two partition columns use YEAR and MONTH on the same column, such as YEAR(eventTime) and MONTH(eventTime).
- yearPart
the year partition column name
- monthPart
the month partition column name
- case class YearPartitionExpr(yearPart: String) extends OptimizablePartitionExpression with Product with Serializable
The rules for the generation expression YEAR(col).
The rules for the generation expression YEAR(col).
- yearPart
the year partition column name.
Value Members
- object AllowColumnDefaultsTableFeature extends WriterFeature
This table feature represents support for column DEFAULT values for Delta Lake.
This table feature represents support for column DEFAULT values for Delta Lake. With this feature, it is possible to assign default values to columns either at table creation time or later by using commands of the form: ALTER TABLE t ALTER COLUMN c SET DEFAULT v. Thereafter, queries from the table will return the specified default value instead of NULL when the corresponding field is not present in storage.
We create this as a writer-only feature rather than a reader/writer feature in order to simplify the query execution implementation for scanning Delta tables. This means that commands of the following form are not allowed: ALTER TABLE t ADD COLUMN c DEFAULT v. The reason is that when commands of that form execute (such as for other data sources like CSV or JSON), then the data source scan implementation must take responsibility to return the supplied default value for all rows, including those previously present in the table before the command executed. We choose to avoid this complexity for Delta table scans, so we make this a writer-only feature instead. Therefore, the analyzer can take care of the entire job when processing commands that introduce new rows into the table by injecting the column default value (if present) into the corresponding query plan. This comes at the expense of preventing ourselves from easily adding a default value to an existing non-empty table, because all data files would need to be rewritten to include the new column value in an expensive backfill.
- object AppendDelta
- object AppendOnlyTableFeature extends LegacyWriterFeature with FeatureAutomaticallyEnabledByMetadata
- case object BatchCDFSchemaEndVersion extends DeltaBatchCDFSchemaMode with Product with Serializable
The endVersion batch CDF schema mode specifies that the schema at the query range's end version should be used for serving the CDF batch.
The endVersion batch CDF schema mode specifies that the schema at the query range's end version should be used for serving the CDF batch. This is the current default for column mapping enabled tables, so we can read using the exact schema at the versions being queried to reduce schema read compatibility mismatches.
- case object BatchCDFSchemaLatest extends DeltaBatchCDFSchemaMode with Product with Serializable
The latest batch CDF schema mode specifies that the latest schema should be used when serving the CDF batch.
- case object BatchCDFSchemaLegacy extends DeltaBatchCDFSchemaMode with Product with Serializable
The legacy batch CDF schema mode specifies that neither the latest nor the end version's schema is strictly used for serving the CDF batch, e.g. when the user uses time travel with batch CDF and wants to respect the time-travelled schema. This is the current default for non-column mapping tables.
- object ChangeDataFeedTableFeature extends LegacyWriterFeature with FeatureAutomaticallyEnabledByMetadata
- object CheckAddFileHasStats extends IcebergCompatCheck
- object CheckConstraintsTableFeature extends LegacyWriterFeature with FeatureAutomaticallyEnabledByMetadata
- object CheckNoDeletionVector extends IcebergCompatCheck
- object CheckNoListMapNullType extends IcebergCompatCheck
- object CheckNoPartitionEvolution extends IcebergCompatCheck
- object CheckOnlySingleVersionEnabled extends IcebergCompatCheck
Check that ensures no more than one IcebergCompatVx is enabled.
- object CheckTypeInV2AllowList extends IcebergCompatCheck
- object CheckVersionChangeNeedsRewrite extends IcebergCompatCheck
Check whether changing the IcebergCompat version needs a REORG operation.
- object CheckpointInstance extends Serializable
- object CheckpointPolicy
- object CheckpointProvider extends DeltaLogging
- object Checkpoints extends DeltaLogging
- object ClusteringTableFeature extends WriterFeature
Clustering table feature is enabled when a table is created with a CLUSTER BY clause.
- object ColumnMappingTableFeature extends LegacyReaderWriterFeature with FeatureAutomaticallyEnabledByMetadata
- object ColumnWithDefaultExprUtils extends DeltaLogging
Provide utilities to handle columns with default expressions.
- object ConcurrencyHelpers
- object DefaultRowCommitVersion
- object DeletionVectorsTableFeature extends ReaderWriterFeature with FeatureAutomaticallyEnabledByMetadata
- object DeltaBatchCDFSchemaMode
- object DeltaColumnMapping extends DeltaColumnMappingBase
- object DeltaColumnMappingMode
- object DeltaCommitTag
- object DeltaConfigs extends DeltaConfigsBase
- object DeltaErrors extends DeltaErrorsBase
- object DeltaFileProviderUtils
- object DeltaFullTable
Extractor Object for pulling out the full table scan of a Delta table.
- object DeltaHistory extends Serializable
- object DeltaHistoryManager extends DeltaLogging
Contains many utility methods that can also be executed on Spark executors.
- object DeltaLog extends DeltaLogging
- object DeltaLogFileIndex extends Serializable
- object DeltaOperations
Exhaustive list of operations that can be performed on a Delta table.
Exhaustive list of operations that can be performed on a Delta table. These operations are tracked as the first line in delta logs, and power DESCRIBE HISTORY for Delta tables.
- object DeltaOptions extends DeltaLogging with Serializable
- object DeltaParquetFileFormat extends Serializable
- object DeltaRelation extends DeltaLogging
Matchers for dealing with a Delta table.
- object DeltaTable
Extractor Object for pulling out the table scan of a Delta table.
Extractor Object for pulling out the table scan of a Delta table. It could be a full scan or a partial scan.
- object DeltaTableIdentifier extends DeltaLogging with Serializable
Utilities for DeltaTableIdentifier.
Utilities for DeltaTableIdentifier. TODO(burak): Get rid of these utilities. DeltaCatalog should be the skinny-waist for figuring these things out.
- object DeltaTablePropertyValidationFailedSubClass
- object DeltaTableUtils extends PredicateHelper with DeltaLogging
- object DeltaTableValueFunctions
Resolve Delta specific table-value functions.
- object DeltaTableValueFunctionsShims
- object DeltaThrowableHelper
The helper object for the Delta code base to pick an error class template and compile the exception message.
- object DeltaThrowableHelperShims
- object DeltaTimeTravelSpec extends Serializable
- object DeltaTimeTravelSpecShims
- object DeltaUDF
Define a few templates for UDFs used by Delta.
Define a few templates for UDFs used by Delta. Use these templates to create SparkUserDefinedFunction to avoid creating new Encoders. This saves us from touching ScalaReflection, reducing lock contention in concurrent queries.
- object DeltaViewHelper
- object DomainMetadataTableFeature extends WriterFeature
- object DomainMetadataUtils extends DeltaLogging
- object DynamicPartitionOverwriteDelta
- object EmptyCheckpointProvider extends CheckpointProvider
An implementation of CheckpointProvider used to represent the scenario where no checkpoint exists.
An implementation of CheckpointProvider used to represent the scenario where no checkpoint exists. This helps us simplify the code by making LogSegment.checkpointProvider non-optional.
The CheckpointProvider.isEmpty method returns true for EmptyCheckpointProvider, and its version is returned as -1. For a real checkpoint, isEmpty returns false and version is >= 0.
- object ExtractBaseColumn
Finds the full dot-separated path to a field and the data type of the field.
Finds the full dot-separated path to a field and the data type of the field. This unifies handling of nested and non-nested fields, and allows pattern matching on the data type.
- object GenerateRowIDs extends Rule[LogicalPlan]
This rule adds a Project on top of Delta tables that support the Row tracking table feature to provide a default generated Row ID and row commit version for rows that don't have them materialized in the data file.
- object GeneratedColumn extends DeltaLogging with AnalysisHelper
Provide utility methods to implement Generated Columns for Delta.
Provide utility methods to implement Generated Columns for Delta. Users can use the following SQL syntax to create a table with generated columns.
CREATE TABLE table_identifier(
  column_name column_type,
  column_name column_type GENERATED ALWAYS AS ( generation_expr ),
  ...
) USING delta
[ PARTITIONED BY (partition_column_name, ...) ]
This is an example:
CREATE TABLE foo(
  id bigint,
  type string,
  subType string GENERATED ALWAYS AS ( SUBSTRING(type FROM 0 FOR 4) ),
  data string,
  eventTime timestamp,
  day date GENERATED ALWAYS AS ( days(eventTime) )
) USING delta
PARTITIONED BY (type, day)
When writing to a table, for these generated columns:
- If the output is missing a generated column, we will add an expression to generate it.
- If a generated column exists in the output (in other words, the user provides its value), we will add a constraint to ensure the given value doesn't violate the generation expression.
- object GeneratedColumnsTableFeature extends LegacyWriterFeature with FeatureAutomaticallyEnabledByMetadata
- object IcebergCompat extends DeltaLogging with Serializable
Util methods to manage between IcebergCompat versions
- object IcebergCompatV1 extends IcebergCompat
Utils to validate the IcebergCompatV1 table feature, which is responsible for keeping Delta tables in valid states (see the Delta spec for full invariants, dependencies, and requirements) so that they are capable of having Delta to Iceberg metadata conversion applied to them.
Utils to validate the IcebergCompatV1 table feature, which is responsible for keeping Delta tables in valid states (see the Delta spec for full invariants, dependencies, and requirements) so that they are capable of having Delta to Iceberg metadata conversion applied to them. The IcebergCompatV1 table feature does not implement, specify, or control the actual metadata conversion; that is handled by the Delta UniForm feature.
Note that UniForm (Iceberg) depends on IcebergCompatV1, but IcebergCompatV1 does not depend on or require UniForm (Iceberg). It is perfectly valid for a Delta table to have IcebergCompatV1 enabled but UniForm (Iceberg) not enabled.
- object IcebergCompatV1TableFeature extends WriterFeature with FeatureAutomaticallyEnabledByMetadata
- object IcebergCompatV2 extends IcebergCompat
- object IcebergCompatV2TableFeature extends WriterFeature with FeatureAutomaticallyEnabledByMetadata
- case object IdMapping extends DeltaColumnMappingMode with Product with Serializable
Id Mapping uses column ID as the true identifier of a column.
Id Mapping uses column ID as the true identifier of a column. Column IDs are stored as StructField metadata in the schema and will be used when reading and writing Parquet files. The Parquet files in this mode will also have corresponding field Ids for each column in their file schema.
This mode is used for tables converted from Iceberg.
- object IdentityColumn extends DeltaLogging
Provide utility methods related to IDENTITY column support for Delta.
- object IdentityColumnsTableFeature extends LegacyWriterFeature with FeatureAutomaticallyEnabledByMetadata
- object InCommitTimestampTableFeature extends WriterFeature with FeatureAutomaticallyEnabledByMetadata with RemovableFeature
The inCommitTimestamp table feature is a writer feature that makes every writer write a monotonically increasing timestamp inside the commit file.
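The monotonicity invariant can be sketched in a few lines. This is purely illustrative; the actual clock source and commit-file fields are defined by Delta's commit protocol.

```python
# Illustrative sketch: an in-commit timestamp must be monotonically
# increasing across commits, even if the wall clock moves backwards.
def next_commit_timestamp(clock_millis, last_commit_timestamp):
    # Never emit a timestamp at or below the previous commit's.
    return max(clock_millis, last_commit_timestamp + 1)

ts1 = next_commit_timestamp(1000, 0)    # clock ahead of history -> 1000
ts2 = next_commit_timestamp(995, ts1)   # clock went backwards -> 1001
print(ts1, ts2)
```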
- object InCommitTimestampUtils
- object InvariantsTableFeature extends LegacyWriterFeature with FeatureAutomaticallyEnabledByMetadata
- object IsolationLevel
- object LastCheckpointInfo extends Serializable
- object LastCheckpointV2 extends Serializable
- object LogSegment extends Serializable
- object ManagedCommitTableFeature extends ReaderWriterFeature with FeatureAutomaticallyEnabledByMetadata
Table feature to represent tables whose commits are managed by a separate commit owner
- object MaterializedRowCommitVersion extends MaterializedRowTrackingColumn
- object MaterializedRowId extends MaterializedRowTrackingColumn
- case object NameMapping extends DeltaColumnMappingMode with Product with Serializable
Name Mapping uses the physical column name as the true identifier of a column. The physical name is stored as part of StructField metadata in the schema and will be used when reading and writing Parquet files. Even when id mapping can be used to read the physical files, name mapping is used for reading statistics and partition values in the DeltaLog.
- case object NoMapping extends DeltaColumnMappingMode with Product with Serializable
No mapping mode uses a column's display name as its true identifier to read and write data.
This is the default mode and is the same mode as Delta always has been.
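A rough sketch contrasting how the three mapping modes above derive a column's true identifier. The metadata keys used here are illustrative placeholders, not Delta's actual metadata keys.

```python
# Illustrative sketch of column identifier resolution per mapping mode.
def column_identifier(mode, field):
    if mode == "none":   # NoMapping: the display name is the identifier
        return field["name"]
    if mode == "id":     # IdMapping: the column ID stored in field metadata
        return field["metadata"]["columnMapping.id"]
    if mode == "name":   # NameMapping: the stored physical name
        return field["metadata"]["columnMapping.physicalName"]
    raise ValueError(f"unknown mapping mode: {mode}")

field = {"name": "eventTime",
         "metadata": {"columnMapping.id": 5,
                      "columnMapping.physicalName": "col-a7f2"}}
print(column_identifier("none", field))   # eventTime
print(column_identifier("id", field))     # 5
print(column_identifier("name", field))   # col-a7f2
```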
- object NoOpTransactionExecutionObserver extends TransactionExecutionObserver
Default observer does nothing.
- object OptimisticTransaction
- object OptimizablePartitionExpression
- object OverwriteDelta
- object RelationFileIndex
Extractor object for pulling out the file index of a logical relation.
- object RequireColumnMapping extends RequiredDeltaTableProperty[DeltaColumnMappingMode]
- object ResolveDeltaMergeInto
Implements logic to resolve conditions and actions in MERGE clauses and handles schema evolution.
- object ResolveDeltaPathTable extends Serializable
- object RowCommitVersion
- object RowId
Collection of helpers to handle Row IDs.
This file includes the following Row ID features:
- Enabling Row IDs using a table feature and table property.
- Assigning fresh Row IDs.
- Reading back Row IDs.
- Preserving stable Row IDs.
- object RowTracking
Utility functions for Row Tracking that are shared between Row IDs and Row Commit Versions.
- object RowTrackingFeature extends WriterFeature with FeatureAutomaticallyEnabledByMetadata
- object ScanWithDeletionVectors
- case object Serializable extends IsolationLevel with Product with Serializable
This isolation level will ensure serializability between all read and write operations. Specifically, for write operations, this mode will ensure that the result of the table will be perfectly consistent with the visible history of operations, that is, as if all the operations were executed sequentially one by one.
- object SerializableFileStatus extends Serializable
- object Snapshot extends DeltaLogging
- case object SnapshotIsolation extends IsolationLevel with Product with Serializable
This isolation level will ensure that all reads will see a consistent snapshot of the table and any transactional write will successfully commit only if the values updated by the transaction have not been changed externally since the snapshot was read by the transaction.
This provides a lower consistency guarantee than WriteSerializable but a higher availability than that. For example, unlike WriteSerializable, this level allows two concurrent UPDATE operations reading the same data to be committed successfully as long as they don't modify the same data.
Note that for operations that do not modify data in the table, snapshot isolation is the same as serializability. Hence such operations can be safely committed with the snapshot isolation level.
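The concurrent-UPDATE example above can be sketched as a toy commit check. This is hypothetical: Delta's real conflict detection in OptimisticTransaction inspects commit actions, not bare file names.

```python
# Toy snapshot-isolation commit check: a transaction may commit as long as
# nothing it modified was also changed by a commit made after its snapshot.
def can_commit(txn_modified_files, commits_since_snapshot):
    changed = set()
    for commit in commits_since_snapshot:
        changed.update(commit)
    return changed.isdisjoint(txn_modified_files)

# Two concurrent UPDATEs read the same snapshot but rewrite different files:
update_a, update_b = {"part-0001"}, {"part-0002"}
print(can_commit(update_b, [update_a]))       # True: disjoint data, both commit
print(can_commit({"part-0001"}, [update_a]))  # False: same data was modified
```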
- object SnapshotManagement
- case object StartingVersionLatest extends DeltaStartingVersion with Product with Serializable
- object SupportedGenerationExpressions
This object defines the list of expressions that can be used in a generated column.
- object TableFeature extends Serializable
- object TestFeatureWithDependency extends ReaderWriterFeature with FeatureAutomaticallyEnabledByMetadata
- object TestFeatureWithTransitiveDependency extends ReaderWriterFeature
- object TestLegacyReaderWriterFeature extends LegacyReaderWriterFeature
- object TestLegacyWriterFeature extends LegacyWriterFeature
Features below are for testing only, and are being registered to the system only in the testing environment. See TableFeature.allSupportedFeaturesMap for the registration.
- object TestReaderWriterFeature extends ReaderWriterFeature
- object TestReaderWriterMetadataAutoUpdateFeature extends ReaderWriterFeature with FeatureAutomaticallyEnabledByMetadata
- object TestReaderWriterMetadataNoAutoUpdateFeature extends ReaderWriterFeature with FeatureAutomaticallyEnabledByMetadata
- object TestRemovableLegacyReaderWriterFeature extends LegacyReaderWriterFeature with FeatureAutomaticallyEnabledByMetadata with RemovableFeature
- object TestRemovableLegacyWriterFeature extends LegacyWriterFeature with FeatureAutomaticallyEnabledByMetadata with RemovableFeature
- object TestWriterFeature extends WriterFeature
- object TestWriterFeatureWithTransitiveDependency extends WriterFeature
- object TestWriterMetadataNoAutoUpdateFeature extends WriterFeature with FeatureAutomaticallyEnabledByMetadata
- object TimestampNTZTableFeature extends ReaderWriterFeature with FeatureAutomaticallyEnabledByMetadata
- object TransactionExecutionObserver
- object TypeWidening
- object TypeWideningTableFeature extends ReaderWriterFeature with FeatureAutomaticallyEnabledByMetadata with RemovableFeature
- object UniversalFormat extends DeltaLogging
Utils to validate the Universal Format (UniForm) Delta feature (NOT a table feature).
The UniForm Delta feature governs and implements the actual conversion of Delta metadata into other formats.
Currently, UniForm only supports Iceberg. When delta.universalFormat.enabledFormats contains "iceberg", we say that Universal Format (Iceberg) is enabled.

enforceInvariantsAndDependencies ensures that all of UniForm's requirements for the specified format are met (e.g. for "iceberg", that IcebergCompatV1 or V2 is enabled). It doesn't verify that its nested requirements are met (e.g. IcebergCompat's requirements, like Column Mapping). That is the responsibility of format-specific validations such as IcebergCompatV1.enforceInvariantsAndDependencies and IcebergCompatV2.enforceInvariantsAndDependencies.
Note that UniForm (Iceberg) depends on IcebergCompat, but IcebergCompat does not depend on or require UniForm (Iceberg). It is perfectly valid for a Delta table to have IcebergCompatV1 or V2 enabled but UniForm (Iceberg) not enabled.
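The one-way dependency described above can be sketched as a simple invariant check. This is a hedged illustration: the function name and string feature names are placeholders, not Delta's API.

```python
# Illustrative sketch: enabling UniForm (Iceberg) requires an IcebergCompat
# version, but IcebergCompat can be enabled without UniForm.
def check_uniform_invariants(enabled_formats, table_features):
    if "iceberg" in enabled_formats:
        if not ({"icebergCompatV1", "icebergCompatV2"} & table_features):
            raise ValueError("UniForm (Iceberg) requires IcebergCompatV1 or V2")

# Valid: IcebergCompat enabled without UniForm (Iceberg).
check_uniform_invariants(set(), {"icebergCompatV1"})
# Invalid: UniForm (Iceberg) without any IcebergCompat version.
try:
    check_uniform_invariants({"iceberg"}, set())
except ValueError as e:
    print(e)
```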
- object UnresolvedDeltaPathOrIdentifier
A helper object with an apply method to transform a path or table identifier to a LogicalPlan. If the path is set, it will be resolved to an UnresolvedPathBasedDeltaTable whereas if the tableIdentifier is set, the LogicalPlan will be an UnresolvedTable. If neither of the two options or both of them are set, apply will throw an exception.
- object UnresolvedPathOrIdentifier
A helper object with an apply method to transform a path or table identifier to a LogicalPlan. This is required by Delta commands that can also run against non-Delta tables, e.g. the DESC DETAIL and VACUUM commands. If the tableIdentifier is set, the LogicalPlan will be an UnresolvedTable. If the tableIdentifier is not set but the path is set, it will be resolved to an UnresolvedPathBasedTable, since we cannot tell whether the path points to a Delta table or a non-Delta table at this stage. If neither is set, apply throws an exception.
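The dispatch rule just described can be sketched as follows, with plan names as plain strings standing in for the Catalyst nodes.

```python
# Sketch of UnresolvedPathOrIdentifier's resolution rule, as described above.
def resolve(path=None, table_identifier=None):
    if table_identifier is not None:
        return ("UnresolvedTable", table_identifier)
    if path is not None:
        # At this stage we cannot tell whether the path is a Delta table,
        # so it resolves to the generic path-based node.
        return ("UnresolvedPathBasedTable", path)
    raise ValueError("either path or tableIdentifier must be set")

print(resolve(table_identifier="db.tbl"))
print(resolve(path="/data/events"))
```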
- object V2Checkpoint
- object V2CheckpointProvider extends Serializable
- object V2CheckpointTableFeature extends ReaderWriterFeature with RemovableFeature with FeatureAutomaticallyEnabledByMetadata
The V2 Checkpoint table feature is for checkpoints with sidecars and the new format and file-naming scheme.
- object VacuumProtocolCheckTableFeature extends ReaderWriterFeature
A ReaderWriter table feature for VACUUM. If this feature is enabled, a writer must follow one of the following:
1. Non-support for VACUUM: writers can explicitly state that they do not support VACUUM for any table, regardless of whether the Vacuum Protocol Check table feature exists.
2. Implement a writer protocol check: ensure that the VACUUM implementation includes a writer protocol check before any file deletions occur.
Readers don't need to understand or change anything new; they just need to acknowledge that the feature exists.
- case object WriteSerializable extends IsolationLevel with Product with Serializable
This isolation level will ensure snapshot isolation consistency guarantee between write operations only. In other words, if only the write operations are considered, then there exists a serializable sequence between them that would produce the same result as seen in the table. However, if both read and write operations are considered, then there may not exist a serializable sequence that would explain all the observed reads.
This provides a lower consistency guarantee than Serializable but a higher availability than that. For example, unlike Serializable, this level allows an UPDATE operation to be committed even if there was a concurrent INSERT operation that has already added data that should have been read by the UPDATE. It will be as if the UPDATE was executed before the INSERT even if the former was committed after the latter. As a side effect, the visible history of operations may not be consistent with the result expected if these operations were executed sequentially one by one.