package delta
Type Members
-
case class
CheckpointInstance(version: Long, numParts: Option[Int]) extends Ordered[CheckpointInstance] with Product with Serializable
A class to help with comparing checkpoints with each other, where concurrent writers may have written checkpoints with different numbers of parts.
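One way such an ordering could work is sketched below: compare by version first, then break ties by part count. The name `CheckpointInstanceSketch` and the tie-breaking rule (a missing `numParts` counts as one part) are assumptions for illustration and may differ from what the library actually does.

```scala
// Hypothetical stand-in for CheckpointInstance; not the library class.
final case class CheckpointInstanceSketch(version: Long, numParts: Option[Int])
    extends Ordered[CheckpointInstanceSketch] {

  // Order by version first; break ties by part count.
  // Assumption: a single-file checkpoint (None) counts as one part.
  override def compare(that: CheckpointInstanceSketch): Int = {
    val byVersion = java.lang.Long.compare(version, that.version)
    if (byVersion != 0) byVersion
    else Integer.compare(numParts.getOrElse(1), that.numParts.getOrElse(1))
  }
}

object CheckpointOrderingDemo {
  // Two writers checkpointed version 20 with different part counts.
  val candidates: Seq[CheckpointInstanceSketch] = Seq(
    CheckpointInstanceSketch(10L, None),
    CheckpointInstanceSketch(20L, Some(4)),
    CheckpointInstanceSketch(20L, None)
  )

  // Extending Ordered makes an implicit Ordering available, so max works directly.
  val latest: CheckpointInstanceSketch = candidates.max
}
```

Comparing versions with `java.lang.Long.compare` rather than subtracting them avoids overflow when the difference does not fit in an `Int`.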
-
case class
CheckpointMetaData(version: Long, size: Long, parts: Option[Int]) extends Product with Serializable
Records information about a checkpoint.
- version
the version of this checkpoint
- size
the number of actions in the checkpoint
- parts
the number of parts when the checkpoint has multiple parts. None if this is a singular checkpoint
- trait Checkpoints extends DeltaLogging
-
case class
CommitStats(startVersion: Long, commitVersion: Long, readVersion: Long, txnDurationMs: Long, commitDurationMs: Long, numAdd: Int, numRemove: Int, bytesNew: Long, numFilesTotal: Long, sizeInBytesTotal: Long, protocol: Protocol, info: CommitInfo, newMetadata: Option[Metadata], numAbsolutePathsInAdd: Int, numDistinctPartitionsInAdd: Int, isolationLevel: String) extends Product with Serializable
Record metrics about a successful commit.
-
class
ConcurrentAppendException extends DeltaConcurrentModificationException
Thrown when files are added that would have been read by the current transaction.
-
class
ConcurrentDeleteDeleteException extends DeltaConcurrentModificationException
Thrown when the current transaction deletes data that was deleted by a concurrent transaction.
-
class
ConcurrentDeleteReadException extends DeltaConcurrentModificationException
Thrown when the current transaction reads data that was deleted by a concurrent transaction.
-
class
ConcurrentTransactionException extends DeltaConcurrentModificationException
Thrown when concurrent transactions both attempt to update the same idempotent transaction.
-
class
ConcurrentWriteException extends DeltaConcurrentModificationException
Thrown when a concurrent transaction has written data after the current transaction read the table.
-
abstract
class
DeltaConcurrentModificationException extends ConcurrentModificationException
The base class for all Tahoe commit conflict exceptions.
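Because every class on this page that extends DeltaConcurrentModificationException ultimately derives from `java.util.ConcurrentModificationException`, a caller can catch that single JDK type. The sketch below uses only the JDK exception, so it runs without Delta on the classpath. The helper name `retryOnConflict` and the retry policy are assumptions; whether re-running an operation is safe depends on the operation itself.

```scala
import java.util.ConcurrentModificationException
import scala.annotation.tailrec

object ConflictRetrySketch {
  // Hypothetical helper: re-runs `attempt` when it fails with a
  // ConcurrentModificationException, up to `maxAttempts` times in total.
  // The attempt number (starting at 1) is passed to `attempt`.
  def retryOnConflict[T](maxAttempts: Int)(attempt: Int => T): T = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")

    @tailrec
    def loop(n: Int): T = {
      val result: Either[ConcurrentModificationException, T] =
        try Right(attempt(n))
        catch { case e: ConcurrentModificationException => Left(e) }

      result match {
        case Right(value)                => value
        case Left(e) if n >= maxAttempts => throw e // out of attempts: surface the conflict
        case Left(_)                     => loop(n + 1)
      }
    }

    loop(1)
  }
}
```

Other exception types are not caught and propagate to the caller unchanged.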
- case class DeltaConfig[T](key: String, defaultValue: String, fromString: (String) ⇒ T, validationFunction: (T) ⇒ Boolean, helpMessage: String, minimumProtocolVersion: Option[Protocol] = None) extends Product with Serializable
- trait DeltaFileFormat extends AnyRef
-
class
DeltaHistoryManager extends DeltaLogging
This class keeps track of the versions of commits and their timestamps for a Delta table to help with operations like describing the history of a table.
-
class
DeltaLog extends Checkpoints with MetadataCleanup with LogStoreProvider with SnapshotManagement with ReadChecksum
Used to query the current state of the log as well as modify it by adding new atomic collections of actions.
Internally, this class implements an optimistic concurrency control algorithm to handle multiple readers or writers. Any single read is guaranteed to see a consistent snapshot of the table.
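The general idea of optimistic concurrency control can be sketched with an in-memory log: a writer reads the latest version, tries to atomically claim the next one, and retries if another writer got there first. Everything below (`InMemoryLog`, `tryCommit`, `commit`) is hypothetical and is not the DeltaLog API. In particular, this sketch retries blindly, whereas a real implementation would also check whether the winning commit logically conflicts with what the transaction read.

```scala
import scala.collection.concurrent.TrieMap

object OptimisticLogSketch {
  // Hypothetical in-memory log: version -> actions committed at that version.
  final class InMemoryLog {
    private val commits = TrieMap.empty[Long, Seq[String]]

    // Highest committed version, or -1 if nothing has been committed yet.
    def latestVersion: Long = if (commits.isEmpty) -1L else commits.keys.max

    // Atomically claims `version`; returns false if another writer already holds it.
    def tryCommit(version: Long, actions: Seq[String]): Boolean =
      commits.putIfAbsent(version, actions).isEmpty

    def actionsAt(version: Long): Option[Seq[String]] = commits.get(version)
  }

  // Read the latest version, attempt to write the next one, retry on conflict.
  def commit(log: InMemoryLog, actions: Seq[String]): Long = {
    var committed = -1L
    while (committed < 0) {
      val next = log.latestVersion + 1
      if (log.tryCommit(next, actions)) committed = next
    }
    committed
  }
}
```

The atomic put-if-absent is what makes each version an all-or-nothing commit: exactly one writer can claim a given version number.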
-
case class
DeltaLogFileIndex extends FileIndex with Logging with Product with Serializable
A specialized file index for files found in the _delta_log directory. By using this file index, we avoid any additional file listing, partitioning inference, and file existence checks when computing the state of a Delta table.
- trait DeltaOptionParser extends AnyRef
-
class
DeltaOptions extends DeltaWriteOptions with DeltaReadOptions with Serializable
Options for the Delta data source.
- trait DeltaReadOptions extends DeltaOptionParser
-
case class
DeltaTableIdentifier(path: Option[String] = None, table: Option[TableIdentifier] = None) extends Product with Serializable
An identifier for a Delta table containing one of the path or the table identifier.
-
case class
DeltaTimeTravelSpec(timestamp: Option[Expression], version: Option[Long], creationSource: Option[String]) extends DeltaLogging with Product with Serializable
The specification to time travel a Delta Table to the given timestamp or version.
- timestamp
An expression that can be evaluated into a timestamp. The expression cannot be a subquery.
- version
The version of the table to time travel to. Must be >= 0.
- creationSource
The API used to perform time travel, e.g. atSyntax, dfReader or SQL
- trait DeltaWriteOptions extends DeltaWriteOptionsImpl with DeltaOptionParser
- trait DeltaWriteOptionsImpl extends DeltaOptionParser
- trait DocsPath extends AnyRef
-
class
InitialSnapshot extends Snapshot
An initial snapshot with only metadata specified. Useful for creating a DataFrame from an existing parquet table during its conversion to delta.
-
sealed
trait
IsolationLevel extends AnyRef
Trait that defines the level of consistency guarantee that is going to be provided by OptimisticTransaction.commit(). Serializable is the most strict level and SnapshotIsolation is the least strict one.
- See also
IsolationLevel.allLevelsInDescOrder for all the levels in descending order of strictness and IsolationLevel.DEFAULT for the default table isolation level.
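The strictness ordering described on this page can be modelled as a sealed hierarchy plus a list in descending order. The sketch below is a hypothetical mirror, not the library's definition. The position of WriteSerializable between the other two levels follows the descriptions of the three level objects on this page, and the helper `atLeastAsStrict` is an assumption added for illustration.

```scala
object IsolationLevelSketch {
  // Hypothetical mirror of the sealed hierarchy listed on this page.
  sealed trait Level
  case object Serializable extends Level
  case object WriteSerializable extends Level
  case object SnapshotIsolation extends Level

  // Strictest first, following the descriptions on this page.
  val allLevelsInDescOrder: Seq[Level] =
    Seq(Serializable, WriteSerializable, SnapshotIsolation)

  // True when `a` provides at least the guarantee of `b`.
  def atLeastAsStrict(a: Level, b: Level): Boolean =
    allLevelsInDescOrder.indexOf(a) <= allLevelsInDescOrder.indexOf(b)
}
```

Sealing the trait lets the compiler check that pattern matches over the levels are exhaustive.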
-
case class
LogSegment(version: Long, deltas: Seq[FileStatus], checkpoint: Seq[FileStatus], checkpointVersion: Option[Long], lastCommitTimestamp: Long) extends Product with Serializable
Provides information around which files in the transaction log need to be read to create the given version of the log.
- version
The Snapshot version to generate
- deltas
The delta files to read
- checkpoint
The checkpoint file to read
- checkpointVersion
The checkpoint version used to start replay
- lastCommitTimestamp
The "unadjusted" timestamp of the last commit within this segment. By unadjusted, we mean that the commit timestamps may not necessarily be monotonically increasing for the commits within this segment.
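To make the relationship between these fields concrete, the sketch below chooses which files are needed for a target version: the newest checkpoint at or below the target, plus every delta after that checkpoint up to the target. Plain version numbers stand in for `FileStatus` entries. The names `Segment` and `segmentFor`, and the rule that a gap in the deltas yields no segment, are assumptions for illustration.

```scala
object LogSegmentSketch {
  // Hypothetical, simplified segment: versions stand in for FileStatus entries.
  final case class Segment(version: Long, deltas: Seq[Long], checkpointVersion: Option[Long])

  // Picks the newest checkpoint at or below `target`, then the deltas after it.
  def segmentFor(target: Long, checkpoints: Seq[Long], deltas: Seq[Long]): Option[Segment] = {
    val start: Option[Long] = checkpoints.filter(_ <= target).sorted.lastOption
    // Replay starts right after the checkpoint, or at version 0 without one.
    val first: Long = start.map(_ + 1).getOrElse(0L)
    val needed: List[Long] = deltas.filter(v => v >= first && v <= target).sorted.toList
    // Assumption: every version in [first, target] must be present, with no gaps.
    val complete = needed == (first to target).toList
    if (complete) Some(Segment(target, needed, start)) else None
  }
}
```

For example, with checkpoints at versions 5, 10 and 20 and a target of 12, only the checkpoint at 10 and the deltas for 11 and 12 are needed.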
-
class
MetadataChangedException extends DeltaConcurrentModificationException
Thrown when the metadata of the Delta table has changed between the time of read and the time of commit.
-
trait
MetadataCleanup extends DeltaLogging
Cleans up expired Delta table metadata.
-
class
MetadataMismatchErrorBuilder extends AnyRef
A helper class in building a helpful error message in case of metadata mismatches.
-
class
OptimisticTransaction extends OptimisticTransactionImpl with DeltaLogging
Used to perform a set of reads in a transaction and then commit a set of updates to the state of the log. All reads from the DeltaLog MUST go through this instance rather than directly to the DeltaLog; otherwise they will not be checked for logical conflicts with concurrent updates.
This class is not thread-safe.
-
trait
OptimisticTransactionImpl extends TransactionalWrite with SQLMetricsReporting with DeltaLogging
Used to perform a set of reads in a transaction and then commit a set of updates to the state of the log. All reads from the DeltaLog MUST go through this instance rather than directly to the DeltaLog; otherwise they will not be checked for logical conflicts with concurrent updates.
This trait is not thread-safe.
- trait PartitionFiltering extends AnyRef
- case class PreprocessTableMerge(conf: SQLConf) extends UpdateExpressionsSupport with Product with Serializable
- case class PreprocessTableUpdate(conf: SQLConf) extends UpdateExpressionsSupport with Product with Serializable
-
class
ProtocolChangedException extends DeltaConcurrentModificationException
Thrown when the protocol version has changed between the time of read and the time of commit.
-
trait
ReadChecksum extends DeltaLogging
Read checksum files.
-
trait
RecordChecksum extends DeltaLogging
Record the state of the table as a checksum file along with a commit.
-
class
Snapshot extends StateCache with PartitionFiltering with DeltaFileFormat with DeltaLogging
An immutable snapshot of the state of the log at some delta version. Internally this class manages the replay of actions stored in checkpoint or delta files.
After resolving any new actions, it caches the result and collects the following basic information to the driver:
- Protocol Version
- Metadata
- Transaction state
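Replaying actions can be pictured as a fold over the log in order, where later actions override earlier ones. The sketch below uses a hypothetical, heavily simplified action model (`AddFile`, `RemoveFile`, `Metadata` with made-up fields) and is not the Snapshot implementation. It only illustrates the idea of reconstructing state from an ordered sequence of actions.

```scala
object ReplaySketch {
  // Hypothetical, simplified actions; the real log has more action types and fields.
  sealed trait Action
  final case class AddFile(path: String, sizeBytes: Long) extends Action
  final case class RemoveFile(path: String) extends Action
  final case class Metadata(schema: String) extends Action

  // The reconstructed state: live files keyed by path, plus the latest metadata.
  final case class State(files: Map[String, AddFile], metadata: Option[Metadata])

  // Folds actions in log order: later actions win over earlier ones.
  def replay(actions: Seq[Action]): State =
    actions.foldLeft(State(Map.empty, None)) {
      case (s, a: AddFile)    => s.copy(files = s.files + (a.path -> a))
      case (s, r: RemoveFile) => s.copy(files = s.files - r.path)
      case (s, m: Metadata)   => s.copy(metadata = Some(m))
    }
}
```

Because the result is an immutable value, it can be cached and shared once computed.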
-
trait
SnapshotManagement extends AnyRef
Manages the creation, computation, and access of Snapshots for Delta tables. Responsibilities include:
- Figuring out the set of files that are required to compute a specific version of a table
- Updating and exposing the latest snapshot of the Delta table in a thread-safe manner
-
trait
UpdateExpressionsSupport extends CastSupport
Trait with helper functions to generate expressions to update target columns, even if they are nested fields.
-
trait
ValidateChecksum extends DeltaLogging
Verify the state of the table using the checksum information.
-
case class
VersionChecksum(tableSizeBytes: Long, numFiles: Long, numMetadata: Long, numProtocol: Long, numTransactions: Long) extends Product with Serializable
Stats calculated within a snapshot, which we store along individual transactions for verification.
- tableSizeBytes
The size of the table in bytes
- numFiles
Number of AddFile actions in the snapshot
- numMetadata
Number of Metadata actions in the snapshot
- numProtocol
Number of Protocol actions in the snapshot
- numTransactions
Number of SetTransaction actions in the snapshot
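As an illustration of how these counters relate to a snapshot, the sketch below derives them from a list of actions and verifies a stored value by recomputing it. The action classes and their fields are hypothetical simplifications, and `compute` and `verify` are made-up helper names. Only the five field names are taken from this page.

```scala
object ChecksumSketch {
  // Hypothetical, simplified actions with made-up fields.
  sealed trait Action
  final case class AddFile(path: String, sizeBytes: Long) extends Action
  final case class Metadata(schema: String) extends Action
  final case class Protocol(minReader: Int, minWriter: Int) extends Action
  final case class SetTransaction(appId: String, version: Long) extends Action

  // Same field names as VersionChecksum on this page.
  final case class Checksum(
      tableSizeBytes: Long,
      numFiles: Long,
      numMetadata: Long,
      numProtocol: Long,
      numTransactions: Long)

  // Derives the stats from the actions that make up a snapshot.
  def compute(snapshotActions: Seq[Action]): Checksum = {
    val adds = snapshotActions.collect { case a: AddFile => a }
    Checksum(
      tableSizeBytes = adds.map(_.sizeBytes).sum,
      numFiles = adds.size.toLong,
      numMetadata = snapshotActions.count(_.isInstanceOf[Metadata]).toLong,
      numProtocol = snapshotActions.count(_.isInstanceOf[Protocol]).toLong,
      numTransactions = snapshotActions.count(_.isInstanceOf[SetTransaction]).toLong
    )
  }

  // Verification: recompute from the snapshot and compare with the stored value.
  def verify(stored: Checksum, snapshotActions: Seq[Action]): Boolean =
    compute(snapshotActions) == stored
}
```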
Value Members
- object CheckpointInstance extends Serializable
- object Checkpoints
-
object
DeltaConfigs extends DeltaLogging
Contains the list of reservoir configs and validation checks.
-
object
DeltaErrors extends DocsPath with DeltaLogging
A holder object for Delta errors.
IMPORTANT: Any time you add a test that references the docs, add to the Seq defined in DeltaErrorsSuite so that the doc links that are generated can be verified to work in Azure, docs.databricks.com and docs.delta.io
-
object
DeltaFullTable
Extractor Object for pulling out the full table scan of a Delta table.
-
object
DeltaHistoryManager extends DeltaLogging
Contains many utility methods that can also be executed on Spark executors.
- object DeltaLog extends DeltaLogging
- object DeltaLogFileIndex extends Serializable
-
object
DeltaOperations
Exhaustive list of operations that can be performed on a Delta table. These operations are tracked as the first line in delta logs, and power DESCRIBE HISTORY for Delta tables.
- object DeltaOptions extends DeltaLogging with Serializable
-
object
DeltaTable
Extractor Object for pulling out the table scan of a Delta table. It could be a full scan or a partial scan.
-
object
DeltaTableIdentifier extends Serializable
Utilities for DeltaTableIdentifier.
- object DeltaTableUtils extends PredicateHelper with DeltaLogging
- object DeltaTimeTravelSpec extends Serializable
- object IsolationLevel
- object LogSegment extends Serializable
- object OptimisticTransaction
-
object
Serializable extends IsolationLevel with Product with Serializable
This isolation level will ensure serializability between all read and write operations. Specifically, for write operations, this mode will ensure that the result of the table will be perfectly consistent with the visible history of operations, that is, as if all the operations were executed sequentially one by one.
- object Snapshot extends DeltaLogging
-
object
SnapshotIsolation extends IsolationLevel with Product with Serializable
This isolation level will ensure that all reads will see a consistent snapshot of the table and any transactional write will successfully commit only if the values updated by the transaction have not been changed externally since the snapshot was read by the transaction.
This provides a lower consistency guarantee than WriteSerializable but a higher availability than that. For example, unlike WriteSerializable, this level allows two concurrent UPDATE operations reading the same data to be committed successfully as long as they don't modify the same data.
Note that for operations that do not modify data in the table, Snapshot isolation is the same as Serializability. Hence such operations can be safely committed with the Snapshot isolation level.
- object SnapshotManagement
-
object
WriteSerializable extends IsolationLevel with Product with Serializable
This isolation level will ensure snapshot isolation consistency guarantee between write operations only. In other words, if only the write operations are considered, then there exists a serializable sequence between them that would produce the same result as seen in the table. However, if both read and write operations are considered, then there may not exist a serializable sequence that would explain all the observed reads.
This provides a lower consistency guarantee than Serializable but a higher availability than that. For example, unlike Serializable, this level allows an UPDATE operation to be committed even if there was a concurrent INSERT operation that has already added data that should have been read by the UPDATE. It will be as if the UPDATE was executed before the INSERT even if the former was committed after the latter. As a side effect, the visible history of operations may not be consistent with the result expected if these operations were executed sequentially one by one.