class DeltaLog extends Checkpoints with MetadataCleanup with LogStoreProvider with SnapshotManagement with DeltaFileFormat with ProvidesUniFormConverters with ReadChecksum
Used to query the current state of the log as well as modify it by adding new atomic collections of actions.
Internally, this class implements an optimistic concurrency control algorithm to handle multiple readers or writers. Any single read is guaranteed to see a consistent snapshot of the table.
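The optimistic pattern described above can be illustrated with a self-contained toy sketch (illustrative classes, not Delta's actual implementation): each writer prepares its actions against a snapshot and atomically publishes them only if no other writer committed in the meantime.

```scala
import java.util.concurrent.atomic.AtomicReference

// Toy model (not Delta's real classes) of optimistic concurrency control:
// a reader always observes one consistent snapshot, and a writer's commit
// succeeds only if no other writer published a new version in the meantime.
final case class TableState(version: Long, actions: List[String])

class ToyLog {
  private val state = new AtomicReference(TableState(0L, Nil))

  // Any single read sees one consistent snapshot.
  def snapshot(): TableState = state.get()

  // Atomically append an action set; on a lost race, retry against the
  // newer snapshot (real Delta would re-check for logical conflicts here).
  @scala.annotation.tailrec
  final def commit(newActions: List[String]): Long = {
    val current = state.get()
    val next = TableState(current.version + 1, current.actions ++ newActions)
    if (state.compareAndSet(current, next)) next.version
    else commit(newActions)
  }
}
```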
Linear Supertypes
- DeltaLog
- ReadChecksum
- ProvidesUniFormConverters
- DeltaFileFormat
- SnapshotManagement
- LogStoreProvider
- MetadataCleanup
- Checkpoints
- DeltaLogging
- DatabricksLogging
- DeltaProgressReporter
- LoggingShims
- Logging
- AnyRef
- Any
Type Members
- implicit class LogStringContext extends AnyRef
- Definition Classes
- LoggingShims
- class SidecarDeletionMetrics extends AnyRef
Class to track metrics related to V2 Checkpoint Sidecars deletion.
- Attributes
- protected
- Definition Classes
- MetadataCleanup
- class V2CompatCheckpointMetrics extends AnyRef
Class to track metrics related to V2 Compatibility checkpoint creation.
- Attributes
- protected[delta]
- Definition Classes
- MetadataCleanup
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- val LAST_CHECKPOINT: Path
The path to the file that holds metadata about the most recent checkpoint.
- Definition Classes
- Checkpoints
- lazy val _hudiConverter: UniversalFormatConverter
- Attributes
- protected
- Definition Classes
- ProvidesUniFormConverters
- lazy val _icebergConverter: UniversalFormatConverter
Helper trait to instantiate the icebergConverter member variable of the DeltaLog. We do this through reflection so that delta-spark doesn't have a compile-time dependency on the shaded iceberg module.
- Attributes
- protected
- Definition Classes
- ProvidesUniFormConverters
- val allOptions: Map[String, String]
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def assertTableFeaturesMatchMetadata(targetProtocol: Protocol, targetMetadata: Metadata): Unit
Asserts that the table's protocol enabled all features that are active in the metadata.
A mismatch shouldn't happen when the table has gone through a proper write process because we require all active features during writes. However, other clients may void this guarantee.
- def buildHadoopFsRelationWithFileIndex(snapshot: SnapshotDescriptor, fileIndex: TahoeFileIndex, bucketSpec: Option[BucketSpec], dropNullTypeColumnsFromSchema: Boolean = true): HadoopFsRelation
- def checkLogStoreConfConflicts(sparkConf: SparkConf): Unit
- Definition Classes
- LogStoreProvider
- def checkRequiredConfigurations(): Unit
Verify the required Spark conf for Delta. Throws a DeltaErrors.configureSparkSessionWithExtensionAndCatalog exception if the spark.sql.catalog.spark_catalog config is missing. We do not check for spark.sql.extensions because DeltaSparkSessionExtension can alternatively be activated using the .withExtension() API. This check can be disabled by setting DELTA_CHECK_REQUIRED_SPARK_CONF to false.
- Attributes
- protected
- def checkpoint(snapshotToCheckpoint: Snapshot, catalogTableOpt: Option[CatalogTable] = None): Unit
Creates a checkpoint using snapshotToCheckpoint. By default it uses the current log version. Note that this function captures and logs all exceptions, since the checkpoint shouldn't fail the overall commit operation.
- Definition Classes
- Checkpoints
- def checkpointAndCleanUpDeltaLog(snapshotToCheckpoint: Snapshot, catalogTableOpt: Option[CatalogTable] = None): Unit
- Definition Classes
- Checkpoints
- def checkpointInterval(metadata: Metadata): Int
Returns the checkpoint interval for this log. Not transactional.
- Definition Classes
- Checkpoints
- val clock: Clock
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def createCheckpointAtVersion(version: Long): Unit
Creates a checkpoint at the given version. Does not invoke metadata cleanup as part of it.
- version
- version at which we want to create a checkpoint.
- Definition Classes
- Checkpoints
- def createDataFrame(snapshot: SnapshotDescriptor, addFiles: Seq[AddFile], isStreaming: Boolean = false, actionTypeOpt: Option[String] = None): DataFrame
Returns a org.apache.spark.sql.DataFrame containing the new files within the specified version range.
- def createLogDirectoriesIfNotExists(): Unit
Creates the log directory and commit directory if they do not exist.
- def createLogSegment(versionToLoad: Option[Long] = None, oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider] = None, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient] = None, catalogTableOpt: Option[CatalogTable] = None, lastCheckpointInfo: Option[LastCheckpointInfo] = None): Option[LogSegment]
Get a list of files that can be used to compute a Snapshot at version versionToLoad. If versionToLoad is not provided, will generate the list of files that are needed to load the latest version of the Delta table. This method also performs checks to ensure that the delta files are contiguous.
- versionToLoad
A specific version to load. Typically used with time travel and the Delta streaming source. If not provided, we will try to load the latest version of the table.
- oldCheckpointProviderOpt
The CheckpointProvider from the previous snapshot. This is used as a start version for the listing when startCheckpoint is unavailable. This is also used to initialize the LogSegment.
- tableCommitCoordinatorClientOpt
the optional commit-coordinator client to use for fetching un-backfilled commits.
- catalogTableOpt
the optional catalog table to pass to the commit coordinator client.
- lastCheckpointInfo
LastCheckpointInfo from the _last_checkpoint. This could be used to initialize the Snapshot's LogSegment.
- returns
Some LogSegment to build a Snapshot if files do exist after the given startCheckpoint. None, if the directory was missing or empty.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
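The contiguity check mentioned above can be sketched as follows (the helper name is illustrative, not part of the API): a list of delta file versions is only usable if it forms an unbroken, ascending range.

```scala
// Illustrative helper (not part of the API): a LogSegment is only usable
// if its delta file versions form an unbroken, ascending range; any gap
// means a version cannot be reconstructed from the listed files.
def areDeltaVersionsContiguous(versions: Seq[Long]): Boolean =
  versions.zip(versions.drop(1)).forall { case (a, b) => b == a + 1 }
```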
- def createLogStore(sparkConf: SparkConf, hadoopConf: Configuration): LogStore
- Definition Classes
- LogStoreProvider
- def createLogStore(spark: SparkSession): LogStore
- Definition Classes
- LogStoreProvider
- def createRelation(partitionFilters: Seq[Expression] = Nil, snapshotToUseOpt: Option[Snapshot] = None, catalogTableOpt: Option[CatalogTable] = None, isTimeTravelQuery: Boolean = false): BaseRelation
Returns a BaseRelation that contains all of the data present in the table. This relation will be continually updated as files are added or removed from the table. However, a new BaseRelation must be requested in order to see changes to the schema.
- def createSinglePartCheckpointForBackwardCompat(snapshotToCleanup: Snapshot, metrics: V2CompatCheckpointMetrics): Unit
Helper method to create a compatibility classic single-file checkpoint for this table. This is needed so that any legacy reader that does not understand V2CheckpointTableFeature can read the legacy classic checkpoint file and fail gracefully with a Protocol requirement failure.
- Attributes
- protected[delta]
- Definition Classes
- MetadataCleanup
- def createSnapshot(initSegment: LogSegment, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], checksumOpt: Option[VersionChecksum]): Snapshot
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def createSnapshotAfterCommit(initSegment: LogSegment, newChecksumOpt: Option[VersionChecksum], tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], committedVersion: Long): Snapshot
Creates a snapshot for a new delta commit.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def createSnapshotAtInit(initialCatalogTable: Option[CatalogTable]): Unit
Load the Snapshot for this Delta table at initialization. This method uses the lastCheckpoint file as a hint on where to start listing the transaction log directory. If the _delta_log directory doesn't exist, this method will return an InitialSnapshot.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def createSnapshotFromGivenOrEquivalentLogSegment(initSegment: LogSegment, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable])(snapshotCreator: (LogSegment) => Snapshot): Snapshot
Create a Snapshot from the given LogSegment. If failing to create the snapshot, we will search for an equivalent LogSegment using a different checkpoint and retry up to DeltaSQLConf.DELTA_SNAPSHOT_LOADING_MAX_RETRIES times.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- val currentSnapshot: CapturedSnapshot
Cached latest snapshot. This is initialized in createSnapshotAtInit.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- Annotations
- @volatile()
- val dataPath: Path
- Definition Classes
- DeltaLog → Checkpoints
- val defaultLogStoreClass: String
- Definition Classes
- LogStoreProvider
- def deltaAssert(check: => Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit
Helper method to check invariants in Delta code. Fails when running in tests; records a delta assertion event and logs a warning otherwise.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def deltaRetentionMillis(metadata: Metadata): Long
Returns the duration in millis for how long to keep around obsolete logs. We may keep logs beyond this duration until the next calendar day to avoid constantly creating checkpoints.
- Definition Classes
- MetadataCleanup
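The "until the next calendar day" behavior described above can be sketched with a hypothetical helper (the exact rule and time zone are assumptions for illustration): round the expiration cutoff down to a day boundary so the cutoff only advances once per day.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.temporal.ChronoUnit

// Hypothetical helper (name and UTC day boundary are assumptions): instead
// of expiring log files at exactly now - retentionMillis, truncate the
// cutoff to the start of its UTC day, so the cutoff moves once per day.
def logExpirationCutoff(nowMillis: Long, retentionMillis: Long): Long =
  Instant.ofEpochMilli(nowMillis - retentionMillis)
    .atZone(ZoneOffset.UTC)
    .truncatedTo(ChronoUnit.DAYS)
    .toInstant
    .toEpochMilli
```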
- def doLogCleanup(snapshotToCleanup: Snapshot): Unit
- Definition Classes
- MetadataCleanup
- def enableExpiredLogCleanup(metadata: Metadata): Boolean
Whether to clean up expired log files and checkpoints.
- Definition Classes
- MetadataCleanup
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def fileFormat(protocol: Protocol, metadata: Metadata): FileFormat
Build the underlying Spark FileFormat of the Delta table with the specified metadata. With column mapping, some properties of the underlying file format might change during a transaction, so if possible, we should always pass in the latest transaction's metadata instead of one from a past snapshot.
- Definition Classes
- DeltaFileFormat
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def findEarliestReliableCheckpoint: Option[Long]
Finds a checkpoint such that we are able to construct table snapshots for all versions at or greater than the checkpoint version returned.
- Definition Classes
- MetadataCleanup
- def getChangeLogFiles(startVersion: Long, failOnDataLoss: Boolean = false): Iterator[(Long, FileStatus)]
Get access to all actions starting from "startVersion" (inclusive) via FileStatus. If startVersion doesn't exist, return an empty Iterator. Callers are encouraged to use the other override, which takes the endVersion if available, to avoid I/O and improve the performance of this method.
- def getChanges(startVersion: Long, failOnDataLoss: Boolean = false): Iterator[(Long, Seq[Action])]
Get all actions starting from "startVersion" (inclusive). If startVersion doesn't exist, return an empty Iterator. Callers are encouraged to use the other override, which takes the endVersion if available, to avoid I/O and improve the performance of this method.
- def getCheckpointVersion(lastCheckpointInfoOpt: Option[LastCheckpointInfo], oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider]): Long
Returns the last known checkpoint version based on LastCheckpointInfo or CheckpointProvider. Returns -1 if neither is available.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
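The fallback described above amounts to the following sketch (an illustrative helper operating on versions already extracted from the two sources):

```scala
// Illustrative helper: prefer whichever source reports a checkpoint
// version, and fall back to -1 when neither is available.
def lastKnownCheckpointVersion(lastCheckpointInfoVersion: Option[Long],
                               checkpointProviderVersion: Option[Long]): Long =
  lastCheckpointInfoVersion.orElse(checkpointProviderVersion).getOrElse(-1L)
```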
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
- Definition Classes
- DeltaLogging
- def getDeltaFileOrCheckpointVersion(filePath: Path): Long
Helper function for getting the version of a checkpoint or a commit.
- Definition Classes
- MetadataCleanup
- def getErrorData(e: Throwable): Map[String, Any]
- Definition Classes
- DeltaLogging
- def getLatestCompleteCheckpointFromList(instances: Array[CheckpointInstance], notLaterThanVersion: Option[Long] = None): Option[CheckpointInstance]
Given a list of checkpoint files, pick the latest complete checkpoint instance which is not later than notLaterThan.
- Attributes
- protected[delta]
- Definition Classes
- Checkpoints
- def getLogSegmentAfterCommit(tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], oldCheckpointProvider: UninitializedCheckpointProvider): LogSegment
- Attributes
- protected[delta]
- Definition Classes
- SnapshotManagement
- def getLogSegmentAfterCommit(committedVersion: Long, newChecksumOpt: Option[VersionChecksum], preCommitLogSegment: LogSegment, commit: Commit, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], oldCheckpointProvider: CheckpointProvider): LogSegment
Used to compute the LogSegment after a commit, by adding the delta file with the specified version to the preCommitLogSegment (which must match the immediately preceding version).
- Attributes
- protected[delta]
- Definition Classes
- SnapshotManagement
- def getLogSegmentForVersion(versionToLoad: Option[Long], files: Option[Array[FileStatus]], validateLogSegmentWithoutCompactedDeltas: Boolean, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider], lastCheckpointInfo: Option[LastCheckpointInfo]): Option[LogSegment]
Helper function for the getLogSegmentForVersion above. Called with a provided files list, and will then try to construct a new LogSegment using that. *Note*: If the table is a coordinated-commits table, the commit-coordinator MUST be passed to correctly list the commits.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def getLogStoreConfValue(key: String, sparkConf: SparkConf): Option[String]
We accept keys both with and without the spark. prefix to maintain compatibility across the Delta ecosystem.
- key
the spark-prefixed key to access
- Definition Classes
- LogStoreProvider
- def getSnapshotAt(version: Long, lastCheckpointHint: Option[CheckpointInstance] = None, catalogTableOpt: Option[CatalogTable] = None): Snapshot
Get the snapshot at version.
- Definition Classes
- SnapshotManagement
- def getSnapshotForLogSegmentInternal(previousSnapshotOpt: Option[Snapshot], segmentOpt: Option[LogSegment], tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], isAsync: Boolean): Snapshot
Creates a Snapshot for the given segmentOpt.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def getUpdatedLogSegment(oldLogSegment: LogSegment, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable]): (LogSegment, Seq[FileStatus])
Get the newest logSegment, using the previous logSegment as a hint. This is faster than doing a full update, but it won't work if the table's log directory was replaced.
- Definition Classes
- SnapshotManagement
- def getUpdatedSnapshot(oldSnapshotOpt: Option[Snapshot], initialSegmentForNewSnapshot: Option[LogSegment], initialTableCommitCoordinatorClient: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], isAsync: Boolean): Snapshot
Updates and installs a new snapshot in the currentSnapshot. This method takes care of recursively creating new snapshots if the commit-coordinator has changed.
- oldSnapshotOpt
The previous snapshot, if any.
- initialSegmentForNewSnapshot
the log segment constructed for the new snapshot
- initialTableCommitCoordinatorClient
the commit-coordinator used for constructing the initialSegmentForNewSnapshot
- catalogTableOpt
the optional catalog table to pass to the commit coordinator client.
- isAsync
Whether the update is async.
- returns
The new snapshot.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- lazy val history: DeltaHistoryManager
Delta History Manager containing version and commit history.
- def hudiConverter: UniversalFormatConverter
- Definition Classes
- ProvidesUniFormConverters
- def icebergConverter: UniversalFormatConverter
- Definition Classes
- ProvidesUniFormConverters
- def identifyAndDeleteUnreferencedSidecarFiles(snapshotToCleanup: Snapshot, checkpointRetention: Long, metrics: SidecarDeletionMetrics): Unit
Deletes any unreferenced files from the sidecar directory _delta_log/_sidecar.
- Attributes
- protected
- Definition Classes
- MetadataCleanup
- def indexToRelation(index: DeltaLogFileIndex, schema: StructType = Action.logSchema): LogicalRelation
Creates a LogicalRelation for a given DeltaLogFileIndex, with all necessary file source options taken from the Delta Log. All reads of Delta metadata files should use this method.
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def installSnapshot(newSnapshot: Snapshot, updateTimestamp: Long): Snapshot
Installs the given newSnapshot as the currentSnapshot.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def isCurrentlyStale: (Long) => Boolean
Checks if the given timestamp is outside the current staleness window.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def isDeltaCommitOrCheckpointFile(path: Path): Boolean
Returns true if the path is a delta log file. Delta log files can be delta commit files (e.g., 000000000.json) or checkpoint files (e.g., 000000001.checkpoint.00001.00003.parquet).
- path
Path of a file
- returns
Whether the file is a delta log file
- Attributes
- protected
- Definition Classes
- SnapshotManagement
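The naming patterns in the examples above can be matched with a small sketch. The patterns here are assumptions inferred only from those examples, and the helper names are illustrative; the real implementation may recognize additional file types.

```scala
// Assumed name patterns (inferred only from the examples above): commit
// files like 000000000.json, and checkpoint files like
// 000000001.checkpoint.00001.00003.parquet (single- or multi-part).
val commitFilePattern     = raw"\d+\.json".r
val checkpointFilePattern = raw"\d+\.checkpoint(\.\d+)*\.parquet".r

def looksLikeDeltaLogFile(fileName: String): Boolean =
  commitFilePattern.matches(fileName) || checkpointFilePattern.matches(fileName)
```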
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isSameLogAs(otherLog: DeltaLog): Boolean
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- val lastSeenChecksumFileStatusOpt: Option[FileStatus]
Cached fileStatus for the latest CRC file seen in the deltaLog.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- Annotations
- @volatile()
- final def listDeltaCompactedDeltaAndCheckpointFiles(startVersion: Long, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], versionToLoad: Option[Long], includeMinorCompactions: Boolean): Option[Array[FileStatus]]
This method is designed to efficiently and reliably list delta, compacted delta, and checkpoint files associated with a Delta Lake table. It makes parallel calls to both the file-system and a commit-coordinator (if available), reconciles the results to account for asynchronous backfill operations, and ensures a comprehensive list of file statuses without missing any concurrently backfilled files. *Note*: If the table is a coordinated-commits table, the commit-coordinator client MUST be passed to correctly list the commits.
- startVersion
the version to start. Inclusive.
- tableCommitCoordinatorClientOpt
the optional commit-coordinator client to use for fetching un-backfilled commits.
- catalogTableOpt
the optional catalog table to pass to the commit coordinator client.
- versionToLoad
the optional parameter to set the max version we should return. Inclusive.
- includeMinorCompactions
Whether to include minor compaction files in the result
- returns
Some array of files found (possibly empty, if no usable commit files are present), or None if the listing returned no files at all.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def listDeltaCompactedDeltaCheckpointFilesAndLatestChecksumFile(startVersion: Long, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], versionToLoad: Option[Long], includeMinorCompactions: Boolean): (Option[Array[FileStatus]], Option[FileStatus])
This method is designed to efficiently and reliably list delta, compacted delta, and checkpoint files associated with a Delta Lake table. It makes parallel calls to both the file-system and a commit-coordinator (if available), reconciles the results to account for asynchronous backfill operations, and ensures a comprehensive list of file statuses without missing any concurrently backfilled files. *Note*: If the table is a coordinated-commits table, the commit coordinator MUST be passed to correctly list the commits. The function also collects the latest checksum file found in the listings and returns it.
- startVersion
the version to start. Inclusive.
- tableCommitCoordinatorClientOpt
the optional commit coordinator to use for fetching un-backfilled commits.
- catalogTableOpt
the optional catalog table to pass to the commit coordinator client.
- versionToLoad
the optional parameter to set the max version we should return. Inclusive.
- includeMinorCompactions
Whether to include minor compaction files in the result
- returns
A tuple where the first element is an array of log files (possibly empty, if no usable log files are found), and the second element is the latest checksum file found which has a version less than or equal to
versionToLoad.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def listFromOrNone(startVersion: Long): Option[Iterator[FileStatus]]
Returns an iterator containing a list of files found from the provided path.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def loadIndex(index: DeltaLogFileIndex, schema: StructType = Action.logSchema): DataFrame
Load the data using the FileIndex. This allows us to skip many checks that add overhead, e.g. file existence checks, partitioning schema inference.
- def loadMetadataFromFile(tries: Int): Option[LastCheckpointInfo]
Loads the checkpoint metadata from the _last_checkpoint file.
- Attributes
- protected
- Definition Classes
- Checkpoints
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
- def logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- val logPath: Path
- Definition Classes
- DeltaLog → ReadChecksum → Checkpoints
- val logStoreClassConfKey: String
- Definition Classes
- LogStoreProvider
- def logStoreSchemeConfKey(scheme: String): String
- Definition Classes
- LogStoreProvider
- def logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def manuallyLoadCheckpoint(cv: CheckpointInstance): LastCheckpointInfo
Loads the given checkpoint manually to come up with the LastCheckpointInfo.
- Attributes
- protected
- Definition Classes
- Checkpoints
- def maxSnapshotLineageLength: Int
The max lineage length of a Snapshot before Delta forces building a Snapshot from scratch. Delta will build a Snapshot on top of the previous one if it doesn't see a checkpoint. However, there is a race condition: when two writers are writing at the same time, a writer may fail to pick up checkpoints written by the other, so the lineage grows and eventually causes a StackOverflowError. Hence we force building a Snapshot from scratch when the lineage length is too large.
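A toy model of this lineage-length guard (all names are illustrative): replay on top of the previous snapshot, but rebuild from scratch once the incremental chain would exceed the limit, so its depth stays bounded.

```scala
// Toy model (illustrative names): each incremental snapshot extends the
// previous chain by one; once the chain would exceed maxLineage, rebuild
// from scratch, which resets the lineage length to 0 and bounds the depth.
final case class Snap(version: Long, lineageLength: Int)

def nextSnapshot(prev: Snap, maxLineage: Int): Snap =
  if (prev.lineageLength + 1 > maxLineage) Snap(prev.version + 1, 0)
  else Snap(prev.version + 1, prev.lineageLength + 1)
```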
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def newDeltaHadoopConf(): Configuration
Returns the Hadoop Configuration object which can be used to access the file system. All Delta code should use this method to create the Hadoop Configuration object, so that the Hadoop file system configurations specified in DataFrame options take effect.
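A hedged sketch of the override behavior described above, modeled on plain maps rather than a real Hadoop Configuration (the "fs." key filter and the helper name are assumptions, not Delta's exact rule):

```scala
// Hedged sketch (the "fs." filter is an assumption): per-DataFrame options
// override the session-level configuration for file-system keys, so a
// read/write can carry e.g. per-query credentials or endpoints.
def newHadoopConfMap(sessionConf: Map[String, String],
                     options: Map[String, String]): Map[String, String] =
  sessionConf ++ options.filter { case (k, _) => k.startsWith("fs.") }
```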
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val options: Map[String, String]
- def protocolRead(protocol: Protocol): Unit
Asserts that the client is up to date with the protocol and allowed to read the table that is using the given protocol.
- def protocolWrite(protocol: Protocol): Unit
Asserts that the client is up to date with the protocol and allowed to write to the table that is using the given protocol.
- def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
Used to record the occurrence of a single event or report detailed, operation-specific statistics.
- path
Used to log the path of the delta table when deltaLog is null.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a deltaLog.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a tahoePath.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordFrameProfile[T](group: String, name: String)(thunk: => T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: => S): S
- Definition Classes
- DatabricksLogging
- def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- lazy val sidecarDirPath: Path
Path to the sidecar directory. This is intentionally kept a lazy val; otherwise other constructor codepaths in DeltaLog (e.g. SnapshotManagement) would see it as null, since they run before this line is executed.
- val snapshotLock: ReentrantLock
Use ReentrantLock to allow us to call lockInterruptibly.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def spark: SparkSession
Returns the current Spark session used.
- Attributes
- protected
- Definition Classes
- DeltaLog → DeltaFileFormat
- def startTransaction(catalogTableOpt: Option[CatalogTable], snapshotOpt: Option[Snapshot] = None): OptimisticTransaction
Returns a new OptimisticTransaction that can be used to read the current state of the log and then commit updates. The reads and updates will be checked for logical conflicts with any concurrent writes to the log, and post-commit hooks can be used to notify the table's catalog of schema changes, etc.
Note that all reads in a transaction must go through the returned transaction object, and not directly to the DeltaLog otherwise they will not be checked for conflicts.
- catalogTableOpt
The CatalogTable for the table this transaction updates. Passing None asserts this is a path-based table with no catalog entry.
- snapshotOpt
The Snapshot this transaction should use, if not the latest.
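As a hedged sketch of the transaction flow described above (assuming delta-spark on the classpath and an active SparkSession; the table path and the use of `DeltaOperations.ManualUpdate` are illustrative):

```scala
// Sketch: reading and committing through an explicit transaction.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.delta.{DeltaLog, DeltaOperations}

val spark = SparkSession.active
val deltaLog = DeltaLog.forTable(spark, "/tmp/delta/events")

// Path-based table with no catalog entry, so pass None.
val txn = deltaLog.startTransaction(catalogTableOpt = None)

// All reads must go through `txn` (e.g. txn.filterFiles) rather than the
// DeltaLog directly, so that they participate in conflict detection.
val removes = txn.filterFiles().map(_.remove)
txn.commit(removes, DeltaOperations.ManualUpdate)
```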
- lazy val store: LogStore
Used to read and write physical log files and checkpoints.
- Definition Classes
- DeltaLog → ReadChecksum → Checkpoints
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def tableExists: Boolean
Whether a Delta table exists at this directory. It is okay to use the cached volatile snapshot here, since the worst case is that the table has recently started existing, which hasn't been picked up here. If so, any subsequent command that updates the table will see the right value.
- def tableId: String
The unique identifier for this table.
- def throwNonExistentVersionError(versionToLoad: Long): Unit
- Definition Classes
- SnapshotManagement
- def toString(): String
- Definition Classes
- AnyRef → Any
- def unsafeLoadMetadataFromFile(): LastCheckpointInfo
Reads the checkpoint metadata from the _last_checkpoint file. This method does not handle exceptions that can be thrown while reading the data, such as FileNotFoundException (expected for a new Delta table) or JSON deserialization errors.
- Attributes
- protected
- Definition Classes
- Checkpoints
- def unsafeVolatileSnapshot: Snapshot
Returns the current snapshot. This does not automatically call update().
WARNING: This is not guaranteed to give you the latest snapshot of the log, nor to stay consistent across multiple accesses. If you need the latest snapshot, fetch it using deltaLog.update() and save the returned snapshot so it does not unexpectedly change from under you. See how OptimisticTransaction and DeltaScan use the snapshot as examples for the write and read paths respectively. This API should only be used in scenarios where any recent snapshot will suffice and an update is undesired, or by internal code that holds the DeltaLog lock to prevent races.
- Definition Classes
- SnapshotManagement
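The contrast above can be sketched as follows (a non-authoritative example assuming delta-spark and an active SparkSession; the table path is illustrative):

```scala
// Sketch: prefer update() over unsafeVolatileSnapshot when freshness matters.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.delta.DeltaLog

val spark = SparkSession.active
val deltaLog = DeltaLog.forTable(spark, "/tmp/delta/events")

// Cheap, but possibly stale; fine for logging or best-effort checks.
val maybeStale = deltaLog.unsafeVolatileSnapshot

// Synchronously replays any new commits; pin the result in a local val so
// it cannot change from under us.
val fresh = deltaLog.update()
println(s"cached=${maybeStale.version} latest=${fresh.version}")
```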
- def update(stalenessAcceptable: Boolean = false, checkIfUpdatedSinceTs: Option[Long] = None, catalogTableOpt: Option[CatalogTable] = None): Snapshot
Update the ActionLog by applying the new delta files, if any.
- stalenessAcceptable
Whether we can accept working with a stale version of the table. If the table has surpassed our staleness tolerance, we will update to the latest state of the table synchronously. If staleness is acceptable, and the table hasn't passed the staleness tolerance, we will kick off a job in the background to update the table state, and can return a stale snapshot in the meantime.
- checkIfUpdatedSinceTs
Skip the update if we've already updated the snapshot since the specified timestamp.
- catalogTableOpt
The catalog table of the current table.
- Definition Classes
- SnapshotManagement
- def updateAfterCommit(committedVersion: Long, commit: Commit, newChecksumOpt: Option[VersionChecksum], preCommitLogSegment: LogSegment, catalogTableOpt: Option[CatalogTable]): Snapshot
Called after committing a transaction and updating the state of the table.
- committedVersion
the version that was committed
- commit
information about the commit file.
- newChecksumOpt
the checksum for the new commit, if available. Usually None, since the commit would have just finished.
- preCommitLogSegment
the log segment of the table prior to commit
- catalogTableOpt
the current catalog table
- Definition Classes
- SnapshotManagement
- def updateInternal(isAsync: Boolean, catalogTableOpt: Option[CatalogTable]): Snapshot
Queries the store for new delta files and applies them to the current state. Note: the caller should hold snapshotLock before calling this method.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def upgradeProtocol(catalogTable: Option[CatalogTable], snapshot: Snapshot, newVersion: Protocol): Unit
Upgrade the table's protocol version, by default to the maximum recognized reader and writer versions in this Delta release. This method only upgrades protocol version, and will fail if the new protocol version is not a superset of the original one used by the snapshot.
- def useCompactedDeltasForLogSegment(deltasAndCompactedDeltas: Seq[FileStatus], deltasAfterCheckpoint: Array[FileStatus], latestCommitVersion: Long, checkpointVersionToUse: Long): Array[FileStatus]
- deltasAndCompactedDeltas
- all deltas or compacted deltas which could be used
- deltasAfterCheckpoint
- deltas after the last checkpoint file
- latestCommitVersion
- the commit version for which we are trying to create the Snapshot
- checkpointVersionToUse
- underlying checkpoint version to use in Snapshot, -1 if no checkpoint is used.
- returns
Returns a list of deltas/compacted-deltas which can be used to construct the LogSegment instead of
deltasAfterCheckpoint.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def verifyLogStoreConfs(sparkConf: SparkConf): Unit
Check for conflicting LogStore configs in the Spark configuration.
To maintain compatibility across the Delta ecosystem, we accept keys both with and without the "spark." prefix. For the class conf we accept both "spark.delta.logStore.class" and "delta.logStore.class", and for scheme confs we accept both "spark.delta.logStore.${scheme}.impl" and "delta.logStore.${scheme}.impl".
If a conf is set both with and without the spark prefix, it must be set to the same value; otherwise we throw an error.
- Definition Classes
- LogStoreProvider
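A minimal sketch of the equivalent-key rule described above (the scheme `s3a` and the LogStore class are illustrative; `io.delta.storage.S3SingleDriverLogStore` is one commonly used implementation, but verify it against your Delta version):

```scala
// Sketch: prefixed and unprefixed LogStore keys name the same setting.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // These two keys are treated as one setting; if both are present they
  // must agree, otherwise verifyLogStoreConfs throws.
  .set("spark.delta.logStore.s3a.impl", "io.delta.storage.S3SingleDriverLogStore")
  .set("delta.logStore.s3a.impl", "io.delta.storage.S3SingleDriverLogStore")
```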
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def withCheckpointExceptionHandling(deltaLog: DeltaLog, opType: String)(thunk: => Unit): Unit
Catch non-fatal exceptions related to checkpointing, since the checkpoint is written after the commit has completed. From the perspective of the user, the commit has completed successfully. However, throw if this is a testing environment, so that any breaking changes can be caught in unit tests.
- Attributes
- protected
- Definition Classes
- Checkpoints
- def withNewTransaction[T](catalogTableOpt: Option[CatalogTable], snapshotOpt: Option[Snapshot] = None)(thunk: (OptimisticTransaction) => T): T
Execute a piece of code within a new OptimisticTransaction. Reads/write sets will be recorded for this table, and all other tables will be read at a snapshot that is pinned on the first access.
- catalogTableOpt
The CatalogTable for the table this transaction updates. Passing None asserts this is a path-based table with no catalog entry.
- snapshotOpt
The Snapshot this transaction should use, if not the latest.
- Note
This uses a thread-local variable to make the active transaction visible, so do not use multi-threaded code in the provided thunk.
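As a hedged sketch of the scoped form (assuming delta-spark and an active SparkSession; the table path and the metadata change are illustrative):

```scala
// Sketch: scoped transaction via withNewTransaction.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.delta.{DeltaLog, DeltaOperations}

val spark = SparkSession.active
val deltaLog = DeltaLog.forTable(spark, "/tmp/delta/events")

deltaLog.withNewTransaction(catalogTableOpt = None) { txn =>
  // The active transaction is exposed via a thread-local inside this
  // thunk, so avoid spawning threads here.
  val newMetadata = txn.metadata.copy(description = "nightly load")
  txn.updateMetadata(newMetadata)
  txn.commit(Seq.empty, DeltaOperations.ManualUpdate)
}
```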
- def withSnapshotLockInterruptibly[T](body: => T): T
Run body inside the snapshotLock lock using lockInterruptibly, so that the thread can be interrupted while waiting for the lock.
- Definition Classes
- SnapshotManagement
- def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: => T): T
Report a log to indicate some command is running.
- Definition Classes
- DeltaProgressReporter
- def writeCheckpointFiles(snapshotToCheckpoint: Snapshot, catalogTableOpt: Option[CatalogTable] = None): LastCheckpointInfo
- Attributes
- protected
- Definition Classes
- Checkpoints
- def writeLastCheckpointFile(deltaLog: DeltaLog, lastCheckpointInfo: LastCheckpointInfo, addChecksum: Boolean): Unit
- Attributes
- protected[delta]
- Definition Classes
- Checkpoints
Deprecated Value Members
- def checkpoint(): Unit
Creates a checkpoint using the default snapshot.
WARNING: This API is being deprecated, and will be removed in future versions. Please use the checkpoint(Snapshot) function below to write checkpoints to the delta log.
- Definition Classes
- Checkpoints
- Annotations
- @deprecated
- Deprecated
(Since version 12.0) This method is deprecated and will be removed in future versions.
- def snapshot: Snapshot
WARNING: This API is unsafe and deprecated. It will be removed in future versions. Use unsafeVolatileSnapshot instead to get the most recently cached snapshot on the cluster.
- Definition Classes
- SnapshotManagement
- Annotations
- @deprecated
- Deprecated
(Since version 12.0) This method is deprecated and will be removed in future versions. Use unsafeVolatileSnapshot instead
- def startTransaction(): OptimisticTransaction
Legacy/compat overload that does not require catalog table information. Avoid prod use.
- Annotations
- @deprecated
- Deprecated
(Since version 3.0) Please use the CatalogTable overload instead
- def withNewTransaction[T](thunk: (OptimisticTransaction) => T): T
Legacy/compat overload that does not require catalog table information. Avoid prod use.
- Annotations
- @deprecated
- Deprecated
(Since version 3.0) Please use the CatalogTable overload instead