trait SnapshotManagement extends AnyRef
Manages the creation, computation, and access of Snapshots for Delta tables. Responsibilities include:
- Figuring out the set of files that are required to compute a specific version of a table
- Updating and exposing the latest snapshot of the Delta table in a thread-safe manner
- Self Type
- DeltaLog
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def createLogSegment(versionToLoad: Option[Long] = None, oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider] = None, tableCommitOwnerClientOpt: Option[TableCommitOwnerClient] = None, lastCheckpointInfo: Option[LastCheckpointInfo] = None): Option[LogSegment]
Get a list of files that can be used to compute a Snapshot at version versionToLoad. If versionToLoad is not provided, this generates the list of files needed to load the latest version of the Delta table. This method also performs checks to ensure that the delta files are contiguous.
- versionToLoad
A specific version to load. Typically used with time travel and the Delta streaming source. If not provided, we will try to load the latest version of the table.
- oldCheckpointProviderOpt
The CheckpointProvider from the previous snapshot. This is used as a start version for the listing when startCheckpoint is unavailable. This is also used to initialize the LogSegment.
- lastCheckpointInfo
LastCheckpointInfo from the _last_checkpoint. This could be used to initialize the Snapshot's LogSegment.
- returns
Some LogSegment to build a Snapshot if files do exist after the given startCheckpoint. None, if the directory was missing or empty.
- Attributes
- protected
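The contiguity check mentioned for createLogSegment can be pictured with a small sketch. This is not Delta's implementation: `ContiguitySketch` and `isContiguous` are hypothetical names, and commit files are reduced to bare version numbers for illustration.

```scala
object ContiguitySketch {
  // Hypothetical helper: true when every version is exactly one greater than
  // the version before it. Empty and single-element inputs count as contiguous.
  def isContiguous(versions: Seq[Long]): Boolean =
    versions.zip(versions.drop(1)).forall { case (prev, next) => next == prev + 1 }
}
```

A caller would typically run such a check on the delta versions that follow the chosen checkpoint and fail fast when a gap is found.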
- def createSnapshot(initSegment: LogSegment, tableCommitOwnerClientOpt: Option[TableCommitOwnerClient], checksumOpt: Option[VersionChecksum]): Snapshot
- Attributes
- protected
- def createSnapshotAfterCommit(initSegment: LogSegment, newChecksumOpt: Option[VersionChecksum], tableCommitOwnerClientOpt: Option[TableCommitOwnerClient], committedVersion: Long): Snapshot
Creates a snapshot for a new delta commit.
- Attributes
- protected
- def createSnapshotFromGivenOrEquivalentLogSegment(initSegment: LogSegment, tableCommitOwnerClientOpt: Option[TableCommitOwnerClient])(snapshotCreator: (LogSegment) => Snapshot): Snapshot
Create a Snapshot from the given LogSegment. If snapshot creation fails, we search for an equivalent LogSegment that uses a different checkpoint and retry up to DeltaSQLConf.DELTA_SNAPSHOT_LOADING_MAX_RETRIES times.
- Attributes
- protected
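The retry behaviour described for createSnapshotFromGivenOrEquivalentLogSegment might look roughly like the sketch below. It is a generic illustration under stated assumptions, not the Delta code: `RetrySketch`, `createWithRetries`, and `nextCandidate` are hypothetical, and a "segment" is any value of type `S`.

```scala
import scala.util.{Failure, Success, Try}

object RetrySketch {
  // Try `create` on the initial candidate. On a non-fatal failure, ask
  // `nextCandidate` for an equivalent alternative and try again, allowing at
  // most `maxRetries` retries. The last error is rethrown when no alternative
  // exists or the retry budget is exhausted.
  def createWithRetries[S, R](initial: S, maxRetries: Int)(
      nextCandidate: S => Option[S])(create: S => R): R = {
    def loop(candidate: S, retriesLeft: Int): R =
      Try(create(candidate)) match {
        case Success(result) => result
        case Failure(error) =>
          nextCandidate(candidate) match {
            case Some(alternative) if retriesLeft > 0 => loop(alternative, retriesLeft - 1)
            case _ => throw error
          }
      }
    loop(initial, maxRetries)
  }
}
```

Passing the creator as a separate parameter list mirrors the curried `snapshotCreator` argument in the signature above.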
- val currentSnapshot: CapturedSnapshot
- Attributes
- protected
- Annotations
- @volatile()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def getCheckpointVersion(lastCheckpointInfoOpt: Option[LastCheckpointInfo], oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider]): Long
Returns the last known checkpoint version based on LastCheckpointInfo or CheckpointProvider. Returns -1 if neither is available.
- Attributes
- protected
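A minimal sketch of the fallback described for getCheckpointVersion follows. The case classes are hypothetical stand-ins that carry only a version, and preferring the checkpoint info over the provider is an assumption about precedence that the text does not spell out.

```scala
object CheckpointVersionSketch {
  // Hypothetical stand-ins: the real types carry far more than a version.
  final case class CheckpointInfo(version: Long)
  final case class CheckpointProvider(version: Long)

  // Assumed precedence: checkpoint info first, then the provider, else -1.
  def checkpointVersion(
      info: Option[CheckpointInfo],
      provider: Option[CheckpointProvider]): Long =
    info.map(_.version).orElse(provider.map(_.version)).getOrElse(-1L)
}
```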
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getLogSegmentAfterCommit(tableCommitOwnerClientOpt: Option[TableCommitOwnerClient], oldCheckpointProvider: UninitializedCheckpointProvider): LogSegment
- Attributes
- protected[delta]
- def getLogSegmentAfterCommit(committedVersion: Long, newChecksumOpt: Option[VersionChecksum], preCommitLogSegment: LogSegment, commit: Commit, tableCommitOwnerClientOpt: Option[TableCommitOwnerClient], oldCheckpointProvider: CheckpointProvider): LogSegment
Used to compute the LogSegment after a commit, by adding the delta file with the specified version to the preCommitLogSegment (which must match the immediately preceding version).
- Attributes
- protected[delta]
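The precondition on preCommitLogSegment can be illustrated with a toy model. `Segment` and `afterCommit` below are hypothetical and track only version numbers; the real LogSegment also carries file statuses and checkpoint information.

```scala
object AppendCommitSketch {
  // Hypothetical, simplified segment: its version plus the delta versions it contains.
  final case class Segment(version: Long, deltaVersions: Vector[Long])

  // Append the newly committed delta, insisting that the pre-commit segment
  // sits at the immediately preceding version.
  def afterCommit(pre: Segment, committedVersion: Long): Segment = {
    require(
      committedVersion == pre.version + 1,
      s"expected pre-commit version ${committedVersion - 1}, got ${pre.version}")
    Segment(committedVersion, pre.deltaVersions :+ committedVersion)
  }
}
```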
- def getLogSegmentForVersion(versionToLoad: Option[Long], files: Option[Array[FileStatus]], validateLogSegmentWithoutCompactedDeltas: Boolean, tableCommitOwnerClientOpt: Option[TableCommitOwnerClient], oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider], lastCheckpointInfo: Option[LastCheckpointInfo]): Option[LogSegment]
Helper function for the getLogSegmentForVersion above. Called with a provided files list, and will then try to construct a new LogSegment using that. *Note*: If the table is a managed-commit table, the commit-owner MUST be passed to correctly list the commits.
- Attributes
- protected
- def getSnapshotAt(version: Long, lastCheckpointHint: Option[CheckpointInstance] = None): Snapshot
Get the snapshot at version.
- def getSnapshotAtInit: CapturedSnapshot
Load the Snapshot for this Delta table at initialization. This method uses the lastCheckpoint file as a hint on where to start listing the transaction log directory. If the _delta_log directory doesn't exist, this method will return an InitialSnapshot.
- Attributes
- protected
- def getSnapshotForLogSegmentInternal(previousSnapshotOpt: Option[Snapshot], segmentOpt: Option[LogSegment], tableCommitOwnerClientOpt: Option[TableCommitOwnerClient], isAsync: Boolean): Snapshot
Creates a Snapshot for the given segmentOpt.
- Attributes
- protected
- def getUpdatedLogSegment(oldLogSegment: LogSegment, tableCommitOwnerClientOpt: Option[TableCommitOwnerClient]): (LogSegment, Seq[FileStatus])
Get the newest logSegment, using the previous logSegment as a hint. This is faster than doing a full update, but it won't work if the table's log directory was replaced.
- def getUpdatedSnapshot(oldSnapshotOpt: Option[Snapshot], initialSegmentForNewSnapshot: Option[LogSegment], initialTableCommitOwnerClient: Option[TableCommitOwnerClient], isAsync: Boolean): Snapshot
Updates and installs a new snapshot in the currentSnapshot. This method takes care of recursively creating new snapshots if the commit-owner has changed.
- oldSnapshotOpt
The previous snapshot, if any.
- initialSegmentForNewSnapshot
the log segment constructed for the new snapshot
- initialTableCommitOwnerClient
the commit-owner used for constructing the initialSegmentForNewSnapshot
- isAsync
Whether the update is async.
- returns
The new snapshot.
- Attributes
- protected
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def installSnapshot(newSnapshot: Snapshot, updateTimestamp: Long): Snapshot
Installs the given newSnapshot as the currentSnapshot.
- Attributes
- protected
- def isCurrentlyStale: (Long) => Boolean
Checks if the given timestamp is outside the current staleness window
- Attributes
- protected
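Because isCurrentlyStale is exposed as a `Long => Boolean` function, one way to picture it is a predicate built once for a fixed clock reading. The sketch below is an assumption-laden illustration: `StalenessSketch`, `nowMs`, and `limitMs` are hypothetical, and treating the boundary as stale is an arbitrary choice.

```scala
object StalenessSketch {
  // Builds a predicate for a fixed "now" and staleness limit, both in milliseconds.
  // A timestamp is treated as stale once it is at least `limitMs` old.
  def isStale(nowMs: Long, limitMs: Long): Long => Boolean =
    lastUpdateMs => nowMs - lastUpdateMs >= limitMs
}
```

Building the predicate up front means the clock is read once, so several timestamps are judged against the same cutoff.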
- def isDeltaCommitOrCheckpointFile(path: Path): Boolean
Returns true if the path is a Delta log file. A Delta log file can be a delta commit file (e.g., 000000000.json) or a checkpoint file (e.g., 000000001.checkpoint.00001.00003.parquet).
- path
Path of a file
- returns
Whether the file is a Delta log file.
- Attributes
- protected
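The file-name shapes given as examples for isDeltaCommitOrCheckpointFile can be matched with two regular expressions. This sketch works on plain file-name strings rather than Hadoop `Path` objects, the names `LogFileNameSketch` and `isDeltaOrCheckpoint` are hypothetical, and the patterns cover only the shapes shown in the examples plus the single-file checkpoint form.

```scala
object LogFileNameSketch {
  // <version>.json
  private val DeltaFile = """\d+\.json""".r
  // <version>.checkpoint.parquet or <version>.checkpoint.<part>.<numParts>.parquet
  private val CheckpointFile = """\d+\.checkpoint(\.\d+\.\d+)?\.parquet""".r

  // True when the whole file name matches one of the two patterns.
  def isDeltaOrCheckpoint(fileName: String): Boolean =
    DeltaFile.pattern.matcher(fileName).matches() ||
      CheckpointFile.pattern.matcher(fileName).matches()
}
```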
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- val lastSeenChecksumFileStatusOpt: Option[FileStatus]
Cached fileStatus for the latest CRC file seen in the deltaLog.
- Attributes
- protected
- Annotations
- @volatile()
- final def listDeltaCompactedDeltaAndCheckpointFiles(startVersion: Long, tableCommitOwnerClientOpt: Option[TableCommitOwnerClient], versionToLoad: Option[Long], includeMinorCompactions: Boolean): Option[Array[FileStatus]]
This method is designed to efficiently and reliably list delta, compacted delta, and checkpoint files associated with a Delta Lake table. It makes parallel calls to both the file-system and a commit-owner (if available), reconciles the results to account for asynchronous backfill operations, and ensures a comprehensive list of file statuses without missing any concurrently backfilled files. *Note*: If the table is a managed-commit table, the commit-owner client MUST be passed to correctly list the commits.
- startVersion
the version to start. Inclusive.
- tableCommitOwnerClientOpt
the optional commit-owner client to use for fetching un-backfilled commits.
- versionToLoad
the optional parameter to set the max version we should return. Inclusive.
- includeMinorCompactions
Whether to include minor compaction files in the result
- returns
Some array of files found (possibly empty, if no usable commit files are present), or None if the listing returned no files at all.
- Attributes
- protected
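The reconciliation step in listDeltaCompactedDeltaAndCheckpointFiles can be sketched as a merge of two listings. Everything here is a simplifying assumption: `CommitFile` is a hypothetical stand-in for FileStatus, the two listings are passed in as plain sequences instead of being fetched in parallel, and preferring the file-system copy when both sides report the same version is one possible policy, not necessarily the one Delta uses.

```scala
object ReconcileSketch {
  // Hypothetical stand-in for a listed commit file.
  final case class CommitFile(version: Long, path: String)

  // Merge backfilled commits (file system) with un-backfilled commits
  // (commit-owner), keeping versions within [startVersion, versionToLoad].
  def reconcile(
      fromFileSystem: Seq[CommitFile],
      fromCommitOwner: Seq[CommitFile],
      startVersion: Long,
      versionToLoad: Option[Long]): Seq[CommitFile] = {
    def inRange(c: CommitFile): Boolean =
      c.version >= startVersion && versionToLoad.forall(c.version <= _)
    val backfilled = fromFileSystem.filter(inRange)
    val seen = backfilled.map(_.version).toSet
    // Drop commit-owner entries whose version was already backfilled.
    val unbackfilled = fromCommitOwner.filter(c => inRange(c) && !seen(c.version))
    (backfilled ++ unbackfilled).sortBy(_.version)
  }
}
```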
- def listFromOrNone(startVersion: Long): Option[Iterator[FileStatus]]
Returns an iterator containing a list of files found from the provided path
- Attributes
- protected
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val snapshotLock: ReentrantLock
Use ReentrantLock to allow us to call lockInterruptibly.
- Attributes
- protected
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def throwNonExistentVersionError(versionToLoad: Long): Unit
- def toString(): String
- Definition Classes
- AnyRef → Any
- def unsafeVolatileSnapshot: Snapshot
Returns the current snapshot. This does not automatically update().
WARNING: This is not guaranteed to give you the latest snapshot of the log, nor stay consistent across multiple accesses. If you need the latest snapshot, it is recommended to fetch it using deltaLog.update() and save the returned snapshot so it does not unexpectedly change from under you. See how OptimisticTransaction and DeltaScan use the snapshot as examples for write/read paths respectively. This API should only be used in scenarios where any recent snapshot will suffice and an update is undesired, or by internal code that holds the DeltaLog lock to prevent races.
- def update(stalenessAcceptable: Boolean = false, checkIfUpdatedSinceTs: Option[Long] = None): Snapshot
Update ActionLog by applying the new delta files if any.
- stalenessAcceptable
Whether we can accept working with a stale version of the table. If the table has surpassed our staleness tolerance, we will update to the latest state of the table synchronously. If staleness is acceptable, and the table hasn't passed the staleness tolerance, we will kick off a job in the background to update the table state, and can return a stale snapshot in the meantime.
- checkIfUpdatedSinceTs
Skip the update if we've already updated the snapshot since the specified timestamp.
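The advice in the unsafeVolatileSnapshot warning, to fetch a snapshot once and hold on to the returned reference, can be shown with a toy model. `ToyLog` is hypothetical and is not DeltaLog: its `update()` simply returns the cached value, whereas a real update would first refresh from the log, and `commit()` stands in for another writer advancing the table.

```scala
object CaptureSnapshotSketch {
  // Hypothetical stand-in for a snapshot: only the table version.
  final case class Snap(version: Long)

  // Toy holder with a volatile "current snapshot" field.
  final class ToyLog {
    @volatile private var current: Snap = Snap(0L)
    def unsafeVolatile: Snap = current
    def update(): Snap = current
    def commit(): Unit = synchronized { current = Snap(current.version + 1) }
  }

  // Reads the version twice through one captured reference, with a commit in between.
  def stableRead(log: ToyLog): (Long, Long) = {
    val captured = log.update()
    val first = captured.version
    log.commit()
    (first, captured.version)
  }

  // Re-reads the volatile field each time, so the two reads can disagree.
  def unstableRead(log: ToyLog): (Long, Long) = {
    val first = log.unsafeVolatile.version
    log.commit()
    (first, log.unsafeVolatile.version)
  }
}
```

In the toy model, the captured reference keeps reporting the same version while repeated reads of the volatile field drift, which is the hazard the warning describes.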
- def updateAfterCommit(committedVersion: Long, commit: Commit, newChecksumOpt: Option[VersionChecksum], preCommitLogSegment: LogSegment): Snapshot
Called after committing a transaction and updating the state of the table.
- committedVersion
the version that was committed
- commit
information about the commit file.
- newChecksumOpt
the checksum for the new commit, if available. Usually None, since the commit would have just finished.
- preCommitLogSegment
the log segment of the table prior to commit
- def updateInternal(isAsync: Boolean): Snapshot
Queries the store for new delta files and applies them to the current state. Note: the caller should hold snapshotLock before calling this method.
- Attributes
- protected
- def useCompactedDeltasForLogSegment(deltasAndCompactedDeltas: Seq[FileStatus], deltasAfterCheckpoint: Array[FileStatus], latestCommitVersion: Long, checkpointVersionToUse: Long): Array[FileStatus]
- deltasAndCompactedDeltas
all deltas or compacted deltas which could be used
- deltasAfterCheckpoint
deltas after the last checkpoint file
- latestCommitVersion
commit version for which we are trying to create a Snapshot
- checkpointVersionToUse
underlying checkpoint version to use in the Snapshot, -1 if no checkpoint is used.
- returns
A list of deltas/compacted-deltas which can be used to construct the LogSegment instead of deltasAfterCheckpoint.
- Attributes
- protected
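One way to picture what useCompactedDeltasForLogSegment returns is a greedy cover of the versions after the checkpoint. The sketch below is an assumed strategy for illustration, not the algorithm Delta uses: `LogFile` is a hypothetical stand-in in which a plain delta has `start == end` and a compacted delta spans a range.

```scala
object CompactedDeltaSketch {
  // Hypothetical stand-in: a file covering commit versions start..end inclusive.
  final case class LogFile(start: Long, end: Long)

  // Starting just after the checkpoint, repeatedly pick the candidate that
  // begins at the next needed version and reaches furthest without passing
  // `latest`. Fails if some version has no covering file.
  def choose(candidates: Seq[LogFile], latest: Long, checkpoint: Long): Vector[LogFile] = {
    val byStart = candidates
      .filter(f => f.start > checkpoint && f.end <= latest)
      .groupBy(_.start)
    val chosen = Vector.newBuilder[LogFile]
    var next = checkpoint + 1
    while (next <= latest) {
      val best = byStart.get(next).map(_.maxBy(_.end)).getOrElse(
        throw new IllegalArgumentException(s"no file covers version $next"))
      chosen += best
      next = best.end + 1
    }
    chosen.result()
  }
}
```

With no checkpoint in use, passing -1 as `checkpoint` makes the cover start at version 0, matching the convention in the parameter description.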
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def withSnapshotLockInterruptibly[T](body: => T): T
Run body inside the snapshotLock lock using lockInterruptibly so that the thread can be interrupted when waiting for the lock.
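The locking pattern described for withSnapshotLockInterruptibly can be sketched with `java.util.concurrent.locks.ReentrantLock` directly. The object and method names below are hypothetical, and the lock is a standalone field rather than the trait's snapshotLock.

```scala
import java.util.concurrent.locks.ReentrantLock

object InterruptibleLockSketch {
  val lock = new ReentrantLock()

  // Acquire the lock with lockInterruptibly, which throws InterruptedException
  // if the thread is interrupted while waiting, then always release it.
  def withLockInterruptibly[T](body: => T): T = {
    lock.lockInterruptibly()
    try body
    finally lock.unlock()
  }
}
```

Because the lock is reentrant, a thread that already holds it can call the helper again without deadlocking.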
Deprecated Value Members
- def snapshot: Snapshot
WARNING: This API is unsafe and deprecated. It will be removed in future versions. Use the above unsafeVolatileSnapshot to get the most recently cached snapshot on the cluster.
- Annotations
- @deprecated
- Deprecated
(Since version 12.0) This method is deprecated and will be removed in future versions. Use unsafeVolatileSnapshot instead