trait SnapshotManagement extends AnyRef
Manages the creation, computation, and access of Snapshots for Delta tables. Responsibilities include:
- Figuring out the set of files that are required to compute a specific version of a table
- Updating and exposing the latest snapshot of the Delta table in a thread-safe manner
- Self Type
- DeltaLog
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
createLogSegment(versionToLoad: Option[Long] = None, oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider] = None, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient] = None, catalogTableOpt: Option[CatalogTable] = None, lastCheckpointInfo: Option[LastCheckpointInfo] = None): Option[LogSegment]
Get a list of files that can be used to compute a Snapshot at version versionToLoad. If versionToLoad is not provided, this method generates the list of files needed to load the latest version of the Delta table. This method also performs checks to ensure that the delta files are contiguous.
- versionToLoad
A specific version to load. Typically used with time travel and the Delta streaming source. If not provided, we will try to load the latest version of the table.
- oldCheckpointProviderOpt
The CheckpointProvider from the previous snapshot. This is used as a start version for the listing when startCheckpoint is unavailable. This is also used to initialize the LogSegment.
- tableCommitCoordinatorClientOpt
the optional commit-coordinator client to use for fetching un-backfilled commits.
- catalogTableOpt
the optional catalog table to pass to the commit coordinator client.
- lastCheckpointInfo
LastCheckpointInfo from the _last_checkpoint. This could be used to initialize the Snapshot's LogSegment.
- returns
Some LogSegment to build a Snapshot if files do exist after the given startCheckpoint. None, if the directory was missing or empty.
- Attributes
- protected
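The contiguity check mentioned in the description can be sketched as follows. This is a minimal illustration, not the actual implementation: the object ContiguityCheck and the method isContiguous are hypothetical names, and the versions are assumed to have been parsed from the delta file names and sorted already.

```scala
object ContiguityCheck {
  /** Returns true when each version is exactly one greater than its predecessor.
    * Hypothetical helper: an empty or single-element sequence counts as contiguous. */
  def isContiguous(versions: Seq[Long]): Boolean =
    versions.zip(versions.drop(1)).forall { case (prev, next) => next == prev + 1 }
}
```

A caller would typically fail loudly when isContiguous returns false, since a gap means a commit is missing from the listing.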
-
def
createSnapshot(initSegment: LogSegment, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], checksumOpt: Option[VersionChecksum]): Snapshot
- Attributes
- protected
-
def
createSnapshotAfterCommit(initSegment: LogSegment, newChecksumOpt: Option[VersionChecksum], tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], committedVersion: Long): Snapshot
Creates a snapshot for a new delta commit.
- Attributes
- protected
-
def
createSnapshotAtInit(initialCatalogTable: Option[CatalogTable]): Unit
Load the Snapshot for this Delta table at initialization. This method uses the lastCheckpoint file as a hint on where to start listing the transaction log directory. If the _delta_log directory doesn't exist, this method will return an InitialSnapshot.
- Attributes
- protected
-
def
createSnapshotFromGivenOrEquivalentLogSegment(initSegment: LogSegment, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable])(snapshotCreator: (LogSegment) ⇒ Snapshot): Snapshot
Create a Snapshot from the given LogSegment. If snapshot creation fails, we search for an equivalent LogSegment that uses a different checkpoint and retry up to DeltaSQLConf.DELTA_SNAPSHOT_LOADING_MAX_RETRIES times.
- Attributes
- protected
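The retry behaviour described for this method can be pictured with a small generic sketch. Everything here is a hypothetical stand-in rather than Delta's API: S plays the role of a LogSegment, findEquivalent stands for the search for an equivalent segment built on a different checkpoint, and maxRetries stands for the configured retry limit.

```scala
import scala.util.{Failure, Success, Try}

object RetryWithEquivalent {
  /** Tries `create` on `segment`. On failure, asks `findEquivalent` for another
    * candidate and retries, at most `maxRetries` additional times.
    * All names are illustrative assumptions, not part of Delta Lake. */
  def createWithRetries[S, R](segment: S, maxRetries: Int)(
      findEquivalent: S => Option[S])(create: S => R): R =
    Try(create(segment)) match {
      case Success(result) => result
      case Failure(error) =>
        findEquivalent(segment) match {
          case Some(alternative) if maxRetries > 0 =>
            createWithRetries(alternative, maxRetries - 1)(findEquivalent)(create)
          case _ =>
            throw error // no alternative left, or retries exhausted
        }
    }
}
```

The original failure is rethrown once no further candidate exists, so the caller sees the real cause instead of a generic retry error.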
-
val
currentSnapshot: CapturedSnapshot
Cached latest snapshot. This is initialized in createSnapshotAtInit.
- Attributes
- protected
- Annotations
- @volatile()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
getCheckpointVersion(lastCheckpointInfoOpt: Option[LastCheckpointInfo], oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider]): Long
Returns the last known checkpoint version based on LastCheckpointInfo or CheckpointProvider. Returns -1 if neither is available.
- Attributes
- protected
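The fallback to -1 can be illustrated with plain Options. This is a sketch under stated assumptions: both inputs are reduced to Option[Long] versions, the object and method names are hypothetical, and the order of preference (last-checkpoint info first, then the old provider) is an assumption rather than something this page states.

```scala
object CheckpointVersionFallback {
  /** Picks the first known checkpoint version, or -1 when neither source has one.
    * Parameters are simplified stand-ins for LastCheckpointInfo and the
    * previous checkpoint provider. */
  def checkpointVersion(fromLastCheckpointInfo: Option[Long],
                        fromOldProvider: Option[Long]): Long =
    fromLastCheckpointInfo.orElse(fromOldProvider).getOrElse(-1L)
}
```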
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getLogSegmentAfterCommit(tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], oldCheckpointProvider: UninitializedCheckpointProvider): LogSegment
- Attributes
- protected[delta]
-
def
getLogSegmentAfterCommit(committedVersion: Long, newChecksumOpt: Option[VersionChecksum], preCommitLogSegment: LogSegment, commit: Commit, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], oldCheckpointProvider: CheckpointProvider): LogSegment
Used to compute the LogSegment after a commit, by adding the delta file with the specified version to the preCommitLogSegment (which must match the immediately preceding version).
- Attributes
- protected[delta]
-
def
getLogSegmentForVersion(versionToLoad: Option[Long], files: Option[Array[FileStatus]], validateLogSegmentWithoutCompactedDeltas: Boolean, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider], lastCheckpointInfo: Option[LastCheckpointInfo]): Option[LogSegment]
Helper function for the getLogSegmentForVersion above. It is called with a provided list of files and tries to construct a new LogSegment from that list. *Note*: If the table is a coordinated-commits table, the commit-coordinator MUST be passed to correctly list the commits.
- Attributes
- protected
-
def
getSnapshotAt(version: Long, lastCheckpointHint: Option[CheckpointInstance] = None, catalogTableOpt: Option[CatalogTable] = None): Snapshot
Get the snapshot at version.
-
def
getSnapshotForLogSegmentInternal(previousSnapshotOpt: Option[Snapshot], segmentOpt: Option[LogSegment], tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], isAsync: Boolean): Snapshot
Creates a Snapshot for the given segmentOpt.
- Attributes
- protected
-
def
getUpdatedLogSegment(oldLogSegment: LogSegment, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable]): (LogSegment, Seq[FileStatus])
Get the newest logSegment, using the previous logSegment as a hint. This is faster than doing a full update, but it won't work if the table's log directory was replaced.
-
def
getUpdatedSnapshot(oldSnapshotOpt: Option[Snapshot], initialSegmentForNewSnapshot: Option[LogSegment], initialTableCommitCoordinatorClient: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], isAsync: Boolean): Snapshot
Updates and installs a new snapshot in the currentSnapshot. This method takes care of recursively creating new snapshots if the commit-coordinator has changed.
- oldSnapshotOpt
The previous snapshot, if any.
- initialSegmentForNewSnapshot
the log segment constructed for the new snapshot
- initialTableCommitCoordinatorClient
the commit-coordinator used for constructing the initialSegmentForNewSnapshot
- catalogTableOpt
the optional catalog table to pass to the commit coordinator client.
- isAsync
Whether the update is async.
- returns
The new snapshot.
- Attributes
- protected
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
installSnapshot(newSnapshot: Snapshot, updateTimestamp: Long): Snapshot
Installs the given newSnapshot as the currentSnapshot.
- Attributes
- protected
-
def
isCurrentlyStale: (Long) ⇒ Boolean
Checks if the given timestamp is outside the current staleness window
- Attributes
- protected
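Because this member returns a function from a timestamp to a Boolean, a staleness check of this shape can be sketched as a predicate factory. The names, the millisecond units, and the rule that a non-positive window treats every timestamp as stale are all assumptions made for illustration.

```scala
object StalenessWindow {
  /** Builds a predicate that reports whether a last-update timestamp (in ms)
    * falls outside the allowed staleness window.
    * Assumption: a window of zero or less means every timestamp is stale. */
  def isStale(nowMs: Long, windowMs: Long): Long => Boolean =
    lastUpdateMs => windowMs <= 0 || nowMs - lastUpdateMs >= windowMs
}
```

Capturing the current time once when the predicate is built keeps repeated checks consistent with each other.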
-
def
isDeltaCommitOrCheckpointFile(path: Path): Boolean
Returns true if the path is a Delta log file. A Delta log file can be a delta commit file (e.g., 000000000.json) or a checkpoint file (e.g., 000000001.checkpoint.00001.00003.parquet).
- path
Path of a file
- returns
Whether the file is a Delta log file
- Attributes
- protected
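A simplified version of such a file-name check, covering only the two shapes shown in the examples above, might look like the sketch below. The object name and the regular expressions are assumptions for illustration; the real method takes a Hadoop Path and may recognize additional checkpoint formats.

```scala
object DeltaLogFileNames {
  // Commit file: digits followed by ".json".
  private val CommitFile = """\d+\.json""".r
  // Checkpoint file: digits, ".checkpoint", an optional ".part.total" pair, then ".parquet".
  private val CheckpointFile = """\d+\.checkpoint(\.\d+\.\d+)?\.parquet""".r

  /** True when the last path component looks like a commit or checkpoint file. */
  def isCommitOrCheckpoint(path: String): Boolean = {
    val name = path.substring(path.lastIndexOf('/') + 1)
    CommitFile.pattern.matcher(name).matches() ||
      CheckpointFile.pattern.matcher(name).matches()
  }
}
```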
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
val
lastSeenChecksumFileStatusOpt: Option[FileStatus]
Cached fileStatus for the latest CRC file seen in the deltaLog.
- Attributes
- protected
- Annotations
- @volatile()
-
final
def
listDeltaCompactedDeltaAndCheckpointFiles(startVersion: Long, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], versionToLoad: Option[Long], includeMinorCompactions: Boolean): Option[Array[FileStatus]]
This method is designed to efficiently and reliably list delta, compacted delta, and checkpoint files associated with a Delta Lake table. It makes parallel calls to both the file-system and a commit-coordinator (if available), reconciles the results to account for asynchronous backfill operations, and ensures a comprehensive list of file statuses without missing any concurrently backfilled files. *Note*: If the table is a coordinated-commits table, the commit-coordinator client MUST be passed to correctly list the commits.
- startVersion
the version to start listing from. Inclusive.
- tableCommitCoordinatorClientOpt
the optional commit-coordinator client to use for fetching un-backfilled commits.
- catalogTableOpt
the optional catalog table to pass to the commit coordinator client.
- versionToLoad
the optional parameter to set the max version we should return. Inclusive.
- includeMinorCompactions
Whether to include minor compaction files in the result
- returns
Some array of files found (possibly empty, if no usable commit files are present), or None if the listing returned no files at all.
- Attributes
- protected
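The reconciliation step can be pictured with a deliberately simplified model. In this sketch LogFile, its backfilled flag, and the rule of preferring the backfilled copy when both listings report the same version are all assumptions made for illustration; the real method works on FileStatus values and has to handle more cases.

```scala
object ListingReconciliation {
  /** Hypothetical stand-in for a listed commit file. */
  final case class LogFile(version: Long, backfilled: Boolean)

  /** Merges the two listings into one entry per version, sorted by version.
    * Assumption: when both sources report a version, the backfilled copy wins. */
  def reconcile(fromFileSystem: Seq[LogFile], fromCoordinator: Seq[LogFile]): Seq[LogFile] =
    (fromFileSystem ++ fromCoordinator)
      .groupBy(_.version)
      .values
      .map(candidates => candidates.find(_.backfilled).getOrElse(candidates.head))
      .toSeq
      .sortBy(_.version)
}
```

Merging by version is what prevents a commit from being dropped or counted twice when it is backfilled between the two listings.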
-
def
listDeltaCompactedDeltaCheckpointFilesAndLatestChecksumFile(startVersion: Long, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], versionToLoad: Option[Long], includeMinorCompactions: Boolean): (Option[Array[FileStatus]], Option[FileStatus])
This method is designed to efficiently and reliably list delta, compacted delta, and checkpoint files associated with a Delta Lake table. It makes parallel calls to both the file-system and a commit-coordinator (if available), reconciles the results to account for asynchronous backfill operations, and ensures a comprehensive list of file statuses without missing any concurrently backfilled files. *Note*: If the table is a coordinated-commits table, the commit coordinator MUST be passed to correctly list the commits. The function also collects the latest checksum file found in the listings and returns it.
- startVersion
the version to start listing from. Inclusive.
- tableCommitCoordinatorClientOpt
the optional commit coordinator to use for fetching un-backfilled commits.
- catalogTableOpt
the optional catalog table to pass to the commit coordinator client.
- versionToLoad
the optional parameter to set the max version we should return. Inclusive.
- includeMinorCompactions
Whether to include minor compaction files in the result
- returns
A tuple where the first element is an array of log files (possibly empty, if no usable log files are found), and the second element is the latest checksum file found which has a version less than or equal to versionToLoad.
- Attributes
- protected
-
def
listFromOrNone(startVersion: Long): Option[Iterator[FileStatus]]
Returns an iterator containing a list of files found from the provided path.
- Attributes
- protected
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
val
snapshotLock: ReentrantLock
Use ReentrantLock to allow us to call lockInterruptibly.
- Attributes
- protected
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
throwNonExistentVersionError(versionToLoad: Long): Unit
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
def
unsafeVolatileSnapshot: Snapshot
Returns the current snapshot. This does not automatically update().
WARNING: This is not guaranteed to give you the latest snapshot of the log, nor to stay consistent across multiple accesses. If you need the latest snapshot, fetch it using deltaLog.update() and save the returned snapshot so that it does not unexpectedly change from under you. See how OptimisticTransaction and DeltaScan use the snapshot as examples for the write and read paths respectively. This API should only be used in scenarios where any recent snapshot will suffice and an update is undesired, or by internal code that holds the DeltaLog lock to prevent races.
-
def
update(stalenessAcceptable: Boolean = false, checkIfUpdatedSinceTs: Option[Long] = None, catalogTableOpt: Option[CatalogTable] = None): Snapshot
Update ActionLog by applying the new delta files if any.
- stalenessAcceptable
Whether we can accept working with a stale version of the table. If the table has surpassed our staleness tolerance, we will update to the latest state of the table synchronously. If staleness is acceptable, and the table hasn't passed the staleness tolerance, we will kick off a job in the background to update the table state, and can return a stale snapshot in the meantime.
- checkIfUpdatedSinceTs
Skip the update if we've already updated the snapshot since the specified timestamp.
- catalogTableOpt
The catalog table of the current table.
-
def
updateAfterCommit(committedVersion: Long, commit: Commit, newChecksumOpt: Option[VersionChecksum], preCommitLogSegment: LogSegment, catalogTableOpt: Option[CatalogTable]): Snapshot
Called after committing a transaction and updating the state of the table.
- committedVersion
the version that was committed
- commit
information about the commit file.
- newChecksumOpt
the checksum for the new commit, if available. Usually None, since the commit would have just finished.
- preCommitLogSegment
the log segment of the table prior to commit
- catalogTableOpt
the current catalog table
-
def
updateInternal(isAsync: Boolean, catalogTableOpt: Option[CatalogTable]): Snapshot
Queries the store for new delta files and applies them to the current state. Note: the caller should hold snapshotLock before calling this method.
- Attributes
- protected
-
def
useCompactedDeltasForLogSegment(deltasAndCompactedDeltas: Seq[FileStatus], deltasAfterCheckpoint: Array[FileStatus], latestCommitVersion: Long, checkpointVersionToUse: Long): Array[FileStatus]
- deltasAndCompactedDeltas
all deltas or compacted deltas which could be used
- deltasAfterCheckpoint
deltas after the last checkpoint file
- latestCommitVersion
commit version for which we are trying to create a Snapshot
- checkpointVersionToUse
underlying checkpoint version to use in the Snapshot, -1 if no checkpoint is used.
- returns
A list of deltas/compacted-deltas which can be used to construct the LogSegment instead of deltasAfterCheckpoint.
- Attributes
- protected
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
withSnapshotLockInterruptibly[T](body: ⇒ T): T
Run body inside the snapshotLock lock using lockInterruptibly so that the thread can be interrupted when waiting for the lock.
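The locking pattern described here can be sketched with java.util.concurrent.locks.ReentrantLock alone. The object and method names below are hypothetical; only lockInterruptibly() and unlock() are real ReentrantLock calls, and the sketch does not claim to match the actual implementation.

```scala
import java.util.concurrent.locks.ReentrantLock

object InterruptibleLocking {
  private val lock = new ReentrantLock()

  /** Runs `body` while holding `lock`. lockInterruptibly() throws
    * InterruptedException if the thread is interrupted while waiting,
    * in which case `body` never runs and the lock is not held. */
  def withLockInterruptibly[T](body: => T): T = {
    lock.lockInterruptibly()
    try body
    finally lock.unlock()
  }
}
```

Acquiring the lock before the try block matters: if acquisition fails, the finally clause must not attempt to unlock a lock the thread never held.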