
trait SnapshotManagement extends AnyRef

Manages the creation, computation, and access of Snapshots for Delta tables. Responsibilities include:

  • Figuring out the set of files that are required to compute a specific version of a table
  • Updating and exposing the latest snapshot of the Delta table in a thread-safe manner
Self Type
DeltaLog
Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. def createLogSegment(versionToLoad: Option[Long] = None, oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider] = None, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient] = None, catalogTableOpt: Option[CatalogTable] = None, lastCheckpointInfo: Option[LastCheckpointInfo] = None): Option[LogSegment]

    Gets the list of files that can be used to compute a Snapshot at version versionToLoad. If versionToLoad is not provided, generates the list of files needed to load the latest version of the Delta table. This method also checks that the delta files are contiguous.

    versionToLoad

    A specific version to load. Typically used with time travel and the Delta streaming source. If not provided, we will try to load the latest version of the table.

    oldCheckpointProviderOpt

    The CheckpointProvider from the previous snapshot. This is used as a start version for the listing when startCheckpoint is unavailable. This is also used to initialize the LogSegment.

    tableCommitCoordinatorClientOpt

    the optional commit-coordinator client to use for fetching un-backfilled commits.

    catalogTableOpt

    the optional catalog table to pass to the commit coordinator client.

    lastCheckpointInfo

    LastCheckpointInfo from the _last_checkpoint. This could be used to initialize the Snapshot's LogSegment.

    returns

    Some LogSegment to build a Snapshot if files do exist after the given startCheckpoint. None, if the directory was missing or empty.

    Attributes
    protected
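
The contiguity check mentioned for createLogSegment can be pictured with a minimal sketch. It works on bare version numbers rather than real FileStatus entries, and `ContiguitySketch` and `verifyContiguous` are hypothetical names used only for illustration, not part of this trait.

```scala
object ContiguitySketch {
  /** Throws IllegalArgumentException unless the versions form a gap-free ascending run. */
  def verifyContiguous(versions: Seq[Long]): Unit = {
    // Pair every version with its successor and compare them.
    versions.zip(versions.drop(1)).foreach { case (prev, next) =>
      // Each version must be exactly one greater than the one before it.
      require(next == prev + 1, s"Versions are not contiguous: $prev is followed by $next")
    }
  }
}
```

An empty or single-element list passes trivially, since there are no adjacent pairs to compare.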
  7. def createSnapshot(initSegment: LogSegment, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], checksumOpt: Option[VersionChecksum]): Snapshot
    Attributes
    protected
  8. def createSnapshotAfterCommit(initSegment: LogSegment, newChecksumOpt: Option[VersionChecksum], tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], committedVersion: Long): Snapshot

    Creates a snapshot for a new delta commit.

    Attributes
    protected
  9. def createSnapshotAtInit(initialCatalogTable: Option[CatalogTable]): Unit

    Load the Snapshot for this Delta table at initialization. This method uses the lastCheckpoint file as a hint on where to start listing the transaction log directory. If the _delta_log directory doesn't exist, this method will return an InitialSnapshot.

    Attributes
    protected
  10. def createSnapshotFromGivenOrEquivalentLogSegment(initSegment: LogSegment, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable])(snapshotCreator: (LogSegment) ⇒ Snapshot): Snapshot

    Create a Snapshot from the given LogSegment. If snapshot creation fails, we search for an equivalent LogSegment that uses a different checkpoint and retry, up to DeltaSQLConf.DELTA_SNAPSHOT_LOADING_MAX_RETRIES times.

    Attributes
    protected
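
The retry behaviour described for createSnapshotFromGivenOrEquivalentLogSegment might look roughly like the sketch below. It is not the actual implementation: `Segment`, `findEquivalent`, and `createWithRetries` are hypothetical stand-ins, and the choice to rethrow the first error once retries run out is an assumption.

```scala
import scala.util.{Failure, Success, Try}

object SnapshotRetrySketch {
  // Hypothetical stand-in for a LogSegment: a checkpoint version plus the target version.
  final case class Segment(checkpointVersion: Long, version: Long)

  /**
   * Tries `create` on `init`. On failure, asks `findEquivalent` for a segment that
   * reaches the same version from a different checkpoint and tries again, at most
   * `maxRetries` times. Rethrows the first error if no attempt succeeds (an assumption).
   */
  def createWithRetries[S](init: Segment, maxRetries: Int)(
      findEquivalent: Segment => Option[Segment])(create: Segment => S): S = {
    def loop(segment: Segment, attempt: Int, firstError: Option[Throwable]): S =
      Try(create(segment)) match {
        case Success(snapshot) => snapshot
        case Failure(e) =>
          val error = firstError.getOrElse(e)
          if (attempt >= maxRetries) throw error
          findEquivalent(segment) match {
            case Some(alternative) => loop(alternative, attempt + 1, Some(error))
            case None => throw error // no other checkpoint to fall back to
          }
      }
    loop(init, 0, None)
  }
}
```

Passing the snapshot constructor as a function mirrors the curried snapshotCreator parameter in the signature above, so the retry logic stays independent of how a snapshot is built.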
  11. val currentSnapshot: CapturedSnapshot

    Cached latest snapshot. This is initialized in createSnapshotAtInit

    Attributes
    protected
    Annotations
    @volatile()
  12. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  14. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  15. def getCheckpointVersion(lastCheckpointInfoOpt: Option[LastCheckpointInfo], oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider]): Long

    Returns the last known checkpoint version based on LastCheckpointInfo or CheckpointProvider. Returns -1 if neither is available.

    Attributes
    protected
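
As an illustration of the -1 fallback in getCheckpointVersion, here is a minimal sketch using hypothetical stub types that carry only a version field. Preferring LastCheckpointInfo over the old CheckpointProvider is an assumption; the text above does not state which source wins when both are present.

```scala
object CheckpointVersionSketch {
  // Hypothetical stand-ins for LastCheckpointInfo and UninitializedCheckpointProvider.
  final case class LastCheckpointInfoStub(version: Long)
  final case class CheckpointProviderStub(version: Long)

  def checkpointVersion(
      lastCheckpointInfoOpt: Option[LastCheckpointInfoStub],
      oldCheckpointProviderOpt: Option[CheckpointProviderStub]): Long =
    lastCheckpointInfoOpt.map(_.version)               // assumed to take precedence
      .orElse(oldCheckpointProviderOpt.map(_.version)) // otherwise use the old provider
      .getOrElse(-1L)                                  // -1 when neither is available
}
```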
  16. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  17. def getLogSegmentAfterCommit(tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], oldCheckpointProvider: UninitializedCheckpointProvider): LogSegment
    Attributes
    protected[delta]
  18. def getLogSegmentAfterCommit(committedVersion: Long, newChecksumOpt: Option[VersionChecksum], preCommitLogSegment: LogSegment, commit: Commit, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], oldCheckpointProvider: CheckpointProvider): LogSegment

    Used to compute the LogSegment after a commit, by adding the delta file with the specified version to the preCommitLogSegment (which must match the immediately preceding version).

    Attributes
    protected[delta]
  19. def getLogSegmentForVersion(versionToLoad: Option[Long], files: Option[Array[FileStatus]], validateLogSegmentWithoutCompactedDeltas: Boolean, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider], lastCheckpointInfo: Option[LastCheckpointInfo]): Option[LogSegment]

    Helper function for the getLogSegmentForVersion above. Called with a provided files list, from which it tries to construct a new LogSegment. *Note*: If the table is a coordinated-commits table, the commit-coordinator MUST be passed to correctly list the commits.

    Attributes
    protected
  20. def getSnapshotAt(version: Long, lastCheckpointHint: Option[CheckpointInstance] = None, catalogTableOpt: Option[CatalogTable] = None): Snapshot

    Get the snapshot at version.

  21. def getSnapshotForLogSegmentInternal(previousSnapshotOpt: Option[Snapshot], segmentOpt: Option[LogSegment], tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], isAsync: Boolean): Snapshot

    Creates a Snapshot for the given segmentOpt

    Attributes
    protected
  22. def getUpdatedLogSegment(oldLogSegment: LogSegment, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable]): (LogSegment, Seq[FileStatus])

    Get the newest logSegment, using the previous logSegment as a hint. This is faster than doing a full update, but it won't work if the table's log directory was replaced.

  23. def getUpdatedSnapshot(oldSnapshotOpt: Option[Snapshot], initialSegmentForNewSnapshot: Option[LogSegment], initialTableCommitCoordinatorClient: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], isAsync: Boolean): Snapshot

    Updates and installs a new snapshot in the currentSnapshot. This method takes care of recursively creating new snapshots if the commit-coordinator has changed.

    oldSnapshotOpt

    The previous snapshot, if any.

    initialSegmentForNewSnapshot

    the log segment constructed for the new snapshot

    initialTableCommitCoordinatorClient

    the commit-coordinator used for constructing the initialSegmentForNewSnapshot

    catalogTableOpt

    the optional catalog table to pass to the commit coordinator client.

    isAsync

    Whether the update is async.

    returns

    The new snapshot.

    Attributes
    protected
  24. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  25. def installSnapshot(newSnapshot: Snapshot, updateTimestamp: Long): Snapshot

    Installs the given newSnapshot as the currentSnapshot

    Attributes
    protected
  26. def isCurrentlyStale: (Long) ⇒ Boolean

    Checks if the given timestamp is outside the current staleness window

    Attributes
    protected
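
Because isCurrentlyStale returns a function from a timestamp to a Boolean, the staleness window can be fixed once and then applied to any number of timestamps. The sketch below shows one way such a predicate could be built. `StalenessSketch`, `nowMs`, and `stalenessLimitMs` are hypothetical names, and treating a timestamp that sits exactly on the window boundary as stale is an assumption.

```scala
object StalenessSketch {
  /** Builds a predicate that is true when `timestampMs` lies outside the staleness window. */
  def isStale(nowMs: Long, stalenessLimitMs: Long): Long => Boolean =
    // The window is captured here; the returned function only does the comparison.
    timestampMs => nowMs - timestampMs >= stalenessLimitMs
}
```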
  27. def isDeltaCommitOrCheckpointFile(path: Path): Boolean

    Returns true if the path is a Delta log file. A Delta log file can be a delta commit file (e.g., 000000000.json) or a checkpoint file (e.g., 000000001.checkpoint.00001.00003.parquet).

    path

    Path of a file

    returns

    Whether the file is a Delta log file.

    Attributes
    protected
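
A name-based check in the spirit of isDeltaCommitOrCheckpointFile can be sketched with two regular expressions. The patterns are assumptions inferred from the two example file names in the description. They operate on a plain file name rather than a Hadoop Path and may not cover every checkpoint naming scheme.

```scala
object DeltaLogFileNameSketch {
  // Assumed shapes: "<digits>.json" for commits, and
  // "<digits>.checkpoint[.<part>.<total>].parquet" for checkpoints.
  private val commitFile = """\d+\.json""".r
  private val checkpointFile = """\d+\.checkpoint(\.\d+\.\d+)?\.parquet""".r

  /** True when the whole file name matches one of the two assumed shapes. */
  def isCommitOrCheckpointName(fileName: String): Boolean =
    commitFile.pattern.matcher(fileName).matches() ||
      checkpointFile.pattern.matcher(fileName).matches()
}
```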
  28. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  29. val lastSeenChecksumFileStatusOpt: Option[FileStatus]

    Cached fileStatus for the latest CRC file seen in the deltaLog.

    Attributes
    protected
    Annotations
    @volatile()
  30. final def listDeltaCompactedDeltaAndCheckpointFiles(startVersion: Long, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], versionToLoad: Option[Long], includeMinorCompactions: Boolean): Option[Array[FileStatus]]

    Efficiently and reliably lists the delta, compacted delta, and checkpoint files associated with a Delta Lake table. It makes parallel calls to both the file-system and a commit-coordinator (if available), reconciles the results to account for asynchronous backfill operations, and ensures a comprehensive list of file statuses without missing any concurrently backfilled files. *Note*: If the table is a coordinated-commits table, the commit-coordinator client MUST be passed to correctly list the commits.

    startVersion

    the version to start listing from. Inclusive.

    tableCommitCoordinatorClientOpt

    the optional commit-coordinator client to use for fetching un-backfilled commits.

    catalogTableOpt

    the optional catalog table to pass to the commit coordinator client.

    versionToLoad

    the optional parameter to set the max version we should return. Inclusive.

    includeMinorCompactions

    Whether to include minor compaction files in the result

    returns

    Some array of files found (possibly empty, if no usable commit files are present), or None if the listing returned no files at all.

    Attributes
    protected
  31. def listDeltaCompactedDeltaCheckpointFilesAndLatestChecksumFile(startVersion: Long, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], versionToLoad: Option[Long], includeMinorCompactions: Boolean): (Option[Array[FileStatus]], Option[FileStatus])

    Efficiently and reliably lists the delta, compacted delta, and checkpoint files associated with a Delta Lake table. It makes parallel calls to both the file-system and a commit-coordinator (if available), reconciles the results to account for asynchronous backfill operations, and ensures a comprehensive list of file statuses without missing any concurrently backfilled files. *Note*: If the table is a coordinated-commits table, the commit coordinator MUST be passed to correctly list the commits. The function also collects the latest checksum file found in the listings and returns it.

    startVersion

    the version to start listing from. Inclusive.

    tableCommitCoordinatorClientOpt

    the optional commit coordinator to use for fetching un-backfilled commits.

    catalogTableOpt

    the optional catalog table to pass to the commit coordinator client.

    versionToLoad

    the optional parameter to set the max version we should return. Inclusive.

    includeMinorCompactions

    Whether to include minor compaction files in the result

    returns

    A tuple where the first element is an array of log files (possibly empty, if no usable log files are found), and the second element is the latest checksum file found which has a version less than or equal to versionToLoad.

    Attributes
    protected
  32. def listFromOrNone(startVersion: Long): Option[Iterator[FileStatus]]

    Returns an iterator containing a list of files found from the provided path

    Attributes
    protected
  33. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  34. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  35. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  36. val snapshotLock: ReentrantLock

    Use ReentrantLock to allow us to call lockInterruptibly

    Attributes
    protected
  37. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  38. def throwNonExistentVersionError(versionToLoad: Long): Unit
  39. def toString(): String
    Definition Classes
    AnyRef → Any
  40. def unsafeVolatileSnapshot: Snapshot

    Returns the current snapshot. This does not automatically update().

    WARNING: This is not guaranteed to give you the latest snapshot of the log, nor stay consistent across multiple accesses. If you need the latest snapshot, it is recommended to fetch it using deltaLog.update(); and save the returned snapshot so it does not unexpectedly change from under you. See how OptimisticTransaction and DeltaScan use the snapshot as examples for write/read paths respectively. This API should only be used in scenarios where any recent snapshot will suffice and an update is undesired, or by internal code that holds the DeltaLog lock to prevent races.

  41. def update(stalenessAcceptable: Boolean = false, checkIfUpdatedSinceTs: Option[Long] = None, catalogTableOpt: Option[CatalogTable] = None): Snapshot

    Update ActionLog by applying the new delta files if any.

    stalenessAcceptable

    Whether we can accept working with a stale version of the table. If the table has surpassed our staleness tolerance, we will update to the latest state of the table synchronously. If staleness is acceptable, and the table hasn't passed the staleness tolerance, we will kick off a job in the background to update the table state, and can return a stale snapshot in the meantime.

    checkIfUpdatedSinceTs

    Skip the update if we've already updated the snapshot since the specified timestamp.

    catalogTableOpt

    The catalog table of the current table.

  42. def updateAfterCommit(committedVersion: Long, commit: Commit, newChecksumOpt: Option[VersionChecksum], preCommitLogSegment: LogSegment, catalogTableOpt: Option[CatalogTable]): Snapshot

    Called after committing a transaction and updating the state of the table.

    committedVersion

    the version that was committed

    commit

    information about the commit file.

    newChecksumOpt

    the checksum for the new commit, if available. Usually None, since the commit would have just finished.

    preCommitLogSegment

    the log segment of the table prior to commit

    catalogTableOpt

    the current catalog table

  43. def updateInternal(isAsync: Boolean, catalogTableOpt: Option[CatalogTable]): Snapshot

    Queries the store for new delta files and applies them to the current state.

    Queries the store for new delta files and applies them to the current state. Note: the caller should hold snapshotLock before calling this method.

    Attributes
    protected
  44. def useCompactedDeltasForLogSegment(deltasAndCompactedDeltas: Seq[FileStatus], deltasAfterCheckpoint: Array[FileStatus], latestCommitVersion: Long, checkpointVersionToUse: Long): Array[FileStatus]

    deltasAndCompactedDeltas

    all deltas or compacted deltas which could be used

    deltasAfterCheckpoint

    deltas after the last checkpoint file

    latestCommitVersion

    commit version for which we are trying to create a Snapshot

    checkpointVersionToUse

    underlying checkpoint version to use in the Snapshot, -1 if no checkpoint is used.

    returns

    Returns a list of deltas/compacted-deltas which can be used to construct the LogSegment instead of deltasAfterCheckpoint.

    Attributes
    protected
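
One way to picture what useCompactedDeltasForLogSegment has to decide is a greedy walk from the checkpoint to the target version, preferring the widest file at each step. This is a simplified sketch, not the actual selection logic: `LogFile` and `chooseFiles` are hypothetical, each file is reduced to the inclusive version range it covers, and the greedy rule is an assumption.

```scala
object CompactedDeltaSketch {
  // Hypothetical stand-in: a plain delta covers one version (start == end),
  // a compacted delta covers the inclusive range [start, end].
  final case class LogFile(start: Long, end: Long)

  def chooseFiles(
      candidates: Seq[LogFile],
      checkpointVersion: Long, // -1 when no checkpoint is used
      latestCommitVersion: Long): Seq[LogFile] = {
    val chosen = Seq.newBuilder[LogFile]
    var next = checkpointVersion + 1
    while (next <= latestCommitVersion) {
      // Among files that start at `next` and do not overshoot the target, take the widest.
      val usable = candidates.filter(f => f.start == next && f.end <= latestCommitVersion)
      require(usable.nonEmpty, s"No delta file covers version $next")
      val best = usable.maxBy(_.end)
      chosen += best
      next = best.end + 1
    }
    chosen.result()
  }
}
```

A compacted file that extends past latestCommitVersion is skipped in favour of individual deltas, so the result never covers versions newer than the one requested.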
  45. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  46. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  47. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  48. def withSnapshotLockInterruptibly[T](body: ⇒ T): T

    Run body inside snapshotLock lock using lockInterruptibly so that the thread can be interrupted when waiting for the lock.
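
The pattern behind withSnapshotLockInterruptibly can be sketched with java.util.concurrent directly. `InterruptibleLockSketch` and its private lock are hypothetical stand-ins for the trait's snapshotLock, and the trait's real method may do more than this.

```scala
import java.util.concurrent.locks.ReentrantLock

object InterruptibleLockSketch {
  // Stand-in for snapshotLock.
  private val lock = new ReentrantLock()

  def withLockInterruptibly[T](body: => T): T = {
    // Unlike lock(), this throws InterruptedException if the thread is
    // interrupted while it is waiting to acquire the lock.
    lock.lockInterruptibly()
    try body
    finally lock.unlock() // always release, even if body throws
  }
}
```

Because the lock is reentrant, a thread that already holds it can call withLockInterruptibly again without deadlocking.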

Deprecated Value Members

  1. def snapshot: Snapshot

    WARNING: This API is unsafe and deprecated. It will be removed in future versions. Use the above unsafeVolatileSnapshot to get the most recently cached snapshot on the cluster.

    Annotations
    @deprecated
    Deprecated

    (Since version 12.0)
