class DeltaLog extends Checkpoints with MetadataCleanup with LogStoreProvider with SnapshotManagement with DeltaFileFormat with ProvidesUniFormConverters with ReadChecksum
Used to query the current state of the log as well as modify it by adding new atomic collections of actions.
Internally, this class implements an optimistic concurrency control algorithm to handle multiple readers or writers. Any single read is guaranteed to see a consistent snapshot of the table.
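The snapshot-isolation guarantee can be pictured with a minimal, self-contained sketch. All names below (`TableSnapshot`, `TinyLog`) are hypothetical, not DeltaLog's real internals: readers dereference a single immutable snapshot, and commits swap in a new one atomically, so a read never observes a half-applied commit.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical miniature of the pattern: all table state lives in one
// immutable snapshot object, replaced atomically on each commit.
final case class TableSnapshot(version: Long, files: Set[String])

final class TinyLog {
  private val current = new AtomicReference(TableSnapshot(0L, Set.empty[String]))

  // Any single read captures one consistent snapshot.
  def snapshot: TableSnapshot = current.get()

  // A commit installs a new immutable snapshot atomically.
  def commit(addedFiles: Set[String]): TableSnapshot =
    current.updateAndGet(s => TableSnapshot(s.version + 1, s.files ++ addedFiles))
}

val log = new TinyLog
log.commit(Set("part-0000.parquet"))
val snap = log.snapshot // consistent view: version and file list agree
```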
- By Inheritance
- DeltaLog
- ReadChecksum
- ProvidesUniFormConverters
- DeltaFileFormat
- SnapshotManagement
- LogStoreProvider
- MetadataCleanup
- Checkpoints
- DeltaLogging
- DatabricksLogging
- DeltaProgressReporter
- LoggingShims
- Logging
- AnyRef
- Any
Type Members
-
implicit
class
LogStringContext extends AnyRef
- Definition Classes
- LoggingShims
-
class
SidecarDeletionMetrics extends AnyRef
Class to track metrics related to V2 Checkpoint Sidecars deletion.
- Attributes
- protected
- Definition Classes
- MetadataCleanup
-
class
V2CompatCheckpointMetrics extends AnyRef
Class to track metrics related to V2 Compatibility checkpoint creation.
- Attributes
- protected[delta]
- Definition Classes
- MetadataCleanup
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
val
LAST_CHECKPOINT: Path
The path to the file that holds metadata about the most recent checkpoint.
- Definition Classes
- Checkpoints
-
lazy val
_hudiConverter: UniversalFormatConverter
- Attributes
- protected
- Definition Classes
- ProvidesUniFormConverters
-
lazy val
_icebergConverter: UniversalFormatConverter
Helper trait to instantiate the icebergConverter member variable of the DeltaLog. We do this through reflection so that delta-spark doesn't have a compile-time dependency on the shaded iceberg module.
- Attributes
- protected
- Definition Classes
- ProvidesUniFormConverters
- val allOptions: Map[String, String]
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
assertTableFeaturesMatchMetadata(targetProtocol: Protocol, targetMetadata: Metadata): Unit
Asserts that the table's protocol enabled all features that are active in the metadata.
A mismatch shouldn't happen when the table has gone through a proper write process because we require all active features during writes. However, other clients may void this guarantee.
- def buildHadoopFsRelationWithFileIndex(snapshot: SnapshotDescriptor, fileIndex: TahoeFileIndex, bucketSpec: Option[BucketSpec], dropNullTypeColumnsFromSchema: Boolean = true): HadoopFsRelation
-
def
checkLogStoreConfConflicts(sparkConf: SparkConf): Unit
- Definition Classes
- LogStoreProvider
-
def
checkRequiredConfigurations(): Unit
Verify the required Spark conf for Delta. Throws a DeltaErrors.configureSparkSessionWithExtensionAndCatalog exception if the spark.sql.catalog.spark_catalog config is missing. We do not check for spark.sql.extensions because DeltaSparkSessionExtension can alternatively be activated using the .withExtension() API. This check can be disabled by setting DELTA_CHECK_REQUIRED_SPARK_CONF to false.
- Attributes
- protected
-
def
checkpoint(snapshotToCheckpoint: Snapshot, catalogTableOpt: Option[CatalogTable] = None): Unit
Creates a checkpoint using snapshotToCheckpoint. By default it uses the current log version. Note that this function captures and logs all exceptions, since the checkpoint shouldn't fail the overall commit operation.
- Definition Classes
- Checkpoints
-
def
checkpointAndCleanUpDeltaLog(snapshotToCheckpoint: Snapshot, catalogTableOpt: Option[CatalogTable] = None): Unit
- Definition Classes
- Checkpoints
-
def
checkpointInterval(metadata: Metadata): Int
Returns the checkpoint interval for this log. Not transactional.
- Definition Classes
- Checkpoints
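Illustratively, a checkpoint-interval check of this kind reduces to modular arithmetic on the committed version. The sketch below is only a conceptual model of "checkpoint every N commits", not Delta's exact trigger logic:

```scala
// Sketch only: a checkpoint is considered due every `interval` commits.
// (Hypothetical helper; Delta's real decision also considers other state.)
def shouldCheckpoint(committedVersion: Long, interval: Int): Boolean =
  interval > 0 && committedVersion > 0 && committedVersion % interval == 0

// With interval = 3, versions 3 and 6 are checkpoint candidates.
val due = (1L to 6L).map(v => shouldCheckpoint(v, interval = 3))
```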
- val clock: Clock
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
createCheckpointAtVersion(version: Long): Unit
Creates a checkpoint at the given version. Does not invoke metadata cleanup as part of it.
- version
- version at which we want to create a checkpoint.
- Definition Classes
- Checkpoints
-
def
createDataFrame(snapshot: SnapshotDescriptor, addFiles: Seq[AddFile], isStreaming: Boolean = false, actionTypeOpt: Option[String] = None): DataFrame
Returns a org.apache.spark.sql.DataFrame containing the new files within the specified version range.
-
def
createLogDirectoriesIfNotExists(): Unit
Creates the log directory and commit directory if they do not exist.
-
def
createLogSegment(versionToLoad: Option[Long] = None, oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider] = None, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient] = None, catalogTableOpt: Option[CatalogTable] = None, lastCheckpointInfo: Option[LastCheckpointInfo] = None): Option[LogSegment]
Get a list of files that can be used to compute a Snapshot at version versionToLoad. If versionToLoad is not provided, will generate the list of files that are needed to load the latest version of the Delta table. This method also performs checks to ensure that the delta files are contiguous.
- versionToLoad
A specific version to load. Typically used with time travel and the Delta streaming source. If not provided, we will try to load the latest version of the table.
- oldCheckpointProviderOpt
The CheckpointProvider from the previous snapshot. This is used as a start version for the listing when startCheckpoint is unavailable. This is also used to initialize the LogSegment.
- tableCommitCoordinatorClientOpt
the optional commit-coordinator client to use for fetching un-backfilled commits.
- catalogTableOpt
the optional catalog table to pass to the commit coordinator client.
- lastCheckpointInfo
LastCheckpointInfo from the _last_checkpoint. This could be used to initialize the Snapshot's LogSegment.
- returns
Some LogSegment to build a Snapshot if files do exist after the given startCheckpoint. None, if the directory was missing or empty.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
-
def
createLogStore(sparkConf: SparkConf, hadoopConf: Configuration): LogStore
- Definition Classes
- LogStoreProvider
-
def
createLogStore(spark: SparkSession): LogStore
- Definition Classes
- LogStoreProvider
-
def
createRelation(partitionFilters: Seq[Expression] = Nil, snapshotToUseOpt: Option[Snapshot] = None, catalogTableOpt: Option[CatalogTable] = None, isTimeTravelQuery: Boolean = false): BaseRelation
Returns a BaseRelation that contains all of the data present in the table. This relation will be continually updated as files are added or removed from the table. However, a new BaseRelation must be requested in order to see changes to the schema.
-
def
createSinglePartCheckpointForBackwardCompat(snapshotToCleanup: Snapshot, metrics: V2CompatCheckpointMetrics): Unit
Helper method to create a compatibility classic single-file checkpoint for this table. This is needed so that any legacy reader which does not understand V2CheckpointTableFeature can read the legacy classic checkpoint file and fail gracefully with a Protocol requirement failure.
- Attributes
- protected[delta]
- Definition Classes
- MetadataCleanup
-
def
createSnapshot(initSegment: LogSegment, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], checksumOpt: Option[VersionChecksum]): Snapshot
- Attributes
- protected
- Definition Classes
- SnapshotManagement
-
def
createSnapshotAfterCommit(initSegment: LogSegment, newChecksumOpt: Option[VersionChecksum], tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], committedVersion: Long): Snapshot
Creates a snapshot for a new delta commit.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
-
def
createSnapshotAtInit(initialCatalogTable: Option[CatalogTable]): Unit
Load the Snapshot for this Delta table at initialization. This method uses the lastCheckpoint file as a hint on where to start listing the transaction log directory. If the _delta_log directory doesn't exist, this method will return an InitialSnapshot.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
-
def
createSnapshotFromGivenOrEquivalentLogSegment(initSegment: LogSegment, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable])(snapshotCreator: (LogSegment) ⇒ Snapshot): Snapshot
Create a Snapshot from the given LogSegment. If snapshot creation fails, we will search for an equivalent LogSegment using a different checkpoint and retry up to DeltaSQLConf.DELTA_SNAPSHOT_LOADING_MAX_RETRIES times.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
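The retry behavior described above can be sketched as a bounded fallback over candidate segments. In this sketch, `Segment`, `createWithFallback`, and the `build` function are hypothetical stand-ins, not the real SnapshotManagement internals:

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for a LogSegment, keyed by its checkpoint version.
final case class Segment(checkpointVersion: Long)

// Try each candidate segment in turn, up to a bounded number of retries.
@tailrec
def createWithFallback(segments: List[Segment], retriesLeft: Int)(
    build: Segment => Try[String]): Try[String] =
  segments match {
    case seg :: rest if retriesLeft >= 0 =>
      build(seg) match {
        case Success(snapshot) => Success(snapshot)
        case Failure(_)        => createWithFallback(rest, retriesLeft - 1)(build)
      }
    case _ => Failure(new IllegalStateException("no equivalent LogSegment left"))
  }

// The preferred segment fails; the equivalent one (older checkpoint) succeeds.
val result = createWithFallback(List(Segment(10L), Segment(5L)), retriesLeft = 2) {
  case Segment(10L) => Failure(new RuntimeException("corrupt checkpoint"))
  case seg          => Success(s"snapshot-from-${seg.checkpointVersion}")
}
```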
-
val
currentSnapshot: CapturedSnapshot
Cached latest snapshot. This is initialized in createSnapshotAtInit.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- Annotations
- @volatile()
-
val
dataPath: Path
- Definition Classes
- DeltaLog → Checkpoints
-
val
defaultLogStoreClass: String
- Definition Classes
- LogStoreProvider
-
def
deltaAssert(check: ⇒ Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit
Helper method to check invariants in Delta code. Fails when running in tests, records a delta assertion event and logs a warning otherwise.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
deltaRetentionMillis(metadata: Metadata): Long
Returns the duration in millis for how long to keep around obsolete logs. We may keep logs beyond this duration until the next calendar day to avoid constantly creating checkpoints.
- Definition Classes
- MetadataCleanup
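The next-calendar-day behavior noted above can be sketched by truncating the retention cutoff down to a day boundary, so a file is only eligible for deletion once a whole calendar day has passed beyond the retention window. This is an illustrative sketch (UTC assumed), not the exact implementation:

```scala
import java.time.{Instant, ZoneOffset}
import java.time.temporal.ChronoUnit

// Hypothetical helper: compute the deletion cutoff for obsolete log files.
def cleanupCutoff(nowMillis: Long, retentionMillis: Long): Long = {
  val rawCutoff = Instant.ofEpochMilli(nowMillis - retentionMillis)
  // Round the raw cutoff down to the start of its calendar day, so logs
  // within the same day as the cutoff are still retained.
  rawCutoff.atZone(ZoneOffset.UTC).truncatedTo(ChronoUnit.DAYS).toInstant.toEpochMilli
}

val day = 24L * 60 * 60 * 1000
// now = 1.5 days after epoch, retention = 1 day: raw cutoff is mid-day 0,
// truncation pulls it back to the start of day 0.
val cutoff = cleanupCutoff(nowMillis = day + day / 2, retentionMillis = day)
```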
-
def
doLogCleanup(snapshotToCleanup: Snapshot): Unit
- Definition Classes
- MetadataCleanup
-
def
enableExpiredLogCleanup(metadata: Metadata): Boolean
Whether to clean up expired log files and checkpoints.
- Definition Classes
- MetadataCleanup
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
fileFormat(protocol: Protocol, metadata: Metadata): FileFormat
Build the underlying Spark FileFormat of the Delta table with the specified metadata. With column mapping, some properties of the underlying file format might change during a transaction, so if possible, we should always pass in the latest transaction's metadata instead of one from a past snapshot.
- Definition Classes
- DeltaFileFormat
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
findEarliestReliableCheckpoint: Option[Long]
Finds a checkpoint such that we are able to construct table snapshot for all versions at or greater than the checkpoint version returned.
- Definition Classes
- MetadataCleanup
-
def
getChangeLogFiles(startVersion: Long, failOnDataLoss: Boolean = false): Iterator[(Long, FileStatus)]
Get access to all actions starting from "startVersion" (inclusive) via FileStatus. If startVersion doesn't exist, return an empty Iterator. Callers are encouraged to use the other override which takes the endVersion, if available, to avoid I/O and improve performance of this method.
-
def
getChanges(startVersion: Long, failOnDataLoss: Boolean = false): Iterator[(Long, Seq[Action])]
Get all actions starting from "startVersion" (inclusive). If startVersion doesn't exist, return an empty Iterator. Callers are encouraged to use the other override which takes the endVersion, if available, to avoid I/O and improve performance of this method.
-
def
getCheckpointVersion(lastCheckpointInfoOpt: Option[LastCheckpointInfo], oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider]): Long
Returns the last known checkpoint version based on LastCheckpointInfo or CheckpointProvider. Returns -1 if neither is available.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
- Definition Classes
- DeltaLogging
-
def
getDeltaFileChecksumOrCheckpointVersion(filePath: Path): Long
Helper function for getting the version of a checkpoint or a commit.
- Definition Classes
- MetadataCleanup
-
def
getErrorData(e: Throwable): Map[String, Any]
- Definition Classes
- DeltaLogging
-
def
getLatestCompleteCheckpointFromList(instances: Array[CheckpointInstance], notLaterThanVersion: Option[Long] = None): Option[CheckpointInstance]
Given a list of checkpoint files, pick the latest complete checkpoint instance which is not later than notLaterThanVersion.
- Attributes
- protected[delta]
- Definition Classes
- Checkpoints
-
def
getLogSegmentAfterCommit(tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], oldCheckpointProvider: UninitializedCheckpointProvider): LogSegment
- Attributes
- protected[delta]
- Definition Classes
- SnapshotManagement
-
def
getLogSegmentAfterCommit(committedVersion: Long, newChecksumOpt: Option[VersionChecksum], preCommitLogSegment: LogSegment, commit: Commit, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], oldCheckpointProvider: CheckpointProvider): LogSegment
Used to compute the LogSegment after a commit, by adding the delta file with the specified version to the preCommitLogSegment (which must match the immediately preceding version).
- Attributes
- protected[delta]
- Definition Classes
- SnapshotManagement
-
def
getLogSegmentForVersion(versionToLoad: Option[Long], files: Option[Array[FileStatus]], validateLogSegmentWithoutCompactedDeltas: Boolean, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider], lastCheckpointInfo: Option[LastCheckpointInfo]): Option[LogSegment]
Helper function for the getLogSegmentForVersion above. Called with a provided files list, and will then try to construct a new LogSegment using that. *Note*: If the table is a coordinated-commits table, the commit-coordinator MUST be passed to correctly list the commits.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
-
def
getLogStoreConfValue(key: String, sparkConf: SparkConf): Option[String]
We accept keys both with and without the spark. prefix to maintain compatibility across the Delta ecosystem.
- key
the spark-prefixed key to access
- Definition Classes
- LogStoreProvider
-
def
getSnapshotAt(version: Long, lastCheckpointHint: Option[CheckpointInstance] = None, catalogTableOpt: Option[CatalogTable] = None): Snapshot
Get the snapshot at version.
- Definition Classes
- SnapshotManagement
-
def
getSnapshotForLogSegmentInternal(previousSnapshotOpt: Option[Snapshot], segmentOpt: Option[LogSegment], tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], isAsync: Boolean): Snapshot
Creates a Snapshot for the given segmentOpt.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
-
def
getUpdatedLogSegment(oldLogSegment: LogSegment, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable]): (LogSegment, Seq[FileStatus])
Get the newest logSegment, using the previous logSegment as a hint.
Get the newest logSegment, using the previous logSegment as a hint. This is faster than doing a full update, but it won't work if the table's log directory was replaced.
- Definition Classes
- SnapshotManagement
-
def
getUpdatedSnapshot(oldSnapshotOpt: Option[Snapshot], initialSegmentForNewSnapshot: Option[LogSegment], initialTableCommitCoordinatorClient: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], isAsync: Boolean): Snapshot
Updates and installs a new snapshot in currentSnapshot. This method takes care of recursively creating new snapshots if the commit-coordinator has changed.
- oldSnapshotOpt
The previous snapshot, if any.
- initialSegmentForNewSnapshot
the log segment constructed for the new snapshot
- initialTableCommitCoordinatorClient
the commit-coordinator used for constructing the initialSegmentForNewSnapshot
- catalogTableOpt
the optional catalog table to pass to the commit coordinator client.
- isAsync
Whether the update is async.
- returns
The new snapshot.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
lazy val
history: DeltaHistoryManager
Delta History Manager containing version and commit history.
-
def
hudiConverter: UniversalFormatConverter
- Definition Classes
- ProvidesUniFormConverters
-
def
icebergConverter: UniversalFormatConverter
- Definition Classes
- ProvidesUniFormConverters
-
def
identifyAndDeleteUnreferencedSidecarFiles(snapshotToCleanup: Snapshot, checkpointRetention: Long, metrics: SidecarDeletionMetrics): Unit
Deletes any unreferenced files from the sidecar directory _delta_log/_sidecar.
- Attributes
- protected
- Definition Classes
- MetadataCleanup
-
def
indexToRelation(index: DeltaLogFileIndex, schema: StructType = Action.logSchema): LogicalRelation
Creates a LogicalRelation for a given DeltaLogFileIndex, with all necessary file source options taken from the Delta Log. All reads of Delta metadata files should use this method.
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
installSnapshot(newSnapshot: Snapshot, updateTimestamp: Long): Snapshot
Installs the given newSnapshot as the currentSnapshot.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
-
def
isCurrentlyStale: (Long) ⇒ Boolean
Checks if the given timestamp is outside the current staleness window.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
-
def
isDeltaCommitOrCheckpointFile(path: Path): Boolean
Returns true if the path is a delta log file. Delta log files can be delta commit files (e.g., 000000000.json) or checkpoint files (e.g., 000000001.checkpoint.00001.00003.parquet).
- path
Path of a file
- returns
Whether the file is a delta log file
- Attributes
- protected
- Definition Classes
- SnapshotManagement
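The two file-name shapes mentioned above can be sketched with simplified regular expressions. These patterns are illustrative only, not Delta's actual parser (which handles more checkpoint variants):

```scala
// Simplified shapes of the two file kinds (hypothetical, illustrative only):
// commit files like 00000000000000000010.json, and multi-part checkpoint
// files like 00000000000000000001.checkpoint.00001.00003.parquet.
val deltaFile      = raw"(\d+)\.json".r
val checkpointFile = raw"(\d+)\.checkpoint(\.\d+\.\d+)?\.parquet".r

def isCommitOrCheckpoint(name: String): Boolean = name match {
  case deltaFile(_)         => true
  case checkpointFile(_, _) => true
  case _                    => false
}

val commit    = isCommitOrCheckpoint("00000000000000000010.json")
val multipart = isCommitOrCheckpoint("00000000000000000001.checkpoint.00001.00003.parquet")
val dataFile  = isCommitOrCheckpoint("part-0000.parquet")
```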
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isSameLogAs(otherLog: DeltaLog): Boolean
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
val
lastSeenChecksumFileStatusOpt: Option[FileStatus]
Cached fileStatus for the latest CRC file seen in the deltaLog.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- Annotations
- @volatile()
-
final
def
listDeltaCompactedDeltaAndCheckpointFiles(startVersion: Long, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], versionToLoad: Option[Long], includeMinorCompactions: Boolean): Option[Array[FileStatus]]
This method is designed to efficiently and reliably list delta, compacted delta, and checkpoint files associated with a Delta Lake table. It makes parallel calls to both the file-system and a commit-coordinator (if available), reconciles the results to account for asynchronous backfill operations, and ensures a comprehensive list of file statuses without missing any concurrently backfilled files. *Note*: If table is a coordinated-commits table, the commit-coordinator client MUST be passed to correctly list the commits.
- startVersion
the version to start. Inclusive.
- tableCommitCoordinatorClientOpt
the optional commit-coordinator client to use for fetching un-backfilled commits.
- catalogTableOpt
the optional catalog table to pass to the commit coordinator client.
- versionToLoad
the optional parameter to set the max version we should return. Inclusive.
- includeMinorCompactions
Whether to include minor compaction files in the result
- returns
Some array of files found (possibly empty, if no usable commit files are present), or None if the listing returned no files at all.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
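The reconciliation of the file-system listing with the commit-coordinator listing can be sketched as follows. `CommitFile` and `reconcile` here are hypothetical stand-ins for the real types: backfilled (file-system) commits win, and only coordinator commits newer than the last backfilled version are appended:

```scala
// Hypothetical stand-in for a listed commit file and where it came from.
final case class CommitFile(version: Long, source: String)

// Prefer backfilled commits from the file-system listing; append only the
// un-backfilled coordinator commits with strictly newer versions.
def reconcile(fsCommits: Seq[CommitFile],
              coordinatorCommits: Seq[CommitFile]): Seq[CommitFile] = {
  val maxBackfilled = fsCommits.map(_.version).foldLeft(-1L)(math.max)
  fsCommits ++ coordinatorCommits.filter(_.version > maxBackfilled)
}

val merged = reconcile(
  fsCommits = Seq(CommitFile(0L, "fs"), CommitFile(1L, "fs")),
  // Version 1 was backfilled concurrently; only version 2 is still un-backfilled.
  coordinatorCommits = Seq(CommitFile(1L, "coordinator"), CommitFile(2L, "coordinator"))
)
```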
-
def
listDeltaCompactedDeltaCheckpointFilesAndLatestChecksumFile(startVersion: Long, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], versionToLoad: Option[Long], includeMinorCompactions: Boolean): (Option[Array[FileStatus]], Option[FileStatus])
This method is designed to efficiently and reliably list delta, compacted delta, and checkpoint files associated with a Delta Lake table. It makes parallel calls to both the file-system and a commit-coordinator (if available), reconciles the results to account for asynchronous backfill operations, and ensures a comprehensive list of file statuses without missing any concurrently backfilled files. *Note*: If table is a coordinated-commits table, the commit coordinator MUST be passed to correctly list the commits. The function also collects the latest checksum file found in the listings and returns it.
- startVersion
the version to start. Inclusive.
- tableCommitCoordinatorClientOpt
the optional commit coordinator to use for fetching un-backfilled commits.
- catalogTableOpt
the optional catalog table to pass to the commit coordinator client.
- versionToLoad
the optional parameter to set the max version we should return. Inclusive.
- includeMinorCompactions
Whether to include minor compaction files in the result
- returns
A tuple where the first element is an array of log files (possibly empty, if no usable log files are found), and the second element is the latest checksum file found which has a version less than or equal to versionToLoad.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
-
def
listFromOrNone(startVersion: Long): Option[Iterator[FileStatus]]
Returns an iterator containing a list of files found from the provided path.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
-
def
loadIndex(index: DeltaLogFileIndex, schema: StructType = Action.logSchema): DataFrame
Load the data using the FileIndex. This allows us to skip many checks that add overhead, e.g. file existence checks, partitioning schema inference.
-
def
loadMetadataFromFile(tries: Int): Option[LastCheckpointInfo]
Loads the checkpoint metadata from the _last_checkpoint file.
- Attributes
- protected
- Definition Classes
- Checkpoints
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
-
def
logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
val
logPath: Path
- Definition Classes
- DeltaLog → ReadChecksum → Checkpoints
-
val
logStoreClassConfKey: String
- Definition Classes
- LogStoreProvider
-
def
logStoreSchemeConfKey(scheme: String): String
- Definition Classes
- LogStoreProvider
-
def
logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
manuallyLoadCheckpoint(cv: CheckpointInstance): LastCheckpointInfo
Loads the given checkpoint manually to come up with the LastCheckpointInfo.
- Attributes
- protected
- Definition Classes
- Checkpoints
-
def
maxSnapshotLineageLength: Int
The max lineage length of a Snapshot before Delta forces building a Snapshot from scratch. Delta will build a Snapshot on top of the previous one if it doesn't see a checkpoint. However, there is a race condition: when two writers are writing at the same time, a writer may fail to pick up checkpoints written by the other, the lineage will grow, and this will eventually cause a StackOverflowError. Hence we force building a Snapshot from scratch when the lineage is too long, to avoid hitting StackOverflowError.
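The rebuild-on-long-lineage policy can be sketched as follows, where `Snap` and `nextSnapshot` are hypothetical stand-ins for the real snapshot chaining:

```scala
// Hypothetical miniature: each incremental snapshot extends the lineage by
// one; past the limit, the snapshot is rebuilt from scratch (lineage resets).
final case class Snap(version: Long, lineageLength: Int)

def nextSnapshot(prev: Snap, maxLineage: Int): Snap =
  if (prev.lineageLength + 1 > maxLineage)
    Snap(prev.version + 1, lineageLength = 1) // rebuilt from scratch
  else
    Snap(prev.version + 1, prev.lineageLength + 1)

// Five commits with maxLineage = 3: the lineage resets at version 3,
// so recursion depth stays bounded regardless of commit count.
val after = (1 to 5).foldLeft(Snap(0L, 1)) { (s, _) => nextSnapshot(s, maxLineage = 3) }
```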
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
newDeltaHadoopConf(): Configuration
Returns the Hadoop Configuration object which can be used to access the file system. All Delta code should use this method to create the Hadoop Configuration object, so that the Hadoop file system configurations specified in DataFrame options will take effect.
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val options: Map[String, String]
-
def
protocolRead(protocol: Protocol): Unit
Asserts that the client is up to date with the protocol and allowed to read the table that is using the given protocol.
-
def
protocolWrite(protocol: Protocol): Unit
Asserts that the client is up to date with the protocol and allowed to write to the table that is using the given protocol.
-
def
recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
Used to record the occurrence of a single event or report detailed, operation specific statistics.
- path
Used to log the path of the delta table when deltaLog is null.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
Used to report the duration as well as the success or failure of an operation on a deltaLog.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
Used to report the duration as well as the success or failure of an operation on a tahoePath.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
-
def
recordFrameProfile[T](group: String, name: String)(thunk: ⇒ T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: ⇒ S): S
- Definition Classes
- DatabricksLogging
-
def
recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
-
def
recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
-
def
recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
-
lazy val
sidecarDirPath: Path
Path to the sidecar directory. This is intentionally kept a lazy val; otherwise other constructor code paths in DeltaLog (e.g. SnapshotManagement) would see it as null, since they execute before this field is initialized. -
val
snapshotLock: ReentrantLock
A ReentrantLock, used so that we can call lockInterruptibly.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
-
def
spark: SparkSession
Returns the current Spark session.
- Attributes
- protected
- Definition Classes
- DeltaLog → DeltaFileFormat
-
def
startTransaction(catalogTableOpt: Option[CatalogTable], snapshotOpt: Option[Snapshot] = None): OptimisticTransaction
Returns a new OptimisticTransaction that can be used to read the current state of the log and then commit updates. The reads and updates will be checked for logical conflicts with any concurrent writes to the log, and post-commit hooks can be used to notify the table's catalog of schema changes, etc.
Note that all reads in a transaction must go through the returned transaction object, and not directly to the DeltaLog otherwise they will not be checked for conflicts.
- catalogTableOpt
The CatalogTable for the table this transaction updates. Passing None asserts this is a path-based table with no catalog entry.
- snapshotOpt
The Snapshot this transaction should use, if not the latest.
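A minimal usage sketch, assuming a `deltaLog` instance obtained elsewhere (e.g. via `DeltaLog.forTable`) and an illustrative `newMetadata` action:

```scala
// Sketch only: start a transaction on a path-based table (no catalog entry)
// and commit a single action. `deltaLog` and `newMetadata` are assumed to
// exist; the operation name is illustrative.
val txn = deltaLog.startTransaction(catalogTableOpt = None)
val actions = Seq(newMetadata) // e.g. an updated Metadata action
txn.commit(actions, DeltaOperations.ManualUpdate)
```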
-
lazy val
store: LogStore
Used to read and write physical log files and checkpoints.
- Definition Classes
- DeltaLog → ReadChecksum → Checkpoints
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
tableExists: Boolean
Whether a Delta table exists at this directory. It is okay to use the cached volatile snapshot here; the worst case is that the table has only recently started existing and that has not been picked up yet. If so, any subsequent command that updates the table will see the right value.
-
def
tableId: String
The unique identifier for this table.
-
def
throwNonExistentVersionError(versionToLoad: Long): Unit
- Definition Classes
- SnapshotManagement
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
def
unsafeLoadMetadataFromFile(): LastCheckpointInfo
Reads the checkpoint metadata from the _last_checkpoint file. This method does not handle any exceptions that may be thrown, such as IOExceptions when reading the data (a FileNotFoundException is expected for a new Delta table) or JSON deserialization errors.
- Attributes
- protected
- Definition Classes
- Checkpoints
-
def
unsafeVolatileSnapshot: Snapshot
Returns the current snapshot. This does not automatically call update().
WARNING: This is not guaranteed to give you the latest snapshot of the log, nor to stay consistent across multiple accesses. If you need the latest snapshot, fetch it with deltaLog.update() and save the returned snapshot so it does not unexpectedly change from under you. See how OptimisticTransaction and DeltaScan use the snapshot for the write and read paths respectively. This API should only be used where any recent snapshot will suffice and an update is undesired, or by internal code that holds the DeltaLog lock to prevent races.
- Definition Classes
- SnapshotManagement
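The warning above suggests a pattern like the following sketch (assuming a `deltaLog` instance): pin one snapshot reference rather than re-reading the volatile one.

```scala
// Fetch the latest snapshot once and keep using this reference; unlike
// repeated reads of unsafeVolatileSnapshot, `pinned` will not change
// from under you.
val pinned = deltaLog.update()
val version = pinned.version // consistent across all uses of `pinned`
```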
-
def
update(stalenessAcceptable: Boolean = false, checkIfUpdatedSinceTs: Option[Long] = None, catalogTableOpt: Option[CatalogTable] = None): Snapshot
Update the ActionLog by applying the new delta files, if any.
- stalenessAcceptable
Whether we can accept working with a stale version of the table. If the table has surpassed our staleness tolerance, we will update to the latest state of the table synchronously. If staleness is acceptable, and the table hasn't passed the staleness tolerance, we will kick off a job in the background to update the table state, and can return a stale snapshot in the meantime.
- checkIfUpdatedSinceTs
Skip the update if we've already updated the snapshot since the specified timestamp.
- catalogTableOpt
The catalog table of the current table.
- Definition Classes
- SnapshotManagement
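Two hedged usage sketches, assuming a `deltaLog` instance and a millisecond timestamp `ts`:

```scala
// Accept a possibly stale snapshot; if within the staleness tolerance,
// a background refresh may be kicked off instead of blocking.
val maybeStale = deltaLog.update(stalenessAcceptable = true)

// Skip the synchronous refresh if the snapshot was already updated after `ts`.
val fresh = deltaLog.update(checkIfUpdatedSinceTs = Some(ts))
```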
-
def
updateAfterCommit(committedVersion: Long, commit: Commit, newChecksumOpt: Option[VersionChecksum], preCommitLogSegment: LogSegment, catalogTableOpt: Option[CatalogTable]): Snapshot
Called after committing a transaction and updating the state of the table.
- committedVersion
the version that was committed
- commit
information about the commit file.
- newChecksumOpt
the checksum for the new commit, if available. Usually None, since the commit would have just finished.
- preCommitLogSegment
the log segment of the table prior to commit
- catalogTableOpt
the current catalog table
- Definition Classes
- SnapshotManagement
-
def
updateInternal(isAsync: Boolean, catalogTableOpt: Option[CatalogTable]): Snapshot
Queries the store for new delta files and applies them to the current state. Note: the caller should hold snapshotLock before calling this method.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
-
def
upgradeProtocol(catalogTable: Option[CatalogTable], snapshot: Snapshot, newVersion: Protocol): Unit
Upgrade the table's protocol version, by default to the maximum reader and writer versions recognized by this Delta release. This method only upgrades the protocol version, and will fail if the new protocol version is not a superset of the one used by the snapshot.
-
def
useCompactedDeltasForLogSegment(deltasAndCompactedDeltas: Seq[FileStatus], deltasAfterCheckpoint: Array[FileStatus], latestCommitVersion: Long, checkpointVersionToUse: Long): Array[FileStatus]
- deltasAndCompactedDeltas
all deltas or compacted deltas that could be used
- deltasAfterCheckpoint
deltas after the last checkpoint file
- latestCommitVersion
the commit version for which we are trying to create a Snapshot
- checkpointVersionToUse
the underlying checkpoint version to use in the Snapshot, or -1 if no checkpoint is used
- returns
Returns a list of deltas/compacted-deltas which can be used to construct the LogSegment instead of
deltasAfterCheckpoint.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
-
def
verifyLogStoreConfs(sparkConf: SparkConf): Unit
Check for conflicting LogStore configs in the Spark configuration.
To maintain compatibility across the Delta ecosystem, we accept keys both with and without the "spark." prefix: for the class conf we accept both "spark.delta.logStore.class" and "delta.logStore.class", and for scheme confs we accept both "spark.delta.logStore.${scheme}.impl" and "delta.logStore.${scheme}.impl".
If a conf is set both with and without the spark prefix, it must be set to the same value, otherwise we throw an error.
- Definition Classes
- LogStoreProvider
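For illustration, a sketch of a session config where both key forms are set consistently (the LogStore class name here is an assumption based on the delta-storage S3 implementation):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("logstore-conf-example")
  // Equivalent keys with and without the "spark." prefix; if both are set,
  // they must agree or verifyLogStoreConfs will throw.
  .config("spark.delta.logStore.s3.impl", "io.delta.storage.S3SingleDriverLogStore")
  .config("delta.logStore.s3.impl", "io.delta.storage.S3SingleDriverLogStore")
  .getOrCreate()
```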
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
withCheckpointExceptionHandling(deltaLog: DeltaLog, opType: String)(thunk: ⇒ Unit): Unit
Catch non-fatal exceptions related to checkpointing, since the checkpoint is written after the commit has completed; from the user's perspective, the commit has already completed successfully. However, throw in a testing environment so that any breaking changes can be caught in unit tests.
- Attributes
- protected
- Definition Classes
- Checkpoints
-
def
withNewTransaction[T](catalogTableOpt: Option[CatalogTable], snapshotOpt: Option[Snapshot] = None)(thunk: (OptimisticTransaction) ⇒ T): T
Execute a piece of code within a new OptimisticTransaction. Read/write sets will be recorded for this table, and all other tables will be read at a snapshot that is pinned on the first access.
- catalogTableOpt
The CatalogTable for the table this transaction updates. Passing None asserts this is a path-based table with no catalog entry.
- snapshotOpt
The Snapshot this transaction should use, if not the latest.
- Note
This uses a thread-local variable to make the active transaction visible, so do not use multi-threaded code in the provided thunk.
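A sketch of the read-modify-write pattern this enables, assuming a `deltaLog` instance and illustrative `removeActions`/`addActions` values:

```scala
// Sketch only: all reads go through `txn`, so they are checked for conflicts
// with concurrent writers at commit time.
deltaLog.withNewTransaction(catalogTableOpt = None) { txn =>
  val currentFiles = txn.filterFiles() // recorded in the transaction's read set
  txn.commit(removeActions ++ addActions, DeltaOperations.ManualUpdate)
}
```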
-
def
withSnapshotLockInterruptibly[T](body: ⇒ T): T
Run body while holding snapshotLock, acquired via lockInterruptibly so that the thread can be interrupted while waiting for the lock.
- Definition Classes
- SnapshotManagement
-
def
withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T
Report a log entry to indicate that some command is running.
- Definition Classes
- DeltaProgressReporter
-
def
writeCheckpointFiles(snapshotToCheckpoint: Snapshot, catalogTableOpt: Option[CatalogTable] = None): LastCheckpointInfo
- Attributes
- protected
- Definition Classes
- Checkpoints
-
def
writeLastCheckpointFile(deltaLog: DeltaLog, lastCheckpointInfo: LastCheckpointInfo, addChecksum: Boolean): Unit
- Attributes
- protected[delta]
- Definition Classes
- Checkpoints
Deprecated Value Members
-
def
checkpoint(): Unit
Creates a checkpoint using the default snapshot.
WARNING: This API is deprecated and will be removed in future versions. Please use the checkpoint(Snapshot) function to write checkpoints to the delta log.
- Definition Classes
- Checkpoints
- Annotations
- @deprecated
- Deprecated
(Since version 12.0) This method is deprecated and will be removed in future versions.
-
def
snapshot: Snapshot
WARNING: This API is unsafe and deprecated. It will be removed in future versions. Use unsafeVolatileSnapshot instead to get the most recently cached snapshot on the cluster.
- Definition Classes
- SnapshotManagement
- Annotations
- @deprecated
- Deprecated
(Since version 12.0)
-
def
startTransaction(): OptimisticTransaction
Legacy/compat overload that does not require catalog table information. Avoid use in production.
- Annotations
- @deprecated
- Deprecated
(Since version 3.0) Please use the CatalogTable overload instead
-
def
withNewTransaction[T](thunk: (OptimisticTransaction) ⇒ T): T
Legacy/compat overload that does not require catalog table information. Avoid use in production.
- Annotations
- @deprecated
- Deprecated
(Since version 3.0) Please use the CatalogTable overload instead