class DeltaLog extends Checkpoints with MetadataCleanup with LogStoreProvider with SnapshotManagement with DeltaFileFormat with ProvidesUniFormConverters with ReadChecksum

Used to query the current state of the log as well as modify it by adding new atomic collections of actions.

Internally, this class implements an optimistic concurrency control algorithm to handle multiple readers or writers. Any single read is guaranteed to see a consistent snapshot of the table.
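
The optimistic concurrency control described above can be illustrated with a minimal, self-contained sketch. This is not DeltaLog's implementation: `InMemoryLog`, `tryCommit`, and `commitWithRetry` are hypothetical names, and an in-memory map stands in for the log directory. The idea shown is that a writer reads the latest version, tries to atomically create the next version, and starts over from a fresh read if another writer got there first.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical in-memory stand-in for a transaction log; not part of Delta.
object OptimisticLogSketch {
  final class InMemoryLog {
    private val commits = TrieMap.empty[Long, Seq[String]]

    // Latest committed version, or -1 if nothing has been committed yet.
    def latestVersion: Long = {
      var version = -1L
      while (commits.contains(version + 1)) version += 1
      version
    }

    // Atomic put-if-absent: succeeds only if no one has written this version.
    def tryCommit(version: Long, actions: Seq[String]): Boolean =
      commits.putIfAbsent(version, actions).isEmpty

    def actionsAt(version: Long): Option[Seq[String]] = commits.get(version)
  }

  // Re-reads the latest version and retries when another writer wins the race.
  @annotation.tailrec
  def commitWithRetry(log: InMemoryLog, actions: Seq[String], attemptsLeft: Int = 10): Long = {
    if (attemptsLeft <= 0) throw new IllegalStateException("Could not commit: too many conflicts")
    val target = log.latestVersion + 1
    if (log.tryCommit(target, actions)) target
    else commitWithRetry(log, actions, attemptsLeft - 1)
  }
}
```

Because each version can be created only once, a reader that loads versions 0 through N sees a consistent set of actions no matter how many writers are racing.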

Inherited
  1. DeltaLog
  2. ReadChecksum
  3. ProvidesUniFormConverters
  4. DeltaFileFormat
  5. SnapshotManagement
  6. LogStoreProvider
  7. MetadataCleanup
  8. Checkpoints
  9. DeltaLogging
  10. DatabricksLogging
  11. DeltaProgressReporter
  12. LoggingShims
  13. Logging
  14. AnyRef
  15. Any

Type Members

  1. implicit class LogStringContext extends AnyRef
    Definition Classes
    LoggingShims
  2. class SidecarDeletionMetrics extends AnyRef

    Class to track metrics related to V2 Checkpoint Sidecars deletion.

    Attributes
    protected
    Definition Classes
    MetadataCleanup
  3. class V2CompatCheckpointMetrics extends AnyRef

    Class to track metrics related to V2 Compatibility checkpoint creation.

    Attributes
    protected[delta]
    Definition Classes
    MetadataCleanup

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val LAST_CHECKPOINT: Path

    The path to the file that holds metadata about the most recent checkpoint.

    Definition Classes
    Checkpoints
  5. lazy val _hudiConverter: UniversalFormatConverter
    Attributes
    protected
    Definition Classes
    ProvidesUniFormConverters
  6. lazy val _icebergConverter: UniversalFormatConverter

    Helper trait to instantiate the icebergConverter member variable of the DeltaLog. We do this through reflection so that delta-spark doesn't have a compile-time dependency on the shaded iceberg module.

    Attributes
    protected
    Definition Classes
    ProvidesUniFormConverters
  7. val allOptions: Map[String, String]
  8. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  9. def assertTableFeaturesMatchMetadata(targetProtocol: Protocol, targetMetadata: Metadata): Unit

    Asserts that the table's protocol enables all features that are active in the metadata.

    A mismatch shouldn't happen when the table has gone through a proper write process because we require all active features during writes. However, other clients may void this guarantee.

  10. def buildHadoopFsRelationWithFileIndex(snapshot: SnapshotDescriptor, fileIndex: TahoeFileIndex, bucketSpec: Option[BucketSpec], dropNullTypeColumnsFromSchema: Boolean = true): HadoopFsRelation
  11. def checkLogStoreConfConflicts(sparkConf: SparkConf): Unit
    Definition Classes
    LogStoreProvider
  12. def checkRequiredConfigurations(): Unit

    Verifies the required Spark conf for Delta. Throws a DeltaErrors.configureSparkSessionWithExtensionAndCatalog exception if the spark.sql.catalog.spark_catalog config is missing. We do not check for spark.sql.extensions because DeltaSparkSessionExtension can alternatively be activated using the .withExtension() API. This check can be disabled by setting DELTA_CHECK_REQUIRED_SPARK_CONF to false.

    Attributes
    protected
  13. def checkpoint(snapshotToCheckpoint: Snapshot, catalogTableOpt: Option[CatalogTable] = None): Unit

    Creates a checkpoint using snapshotToCheckpoint. By default it uses the current log version. Note that this function captures and logs all exceptions, since the checkpoint shouldn't fail the overall commit operation.

    Definition Classes
    Checkpoints
  14. def checkpointAndCleanUpDeltaLog(snapshotToCheckpoint: Snapshot, catalogTableOpt: Option[CatalogTable] = None): Unit
    Definition Classes
    Checkpoints
  15. def checkpointInterval(metadata: Metadata): Int

    Returns the checkpoint interval for this log. Not transactional.

    Definition Classes
    Checkpoints
  16. val clock: Clock
  17. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  18. def createCheckpointAtVersion(version: Long): Unit

    Creates a checkpoint at the given version. Does not invoke metadata cleanup as part of it.

    version

    The version at which we want to create a checkpoint.

    Definition Classes
    Checkpoints
  19. def createDataFrame(snapshot: SnapshotDescriptor, addFiles: Seq[AddFile], isStreaming: Boolean = false, actionTypeOpt: Option[String] = None): DataFrame

    Returns an org.apache.spark.sql.DataFrame containing the new files within the specified version range.

  20. def createLogDirectoriesIfNotExists(): Unit

    Creates the log directory and the commit directory if they do not exist.

  21. def createLogSegment(versionToLoad: Option[Long] = None, oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider] = None, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient] = None, catalogTableOpt: Option[CatalogTable] = None, lastCheckpointInfo: Option[LastCheckpointInfo] = None): Option[LogSegment]

    Gets a list of files that can be used to compute a Snapshot at version versionToLoad. If versionToLoad is not provided, generates the list of files needed to load the latest version of the Delta table. This method also performs checks to ensure that the delta files are contiguous.

    versionToLoad

    A specific version to load. Typically used with time travel and the Delta streaming source. If not provided, we will try to load the latest version of the table.

    oldCheckpointProviderOpt

    The CheckpointProvider from the previous snapshot. This is used as a start version for the listing when startCheckpoint is unavailable. This is also used to initialize the LogSegment.

    tableCommitCoordinatorClientOpt

    the optional commit-coordinator client to use for fetching un-backfilled commits.

    catalogTableOpt

    the optional catalog table to pass to the commit coordinator client.

    lastCheckpointInfo

    LastCheckpointInfo from the _last_checkpoint. This could be used to initialize the Snapshot's LogSegment.

    returns

    Some LogSegment to build a Snapshot if files do exist after the given startCheckpoint. None, if the directory was missing or empty.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
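
The contiguity check mentioned for createLogSegment can be pictured with a small sketch. It assumes the delta file versions have already been parsed into a sorted sequence of Long values. `verifyContiguous` is a hypothetical helper for illustration, not the method Delta uses.

```scala
object ContiguityCheckSketch {
  // Returns Right(versions) when every version is exactly one greater than its
  // predecessor, or Left(message) describing the first gap that was found.
  def verifyContiguous(versions: Seq[Long]): Either[String, Seq[Long]] = {
    val firstGap = versions.zip(versions.drop(1)).collectFirst {
      case (prev, next) if next != prev + 1 =>
        s"Versions are not contiguous: $prev is followed by $next"
    }
    firstGap.toLeft(versions)
  }
}
```

A gap matters because a Snapshot is computed by replaying every commit after the checkpoint, so a missing version would silently drop actions.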
  22. def createLogStore(sparkConf: SparkConf, hadoopConf: Configuration): LogStore
    Definition Classes
    LogStoreProvider
  23. def createLogStore(spark: SparkSession): LogStore
    Definition Classes
    LogStoreProvider
  24. def createRelation(partitionFilters: Seq[Expression] = Nil, snapshotToUseOpt: Option[Snapshot] = None, catalogTableOpt: Option[CatalogTable] = None, isTimeTravelQuery: Boolean = false): BaseRelation

    Returns a BaseRelation that contains all of the data present in the table. This relation will be continually updated as files are added or removed from the table. However, a new BaseRelation must be requested in order to see changes to the schema.

  25. def createSinglePartCheckpointForBackwardCompat(snapshotToCleanup: Snapshot, metrics: V2CompatCheckpointMetrics): Unit

    Helper method to create a compatibility classic single-file checkpoint for this table. This is needed so that any legacy reader that does not understand V2CheckpointTableFeature can read the legacy classic checkpoint file and fail gracefully with a protocol requirement failure.

    Attributes
    protected[delta]
    Definition Classes
    MetadataCleanup
  26. def createSnapshot(initSegment: LogSegment, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], checksumOpt: Option[VersionChecksum]): Snapshot
    Attributes
    protected
    Definition Classes
    SnapshotManagement
  27. def createSnapshotAfterCommit(initSegment: LogSegment, newChecksumOpt: Option[VersionChecksum], tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], committedVersion: Long): Snapshot

    Creates a snapshot for a new delta commit.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
  28. def createSnapshotAtInit(initialCatalogTable: Option[CatalogTable]): Unit

    Load the Snapshot for this Delta table at initialization. This method uses the lastCheckpoint file as a hint on where to start listing the transaction log directory. If the _delta_log directory doesn't exist, this method will return an InitialSnapshot.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
  29. def createSnapshotFromGivenOrEquivalentLogSegment(initSegment: LogSegment, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable])(snapshotCreator: (LogSegment) ⇒ Snapshot): Snapshot

    Creates a Snapshot from the given LogSegment. If snapshot creation fails, we search for an equivalent LogSegment that uses a different checkpoint and retry up to DeltaSQLConf.DELTA_SNAPSHOT_LOADING_MAX_RETRIES times.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
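
The retry behaviour described for createSnapshotFromGivenOrEquivalentLogSegment follows a general pattern: try the given input, and on failure fall back to an equivalent alternative a bounded number of times. The sketch below shows only that pattern. `createWithFallback` and its parameters are hypothetical, and how Delta actually finds an equivalent LogSegment is not modelled here.

```scala
import scala.util.{Failure, Success, Try}

object SnapshotRetrySketch {
  // Tries `create` on the initial segment; on a non-fatal failure, moves on to
  // the next alternative until the retries or the alternatives run out.
  def createWithFallback[S, T](initial: S, alternatives: Iterator[S], maxRetries: Int)(
      create: S => T): T = {
    @annotation.tailrec
    def loop(segment: S, retriesLeft: Int): T =
      Try(create(segment)) match {
        case Success(snapshot) => snapshot
        case Failure(_) if retriesLeft > 0 && alternatives.hasNext =>
          loop(alternatives.next(), retriesLeft - 1)
        case Failure(error) => throw error
      }
    loop(initial, maxRetries)
  }
}
```

When the retries are exhausted, the last failure is rethrown, so the caller still sees why snapshot creation failed.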
  30. val currentSnapshot: CapturedSnapshot

    Cached latest snapshot. This is initialized in createSnapshotAtInit.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
    Annotations
    @volatile()
  31. val dataPath: Path
    Definition Classes
    DeltaLog → Checkpoints
  32. val defaultLogStoreClass: String
    Definition Classes
    LogStoreProvider
  33. def deltaAssert(check: ⇒ Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit

    Helper method to check invariants in Delta code. Fails when running in tests, records a delta assertion event and logs a warning otherwise.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  34. def deltaRetentionMillis(metadata: Metadata): Long

    Returns the duration in millis for how long to keep around obsolete logs. We may keep logs beyond this duration until the next calendar day to avoid constantly creating checkpoints.

    Definition Classes
    MetadataCleanup
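
One way to read "until the next calendar day" is that the expiry cutoff is rounded down to a day boundary, so logs are expired in whole-day steps rather than continuously. The sketch below shows that reading only. The `cutoff` helper, the use of UTC, and the rounding direction are assumptions and may differ from what Delta does.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

object RetentionCutoffSketch {
  // Hypothetical helper: log files last modified before the returned epoch
  // millisecond value would be treated as expired.
  def cutoff(nowMillis: Long, retentionMillis: Long): Long =
    Instant.ofEpochMilli(nowMillis - retentionMillis)
      .truncatedTo(ChronoUnit.DAYS) // round down to 00:00 UTC of that day
      .toEpochMilli
}
```

With a two-day retention evaluated at 05:00 on some day, the cutoff is midnight two days earlier, so a file written at 03:00 that day is kept for a few more hours even though it is already older than the retention period.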
  35. def doLogCleanup(snapshotToCleanup: Snapshot): Unit
    Definition Classes
    MetadataCleanup
  36. def enableExpiredLogCleanup(metadata: Metadata): Boolean

    Whether to clean up expired log files and checkpoints.

    Definition Classes
    MetadataCleanup
  37. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  38. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  39. def fileFormat(protocol: Protocol, metadata: Metadata): FileFormat

    Build the underlying Spark FileFormat of the Delta table with specified metadata.

    With column mapping, some properties of the underlying file format might change during transaction, so if possible, we should always pass in the latest transaction's metadata instead of one from a past snapshot.

    Definition Classes
    DeltaFileFormat
  40. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  41. def findEarliestReliableCheckpoint: Option[Long]

    Finds a checkpoint such that we are able to construct a table snapshot for all versions at or greater than the returned checkpoint version.

    Definition Classes
    MetadataCleanup
  42. def getChangeLogFiles(startVersion: Long, failOnDataLoss: Boolean = false): Iterator[(Long, FileStatus)]

    Gets access to all actions starting from "startVersion" (inclusive) via FileStatus. If startVersion doesn't exist, returns an empty Iterator. Callers are encouraged to use the other overload, which takes an endVersion, if available, to avoid I/O and improve the performance of this method.

  43. def getChanges(startVersion: Long, failOnDataLoss: Boolean = false): Iterator[(Long, Seq[Action])]

    Gets all actions starting from "startVersion" (inclusive). If startVersion doesn't exist, returns an empty Iterator. Callers are encouraged to use the other overload, which takes an endVersion, if available, to avoid I/O and improve the performance of this method.

  44. def getCheckpointVersion(lastCheckpointInfoOpt: Option[LastCheckpointInfo], oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider]): Long

    Returns the last known checkpoint version based on LastCheckpointInfo or CheckpointProvider. Returns -1 if neither is available.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
  45. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  46. def getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
    Definition Classes
    DeltaLogging
  47. def getDeltaFileChecksumOrCheckpointVersion(filePath: Path): Long

    Helper function for getting the version of a checkpoint or a commit.

    Definition Classes
    MetadataCleanup
  48. def getErrorData(e: Throwable): Map[String, Any]
    Definition Classes
    DeltaLogging
  49. def getLatestCompleteCheckpointFromList(instances: Array[CheckpointInstance], notLaterThanVersion: Option[Long] = None): Option[CheckpointInstance]

    Given a list of checkpoint files, picks the latest complete checkpoint instance that is not later than notLaterThanVersion.

    Attributes
    protected[delta]
    Definition Classes
    Checkpoints
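
The selection rule for getLatestCompleteCheckpointFromList can be sketched with a simplified model in which a checkpoint is either a single file or a set of numbered parts. `CheckpointFile`, `isComplete`, and `latestComplete` are hypothetical names for illustration, and Delta's CheckpointInstance carries more information than this. The sketch assumes Scala 2.13 or later for `maxOption`.

```scala
object LatestCheckpointSketch {
  // Hypothetical, simplified checkpoint file: `part` is Some((index, total))
  // for a multi-part checkpoint and None for a single-file checkpoint.
  final case class CheckpointFile(version: Long, part: Option[(Int, Int)])

  // A version is complete if it has a single-file checkpoint, or if all
  // `total` distinct parts of some multi-part checkpoint are present.
  def isComplete(files: Seq[CheckpointFile]): Boolean = {
    val hasSingleFile = files.exists(_.part.isEmpty)
    val hasAllParts = files.flatMap(_.part).groupBy(_._2).exists {
      case (total, parts) => parts.map(_._1).distinct.size == total
    }
    hasSingleFile || hasAllParts
  }

  // Latest complete checkpoint version, optionally capped by an upper bound.
  def latestComplete(
      files: Seq[CheckpointFile],
      notLaterThanVersion: Option[Long] = None): Option[Long] =
    files
      .filter(f => notLaterThanVersion.forall(f.version <= _))
      .groupBy(_.version)
      .collect { case (version, group) if isComplete(group) => version }
      .maxOption
}
```

An incomplete multi-part checkpoint is skipped entirely, so the search falls back to the next older version that is complete.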
  50. def getLogSegmentAfterCommit(tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], oldCheckpointProvider: UninitializedCheckpointProvider): LogSegment
    Attributes
    protected[delta]
    Definition Classes
    SnapshotManagement
  51. def getLogSegmentAfterCommit(committedVersion: Long, newChecksumOpt: Option[VersionChecksum], preCommitLogSegment: LogSegment, commit: Commit, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], oldCheckpointProvider: CheckpointProvider): LogSegment

    Used to compute the LogSegment after a commit, by adding the delta file with the specified version to the preCommitLogSegment (which must match the immediately preceding version).

    Attributes
    protected[delta]
    Definition Classes
    SnapshotManagement
  52. def getLogSegmentForVersion(versionToLoad: Option[Long], files: Option[Array[FileStatus]], validateLogSegmentWithoutCompactedDeltas: Boolean, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], oldCheckpointProviderOpt: Option[UninitializedCheckpointProvider], lastCheckpointInfo: Option[LastCheckpointInfo]): Option[LogSegment]

    Helper function for the getLogSegmentForVersion above. Called with a provided files list, and will then try to construct a new LogSegment using that. Note: If the table is a coordinated-commits table, the commit-coordinator MUST be passed to correctly list the commits.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
  53. def getLogStoreConfValue(key: String, sparkConf: SparkConf): Option[String]

    We accept keys both with and without the spark. prefix to maintain compatibility across the Delta ecosystem.

    key

    the spark-prefixed key to access

    Definition Classes
    LogStoreProvider
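
The prefix handling described for getLogStoreConfValue can be sketched as a lookup that tries the spark-prefixed key first and then the same key without the prefix. A plain Map stands in for SparkConf, and `confValue` is a hypothetical name. The lookup order shown is an assumption made for the sketch.

```scala
object LogStoreConfKeySketch {
  // Looks up `key` as given (with the "spark." prefix) and, if it is absent,
  // falls back to the same key with the prefix stripped.
  def confValue(key: String, conf: Map[String, String]): Option[String] = {
    require(key.startsWith("spark."), s"expected a spark-prefixed key, got: $key")
    conf.get(key).orElse(conf.get(key.stripPrefix("spark.")))
  }
}
```

For example, a lookup of `spark.delta.logStore.class` would also find a value stored under `delta.logStore.class`.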
  54. def getSnapshotAt(version: Long, lastCheckpointHint: Option[CheckpointInstance] = None, catalogTableOpt: Option[CatalogTable] = None): Snapshot

    Get the snapshot at version.

    Definition Classes
    SnapshotManagement
  55. def getSnapshotForLogSegmentInternal(previousSnapshotOpt: Option[Snapshot], segmentOpt: Option[LogSegment], tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], isAsync: Boolean): Snapshot

    Creates a Snapshot for the given segmentOpt.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
  56. def getUpdatedLogSegment(oldLogSegment: LogSegment, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable]): (LogSegment, Seq[FileStatus])

    Get the newest logSegment, using the previous logSegment as a hint. This is faster than doing a full update, but it won't work if the table's log directory was replaced.

    Definition Classes
    SnapshotManagement
  57. def getUpdatedSnapshot(oldSnapshotOpt: Option[Snapshot], initialSegmentForNewSnapshot: Option[LogSegment], initialTableCommitCoordinatorClient: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], isAsync: Boolean): Snapshot

    Updates and installs a new snapshot in the currentSnapshot. This method takes care of recursively creating new snapshots if the commit-coordinator has changed.

    oldSnapshotOpt

    The previous snapshot, if any.

    initialSegmentForNewSnapshot

    the log segment constructed for the new snapshot

    initialTableCommitCoordinatorClient

    the commit-coordinator used for constructing the initialSegmentForNewSnapshot

    catalogTableOpt

    the optional catalog table to pass to the commit coordinator client.

    isAsync

    Whether the update is async.

    returns

    The new snapshot.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
  58. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  59. lazy val history: DeltaHistoryManager

    Delta History Manager containing version and commit history.

  60. def hudiConverter: UniversalFormatConverter
    Definition Classes
    ProvidesUniFormConverters
  61. def icebergConverter: UniversalFormatConverter
    Definition Classes
    ProvidesUniFormConverters
  62. def identifyAndDeleteUnreferencedSidecarFiles(snapshotToCleanup: Snapshot, checkpointRetention: Long, metrics: SidecarDeletionMetrics): Unit

    Deletes any unreferenced files from the sidecar directory _delta_log/_sidecar.

    Attributes
    protected
    Definition Classes
    MetadataCleanup
  63. def indexToRelation(index: DeltaLogFileIndex, schema: StructType = Action.logSchema): LogicalRelation

    Creates a LogicalRelation for a given DeltaLogFileIndex, with all necessary file source options taken from the Delta Log. All reads of Delta metadata files should use this method.

  64. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  65. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  66. def installSnapshot(newSnapshot: Snapshot, updateTimestamp: Long): Snapshot

    Installs the given newSnapshot as the currentSnapshot.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
  67. def isCurrentlyStale: (Long) ⇒ Boolean

    Checks if the given timestamp is outside the current staleness window.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
  68. def isDeltaCommitOrCheckpointFile(path: Path): Boolean

    Returns true if the path is a delta log file. A delta log file can be a delta commit file (e.g., 000000000.json) or a checkpoint file (e.g., 000000001.checkpoint.00001.00003.parquet).

    path

    Path of a file

    returns

    Whether the file is a delta log file

    Attributes
    protected
    Definition Classes
    SnapshotManagement
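
The two file-name shapes given as examples for isDeltaCommitOrCheckpointFile can be matched with a pair of regular expressions. This is a simplified sketch that covers only those shapes plus a single-file checkpoint. `isCommitOrCheckpoint` is a hypothetical name, and Delta's own matcher may accept additional checkpoint naming variants.

```scala
object LogFileNameSketch {
  // Commit files: digits followed by ".json".
  private val commitFile = """\d+\.json""".r
  // Checkpoint files: digits, ".checkpoint", an optional ".<part>.<total>"
  // pair for multi-part checkpoints, then ".parquet".
  private val checkpointFile = """\d+\.checkpoint(\.\d+\.\d+)?\.parquet""".r

  // True when the whole file name matches one of the two patterns.
  def isCommitOrCheckpoint(fileName: String): Boolean =
    commitFile.pattern.matcher(fileName).matches() ||
      checkpointFile.pattern.matcher(fileName).matches()
}
```

The sketch takes a bare file name, so a caller holding a full path would pass only its last component.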
  69. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  70. def isSameLogAs(otherLog: DeltaLog): Boolean
  71. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  72. val lastSeenChecksumFileStatusOpt: Option[FileStatus]

    Cached fileStatus for the latest CRC file seen in the deltaLog.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
    Annotations
    @volatile()
  73. final def listDeltaCompactedDeltaAndCheckpointFiles(startVersion: Long, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], versionToLoad: Option[Long], includeMinorCompactions: Boolean): Option[Array[FileStatus]]

    This method is designed to efficiently and reliably list delta, compacted delta, and checkpoint files associated with a Delta Lake table. It makes parallel calls to both the file system and a commit-coordinator (if available), reconciles the results to account for asynchronous backfill operations, and ensures a comprehensive list of file statuses without missing any concurrently backfilled files. Note: If the table is a coordinated-commits table, the commit-coordinator client MUST be passed to correctly list the commits.

    startVersion

    the version to start. Inclusive.

    tableCommitCoordinatorClientOpt

    the optional commit-coordinator client to use for fetching un-backfilled commits.

    catalogTableOpt

    the optional catalog table to pass to the commit coordinator client.

    versionToLoad

    the optional parameter to set the max version we should return. Inclusive.

    includeMinorCompactions

    Whether to include minor compaction files in the result

    returns

    Some array of files found (possibly empty, if no usable commit files are present), or None if the listing returned no files at all.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
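
A simplified reading of "reconciles the results" is sketched below: commits already backfilled to the file system are taken from the file-system listing, and the commit-coordinator contributes only the versions beyond the highest backfilled one. `Commit` and `reconcile` are hypothetical names, versions stand in for file statuses, and the real method has more to handle, such as commits that are backfilled while the two listings are in flight.

```scala
object ListingReconcileSketch {
  // Hypothetical record of a commit and whether it was found on the file system.
  final case class Commit(version: Long, backfilled: Boolean)

  // Merges the two listings into one list sorted by version, with no duplicates.
  def reconcile(fromFileSystem: Seq[Long], fromCoordinator: Seq[Long]): Seq[Commit] = {
    val backfilled = fromFileSystem.distinct.sorted.map(v => Commit(v, backfilled = true))
    val maxBackfilled = backfilled.lastOption.map(_.version).getOrElse(-1L)
    // Versions known to both sources are taken from the file-system listing.
    val pending = fromCoordinator.distinct.sorted
      .filter(_ > maxBackfilled)
      .map(v => Commit(v, backfilled = false))
    backfilled ++ pending
  }
}
```

This also shows why the coordinator client must be passed for a coordinated-commits table: without it, versions that exist only as un-backfilled commits would be missing from the listing.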
  74. def listDeltaCompactedDeltaCheckpointFilesAndLatestChecksumFile(startVersion: Long, tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient], catalogTableOpt: Option[CatalogTable], versionToLoad: Option[Long], includeMinorCompactions: Boolean): (Option[Array[FileStatus]], Option[FileStatus])

    This method is designed to efficiently and reliably list delta, compacted delta, and checkpoint files associated with a Delta Lake table. It makes parallel calls to both the file system and a commit-coordinator (if available), reconciles the results to account for asynchronous backfill operations, and ensures a comprehensive list of file statuses without missing any concurrently backfilled files. Note: If the table is a coordinated-commits table, the commit coordinator MUST be passed to correctly list the commits. The function also collects the latest checksum file found in the listings and returns it.

    startVersion

    the version to start. Inclusive.

    tableCommitCoordinatorClientOpt

    the optional commit coordinator to use for fetching un-backfilled commits.

    catalogTableOpt

    the optional catalog table to pass to the commit coordinator client.

    versionToLoad

    the optional parameter to set the max version we should return. Inclusive.

    includeMinorCompactions

    Whether to include minor compaction files in the result

    returns

    A tuple where the first element is an array of log files (possibly empty, if no usable log files are found), and the second element is the latest checksum file found which has a version less than or equal to versionToLoad.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
  75. def listFromOrNone(startVersion: Long): Option[Iterator[FileStatus]]

    Returns an iterator containing a list of files found from the provided path.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
  76. def loadIndex(index: DeltaLogFileIndex, schema: StructType = Action.logSchema): DataFrame

    Load the data using the FileIndex. This allows us to skip many checks that add overhead, e.g. file existence checks, partitioning schema inference.

  77. def loadMetadataFromFile(tries: Int): Option[LastCheckpointInfo]

    Loads the checkpoint metadata from the _last_checkpoint file.

    Attributes
    protected
    Definition Classes
    Checkpoints
  78. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  79. def logConsole(line: String): Unit
    Definition Classes
    DatabricksLogging
  80. def logDebug(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  81. def logDebug(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  82. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  83. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  84. def logError(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  85. def logError(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  86. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  87. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  88. def logInfo(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  89. def logInfo(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  90. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  91. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  92. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  93. val logPath: Path
    Definition Classes
    DeltaLog → ReadChecksum → Checkpoints
  94. val logStoreClassConfKey: String
    Definition Classes
    LogStoreProvider
  95. def logStoreSchemeConfKey(scheme: String): String
    Definition Classes
    LogStoreProvider
  96. def logTrace(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  97. def logTrace(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  98. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  99. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  100. def logWarning(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  101. def logWarning(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  102. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  103. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  104. def manuallyLoadCheckpoint(cv: CheckpointInstance): LastCheckpointInfo

    Loads the given checkpoint manually to come up with the LastCheckpointInfo.

    Attributes
    protected
    Definition Classes
    Checkpoints
  105. def maxSnapshotLineageLength: Int

    The maximum lineage length of a Snapshot before Delta forces a Snapshot to be built from scratch. Delta builds a Snapshot on top of the previous one if it doesn't see a checkpoint. However, there is a race condition: when two writers are writing at the same time, a writer may fail to pick up checkpoints written by the other, so the lineage keeps growing and eventually causes a StackOverflowError. Hence we force a Snapshot to be built from scratch when the lineage length is too large.

  106. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  107. final def newDeltaHadoopConf(): Configuration

    Returns the Hadoop Configuration object which can be used to access the file system. All Delta code should use this method to create the Hadoop Configuration object, so that the hadoop file system configurations specified in DataFrame options will come into effect.

  108. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  109. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  110. val options: Map[String, String]
  111. def protocolRead(protocol: Protocol): Unit

    Asserts that the client is up to date with the protocol and allowed to read the table that is using the given protocol.

  112. def protocolWrite(protocol: Protocol): Unit

    Asserts that the client is up to date with the protocol and allowed to write to the table that is using the given protocol.

  113. def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit

    Used to record the occurrence of a single event or report detailed, operation specific statistics.

    path

    Used to log the path of the delta table when deltaLog is null.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  114. def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A

    Used to report the duration as well as the success or failure of an operation on a deltaLog.

    Attributes
    protected
    Definition Classes
    DeltaLogging
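
As a rough illustration of what "report the duration as well as the success or failure" of a by-name thunk can look like, here is a minimal sketch. It does not use Delta's logging classes: `OpReport`, the `report` callback, and this `recordOperation` helper are hypothetical stand-ins.

```scala
import scala.util.Try

object OperationRecorderSketch {
  // Hypothetical report of one operation; not a Delta type.
  final case class OpReport(opType: String, durationMs: Long, succeeded: Boolean)

  // Runs `thunk`, hands its duration and outcome to `report`, then
  // returns the thunk's result or rethrows its (non-fatal) failure.
  def recordOperation[A](opType: String)(report: OpReport => Unit)(thunk: => A): A = {
    val startNs = System.nanoTime()
    val outcome = Try(thunk) // Try captures non-fatal exceptions only
    val durationMs = (System.nanoTime() - startNs) / 1000000L
    report(OpReport(opType, durationMs, outcome.isSuccess))
    outcome.get
  }
}
```

The caller sees the same result or exception as if the thunk had been invoked directly; the wrapper only adds the report.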
  115. def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A

    Used to report the duration as well as the success or failure of an operation on a table path (tablePath).

    Attributes
    protected
    Definition Classes
    DeltaLogging
  116. def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  117. def recordFrameProfile[T](group: String, name: String)(thunk: ⇒ T): T
    Attributes
    protected
    Definition Classes
    DeltaLogging
  118. def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: ⇒ S): S
    Definition Classes
    DatabricksLogging
  119. def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  120. def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  121. def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  122. lazy val sidecarDirPath: Path

    Path to the sidecar directory. This is intentionally kept as a lazy val; otherwise, other constructor code paths in DeltaLog (e.g. SnapshotManagement) would see it as null, because they execute before this line is reached.
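
The initialization-order issue mentioned above is a general Scala behavior and can be reproduced without Delta. In this sketch, `ReadsPathDuringInit`, `StrictLog`, and `LazyLog` are hypothetical names, and the path string is only an illustrative value.

```scala
// Stand-in for a mixed-in trait whose initializer reads the path.
trait ReadsPathDuringInit {
  def sidecarDir: String
  // Evaluated while the trait is initialized, before the subclass body runs.
  val seenDuringInit: String = sidecarDir
}

// Strict val: the field is still unassigned when the trait initializer runs.
class StrictLog extends ReadsPathDuringInit {
  val sidecarDir: String = "_delta_log/_sidecars"
}

// lazy val: computed on first access, so the trait initializer sees the value.
class LazyLog extends ReadsPathDuringInit {
  lazy val sidecarDir: String = "_delta_log/_sidecars"
}
```

Here `new StrictLog().seenDuringInit` is `null`, while `new LazyLog().seenDuringInit` holds the path, which is the reason given for keeping the member lazy.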

  123. val snapshotLock: ReentrantLock

    Use ReentrantLock to allow us to call lockInterruptibly

    Attributes
    protected
    Definition Classes
    SnapshotManagement
  124. def spark: SparkSession

    Return the current Spark session used.

    Attributes
    protected
    Definition Classes
    DeltaLog → DeltaFileFormat
  125. def startTransaction(catalogTableOpt: Option[CatalogTable], snapshotOpt: Option[Snapshot] = None): OptimisticTransaction

    Returns a new OptimisticTransaction that can be used to read the current state of the log and then commit updates. The reads and updates will be checked for logical conflicts with any concurrent writes to the log, and post-commit hooks can be used to notify the table's catalog of schema changes, etc.

    Note that all reads in a transaction must go through the returned transaction object, and not directly to the DeltaLog otherwise they will not be checked for conflicts.

    catalogTableOpt

    The CatalogTable for the table this transaction updates. Passing None asserts this is a path-based table with no catalog entry.

    snapshotOpt

    The Snapshot this transaction should use, if not the latest.

  126. lazy val store: LogStore

    Used to read and write physical log files and checkpoints.

    Definition Classes
    DeltaLog → ReadChecksum → Checkpoints
  127. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  128. def tableExists: Boolean

    Whether a Delta table exists at this directory. It is okay to use the cached volatile snapshot here, since the worst case is that the table has only recently come into existence and this has not been picked up yet. If so, any subsequent command that updates the table will see the right value.

  129. def tableId: String

    The unique identifier for this table.

  130. def throwNonExistentVersionError(versionToLoad: Long): Unit
    Definition Classes
    SnapshotManagement
  131. def toString(): String
    Definition Classes
    AnyRef → Any
  132. def unsafeLoadMetadataFromFile(): LastCheckpointInfo

    Reads the checkpoint metadata from the _last_checkpoint file. This method doesn't handle any exceptions that can be thrown, for example IOExceptions raised while reading the data (such as a FileNotFoundException, which is expected for a new Delta table) or JSON deserialization errors.

    Attributes
    protected
    Definition Classes
    Checkpoints
  133. def unsafeVolatileSnapshot: Snapshot

    Returns the current snapshot. This does not automatically update().

    WARNING: This is not guaranteed to give you the latest snapshot of the log, nor to stay consistent across multiple accesses. If you need the latest snapshot, fetch it using deltaLog.update() and save the returned snapshot so that it does not unexpectedly change from under you. See how OptimisticTransaction and DeltaScan use the snapshot as examples for the write and read paths respectively. This API should only be used in scenarios where any recent snapshot will suffice and an update is undesired, or by internal code that holds the DeltaLog lock to prevent races.

    Definition Classes
    SnapshotManagement
  134. def update(stalenessAcceptable: Boolean = false, checkIfUpdatedSinceTs: Option[Long] = None, catalogTableOpt: Option[CatalogTable] = None): Snapshot

    Update ActionLog by applying the new delta files if any.

    stalenessAcceptable

    Whether we can accept working with a stale version of the table. If the table has surpassed our staleness tolerance, we will update to the latest state of the table synchronously. If staleness is acceptable, and the table hasn't passed the staleness tolerance, we will kick off a job in the background to update the table state, and can return a stale snapshot in the meantime.

    checkIfUpdatedSinceTs

    Skip the update if we've already updated the snapshot since the specified timestamp.

    catalogTableOpt

    The catalog table of the current table.

    Definition Classes
    SnapshotManagement
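
The interplay of stalenessAcceptable and checkIfUpdatedSinceTs described above can be sketched as a pure decision function. This is not Delta's implementation: the `Decision` type, the `decide` helper, the extra timestamp parameters, and the order of the checks are all assumptions made for illustration.

```scala
object UpdateDecisionSketch {
  sealed trait Decision
  case object SkipUpdate extends Decision                 // already updated since the given timestamp
  case object ReturnStaleAndRefreshAsync extends Decision // return a stale snapshot, refresh in background
  case object UpdateSynchronously extends Decision        // block until the latest state is loaded

  // All timestamps are in milliseconds; `stalenessLimitMs` is a hypothetical tolerance.
  def decide(
      stalenessAcceptable: Boolean,
      checkIfUpdatedSinceTs: Option[Long],
      lastUpdateTs: Long,
      nowTs: Long,
      stalenessLimitMs: Long): Decision = {
    if (checkIfUpdatedSinceTs.exists(ts => lastUpdateTs > ts)) SkipUpdate
    else if (stalenessAcceptable && nowTs - lastUpdateTs < stalenessLimitMs) ReturnStaleAndRefreshAsync
    else UpdateSynchronously
  }
}
```

With the default arguments of update() (staleness not acceptable, no timestamp check), this model always chooses the synchronous path.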
  135. def updateAfterCommit(committedVersion: Long, commit: Commit, newChecksumOpt: Option[VersionChecksum], preCommitLogSegment: LogSegment, catalogTableOpt: Option[CatalogTable]): Snapshot

    Called after committing a transaction and updating the state of the table.

    committedVersion

    the version that was committed

    commit

    information about the commit file.

    newChecksumOpt

    the checksum for the new commit, if available. Usually None, since the commit would have just finished.

    preCommitLogSegment

    the log segment of the table prior to commit

    catalogTableOpt

    the current catalog table

    Definition Classes
    SnapshotManagement
  136. def updateInternal(isAsync: Boolean, catalogTableOpt: Option[CatalogTable]): Snapshot

    Queries the store for new delta files and applies them to the current state. Note: the caller should hold snapshotLock before calling this method.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
  137. def upgradeProtocol(catalogTable: Option[CatalogTable], snapshot: Snapshot, newVersion: Protocol): Unit

    Upgrade the table's protocol version, by default to the maximum recognized reader and writer versions in this Delta release. This method only upgrades the protocol version, and will fail if the new protocol version is not a superset of the original one used by the snapshot.

  138. def useCompactedDeltasForLogSegment(deltasAndCompactedDeltas: Seq[FileStatus], deltasAfterCheckpoint: Array[FileStatus], latestCommitVersion: Long, checkpointVersionToUse: Long): Array[FileStatus]

    deltasAndCompactedDeltas

    - all deltas or compacted deltas which could be used

    deltasAfterCheckpoint

    - deltas after the last checkpoint file

    latestCommitVersion

    - commit version for which we are trying to create a Snapshot

    checkpointVersionToUse

    - underlying checkpoint version to use in Snapshot, -1 if no checkpoint is used.

    returns

    Returns a list of deltas/compacted-deltas which can be used to construct the LogSegment instead of deltasAfterCheckpoint.

    Attributes
    protected
    Definition Classes
    SnapshotManagement
  139. def verifyLogStoreConfs(sparkConf: SparkConf): Unit

    Check for conflicting LogStore configs in the spark configuration.

    To maintain compatibility across the Delta ecosystem, we accept keys both with and without the "spark." prefix. For the class conf, this means we accept both "spark.delta.logStore.class" and "delta.logStore.class"; for scheme confs, we accept both "spark.delta.logStore.${scheme}.impl" and "delta.logStore.${scheme}.impl".

    If a conf is set both with and without the spark prefix, it must be set to the same value, otherwise we throw an error.

    Definition Classes
    LogStoreProvider
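
The conflict rule above can be sketched over a plain key/value map. This is an illustration only: it works on a `Map[String, String]` rather than a SparkConf, and `conflictingKeys`, `verify`, and the exception type are hypothetical choices, not the library's behavior.

```scala
object LogStoreConfCheckSketch {
  // Returns the un-prefixed LogStore keys that are also set with the
  // "spark." prefix but to a different value.
  def conflictingKeys(conf: Map[String, String]): Seq[String] =
    conf.keys.toSeq
      .filter(_.startsWith("delta.logStore."))
      .filter(key => conf.get("spark." + key).exists(_ != conf(key)))
      .sorted

  // Throws if any key is set both ways with different values.
  def verify(conf: Map[String, String]): Unit = {
    val conflicts = conflictingKeys(conf)
    if (conflicts.nonEmpty) {
      throw new IllegalArgumentException(
        "Conflicting LogStore confs: " + conflicts.mkString(", "))
    }
  }
}
```

Setting both forms of a key to the same value passes the check; only differing values are reported.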
  140. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  141. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  142. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  143. def withCheckpointExceptionHandling(deltaLog: DeltaLog, opType: String)(thunk: ⇒ Unit): Unit

    Catch non-fatal exceptions related to checkpointing, since the checkpoint is written after the commit has completed. From the perspective of the user, the commit has completed successfully. However, throw if this is in a testing environment - that way any breaking changes can be caught in unit tests.

    Attributes
    protected
    Definition Classes
    Checkpoints
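
A minimal sketch of the "swallow in production, rethrow in tests" pattern described above follows. The `isTesting` flag and `onError` callback are assumptions introduced for the example; the real method takes a DeltaLog and an opType instead.

```scala
import scala.util.control.NonFatal

object CheckpointHandlingSketch {
  // Runs `thunk`; non-fatal failures are passed to `onError` and then
  // swallowed, unless `isTesting` is set, in which case they are rethrown.
  def withCheckpointExceptionHandling(isTesting: Boolean)(onError: Throwable => Unit)(
      thunk: => Unit): Unit = {
    try {
      thunk
    } catch {
      case NonFatal(e) =>
        onError(e)
        if (isTesting) throw e
    }
  }
}
```

Because the commit has already succeeded by the time the checkpoint is written, swallowing the error keeps the user-visible outcome a success while still recording the failure.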
  144. def withNewTransaction[T](catalogTableOpt: Option[CatalogTable], snapshotOpt: Option[Snapshot] = None)(thunk: (OptimisticTransaction) ⇒ T): T

    Execute a piece of code within a new OptimisticTransaction. Read/write sets will be recorded for this table, and all other tables will be read at a snapshot that is pinned on the first access.

    catalogTableOpt

    The CatalogTable for the table this transaction updates. Passing None asserts this is a path-based table with no catalog entry.

    snapshotOpt

    The Snapshot this transaction should use, if not the latest.

    Note

    This uses a thread-local variable to make the active transaction visible, so do not use multi-threaded code in the provided thunk.
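
To show why the note about multi-threaded code matters, the sketch below models an "active transaction" held in a ThreadLocal. Everything here is hypothetical (`ActiveTxnSketch`, a String standing in for the transaction); it only demonstrates that a value set on one thread is not visible from another.

```scala
object ActiveTxnSketch {
  // Per-thread slot for the active transaction (a String stands in for it).
  private val active = new ThreadLocal[Option[String]] {
    override def initialValue(): Option[String] = None
  }

  def getActive: Option[String] = active.get()

  // Makes `txnId` the active transaction on the calling thread for the
  // duration of `thunk`, then clears it.
  def withNewTransaction[T](txnId: String)(thunk: String => T): T = {
    active.set(Some(txnId))
    try thunk(txnId) finally active.remove()
  }

  // Reads the active transaction from a freshly started thread.
  def activeSeenFromOtherThread(): Option[String] = {
    var seen: Option[String] = None
    val worker = new Thread(new Runnable {
      def run(): Unit = { seen = getActive }
    })
    worker.start()
    worker.join()
    seen
  }
}
```

Inside the thunk, the calling thread sees the transaction, but a thread spawned from the thunk sees none, so work done there would not be associated with the transaction.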

  145. def withSnapshotLockInterruptibly[T](body: ⇒ T): T

    Run body inside snapshotLock lock using lockInterruptibly so that the thread can be interrupted when waiting for the lock.

    Definition Classes
    SnapshotManagement
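
The locking pattern described above can be sketched with the JDK's ReentrantLock directly. The object and method names below are hypothetical; only `ReentrantLock.lockInterruptibly()` and `unlock()` are real JDK calls.

```scala
import java.util.concurrent.locks.ReentrantLock

object InterruptibleLockSketch {
  private val lock = new ReentrantLock()

  // Acquires the lock (throwing InterruptedException if the thread is
  // interrupted before or while waiting), runs `body`, then releases it.
  def withLockInterruptibly[T](body: => T): T = {
    lock.lockInterruptibly()
    try body finally lock.unlock()
  }
}
```

Unlike a plain `synchronized` block, a thread blocked here can be cancelled by interrupting it, and because the lock is reentrant, nested calls on the same thread do not deadlock.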
  146. def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T

    Report a log to indicate some command is running.

    Definition Classes
    DeltaProgressReporter
  147. def writeCheckpointFiles(snapshotToCheckpoint: Snapshot, catalogTableOpt: Option[CatalogTable] = None): LastCheckpointInfo
    Attributes
    protected
    Definition Classes
    Checkpoints
  148. def writeLastCheckpointFile(deltaLog: DeltaLog, lastCheckpointInfo: LastCheckpointInfo, addChecksum: Boolean): Unit
    Attributes
    protected[delta]
    Definition Classes
    Checkpoints

Deprecated Value Members

  1. def checkpoint(): Unit

    Creates a checkpoint using the default snapshot.

    WARNING: This API is being deprecated, and will be removed in future versions. Please use the checkpoint(Snapshot) function below to write checkpoints to the delta log.

    Definition Classes
    Checkpoints
    Annotations
    @deprecated
    Deprecated

    (Since version 12.0) This method is deprecated and will be removed in future versions.

  2. def snapshot: Snapshot

    WARNING: This API is unsafe and deprecated. It will be removed in future versions. Use the above unsafeVolatileSnapshot to get the most recently cached snapshot on the cluster.

    Definition Classes
    SnapshotManagement
    Annotations
    @deprecated
    Deprecated

    (Since version 12.0)

  3. def startTransaction(): OptimisticTransaction

    Legacy/compat overload that does not require catalog table information. Avoid production use.

    Annotations
    @deprecated
    Deprecated

    (Since version 3.0) Please use the CatalogTable overload instead

  4. def withNewTransaction[T](thunk: (OptimisticTransaction) ⇒ T): T

    Legacy/compat overload that does not require catalog table information. Avoid production use.

    Annotations
    @deprecated
    Deprecated

    (Since version 3.0) Please use the CatalogTable overload instead
