org.apache.spark.sql.delta

DummySnapshot

class DummySnapshot extends Snapshot

A dummy snapshot with only the metadata and protocol specified. It represents a target table version that does not yet exist, i.e. the version a pending change would produce before it is committed. It can be used to create a DataFrame, or to derive the stats schema from an existing Parquet table when converting it to Delta or cloning it to a Delta table, before the actual snapshot becomes available after the commit.

Note that the snapshot state reconstruction contains only the protocol and metadata; it does not include add/remove actions, appIds, or metadata domains, even if the actual table currently has them or will have them in the future.

Inherited
  1. DummySnapshot
  2. Snapshot
  3. ValidateChecksum
  4. DataSkippingReader
  5. DataSkippingReaderBase
  6. ReadsMetadataFields
  7. DeltaScanGenerator
  8. StatisticsCollection
  9. StateCache
  10. SnapshotStateManager
  11. DeltaLogging
  12. DatabricksLogging
  13. DeltaProgressReporter
  14. LoggingShims
  15. Logging
  16. SnapshotDescriptor
  17. AnyRef
  18. Any

Instance Constructors

  1. new DummySnapshot(logPath: Path, deltaLog: DeltaLog)
  2. new DummySnapshot(logPath: Path, deltaLog: DeltaLog, metadata: Metadata, protocolOpt: Option[Protocol] = None)

    logPath

    the path to the transaction log

    deltaLog

    the delta log object

    metadata

    the metadata of the table

    protocolOpt

    the protocol version of the table (optional). If not specified, a default protocol is computed from the metadata. This must be explicitly specified when replacing an existing Delta table; otherwise, computing the protocol from the metadata might result in a protocol downgrade for the table.
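A minimal sketch of using the two constructors, assuming a live SparkSession (`spark`), Delta Lake on the classpath, and a hypothetical `sourceSchema` (a StructType for the table being created):

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.delta.{DeltaLog, DummySnapshot}
import org.apache.spark.sql.delta.actions.Metadata

// Obtain the DeltaLog for the target table path (path is illustrative).
val deltaLog = DeltaLog.forTable(spark, "/tmp/target-table")
val metadata = Metadata(schemaString = sourceSchema.json)

// Without protocolOpt: a default protocol is computed from the metadata.
val snapshot = new DummySnapshot(deltaLog.logPath, deltaLog, metadata)

// When replacing an existing table, pass the current protocol explicitly
// to avoid an accidental protocol downgrade.
val replacing = new DummySnapshot(
  deltaLog.logPath, deltaLog, metadata,
  protocolOpt = Some(deltaLog.unsafeVolatileSnapshot.protocol))
```

The second form matches the protocolOpt caveat above: deriving the protocol from metadata alone could drop features the existing table already uses.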

Type Members

  1. implicit class LogStringContext extends AnyRef
    Definition Classes
    LoggingShims
  2. case class ReconstructedProtocolMetadataAndICT(protocol: Protocol, metadata: Metadata, inCommitTimestamp: Option[Long]) extends Product with Serializable

    Protocol, Metadata, and In-Commit Timestamp retrieved through protocolMetadataAndICTReconstruction which skips a full state reconstruction.

    Definition Classes
    Snapshot
  3. class DataFiltersBuilder extends AnyRef

    Builds the data filters for data skipping.

    Definition Classes
    DataSkippingReaderBase
  4. class CachedDS[A] extends AnyRef
    Definition Classes
    StateCache

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val _computedStateTriggered: Boolean

    Whether computedState is already computed or not

    Attributes
    protected
    Definition Classes
    SnapshotStateManager
    Annotations
    @volatile()
  5. def aggregationsToComputeState: Map[String, Column]

    A map from alias to the aggregations that need to be done to calculate the computedState

    Attributes
    protected
    Definition Classes
    SnapshotStateManager
  6. def allFiles: Dataset[AddFile]

    All of the files present in this Snapshot.

    Definition Classes
    Snapshot → DataSkippingReaderBase
  7. def applyFuncToStatisticsColumn(statisticsSchema: StructType, statisticsColumn: Column)(function: PartialFunction[(Column, StructField), Option[Column]]): Seq[Column]

    Traverses the statisticsSchema for the provided statisticsColumn and applies function to leaves.

    Note that for values outside the domain of the partial function the original column is kept. If the caller wants to drop the column, the function needs to explicitly return None.

    Definition Classes
    StatisticsCollection
  8. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  9. def cacheDS[A](ds: Dataset[A], name: String): CachedDS[A]

    Create a CachedDS instance for the given Dataset and the name.

    Definition Classes
    StateCache
  10. lazy val checkpointProvider: CheckpointProvider

    The CheckpointProvider for the underlying checkpoint

    Definition Classes
    Snapshot
  11. def checkpointSizeInBytes(): Long
    Definition Classes
    Snapshot
  12. val checksumOpt: Option[VersionChecksum]
    Definition Classes
    Snapshot
  13. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  14. def columnMappingMode: DeltaColumnMappingMode

    The column mapping mode of the target delta table.

    Definition Classes
    Snapshot → StatisticsCollection
  15. def computeChecksum: VersionChecksum

    Computes all the information that is needed by the checksum for the current snapshot. May kick off state reconstruction if needed by any of the underlying fields. Note that it's safe to set txnId to none, since the snapshot doesn't always have a txn attached. E.g. if a snapshot is created by reading a checkpoint, then no txnId is present.

    Definition Classes
    Snapshot
  16. lazy val computedState: SnapshotState

    Compute the SnapshotState of a table. Uses the stateDF from the Snapshot to extract the necessary stats.

    Attributes
    protected
    Definition Classes
    DummySnapshot → SnapshotStateManager
  17. def constructNotNullFilter(statsProvider: StatsProvider, pathToColumn: Seq[String]): Option[DataSkippingPredicate]

    Constructs a DataSkippingPredicate for isNotNull predicates.

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  18. def constructPartitionFilters(filters: Seq[Expression]): Column

    Given the partition filters on the data, rewrite these filters by pointing to the metadata columns.

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  19. def convertDataFrameToAddFiles(df: DataFrame): Array[AddFile]
    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  20. def dataSchema: StructType

    Returns the schema of the columns written out to file (overridden in write path)

    Definition Classes
    Snapshot
  21. def datasetRefCache[A](creator: () ⇒ Dataset[A]): DatasetRefCache[A]
    Definition Classes
    StateCache
  22. def deletedRecordCountsHistogramOpt: Option[DeletedRecordCountsHistogram]
    Definition Classes
    SnapshotStateManager
  23. def deletionVectorsReadableAndHistogramEnabled: Boolean
    Attributes
    protected
    Definition Classes
    SnapshotStateManager
  24. def deletionVectorsReadableAndMetricsEnabled: Boolean
    Attributes
    protected
    Definition Classes
    SnapshotStateManager
  25. lazy val deletionVectorsSupported: Boolean
    Definition Classes
    StatisticsCollection
  26. def deltaAssert(check: ⇒ Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit

    Helper method to check invariants in Delta code. Fails when running in tests, records a delta assertion event and logs a warning otherwise.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  27. lazy val deltaFileIndexOpt: Option[DeltaLogFileIndex]

    Given the list of files from LogSegment, create respective file indices to help create a DataFrame and short-circuit the many file existence and partition schema inference checks that exist in DataSource.resolveRelation().

    Attributes
    protected[delta]
    Definition Classes
    Snapshot
  28. def deltaFileSizeInBytes(): Long
    Definition Classes
    Snapshot
  29. val deltaLog: DeltaLog
  30. def domainMetadata: Seq[DomainMetadata]
    Definition Classes
    SnapshotStateManager
  31. def domainMetadatasIfKnown: Option[Seq[DomainMetadata]]
    Attributes
    protected[delta]
    Definition Classes
    SnapshotStateManager
  32. def emptyDF: DataFrame
    Attributes
    protected
    Definition Classes
    Snapshot
  33. def ensureCommitFilesBackfilled(catalogTableOpt: Option[CatalogTable]): Unit

    Ensures that commit files are backfilled up to the current version in the snapshot.

    This method checks if there are any un-backfilled versions up to the current version and triggers the backfilling process using the commit-coordinator. It verifies that the delta file for the current version exists after the backfilling process.

    Definition Classes
    Snapshot
    Exceptions thrown

    IllegalStateException if the delta file for the current version is not found after backfilling.

  34. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  35. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  36. def extractComputedState(stateDF: DataFrame): SnapshotState

    Extract the SnapshotState from the provided dataframe of actions. Requires that the dataframe has already been deduplicated (either through logReplay or some other method).

    Attributes
    protected
    Definition Classes
    SnapshotStateManager
  37. lazy val fileIndices: Seq[DeltaLogFileIndex]
    Attributes
    protected
    Definition Classes
    Snapshot
  38. def fileSizeHistogram: Option[FileSizeHistogram]
    Definition Classes
    SnapshotStateManager
  39. def filesForScan(limit: Long, partitionFilters: Seq[Expression]): DeltaScan

    Gathers files that should be included in a scan based on the given predicates and limit. This will be called only when all predicates are on partitioning columns. Statistics about the amount of data that will be read are gathered and returned.

    Definition Classes
    DataSkippingReaderBase → DeltaScanGenerator
  40. def filesForScan(filters: Seq[Expression], keepNumRecords: Boolean): DeltaScan

    Gathers files that should be included in a scan based on the given predicates. Statistics about the amount of data that will be read are gathered and returned. Note, the statistics column that is added when keepNumRecords = true should NOT take into account DVs. Consumers of this method might commit the file. The semantics of the statistics need to be consistent across all files.

    Definition Classes
    DataSkippingReaderBase → DeltaScanGenerator
  41. def filesWithStatsForScan(partitionFilters: Seq[Expression]): DataFrame

    Returns a DataFrame for the given partition filters. The schema of returned DataFrame is nearly the same as AddFile, except that the stats field is parsed to a struct from a json string.

    Definition Classes
    DataSkippingReaderBase → DeltaScanGenerator
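A sketch of driving a scan through this API, assuming `snapshot` is a Snapshot (DummySnapshot inherits these methods) and the column name `date` is illustrative:

```scala
import org.apache.spark.sql.functions.col

// Filters are plain Catalyst expressions; partition and data filters
// are split and applied internally by the data skipping machinery.
val filters = Seq((col("date") === "2024-01-01").expr)

// keepNumRecords = false: no per-file record counts in the stats column.
val scan = snapshot.filesForScan(filters, keepNumRecords = false)
println(s"files selected after skipping: ${scan.files.size}")
```

The returned DeltaScan carries the selected AddFile actions plus statistics about how much data the scan will read, as described above.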
  42. def filterOnPartitions(partitionFilters: Seq[Expression], keepNumRecords: Boolean): (Seq[AddFile], DataSize)

    Get all the files in this table given the partition filter and the corresponding size of the scan.

    keepNumRecords

    Also select stats.numRecords in the query. This may slow down the query as it has to parse json.

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  43. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  44. def getAllFiles(keepNumRecords: Boolean): Seq[AddFile]

    Get all the files in this table.

    keepNumRecords

    Also select stats.numRecords in the query. This may slow down the query as it has to parse json.

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  45. def getBaseStatsColumn: Column

    Returns a Column that references the stats field data skipping should use

    Definition Classes
    ReadsMetadataFields
  46. def getBaseStatsColumnName: String
    Definition Classes
    ReadsMetadataFields
  47. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  48. def getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
    Definition Classes
    DeltaLogging
  49. def getDataSkippedFiles(partitionFilters: Column, dataFilters: DataSkippingPredicate, keepNumRecords: Boolean): (Seq[AddFile], Seq[DataSize])

    Given the partition and data filters, leverage data skipping statistics to find the set of files that need to be queried. Returns a tuple of the files and optionally the size of the scan that's generated if there were no filters, if there were only partition filters, and combined effect of partition and data filters respectively.

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  50. def getErrorData(e: Throwable): Map[String, Any]
    Definition Classes
    DeltaLogging
  51. def getFilesAndNumRecords(df: DataFrame): Iterator[(AddFile, NumRecords)] with Closeable

    Get the files and number of records within each file, to perform limit pushdown.

    Definition Classes
    DataSkippingReaderBase
  52. lazy val getInCommitTimestampOpt: Option[Long]

    Returns the inCommitTimestamp if ICT is enabled, otherwise returns None. This potentially triggers an IO operation to read the inCommitTimestamp. This is a lazy val, so repeated calls will not trigger multiple IO operations.

    Attributes
    protected
    Definition Classes
    DummySnapshot → Snapshot
  53. def getLastKnownBackfilledVersion: Long
    Definition Classes
    Snapshot
  54. def getProperties: Map[String, String]

    Return the set of properties of the table.

    Definition Classes
    Snapshot
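A small sketch of inspecting a snapshot's table properties, assuming `snapshot` is any Snapshot instance; the property key shown is one of the standard Delta protocol properties:

```scala
// getProperties merges table configuration with protocol version info.
val props = snapshot.getProperties
props.get("delta.minReaderVersion").foreach { v =>
  println(s"minReaderVersion = $v")
}

// The table schema carried by the snapshot's metadata.
println(snapshot.schema.treeString)
```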
  55. def getProtocolMetadataAndIctFromCrc(): Option[Array[ReconstructedProtocolMetadataAndICT]]

    Tries to retrieve the protocol, metadata, and in-commit-timestamp (if needed) from the checksum file. If the checksum file is not present, or if the protocol or metadata is missing, this will return None.

    Attributes
    protected
    Definition Classes
    Snapshot
  56. def getSpecificFilesWithStats(paths: Seq[String]): Seq[AddFile]

    Get AddFile (with stats) actions corresponding to given set of paths in the Snapshot. If a path doesn't exist in snapshot, it will be ignored and no AddFile will be returned for it.

    paths

    Sequence of paths for which we want to get AddFile action

    returns

    a sequence of addFiles for the given paths

    Definition Classes
    DataSkippingReaderBase
  57. final def getStatsColumnOpt(stat: StatsColumn): Option[Column]

    Overload for convenience working with StatsColumn helpers

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  58. final def getStatsColumnOpt(statType: String, pathToColumn: Seq[String] = Nil): Option[Column]

    Convenience overload for single element stat type paths.

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  59. final def getStatsColumnOpt(pathToStatType: Seq[String], pathToColumn: Seq[String]): Option[Column]

    Returns an expression to access the given statistics for a specific column, or None if that stats column does not exist.

    pathToStatType

    Path components of one of the fields declared by the DeltaStatistics object. For statistics of collated strings, this path contains the versioned collation identifier. In all other cases the path only has one element. The path is in reverse order.

    pathToColumn

    The components of the nested column name to get stats for. The components are in reverse order.

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  60. final def getStatsColumnOrNullLiteral(stat: StatsColumn): Column

    Overload for convenience working with StatsColumn helpers

    Attributes
    protected[delta]
    Definition Classes
    DataSkippingReaderBase
  61. final def getStatsColumnOrNullLiteral(statType: String, pathToColumn: Seq[String] = Nil): Column

    Returns an expression to access the given statistics for a specific column, or a NULL literal expression if that column does not exist.

    Attributes
    protected[delta]
    Definition Classes
    DataSkippingReaderBase
  62. def getTableCommitCoordinatorForWrites: Option[TableCommitCoordinatorClient]

    Returns the TableCommitCoordinatorClient that should be used for any type of mutation operation on the table. This includes data writes, backfills, etc. This method will throw an error if the configured coordinator could not be instantiated.

    returns

    TableCommitCoordinatorClient if the table is configured for coordinated commits, None if the table is not configured for coordinated commits.

    Definition Classes
    DummySnapshot → Snapshot
  63. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  64. def init(): Unit

    Performs validations during initialization

    Attributes
    protected
    Definition Classes
    Snapshot
  65. def initialState(metadata: Metadata, protocol: Protocol): SnapshotState

    Generate a default SnapshotState of a new table given the table metadata and the protocol.

    Attributes
    protected
    Definition Classes
    SnapshotStateManager
  66. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  67. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  68. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  69. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  70. def loadActions: DataFrame

    Loads the file indices into a DataFrame that can be used for LogReplay.

    In addition to the usual nested columns provided by the SingleAction schema, it should provide two additional columns to simplify the log replay process: COMMIT_VERSION_COLUMN (which, when sorted in ascending order, will order older actions before newer ones, as required by InMemoryLogReplay); and ADD_STATS_TO_USE_COL_NAME (to handle certain combinations of config settings for delta.checkpoint.writeStatsAsJson and delta.checkpoint.writeStatsAsStruct).

    Attributes
    protected
    Definition Classes
    Snapshot
  71. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  72. def logConsole(line: String): Unit
    Definition Classes
    DatabricksLogging
  73. def logDebug(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  74. def logDebug(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  75. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  76. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  77. def logError(msg: MessageWithContext, throwable: Throwable): Unit
    Definition Classes
    Snapshot
  78. def logError(msg: MessageWithContext): Unit
    Definition Classes
    Snapshot
  79. def logError(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  80. def logError(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  81. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  82. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  83. def logInfo(msg: MessageWithContext): Unit
    Definition Classes
    Snapshot
  84. def logInfo(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  85. def logInfo(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  86. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  87. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  88. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  89. val logPath: Path
  90. val logSegment: LogSegment
    Definition Classes
    Snapshot
  91. def logTrace(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  92. def logTrace(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  93. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  94. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  95. def logWarning(msg: MessageWithContext, throwable: Throwable): Unit
    Definition Classes
    Snapshot
  96. def logWarning(msg: MessageWithContext): Unit
    Definition Classes
    Snapshot
  97. def logWarning(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  98. def logWarning(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  99. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  100. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  101. val metadata: Metadata
  102. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  103. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  104. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  105. def numDeletedRecordsOpt: Option[Long]
    Definition Classes
    SnapshotStateManager
  106. def numDeletionVectorsOpt: Option[Long]
    Definition Classes
    SnapshotStateManager
  107. def numOfFiles: Long
    Definition Classes
    SnapshotStateManager
  108. def numOfFilesIfKnown: Option[Long]
    Attributes
    protected[delta]
    Definition Classes
    SnapshotStateManager
  109. def numOfMetadata: Long
    Definition Classes
    SnapshotStateManager
  110. def numOfProtocol: Long
    Definition Classes
    SnapshotStateManager
  111. def numOfRemoves: Long
    Definition Classes
    SnapshotStateManager
  112. def numOfSetTransactions: Long
    Definition Classes
    SnapshotStateManager
  113. def outputAttributeSchema: StructType

    The schema of the output attributes of the write queries that need to collect statistics. The partition columns' definitions are not included in this schema.

    Definition Classes
    Snapshot → StatisticsCollection
  114. def outputTableStatsSchema: StructType

    The output attributes (outputAttributeSchema) that are replaced with table schema with the physical mapping information. NOTE: The partition columns' definitions are not included in this schema.

    Definition Classes
    Snapshot → StatisticsCollection
  115. val path: Path
    Definition Classes
    Snapshot → DataSkippingReaderBase
  116. def protocol: Protocol
  117. def protocolMetadataAndICTReconstruction(): Array[ReconstructedProtocolMetadataAndICT]

    Pulls the protocol and metadata of the table from the files that are used to compute the Snapshot directly, without triggering a full state reconstruction. This is important, because state reconstruction depends on protocol and metadata for correctness. If the current table version does not have a checkpoint, this function will also return the in-commit-timestamp of the latest commit if available.

    Also, this method should only access methods defined in UninitializedCheckpointProvider which are not present in CheckpointProvider. This is because initialization of Snapshot.checkpointProvider depends on Snapshot.protocolMetadataAndICTReconstruction(), and so if Snapshot.protocolMetadataAndICTReconstruction() started depending on Snapshot.checkpointProvider there would be a cyclic dependency.

    Attributes
    protected
    Definition Classes
    Snapshot
  118. def pruneFilesByLimit(df: DataFrame, limit: Long): ScanAfterLimit
    Attributes
    protected[delta]
    Definition Classes
    DataSkippingReaderBase
  119. def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit

    Used to record the occurrence of a single event or report detailed, operation specific statistics.

    path

    Used to log the path of the delta table when deltaLog is null.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  120. def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A

    Used to report the duration as well as the success or failure of an operation on a deltaLog.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  121. def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A

    Used to report the duration as well as the success or failure of an operation on a tahoePath.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  122. def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  123. def recordFrameProfile[T](group: String, name: String)(thunk: ⇒ T): T
    Attributes
    protected
    Definition Classes
    DeltaLogging
  124. def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: ⇒ S): S
    Definition Classes
    DatabricksLogging
  125. def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  126. def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  127. def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  128. def redactedPath: String
    Definition Classes
    Snapshot → DataSkippingReaderBase
  129. def schema: StructType
    Definition Classes
    SnapshotDescriptor
  130. def setTransactions: Seq[SetTransaction]
    Definition Classes
    SnapshotStateManager
  131. def setTransactionsIfKnown: Option[Seq[SetTransaction]]
    Attributes
    protected[delta]
    Definition Classes
    SnapshotStateManager
  132. def sizeInBytes: Long

    The following is a list of convenience methods for accessing the computedState.

    Definition Classes
    SnapshotStateManager
  133. def sizeInBytesIfKnown: Option[Long]
    Attributes
    protected[delta]
    Definition Classes
    SnapshotStateManager
  134. val snapshotToScan: Snapshot

    Snapshot to scan by the DeltaScanGenerator for metadata query optimizations

    Definition Classes
    Snapshot → DeltaScanGenerator
  135. def spark: SparkSession
    Attributes
    protected
    Definition Classes
    Snapshot → StatisticsCollection → StateCache
  136. lazy val statCollectionLogicalSchema: StructType

    statCollectionLogicalSchema is the logical schema that is composed of all the columns that have the stats collected with our current table configuration.

    Definition Classes
    StatisticsCollection
  137. lazy val statCollectionPhysicalSchema: StructType

    statCollectionPhysicalSchema is the schema that is composed of all the columns that have the stats collected with our current table configuration.

    statCollectionPhysicalSchema is the schema that is composed of all the columns that have the stats collected with our current table configuration.

    Definition Classes
    StatisticsCollection
  138. def stateDF: DataFrame

    The current set of actions in this Snapshot as plain Rows.

    Definition Classes
    DummySnapshot → Snapshot
  139. def stateDS: Dataset[SingleAction]

    The current set of actions in this Snapshot as a typed Dataset.

    Definition Classes
    DummySnapshot → Snapshot
  140. def stateReconstruction: Dataset[SingleAction]
    Attributes
    protected
    Definition Classes
    Snapshot
  141. lazy val statsCollector: Column

    Returns a struct column that can be used to collect statistics for the current schema of the table. The types we keep stats on must be consistent with DataSkippingReader.SkippingEligibleLiteral. If a column is missing from dataSchema (which will be filled with nulls), we will only collect the NULL_COUNT stats for it as the number of rows.

    Definition Classes
    StatisticsCollection
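A hedged sketch of how this statistics column might be used; the DataFrame df and the serialization step are illustrative assumptions rather than the exact internal code path:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.to_json

// Illustrative: aggregate a file's rows with the stats struct and serialize it,
// mirroring how Delta stores per-file statistics as a JSON string in AddFile.stats.
def collectFileStats(df: DataFrame, snapshot: Snapshot): DataFrame =
  df.select(to_json(snapshot.statsCollector).as("stats"))
```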
  142. lazy val statsColumnSpec: DeltaStatsColumnSpec

    Number of columns to collect stats on for data skipping.

    Definition Classes
    Snapshot → StatisticsCollection
  143. lazy val statsSchema: StructType

    Returns schema of the statistics collected.

    Definition Classes
    StatisticsCollection
  144. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  145. val tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient]
    Definition Classes
    DummySnapshot → Snapshot
  146. def tableSchema: StructType

    Returns the data schema of the table, used for reading stats.

    Definition Classes
    Snapshot → StatisticsCollection
  147. def timestamp: Long

    Returns the timestamp of the latest commit of this snapshot. For an uninitialized snapshot, this returns -1.

    When InCommitTimestampTableFeature is enabled, the timestamp is retrieved from the CommitInfo of the latest commit, which can result in an IO operation.

    Definition Classes
    DummySnapshot → Snapshot
  148. def toString(): String
    Definition Classes
    Snapshot → AnyRef → Any
  149. def tombstones: Dataset[RemoveFile]

    All unexpired tombstones.

    Definition Classes
    Snapshot
  150. lazy val transactions: Map[String, Long]

    A map to look up transaction version by appId.

    Definition Classes
    SnapshotStateManager
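For example, a writer implementing idempotent commits might consult this map before re-applying a batch; the appId and version below are hypothetical:

```scala
// Hypothetical idempotent writer: skip the batch if a SetTransaction with the
// same appId already records an equal-or-newer version.
val appId = "my-streaming-app"   // illustrative application id
val batchVersion = 42L           // illustrative batch version

val alreadyCommitted = snapshot.transactions.get(appId).exists(_ >= batchVersion)
if (!alreadyCommitted) {
  // ... commit the batch together with SetTransaction(appId, batchVersion, ...)
}
```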
  151. def uncache(): Unit

    Drop any cached data for this Snapshot.

    Definition Classes
    StateCache
  152. def updateLastKnownBackfilledVersion(newVersion: Long): Unit
    Definition Classes
    Snapshot
  153. def updateStatsToWideBounds(withStats: DataFrame, statsColName: String): DataFrame

    Sets the TIGHT_BOUNDS column to false and converts the logical nullCount to a tri-state nullCount. The nullCount states are the following:

    1) For "all-nulls" columns we set the physical nullCount, which is equal to the physical numRecords.
    2) "no-nulls" columns remain unchanged, i.e. a zero nullCount is the same for both physical and logical representations.
    3) For "some-nulls" columns, we leave the existing value. In files with wide bounds, the nullCount in SOME_NULLS columns is considered unknown.

    The file's state can transition back to tight when statistics are recomputed. In that case, TIGHT_BOUNDS is set back to true and the nullCount back to the logical value.

    Note, this function takes parsed statistics as input and returns a JSON document, similarly to allFiles. To further match the behavior of allFiles we always return a column named stats instead of statsColName.

    withStats

    A dataFrame of actions with parsed statistics.

    statsColName

    The name of the parsed statistics column.

    Definition Classes
    StatisticsCollection
  154. def validateChecksum(contextInfo: Map[String, String] = Map.empty): Boolean

    Validate checksum (if any) by comparing it against the snapshot's state reconstruction.

    contextInfo

    caller context that will be added to the logging if validation fails

    returns

    True iff validation succeeded.

    Definition Classes
    ValidateChecksum
    Exceptions thrown

    IllegalStateException if validation failed and corruption is configured as fatal.
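A minimal sketch of invoking the validation; the context keys are illustrative, not prescribed by the API:

```scala
// Illustrative caller context for logging on validation failure.
val ok = snapshot.validateChecksum(Map("caller" -> "manual-verification"))
if (!ok) {
  // Validation failed, but corruption is not configured as fatal;
  // with fatal corruption an IllegalStateException would have been thrown.
}
```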

  155. def validateFileListAgainstCRC(checksum: VersionChecksum, contextOpt: Option[String]): Boolean

    Validate Snapshot.allFiles against the given checksum.allFiles. Returns true if validation succeeds, else returns false. In unit tests, this method throws IllegalStateException so that issues can be caught during development.

    Definition Classes
    ValidateChecksum
  156. def verifyStatsForFilter(referencedStats: Set[StatsColumn]): Column

    Returns an expression that can be used to check that the required statistics are present for a given file. If any required statistics are missing we must include the corresponding file.

    NOTE: We intentionally choose to disable skipping for any file if any required stat is missing, because doing it that way allows us to check each stat only once (rather than once per use). Checking per-use would anyway only help for tables where the number of indexed columns has changed over time, producing add.stats_parsed records with differing schemas. That should be a rare enough case to not worry about optimizing for, given that the fix requires more complex skipping predicates that would penalize the common case.

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  157. val version: Long
  158. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  159. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  160. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  161. def withNoStats: DataFrame

    All files with the statistics column dropped completely.

    Definition Classes
    DataSkippingReaderBase
  162. final def withStats: DataFrame

    Returns a parsed and cached representation of files with statistics.

    returns

    DataFrame

    Definition Classes
    DataSkippingReaderBase
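As a hedged illustration, the parsed statistics exposed by withStats can be queried directly; the min/max predicate below assumes a hypothetical stats-eligible column a:

```scala
import org.apache.spark.sql.functions.col

// Illustrative data-skipping style predicate over parsed per-file statistics:
// keep only files whose [min, max] range for column `a` can contain the value 10.
val candidateFiles = snapshot.withStats
  .where(col("stats.minValues.a") <= 10 && col("stats.maxValues.a") >= 10)
```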
  163. def withStatsDeduplicated: DataFrame
    Definition Classes
    DataSkippingReaderBase
  164. def withStatsInternal: DataFrame
    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  165. def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T

    Report a log to indicate some command is running.

    Definition Classes
    DeltaProgressReporter
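A sketch of wrapping a long-running operation; the status code and message are illustrative:

```scala
// Illustrative: report progress around an expensive operation.
val numFiles = snapshot.withStatusCode("DELTA", "Counting files in snapshot") {
  snapshot.allFiles.count()
}
```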

Inherited from Snapshot

Inherited from ValidateChecksum

Inherited from DataSkippingReader

Inherited from DataSkippingReaderBase

Inherited from ReadsMetadataFields

Inherited from DeltaScanGenerator

Inherited from StatisticsCollection

Inherited from StateCache

Inherited from SnapshotStateManager

Inherited from DeltaLogging

Inherited from DatabricksLogging

Inherited from DeltaProgressReporter

Inherited from LoggingShims

Inherited from Logging

Inherited from SnapshotDescriptor

Inherited from AnyRef

Inherited from Any
