class DummySnapshot extends Snapshot
A dummy snapshot with only the metadata and protocol specified. It stands in for a targeted table version that does not yet exist, before a change is committed. This can be used to create a DataFrame, or to derive the stats schema from an existing Parquet table when converting or cloning it to a Delta table, before the actual snapshot becomes available after a commit.
Note that the snapshot state reconstruction contains only the protocol and metadata; it does not include add/remove actions, app IDs, or metadata domains, even if the actual table has them now or will have them in the future.
- By Inheritance
- DummySnapshot
- Snapshot
- ValidateChecksum
- DataSkippingReader
- DataSkippingReaderBase
- ReadsMetadataFields
- DeltaScanGenerator
- StatisticsCollection
- StateCache
- SnapshotStateManager
- DeltaLogging
- DatabricksLogging
- DeltaProgressReporter
- LoggingShims
- Logging
- SnapshotDescriptor
- AnyRef
- Any
Instance Constructors
- new DummySnapshot(logPath: Path, deltaLog: DeltaLog)
- new DummySnapshot(logPath: Path, deltaLog: DeltaLog, metadata: Metadata, protocolOpt: Option[Protocol] = None)
- logPath
the path to the transaction log
- deltaLog
the delta log object
- metadata
the metadata of the table
- protocolOpt
the protocol version of the table (optional). If not specified, a default protocol will be computed based on the metadata. This must be explicitly specified when replacing an existing Delta table, otherwise using the metadata to compute the protocol might result in a protocol downgrade for the table.
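The fallback rule for protocolOpt described above can be sketched in plain Scala. Everything here (TableMeta, the Protocol fields, and defaultProtocolFor's rule) is an illustrative stand-in, not the actual Delta implementation:

```scala
// Illustrative stand-ins: TableMeta, Protocol and defaultProtocolFor are NOT
// the real Delta classes; they only model the Option-fallback rule.
case class Protocol(minReaderVersion: Int, minWriterVersion: Int)
case class TableMeta(configuration: Map[String, String])

// Hypothetical rule: some table features force a higher protocol version.
def defaultProtocolFor(meta: TableMeta): Protocol =
  if (meta.configuration.contains("delta.enableDeletionVectors"))
    Protocol(minReaderVersion = 3, minWriterVersion = 7)
  else
    Protocol(minReaderVersion = 1, minWriterVersion = 2)

// The documented behavior: use the explicit protocol when given,
// otherwise compute a default from the metadata.
def resolveProtocol(meta: TableMeta, protocolOpt: Option[Protocol]): Protocol =
  protocolOpt.getOrElse(defaultProtocolFor(meta))

val existing = Protocol(3, 7)
val resolved = resolveProtocol(TableMeta(Map.empty), Some(existing))
```

Because an explicitly passed protocol always wins, replacing an existing table should pass the table's current protocol; deriving it from the new metadata alone could yield a lower (downgraded) version.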
Type Members
- implicit class LogStringContext extends AnyRef
- Definition Classes
- LoggingShims
- case class ReconstructedProtocolMetadataAndICT(protocol: Protocol, metadata: Metadata, inCommitTimestamp: Option[Long]) extends Product with Serializable
Protocol, Metadata, and In-Commit Timestamp retrieved through protocolMetadataAndICTReconstruction, which skips a full state reconstruction.
- Definition Classes
- Snapshot
- class DataFiltersBuilder extends AnyRef
Builds the data filters for data skipping.
- Definition Classes
- DataSkippingReaderBase
- class CachedDS[A] extends AnyRef
- Definition Classes
- StateCache
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
val
_computedStateTriggered: Boolean
Whether computedState is already computed or not
- Attributes
- protected
- Definition Classes
- SnapshotStateManager
- Annotations
- @volatile()
-
def
aggregationsToComputeState: Map[String, Column]
A map from alias to the aggregations that need to be done to calculate the computedState
- Attributes
- protected
- Definition Classes
- SnapshotStateManager
-
def
allFiles: Dataset[AddFile]
All of the files present in this Snapshot.
- Definition Classes
- Snapshot → DataSkippingReaderBase
-
def
applyFuncToStatisticsColumn(statisticsSchema: StructType, statisticsColumn: Column)(function: PartialFunction[(Column, StructField), Option[Column]]): Seq[Column]
Traverses the statisticsSchema for the provided statisticsColumn and applies function to the leaves.
Note that for values outside the domain of the partial function, the original column is kept. If the caller wants to drop a column, the function must explicitly return None.
- Definition Classes
- StatisticsCollection
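The "keep the original column unless the partial function says otherwise" contract of applyFuncToStatisticsColumn can be modeled on plain values instead of Spark Columns. All names here are illustrative stand-ins:

```scala
// Leaves are modeled as (statName, value) pairs instead of Spark Columns;
// applyToLeaves and the stat names below are illustrative.
def applyToLeaves(
    leaves: Seq[(String, Int)],
    f: PartialFunction[(String, Int), Option[Int]]): Seq[Int] =
  leaves.flatMap { leaf =>
    if (f.isDefinedAt(leaf)) f(leaf) // in domain: Some(x) rewrites, None drops
    else Some(leaf._2)               // outside the domain: keep the original
  }

val stats = Seq(("minValues.a", 1), ("maxValues.a", 9), ("nullCount.a", 0))
val result = applyToLeaves(stats, {
  case (name, v) if name.startsWith("minValues") => Some(v * 2) // rewrite
  case (name, _) if name.startsWith("nullCount") => None        // drop explicitly
})
// maxValues.a is outside the domain, so its original value is kept
```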
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
cacheDS[A](ds: Dataset[A], name: String): CachedDS[A]
Create a CachedDS instance for the given Dataset and the name.
- Definition Classes
- StateCache
-
lazy val
checkpointProvider: CheckpointProvider
The CheckpointProvider for the underlying checkpoint
- Definition Classes
- Snapshot
-
def
checkpointSizeInBytes(): Long
- Definition Classes
- Snapshot
-
val
checksumOpt: Option[VersionChecksum]
- Definition Classes
- Snapshot
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
columnMappingMode: DeltaColumnMappingMode
The column mapping mode of the target delta table.
- Definition Classes
- Snapshot → StatisticsCollection
-
def
computeChecksum: VersionChecksum
Computes all the information that is needed by the checksum for the current snapshot. May kick off state reconstruction if needed by any of the underlying fields. Note that it's safe to set txnId to none, since the snapshot doesn't always have a txn attached. E.g. if a snapshot is created by reading a checkpoint, then no txnId is present.
- Definition Classes
- Snapshot
-
lazy val
computedState: SnapshotState
Compute the SnapshotState of a table. Uses the stateDF from the Snapshot to extract the necessary stats.
- Attributes
- protected
- Definition Classes
- DummySnapshot → SnapshotStateManager
-
def
constructNotNullFilter(statsProvider: StatsProvider, pathToColumn: Seq[String]): Option[DataSkippingPredicate]
Constructs a DataSkippingPredicate for isNotNull predicates.
- Attributes
- protected
- Definition Classes
- DataSkippingReaderBase
-
def
constructPartitionFilters(filters: Seq[Expression]): Column
Given the partition filters on the data, rewrite these filters by pointing to the metadata columns.
- Attributes
- protected
- Definition Classes
- DataSkippingReaderBase
-
def
convertDataFrameToAddFiles(df: DataFrame): Array[AddFile]
- Attributes
- protected
- Definition Classes
- DataSkippingReaderBase
-
def
dataSchema: StructType
Returns the schema of the columns written out to file (overridden in the write path)
- Definition Classes
- Snapshot
-
def
datasetRefCache[A](creator: () ⇒ Dataset[A]): DatasetRefCache[A]
- Definition Classes
- StateCache
-
def
deletedRecordCountsHistogramOpt: Option[DeletedRecordCountsHistogram]
- Definition Classes
- SnapshotStateManager
-
def
deletionVectorsReadableAndHistogramEnabled: Boolean
- Attributes
- protected
- Definition Classes
- SnapshotStateManager
-
def
deletionVectorsReadableAndMetricsEnabled: Boolean
- Attributes
- protected
- Definition Classes
- SnapshotStateManager
-
lazy val
deletionVectorsSupported: Boolean
- Definition Classes
- StatisticsCollection
-
def
deltaAssert(check: ⇒ Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit
Helper method to check invariants in Delta code. Fails when running in tests, records a delta assertion event and logs a warning otherwise.
- Attributes
- protected
- Definition Classes
- DeltaLogging
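A minimal sketch of the deltaAssert contract (fail hard in tests, log a warning otherwise). isTestMode and the warn callback are stand-ins, and the real method also records a Delta assertion usage event:

```scala
// Illustrative model only: isTestMode and warn are stand-ins for how the
// runtime environment and logging are actually detected and wired up.
def deltaAssertModel(
    check: => Boolean,
    name: String,
    msg: String,
    isTestMode: Boolean,
    warn: String => Unit): Unit =
  if (!check) {
    if (isTestMode) throw new IllegalStateException(s"$name: $msg")
    else warn(s"Delta assertion $name failed: $msg")
  }

var warnings = List.empty[String]
// Outside tests a failed check only logs a warning...
deltaAssertModel(false, "invariant", "oops", false, w => warnings ::= w)
// ...while in tests the same failure throws.
val threw =
  try { deltaAssertModel(false, "invariant", "oops", true, _ => ()); false }
  catch { case _: IllegalStateException => true }
```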
-
lazy val
deltaFileIndexOpt: Option[DeltaLogFileIndex]
Given the list of files from LogSegment, create the respective file indices to help create a DataFrame and short-circuit the many file-existence and partition-schema-inference checks that exist in DataSource.resolveRelation().
-
def
deltaFileSizeInBytes(): Long
- Definition Classes
- Snapshot
-
val
deltaLog: DeltaLog
- Definition Classes
- DummySnapshot → Snapshot → DataSkippingReaderBase → SnapshotDescriptor
-
def
domainMetadata: Seq[DomainMetadata]
- Definition Classes
- SnapshotStateManager
-
def
domainMetadatasIfKnown: Option[Seq[DomainMetadata]]
- Attributes
- protected[delta]
- Definition Classes
- SnapshotStateManager
-
def
emptyDF: DataFrame
- Attributes
- protected
- Definition Classes
- Snapshot
-
def
ensureCommitFilesBackfilled(catalogTableOpt: Option[CatalogTable]): Unit
Ensures that commit files are backfilled up to the current version in the snapshot.
This method checks if there are any un-backfilled versions up to the current version and triggers the backfilling process using the commit-coordinator. It verifies that the delta file for the current version exists after the backfilling process.
- Definition Classes
- Snapshot
- Exceptions thrown
IllegalStateException if the delta file for the current version is not found after backfilling.
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
extractComputedState(stateDF: DataFrame): SnapshotState
Extract the SnapshotState from the provided dataframe of actions. Requires that the dataframe has already been deduplicated (either through logReplay or some other method).
- Attributes
- protected
- Definition Classes
- SnapshotStateManager
-
lazy val
fileIndices: Seq[DeltaLogFileIndex]
- Attributes
- protected
- Definition Classes
- Snapshot
-
def
fileSizeHistogram: Option[FileSizeHistogram]
- Definition Classes
- SnapshotStateManager
-
def
filesForScan(limit: Long, partitionFilters: Seq[Expression]): DeltaScan
Gathers files that should be included in a scan based on the given predicates and limit. This will be called only when all predicates are on partitioning columns. Statistics about the amount of data that will be read are gathered and returned.
- Definition Classes
- DataSkippingReaderBase → DeltaScanGenerator
-
def
filesForScan(filters: Seq[Expression], keepNumRecords: Boolean): DeltaScan
Gathers files that should be included in a scan based on the given predicates. Statistics about the amount of data that will be read are gathered and returned. Note, the statistics column that is added when keepNumRecords = true should NOT take into account DVs. Consumers of this method might commit the file. The semantics of the statistics need to be consistent across all files.
- Definition Classes
- DataSkippingReaderBase → DeltaScanGenerator
-
def
filesWithStatsForScan(partitionFilters: Seq[Expression]): DataFrame
Returns a DataFrame for the given partition filters. The schema of the returned DataFrame is nearly the same as AddFile, except that the stats field is parsed from a JSON string into a struct.
- Definition Classes
- DataSkippingReaderBase → DeltaScanGenerator
-
def
filterOnPartitions(partitionFilters: Seq[Expression], keepNumRecords: Boolean): (Seq[AddFile], DataSize)
Get all the files in this table given the partition filter, and the corresponding size of the scan.
- keepNumRecords
Also select stats.numRecords in the query. This may slow down the query, as it has to parse JSON.
- Attributes
- protected
- Definition Classes
- DataSkippingReaderBase
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
getAllFiles(keepNumRecords: Boolean): Seq[AddFile]
Get all the files in this table.
- keepNumRecords
Also select stats.numRecords in the query. This may slow down the query, as it has to parse JSON.
- Attributes
- protected
- Definition Classes
- DataSkippingReaderBase
-
def
getBaseStatsColumn: Column
Returns a Column that references the stats field that data skipping should use
- Definition Classes
- ReadsMetadataFields
-
def
getBaseStatsColumnName: String
- Definition Classes
- ReadsMetadataFields
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
- Definition Classes
- DeltaLogging
-
def
getDataSkippedFiles(partitionFilters: Column, dataFilters: DataSkippingPredicate, keepNumRecords: Boolean): (Seq[AddFile], Seq[DataSize])
Given the partition and data filters, leverage data skipping statistics to find the set of files that need to be queried. Returns a tuple of the files and optionally the size of the scan that's generated if there were no filters, if there were only partition filters, and combined effect of partition and data filters respectively.
- Attributes
- protected
- Definition Classes
- DataSkippingReaderBase
-
def
getErrorData(e: Throwable): Map[String, Any]
- Definition Classes
- DeltaLogging
-
def
getFilesAndNumRecords(df: DataFrame): Iterator[(AddFile, NumRecords)] with Closeable
Get the files and number of records within each file, to perform limit pushdown.
- Definition Classes
- DataSkippingReaderBase
-
lazy val
getInCommitTimestampOpt: Option[Long]
Returns the inCommitTimestamp if ICT is enabled, otherwise returns None. This potentially triggers an IO operation to read the inCommitTimestamp. This is a lazy val, so repeated calls will not trigger multiple IO operations.
- Attributes
- protected
- Definition Classes
- DummySnapshot → Snapshot
-
def
getLastKnownBackfilledVersion: Long
- Definition Classes
- Snapshot
-
def
getProperties: Map[String, String]
Return the set of properties of the table.
- Definition Classes
- Snapshot
-
def
getProtocolMetadataAndIctFromCrc(): Option[Array[ReconstructedProtocolMetadataAndICT]]
Tries to retrieve the protocol, metadata, and in-commit timestamp (if needed) from the checksum file. If the checksum file is not present, or if the protocol or metadata is missing, this returns None.
- Attributes
- protected
- Definition Classes
- Snapshot
-
def
getSpecificFilesWithStats(paths: Seq[String]): Seq[AddFile]
Get AddFile (with stats) actions corresponding to the given set of paths in the Snapshot. If a path doesn't exist in the snapshot, it is ignored and no AddFile is returned for it.
- paths
Sequence of paths for which we want to get the AddFile actions
- returns
a sequence of AddFiles for the given paths
- Definition Classes
- DataSkippingReaderBase
-
final
def
getStatsColumnOpt(stat: StatsColumn): Option[Column]
Overload for convenience when working with StatsColumn helpers
- Attributes
- protected
- Definition Classes
- DataSkippingReaderBase
-
final
def
getStatsColumnOpt(statType: String, pathToColumn: Seq[String] = Nil): Option[Column]
Convenience overload for single-element stat type paths.
- Attributes
- protected
- Definition Classes
- DataSkippingReaderBase
-
final
def
getStatsColumnOpt(pathToStatType: Seq[String], pathToColumn: Seq[String]): Option[Column]
Returns an expression to access the given statistics for a specific column, or None if that stats column does not exist.
- pathToStatType
Path components of one of the fields declared by the DeltaStatistics object. For statistics of collated strings, this path contains the versioned collation identifier. In all other cases, the path has only one element. The path is in reverse order.
- pathToColumn
The components of the nested column name to get stats for. The components are in reverse order.
- Attributes
- protected
- Definition Classes
- DataSkippingReaderBase
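The reverse-order path lookup described for getStatsColumnOpt can be sketched over nested Maps rather than Spark Columns. Stats, Branch, Leaf, and lookup are illustrative stand-ins:

```scala
// Stats modeled as a nested tree instead of Spark Columns; these types
// only illustrate the reverse-path rule from the docs above.
sealed trait Stats
case class Branch(children: Map[String, Stats]) extends Stats
case class Leaf(value: Long) extends Stats

// Both path arguments in the real API are reversed, so we un-reverse
// before walking down the tree; a missing field yields None.
def lookup(root: Stats, reversedPath: Seq[String]): Option[Long] = {
  val node = reversedPath.reverse.foldLeft(Option(root)) {
    case (Some(Branch(kids)), key) => kids.get(key)
    case _                         => None // missing field: no stats column
  }
  node.collect { case Leaf(v) => v }
}

val stats = Branch(Map(
  "minValues" -> Branch(Map("user" -> Branch(Map("age" -> Leaf(18L)))))))

// Forward path minValues.user.age, supplied in reverse order:
val min = lookup(stats, Seq("age", "user", "minValues"))
```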
-
final
def
getStatsColumnOrNullLiteral(stat: StatsColumn): Column
Overload for convenience when working with StatsColumn helpers
- Attributes
- protected[delta]
- Definition Classes
- DataSkippingReaderBase
-
final
def
getStatsColumnOrNullLiteral(statType: String, pathToColumn: Seq[String] = Nil): Column
Returns an expression to access the given statistics for a specific column, or a NULL literal expression if that column does not exist.
- Attributes
- protected[delta]
- Definition Classes
- DataSkippingReaderBase
-
def
getTableCommitCoordinatorForWrites: Option[TableCommitCoordinatorClient]
Returns the TableCommitCoordinatorClient that should be used for any type of mutation operation on the table. This includes data writes, backfills, etc. This method throws an error if the configured coordinator could not be instantiated.
- returns
TableCommitCoordinatorClient if the table is configured for coordinated commits, None if the table is not configured for coordinated commits.
- Definition Classes
- DummySnapshot → Snapshot
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
init(): Unit
Performs validations during initialization
- Attributes
- protected
- Definition Classes
- Snapshot
-
def
initialState(metadata: Metadata, protocol: Protocol): SnapshotState
Generate a default SnapshotState of a new table given the table metadata and the protocol.
- Attributes
- protected
- Definition Classes
- SnapshotStateManager
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
loadActions: DataFrame
Loads the file indices into a DataFrame that can be used for LogReplay.
In addition to the usual nested columns provided by the SingleAction schema, it should provide two additional columns to simplify the log replay process: COMMIT_VERSION_COLUMN (which, when sorted in ascending order, will order older actions before newer ones, as required by InMemoryLogReplay); and ADD_STATS_TO_USE_COL_NAME (to handle certain combinations of config settings for delta.checkpoint.writeStatsAsJson and delta.checkpoint.writeStatsAsStruct).
- Attributes
- protected
- Definition Classes
- Snapshot
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
-
def
logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: MessageWithContext, throwable: Throwable): Unit
- Definition Classes
- Snapshot
-
def
logError(msg: MessageWithContext): Unit
- Definition Classes
- Snapshot
-
def
logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: MessageWithContext): Unit
- Definition Classes
- Snapshot
-
def
logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
- val logPath: Path
-
val
logSegment: LogSegment
- Definition Classes
- Snapshot
-
def
logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: MessageWithContext, throwable: Throwable): Unit
- Definition Classes
- Snapshot
-
def
logWarning(msg: MessageWithContext): Unit
- Definition Classes
- Snapshot
-
def
logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
val
metadata: Metadata
- Definition Classes
- DummySnapshot → Snapshot → DataSkippingReaderBase → SnapshotDescriptor
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
numDeletedRecordsOpt: Option[Long]
- Definition Classes
- SnapshotStateManager
-
def
numDeletionVectorsOpt: Option[Long]
- Definition Classes
- SnapshotStateManager
-
def
numOfFiles: Long
- Definition Classes
- SnapshotStateManager
-
def
numOfFilesIfKnown: Option[Long]
- Attributes
- protected[delta]
- Definition Classes
- SnapshotStateManager
-
def
numOfMetadata: Long
- Definition Classes
- SnapshotStateManager
-
def
numOfProtocol: Long
- Definition Classes
- SnapshotStateManager
-
def
numOfRemoves: Long
- Definition Classes
- SnapshotStateManager
-
def
numOfSetTransactions: Long
- Definition Classes
- SnapshotStateManager
-
def
outputAttributeSchema: StructType
The schema of the output attributes of the write queries that need to collect statistics. The partition columns' definitions are not included in this schema.
- Definition Classes
- Snapshot → StatisticsCollection
-
def
outputTableStatsSchema: StructType
The output attributes (outputAttributeSchema) replaced with the table schema carrying the physical mapping information. NOTE: The partition columns' definitions are not included in this schema.
- Definition Classes
- Snapshot → StatisticsCollection
-
val
path: Path
- Definition Classes
- Snapshot → DataSkippingReaderBase
-
def
protocol: Protocol
- Definition Classes
- DummySnapshot → Snapshot → StatisticsCollection → SnapshotDescriptor
-
def
protocolMetadataAndICTReconstruction(): Array[ReconstructedProtocolMetadataAndICT]
Pulls the protocol and metadata of the table from the files used to compute the Snapshot directly, without triggering a full state reconstruction. This is important, because state reconstruction depends on the protocol and metadata for correctness. If the current table version does not have a checkpoint, this function also returns the in-commit timestamp of the latest commit, if available.
This method should only access methods defined in UninitializedCheckpointProvider that are not present in CheckpointProvider. This is because initialization of Snapshot.checkpointProvider depends on Snapshot.protocolMetadataAndICTReconstruction(), so if Snapshot.protocolMetadataAndICTReconstruction() started depending on Snapshot.checkpointProvider there would be a cyclic dependency.
- Attributes
- protected
- Definition Classes
- Snapshot
-
def
pruneFilesByLimit(df: DataFrame, limit: Long): ScanAfterLimit
- Attributes
- protected[delta]
- Definition Classes
- DataSkippingReaderBase
-
def
recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
Used to record the occurrence of a single event or to report detailed, operation-specific statistics.
- path
Used to log the path of the Delta table when deltaLog is null.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
Used to report the duration, as well as the success or failure, of an operation on a deltaLog.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
Used to report the duration, as well as the success or failure, of an operation on a tahoePath.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
-
def
recordFrameProfile[T](group: String, name: String)(thunk: ⇒ T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: ⇒ S): S
- Definition Classes
- DatabricksLogging
-
def
recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
-
def
recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
-
def
recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
-
def
redactedPath: String
- Definition Classes
- Snapshot → DataSkippingReaderBase
-
def
schema: StructType
- Definition Classes
- SnapshotDescriptor
-
def
setTransactions: Seq[SetTransaction]
- Definition Classes
- SnapshotStateManager
-
def
setTransactionsIfKnown: Option[Seq[SetTransaction]]
- Attributes
- protected[delta]
- Definition Classes
- SnapshotStateManager
-
def
sizeInBytes: Long
The following is a list of convenience methods for accessing the computedState.
- Definition Classes
- SnapshotStateManager
-
def
sizeInBytesIfKnown: Option[Long]
- Attributes
- protected[delta]
- Definition Classes
- SnapshotStateManager
-
val
snapshotToScan: Snapshot
Snapshot to scan by the DeltaScanGenerator for metadata query optimizations
- Definition Classes
- Snapshot → DeltaScanGenerator
-
def
spark: SparkSession
- Attributes
- protected
- Definition Classes
- Snapshot → StatisticsCollection → StateCache
-
lazy val
statCollectionLogicalSchema: StructType
statCollectionLogicalSchema is the logical schema composed of all the columns that have stats collected under the current table configuration.
- Definition Classes
- StatisticsCollection
-
lazy val
statCollectionPhysicalSchema: StructType
statCollectionPhysicalSchema is the schema composed of all the columns that have stats collected under the current table configuration.
- Definition Classes
- StatisticsCollection
-
def
stateDF: DataFrame
The current set of actions in this Snapshot as plain Rows
- Definition Classes
- DummySnapshot → Snapshot
-
def
stateDS: Dataset[SingleAction]
The current set of actions in this Snapshot as a typed Dataset.
- Definition Classes
- DummySnapshot → Snapshot
-
def
stateReconstruction: Dataset[SingleAction]
- Attributes
- protected
- Definition Classes
- Snapshot
-
lazy val
statsCollector: Column
Returns a struct column that can be used to collect statistics for the current schema of the table. The types we keep stats on must be consistent with DataSkippingReader.SkippingEligibleLiteral. If a column is missing from dataSchema (which will be filled with nulls), we will only collect the NULL_COUNT stats for it as the number of rows.
- Definition Classes
- StatisticsCollection
-
lazy val
statsColumnSpec: DeltaStatsColumnSpec
Number of columns to collect stats on for data skipping
- Definition Classes
- Snapshot → StatisticsCollection
-
lazy val
statsSchema: StructType
Returns the schema of the statistics collected.
- Definition Classes
- StatisticsCollection
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
val
tableCommitCoordinatorClientOpt: Option[TableCommitCoordinatorClient]
- Definition Classes
- DummySnapshot → Snapshot
-
def
tableSchema: StructType
Returns the data schema of the table, used for reading stats
- Definition Classes
- Snapshot → StatisticsCollection
-
def
timestamp: Long
Returns the timestamp of the latest commit of this snapshot.
Returns the timestamp of the latest commit of this snapshot. For an uninitialized snapshot, this returns -1.
When InCommitTimestampTableFeature is enabled, the timestamp is retrieved from the CommitInfo of the latest commit which can result in an IO operation.
- Definition Classes
- DummySnapshot → Snapshot
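The resolution order described above can be sketched as a small decision function. This is a hedged illustration, assuming hypothetical parameter names; the actual implementation reads CommitInfo lazily and may perform IO when in-commit timestamps are enabled.

```scala
// Hypothetical sketch of snapshot-timestamp resolution. Parameter names are
// illustrative, not the actual Delta fields. `inCommitTimestamp` is by-name
// to model the lazy (possibly IO-backed) CommitInfo read.
def snapshotTimestamp(
    initialized: Boolean,                // false for an uninitialized snapshot
    ictEnabled: Boolean,                 // InCommitTimestampTableFeature enabled?
    inCommitTimestamp: => Option[Long],  // read from the latest CommitInfo
    fileModificationTime: Long): Long =
  if (!initialized) -1L
  else if (ictEnabled) inCommitTimestamp.getOrElse(fileModificationTime)
  else fileModificationTime
```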
-
def
toString(): String
- Definition Classes
- Snapshot → AnyRef → Any
-
def
tombstones: Dataset[RemoveFile]
All unexpired tombstones.
- Definition Classes
- Snapshot
-
lazy val
transactions: Map[String, Long]
A map to look up transaction version by appId.
- Definition Classes
- SnapshotStateManager
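A map like this is what enables idempotent writes: a writer compares its next transaction version against the last one recorded for its appId. The helper below is a hypothetical sketch of that check, not a Delta API.

```scala
// Illustrative use of an appId -> version map like `transactions`:
// a writer should commit only if its transaction version is newer than
// the last version recorded for its appId (unknown appIds always commit).
def shouldCommit(transactions: Map[String, Long], appId: String, txnVersion: Long): Boolean =
  transactions.get(appId).forall(_ < txnVersion)
```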
-
def
uncache(): Unit
Drop any cached data for this Snapshot.
- Definition Classes
- StateCache
-
def
updateLastKnownBackfilledVersion(newVersion: Long): Unit
- Definition Classes
- Snapshot
-
def
updateStatsToWideBounds(withStats: DataFrame, statsColName: String): DataFrame
Sets the TIGHT_BOUNDS column to false and converts the logical nullCount to a tri-state nullCount.
Sets the TIGHT_BOUNDS column to false and converts the logical nullCount to a tri-state nullCount. The nullCount states are the following:
1) For "all-nulls" columns we set the physical nullCount, which is equal to the physical numRecords.
2) "no-nulls" columns remain unchanged, i.e. a zero nullCount is the same in both physical and logical representations.
3) For "some-nulls" columns, we leave the existing value. In files with wide bounds, the nullCount of SOME_NULL columns is considered unknown.
The file's state can transition back to tight when statistics are recomputed. In that case, TIGHT_BOUNDS is set back to true and nullCount back to the logical value.
Note: this function takes parsed statistics as input and returns a JSON document, similarly to allFiles. To further match the behavior of allFiles, we always return a column named stats instead of statsColName.
- withStats
A DataFrame of actions with parsed statistics.
- statsColName
The name of the parsed statistics column.
- Definition Classes
- StatisticsCollection
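The tri-state transition above can be captured with a small algebraic type. This is a minimal sketch in plain Scala under assumed names (`NullCountState`, `toWideBounds`); the real method rewrites DataFrame columns rather than individual values.

```scala
// Hypothetical model of the tri-state nullCount a file's stats take on
// when its bounds become wide. Names are illustrative, not Delta APIs.
sealed trait NullCountState
case object AllNull extends NullCountState            // nullCount == numRecords
case object NoNull extends NullCountState             // nullCount == 0
case class SomeNull(count: Long) extends NullCountState // kept, but unknown once wide

def toWideBounds(logicalNullCount: Long, physicalNumRecords: Long): NullCountState =
  if (logicalNullCount == physicalNumRecords) AllNull
  else if (logicalNullCount == 0L) NoNull
  else SomeNull(logicalNullCount)
```

Recomputing statistics moves the file back to tight bounds, i.e. back to a single logical nullCount value.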
-
def
validateChecksum(contextInfo: Map[String, String] = Map.empty): Boolean
Validate checksum (if any) by comparing it against the snapshot's state reconstruction.
- contextInfo
caller context that will be added to the logging if validation fails
- returns
True iff validation succeeded.
- Definition Classes
- ValidateChecksum
- Exceptions thrown
IllegalStateException if validation failed and corruption is configured as fatal.
-
def
validateFileListAgainstCRC(checksum: VersionChecksum, contextOpt: Option[String]): Boolean
Validate Snapshot.allFiles against given checksum.allFiles.
Validate Snapshot.allFiles against the given checksum.allFiles. Returns true if validation succeeds, false otherwise. In unit tests, this method throws IllegalStateException so that issues can be caught during development.
- Definition Classes
- ValidateChecksum
-
def
verifyStatsForFilter(referencedStats: Set[StatsColumn]): Column
Returns an expression that can be used to check that the required statistics are present for a given file.
Returns an expression that can be used to check that the required statistics are present for a given file. If any required statistics are missing we must include the corresponding file.
NOTE: We intentionally disable skipping for a file if any required stat is missing, because this lets us check each stat only once (rather than once per use). Per-use checking would only help for tables where the number of indexed columns has changed over time, producing add.stats_parsed records with differing schemas. That case should be rare enough not to optimize for, given that the fix requires more complex skipping predicates that would penalize the common case.
- Attributes
- protected
- Definition Classes
- DataSkippingReaderBase
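The all-or-nothing policy described in the NOTE reduces to a subset check: skipping is only allowed for a file whose stats cover every referenced statistic. The function below is a hypothetical boolean sketch of that check; the real method returns a Spark Column expression evaluated per file.

```scala
// Hedged sketch: true iff every statistic referenced by the skipping filter
// is present in the file's available stats. When this is false, skipping is
// disabled entirely for the file and it must be included in the scan.
def statsVerified(availableStats: Set[String], referencedStats: Set[String]): Boolean =
  referencedStats.subsetOf(availableStats)
```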
-
val
version: Long
- Definition Classes
- Snapshot → DataSkippingReaderBase → SnapshotDescriptor
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
withNoStats: DataFrame
All files with the statistics column dropped completely.
- Definition Classes
- DataSkippingReaderBase
-
final
def
withStats: DataFrame
Returns a parsed and cached representation of files with statistics.
- Definition Classes
- DataSkippingReaderBase
-
def
withStatsDeduplicated: DataFrame
- Definition Classes
- DataSkippingReaderBase
-
def
withStatsInternal: DataFrame
- Attributes
- protected
- Definition Classes
- DataSkippingReaderBase
-
def
withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T
Emit a log entry to indicate that a command is running.
- Definition Classes
- DeltaProgressReporter