case class DeltaSource(spark: SparkSession, deltaLog: DeltaLog, options: DeltaOptions, snapshotAtSourceInit: SnapshotDescriptor, metadataPath: String, metadataTrackingLog: Option[DeltaSourceMetadataTrackingLog] = None, filters: Seq[Expression] = Nil) extends DeltaSourceBase with DeltaSourceCDCSupport with DeltaSourceMetadataEvolutionSupport with Product with Serializable
A streaming source for a Delta table.
When a new stream is started, Delta starts by constructing an org.apache.spark.sql.delta.Snapshot at the current version of the table. This snapshot is broken up into batches until all existing data has been processed. Subsequent processing is done by tailing the change log looking for new data. As a result, the streaming query returns the same answer as a batch query that had processed the entire dataset at any given point.
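This source is consumed through the public DataStreamReader API. A minimal sketch (table and checkpoint paths are illustrative placeholders, not part of this API):

```scala
// Read a Delta table as a stream: the initial snapshot is processed in
// batches first, then new commits are tailed from the change log.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-streaming-sketch")
  .getOrCreate()

val events = spark.readStream
  .format("delta")
  .load("/tmp/delta/events")       // hypothetical table path

events.writeStream
  .format("delta")
  .option("checkpointLocation", "/tmp/chk/events")  // hypothetical path
  .start("/tmp/delta/events_mirror")
```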
- DeltaSource
- Serializable
- Product
- Equals
- DeltaSourceMetadataEvolutionSupport
- DeltaSourceCDCSupport
- DeltaSourceBase
- DeltaLogging
- DatabricksLogging
- DeltaProgressReporter
- LoggingShims
- Logging
- SupportsTriggerAvailableNow
- SupportsAdmissionControl
- Source
- SparkDataStream
- AnyRef
- Any
Instance Constructors
- new DeltaSource(spark: SparkSession, deltaLog: DeltaLog, options: DeltaOptions, snapshotAtSourceInit: SnapshotDescriptor, metadataPath: String, metadataTrackingLog: Option[DeltaSourceMetadataTrackingLog] = None, filters: Seq[Expression] = Nil)
Type Members
- implicit class LogStringContext extends AnyRef
- Definition Classes
- LoggingShims
- case class AdmissionLimits(maxFiles: Option[Int] = options.maxFilesPerTrigger, bytesToTake: Long = options.maxBytesPerTrigger.getOrElse(Long.MaxValue)) extends DeltaSourceAdmissionBase with Product with Serializable
Class that helps control how much data should be processed by a single micro-batch.
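AdmissionLimits is fed by the user-facing rate-limiting options. A hedged sketch (option values are illustrative; assumes a `spark` session and table path as above):

```scala
// maxFilesPerTrigger / maxBytesPerTrigger cap how much of the snapshot or
// change log a single micro-batch may admit.
val limited = spark.readStream
  .format("delta")
  .option("maxFilesPerTrigger", "1000")  // at most 1000 files per batch
  .option("maxBytesPerTrigger", "1g")    // soft byte cap per batch
  .load("/tmp/delta/events")             // hypothetical table path
```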
- trait DeltaSourceAdmissionBase extends AnyRef
- class IndexedChangeFileSeq extends AnyRef
This class represents an iterator of change metadata (AddFile, RemoveFile, AddCDCFile) for a particular version.
- Definition Classes
- DeltaSourceCDCSupport
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def addBeginAndEndIndexOffsetsForVersion(version: Long, iterator: Iterator[IndexedFile]): Iterator[IndexedFile]
Adds dummy BEGIN_INDEX and END_INDEX IndexedFiles for the given version before and after the contents of the iterator. The contents of the iterator must be the IndexedFiles that correspond to this version.
- Attributes
- protected
- lazy val allowUnsafeStreamingReadOnColumnMappingSchemaChanges: Boolean
Flag that allows the user to force-enable unsafe streaming reads on a Delta table with column mapping enabled AND drop/rename actions.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- lazy val allowUnsafeStreamingReadOnPartitionColumnChanges: Boolean
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def checkReadIncompatibleSchemaChangeOnStreamStartOnce(batchStartVersion: Long, batchEndVersionOpt: Option[Long] = None): Unit
Check for read-incompatible schema changes during stream (re)start so we can fail fast.
This only needs to be called ONCE in the life cycle of a stream, either at the very first latestOffset or the very first getBatch, to make sure we have detected any incompatible schema change. Typically the verifyStreamHygiene call may be good enough to detect these schema changes, but there are cases it cannot handle. Consider this sequence:
1. A user starts a new stream at startingVersion 1.
2. latestOffset is called before getBatch() because there were no previous commits, so getBatch() won't be called as a recovery mechanism. Suppose a single rename/drop/nullability change S occurs while computing the next offset; S would look exactly the same as the latest schema, so verifyStreamHygiene would not detect it.
3. latestOffset would then return a new offset that crosses the schema change boundary.
If a schema log is already initialized, we don't have to run the initialization or schema checks any more.
- batchStartVersion
Start version we want to verify read compatibility against
- batchEndVersionOpt
Optionally, if we are checking against an existing constructed batch during streaming initialization, we would also like to verify all schema changes in between as well before we can lazily initialize the schema log if needed.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def checkReadIncompatibleSchemaChanges(metadata: Metadata, version: Long, batchStartVersion: Long, batchEndVersionOpt: Option[Long] = None, validatedDuringStreamStart: Boolean = false): Unit
Narrow waist to verify a metadata action for read-incompatible schema changes, specifically:
1. Any column-mapping-related schema changes (rename / drop columns)
2. Standard read-compatibility changes, including: a) no missing columns, b) no data type changes, c) no read-incompatible nullability changes.
If the check fails, we throw an exception to exit the stream. If lazy log initialization is required, we also run a one-time scan to safely initialize the metadata tracking log upon any non-additive schema change failure.
- metadata
Metadata that contains a potential schema change
- version
Version for the metadata action
- validatedDuringStreamStart
Whether this check is being done during stream start.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def cleanUpSnapshotResources(): Unit
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def collectMetadataActions(startVersion: Long, endVersion: Long): Seq[(Long, Metadata)]
- Attributes
- protected
- Definition Classes
- DeltaSourceMetadataEvolutionSupport
- def collectProtocolActions(startVersion: Long, endVersion: Long): Seq[(Long, Protocol)]
- Attributes
- protected
- Definition Classes
- DeltaSourceMetadataEvolutionSupport
- def commit(end: Offset): Unit
- Definition Classes
- DeltaSource → Source
- def commit(end: Offset): Unit
- Definition Classes
- Source → SparkDataStream
- def createDataFrame(indexedFiles: Iterator[IndexedFile]): DataFrame
Given an iterator of file actions, create a DataFrame representing the files added to a table. Only AddFile actions will be used to create the DataFrame.
- indexedFiles
actions iterator from which to generate the DataFrame.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def createDataFrameBetweenOffsets(startVersion: Long, startIndex: Long, isInitialSnapshot: Boolean, startOffsetOption: Option[DeltaSourceOffset], endOffset: DeltaSourceOffset): DataFrame
Return the DataFrame between start and end offset.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def deltaAssert(check: => Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit
Helper method to check invariants in Delta code. Fails when running in tests; records a delta assertion event and logs a warning otherwise.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- val deltaLog: DeltaLog
- def deserializeOffset(json: String): Offset
- Definition Classes
- Source → SparkDataStream
- val emptyDataFrame: DataFrame
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- val excludeRegex: Option[Regex]
- Attributes
- protected
- val filters: Seq[Expression]
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- lazy val forceEnableStreamingReadOnReadIncompatibleSchemaChangesDuringStreamStart: Boolean
Flag that allows the user to disable the read-compatibility check during stream start, which protects against a corner case that verifyStreamHygiene could not detect. This is a bug fix but also a potential behavior change, so we add a flag to fall back.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- lazy val forceEnableUnsafeReadOnNullabilityChange: Boolean
Flag that allows the user to fall back to the legacy behavior, in which a nullable=false schema can read nullable=true data; this is incorrect, but changing it is a behavior change regardless.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def getBatch(startOffsetOption: Option[Offset], end: Offset): DataFrame
- Definition Classes
- DeltaSource → Source
- def getCDCFileChangesAndCreateDataFrame(startVersion: Long, startIndex: Long, isInitialSnapshot: Boolean, endOffset: DeltaSourceOffset): DataFrame
Get the changes from (startVersion, startIndex) to the end for the CDC case. We need to call CDCReader to get the CDC DataFrame.
- startVersion
calculated starting version
- startIndex
calculated starting index
- isInitialSnapshot
whether the stream has to return the initial snapshot or not
- endOffset
offset that signifies the end of the stream
- returns
the DataFrame containing the file changes (AddFile, RemoveFile, AddCDCFile)
- Attributes
- protected
- Definition Classes
- DeltaSourceCDCSupport
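The DeltaSourceCDCSupport path backs the Change Data Feed streaming read. A hedged sketch (option values are illustrative; assumes a `spark` session and that CDF is enabled on the table):

```scala
// Change Data Feed streaming read; rows carry _change_type,
// _commit_version and _commit_timestamp metadata columns.
val changes = spark.readStream
  .format("delta")
  .option("readChangeFeed", "true")
  .option("startingVersion", "10")   // illustrative starting version
  .load("/tmp/delta/events")         // hypothetical table path
```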
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
- Definition Classes
- DeltaLogging
- def getDefaultReadLimit(): ReadLimit
- Definition Classes
- DeltaSource → SupportsAdmissionControl
- def getErrorData(e: Throwable): Map[String, Any]
- Definition Classes
- DeltaLogging
- def getFileChanges(fromVersion: Long, fromIndex: Long, isInitialSnapshot: Boolean, endOffset: Option[DeltaSourceOffset] = None, verifyMetadataAction: Boolean = true): ClosableIterator[IndexedFile]
Get the changes starting from (fromVersion, fromIndex). The start point should not be included in the result.
- endOffset
If defined, do not return changes beyond this offset. If not defined, we must be scanning the log to find the next offset.
- verifyMetadataAction
If true, we will break the stream when we detect any read-incompatible metadata changes.
- Attributes
- protected
- def getFileChangesAndCreateDataFrame(startVersion: Long, startIndex: Long, isInitialSnapshot: Boolean, endOffset: DeltaSourceOffset): DataFrame
Get the changes from (startVersion, startIndex) to the end.
- startVersion
calculated starting version
- startIndex
calculated starting index
- isInitialSnapshot
whether the stream has to return the initial snapshot or not
- endOffset
offset that signifies the end of the stream
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def getFileChangesForCDC(fromVersion: Long, fromIndex: Long, isInitialSnapshot: Boolean, limits: Option[AdmissionLimits], endOffset: Option[DeltaSourceOffset], verifyMetadataAction: Boolean = true): Iterator[(Long, Iterator[IndexedFile], Option[CommitInfo])]
Get the changes starting from (fromVersion, fromIndex); fromVersion is included. It returns an iterator of (log_version, fileActions, Optional[CommitInfo]). The commit info is needed later so that the InCommitTimestamp of the log files can be determined.
If verifyMetadataAction = true, we will break the stream when we detect any read-incompatible metadata changes.
- Attributes
- protected
- Definition Classes
- DeltaSourceCDCSupport
- def getFileChangesWithRateLimit(fromVersion: Long, fromIndex: Long, isInitialSnapshot: Boolean, limits: Option[AdmissionLimits] = Some(AdmissionLimits())): ClosableIterator[IndexedFile]
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def getMetadataOrProtocolChangeIndexedFileIterator(metadataChangeOpt: Option[Metadata], protocolChangeOpt: Option[Protocol], version: Long): ClosableIterator[IndexedFile]
If the current stream metadata is not equal to the metadata change in metadataChangeOpt, return a metadata change barrier IndexedFile. Only returns something if trackingMetadataChange is true.
- Attributes
- protected
- Definition Classes
- DeltaSourceMetadataEvolutionSupport
- def getNextOffsetFromPreviousOffset(previousOffset: DeltaSourceOffset, limits: Option[AdmissionLimits]): Option[DeltaSourceOffset]
Return the next offset when a previous offset exists.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def getNextOffsetFromPreviousOffsetIfPendingSchemaChange(previousOffset: DeltaSourceOffset): Option[DeltaSourceOffset]
If the given previous Delta source offset is a schema change offset, return the appropriate next offset. This should be called before trying any other means of determining the next offset. If this returns None, there is no schema change, and the caller should determine the next offset in the normal way.
- Attributes
- protected
- Definition Classes
- DeltaSourceMetadataEvolutionSupport
- def getOffset: Option[Offset]
- Definition Classes
- DeltaSource → Source
- def getSnapshotAt(version: Long): (Iterator[IndexedFile], Option[Long])
This method computes the initial snapshot to read when the Delta source is initialized on a fresh stream.
- returns
A tuple where the first element is an iterator of IndexedFiles and the second element is the in-commit timestamp of the initial snapshot if available.
- Attributes
- protected
- def getSnapshotFromDeltaLog(version: Long): Snapshot
Narrow waist for generating a snapshot from the Delta log within the Delta source.
- Attributes
- protected
- def getStartingOffsetFromSpecificDeltaVersion(fromVersion: Long, isInitialSnapshot: Boolean, limits: Option[AdmissionLimits]): Option[DeltaSourceOffset]
Returns the offset that starts from a specific delta table version. This function is called when starting a new stream query.
- fromVersion
The version of the delta table to calculate the offset from.
- isInitialSnapshot
Whether the delta version is for the initial snapshot or not.
- limits
Indicates how much data can be processed by a micro batch.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- lazy val getStartingVersion: Option[Long]
Extracts whether users provided the option to time travel a relation. If a query restarts from a checkpoint and the checkpoint has recorded the offset, this method should never be called.
- Attributes
- protected
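The starting-version options look like this from the user side; a hedged sketch (values are illustrative; assumes a `spark` session):

```scala
// startingVersion / startingTimestamp skip the initial snapshot and begin
// tailing the change log from the requested point.
val fromVersion = spark.readStream
  .format("delta")
  .option("startingVersion", "5")             // illustrative version
  .load("/tmp/delta/events")                  // hypothetical table path

val fromTime = spark.readStream
  .format("delta")
  .option("startingTimestamp", "2024-01-01")  // illustrative timestamp
  .load("/tmp/delta/events")
```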
- val hasCheckedReadIncompatibleSchemaChangesOnStreamStart: Boolean
A global flag to mark whether we have done a per-stream start check for column mapping schema changes (rename / drop).
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- Annotations
- @volatile()
- def initForTriggerAvailableNowIfNeeded(startOffsetOpt: Option[DeltaSourceOffset]): Unit
Initialize the internal states for AvailableNow if this method is called for the first time after prepareForTriggerAvailableNow.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
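From the user side, this machinery is exercised via Trigger.AvailableNow; a hedged sketch (paths are illustrative; assumes a `spark` session):

```scala
import org.apache.spark.sql.streaming.Trigger

// Trigger.AvailableNow processes everything up to the offset captured at
// prepareForTriggerAvailableNow, possibly over several micro-batches,
// then stops the query on its own.
spark.readStream
  .format("delta")
  .load("/tmp/delta/events")                            // hypothetical path
  .writeStream
  .format("delta")
  .trigger(Trigger.AvailableNow())
  .option("checkpointLocation", "/tmp/chk/availnow")    // hypothetical path
  .start("/tmp/delta/out")
```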
- def initLastOffsetForTriggerAvailableNow(startOffsetOpt: Option[DeltaSourceOffset]): Unit
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def initialOffset(): Offset
- Definition Classes
- Source → SparkDataStream
- var initialState: DeltaSourceSnapshot
- Attributes
- protected
- var initialStateVersion: Long
- Attributes
- protected
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def initializeMetadataTrackingAndExitStream(batchStartVersion: Long, batchEndVersionOpt: Option[Long] = None, alwaysFailUponLogInitialized: Boolean = false): Unit
Initialize the schema tracking log if an empty schema tracking log is provided. This method also checks the range between batchStartVersion and batchEndVersion to ensure a safe schema can be initialized in the log.
- batchStartVersion
Start version of the batch of data to be processed; it should typically correspond to a schema that is safe for processing the incoming data.
- batchEndVersionOpt
Optionally, if we are looking at a constructed batch with an existing end offset, we need to double-check that there are no read-incompatible schema changes within the batch range.
- alwaysFailUponLogInitialized
Whether we should always fail with the schema evolution exception.
- Attributes
- protected
- Definition Classes
- DeltaSourceMetadataEvolutionSupport
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- val isStreamingFromColumnMappingTable: Boolean
Whether we are streaming from a table with column mapping enabled.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- val lastOffsetForTriggerAvailableNow: Option[DeltaSourceOffset]
When AvailableNow is used, this offset will be the upper bound up to which this run of the query will process. We may run multiple micro-batches, but the query will stop itself when it reaches this offset.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def latestOffset(startOffset: Offset, limit: ReadLimit): Offset
This should only be called by the engine. Call latestOffsetInternal instead if you need to get the latest offset.
- Definition Classes
- DeltaSource → SupportsAdmissionControl
- def latestOffsetInternal(startOffset: Option[DeltaSourceOffset], limit: ReadLimit): Option[DeltaSourceOffset]
An internal latestOffsetInternal to get the latest offset.
- Attributes
- protected
- Definition Classes
- DeltaSource → DeltaSourceBase
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
- def logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- val metadataPath: String
- val metadataTrackingLog: Option[DeltaSourceMetadataTrackingLog]
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val options: DeltaOptions
- val persistedMetadataAtSourceInit: Option[PersistedMetadata]
The persisted schema from the schema log that must be used to read data files in this Delta streaming source.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def prepareForTriggerAvailableNow(): Unit
- Definition Classes
- DeltaSourceBase → SupportsTriggerAvailableNow
- def productElementNames: Iterator[String]
- Definition Classes
- Product
- val readConfigurationsAtSourceInit: Map[String, String]
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- val readPartitionSchemaAtSourceInit: StructType
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- val readProtocolAtSourceInit: Protocol
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- val readSchemaAtSourceInit: StructType
The read schema for this source during initialization, taking the SchemaLog into account.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- lazy val readSnapshotDescriptor: SnapshotDescriptor
Create a snapshot descriptor, customizing its metadata using metadata tracking if necessary.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def readyToInitializeMetadataTrackingEagerly: Boolean
Whether a schema tracking log is provided (and is empty), so we can initialize eagerly. This should only be used for the first write to the schema log; after that, schema tracking should not rely on this state any more.
- Attributes
- protected
- Definition Classes
- DeltaSourceMetadataEvolutionSupport
- def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
Used to record the occurrence of a single event or report detailed, operation-specific statistics.
- path
Used to log the path of the delta table when deltaLog is null.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a deltaLog.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a tahoePath.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordFrameProfile[T](group: String, name: String)(thunk: => T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: => S): S
- Definition Classes
- DatabricksLogging
- def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def reportLatestOffset(): Offset
- Definition Classes
- SupportsAdmissionControl
- val schema: StructType
- Definition Classes
- DeltaSourceBase → Source
- val snapshotAtSourceInit: SnapshotDescriptor
- val spark: SparkSession
- def stop(): Unit
- Definition Classes
- DeltaSource → SparkDataStream
- def stopIndexedFileIteratorAtSchemaChangeBarrier(fileActionScanIter: ClosableIterator[IndexedFile]): ClosableIterator[IndexedFile]
This is called from getFileChangesWithRateLimit() during latestOffset().
- Attributes
- protected
- Definition Classes
- DeltaSourceMetadataEvolutionSupport
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- val tableId: String
- Attributes
- protected
- def toDeltaSourceOffset(offset: Offset): DeltaSourceOffset
- def toString(): String
- Definition Classes
- DeltaSource → AnyRef → Any
- def trackingMetadataChange: Boolean
Whether this DeltaSource is utilizing a schema log entry as its read schema.
If the user explicitly turns on the flag to fall back to using the latest schema to read (i.e. the legacy mode), we will ignore the schema log.
- Attributes
- protected
- Definition Classes
- DeltaSourceMetadataEvolutionSupport
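Schema tracking is opted into via the schemaTrackingLocation read option; a hedged sketch (paths are illustrative; assumes a `spark` session):

```scala
// With a schema tracking location, the stream can follow non-additive
// schema changes (rename/drop) via the schema log instead of failing
// permanently.
val tracked = spark.readStream
  .format("delta")
  .option("schemaTrackingLocation", "/tmp/chk/events")  // hypothetical path
  .load("/tmp/delta/events")                            // hypothetical path
```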
- def updateMetadataTrackingLogAndFailTheStreamIfNeeded(changedMetadataOpt: Option[Metadata], changedProtocolOpt: Option[Protocol], version: Long, replace: Boolean = false): Unit
Write a new, potentially changed metadata into the metadata tracking log, then fail the stream to allow reanalysis if there are changes.
- changedMetadataOpt
Potentially changed metadata action
- changedProtocolOpt
Potentially changed protocol action
- version
The version of change
- Attributes
- protected
- Definition Classes
- DeltaSourceMetadataEvolutionSupport
- def updateMetadataTrackingLogAndFailTheStreamIfNeeded(end: Offset): Unit
Update the current stream schema in the schema tracking log and fail the stream. This is called during commit(). It's OK to fail during commit() because, per streaming semantics, the batch with the offset ending at end should already have been processed completely.
- Attributes
- protected
- Definition Classes
- DeltaSourceMetadataEvolutionSupport
- def validateCommitAndDecideSkipping(actions: Iterator[Action], version: Long, batchStartVersion: Long, batchEndOffsetOpt: Option[DeltaSourceOffset] = None, verifyMetadataAction: Boolean = true): (Boolean, Option[Metadata], Option[Protocol])
Check the stream for violating any constraints.
If verifyMetadataAction = true, we will break the stream when we detect any read-incompatible metadata changes.
- returns
(true if the commit should be skipped, a metadata action if found, a protocol action if found)
- Attributes
- protected
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: => T): T
Report a log to indicate some command is running.
- Definition Classes
- DeltaProgressReporter
- object AdmissionLimits extends Serializable