trait DeltaSourceBase extends Source with SupportsAdmissionControl with SupportsTriggerAvailableNow with DeltaLogging
Base trait for the Delta source that contains methods for getting changes from the Delta log.
- Self Type
- DeltaSource
- By Inheritance
- DeltaSourceBase
- DeltaLogging
- DatabricksLogging
- DeltaProgressReporter
- LoggingShims
- Logging
- SupportsTriggerAvailableNow
- SupportsAdmissionControl
- Source
- SparkDataStream
- AnyRef
- Any
Type Members
- implicit class LogStringContext extends AnyRef
- Definition Classes
- LoggingShims
Abstract Value Members
- abstract def getBatch(start: Option[Offset], end: Offset): DataFrame
- Definition Classes
- Source
- abstract def getOffset: Option[Offset]
- Definition Classes
- Source
- abstract def latestOffset(arg0: Offset, arg1: ReadLimit): Offset
- Definition Classes
- SupportsAdmissionControl
- abstract def latestOffsetInternal(startOffset: Option[DeltaSourceOffset], limit: ReadLimit): Option[DeltaSourceOffset]
An internal latestOffsetInternal to get the latest offset.
- Attributes
- protected
- abstract def stop(): Unit
- Definition Classes
- SparkDataStream
Concrete Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- lazy val allowUnsafeStreamingReadOnColumnMappingSchemaChanges: Boolean
Flag that allows the user to force-enable unsafe streaming reads on a Delta table with column mapping enabled AND drop/rename actions.
- Attributes
- protected
- lazy val allowUnsafeStreamingReadOnPartitionColumnChanges: Boolean
- Attributes
- protected
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def checkReadIncompatibleSchemaChangeOnStreamStartOnce(batchStartVersion: Long, batchEndVersionOpt: Option[Long] = None): Unit
Check read-incompatible schema changes during stream (re)start so we can fail fast.
This only needs to be called ONCE in the life cycle of a stream, either at the very first latestOffset or the very first getBatch, to make sure we have detected an incompatible schema change. Typically the verifyStreamHygiene that was called may be good enough to detect these schema changes, but there are cases where it wouldn't work, e.g. consider this sequence:
1. User starts a new stream @ startingVersion 1.
2. latestOffset is called before getBatch() because there were no previous commits, so getBatch won't be called as a recovery mechanism. Suppose there's a single rename/drop/nullability change S during computation of the next offset; S would look exactly the same as the latest schema, so verifyStreamHygiene would not work.
3. latestOffset would return this new offset, crossing the schema boundary.
If a schema log is already initialized, we don't have to run the initialization nor schema checks any more.
- batchStartVersion
Start version we want to verify read compatibility against
- batchEndVersionOpt
Optionally, if we are checking against an existing constructed batch during streaming initialization, we would also like to verify all schema changes in between as well before we can lazily initialize the schema log if needed.
- Attributes
- protected
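The once-per-stream semantics described above can be sketched in plain Scala. This is a minimal illustration of the pattern (a volatile flag guarding a one-time validation), not the actual Delta implementation; the class and parameter names are hypothetical:

```scala
// Sketch: run an expensive stream-start validation at most once per
// stream life cycle, mirroring the @volatile flag pattern in the source.
class StreamStartValidator(validate: (Long, Option[Long]) => Unit) {
  @volatile private var hasChecked = false

  def checkOnce(batchStartVersion: Long, batchEndVersionOpt: Option[Long] = None): Unit = {
    if (!hasChecked) {
      validate(batchStartVersion, batchEndVersionOpt)
      // Subsequent latestOffset / getBatch calls skip the check entirely.
      hasChecked = true
    }
  }
}
```

Whether the first caller is latestOffset or getBatch, only that first call pays for the validation.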
- def checkReadIncompatibleSchemaChanges(metadata: Metadata, version: Long, batchStartVersion: Long, batchEndVersionOpt: Option[Long] = None, validatedDuringStreamStart: Boolean = false): Unit
Narrow waist to verify a metadata action for read-incompatible schema changes, specifically:
1. Any column mapping related schema changes (rename / drop columns).
2. Standard read-compatibility changes, including: a) no missing columns, b) no data type changes, c) no read-incompatible nullability changes.
If the check fails, we throw an exception to exit the stream. If lazy log initialization is required, we also run a one-time scan to safely initialize the metadata tracking log upon any non-additive schema change failures.
- metadata
Metadata that contains a potential schema change
- version
Version for the metadata action
- validatedDuringStreamStart
Whether this check is being done during stream start.
- Attributes
- protected
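The three read-compatibility rules above can be sketched on a simplified schema model. This is an illustrative approximation only; the real check operates on Spark StructTypes and Delta Metadata actions, and the Field type here is hypothetical:

```scala
// Simplified schema model: the real code compares Spark StructTypes.
case class Field(name: String, dataType: String, nullable: Boolean)

object ReadCompat {
  /** Returns an error message if newSchema is read-incompatible with readSchema. */
  def check(readSchema: Seq[Field], newSchema: Seq[Field]): Option[String] = {
    val newByName = newSchema.map(f => f.name -> f).toMap
    readSchema.collectFirst {
      // a) a column the stream reads no longer exists (dropped or renamed)
      case f if !newByName.contains(f.name) =>
        s"missing column: ${f.name}"
      // b) the data type changed
      case f if newByName(f.name).dataType != f.dataType =>
        s"type change on ${f.name}"
      // c) read schema says nullable=false but the data may contain nulls
      case f if !f.nullable && newByName(f.name).nullable =>
        s"nullability change on ${f.name}"
    }
  }
}
```

In the real source, a failure here raises an exception that stops the stream rather than returning an error string.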
- def cleanUpSnapshotResources(): Unit
- Attributes
- protected
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def commit(end: Offset): Unit
- Definition Classes
- Source → SparkDataStream
- def commit(end: Offset): Unit
- Definition Classes
- Source
- def createDataFrame(indexedFiles: Iterator[IndexedFile]): DataFrame
Given an iterator of file actions, create a DataFrame representing the files added to a table. Only AddFile actions will be used to create the DataFrame.
- indexedFiles
actions iterator from which to generate the DataFrame.
- Attributes
- protected
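The "only AddFile actions contribute rows" behavior can be sketched without Spark. The action and IndexedFile types below are simplified stand-ins for Delta's actual classes:

```scala
// Simplified stand-ins for Delta's FileAction hierarchy and IndexedFile.
sealed trait FileAction
case class AddFile(path: String) extends FileAction
case class RemoveFile(path: String) extends FileAction

case class IndexedFile(version: Long, index: Long, action: FileAction)

object BatchFiles {
  // Only AddFile actions become rows of the batch; anything else is skipped.
  def addFilesOnly(indexedFiles: Iterator[IndexedFile]): Seq[String] =
    indexedFiles.collect { case IndexedFile(_, _, AddFile(p)) => p }.toSeq
}
```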
- def createDataFrameBetweenOffsets(startVersion: Long, startIndex: Long, isInitialSnapshot: Boolean, startOffsetOption: Option[DeltaSourceOffset], endOffset: DeltaSourceOffset): DataFrame
Return the DataFrame between start and end offset.
- Attributes
- protected
- def deltaAssert(check: => Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit
Helper method to check invariants in Delta code. Fails when running in tests; otherwise records a delta assertion event and logs a warning.
- Attributes
- protected
- Definition Classes
- DeltaLogging
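The fail-in-tests, warn-in-production pattern behind deltaAssert can be sketched as follows. This is a hypothetical simplification: the real method also records a usage event against the DeltaLog, which is omitted here:

```scala
// Sketch: fail hard under tests, log a warning otherwise.
object DeltaAssertSketch {
  def apply(check: => Boolean, name: String, msg: String,
            isTesting: Boolean, logWarning: String => Unit): Unit = {
    if (!check) {
      if (isTesting) {
        throw new AssertionError(s"Assertion $name failed: $msg")
      } else {
        // Real code also records a delta assertion event here.
        logWarning(s"Assertion $name failed: $msg")
      }
    }
  }
}
```

This keeps invariant violations loud in CI while avoiding query failures in production.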
- def deserializeOffset(json: String): Offset
- Definition Classes
- Source → SparkDataStream
- val emptyDataFrame: DataFrame
- Attributes
- protected
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- lazy val forceEnableStreamingReadOnReadIncompatibleSchemaChangesDuringStreamStart: Boolean
Flag that allows the user to disable the read-compatibility check during stream start, which protects against a corner case that verifyStreamHygiene cannot detect. This is a bug fix but also a potential behavior change, so we add a flag to fall back.
- Attributes
- protected
- lazy val forceEnableUnsafeReadOnNullabilityChange: Boolean
Flag that allows the user to fall back to the legacy behavior in which a nullable=false schema may read nullable=true data, which is incorrect but a behavior change regardless.
- Attributes
- protected
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
- Definition Classes
- DeltaLogging
- def getDefaultReadLimit(): ReadLimit
- Definition Classes
- SupportsAdmissionControl
- def getErrorData(e: Throwable): Map[String, Any]
- Definition Classes
- DeltaLogging
- def getFileChangesAndCreateDataFrame(startVersion: Long, startIndex: Long, isInitialSnapshot: Boolean, endOffset: DeltaSourceOffset): DataFrame
Get the changes from (startVersion, startIndex) to the end offset.
- startVersion
Calculated starting version.
- startIndex
Calculated starting index.
- isInitialSnapshot
Whether the stream has to return the initial snapshot or not.
- endOffset
Offset that signifies the end of the stream.
- Attributes
- protected
- def getFileChangesWithRateLimit(fromVersion: Long, fromIndex: Long, isInitialSnapshot: Boolean, limits: Option[AdmissionLimits] = Some(AdmissionLimits())): ClosableIterator[IndexedFile]
- Attributes
- protected
- def getNextOffsetFromPreviousOffset(previousOffset: DeltaSourceOffset, limits: Option[AdmissionLimits]): Option[DeltaSourceOffset]
Return the next offset when previous offset exists.
- Attributes
- protected
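Conceptually, a Delta source offset is a (version, index) position in the log, and the next offset must never move backwards; when no new data arrives, no new offset is produced. A minimal sketch of that invariant, using an illustrative SourceOffset type rather than the real DeltaSourceOffset:

```scala
// Illustrative stand-in for DeltaSourceOffset: a position in the log.
case class SourceOffset(version: Long, index: Long)

object Offsets {
  /** Accept the candidate only if it strictly advances past the previous offset. */
  def next(previous: SourceOffset, candidate: SourceOffset): Option[SourceOffset] = {
    val moved =
      candidate.version > previous.version ||
        (candidate.version == previous.version && candidate.index > previous.index)
    if (moved) Some(candidate) else None // no progress => no new offset
  }
}
```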
- def getStartingOffsetFromSpecificDeltaVersion(fromVersion: Long, isInitialSnapshot: Boolean, limits: Option[AdmissionLimits]): Option[DeltaSourceOffset]
Returns the offset that starts from a specific delta table version. This function is called when starting a new stream query.
- fromVersion
The version of the delta table to calculate the offset from.
- isInitialSnapshot
Whether the delta version is for the initial snapshot or not.
- limits
Indicates how much data can be processed by a micro batch.
- Attributes
- protected
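The "limits" parameter implements ReadLimit-style admission control: stop admitting files once a file-count or byte budget is exhausted. Below is a simplified sketch with hard caps; note this is an assumption-laden illustration (the real source treats byte limits as soft, e.g. it still admits at least one file per batch):

```scala
// Illustrative admission limits: cap a micro-batch by file count and/or bytes.
case class AdmissionLimits(maxFiles: Option[Int], maxBytes: Option[Long])

object RateLimit {
  /** Take (path, sizeInBytes) entries until either budget would be exceeded. */
  def take(files: Seq[(String, Long)], limits: AdmissionLimits): Seq[(String, Long)] = {
    var taken = 0
    var bytes = 0L
    files.takeWhile { case (_, size) =>
      val admit = limits.maxFiles.forall(taken < _) &&
        limits.maxBytes.forall(bytes + size <= _)
      if (admit) { taken += 1; bytes += size }
      admit
    }
  }
}
```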
- val hasCheckedReadIncompatibleSchemaChangesOnStreamStart: Boolean
A global flag to mark whether we have done a per-stream start check for column mapping schema changes (rename / drop).
- Attributes
- protected
- Annotations
- @volatile()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def initForTriggerAvailableNowIfNeeded(startOffsetOpt: Option[DeltaSourceOffset]): Unit
Initialize the internal states for AvailableNow if this method is called for the first time after prepareForTriggerAvailableNow.
- Attributes
- protected
- def initLastOffsetForTriggerAvailableNow(startOffsetOpt: Option[DeltaSourceOffset]): Unit
- Attributes
- protected
- def initialOffset(): Offset
- Definition Classes
- Source → SparkDataStream
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- val isStreamingFromColumnMappingTable: Boolean
Whether we are streaming from a table with column mapping enabled
- Attributes
- protected
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- val lastOffsetForTriggerAvailableNow: Option[DeltaSourceOffset]
When AvailableNow is used, this offset will be the upper bound up to which this run of the query will process. We may run multiple micro-batches, but the query will stop itself when it reaches this offset.
- Attributes
- protected
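The upper-bound behavior can be sketched as capping whatever latest offset the source computes at the bound captured when the trigger started. The StreamOffset type below is illustrative, not the actual DeltaSourceOffset:

```scala
// Illustrative offset with the usual (version, index) ordering.
case class StreamOffset(version: Long, index: Long) extends Ordered[StreamOffset] {
  def compare(that: StreamOffset): Int = {
    val byVersion = java.lang.Long.compare(version, that.version)
    if (byVersion != 0) byVersion else java.lang.Long.compare(index, that.index)
  }
}

object AvailableNowBound {
  /** Never return an offset past the bound pinned at trigger start. */
  def cap(latest: StreamOffset, lastOffsetForAvailableNow: Option[StreamOffset]): StreamOffset =
    lastOffsetForAvailableNow.fold(latest)(bound => if (latest > bound) bound else latest)
}
```

With no bound (non-AvailableNow triggers), the computed latest offset passes through unchanged.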
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
- def logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val persistedMetadataAtSourceInit: Option[PersistedMetadata]
The persisted schema from the schema log that must be used to read data files in this Delta streaming source.
- Attributes
- protected
- def prepareForTriggerAvailableNow(): Unit
- Definition Classes
- DeltaSourceBase → SupportsTriggerAvailableNow
- val readConfigurationsAtSourceInit: Map[String, String]
- Attributes
- protected
- val readPartitionSchemaAtSourceInit: StructType
- Attributes
- protected
- val readProtocolAtSourceInit: Protocol
- Attributes
- protected
- val readSchemaAtSourceInit: StructType
The read schema for this source during initialization, taking into account the SchemaLog.
- Attributes
- protected
- lazy val readSnapshotDescriptor: SnapshotDescriptor
Create a snapshot descriptor, customizing its metadata using metadata tracking if necessary.
- Attributes
- protected
- def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
Used to record the occurrence of a single event or report detailed, operation-specific statistics.
- path
Used to log the path of the Delta table when deltaLog is null.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a deltaLog.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a tahoePath.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordFrameProfile[T](group: String, name: String)(thunk: => T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: => S): S
- Definition Classes
- DatabricksLogging
- def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def reportLatestOffset(): Offset
- Definition Classes
- SupportsAdmissionControl
- val schema: StructType
- Definition Classes
- DeltaSourceBase → Source
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: => T): T
Report a log to indicate some command is running.
- Definition Classes
- DeltaProgressReporter