trait DeltaSourceBase extends Source with SupportsAdmissionControl with SupportsTriggerAvailableNow with DeltaLogging
Base trait for the Delta source that contains methods for getting changes from the Delta log.
- Self Type
- DeltaSource
- By Inheritance
- DeltaSourceBase
- DeltaLogging
- DatabricksLogging
- DeltaProgressReporter
- LoggingShims
- Logging
- SupportsTriggerAvailableNow
- SupportsAdmissionControl
- Source
- SparkDataStream
- AnyRef
- Any
Type Members
-
implicit
class
LogStringContext extends AnyRef
- Definition Classes
- LoggingShims
Abstract Value Members
-
abstract
def
getBatch(start: Option[Offset], end: Offset): DataFrame
- Definition Classes
- Source
-
abstract
def
getOffset: Option[Offset]
- Definition Classes
- Source
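The two abstract members above come from Spark's `Source` contract: the engine asks for the latest available offset, then asks for the data between two offsets. As a minimal sketch of that contract (not the Delta implementation; `SimpleOffset` and `ToySource` are hypothetical stand-ins, with rows as plain strings instead of a DataFrame):

```scala
// Minimal, self-contained sketch of the Source contract that getOffset and
// getBatch implement. SimpleOffset and ToySource are hypothetical stand-ins
// for Spark's Offset and a real Source implementation.
final case class SimpleOffset(version: Long)

final class ToySource(log: Vector[String]) {
  // Latest available offset, or None when no data has arrived yet (getOffset).
  def getOffset: Option[SimpleOffset] =
    if (log.isEmpty) None else Some(SimpleOffset(log.length - 1L))

  // Rows in (start, end]; start = None means "from the beginning" (getBatch).
  def getBatch(start: Option[SimpleOffset], end: SimpleOffset): Seq[String] = {
    val from = start.map(_.version + 1).getOrElse(0L).toInt
    log.slice(from, end.version.toInt + 1)
  }
}
```

Note the half-open semantics: the batch excludes the start offset (already processed) and includes the end offset, which is how the engine avoids reprocessing rows across micro-batches.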
-
abstract
def
latestOffset(arg0: Offset, arg1: ReadLimit): Offset
- Definition Classes
- SupportsAdmissionControl
-
abstract
def
latestOffsetInternal(startOffset: Option[DeltaSourceOffset], limit: ReadLimit): Option[DeltaSourceOffset]
An internal `latestOffsetInternal` to get the latest offset.
- Attributes
- protected
-
abstract
def
stop(): Unit
- Definition Classes
- SparkDataStream
Concrete Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
lazy val
allowUnsafeStreamingReadOnColumnMappingSchemaChanges: Boolean
Flag that allows the user to force-enable unsafe streaming reads on a Delta table with column mapping enabled AND drop/rename actions.
- Attributes
- protected
-
lazy val
allowUnsafeStreamingReadOnPartitionColumnChanges: Boolean
- Attributes
- protected
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
checkReadIncompatibleSchemaChangeOnStreamStartOnce(batchStartVersion: Long, batchEndVersionOpt: Option[Long] = None): Unit
Check for read-incompatible schema changes during stream (re)start so we can fail fast.
This only needs to be called ONCE in the life cycle of a stream, either at the very first latestOffset or the very first getBatch, to make sure we have detected an incompatible schema change. Typically the verifyStreamHygiene that was called may be good enough to detect these schema changes, but there are cases where it would not work. Consider this sequence: 1. The user starts a new stream @ startingVersion 1. 2. latestOffset is called before getBatch() because there were no previous commits, so getBatch won't be called as a recovery mechanism. Suppose there is a single rename/drop/nullability change S while computing the next offset; S would look exactly the same as the latest schema, so verifyStreamHygiene would not work. 3. latestOffset would return this new offset crossing the schema boundary.
If a schema log is already initialized, we don't have to run the initialization or the schema checks any more.
- batchStartVersion
Start version we want to verify read compatibility against
- batchEndVersionOpt
Optionally, if we are checking against an existing constructed batch during streaming initialization, we would also like to verify all schema changes in between as well before we can lazily initialize the schema log if needed.
- Attributes
- protected
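The read-compatibility rules these checks enforce (no missing columns, no data type changes, no read-incompatible nullability changes) can be sketched in isolation. This is a simplified illustration, not Delta's actual check; `Col` is a hypothetical stand-in for Spark's `StructField`:

```scala
// Self-contained sketch of the read-compatibility rules: a read schema must
// cover every data column, with the same data type, and must not declare
// nullable = false for a column whose data may contain nulls.
final case class Col(name: String, dataType: String, nullable: Boolean)

def findReadIncompatibilities(readSchema: Seq[Col], dataSchema: Seq[Col]): Seq[String] = {
  val read = readSchema.map(c => c.name -> c).toMap
  dataSchema.flatMap { data =>
    read.get(data.name) match {
      case None =>
        Some(s"missing column: ${data.name}")
      case Some(r) if r.dataType != data.dataType =>
        Some(s"type change on ${data.name}: ${data.dataType} -> ${r.dataType}")
      case Some(r) if !r.nullable && data.nullable =>
        Some(s"nullability change on ${data.name}: cannot read nullable data as non-nullable")
      case _ => None
    }
  }
}
```

A non-empty result here corresponds to the fail-fast path: the stream would throw rather than silently read data under an incompatible schema.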
-
def
checkReadIncompatibleSchemaChanges(metadata: Metadata, version: Long, batchStartVersion: Long, batchEndVersionOpt: Option[Long] = None, validatedDuringStreamStart: Boolean = false): Unit
Narrow waist to verify a metadata action for read-incompatible schema changes, specifically: 1. Any column mapping related schema changes (rename / drop columns). 2. Standard read-compatibility changes, including: a) no missing columns, b) no data type changes, c) no read-incompatible nullability changes. If the check fails, we throw an exception to exit the stream. If lazy log initialization is required, we also run a one-time scan to safely initialize the metadata tracking log upon any non-additive schema change failure.
- metadata
Metadata that contains a potential schema change
- version
Version for the metadata action
- validatedDuringStreamStart
Whether this check is being done during stream start.
- Attributes
- protected
-
def
cleanUpSnapshotResources(): Unit
- Attributes
- protected
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
commit(end: Offset): Unit
- Definition Classes
- Source → SparkDataStream
-
def
commit(end: Offset): Unit
- Definition Classes
- Source
-
def
createDataFrame(indexedFiles: Iterator[IndexedFile]): DataFrame
Given an iterator of file actions, create a DataFrame representing the files added to a table. Only AddFile actions will be used to create the DataFrame.
- indexedFiles
actions iterator from which to generate the DataFrame.
- Attributes
- protected
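The filtering step described above (only AddFile actions contribute rows) can be sketched with simplified stand-in types; `FileAction`, `AddFile`, and `RemoveFile` here are illustrative, not Delta's actual action classes:

```scala
// Hypothetical sketch: from a stream of file actions, keep only the AddFile
// paths that would back the resulting DataFrame; other actions are dropped.
sealed trait FileAction { def path: String }
final case class AddFile(path: String) extends FileAction
final case class RemoveFile(path: String) extends FileAction

def addedFilePaths(actions: Iterator[FileAction]): Seq[String] =
  actions.collect { case AddFile(p) => p }.toSeq
```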
-
def
createDataFrameBetweenOffsets(startVersion: Long, startIndex: Long, isInitialSnapshot: Boolean, startOffsetOption: Option[DeltaSourceOffset], endOffset: DeltaSourceOffset): DataFrame
Return the DataFrame between the start and end offsets.
- Attributes
- protected
-
def
deltaAssert(check: ⇒ Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit
Helper method to check invariants in Delta code. Fails when running in tests; otherwise records a delta assertion event and logs a warning.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
deserializeOffset(json: String): Offset
- Definition Classes
- Source → SparkDataStream
-
val
emptyDataFrame: DataFrame
- Attributes
- protected
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
lazy val
forceEnableStreamingReadOnReadIncompatibleSchemaChangesDuringStreamStart: Boolean
Flag that allows the user to disable the read-compatibility check during stream start, which protects against a corner case that verifyStreamHygiene could not detect. This is a bug fix but also a potential behavior change, so we add a flag to fall back.
- Attributes
- protected
-
lazy val
forceEnableUnsafeReadOnNullabilityChange: Boolean
Flag that allows the user to fall back to the legacy behavior in which a nullable=false schema can read nullable=true data, which is incorrect, but a behavior change regardless.
- Attributes
- protected
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
- Definition Classes
- DeltaLogging
-
def
getDefaultReadLimit(): ReadLimit
- Definition Classes
- SupportsAdmissionControl
-
def
getErrorData(e: Throwable): Map[String, Any]
- Definition Classes
- DeltaLogging
-
def
getFileChangesAndCreateDataFrame(startVersion: Long, startIndex: Long, isInitialSnapshot: Boolean, endOffset: DeltaSourceOffset): DataFrame
Get the changes from (startVersion, startIndex) to the end.
- startVersion
Calculated starting version.
- startIndex
Calculated starting index.
- isInitialSnapshot
Whether the stream has to return the initial snapshot or not.
- endOffset
Offset that signifies the end of the stream.
- Attributes
- protected
-
def
getFileChangesWithRateLimit(fromVersion: Long, fromIndex: Long, isInitialSnapshot: Boolean, limits: Option[AdmissionLimits] = Some(AdmissionLimits())): ClosableIterator[IndexedFile]
- Attributes
- protected
-
def
getNextOffsetFromPreviousOffset(previousOffset: DeltaSourceOffset, limits: Option[AdmissionLimits]): Option[DeltaSourceOffset]
Return the next offset when a previous offset exists.
- Attributes
- protected
-
def
getStartingOffsetFromSpecificDeltaVersion(fromVersion: Long, isInitialSnapshot: Boolean, limits: Option[AdmissionLimits]): Option[DeltaSourceOffset]
Returns the offset that starts from a specific Delta table version. This function is called when starting a new stream query.
- fromVersion
The version of the delta table to calculate the offset from.
- isInitialSnapshot
Whether the delta version is for the initial snapshot or not.
- limits
Indicates how much data can be processed by a micro batch.
- Attributes
- protected
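The `limits` parameter implements admission control: a micro-batch admits files until a budget is exhausted, and the offset it returns marks where the next batch resumes. A minimal sketch of that idea, assuming a simple file-count budget (`ToyAdmissionLimits` and `IndexedEntry` are hypothetical stand-ins, not Delta's `AdmissionLimits` or `IndexedFile`):

```scala
// Self-contained sketch of rate-limited admission: take entries in order
// until the per-batch file budget runs out; None means "no rate limit".
final case class IndexedEntry(version: Long, index: Long)

final class ToyAdmissionLimits(var filesToTake: Long) {
  def admit(e: IndexedEntry): Boolean = {
    val ok = filesToTake > 0
    if (ok) filesToTake -= 1
    ok
  }
}

def takeWithinLimits(entries: Seq[IndexedEntry],
                     limits: Option[ToyAdmissionLimits]): Seq[IndexedEntry] =
  limits match {
    case None    => entries                // unlimited: admit everything
    case Some(l) => entries.takeWhile(l.admit)
  }
```

The last admitted entry's (version, index) pair is what a source like this would surface as the end offset of the micro-batch.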
-
val
hasCheckedReadIncompatibleSchemaChangesOnStreamStart: Boolean
A global flag to mark whether we have done a per-stream start check for column mapping schema changes (rename / drop).
- Attributes
- protected
- Annotations
- @volatile()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
initForTriggerAvailableNowIfNeeded(startOffsetOpt: Option[DeltaSourceOffset]): Unit
Initialize the internal states for AvailableNow if this method is called for the first time after `prepareForTriggerAvailableNow`.
- Attributes
- protected
-
def
initLastOffsetForTriggerAvailableNow(startOffsetOpt: Option[DeltaSourceOffset]): Unit
- Attributes
- protected
-
def
initialOffset(): Offset
- Definition Classes
- Source → SparkDataStream
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
val
isStreamingFromColumnMappingTable: Boolean
Whether we are streaming from a table with column mapping enabled.
- Attributes
- protected
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
val
lastOffsetForTriggerAvailableNow: Option[DeltaSourceOffset]
When `AvailableNow` is used, this offset will be the upper bound up to which this run of the query will process. We may run multiple micro-batches, but the query will stop itself when it reaches this offset.
- Attributes
- protected
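The upper-bound behavior described above can be illustrated in isolation: under Trigger.AvailableNow, the offset a batch may advance to is clamped to the bound captured when the trigger was prepared. This is a hedged sketch with offsets reduced to plain `Long` versions, not the actual Delta offset logic:

```scala
// Sketch: clamp the candidate latest offset to the AvailableNow upper bound,
// so the query stops once it reaches the offset captured at prepare time.
def clampToAvailableNow(latest: Option[Long], lastForAvailableNow: Option[Long]): Option[Long] =
  (latest, lastForAvailableNow) match {
    case (Some(l), Some(bound)) => Some(math.min(l, bound))
    case _                      => latest
  }
```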
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
-
def
logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
val
persistedMetadataAtSourceInit: Option[PersistedMetadata]
The persisted schema from the schema log that must be used to read data files in this Delta streaming source.
- Attributes
- protected
-
def
prepareForTriggerAvailableNow(): Unit
- Definition Classes
- DeltaSourceBase → SupportsTriggerAvailableNow
-
val
readConfigurationsAtSourceInit: Map[String, String]
- Attributes
- protected
-
val
readPartitionSchemaAtSourceInit: StructType
- Attributes
- protected
-
val
readProtocolAtSourceInit: Protocol
- Attributes
- protected
-
val
readSchemaAtSourceInit: StructType
The read schema for this source during initialization, taking into account the SchemaLog.
- Attributes
- protected
-
lazy val
readSnapshotDescriptor: SnapshotDescriptor
Create a snapshot descriptor, customizing its metadata using metadata tracking if necessary.
- Attributes
- protected
-
def
recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
Used to record the occurrence of a single event or report detailed, operation-specific statistics.
- path
Used to log the path of the Delta table when `deltaLog` is null.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
Used to report the duration as well as the success or failure of an operation on a `deltaLog`.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
Used to report the duration as well as the success or failure of an operation on a `tahoePath`.
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
-
def
recordFrameProfile[T](group: String, name: String)(thunk: ⇒ T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
-
def
recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: ⇒ S): S
- Definition Classes
- DatabricksLogging
-
def
recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
-
def
recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
-
def
recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
-
def
reportLatestOffset(): Offset
- Definition Classes
- SupportsAdmissionControl
-
val
schema: StructType
- Definition Classes
- DeltaSourceBase → Source
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T
Report a log to indicate some command is running.
- Definition Classes
- DeltaProgressReporter