
org.apache.spark.sql.delta.sources

DeltaSourceBase

trait DeltaSourceBase extends Source with SupportsAdmissionControl with SupportsTriggerAvailableNow with DeltaLogging

Base trait for the Delta source that contains the methods for getting changes from the Delta log.

Self Type
DeltaSource
Linear Supertypes
DeltaLogging, DatabricksLogging, DeltaProgressReporter, LoggingShims, Logging, SupportsTriggerAvailableNow, SupportsAdmissionControl, Source, SparkDataStream, AnyRef, Any
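
This trait is not used directly; it is exercised through the public Structured Streaming API when a Delta table is read as a stream. A minimal sketch (the table path is hypothetical):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("delta-source-demo").getOrCreate()

// format("delta") resolves to DeltaDataSource, which creates the DeltaSource
// that mixes in DeltaSourceBase.
val events = spark.readStream
  .format("delta")
  .load("/tmp/delta/events")

val query = events.writeStream
  .format("console")
  .start()
```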

Type Members

  1. implicit class LogStringContext extends AnyRef
    Definition Classes
    LoggingShims

Abstract Value Members

  1. abstract def getBatch(start: Option[Offset], end: Offset): DataFrame
    Definition Classes
    Source
  2. abstract def getOffset: Option[Offset]
    Definition Classes
    Source
  3. abstract def latestOffset(arg0: Offset, arg1: ReadLimit): Offset
    Definition Classes
    SupportsAdmissionControl
  4. abstract def latestOffsetInternal(startOffset: Option[DeltaSourceOffset], limit: ReadLimit): Option[DeltaSourceOffset]

    An internal implementation of latestOffset that computes the latest available offset, if any.

    Attributes
    protected
  5. abstract def stop(): Unit
    Definition Classes
    SparkDataStream
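
The abstract members above form the classic Source contract: the micro-batch engine repeatedly asks for the latest offset, requests the batch of data between the last committed offset and the new one, and then commits. A simplified sketch of that driver loop (the real loop lives in Spark's MicroBatchExecution and is considerably more involved; `runOneBatch` is a hypothetical helper, not engine code):

```scala
import org.apache.spark.sql.execution.streaming.{Offset, Source}

// Hypothetical sketch of one micro-batch iteration over a Source.
def runOneBatch(source: Source, lastCommitted: Option[Offset]): Option[Offset] = {
  source.getOffset match {
    case Some(newOffset) if !lastCommitted.contains(newOffset) =>
      // Data in the half-open range (lastCommitted, newOffset].
      val batch = source.getBatch(lastCommitted, newOffset)
      // ... process `batch`, then tell the source the offset is durable:
      source.commit(newOffset)
      Some(newOffset)
    case _ =>
      lastCommitted // nothing new to process
  }
}
```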

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. lazy val allowUnsafeStreamingReadOnColumnMappingSchemaChanges: Boolean

    Flag that allows the user to force-enable unsafe streaming reads on a Delta table with column mapping enabled AND drop/rename actions.

    Attributes
    protected
  5. lazy val allowUnsafeStreamingReadOnPartitionColumnChanges: Boolean
    Attributes
    protected
  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def checkReadIncompatibleSchemaChangeOnStreamStartOnce(batchStartVersion: Long, batchEndVersionOpt: Option[Long] = None): Unit

    Check read-incompatible schema changes during stream (re)start so we can fail fast.

    This only needs to be called ONCE in the life cycle of a stream, either at the very first latestOffset or the very first getBatch, to make sure we have detected an incompatible schema change. Typically the verifyStreamHygiene call is good enough to detect these schema changes, but there are cases it cannot handle. Consider this sequence:
    1. A user starts a new stream @ startingVersion 1.
    2. latestOffset is called before getBatch() because there were no previous commits, so getBatch won't be called as a recovery mechanism. Suppose there is a single rename/drop/nullability change S while computing the next offset; S would look exactly the same as the latest schema, so verifyStreamHygiene would not catch it.
    3. latestOffset would then return a new offset that crosses the schema change boundary.

    If a schema log is already initialized, we don't have to run the initialization or the schema checks any more.

    batchStartVersion

    Start version we want to verify read compatibility against

    batchEndVersionOpt

    Optionally, if we are checking against an existing constructed batch during streaming initialization, we would also like to verify all schema changes in between as well before we can lazily initialize the schema log if needed.

    Attributes
    protected
  8. def checkReadIncompatibleSchemaChanges(metadata: Metadata, version: Long, batchStartVersion: Long, batchEndVersionOpt: Option[Long] = None, validatedDuringStreamStart: Boolean = false): Unit

    Narrow waist to verify a metadata action for read-incompatible schema changes, specifically:
    1. Any column mapping related schema changes (rename / drop columns)
    2. Standard read-compatibility changes, including:
       a) No missing columns
       b) No data type changes
       c) No read-incompatible nullability changes

    If the check fails, we throw an exception to exit the stream. If lazy log initialization is required, we also run a one-time scan to safely initialize the metadata tracking log upon any non-additive schema change failure.

    metadata

    Metadata that contains a potential schema change

    version

    Version for the metadata action

    validatedDuringStreamStart

    Whether this check is being done during stream start.

    Attributes
    protected
  9. def cleanUpSnapshotResources(): Unit
    Attributes
    protected
  10. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  11. def commit(end: Offset): Unit
    Definition Classes
    Source → SparkDataStream
  12. def commit(end: Offset): Unit
    Definition Classes
    Source
  13. def createDataFrame(indexedFiles: Iterator[IndexedFile]): DataFrame

    Given an iterator of file actions, create a DataFrame representing the files added to a table. Only AddFile actions will be used to create the DataFrame.

    indexedFiles

    actions iterator from which to generate the DataFrame.

    Attributes
    protected
  14. def createDataFrameBetweenOffsets(startVersion: Long, startIndex: Long, isInitialSnapshot: Boolean, startOffsetOption: Option[DeltaSourceOffset], endOffset: DeltaSourceOffset): DataFrame

    Return the DataFrame between the start and end offsets.

    Attributes
    protected
  15. def deltaAssert(check: ⇒ Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit

    Helper method to check invariants in Delta code. Fails when running in tests; records a delta assertion event and logs a warning otherwise.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  16. def deserializeOffset(json: String): Offset
    Definition Classes
    Source → SparkDataStream
  17. val emptyDataFrame: DataFrame
    Attributes
    protected
  18. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  19. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  20. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  21. lazy val forceEnableStreamingReadOnReadIncompatibleSchemaChangesDuringStreamStart: Boolean

    Flag that allows user to disable the read-compatibility check during stream start which protects against an corner case in which verifyStreamHygiene could not detect.

    Flag that allows user to disable the read-compatibility check during stream start which protects against an corner case in which verifyStreamHygiene could not detect. This is a bug fix but yet a potential behavior change, so we add a flag to fallback.

    Attributes
    protected
  22. lazy val forceEnableUnsafeReadOnNullabilityChange: Boolean

    Flag that allow user to fallback to the legacy behavior in which user can allow nullable=false schema to read nullable=true data, which is incorrect but a behavior change regardless.

    Flag that allow user to fallback to the legacy behavior in which user can allow nullable=false schema to read nullable=true data, which is incorrect but a behavior change regardless.

    Attributes
    protected
  23. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  24. def getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
    Definition Classes
    DeltaLogging
  25. def getDefaultReadLimit(): ReadLimit
    Definition Classes
    SupportsAdmissionControl
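
getDefaultReadLimit and the latestOffset(offset, limit) overload implement admission control: the Delta source translates its rate-limit reader options into a ReadLimit that caps how much data each micro-batch may admit. A sketch using the documented Delta options maxFilesPerTrigger and maxBytesPerTrigger (the table path is hypothetical):

```scala
// Rate-limit each micro-batch; these options feed into the ReadLimit
// that latestOffset uses when computing the next batch's end offset.
val limited = spark.readStream
  .format("delta")
  .option("maxFilesPerTrigger", "100") // at most 100 files per micro-batch
  .option("maxBytesPerTrigger", "1g")  // soft cap on bytes per micro-batch
  .load("/tmp/delta/events")
```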
  26. def getErrorData(e: Throwable): Map[String, Any]
    Definition Classes
    DeltaLogging
  27. def getFileChangesAndCreateDataFrame(startVersion: Long, startIndex: Long, isInitialSnapshot: Boolean, endOffset: DeltaSourceOffset): DataFrame

    Get the changes from (startVersion, startIndex) to the end offset.

    startVersion

    Calculated starting version

    startIndex

    Calculated starting index

    isInitialSnapshot

    Whether the stream has to return the initial snapshot or not

    endOffset

    Offset that signifies the end of the stream.

    Attributes
    protected
  28. def getFileChangesWithRateLimit(fromVersion: Long, fromIndex: Long, isInitialSnapshot: Boolean, limits: Option[AdmissionLimits] = Some(AdmissionLimits())): ClosableIterator[IndexedFile]
    Attributes
    protected
  29. def getNextOffsetFromPreviousOffset(previousOffset: DeltaSourceOffset, limits: Option[AdmissionLimits]): Option[DeltaSourceOffset]

    Return the next offset when previous offset exists.

    Return the next offset when previous offset exists.

    Attributes
    protected
  30. def getStartingOffsetFromSpecificDeltaVersion(fromVersion: Long, isInitialSnapshot: Boolean, limits: Option[AdmissionLimits]): Option[DeltaSourceOffset]

    Returns the offset that starts from a specific delta table version.

    Returns the offset that starts from a specific delta table version. This function is called when starting a new stream query.

    fromVersion

    The version of the delta table to calculate the offset from.

    isInitialSnapshot

    Whether the delta version is for the initial snapshot or not.

    limits

    Indicates how much data can be processed by a micro batch.

    Attributes
    protected
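
getStartingOffsetFromSpecificDeltaVersion backs the startingVersion reader option, which lets a new stream begin at a given table version instead of processing the initial snapshot. A sketch (the table path is hypothetical):

```scala
// Start a new stream query from table version 5 rather than the initial
// snapshot; this path goes through getStartingOffsetFromSpecificDeltaVersion.
val fromVersion = spark.readStream
  .format("delta")
  .option("startingVersion", "5")
  .load("/tmp/delta/events")
```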
  31. val hasCheckedReadIncompatibleSchemaChangesOnStreamStart: Boolean

    A global flag to mark whether we have done a per-stream start check for column mapping schema changes (rename / drop).

    A global flag to mark whether we have done a per-stream start check for column mapping schema changes (rename / drop).

    Attributes
    protected
    Annotations
    @volatile()
  32. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  33. def initForTriggerAvailableNowIfNeeded(startOffsetOpt: Option[DeltaSourceOffset]): Unit

    initialize the internal states for AvailableNow if this method is called first time after prepareForTriggerAvailableNow.

    initialize the internal states for AvailableNow if this method is called first time after prepareForTriggerAvailableNow.

    Attributes
    protected
  34. def initLastOffsetForTriggerAvailableNow(startOffsetOpt: Option[DeltaSourceOffset]): Unit
    Attributes
    protected
  35. def initialOffset(): Offset
    Definition Classes
    Source → SparkDataStream
  36. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  37. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  38. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  39. val isStreamingFromColumnMappingTable: Boolean

    Whether we are streaming from a table with column mapping enabled

    Whether we are streaming from a table with column mapping enabled

    Attributes
    protected
  40. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  41. val lastOffsetForTriggerAvailableNow: Option[DeltaSourceOffset]

    When AvailableNow is used, this offset will be the upper bound where this run of the query will process up.

    When AvailableNow is used, this offset will be the upper bound where this run of the query will process up. We may run multiple micro batches, but the query will stop itself when it reaches this offset.

    Attributes
    protected
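
The AvailableNow state above comes into play when a query runs with Trigger.AvailableNow: the source pins an upper-bound offset at start, processes up to it across one or more micro-batches, and then the query stops. A sketch (paths are hypothetical):

```scala
import org.apache.spark.sql.streaming.Trigger

// Process everything available at start time, possibly over several
// micro-batches bounded by lastOffsetForTriggerAvailableNow, then stop.
val query = spark.readStream
  .format("delta")
  .load("/tmp/delta/events")
  .writeStream
  .format("delta")
  .option("checkpointLocation", "/tmp/checkpoints/events")
  .trigger(Trigger.AvailableNow())
  .start("/tmp/delta/events_copy")
```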
  42. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  43. def logConsole(line: String): Unit
    Definition Classes
    DatabricksLogging
  44. def logDebug(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  45. def logDebug(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  46. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  47. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  48. def logError(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  49. def logError(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  50. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  51. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  52. def logInfo(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  53. def logInfo(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  54. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  55. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  56. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  57. def logTrace(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  58. def logTrace(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  59. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  60. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  61. def logWarning(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  62. def logWarning(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  63. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  64. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  65. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  66. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  67. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  68. val persistedMetadataAtSourceInit: Option[PersistedMetadata]

    The persisted schema from the schema log that must be used to read data files in this Delta streaming source.

    The persisted schema from the schema log that must be used to read data files in this Delta streaming source.

    Attributes
    protected
  69. def prepareForTriggerAvailableNow(): Unit
    Definition Classes
    DeltaSourceBase → SupportsTriggerAvailableNow
  70. val readConfigurationsAtSourceInit: Map[String, String]
    Attributes
    protected
  71. val readPartitionSchemaAtSourceInit: StructType
    Attributes
    protected
  72. val readProtocolAtSourceInit: Protocol
    Attributes
    protected
  73. val readSchemaAtSourceInit: StructType

    The read schema for this source during initialization, taking in account of SchemaLog.

    The read schema for this source during initialization, taking in account of SchemaLog.

    Attributes
    protected
  74. lazy val readSnapshotDescriptor: SnapshotDescriptor

    Create a snapshot descriptor, customizing its metadata using metadata tracking if necessary

    Create a snapshot descriptor, customizing its metadata using metadata tracking if necessary

    Attributes
    protected
  75. def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit

    Used to record the occurrence of a single event or report detailed, operation specific statistics.

    Used to record the occurrence of a single event or report detailed, operation specific statistics.

    path

    Used to log the path of the delta table when deltaLog is null.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  76. def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A

    Used to report the duration as well as the success or failure of an operation on a deltaLog.

    Used to report the duration as well as the success or failure of an operation on a deltaLog.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  77. def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A

    Used to report the duration as well as the success or failure of an operation on a tahoePath.

    Used to report the duration as well as the success or failure of an operation on a tahoePath.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  78. def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  79. def recordFrameProfile[T](group: String, name: String)(thunk: ⇒ T): T
    Attributes
    protected
    Definition Classes
    DeltaLogging
  80. def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: ⇒ S): S
    Definition Classes
    DatabricksLogging
  81. def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  82. def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  83. def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  84. def reportLatestOffset(): Offset
    Definition Classes
    SupportsAdmissionControl
  85. val schema: StructType
    Definition Classes
    DeltaSourceBase → Source
  86. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  87. def toString(): String
    Definition Classes
    AnyRef → Any
  88. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  89. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  90. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  91. def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T

    Report a log to indicate some command is running.

    Report a log to indicate some command is running.

    Definition Classes
    DeltaProgressReporter
