package sources
Type Members
- class DeltaDataSource extends RelationProvider with StreamSourceProvider with StreamSinkProvider with CreatableRelationProvider with DataSourceRegister with DeltaLogging
A DataSource V1 for integrating Delta into Spark SQL batch and Streaming APIs.
- class DeltaSink extends Sink with ImplicitMetadataOperation with DeltaLogging
A streaming sink that writes data into a Delta Table.
- case class DeltaSource(spark: SparkSession, deltaLog: DeltaLog, options: DeltaOptions, filters: Seq[Expression] = Nil) extends Source with DeltaLogging with Product with Serializable
A streaming source for a Delta table.
When a new stream is started, Delta begins by constructing an org.apache.spark.sql.delta.Snapshot at the current version of the table. This snapshot is broken up into batches until all existing data has been processed. Subsequent processing is done by tailing the change log, looking for new data. As a result, at any given point the streaming query returns the same answer as a batch query that had processed the entire dataset.
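The snapshot-then-tail behaviour described above can be sketched in plain Scala. This is a simplified model, not Delta's implementation: `Commit`, `initialBatches`, `tailBatches`, and `maxFilesPerBatch` are hypothetical names introduced for illustration, and the model assumes commits only add files (removals are ignored).

```scala
// Hypothetical, simplified model of "snapshot first, then tail the log".
// None of these names come from the Delta code base.
object SnapshotThenTailSketch {
  // One committed table version and the files it added (assumption: no removals).
  final case class Commit(version: Long, addedFiles: Seq[String])

  /** Batches covering the initial snapshot: every file up to and including startVersion. */
  def initialBatches(log: Seq[Commit], startVersion: Long, maxFilesPerBatch: Int): Seq[Seq[String]] =
    log.filter(_.version <= startVersion)
      .flatMap(_.addedFiles)
      .sorted
      .grouped(maxFilesPerBatch)
      .toSeq

  /** Batches produced afterwards by tailing commits newer than startVersion, one version at a time. */
  def tailBatches(log: Seq[Commit], startVersion: Long, maxFilesPerBatch: Int): Seq[Seq[String]] =
    log.filter(_.version > startVersion)
      .sortBy(_.version)
      .flatMap(c => c.addedFiles.sorted.grouped(maxFilesPerBatch))
}
```

In this model, concatenating the initial batches with the tail batches yields exactly the files a batch read of the whole log would see, which is the equivalence the description refers to.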
- case class DeltaSourceOffset(sourceVersion: Long, reservoirId: String, reservoirVersion: Long, index: Long, isStartingVersion: Boolean) extends Offset with Product with Serializable
Tracks how far we have processed when reading changes from the DeltaLog.
Note that this class retains the naming of Reservoir to maintain compatibility with serialized offsets from the beta period.
- sourceVersion
The version of serialization that this offset is encoded with.
- reservoirId
The id of the table we are reading from. Used to detect misconfiguration when restarting a query.
- reservoirVersion
The version of the table that we are currently processing.
- index
The index in the sequence of AddFiles in this version. Used to break large commits into multiple batches. This index is created by sorting on modificationTimestamp and path.
- isStartingVersion
Whether this offset denotes a query that is starting rather than processing changes. When starting a new query, we first process all data present in the table at the start and then move on to processing new data that has arrived.
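To make the roles of these fields concrete, here is a minimal sketch in plain Scala. `OffsetSketch` mirrors the field names documented above but is a hypothetical stand-in, not the real `DeltaSourceOffset`; the ordering and the `validateRestart` helper are assumptions about how such fields could be used, not a description of Delta's actual logic.

```scala
// Hypothetical stand-in for illustration; not the real DeltaSourceOffset.
object DeltaSourceOffsetSketch {
  final case class OffsetSketch(
      sourceVersion: Long,        // serialization format version of the offset
      reservoirId: String,        // id of the table being read
      reservoirVersion: Long,     // table version currently being processed
      index: Long,                // position within that version's files
      isStartingVersion: Boolean) // true while still processing the initial snapshot

  // Assumed progress ordering: by table version first, then by index within the version.
  implicit val byProgress: Ordering[OffsetSketch] =
    Ordering.by((o: OffsetSketch) => (o.reservoirVersion, o.index))

  /** Rejects a checkpointed offset that was recorded against a different table id. */
  def validateRestart(checkpointed: OffsetSketch, currentTableId: String): Either[String, OffsetSketch] =
    if (checkpointed.reservoirId == currentTableId) Right(checkpointed)
    else Left(s"Offset belongs to table ${checkpointed.reservoirId}, not $currentTableId")
}
```

Splitting a large commit into several batches then corresponds to emitting several offsets with the same `reservoirVersion` and increasing `index`.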
Value Members
- object DeltaDataSource extends DatabricksLogging
- object DeltaSQLConf
SQLConf entries for Delta features.
- object DeltaSourceOffset extends Serializable
- object DeltaSourceUtils