package sources

Type Members

  1. class DeltaDataSource extends RelationProvider with StreamSourceProvider with StreamSinkProvider with CreatableRelationProvider with DataSourceRegister with DeltaLogging

    A DataSource V1 for integrating Delta into Spark SQL batch and Streaming APIs.

  2. class DeltaSink extends Sink with ImplicitMetadataOperation with DeltaLogging

    A streaming sink that writes data into a Delta Table.

  3. case class DeltaSource(spark: SparkSession, deltaLog: DeltaLog, options: DeltaOptions, filters: Seq[Expression] = Nil) extends Source with DeltaLogging with Product with Serializable

    A streaming source for a Delta table.

    When a new stream is started, Delta begins by constructing an org.apache.spark.sql.delta.Snapshot at the current version of the table. This snapshot is broken up into batches until all existing data has been processed. Subsequent processing is done by tailing the change log for new data. As a result, at any given point the streaming query returns the same answer as a batch query that had processed the entire dataset.
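The snapshot-then-tail behaviour described above can be sketched with a small toy model. This is not Delta's implementation: `Commit`, `SnapshotThenTail`, and `maxFilesPerBatch` are hypothetical names introduced only for illustration, and files are represented as plain strings.

```scala
// Toy model only: these types are hypothetical and are not Delta's classes.
final case class Commit(version: Long, addedFiles: Seq[String])

object SnapshotThenTail {
  /** Splits the files visible in the starting snapshot into batches of at most maxFilesPerBatch. */
  def snapshotBatches(snapshotFiles: Seq[String], maxFilesPerBatch: Int): Seq[Seq[String]] =
    snapshotFiles.grouped(maxFilesPerBatch).map(_.toList).toList

  /** Once the snapshot is drained, each later commit contributes the files it added. */
  def tailBatches(log: Seq[Commit], snapshotVersion: Long): Seq[Seq[String]] =
    log.filter(_.version > snapshotVersion).sortBy(_.version).map(_.addedFiles.toList).toList

  /** Full batch sequence: existing data first, then new data in commit order. */
  def allBatches(
      snapshotFiles: Seq[String],
      snapshotVersion: Long,
      log: Seq[Commit],
      maxFilesPerBatch: Int): Seq[Seq[String]] =
    snapshotBatches(snapshotFiles, maxFilesPerBatch) ++ tailBatches(log, snapshotVersion)
}
```

In this model, flattening the batches yields every file exactly once, which mirrors the documented property that the stream and a batch read over the whole table agree.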

  4. case class DeltaSourceOffset(sourceVersion: Long, reservoirId: String, reservoirVersion: Long, index: Long, isStartingVersion: Boolean) extends Offset with Product with Serializable

    Tracks how far we have processed when reading changes from the DeltaLog.

    Note this class retains the naming of Reservoir to maintain compatibility with serialized offsets from the beta period.

    sourceVersion

    The version of serialization that this offset is encoded with.

    reservoirId

    The id of the table we are reading from. Used to detect misconfiguration when restarting a query.

    reservoirVersion

    The version of the table that we are currently processing.

    index

    The index in the sequence of AddFiles in this version. Used to break large commits into multiple batches. This index is created by sorting on modificationTimestamp and path.

    isStartingVersion

    Whether this offset denotes a query that is starting rather than processing changes. When starting a new query, we first process all data present in the table at the start and then move on to processing new data that has arrived.
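To make the role of these fields concrete, here is a minimal sketch using a stand-in case class. `OffsetSketch`, `FileEntry`, `FileIndexing`, `byProgress`, and `sameTable` are hypothetical names. The ordering by (reservoirVersion, index) is an assumption about how progress could be compared, not a statement about Delta's internals. The index assignment follows the documented sort on modificationTimestamp and path.

```scala
// Hypothetical stand-in for illustration; the field names mirror the documented case class.
final case class OffsetSketch(
    sourceVersion: Long,
    reservoirId: String,
    reservoirVersion: Long,
    index: Long,
    isStartingVersion: Boolean)

object OffsetSketch {
  /** Assumed ordering: by table version first, then by file index within that version. */
  implicit val byProgress: Ordering[OffsetSketch] =
    Ordering.by((o: OffsetSketch) => (o.reservoirVersion, o.index))

  /** A restarted query should only resume from an offset that names the same table. */
  def sameTable(offset: OffsetSketch, tableId: String): Boolean =
    offset.reservoirId == tableId
}

// Hypothetical, simplified view of a file added in one table version.
final case class FileEntry(path: String, modificationTimestamp: Long)

object FileIndexing {
  /** Assigns indexes by sorting on modificationTimestamp and then path. */
  def indexed(files: Seq[FileEntry]): Seq[(FileEntry, Long)] =
    files
      .sortBy(f => (f.modificationTimestamp, f.path))
      .zipWithIndex
      .map { case (file, i) => (file, i.toLong) }
}
```

Because the index is derived from a deterministic sort, a large commit can be cut into several batches and resumed at the same position after a restart.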

Value Members

  1. object DeltaDataSource extends DatabricksLogging
  2. object DeltaSQLConf

    SQLConf entries for Delta features.

  3. object DeltaSourceOffset extends Serializable
  4. object DeltaSourceUtils
