package streaming
Type Members
-
trait
AcceptsLatestSeenOffset extends SparkDataStream
Indicates that the source accepts the latest seen offset, which requires streaming execution to provide the latest seen offset when restarting the streaming query from checkpoint.
Indicates that the source accepts the latest seen offset, which requires streaming execution to provide the latest seen offset when restarting the streaming query from checkpoint.
Note that this interface aims to only support DSv2 streaming sources. Spark may throw error if the interface is implemented along with DSv1 streaming sources.
The callback method will be called once per run.
- Since
3.3.0
-
final
class
CompositeReadLimit extends ReadLimit
/** Represents a
ReadLimitwhere theMicroBatchStreamshould scan approximately given maximum number of rows with at least the given minimum number of rows./** Represents a
ReadLimitwhere theMicroBatchStreamshould scan approximately given maximum number of rows with at least the given minimum number of rows.- Annotations
- @Evolving()
- Since
3.2.0
- See also
SupportsAdmissionControl#latestOffset(Offset, ReadLimit)
-
trait
ContinuousPartitionReader[T] extends PartitionReader[T]
A variation on
PartitionReaderfor use with continuous streaming processing.A variation on
PartitionReaderfor use with continuous streaming processing.- Annotations
- @Evolving()
- Since
3.0.0
-
trait
ContinuousPartitionReaderFactory extends PartitionReaderFactory
A variation on
PartitionReaderFactorythat returnsContinuousPartitionReaderinstead ofPartitionReader.A variation on
PartitionReaderFactorythat returnsContinuousPartitionReaderinstead ofPartitionReader. It's used for continuous streaming processing.- Annotations
- @Evolving()
- Since
3.0.0
-
trait
ContinuousStream extends SparkDataStream
A
SparkDataStreamfor streaming queries with continuous mode.A
SparkDataStreamfor streaming queries with continuous mode.- Annotations
- @Evolving()
- Since
3.0.0
-
trait
MicroBatchStream extends SparkDataStream
A
SparkDataStreamfor streaming queries with micro-batch mode.A
SparkDataStreamfor streaming queries with micro-batch mode.- Annotations
- @Evolving()
- Since
3.0.0
-
abstract
class
Offset extends AnyRef
An abstract representation of progress through a
MicroBatchStreamorContinuousStream.An abstract representation of progress through a
MicroBatchStreamorContinuousStream.During execution, offsets provided by the data source implementation will be logged and used as restart checkpoints. Each source should provide an offset implementation which the source can use to reconstruct a position in the stream up to which data has been seen/processed.
- Annotations
- @Evolving()
- Since
3.0.0
-
trait
PartitionOffset extends Serializable
Used for per-partition offsets in continuous processing.
Used for per-partition offsets in continuous processing. ContinuousReader implementations will provide a method to merge these into a global Offset.
These offsets must be serializable.
- Annotations
- @Evolving()
- Since
3.0.0
-
final
class
ReadAllAvailable extends ReadLimit
Represents a
ReadLimitwhere theMicroBatchStreammust scan all the data available at the streaming source.Represents a
ReadLimitwhere theMicroBatchStreammust scan all the data available at the streaming source. This is meant to be a hard specification as being able to return all available data is necessary forTrigger.Once()to work correctly. If a source is unable to scan all available data, then it must throw an error.- Annotations
- @Evolving()
- Since
3.0.0
- See also
SupportsAdmissionControl#latestOffset(Offset, ReadLimit)
-
trait
ReadLimit extends AnyRef
Interface representing limits on how much to read from a
MicroBatchStreamwhen it implementsSupportsAdmissionControl.Interface representing limits on how much to read from a
MicroBatchStreamwhen it implementsSupportsAdmissionControl. There are several child interfaces representing various kinds of limits.- Annotations
- @Evolving()
- Since
3.0.0
- See also
SupportsAdmissionControl#latestOffset(Offset, ReadLimit)
ReadAllAvailable
ReadMaxRows
-
class
ReadMaxFiles extends ReadLimit
Represents a
ReadLimitwhere theMicroBatchStreamshould scan approximately the given maximum number of files.Represents a
ReadLimitwhere theMicroBatchStreamshould scan approximately the given maximum number of files.- Annotations
- @Evolving()
- Since
3.0.0
- See also
SupportsAdmissionControl#latestOffset(Offset, ReadLimit)
-
final
class
ReadMaxRows extends ReadLimit
Represents a
ReadLimitwhere theMicroBatchStreamshould scan approximately the given maximum number of rows.Represents a
ReadLimitwhere theMicroBatchStreamshould scan approximately the given maximum number of rows.- Annotations
- @Evolving()
- Since
3.0.0
- See also
SupportsAdmissionControl#latestOffset(Offset, ReadLimit)
-
final
class
ReadMinRows extends ReadLimit
Represents a
ReadLimitwhere theMicroBatchStreamshould scan approximately at least the given minimum number of rows.Represents a
ReadLimitwhere theMicroBatchStreamshould scan approximately at least the given minimum number of rows.- Annotations
- @Evolving()
- Since
3.2.0
- See also
SupportsAdmissionControl#latestOffset(Offset, ReadLimit)
-
trait
ReportsSinkMetrics extends AnyRef
A mix-in interface for streaming sinks to signal that they can report metrics.
A mix-in interface for streaming sinks to signal that they can report metrics.
- Annotations
- @Evolving()
- Since
3.4.0
-
trait
ReportsSourceMetrics extends SparkDataStream
A mix-in interface for
SparkDataStreamstreaming sources to signal that they can report metrics.A mix-in interface for
SparkDataStreamstreaming sources to signal that they can report metrics.- Annotations
- @Evolving()
- Since
3.2.0
-
trait
SparkDataStream extends AnyRef
The base interface representing a readable data stream in a Spark streaming query.
The base interface representing a readable data stream in a Spark streaming query. It's responsible to manage the offsets of the streaming source in the streaming query.
Data sources should implement concrete data stream interfaces:
MicroBatchStreamandContinuousStream.- Annotations
- @Evolving()
- Since
3.0.0
-
trait
SupportsAdmissionControl extends SparkDataStream
A mix-in interface for
SparkDataStreamstreaming sources to signal that they can control the rate of data ingested into the system.A mix-in interface for
SparkDataStreamstreaming sources to signal that they can control the rate of data ingested into the system. These rate limits can come implicitly from the contract of triggers, e.g. Trigger.Once() requires that a micro-batch process all data available to the system at the start of the micro-batch. Alternatively, sources can decide to limit ingest through data source options.Through this interface, a MicroBatchStream should be able to return the next offset that it will process until given a
ReadLimit.- Annotations
- @Evolving()
- Since
3.0.0
-
trait
SupportsTriggerAvailableNow extends SupportsAdmissionControl
An interface for streaming sources that supports running in Trigger.AvailableNow mode, which will process all the available data at the beginning of the query in (possibly) multiple batches.
An interface for streaming sources that supports running in Trigger.AvailableNow mode, which will process all the available data at the beginning of the query in (possibly) multiple batches.
This mode will have better scalability comparing to Trigger.Once mode.
- Annotations
- @Evolving()
- Since
3.3.0