org.apache.spark.sql.execution.streaming

PropagateWatermarkSimulator

class PropagateWatermarkSimulator extends WatermarkPropagator with Logging

This implementation simulates the propagation of watermarks among operators.

It is considered a "simulation" because watermarks are not physically sent between operators; instead, they are propagated up the tree via a post-order (children first) traversal of the query plan. This allows Structured Streaming to determine the new (input watermark, output watermark) pair for every node.

For each node, the following logic is applied:

- The input watermark for a node is min(output watermarks from all children).
  -- Children providing no input watermark (DEFAULT_WATERMARK_MS) are excluded.
  -- If there is no valid input watermark from the children, input watermark = DEFAULT_WATERMARK_MS.
- The output watermark for a node is decided as follows:
  -- watermark nodes: the origin watermark value. This could be an individual origin watermark value per node, but the global watermark is retained to keep the watermark model simple.
  -- stateless nodes: same as the input watermark.
  -- stateful nodes: the return value of op.produceOutputWatermark(input watermark).

See also

StateStoreWriter.produceOutputWatermark

Note that this implementation throws an exception if a watermark node sees a valid input watermark from its children, meaning that re-definition of the watermark is not supported.

Once the algorithm has traversed the physical plan tree, the association between each stateful operator and its input watermark is constructed. When Spark requests the input watermark for a specific stateful operator, this implementation returns the value from that association.

Simulation of propagation is skipped when the watermark value is 0; in that case the input watermark for every operator is 0. (This may be unexpected if op.produceOutputWatermark returns a value higher than the input watermark, but that does not happen in most practical cases.)
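
The rules above can be illustrated with a minimal, self-contained sketch. It is not Spark's implementation: the node types (Source, WatermarkNode, Stateless, Stateful), the delayMs parameter, the -1 sentinel used for DEFAULT_WATERMARK_MS, and the body of produceOutputWatermark are all assumptions made for illustration. Only the traversal order, the min rule, the re-definition check, and the shortcut for a watermark of 0 follow the description.

```scala
import scala.collection.mutable

object WatermarkSimulationSketch {
  // Assumed sentinel for "no watermark"; the real value of DEFAULT_WATERMARK_MS is not given above.
  val DefaultWatermarkMs: Long = -1L

  // Hypothetical, simplified plan nodes standing in for physical operators.
  sealed trait Node { def children: Seq[Node] }
  final case class Source(children: Seq[Node] = Nil) extends Node
  final case class WatermarkNode(children: Seq[Node]) extends Node
  final case class Stateless(children: Seq[Node]) extends Node
  final case class Stateful(opId: Long, delayMs: Long, children: Seq[Node]) extends Node {
    // Hypothetical stand-in for op.produceOutputWatermark(input watermark).
    def produceOutputWatermark(inputWm: Long): Long =
      if (inputWm == DefaultWatermarkMs) DefaultWatermarkMs
      else math.max(inputWm - delayMs, 0L)
  }

  // Collects the IDs of all stateful operators in the plan.
  private def statefulIds(node: Node): Seq[Long] = {
    val own = node match {
      case s: Stateful => Seq(s.opId)
      case _           => Seq.empty[Long]
    }
    own ++ node.children.flatMap(statefulIds)
  }

  /** Returns the association: stateful operator ID -> input watermark. */
  def simulate(plan: Node, originWatermark: Long): Map[Long, Long] =
    if (originWatermark == 0L) {
      // Shortcut: skip the simulation; every operator sees 0.
      statefulIds(plan).map(id => id -> 0L).toMap
    } else {
      val inputByOp = mutable.Map.empty[Long, Long]

      // Post-order: children are visited first; the result is the node's output watermark.
      def visit(node: Node): Long = {
        val valid = node.children.map(visit).filter(_ != DefaultWatermarkMs)
        val input = if (valid.isEmpty) DefaultWatermarkMs else valid.min
        node match {
          case _: WatermarkNode =>
            if (input != DefaultWatermarkMs)
              throw new IllegalStateException("Redefining the watermark is not supported")
            originWatermark
          case s: Stateful =>
            inputByOp(s.opId) = input
            s.produceOutputWatermark(input)
          case _ => input // sources and stateless nodes pass the input through
        }
      }

      visit(plan)
      inputByOp.toMap
    }
}
```

In this sketch, a chain of Source, WatermarkNode, Stateful(opId = 1, delayMs = 5000), and Stateful(opId = 2, delayMs = 0) with an origin watermark of 10000 yields an input watermark of 10000 for operator 1 and 5000 for operator 2.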

Linear Supertypes
Logging, WatermarkPropagator, AnyRef, Any

Instance Constructors

  1. new PropagateWatermarkSimulator()

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  10. def getInputWatermarkForEviction(batchId: Long, stateOpId: Long): Long

    Provides the calculated input watermark for eviction for the given stateful operator.

    Definition Classes
    PropagateWatermarkSimulator → WatermarkPropagator
  11. def getInputWatermarkForLateEvents(batchId: Long, stateOpId: Long): Long

    Provides the calculated input watermark for late events for the given stateful operator.

    Definition Classes
    PropagateWatermarkSimulator → WatermarkPropagator
  12. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  13. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  14. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  15. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  16. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  17. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  18. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  19. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  20. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  21. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  22. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  23. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  24. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  25. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  26. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  27. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  28. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  29. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  30. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  31. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  32. def propagate(batchId: Long, plan: SparkPlan, originWatermark: Long): Unit

    Request to propagate the watermark among operators based on the origin watermark value. The result should be an input watermark per stateful operator, which Spark retrieves by calling getInputWatermarkForEviction or getInputWatermarkForLateEvents with the operator ID.

    Implementations are advised to cache the result, as Spark can request propagation multiple times with the same batch ID and origin watermark value.

    Definition Classes
    PropagateWatermarkSimulator → WatermarkPropagator
  33. def purge(batchId: Long): Unit

    Request to clean up the cached result of propagation. Spark calls this method when the given batch ID is unlikely to be re-executed.

    Definition Classes
    PropagateWatermarkSimulator → WatermarkPropagator
  34. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  35. def toString(): String
    Definition Classes
    AnyRef → Any
  36. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  37. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  38. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
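
The caching contract described for propagate and purge above can be sketched as follows. This is a hypothetical wrapper, not Spark's internal data structure: the class name, the compute function, the computeCount counter, and the choice to drop all batches up to and including the purged batch ID are assumptions made for illustration.

```scala
import scala.collection.mutable

// Hypothetical cache around a propagation function; names are illustrative only.
final class CachedPropagationSketch(compute: Long => Map[Long, Long]) {
  // batchId -> (origin watermark used, stateful operator ID -> input watermark)
  private val cache = mutable.TreeMap.empty[Long, (Long, Map[Long, Long])]

  // Counts how often the propagation was actually computed (for demonstration).
  var computeCount: Int = 0

  def propagate(batchId: Long, originWatermark: Long): Unit = synchronized {
    cache.get(batchId) match {
      case Some((wm, _)) if wm == originWatermark =>
        () // same batch ID and origin watermark: reuse the cached result
      case _ =>
        computeCount += 1
        cache(batchId) = (originWatermark, compute(originWatermark))
    }
  }

  def inputWatermark(batchId: Long, stateOpId: Long): Option[Long] = synchronized {
    cache.get(batchId).flatMap { case (_, byOp) => byOp.get(stateOpId) }
  }

  // Assumption: purging a batch also drops every older cached batch.
  def purge(batchId: Long): Unit = synchronized {
    val stale = cache.keysIterator.takeWhile(_ <= batchId).toList
    stale.foreach(k => cache.remove(k))
  }
}
```

Repeated calls to propagate with the same batch ID and origin watermark do not recompute the result, and purge bounds the memory held for batches that will not run again.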
