case class StatefulOpClusteredDistribution(expressions: Seq[Expression], _requiredNumPartitions: Int) extends Distribution with Product with Serializable
Represents the distribution requirement of a stateful operator in Structured Streaming.
Each partition of a stateful operator initializes its own state store(s), which are independent of the state store(s) in other partitions. Because the data in a state store cannot be repartitioned, Spark must ensure that the physical partitioning of the stateful operator stays unchanged across Spark versions. Violating this requirement can cause silent correctness issues.
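The reason a changed partitioning is a silent correctness problem can be illustrated with a small toy model. This is a sketch, not Spark code: `partitionerA`, `partitionerB`, `stores`, and `update` are hypothetical names, and the per-partition mutable maps only stand in for real state stores.

```scala
import scala.collection.mutable

object StateRoutingSketch {
  // Hypothetical partitioning function used when the state was first written.
  def partitionerA(key: String, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)

  // Hypothetical partitioning function after a change (e.g. an upgrade).
  // Adding 1 before the modulo guarantees a different partition for every key.
  def partitionerB(key: String, numPartitions: Int): Int =
    Math.floorMod(key.hashCode + 1, numPartitions)

  val numPartitions = 4

  // One independent toy "state store" per partition: key -> running count.
  val stores: Vector[mutable.Map[String, Long]] =
    Vector.fill(numPartitions)(mutable.Map.empty[String, Long])

  // Routes the key to a partition and increments the count kept in that partition's store.
  def update(key: String, partitioner: (String, Int) => Int): Long = {
    val store = stores(partitioner(key, numPartitions))
    val next = store.getOrElse(key, 0L) + 1L
    store(key) = next
    next
  }

  val first: Long = update("user-1", partitionerA)       // count is 1
  val second: Long = update("user-1", partitionerA)      // count is 2
  // Same key, different routing: the existing state lives in another partition,
  // so the count restarts at 1 without any error being raised.
  val afterChange: Long = update("user-1", partitionerB) // count is 1, not 3
}
```

No exception is thrown in the last step; the result is simply wrong, which is the failure mode described above.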
Because this distribution relies on HashPartitioning for the physical partitioning of the stateful operator, only HashPartitioning (including a HashPartitioning inside a PartitioningCollection) can satisfy this distribution.
When _requiredNumPartitions is 1, SinglePartition is essentially the same as HashPartitioning, so it can satisfy this distribution as well.
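The satisfaction rule described above can be sketched as a self-contained toy model. The `Toy*` types below are hypothetical stand-ins, not the Catalyst classes, and expressions are modeled as plain strings. Requiring the hash expressions to match exactly and in order is an assumption of this sketch.

```scala
object SatisfactionSketch {
  // Toy stand-ins for physical partitionings (not the real Catalyst types).
  sealed trait ToyPartitioning { def numPartitions: Int }
  final case class ToyHash(expressions: Seq[String], numPartitions: Int) extends ToyPartitioning
  final case class ToyRange(ordering: Seq[String], numPartitions: Int) extends ToyPartitioning
  case object ToySingle extends ToyPartitioning { val numPartitions = 1 }
  final case class ToyCollection(partitionings: Seq[ToyPartitioning]) extends ToyPartitioning {
    def numPartitions: Int = partitionings.head.numPartitions
  }

  // Toy stand-in for the distribution documented on this page.
  final case class ToyStatefulOpDistribution(expressions: Seq[String], requiredNumPartitions: Int)

  def satisfies(p: ToyPartitioning, d: ToyStatefulOpDistribution): Boolean = p match {
    // Assumption: same expressions in the same order, and the same partition count.
    case ToyHash(exprs, n) => exprs == d.expressions && n == d.requiredNumPartitions
    // A single partition is acceptable only when exactly one partition is required.
    case ToySingle => d.requiredNumPartitions == 1
    // A collection satisfies the distribution if any member does.
    case ToyCollection(ps) => ps.exists(satisfies(_, d))
    // Any other partitioning (e.g. range partitioning) is rejected.
    case _ => false
  }

  val dist5 = ToyStatefulOpDistribution(Seq("key"), 5)
  val dist1 = ToyStatefulOpDistribution(Seq("key"), 1)
}
```

With these definitions, `satisfies(ToyHash(Seq("key"), 5), dist5)` holds, `satisfies(ToyRange(Seq("key"), 5), dist5)` does not, and `ToySingle` is accepted only for `dist1`.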
NOTE: This is currently applied only to stream-stream joins. Other stateful operators use ClusteredDistribution, which may lay out the physical partitioning of the state differently (ClusteredDistribution imposes a more relaxed condition, so multiple partitionings can satisfy it). A fix is needed that minimizes the risk of breaking existing checkpoints.
TODO(SPARK-38204): address the issue explained in the note above.
Instance Constructors
- new StatefulOpClusteredDistribution(expressions: Seq[Expression], _requiredNumPartitions: Int)
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- val _requiredNumPartitions: Int
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def createPartitioning(numPartitions: Int): Partitioning
Creates a default partitioning for this distribution, which can satisfy this distribution while matching the given number of partitions.
- Definition Classes
- StatefulOpClusteredDistribution → Distribution
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- val expressions: Seq[Expression]
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def productElementNames: Iterator[String]
- Definition Classes
- Product
- val requiredNumPartitions: Option[Int]
The required number of partitions for this distribution. If it's None, then any number of partitions is allowed for this distribution.
- Definition Classes
- StatefulOpClusteredDistribution → Distribution
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
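The `Option[Int]` semantics of `requiredNumPartitions` listed above (None means any number of partitions is allowed) can be sketched as follows. `numPartitionsAllowed` is a hypothetical helper, not part of the Spark API. For this class the value is presumably `Some(_requiredNumPartitions)`, but that is an assumption based on the constructor signature.

```scala
object RequiredNumPartitionsSketch {
  // Hypothetical helper: checks an actual partition count against an optional requirement.
  // Option.forall returns true for None, so "no requirement" accepts any count.
  def numPartitionsAllowed(required: Option[Int], actual: Int): Boolean =
    required.forall(_ == actual)

  val anyCount: Boolean = numPartitionsAllowed(None, 200)    // no requirement: allowed
  val matching: Boolean = numPartitionsAllowed(Some(5), 5)   // requirement met: allowed
  val mismatch: Boolean = numPartitionsAllowed(Some(5), 200) // requirement violated: rejected
}
```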