case class StatefulOpClusteredDistribution(expressions: Seq[Expression], _requiredNumPartitions: Int) extends Distribution with Product with Serializable
Represents the requirement of distribution on the stateful operator in Structured Streaming.
Each partition of a stateful operator initializes its own state store(s), which are independent of the state store(s) in other partitions. Because the data in a state store cannot be repartitioned, Spark must ensure that the physical partitioning of the stateful operator stays unchanged across Spark versions. Violating this requirement may cause silent correctness issues.
Because this distribution relies on HashPartitioning for the physical partitioning of the stateful operator, only HashPartitioning (including a HashPartitioning inside a PartitioningCollection) can satisfy it.
When _requiredNumPartitions is 1, SinglePartition is essentially the same as HashPartitioning, so it can satisfy this distribution as well.
NOTE: As of now, this distribution is applied only to stream-stream joins. Other stateful operators still use ClusteredDistribution, which may construct the physical partitioning of the state differently (ClusteredDistribution imposes a more relaxed condition, so multiple partitionings can satisfy it). A fix is needed that minimizes the possibility of breaking existing checkpoints.
TODO(SPARK-38204): address the issue explained in the note above.
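The satisfaction rule described above can be sketched with a small, self-contained model. This is not Spark's implementation: `Part`, `HashPart`, `SinglePart`, `RangePart`, `PartCollection`, `StatefulDist`, and `SatisfiesSketch` are hypothetical stand-ins invented for illustration, expressions are reduced to plain strings, and the sketch assumes that a hash partitioning must use exactly the same keys in the same order and the same number of partitions.

```scala
// Hypothetical, simplified stand-ins for the partitionings named in the text.
// None of these types are Spark classes.
sealed trait Part { def numPartitions: Int }

// Hash partitioning on an ordered list of key names.
final case class HashPart(keys: Seq[String], numPartitions: Int) extends Part

// A single partition holding all rows.
case object SinglePart extends Part { val numPartitions: Int = 1 }

// Any non-hash partitioning; included only to show a rejected case.
final case class RangePart(keys: Seq[String], numPartitions: Int) extends Part

// Several partitionings that hold for the same data at once.
final case class PartCollection(parts: Seq[Part]) extends Part {
  def numPartitions: Int = parts.head.numPartitions
}

// Stand-in for the distribution: required keys plus a fixed partition count.
final case class StatefulDist(keys: Seq[String], requiredNumPartitions: Int)

object SatisfiesSketch {
  def satisfies(p: Part, d: StatefulDist): Boolean = p match {
    // Assumption: keys must match exactly and in order, and counts must agree.
    case HashPart(keys, n) => keys == d.keys && n == d.requiredNumPartitions
    // A single partition is acceptable only when exactly one is required.
    case SinglePart => d.requiredNumPartitions == 1
    // A collection satisfies the distribution if any member does.
    case PartCollection(parts) => parts.exists(satisfies(_, d))
    // Every other partitioning is rejected.
    case _ => false
  }
}
```

Under these assumptions, `SatisfiesSketch.satisfies(HashPart(Seq("key"), 5), StatefulDist(Seq("key"), 5))` holds, while a `RangePart` on the same key, or a `HashPart` with a different partition count, does not.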
Instance Constructors
- new StatefulOpClusteredDistribution(expressions: Seq[Expression], _requiredNumPartitions: Int)
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- val _requiredNumPartitions: Int
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- def createPartitioning(numPartitions: Int): Partitioning
Creates a default partitioning for this distribution, which can satisfy this distribution while matching the given number of partitions.
- Definition Classes
- StatefulOpClusteredDistribution → Distribution
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- val expressions: Seq[Expression]
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val requiredNumPartitions: Option[Int]
The required number of partitions for this distribution. If it's None, then any number of partitions is allowed for this distribution.
- Definition Classes
- StatefulOpClusteredDistribution → Distribution
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()