case class HashClusteredDistribution(expressions: Seq[Expression], requiredNumPartitions: Option[Int] = None) extends Distribution with Product with Serializable
Represents data where tuples have been clustered according to the hash of the given
expressions. Because this distribution depends on the physical partitioning being a
HashPartitioning, only HashPartitioning (including a HashPartitioning inside a
PartitioningCollection) can satisfy it. When requiredNumPartitions is Some(1), SinglePartition
is essentially the same as HashPartitioning, so it can satisfy this distribution as well.
This distribution is mainly used to represent the distribution requirement of stateful operators in Structured Streaming, but it can be used in other cases as well.
NOTE: Each partition of a stateful operator initializes its own state store(s), which are independent of the state store(s) in other partitions. Since the data in a state store cannot be repartitioned, Spark must ensure that the physical partitioning of a stateful operator stays unchanged across Spark versions. Violating this requirement may cause silent correctness issues.
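The satisfaction rule described above can be sketched with a small standalone model. This is not Spark's implementation: the types below are simplified stand-ins that reuse the Spark class names, expressions are modeled as plain strings, the exact ordered match on expressions is an assumption, and the SinglePartition case follows the wording of the description rather than the actual source.

```scala
// Simplified, hypothetical model of the rule; none of these types are Spark's real classes.
object HashClusteredSketch {
  sealed trait Partitioning { def numPartitions: Int }

  final case class HashPartitioning(expressions: Seq[String], numPartitions: Int)
    extends Partitioning

  final case class RangePartitioning(ordering: Seq[String], numPartitions: Int)
    extends Partitioning

  case object SinglePartition extends Partitioning {
    val numPartitions: Int = 1
  }

  final case class PartitioningCollection(partitionings: Seq[Partitioning])
    extends Partitioning {
    require(partitionings.nonEmpty, "a collection needs at least one partitioning")
    // Assumption: all members share the same partition count, so the first one is representative.
    def numPartitions: Int = partitionings.head.numPartitions
  }

  final case class HashClusteredDistribution(
      expressions: Seq[String],
      requiredNumPartitions: Option[Int] = None)

  def satisfies(p: Partitioning, d: HashClusteredDistribution): Boolean = {
    // None means any partition count is acceptable.
    val countOk = d.requiredNumPartitions.forall(_ == p.numPartitions)
    val shapeOk = p match {
      // Assumption: the hash expressions must match exactly, in order.
      case HashPartitioning(exprs, _) => exprs == d.expressions
      // Per the description: acceptable when exactly one partition is required.
      case SinglePartition => d.requiredNumPartitions.contains(1)
      // A collection satisfies the distribution if any member does.
      case PartitioningCollection(ps) => ps.exists(satisfies(_, d))
      // Any other partitioning (e.g. range-based) never satisfies it.
      case _ => false
    }
    countOk && shapeOk
  }
}
```

In this model, `satisfies(HashPartitioning(Seq("key"), 8), HashClusteredDistribution(Seq("key")))` holds, while a range partitioning on the same column, or a hash partitioning whose partition count differs from a required one, does not.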
Linear Supertypes
- Serializable
- Serializable
- Product
- Equals
- Distribution
- AnyRef
- Any
Instance Constructors
- new HashClusteredDistribution(expressions: Seq[Expression], requiredNumPartitions: Option[Int] = None)
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- def createPartitioning(numPartitions: Int): Partitioning
Creates a default partitioning for this distribution, which can satisfy this distribution while matching the given number of partitions.
- Definition Classes
- HashClusteredDistribution → Distribution
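As a rough illustration of what createPartitioning is described as doing, the standalone sketch below returns a hash partitioning over the same expressions with the requested partition count. The types are hypothetical stand-ins rather than Spark's classes, the rejection of a count that conflicts with requiredNumPartitions is an assumption, and `partitionId` uses the JVM `hashCode` as a placeholder for whatever hash function the real partitioning applies.

```scala
// Hypothetical stand-ins that only mirror the names on this page.
object CreatePartitioningSketch {
  final case class HashPartitioning(expressions: Seq[String], numPartitions: Int) {
    // Placeholder hash: a non-negative modulus of the key values' hashCode.
    // The point is only that the mapping from key to partition must stay stable.
    def partitionId(keyValues: Seq[Any]): Int =
      java.lang.Math.floorMod(keyValues.hashCode(), numPartitions)
  }

  final case class HashClusteredDistribution(
      expressions: Seq[String],
      requiredNumPartitions: Option[Int] = None) {

    def createPartitioning(numPartitions: Int): HashPartitioning = {
      // Assumption: a requested count that conflicts with the required one is rejected.
      require(
        requiredNumPartitions.forall(_ == numPartitions),
        s"requested $numPartitions partitions, but $requiredNumPartitions is required")
      HashPartitioning(expressions, numPartitions)
    }
  }
}
```

Because each key is always routed to the same partition id for a fixed hash function and partition count, state kept per partition stays consistent; changing either would send existing keys to partitions whose state stores have never seen them.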
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- val expressions: Seq[Expression]
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val requiredNumPartitions: Option[Int]
The required number of partitions for this distribution. If it is None, any number of partitions is allowed.
- Definition Classes
- HashClusteredDistribution → Distribution
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()