trait HasPartitionKey extends InputPartition
A mix-in for input partitions whose records are all clustered on the same set of partition keys
(provided via SupportsReportPartitioning, see below). Data sources can opt in to
implementing this interface for the partitions they report to Spark, which uses the
information to avoid shuffling data in certain scenarios, such as joins and aggregations. Note
that Spark requires ALL input partitions to implement this interface; otherwise it cannot take
advantage of it.
This interface should be used in combination with SupportsReportPartitioning, which
allows data sources to report their distribution and ordering to Spark. In particular, Spark
expects data sources to report
org.apache.spark.sql.connector.distributions.ClusteredDistribution whenever their input
partitions implement this interface. Spark derives the partition key spec (e.g., column names,
transforms) from the distribution, and the partition values from the input partitions.
It is the implementor's responsibility to ensure that when an input partition implements this interface, all of its records have the same value for the partition keys. Spark does not check this property.
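A minimal sketch of how a source might report such partitions. Since the real connector classes require a Spark dependency, the interfaces are stubbed with simplified local stand-ins (the real partitionKey() returns an InternalRow, and names such as RegionPartition are hypothetical):

```scala
// Simplified stand-ins for the Spark connector interfaces (the real ones
// live in org.apache.spark.sql.connector.read); defined locally so the
// sketch is self-contained.
trait InputPartition extends Serializable {
  def preferredLocations(): Array[String] = Array.empty
}
trait HasPartitionKey extends InputPartition {
  // In Spark this returns an InternalRow; a Seq[Any] stands in for it here.
  def partitionKey(): Seq[Any]
}

// A hypothetical partition whose records are all clustered on one value of
// `region`. Every record in this partition must share this key value.
final case class RegionPartition(region: String, files: Seq[String])
    extends HasPartitionKey {
  override def partitionKey(): Seq[Any] = Seq(region)
}

// ALL partitions reported by the scan must implement HasPartitionKey,
// or Spark cannot use the clustering to avoid a shuffle.
val partitions: Seq[InputPartition] = Seq(
  RegionPartition("us-east", Seq("part-0.parquet")),
  RegionPartition("eu-west", Seq("part-1.parquet"))
)
```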
- Since: 3.3.0
- See also:
  - org.apache.spark.sql.connector.read.SupportsReportPartitioning
  - org.apache.spark.sql.connector.read.partitioning.Partitioning
By Inheritance
- HasPartitionKey
- InputPartition
- Serializable
- AnyRef
- Any
Abstract Value Members
- abstract def partitionKey(): InternalRow
  Returns the value of the partition key(s) associated with this partition. An input partition implementing this interface must ensure that all of its records have the same value for the partition keys. Note that the value is the one obtained after the partition transform (if any) has been applied.
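A sketch of the "value after transform" point above, under the assumption of a source partitioned by a days-style transform on a timestamp column. The trait is a simplified stand-in (the real partitionKey() returns an InternalRow), and DailyPartition is hypothetical:

```scala
import java.time.LocalDate

// Simplified stand-in for the connector trait; the real partitionKey()
// returns an org.apache.spark.sql.catalyst.InternalRow.
trait HasPartitionKey { def partitionKey(): Seq[Any] }

// Suppose the source is partitioned by days(eventTime). The reported key
// is the transformed value (the day), not the raw timestamp: every record
// in this partition falls on the same day.
final case class DailyPartition(day: LocalDate) extends HasPartitionKey {
  // Spark encodes dates as days since the Unix epoch; mirrored here.
  override def partitionKey(): Seq[Any] = Seq(day.toEpochDay)
}

val p = DailyPartition(LocalDate.of(2022, 3, 1))
```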
Concrete Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def clone(): AnyRef
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def finalize(): Unit
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
def
preferredLocations(): Array[String]
The preferred locations where the input partition reader returned by this partition can run faster, but Spark does not guarantee to run the input partition reader on these locations.
The preferred locations where the input partition reader returned by this partition can run faster, but Spark does not guarantee to run the input partition reader on these locations. The implementations should make sure that it can be run on any location. The location is a string representing the host name.
Note that if a host name cannot be recognized by Spark, it will be ignored as it was not in the returned locations. The default return value is empty string array, which means this input partition's reader has no location preference.
If this method fails (by throwing an exception), the action will fail and no Spark job will be submitted.
- Definition Classes
- InputPartition
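A sketch of a partition advertising locality hints, with the trait stubbed locally as a simplified stand-in for org.apache.spark.sql.connector.read.InputPartition; BlockPartition and the host names are hypothetical:

```scala
// Simplified local stand-in for the connector interface.
trait InputPartition extends Serializable {
  // Default: no preference; Spark may schedule the reader anywhere.
  def preferredLocations(): Array[String] = Array.empty
}

// A partition backed by a replicated file block advertises the hosts
// holding a replica. Spark treats these only as scheduling hints, so the
// reader must still work when run on any other host.
final case class BlockPartition(path: String, replicaHosts: Seq[String])
    extends InputPartition {
  override def preferredLocations(): Array[String] = replicaHosts.toArray
}
```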
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- def toString(): String
  - Definition Classes: AnyRef → Any
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()