Class SkewedPartitionRebalancer
This rebalancer initializes a set of buckets for each task, based on the given taskBucketCount, and then tries to distribute partitions uniformly across those buckets. This helps to mitigate two problems:
1. Skew across tasks.
2. Scaling a few large partitions across tasks even when there is no skew among them. This speeds up local scaling without much impact on overall resource utilization.
Example:
Before: 3 tasks, 3 buckets per task, and 2 skewed partitions
Task1               Task2               Task3
Bucket1 (Part 1)    Bucket1 (Part 2)    Bucket1
Bucket2             Bucket2             Bucket2
Bucket3             Bucket3             Bucket3

After rebalancing:
Task1               Task2               Task3
Bucket1 (Part 1)    Bucket1 (Part 2)    Bucket1 (Part 1)
Bucket2 (Part 2)    Bucket2 (Part 1)    Bucket2 (Part 2)
Bucket3             Bucket3             Bucket3
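
The example above can be reproduced with a small, self-contained sketch. It is not the actual implementation: the class `BucketRebalanceSketch` and its methods `assign`, `scaleSkewedPartitions`, and `bucketsOf` are hypothetical names introduced for illustration. It also assumes a simplified policy in which every skewed partition is placed in the next free bucket of each task that does not already host it. The real rebalancer decides based on the amount of data processed, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the SkewedPartitionRebalancer implementation.
final class BucketRebalanceSketch
{
    private final int taskBucketCount;
    // buckets.get(task) lists the partition ids occupying that task's buckets, in bucket order
    private final List<List<Integer>> buckets = new ArrayList<>();

    BucketRebalanceSketch(int taskCount, int taskBucketCount)
    {
        this.taskBucketCount = taskBucketCount;
        for (int task = 0; task < taskCount; task++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Puts the partition into the next free bucket of the task.
    // Returns false if the task has no free bucket or already hosts the partition.
    boolean assign(int task, int partition)
    {
        List<Integer> taskBuckets = buckets.get(task);
        if (taskBuckets.size() >= taskBucketCount || taskBuckets.contains(partition)) {
            return false;
        }
        taskBuckets.add(partition);
        return true;
    }

    // Assumed policy: spread each skewed partition onto every task that does not host it yet.
    void scaleSkewedPartitions(List<Integer> skewedPartitions)
    {
        for (int partition : skewedPartitions) {
            for (int task = 0; task < buckets.size(); task++) {
                assign(task, partition);
            }
        }
    }

    // Returns a copy of the partitions assigned to the task's buckets.
    List<Integer> bucketsOf(int task)
    {
        return new ArrayList<>(buckets.get(task));
    }

    public static void main(String[] args)
    {
        BucketRebalanceSketch sketch = new BucketRebalanceSketch(3, 3);
        sketch.assign(0, 1); // "Before": Task1 hosts Part 1
        sketch.assign(1, 2); // "Before": Task2 hosts Part 2
        sketch.scaleSkewedPartitions(Arrays.asList(1, 2));
        for (int task = 0; task < 3; task++) {
            System.out.println("Task" + (task + 1) + ": " + sketch.bucketsOf(task));
        }
    }
}
```

Under these assumptions, each task ends up with one bucket for Part 1 and one for Part 2, and the third bucket of every task stays free, matching the "After rebalancing" layout.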
Constructor Summary
Constructors
Constructor    Description
SkewedPartitionRebalancer(int partitionCount, int taskCount, int taskBucketCount, long minPartitionDataProcessedRebalanceThreshold, long maxDataProcessedRebalanceThreshold)
Method Summary
Modifier and Type           Method    Description
void                        addDataProcessed(long dataSize)
void                        addPartitionRowCount(int partition, long rowCount)
static boolean              checkCanScalePartitionsRemotely(Session session, int taskCount, PartitioningHandle partitioningHandle, NodePartitioningManager nodePartitioningManager)
static PartitionFunction    createPartitionFunction(Session session, NodePartitioningManager nodePartitioningManager, PartitioningScheme scheme, List<Type> partitionChannelTypes)
static int                  getMaxWritersBasedOnMemory(Session session)
int                         getTaskCount()
static int                  getTaskCount(PartitioningScheme partitioningScheme)
int                         getTaskId(int partitionId, long index)
void                        rebalance()
Constructor Details
SkewedPartitionRebalancer
public SkewedPartitionRebalancer(int partitionCount, int taskCount, int taskBucketCount, long minPartitionDataProcessedRebalanceThreshold, long maxDataProcessedRebalanceThreshold)
Method Details
checkCanScalePartitionsRemotely
public static boolean checkCanScalePartitionsRemotely(Session session, int taskCount, PartitioningHandle partitioningHandle, NodePartitioningManager nodePartitioningManager)
createPartitionFunction
public static PartitionFunction createPartitionFunction(Session session, NodePartitioningManager nodePartitioningManager, PartitioningScheme scheme, List<Type> partitionChannelTypes)
getMaxWritersBasedOnMemory
public static int getMaxWritersBasedOnMemory(Session session)
getTaskCount
public static int getTaskCount(PartitioningScheme partitioningScheme)
getTaskCount
public int getTaskCount()
getTaskId
public int getTaskId(int partitionId, long index)
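
This page does not describe how `index` is used. One plausible reading, assumed here and not confirmed by the source, is that `index` rotates through the tasks a partition has been spread over, so that successive indexes for the same partition land on different tasks. The sketch below illustrates that idea only. The class `TaskIdLookupSketch` and its field `partitionTasks` are hypothetical and are not part of the documented API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; the actual getTaskId behavior is not documented on this page.
final class TaskIdLookupSketch
{
    // partitionTasks.get(partitionId) lists the task ids the partition is currently spread over
    private final List<List<Integer>> partitionTasks;

    TaskIdLookupSketch(List<List<Integer>> partitionTasks)
    {
        this.partitionTasks = new ArrayList<>(partitionTasks);
    }

    // Assumed behavior: pick a task round-robin by index.
    // floorMod keeps the result in range even for negative indexes.
    int getTaskId(int partitionId, long index)
    {
        List<Integer> tasks = partitionTasks.get(partitionId);
        int position = (int) Math.floorMod(index, (long) tasks.size());
        return tasks.get(position);
    }
}
```

With a partition spread over tasks 0 and 2, indexes 0, 1, 2 would map to tasks 0, 2, 0 under this assumption, while a partition held by a single task always maps to that task.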
addDataProcessed
public void addDataProcessed(long dataSize)
addPartitionRowCount
public void addPartitionRowCount(int partition, long rowCount)
rebalance
public void rebalance()