Class SkewedPartitionRebalancer
This rebalancer initializes a number of buckets for each task, based on a given taskBucketCount, and then tries to distribute partitions uniformly across those buckets. This helps mitigate two problems:
1. Skewness across tasks.
2. A few big partitions that are worth scaling across tasks even when there is no skewness among them.
In effect, this speeds up local scaling without much impact on overall resource utilization.
Example:
Before: 3 tasks, 3 buckets per task, and 2 skewed partitions

  Task1              Task2              Task3
  Bucket1 (Part 1)   Bucket1 (Part 2)   Bucket1
  Bucket2            Bucket2            Bucket2
  Bucket3            Bucket3            Bucket3

After rebalancing:

  Task1              Task2              Task3
  Bucket1 (Part 1)   Bucket1 (Part 2)   Bucket1 (Part 1)
  Bucket2 (Part 2)   Bucket2 (Part 1)   Bucket2 (Part 2)
  Bucket3            Bucket3            Bucket3
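The example layout can be reproduced with a small toy model. The Java sketch below is hypothetical: the class BucketLayoutSketch, the method spreadSkewedPartitions, and its placement rule (hand out free buckets round-robin, one bucket row at a time across tasks, and never place a partition twice on the same task) are assumptions chosen only because they yield the layout shown. This page does not describe how SkewedPartitionRebalancer actually chooses buckets, so treat this as an illustration of the shape of the result, not of the real algorithm.

```java
import java.util.Arrays;

// Hypothetical toy model of the example layout; NOT SkewedPartitionRebalancer's real logic.
class BucketLayoutSketch {
    static final int FREE = -1; // marks a bucket with no partition assigned (partition ids assumed >= 0)

    // Returns layout[task][bucket] = partition id assigned to that bucket, or FREE.
    static int[][] spreadSkewedPartitions(int taskCount, int bucketsPerTask, int[] skewedPartitions) {
        if (skewedPartitions.length == 0 || skewedPartitions.length > taskCount || bucketsPerTask < 1) {
            throw new IllegalArgumentException("need 1..taskCount skewed partitions and at least one bucket per task");
        }
        int[][] layout = new int[taskCount][bucketsPerTask];
        for (int[] taskBuckets : layout) {
            Arrays.fill(taskBuckets, FREE);
        }
        // "Before": skewed partition i starts in the first bucket of task i.
        for (int i = 0; i < skewedPartitions.length; i++) {
            layout[i][0] = skewedPartitions[i];
        }
        // "After": visit free buckets row by row (bucket index, then task index) and hand them
        // out round-robin, skipping a partition that already has a bucket on that task.
        int next = 0; // round-robin cursor over skewedPartitions
        for (int b = 0; b < bucketsPerTask; b++) {
            for (int t = 0; t < taskCount; t++) {
                if (layout[t][b] != FREE) {
                    continue;
                }
                for (int k = 0; k < skewedPartitions.length; k++) {
                    int candidate = skewedPartitions[(next + k) % skewedPartitions.length];
                    if (!isOnTask(layout[t], candidate)) {
                        layout[t][b] = candidate;
                        next = (next + k + 1) % skewedPartitions.length;
                        break;
                    }
                }
                // If every partition is already on task t, the bucket stays FREE.
            }
        }
        return layout;
    }

    // True if the partition already occupies some bucket of this task.
    private static boolean isOnTask(int[] taskBuckets, int partition) {
        for (int p : taskBuckets) {
            if (p == partition) {
                return true;
            }
        }
        return false;
    }
}
```

For instance, spreadSkewedPartitions(3, 3, new int[] {1, 2}) returns the rows {1, 2, -1}, {2, 1, -1} and {1, 2, -1} for Task1, Task2 and Task3, where -1 is a free bucket, which matches the "After rebalancing" table: each skewed partition ends up on every task while the third bucket of each task stays free.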
Method Summary
Modifier and Type                  Method
void                               addDataProcessed(long dataSize)
void                               addPartitionRowCount(int partition, long rowCount)
static boolean                     checkCanScalePartitionsRemotely(Session session, int taskCount, PartitioningHandle partitioningHandle, NodePartitioningManager nodePartitioningManager)
static PartitionFunction           createPartitionFunction(Session session, NodePartitioningManager nodePartitioningManager, PartitioningScheme scheme, List<Type> partitionChannelTypes)
static SkewedPartitionRebalancer   createSkewedPartitionRebalancer(int partitionCount, int taskCount, int taskPartitionedWriterCount, long minPartitionDataProcessedRebalanceThreshold)
int                                getTaskCount()
static int                         getTaskCount(PartitioningScheme partitioningScheme)
int                                getTaskId(int partitionId, long index)
void                               rebalance()
Method Details
checkCanScalePartitionsRemotely
public static boolean checkCanScalePartitionsRemotely(Session session, int taskCount, PartitioningHandle partitioningHandle, NodePartitioningManager nodePartitioningManager)
createPartitionFunction
public static PartitionFunction createPartitionFunction(Session session, NodePartitioningManager nodePartitioningManager, PartitioningScheme scheme, List<Type> partitionChannelTypes)
createSkewedPartitionRebalancer
public static SkewedPartitionRebalancer createSkewedPartitionRebalancer(int partitionCount, int taskCount, int taskPartitionedWriterCount, long minPartitionDataProcessedRebalanceThreshold)
getTaskCount
public static int getTaskCount(PartitioningScheme partitioningScheme)
getTaskCount
public int getTaskCount()
getTaskId
public int getTaskId(int partitionId, long index)
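getTaskId takes both a partition id and an index, which suggests that a partition spread over several tasks is routed to one of them by that index. The sketch below assumes exactly that: each partition keeps a list of task ids, and the index (for example, a running row or page counter) selects one of them round-robin. PartitionAssignmentSketch, addTask, and the initial modulo placement are hypothetical names and choices, not part of the documented API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a partition -> tasks mapping; not Trino's implementation.
class PartitionAssignmentSketch {
    // partitionAssignments.get(p) lists the task ids that partition p is currently spread across.
    private final List<List<Integer>> partitionAssignments = new ArrayList<>();

    PartitionAssignmentSketch(int partitionCount, int taskCount) {
        // Assumption: each partition initially lives on exactly one task, chosen by modulo.
        for (int p = 0; p < partitionCount; p++) {
            List<Integer> tasks = new ArrayList<>();
            tasks.add(p % taskCount);
            partitionAssignments.add(tasks);
        }
    }

    // Scales a partition out to one more task (no-op if it is already there).
    void addTask(int partitionId, int taskId) {
        List<Integer> tasks = partitionAssignments.get(partitionId);
        if (!tasks.contains(taskId)) {
            tasks.add(taskId);
        }
    }

    // Round-robins over the partition's tasks; floorMod keeps the slot non-negative for any index.
    int getTaskId(int partitionId, long index) {
        List<Integer> tasks = partitionAssignments.get(partitionId);
        return tasks.get((int) Math.floorMod(index, (long) tasks.size()));
    }
}
```

Once a second task is added to a hot partition, consecutive indices alternate between the two tasks, so that partition's rows are split evenly while partitions that were never scaled out keep routing to their single task.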
addDataProcessed
public void addDataProcessed(long dataSize)
addPartitionRowCount
public void addPartitionRowCount(int partition, long rowCount)
rebalance
public void rebalance()
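addDataProcessed, addPartitionRowCount, and rebalance together suggest a feedback loop: writers report statistics, and rebalance periodically revisits the partition assignment. The sketch below is a hypothetical model of that loop, not the documented behavior: it skips a pass until a hypothetical dataProcessedThreshold of new data has arrived since the last pass, then picks the partition with the most rows as the candidate to scale out. Unlike the documented void rebalance(), its rebalance returns a boolean so the gating is observable. How minPartitionDataProcessedRebalanceThreshold is actually used is not stated on this page.

```java
// Hypothetical model of statistics-driven rebalancing; not Trino's actual logic.
class RebalanceTriggerSketch {
    private final long dataProcessedThreshold; // hypothetical: new data required between passes
    private final long[] partitionRowCounts;
    private long dataProcessed;
    private long dataProcessedAtLastRebalance;
    private int lastSkewedPartition = -1; // -1 until the first pass runs

    RebalanceTriggerSketch(int partitionCount, long dataProcessedThreshold) {
        this.partitionRowCounts = new long[partitionCount];
        this.dataProcessedThreshold = dataProcessedThreshold;
    }

    void addDataProcessed(long dataSize) {
        dataProcessed += dataSize;
    }

    void addPartitionRowCount(int partition, long rowCount) {
        partitionRowCounts[partition] += rowCount;
    }

    // Returns true if a rebalance pass actually ran.
    boolean rebalance() {
        if (dataProcessed - dataProcessedAtLastRebalance < dataProcessedThreshold) {
            return false; // not enough fresh data since the last pass
        }
        // Pick the partition with the most rows; a real rebalancer would now assign it more buckets.
        int candidate = 0;
        for (int p = 1; p < partitionRowCounts.length; p++) {
            if (partitionRowCounts[p] > partitionRowCounts[candidate]) {
                candidate = p;
            }
        }
        lastSkewedPartition = candidate;
        dataProcessedAtLastRebalance = dataProcessed;
        return true;
    }

    int lastSkewedPartition() {
        return lastSkewedPartition;
    }
}
```

Gating on newly processed data keeps rebalancing cheap: the assignment is reconsidered only once enough fresh statistics have accumulated to justify moving partitions.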