Class SkewedPartitionRebalancer

java.lang.Object
io.trino.operator.output.SkewedPartitionRebalancer

@ThreadSafe
public class SkewedPartitionRebalancer extends Object
Helps in distributing big or skewed partitions across available tasks to improve the performance of partitioned writes.

This rebalancer initializes a set of buckets for each task, based on the given taskBucketCount, and then tries to distribute partitions uniformly across those buckets. This addresses two problems:

1. Skew across tasks.
2. Scaling a few big partitions across tasks even when there is no skew among them. This speeds up local scaling without much impact on overall resource utilization.

Example:

Before: 3 tasks, 3 buckets per task, and 2 skewed partitions

Task1               Task2               Task3
Bucket1 (Part 1)    Bucket1 (Part 2)    Bucket1
Bucket2             Bucket2             Bucket2
Bucket3             Bucket3             Bucket3

After rebalancing:

Task1               Task2               Task3
Bucket1 (Part 1)    Bucket1 (Part 2)    Bucket1 (Part 1)
Bucket2 (Part 2)    Bucket2 (Part 1)    Bucket2 (Part 2)
Bucket3             Bucket3             Bucket3
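
The example can be reproduced with a small toy model. This is a minimal sketch and not the Trino implementation: the class ToyBucketRebalancer, its scaleUp and tasksOf methods, the round-robin initial placement, the "fewest occupied buckets" selection rule, and the one-partition-per-bucket restriction are all assumptions made for illustration. Tasks and partitions are numbered from 0 here, so Part 1 in the example is partition 0 and Task1 is task 0.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration only; this is NOT the Trino implementation.
final class ToyBucketRebalancer
{
    private final int taskBucketCount;
    // taskBuckets.get(t) lists the partition held by each occupied bucket of task t
    private final List<List<Integer>> taskBuckets = new ArrayList<>();
    // partitionTasks.get(p) lists the tasks that receive rows of partition p
    private final List<List<Integer>> partitionTasks = new ArrayList<>();

    ToyBucketRebalancer(int partitionCount, int taskCount, int taskBucketCount)
    {
        if (partitionCount > (long) taskCount * taskBucketCount) {
            throw new IllegalArgumentException("toy model allows one partition per bucket");
        }
        this.taskBucketCount = taskBucketCount;
        for (int task = 0; task < taskCount; task++) {
            taskBuckets.add(new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            // Assumed initial placement: round-robin, one bucket per partition
            partitionTasks.add(new ArrayList<>());
            assign(partition, partition % taskCount);
        }
    }

    private void assign(int partition, int task)
    {
        taskBuckets.get(task).add(partition);
        partitionTasks.get(partition).add(task);
    }

    // Gives the partition one more bucket on the task with the fewest occupied
    // buckets that does not host it yet. Returns false if no such task has room.
    boolean scaleUp(int partition)
    {
        int best = -1;
        for (int task = 0; task < taskBuckets.size(); task++) {
            int used = taskBuckets.get(task).size();
            if (partitionTasks.get(partition).contains(task) || used >= taskBucketCount) {
                continue;
            }
            if (best == -1 || used < taskBuckets.get(best).size()) {
                best = task;
            }
        }
        if (best == -1) {
            return false;
        }
        assign(partition, best);
        return true;
    }

    // Rows of a partition are spread round-robin over the tasks that host it.
    int getTaskId(int partition, long index)
    {
        List<Integer> tasks = partitionTasks.get(partition);
        return tasks.get((int) Math.floorMod(index, (long) tasks.size()));
    }

    // Defensive copy of the tasks currently hosting the partition.
    List<Integer> tasksOf(int partition)
    {
        return new ArrayList<>(partitionTasks.get(partition));
    }

    public static void main(String[] args)
    {
        // 2 skewed partitions, 3 tasks, 3 buckets per task, as in the example
        ToyBucketRebalancer rebalancer = new ToyBucketRebalancer(2, 3, 3);
        for (int round = 0; round < 2; round++) {
            rebalancer.scaleUp(0);
            rebalancer.scaleUp(1);
        }
        System.out.println("partition 0 -> tasks " + rebalancer.tasksOf(0));
        System.out.println("partition 1 -> tasks " + rebalancer.tasksOf(1));
    }
}
```

After two scale-up rounds, each partition in this toy model occupies one bucket on every task and each task still has one free bucket, which is the layout shown under "After rebalancing". The real class decides when to scale from the data it is told about through addDataProcessed and addPartitionRowCount and from the two threshold arguments of its constructor; the toy model leaves that decision to the caller.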

  • Constructor Details

    • SkewedPartitionRebalancer

      public SkewedPartitionRebalancer(int partitionCount, int taskCount, int taskBucketCount, long minPartitionDataProcessedRebalanceThreshold, long maxDataProcessedRebalanceThreshold)
  • Method Details

    • checkCanScalePartitionsRemotely

      public static boolean checkCanScalePartitionsRemotely(Session session, int taskCount, PartitioningHandle partitioningHandle, NodePartitioningManager nodePartitioningManager)
    • createPartitionFunction

      public static PartitionFunction createPartitionFunction(Session session, NodePartitioningManager nodePartitioningManager, PartitioningScheme scheme, List<Type> partitionChannelTypes)
    • getMaxWritersBasedOnMemory

      public static int getMaxWritersBasedOnMemory(Session session)
    • getTaskCount

      public static int getTaskCount(PartitioningScheme partitioningScheme)
    • getTaskCount

      public int getTaskCount()
    • getTaskId

      public int getTaskId(int partitionId, long index)
    • addDataProcessed

      public void addDataProcessed(long dataSize)
    • addPartitionRowCount

      public void addPartitionRowCount(int partition, long rowCount)
    • rebalance

      public void rebalance()