Class DeterminePartitionCount

java.lang.Object
io.trino.sql.planner.optimizations.DeterminePartitionCount
All Implemented Interfaces:
PlanOptimizer

public class DeterminePartitionCount extends Object implements PlanOptimizer
This rule looks at the amount of data read and processed by the query to determine the partition count used for remote partitioned exchanges. It helps increase the concurrency of the engine on large clusters. The rule is also cautious about missing or incorrect statistics, so it is skipped for input-multiplying nodes such as CROSS JOIN or UNNEST.

E.g. 1:
Given query: SELECT count(column_a) FROM table_with_stats_a group by column_b
Config: MIN_INPUT_SIZE_PER_TASK: 500 MB
Input table data size: 1000 MB
Estimated partition count: Input table data size / MIN_INPUT_SIZE_PER_TASK => 2

E.g. 2:
Given query: SELECT * FROM table_with_stats_a as a JOIN table_with_stats_b as b ON a.column_b = b.column_b
Config: MIN_INPUT_SIZE_PER_TASK: 500 MB
Input tables data size: 1000 MB
Join output data size: 5000 MB
Estimated partition count: max((Input tables data size / MIN_INPUT_SIZE_PER_TASK), (Join output data size / MIN_INPUT_SIZE_PER_TASK)) => 10
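
The arithmetic in the two examples can be reproduced with a minimal sketch. The class and method names here (PartitionCountSketch, partitionsFor, estimatePartitionCount) are hypothetical and are not part of Trino. Rounding up and the floor of one partition are assumptions, since the description does not say how fractional results are handled. The actual optimizer works on plan statistics and may behave differently.

```java
public class PartitionCountSketch {
    // Hypothetical helper: number of partitions needed so that each task
    // receives roughly minInputSizePerTaskMb of data. Rounding up and the
    // floor of 1 are assumptions, not documented behavior.
    static long partitionsFor(long dataSizeMb, long minInputSizePerTaskMb) {
        long rounded = (dataSizeMb + minInputSizePerTaskMb - 1) / minInputSizePerTaskMb;
        return Math.max(1, rounded);
    }

    // Takes the maximum over all data sizes seen in the plan
    // (for example, the table input size and a join output size).
    static long estimatePartitionCount(long minInputSizePerTaskMb, long... dataSizesMb) {
        long result = 1;
        for (long sizeMb : dataSizesMb) {
            result = Math.max(result, partitionsFor(sizeMb, minInputSizePerTaskMb));
        }
        return result;
    }

    public static void main(String[] args) {
        // E.g. 1: aggregation over a 1000 MB input
        System.out.println(estimatePartitionCount(500, 1000));
        // E.g. 2: join with 1000 MB input and 5000 MB output
        System.out.println(estimatePartitionCount(500, 1000, 5000));
    }
}
```

With MIN_INPUT_SIZE_PER_TASK set to 500 MB, the two calls in main correspond to E.g. 1 and E.g. 2 respectively.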