Packages

package optimizer

Ordering
  1. Alphabetic
Visibility
  1. Public
  2. All

Type Members

  1. sealed abstract class BuildSide extends AnyRef
  2. case class Cost(card: BigInt, size: BigInt) extends Product with Serializable

    This class defines the cost model for a plan.

    card

    Cardinality (number of rows).

    size

    Size in bytes.

  3. case class JoinGraphInfo(starJoins: Set[Int], nonStarJoins: Set[Int]) extends Product with Serializable

    Helper class that keeps information about the join graph as sets of item/plan ids. It currently stores the star/non-star plans. It can be extended with the set of connected/unconnected plans.

  4. trait JoinSelectionHelper extends AnyRef
  5. case class NormalizeNaNAndZero(child: Expression) extends UnaryExpression with ExpectsInputTypes with Product with Serializable
  6. abstract class Optimizer extends RuleExecutor[LogicalPlan]

    Abstract class all optimizers should inherit from; contains the standard batches (extending Optimizers can override this).

  7. case class OrderedJoin(left: LogicalPlan, right: LogicalPlan, joinType: JoinType, condition: Option[Expression]) extends LogicalPlan with BinaryNode with Product with Serializable

    This is a mimic class for a join node that has been ordered.

  8. abstract class PropagateEmptyRelationBase extends Rule[LogicalPlan] with CastSupport

    The base class of two rules in the normal and AQE Optimizer. It simplifies query plans with empty or non-empty relations:

    1. Binary-node Logical Plans
      • Join with one or two empty children (including Intersect/Except).
      • Left semi Join: right side is non-empty and condition is empty. Eliminate the join to its left side.
      • Left anti join: right side is non-empty and condition is empty. Eliminate the join to an empty LocalRelation.
    2. Unary-node Logical Plans
      • Limit/Repartition with all empty children.
      • Aggregate with all empty children and at least one grouping expression.
      • Generate(Explode) with all empty children. Others like Hive UDTF may return results.
  9. case class ReplaceCurrentLike(catalogManager: CatalogManager) extends Rule[LogicalPlan] with Product with Serializable

    Replaces the expression of CurrentDatabase with the current database name. Replaces the expression of CurrentCatalog with the current catalog name.

  10. class SimpleTestOptimizer extends Optimizer

Value Members

  1. object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper

    Simplifies boolean expressions:
    1. Simplifies expressions whose answer can be determined without evaluating both sides.
    2. Eliminates / extracts common factors.
    3. Merges identical expressions.
    4. Removes the Not operator.
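
    Rules 1, 3 and 4 can be sketched as a recursive rewrite over a toy boolean AST (a minimal sketch; the types below are illustrative, not Catalyst's Expression classes):

```scala
// Toy boolean simplifier sketch; this AST is illustrative, not Catalyst's.
sealed trait BoolExpr
case object True extends BoolExpr
case object False extends BoolExpr
case class Var(name: String) extends BoolExpr
case class Not(e: BoolExpr) extends BoolExpr
case class And(l: BoolExpr, r: BoolExpr) extends BoolExpr
case class Or(l: BoolExpr, r: BoolExpr) extends BoolExpr

def simplify(e: BoolExpr): BoolExpr = e match {
  case Not(Not(x)) => simplify(x)                 // 4. removes double negation
  case Not(x)      => Not(simplify(x))
  case And(l, r) => (simplify(l), simplify(r)) match {
    case (True, x)        => x                    // 1. answer known from one side
    case (x, True)        => x
    case (False, _)       => False
    case (_, False)       => False
    case (x, y) if x == y => x                    // 3. merge identical expressions
    case (x, y)           => And(x, y)
  }
  case Or(l, r) => (simplify(l), simplify(r)) match {
    case (False, x)       => x
    case (x, False)       => x
    case (True, _)        => True
    case (_, True)        => True
    case (x, y) if x == y => x
    case (x, y)           => Or(x, y)
  }
  case leaf => leaf
}
```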

  2. object BuildLeft extends BuildSide with Product with Serializable
  3. object BuildRight extends BuildSide with Product with Serializable
  4. object CheckCartesianProducts extends Rule[LogicalPlan] with PredicateHelper

    Checks if there are any cartesian products between joins of any type in the optimized plan tree. Throws an error if a cartesian product is found without an explicit cross join specified. This rule is effectively disabled if the CROSS_JOINS_ENABLED flag is true.

    This rule must be run AFTER the ReorderJoin rule since the join conditions for each join must be collected before checking if it is a cartesian product. If you have SELECT * from R, S where R.r = S.s, the join between R and S is not a cartesian product and therefore should be allowed. The predicate R.r = S.s is not recognized as a join condition until the ReorderJoin rule.

    This rule must be run AFTER the batch "LocalRelation", since a join with empty relation should not be a cartesian product.

  5. object CollapseProject extends Rule[LogicalPlan] with AliasHelper

    Combines two Project operators into one and performs alias substitution, merging the expressions into one single expression for the following cases:
    1. When two Project operators are adjacent.
    2. When two Project operators have a LocalLimit/Sample/Repartition operator between them and the upper project consists of the same number of columns which is equal or aliasing. The GlobalLimit(LocalLimit) pattern is also considered.

  6. object CollapseRepartition extends Rule[LogicalPlan]

    Combines adjacent RepartitionOperation operators

  7. object CollapseWindow extends Rule[LogicalPlan]

    Collapses adjacent Window expressions: if the partition specs and order specs are the same and the window expressions are independent and of the same window function type, collapse into the parent.

  8. object ColumnPruning extends Rule[LogicalPlan]

    Attempts to eliminate the reading of unneeded columns from the query plan.

    Since adding Project before Filter conflicts with PushPredicatesThroughProject, this rule will remove the Project p2 in the following pattern:

    p1 @ Project(_, Filter(_, p2 @ Project(_, child))) if p2.outputSet.subsetOf(p2.inputSet)

    p2 is usually inserted by this rule and is useless; p1 can prune the columns anyway.

  9. object CombineConcats extends Rule[LogicalPlan]

    Combine nested Concat expressions.

  10. object CombineFilters extends Rule[LogicalPlan] with PredicateHelper

    Combines two adjacent Filter operators into one, merging the non-redundant conditions into one conjunctive predicate.

  11. object CombineTypedFilters extends Rule[LogicalPlan]

    Combines two adjacent TypedFilters, which operate on the same type of object in their conditions, into one, merging the filter functions into one conjunctive function.

  12. object CombineUnions extends Rule[LogicalPlan]

    Combines all adjacent Union operators into a single Union.

  13. object ComputeCurrentTime extends Rule[LogicalPlan]

    Computes the current date and time to make sure we return the same result in a single query.
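
    The idea can be sketched with a toy expression type (illustrative, not Catalyst's): evaluate the clock once per query and substitute the same literal for every occurrence.

```scala
// Sketch: replace every current-timestamp marker with one shared literal so
// all occurrences in a single query return the same value. Illustrative types.
sealed trait TimeExpr
case object CurrentTimestamp extends TimeExpr
case class TimestampLit(millis: Long) extends TimeExpr

def computeCurrentTime(exprs: List[TimeExpr]): List[TimeExpr] = {
  val now = TimestampLit(System.currentTimeMillis()) // evaluated exactly once
  exprs.map {
    case CurrentTimestamp => now
    case e                => e
  }
}
```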

  14. object ConstantFolding extends Rule[LogicalPlan]

    Replaces Expressions that can be statically evaluated with equivalent Literal values.
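
    A minimal sketch of constant folding over a toy arithmetic AST (illustrative types, not Catalyst's Expression hierarchy): every maximal foldable subtree is replaced with a literal.

```scala
// Toy constant-folding sketch.
sealed trait ArithExpr { def foldable: Boolean }
case class Lit(v: Int) extends ArithExpr { val foldable = true }
case class Col(name: String) extends ArithExpr { val foldable = false }
case class Add(l: ArithExpr, r: ArithExpr) extends ArithExpr {
  def foldable: Boolean = l.foldable && r.foldable
}

def eval(e: ArithExpr): Int = e match {
  case Lit(v)    => v
  case Add(l, r) => eval(l) + eval(r)
  case Col(n)    => sys.error(s"cannot evaluate column $n statically")
}

// Replace every maximal foldable subtree with a literal.
def constantFold(e: ArithExpr): ArithExpr = e match {
  case _ if e.foldable => Lit(eval(e))
  case Add(l, r)       => Add(constantFold(l), constantFold(r))
  case leaf            => leaf
}
```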

  15. object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper

    Substitutes Attributes which can be statically evaluated with their corresponding value in conjunctive Expressions, e.g.

    SELECT * FROM table WHERE i = 5 AND j = i + 3
    ==>  SELECT * FROM table WHERE i = 5 AND j = 8

    Approach used:
      • Populate a mapping of attribute => constant value by looking at all the equals predicates.
      • Using this mapping, replace occurrences of the attributes with the corresponding constant values in the AND node.
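
    The two passes can be sketched on toy predicates mirroring the i = 5 AND j = i + 3 example (the Pred types and names are illustrative, not Catalyst's):

```scala
// Toy constant-propagation sketch over a conjunction of predicates.
sealed trait Pred
case class EqConst(attr: String, v: Int) extends Pred                  // e.g. i = 5
case class EqPlus(attr: String, ref: String, offset: Int) extends Pred // e.g. j = i + 3

def propagateConstants(conjuncts: List[Pred]): List[Pred] = {
  // Pass 1: build attribute => constant mapping from the equality predicates.
  val consts = conjuncts.collect { case EqConst(a, v) => a -> v }.toMap
  // Pass 2: substitute known attributes and fold the result.
  conjuncts.map {
    case EqPlus(a, ref, off) if consts.contains(ref) => EqConst(a, consts(ref) + off)
    case p => p
  }
}
```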

  16. object ConvertToLocalRelation extends Rule[LogicalPlan]

    Converts local operations (i.e. ones that don't require data exchange) on LocalRelation to another LocalRelation.

  17. object CostBasedJoinReorder extends Rule[LogicalPlan] with PredicateHelper

    Cost-based join reorder. We may have several join reorder algorithms in the future. This class is the entry of these algorithms, and chooses which one to use.

  18. object DecimalAggregates extends Rule[LogicalPlan]

    Speeds up aggregates on fixed-precision decimals by executing them on unscaled Long values.

    This uses the same rules for increasing the precision and scale of the output as org.apache.spark.sql.catalyst.analysis.DecimalPrecision.

  19. object DecorrelateInnerQuery extends PredicateHelper

    Decorrelate the inner query by eliminating outer references and create domain joins. The implementation is based on the paper: Unnesting Arbitrary Queries by Thomas Neumann and Alfons Kemper. https://dl.gi.de/handle/20.500.12116/2418.

    A correlated subquery can be viewed as a "dependent" nested loop join between the outer and the inner query. For each row produced by the outer query, we bind the OuterReferences in the inner query with the corresponding values in the row, and then evaluate the inner query.

    Dependent Join
    :- Outer Query
    +- Inner Query

    If the OuterReferences are bound to the same value, the inner query will return the same result. Based on this, we can reduce the times to evaluate the inner query by first getting all distinct values of the OuterReferences.

    Normal Join
    :- Outer Query
    +- Dependent Join
       :- Inner Query
       +- Distinct Aggregate (outer_ref1, outer_ref2, ...)
          +- Outer Query

    The distinct aggregate of the outer references is called a "domain", and the dependent join between the inner query and the domain is called a "domain join". We need to push down the domain join through the inner query until there is no outer reference in the sub-tree and the domain join will turn into a normal join.

    The decorrelation function returns a new query plan with optional placeholder DomainJoins added and a list of join conditions with the outer query. DomainJoins need to be rewritten into actual inner joins between the inner query sub-tree and the outer query.

    E.g. decorrelate an inner query with equality predicates:

    SELECT (SELECT MIN(b) FROM t1 WHERE t2.c = t1.a) FROM t2

    Aggregate [] [min(b)]            Aggregate [a] [min(b), a]
    +- Filter (outer(c) = a)    =>   +- Relation [t1]
       +- Relation [t1]

    Join conditions: [c = a]

    E.g. decorrelate an inner query with non-equality predicates:

    SELECT (SELECT MIN(b) FROM t1 WHERE t2.c > t1.a) FROM t2

    Aggregate [] [min(b)]            Aggregate [c'] [min(b), c']
    +- Filter (outer(c) > a)    =>   +- Filter (c' > a)
       +- Relation [t1]                 +- DomainJoin [c']
                                           +- Relation [t1]

    Join conditions: [c <=> c']

  20. object EliminateAggregateFilter extends Rule[LogicalPlan]

    Remove useless FILTER clause for aggregate expressions. This rule should be applied before RewriteDistinctAggregates.

  21. object EliminateDistinct extends Rule[LogicalPlan]

    Remove useless DISTINCT for MAX and MIN. This rule should be applied before RewriteDistinctAggregates.

  22. object EliminateLimits extends Rule[LogicalPlan]

    This rule optimizes Limit operators by:
    1. Eliminating Limit/GlobalLimit operators if the child's max row count is less than or equal to the limit.
    2. Combining two adjacent Limit operators into one, merging the expressions into one single expression.
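
    Both steps can be sketched on a toy plan tree (the Node types are illustrative, not Catalyst's LogicalPlan):

```scala
// Sketch: drop a Limit whose child can't exceed it, and merge adjacent
// Limits by taking the minimum.
sealed trait Node { def maxRows: Option[Long] }
case class Relation(rows: Long) extends Node { def maxRows = Some(rows) }
case class Limit(n: Long, child: Node) extends Node {
  def maxRows: Option[Long] = Some(child.maxRows.fold(n)(math.min(n, _)))
}

def eliminateLimits(plan: Node): Node = plan match {
  case Limit(n, child) if child.maxRows.exists(_ <= n) => eliminateLimits(child)
  case Limit(n, Limit(m, grand)) => eliminateLimits(Limit(math.min(n, m), grand))
  case Limit(n, child)           => Limit(n, eliminateLimits(child))
  case leaf                      => leaf
}
```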

  23. object EliminateMapObjects extends Rule[LogicalPlan]

    Removes MapObjects when the following conditions are satisfied:

    1. MapObjects(... LambdaVariable(..., false) ...), which means the input and output types are non-nullable primitive types.
    2. No custom collection class is specified.
  24. object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper

    1. Elimination of outer joins, if the predicates can restrict the result sets so that all null-supplying rows are eliminated:

      - full outer -> inner if both sides have such predicates
      - left outer -> inner if the right side has such predicates
      - right outer -> inner if the left side has such predicates
      - full outer -> left outer if only the left side has such predicates
      - full outer -> right outer if only the right side has such predicates

    2. Removes outer join if it only has distinct on streamed side

    SELECT DISTINCT f1 FROM t1 LEFT JOIN t2 ON t1.id = t2.id  ==>  SELECT DISTINCT f1 FROM t1

    This rule should be executed before pushing down the Filter.

  25. object EliminateResolvedHint extends Rule[LogicalPlan]

    Removes ResolvedHint operators from the plan, moving the HintInfo to the associated Join operators, or discarding it if no Join operator is matched.

  26. object EliminateSerialization extends Rule[LogicalPlan]

    Removes cases where we are unnecessarily going between the object and serialized (InternalRow) representations of a data item, for example back-to-back map operations.

  27. object EliminateSorts extends Rule[LogicalPlan]

    Removes Sort operations if they don't affect the final output ordering. Note that changes in the final output ordering may affect the file size (SPARK-32318). This rule handles the following cases:
    1) if the child's maximum number of rows is less than or equal to 1
    2) if the sort order is empty or the sort order does not have any reference
    3) if the Sort operator is a local sort and the child is already sorted
    4) if there is another Sort operator separated by 0...n Project, Filter, Repartition or RepartitionByExpression (with deterministic expressions) operators
    5) if the Sort operator is within Join separated by 0...n Project, Filter, Repartition or RepartitionByExpression (with deterministic expressions) operators only and the Join condition is deterministic
    6) if the Sort operator is within GroupBy separated by 0...n Project, Filter, Repartition or RepartitionByExpression (with deterministic expressions) operators only and the aggregate function is order irrelevant

  28. object ExtractPythonUDFFromJoinCondition extends Rule[LogicalPlan] with PredicateHelper

    PythonUDF in join condition can't be evaluated if it refers to attributes from both join sides. See ExtractPythonUDFs for details. This rule will detect un-evaluable PythonUDF and pull them out from join condition.

  29. object FoldablePropagation extends Rule[LogicalPlan]

    Replace attributes with aliases of the original foldable expressions if possible. Other optimizations will take advantage of the propagated foldable expressions. For example, this rule can optimize

    SELECT 1.0 x, 'abc' y, Now() z ORDER BY x, y, 3

    to

    SELECT 1.0 x, 'abc' y, Now() z ORDER BY 1.0, 'abc', Now()

    and other rules can further optimize it and remove the ORDER BY operator.

  30. object GeneratorNestedColumnAliasing

    This prunes unnecessary nested columns from Generate, or Project -> Generate

  31. object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper with ConstraintHelper

    Generate a list of additional filters from an operator's existing constraint but remove those that are either already part of the operator's condition or are part of the operator's child constraints. These filters are currently inserted to the existing conditions in the Filter operators and on either side of Join operators.

    Note: While this optimization is applicable to a lot of types of join, it primarily benefits Inner and LeftSemi joins.

  32. object InferFiltersFromGenerate extends Rule[LogicalPlan]

    Infers filters from Generate, such that rows that would have been removed by this Generate can be removed earlier - before joins and in data sources.

  33. object InlineCTE extends Rule[LogicalPlan]

    Inlines CTE definitions into corresponding references if either of the following conditions is satisfied:
    1. The CTE definition does not contain any non-deterministic expressions. If this CTE definition references another CTE definition that has non-deterministic expressions, it is still OK to inline the current CTE definition.
    2. The CTE definition is only referenced once throughout the main query and all the subqueries.

    In addition, due to the complexity of correlated subqueries, all CTE references in correlated subqueries are inlined regardless of the conditions above.

  34. object JoinReorderDP extends PredicateHelper with Logging

    Reorder the joins using a dynamic programming algorithm. This implementation is based on the paper: Access Path Selection in a Relational Database Management System. https://dl.acm.org/doi/10.1145/582095.582099

    First we put all items (basic joined nodes) into level 0, then we build all two-way joins at level 1 from plans at level 0 (single items), then build all 3-way joins from plans at previous levels (two-way joins and single items), then 4-way joins ... etc, until we build all n-way joins and pick the best plan among them.

    When building m-way joins, we only keep the best plan (with the lowest cost) for the same set of m items. E.g., for 3-way joins, we keep only the best plan for items {A, B, C} among plans (A J B) J C, (A J C) J B and (B J C) J A. We also prune cartesian product candidates when building a new plan if there exists no join condition involving references from both left and right. This pruning strategy significantly reduces the search space. E.g., given A J B J C J D with join conditions A.k1 = B.k1 and B.k2 = C.k2 and C.k3 = D.k3, plans maintained for each level are as follows:

    level 0: p({A}), p({B}), p({C}), p({D})
    level 1: p({A, B}), p({B, C}), p({C, D})
    level 2: p({A, B, C}), p({B, C, D})
    level 3: p({A, B, C, D})

    where p({A, B, C, D}) is the final output plan.

    For cost evaluation, since physical costs for operators are not available currently, we use cardinalities and sizes to compute costs.
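
    The level-by-level search can be sketched as follows. This is a toy model: the cost function, the fixed 0.1 join selectivity, and all names are assumptions of this sketch, not Spark's cost model.

```scala
// DP join reorder sketch: keep the cheapest plan per item set, skip cartesian
// products (pairs with no connecting join condition).
case class JoinPlan(items: Set[String], card: Long, cost: Long)

def reorder(cards: Map[String, Long], conds: Set[(String, String)]): JoinPlan = {
  def connected(a: Set[String], b: Set[String]): Boolean =
    conds.exists { case (x, y) => (a(x) && b(y)) || (a(y) && b(x)) }
  val sel = 0.1 // assumed constant join selectivity for the sketch
  // Level 0: single items, zero cost.
  var best: Map[Set[String], JoinPlan] =
    cards.map { case (i, c) => Set(i) -> JoinPlan(Set(i), c, 0L) }
  for (level <- 2 to cards.size) {
    val candidates = for {
      (s1, p1) <- best.toSeq
      (s2, p2) <- best.toSeq
      if (s1 & s2).isEmpty && s1.size + s2.size == level && connected(s1, s2)
    } yield {
      val card = math.max(1L, (p1.card * p2.card * sel).toLong)
      JoinPlan(s1 ++ s2, card, p1.cost + p2.cost + card)
    }
    // Keep only the cheapest plan for each item set.
    best ++= candidates.groupBy(_.items).map { case (k, ps) => k -> ps.minBy(_.cost) }
  }
  best(cards.keySet)
}
```

    On A(100) J B(1000) J C(10) with conditions A=B and B=C, the sketch picks (B J C) J A over (A J B) J C because joining the small relations first yields a cheaper total cost.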

  35. object JoinReorderDPFilters extends PredicateHelper

    Implements optional filters to reduce the search space for join enumeration.

    1) Star-join filters: Plan star-joins together since they are assumed to have an optimal execution based on their RI relationship.
    2) Cartesian products: Defer their planning later in the graph to avoid large intermediate results (expanding joins, in general).
    3) Composite inners: Don't generate "bushy tree" plans to avoid materializing intermediate results.

    Filters (2) and (3) are not implemented.

  36. object LikeSimplification extends Rule[LogicalPlan]

    Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition. For example, when the expression is just checking to see if a string starts with a given pattern.
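
    The kinds of rewrites involved can be sketched as follows (a simplified model that ignores escape sequences; illustrative, not the rule's actual implementation):

```scala
// LIKE simplification sketch: patterns without interior wildcards become cheap
// string operations; everything else falls back to a regex.
def simplifyLike(pattern: String): String => Boolean = {
  def plain(s: String): Boolean = !s.exists(c => c == '%' || c == '_' || c == '\\')
  pattern match {
    case p if plain(p) => s => s == p                                // no wildcards: equality
    case p if p.endsWith("%") && plain(p.dropRight(1)) =>
      val prefix = p.dropRight(1); s => s.startsWith(prefix)         // 'abc%'
    case p if p.startsWith("%") && plain(p.drop(1)) =>
      val suffix = p.drop(1); s => s.endsWith(suffix)                // '%abc'
    case p if p.length >= 2 && p.startsWith("%") && p.endsWith("%") &&
              plain(p.drop(1).dropRight(1)) =>
      val mid = p.drop(1).dropRight(1); s => s.contains(mid)         // '%abc%'
    case p => // fall back to a full regex (this sketch only translates '%')
      val regex = p.split("%", -1).map(java.util.regex.Pattern.quote).mkString(".*")
      s => s.matches(regex)
  }
}
```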

  37. object LimitPushDown extends Rule[LogicalPlan]

    Pushes down LocalLimit beneath UNION ALL and joins.

  38. object LimitPushDownThroughWindow extends Rule[LogicalPlan]

    Pushes down LocalLimit beneath WINDOW. This rule optimizes the following case:

    SELECT *, ROW_NUMBER() OVER(ORDER BY a) AS rn FROM Tab1 LIMIT 5 ==>
    SELECT *, ROW_NUMBER() OVER(ORDER BY a) AS rn FROM (SELECT * FROM Tab1 ORDER BY a LIMIT 5) t
  39. object NestedColumnAliasing

    This aims to handle a nested column aliasing pattern inside the ColumnPruning optimizer rule.

    If:
      - A Project or its child references nested fields
      - Not all of the fields in a nested attribute are used
    Then:
      - Substitute the nested field references with alias attributes
      - Add grandchild Projects transforming the nested fields to aliases

    Example 1: Project
    ------------------
    Before:
    +- Project [concat_ws(s#0.a, s#0.b) AS concat_ws(s.a, s.b)#1]
       +- GlobalLimit 5
          +- LocalLimit 5
             +- LocalRelation <empty>, [s#0]
    After:
    +- Project [concat_ws(_extract_a#2, _extract_b#3) AS concat_ws(s.a, s.b)#1]
       +- GlobalLimit 5
          +- LocalLimit 5
             +- Project [s#0.a AS _extract_a#2, s#0.b AS _extract_b#3]
                +- LocalRelation <empty>, [s#0]

    Example 2: Project above Filter
    -------------------------------
    Before:
    +- Project [s#0.a AS s.a#1]
       +- Filter (length(s#0.b) > 2)
          +- GlobalLimit 5
             +- LocalLimit 5
                +- LocalRelation <empty>, [s#0]
    After:
    +- Project [_extract_a#2 AS s.a#1]
       +- Filter (length(_extract_b#3) > 2)
          +- GlobalLimit 5
             +- LocalLimit 5
                +- Project [s#0.a AS _extract_a#2, s#0.b AS _extract_b#3]
                   +- LocalRelation <empty>, [s#0]

    Example 3: Nested fields with referenced parents
    ------------------------------------------------
    Before:
    +- Project [s#0.a AS s.a#1, s#0.a.a1 AS s.a.a1#2]
       +- GlobalLimit 5
          +- LocalLimit 5
             +- LocalRelation <empty>, [s#0]
    After:
    +- Project [_extract_a#3 AS s.a#1, _extract_a#3.name AS s.a.a1#2]
       +- GlobalLimit 5
          +- LocalLimit 5
             +- Project [s#0.a AS _extract_a#3]
                +- LocalRelation <empty>, [s#0]

    The schema of the datasource relation will be pruned in the SchemaPruning optimizer rule.

  40. object NormalizeFloatingNumbers extends Rule[LogicalPlan]

    We need to take care of special floating numbers (NaN and -0.0) in several places:

    1. When comparing values, different NaNs should be treated as the same, and -0.0 and 0.0 should be treated as the same.
    2. In aggregate grouping keys, different NaNs should belong to the same group, and -0.0 and 0.0 should belong to the same group.
    3. In join keys, different NaNs should be treated as the same, and -0.0 and 0.0 should be treated as the same.
    4. In window partition keys, different NaNs should belong to the same partition, and -0.0 and 0.0 should belong to the same partition.

    Case 1 is fine, as we handle NaN and -0.0 well during comparison. For complex types, we recursively compare the fields/elements, so it's also fine.

    Case 2, 3 and 4 are problematic, as Spark SQL turns grouping/join/window partition keys into binary UnsafeRow and compare the binary data directly. Different NaNs have different binary representation, and the same thing happens for -0.0 and 0.0.

    This rule normalizes NaN and -0.0 in window partition keys, join keys and aggregate grouping keys.

    Ideally we should do the normalization in the physical operators that compare the binary UnsafeRow directly. We don't need this normalization if the Spark SQL execution engine is not optimized to run on binary data. This rule is created to simplify the implementation, so that we have a single place to do normalization, which is more maintainable.

    Note that this rule must be executed at the end of the optimizer, because the optimizer may create new joins (the subquery rewrite) and new join conditions (the join reorder).
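
    The underlying problem and the fix can be sketched with plain JVM doubles: numerically equal values can carry different bit patterns, and normalization maps them to one canonical form (the normalize function below is an illustrative sketch, not the rule's code):

```scala
// Equal doubles can have different binary representations; normalize maps
// every NaN to the canonical NaN and -0.0 to 0.0.
import java.lang.Double.{doubleToRawLongBits, longBitsToDouble}

def normalize(d: Double): Double =
  if (d.isNaN) Double.NaN        // canonical NaN bits
  else if (d == 0.0d) 0.0d       // folds -0.0 into 0.0 (== treats them as equal)
  else d

// A NaN with non-canonical bits: same value semantically, different binary form.
val oddNaN = longBitsToDouble(0x7ff8000000000001L)
```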

  41. object NullPropagation extends Rule[LogicalPlan]

    Replaces Expressions that can be statically evaluated with equivalent Literal values. This rule is more specific with Null value propagation from bottom to top of the expression tree.

  42. object ObjectSerializerPruning extends Rule[LogicalPlan]

    Prunes unnecessary object serializers from the query plan. This rule prunes both individual serializers and nested fields in serializers.

  43. object OptimizeCsvJsonExprs extends Rule[LogicalPlan]

    Simplify redundant csv/json related expressions.

    The optimization includes:
    1. JsonToStructs(StructsToJson(child)) => child.
    2. Prune unnecessary columns from GetStructField/GetArrayStructFields + JsonToStructs.
    3. CreateNamedStruct(JsonToStructs(json).col1, JsonToStructs(json).col2, ...) => If(IsNull(json), nullStruct, KnownNotNull(JsonToStructs(prunedSchema, ..., json))) if JsonToStructs(json) is shared among all fields of CreateNamedStruct. prunedSchema contains all accessed fields in the original CreateNamedStruct.
    4. Prune unnecessary columns from GetStructField + CsvToStructs.

  44. object OptimizeIn extends Rule[LogicalPlan]

    Optimize IN predicates:
    1. Converts the predicate to false when the list is empty and the value is not nullable.
    2. Removes literal repetitions.
    3. Replaces (value, seq[Literal]) with an optimized version (value, HashSet[Literal]) which is much faster.
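
    The three cases can be sketched as follows (a toy version over Int values; names are illustrative, not Catalyst's):

```scala
// IN optimization sketch: empty list on a non-nullable value => constant false,
// repetitions removed, and a HashSet for O(1) membership instead of a scan.
def optimizeIn(nullable: Boolean, list: Seq[Int]): Int => Boolean = {
  val set = scala.collection.immutable.HashSet.from(list) // dedupes repetitions
  if (set.isEmpty && !nullable) _ => false                // IN () is always false
  else v => set.contains(v)
}
```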

  45. object OptimizeLimitZero extends Rule[LogicalPlan]

    Replaces GlobalLimit 0 and LocalLimit 0 nodes (subtree) with an empty LocalRelation, as they don't return any rows.

  46. object OptimizeOneRowRelationSubquery extends Rule[LogicalPlan]

    This rule optimizes subqueries with OneRowRelation as leaf nodes.

  47. object OptimizeRepartition extends Rule[LogicalPlan]

    Replaces the numPartitions of a RepartitionByExpression with 1 if all partition expressions are foldable and the user did not specify a value.

  48. object OptimizeUpdateFields extends Rule[LogicalPlan]

    Optimizes UpdateFields expression chains.

  49. object OptimizeWindowFunctions extends Rule[LogicalPlan]

    Replaces first(col) with nth_value(col, 1) for better performance.

  50. object PropagateEmptyRelation extends PropagateEmptyRelationBase

    This rule runs in the normal optimizer and optimizes more cases compared to PropagateEmptyRelationBase:
    1. Higher-node Logical Plans
      • Union with all empty children.
    2. Unary-node Logical Plans
      • Project/Filter/Sample with all empty children.

    The reason why we don't apply this rule at AQE optimizer side is: the benefit is not big enough and it may introduce extra exchanges.

  51. object PruneFilters extends Rule[LogicalPlan] with PredicateHelper

    Removes filters that can be evaluated trivially. This can be done through the following ways:
    1) by eliding the filter for cases where it will always evaluate to true
    2) by substituting a dummy empty relation when the filter will always evaluate to false
    3) by eliminating the always-true conditions given the constraints on the child's output

  52. object PullOutGroupingExpressions extends Rule[LogicalPlan]

    This rule ensures that Aggregate nodes don't contain complex grouping expressions in the optimization phase.

    Complex grouping expressions are pulled out to a Project node under Aggregate and are referenced in both grouping expressions and aggregate expressions without aggregate functions. These references ensure that optimization rules don't change the aggregate expressions to invalid ones that no longer refer to any grouping expressions and also simplify the expression transformations on the node (need to transform the expression only once).

    For example, in the following query Spark shouldn't optimize the aggregate expression Not(IsNull(c)) to IsNotNull(c) as the grouping expression is IsNull(c):

    SELECT not(c IS NULL) FROM t GROUP BY c IS NULL

    Instead, the aggregate expression references a _groupingexpression attribute:

    Aggregate [_groupingexpression#233], [NOT _groupingexpression#233 AS (NOT (c IS NULL))#230]
    +- Project [isnull(c#219) AS _groupingexpression#233]
       +- LocalRelation [c#219]

  53. object PullupCorrelatedPredicates extends Rule[LogicalPlan] with PredicateHelper

    Pull out all (outer) correlated predicates from a given subquery. This method removes the correlated predicates from subquery Filters and adds the references of these predicates to all intermediate Project and Aggregate clauses (if they are missing) in order to be able to evaluate the predicates at the top level.

    TODO: Look to merge this rule with RewritePredicateSubquery.

  54. object PushDownLeftSemiAntiJoin extends Rule[LogicalPlan] with PredicateHelper with JoinSelectionHelper

    This rule is a variant of PushPredicateThroughNonJoin which can handle pushing down Left semi and Left Anti joins below the following operators:
    1) Project
    2) Window
    3) Union
    4) Aggregate
    5) Other permissible unary operators; please see PushPredicateThroughNonJoin.canPushThrough.

  55. object PushDownPredicates extends Rule[LogicalPlan] with PredicateHelper

    The unified version for predicate pushdown of normal operators and joins. This rule improves performance of predicate pushdown for cascading joins such as: Filter-Join-Join-Join. Most predicates can be pushed down in a single pass.

  56. object PushExtraPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper

    Try pushing down disjunctive join condition into left and right child. To avoid expanding the join condition, the join condition will be kept in the original form even when predicate pushdown happens.

  57. object PushFoldableIntoBranches extends Rule[LogicalPlan] with PredicateHelper

    Push the foldable expression into (if / case) branches.

  58. object PushLeftSemiLeftAntiThroughJoin extends Rule[LogicalPlan] with PredicateHelper

    This rule is a variant of PushPredicateThroughJoin which can handle pushing down Left semi and Left Anti joins below a join operator. The allowable join types are: 1) Inner 2) Cross 3) LeftOuter 4) RightOuter

    TODO: Currently this rule can push down the left semi or left anti joins to either left or right leg of the child join. This matches the behaviour of PushPredicateThroughJoin when the left semi or left anti join is in expression form. We need to explore the possibility to push the left semi/anti joins to both legs of join if the join condition refers to both left and right legs of the child join.

  59. object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper

    Pushes down Filter operators where the condition can be evaluated using only the attributes of the left or right side of a join.

    Pushes down Filter operators where the condition can be evaluated using only the attributes of the left or right side of a join. Other Filter conditions are moved into the condition of the Join.

    It also pushes down the join filter, where the condition can be evaluated using only the attributes of the left or right side of the subquery when applicable.

    Check https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior for more details.

  60. object PushPredicateThroughNonJoin extends Rule[LogicalPlan] with PredicateHelper

    Pushes Filter operators through many operators iff: 1) the operator is deterministic 2) the predicate is deterministic and the operator will not change any rows.

    This heuristic is valid assuming the expression evaluation cost is minimal.

  61. object PushProjectionThroughUnion extends Rule[LogicalPlan] with PredicateHelper

    Pushes Project operator to both sides of a Union operator.

    Pushes Project operator to both sides of a Union operator. Operations that are safe to pushdown are listed as follows. Union: Right now, Union means UNION ALL, which does not de-duplicate rows. So, it is safe to pushdown Filters and Projections through it. Filter pushdown is handled by another rule PushDownPredicates. Once we add UNION DISTINCT, we will not be able to pushdown Projections.

  62. object ReassignLambdaVariableID extends Rule[LogicalPlan]

    Reassigns per-query unique IDs to LambdaVariables, whose original IDs are globally unique.

    Reassigns per-query unique IDs to LambdaVariables, whose original IDs are globally unique. This can help Spark to hit codegen cache more often and improve performance.

  63. object RemoveDispensableExpressions extends Rule[LogicalPlan]

    Removes nodes that are not necessary.

  64. object RemoveLiteralFromGroupExpressions extends Rule[LogicalPlan]

    Removes literals from group expressions in Aggregate, as they have no effect on the result and only make the grouping key bigger.

  65. object RemoveNoopOperators extends Rule[LogicalPlan]

    Removes no-op operators that do not make any modifications to the query plan.

  66. object RemoveNoopUnion extends Rule[LogicalPlan]

    Simplifies the children of a Union, or removes a no-op Union that does not make any modifications to the query.

  67. object RemoveRedundantAggregates extends Rule[LogicalPlan] with AliasHelper

    Remove redundant aggregates from a query plan.

    Remove redundant aggregates from a query plan. A redundant aggregate is an aggregate whose only goal is to keep distinct values, while its parent aggregate would ignore duplicate values.

  68. object RemoveRedundantAliases extends Rule[LogicalPlan]

    Remove redundant aliases from a query plan.

    Remove redundant aliases from a query plan. A redundant alias is an alias that does not change the name or metadata of a column, and does not deduplicate it.

  69. object RemoveRepetitionFromGroupExpressions extends Rule[LogicalPlan]

    Removes repetition from group expressions in Aggregate, as duplicates have no effect on the result and only make the grouping key bigger.

  70. object ReorderAssociativeOperator extends Rule[LogicalPlan]

    Reorder associative integral-type operators and fold all constants into one.
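    The reorder-and-fold step can be sketched with toy expression types (illustrative, not Spark's implementation): flatten the associative Add tree, sum all integer literals, and rebuild with a single folded constant.

```scala
// Toy expression tree for integral addition.
sealed trait E
case class Lit(v: Long) extends E
case class Col(name: String) extends E
case class Add(l: E, r: E) extends E

// Flatten nested Adds into a flat operand list.
def flatten(e: E): Seq[E] = e match {
  case Add(l, r) => flatten(l) ++ flatten(r)
  case other     => Seq(other)
}

// Fold all literal operands into one constant, e.g. (a + 1) + 2 ==> a + 3.
def foldConstants(e: E): E = {
  val (lits, rest) = flatten(e).partition(_.isInstanceOf[Lit])
  val sum = lits.collect { case Lit(v) => v }.sum
  val base: Option[E] = rest.reduceOption((a, b) => Add(a, b))
  (base, lits) match {
    case (None, _)      => Lit(sum)        // all operands were literals
    case (Some(b), Nil) => b               // nothing to fold
    case (Some(b), _)   => Add(b, Lit(sum))
  }
}
```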

  71. object ReorderJoin extends Rule[LogicalPlan] with PredicateHelper

    Reorder the joins and push all the conditions into join, so that the bottom ones have at least one condition.

    The order of joins will not be changed if all of them already have at least one condition.

    If star schema detection is enabled, reorder the star join plans based on heuristics.

  72. object ReplaceDeduplicateWithAggregate extends Rule[LogicalPlan]

    Replaces logical Deduplicate operator with an Aggregate operator.

  73. object ReplaceDistinctWithAggregate extends Rule[LogicalPlan]

    Replaces logical Distinct operator with an Aggregate operator.

    SELECT DISTINCT f1, f2 FROM t  ==>  SELECT f1, f2 FROM t GROUP BY f1, f2
  74. object ReplaceExceptWithAntiJoin extends Rule[LogicalPlan]

    Replaces logical Except operator with a left-anti Join operator.

    SELECT a1, a2 FROM Tab1 EXCEPT SELECT b1, b2 FROM Tab2
    ==>  SELECT DISTINCT a1, a2 FROM Tab1 LEFT ANTI JOIN Tab2 ON a1<=>b1 AND a2<=>b2

    Note: 1. This rule is only applicable to EXCEPT DISTINCT. Do not use it for EXCEPT ALL. 2. This rule has to be done after de-duplicating the attributes; otherwise, the generated join conditions will be incorrect.

  75. object ReplaceExceptWithFilter extends Rule[LogicalPlan]

    If one or both of the datasets in the logical Except operator are purely transformed using Filter, this rule will replace logical Except operator with a Filter operator by flipping the filter condition of the right child.

    If one or both of the datasets in the logical Except operator are purely transformed using Filter, this rule will replace logical Except operator with a Filter operator by flipping the filter condition of the right child.

    SELECT a1, a2 FROM Tab1 WHERE a2 = 12 EXCEPT SELECT a1, a2 FROM Tab1 WHERE a1 = 5
    ==>  SELECT DISTINCT a1, a2 FROM Tab1 WHERE a2 = 12 AND (a1 is null OR a1 <> 5)

    Note: before flipping the filter condition of the right node, we should: 1. combine all of its Filters; 2. update the attribute references to the left node; 3. add a Coalesce(condition, False) to take NULL values in the condition into account.

  76. object ReplaceExpressions extends Rule[LogicalPlan]

    Finds all the expressions that are unevaluable and replace/rewrite them with semantically equivalent expressions that can be evaluated.

    Finds all the expressions that are unevaluable and replace/rewrite them with semantically equivalent expressions that can be evaluated. Currently we replace two kinds of expressions: 1) RuntimeReplaceable expressions; 2) UnevaluableAggregate expressions such as Every, Some, Any, CountIf. This is mainly used to provide compatibility with other databases. A few examples: we use this to support "nvl" by replacing it with "coalesce", and we use it to replace Every and Any with Min and Max respectively.

    TODO: In future, explore an option to replace aggregate functions similar to how RuntimeReplaceable does.
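    The substitution style can be sketched over toy expression types (illustrative only; Spark does this via RuntimeReplaceable expression classes, not a string match): "nvl" is rewritten to the semantically equivalent "coalesce" everywhere in the tree.

```scala
// Toy expression tree; names are illustrative, not Spark's classes.
sealed trait Ex
case class Fn(name: String, args: List[Ex]) extends Ex
case class Ref(name: String) extends Ex

// Recursively replace unevaluable functions with evaluable equivalents.
def replace(e: Ex): Ex = e match {
  case Fn("nvl", args) => Fn("coalesce", args.map(replace)) // nvl(a, b) => coalesce(a, b)
  case Fn(n, args)     => Fn(n, args.map(replace))          // recurse into children
  case r: Ref          => r
}
```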

  77. object ReplaceIntersectWithSemiJoin extends Rule[LogicalPlan]

    Replaces logical Intersect operator with a left-semi Join operator.

    SELECT a1, a2 FROM Tab1 INTERSECT SELECT b1, b2 FROM Tab2
    ==>  SELECT DISTINCT a1, a2 FROM Tab1 LEFT SEMI JOIN Tab2 ON a1<=>b1 AND a2<=>b2

    Note: 1. This rule is only applicable to INTERSECT DISTINCT. Do not use it for INTERSECT ALL. 2. This rule has to be done after de-duplicating the attributes; otherwise, the generated join conditions will be incorrect.

  78. object ReplaceNullWithFalseInPredicate extends Rule[LogicalPlan]

    A rule that replaces Literal(null, BooleanType) with FalseLiteral, if possible, in the search condition of the WHERE/HAVING/ON(JOIN) clauses, which contain an implicit Boolean operator "(search condition) = TRUE".

    A rule that replaces Literal(null, BooleanType) with FalseLiteral, if possible, in the search condition of the WHERE/HAVING/ON(JOIN) clauses, which contain an implicit Boolean operator "(search condition) = TRUE". The replacement is only valid when Literal(null, BooleanType) is semantically equivalent to FalseLiteral when evaluating the whole search condition.

    Please note that FALSE and NULL are not exchangeable in most cases, when the search condition contains NOT and NULL-tolerant expressions. Thus, the rule is very conservative and applicable in very limited cases.

    For example, Filter(Literal(null, BooleanType)) is equal to Filter(FalseLiteral).

    Another example containing branches is Filter(If(cond, FalseLiteral, Literal(null, _))); this can be optimized to Filter(If(cond, FalseLiteral, FalseLiteral)), and eventually Filter(FalseLiteral).

    Moreover, this rule also transforms predicates in all If expressions as well as branch conditions in all CaseWhen expressions, even if they are not part of the search conditions.

    For example, Project(If(And(cond, Literal(null)), Literal(1), Literal(2))) can be simplified into Project(Literal(2)).
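    The conservative replacement can be sketched over toy predicate types (not Spark's expressions): a null boolean literal is replaced by false only in positions where the two are interchangeable under "(condition) = TRUE" semantics, here And legs and If branches.

```scala
// Toy three-valued predicate tree.
sealed trait P
case object NullLit extends P
case object FalseLit extends P
case object TrueLit extends P
case class AndP(l: P, r: P) extends P
case class If3(cond: P, t: P, f: P) extends P
case class Pred(name: String) extends P

// Replace null with false in null-safe positions only; deliberately does
// not descend into NOT or other NULL-tolerant constructs.
def nullToFalse(p: P): P = p match {
  case NullLit      => FalseLit
  case AndP(l, r)   => AndP(nullToFalse(l), nullToFalse(r))
  case If3(c, t, f) => If3(nullToFalse(c), nullToFalse(t), nullToFalse(f))
  case other        => other
}
```

    Applied to the example above, If(cond, FalseLiteral, Literal(null)) becomes If(cond, FalseLiteral, FalseLiteral), which later simplification can collapse to FalseLiteral.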

  79. object ReplaceUpdateFieldsExpression extends Rule[LogicalPlan]

    Replaces UpdateFields expression with an evaluable expression.

  80. object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] with AliasHelper

    This rule rewrites correlated ScalarSubquery expressions into LEFT OUTER joins.

  81. object RewriteDistinctAggregates extends Rule[LogicalPlan]

    This rule rewrites an aggregate query with distinct aggregations into an expanded double aggregation in which the regular aggregation expressions and every distinct clause is aggregated in a separate group.

    This rule rewrites an aggregate query with distinct aggregations into an expanded double aggregation in which the regular aggregation expressions and every distinct clause is aggregated in a separate group. The results are then combined in a second aggregate.

    First example: query without filter clauses (in scala):

    val data = Seq(
      ("a", "ca1", "cb1", 10),
      ("a", "ca1", "cb2", 5),
      ("b", "ca1", "cb1", 13))
      .toDF("key", "cat1", "cat2", "value")
    data.createOrReplaceTempView("data")
    
    val agg = data.groupBy($"key")
      .agg(
        count_distinct($"cat1").as("cat1_cnt"),
        count_distinct($"cat2").as("cat2_cnt"),
        sum($"value").as("total"))

    This translates to the following (pseudo) logical plan:

    Aggregate(
       key = ['key]
       functions = [COUNT(DISTINCT 'cat1),
                    COUNT(DISTINCT 'cat2),
                    sum('value)]
       output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
      LocalTableScan [...]

    This rule rewrites this logical plan to the following (pseudo) logical plan:

    Aggregate(
       key = ['key]
       functions = [count('cat1) FILTER (WHERE 'gid = 1),
                    count('cat2) FILTER (WHERE 'gid = 2),
                    first('total) ignore nulls FILTER (WHERE 'gid = 0)]
       output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
      Aggregate(
         key = ['key, 'cat1, 'cat2, 'gid]
         functions = [sum('value)]
         output = ['key, 'cat1, 'cat2, 'gid, 'total])
        Expand(
           projections = [('key, null, null, 0, cast('value as bigint)),
                          ('key, 'cat1, null, 1, null),
                          ('key, null, 'cat2, 2, null)]
           output = ['key, 'cat1, 'cat2, 'gid, 'value])
          LocalTableScan [...]

    Second example: aggregate function without distinct and with filter clauses (in sql):

    SELECT
      COUNT(DISTINCT cat1) as cat1_cnt,
      COUNT(DISTINCT cat2) as cat2_cnt,
      SUM(value) FILTER (WHERE id > 1) AS total
    FROM
      data
    GROUP BY
      key

    This translates to the following (pseudo) logical plan:

    Aggregate(
       key = ['key]
       functions = [COUNT(DISTINCT 'cat1),
                    COUNT(DISTINCT 'cat2),
                    sum('value) FILTER (WHERE 'id > 1)]
       output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
      LocalTableScan [...]

    This rule rewrites this logical plan to the following (pseudo) logical plan:

    Aggregate(
       key = ['key]
       functions = [count('cat1) FILTER (WHERE 'gid = 1),
                    count('cat2) FILTER (WHERE 'gid = 2),
                    first('total) ignore nulls FILTER (WHERE 'gid = 0)]
       output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
      Aggregate(
         key = ['key, 'cat1, 'cat2, 'gid]
         functions = [sum('value) FILTER (WHERE 'id > 1)]
         output = ['key, 'cat1, 'cat2, 'gid, 'total])
        Expand(
           projections = [('key, null, null, 0, cast('value as bigint), 'id),
                          ('key, 'cat1, null, 1, null, null),
                          ('key, null, 'cat2, 2, null, null)]
           output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id])
          LocalTableScan [...]

    Third example: aggregate function with distinct and filter clauses (in sql):

    SELECT
      COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
      COUNT(DISTINCT cat2) FILTER (WHERE id > 2) as cat2_cnt,
      SUM(value) FILTER (WHERE id > 3) AS total
    FROM
      data
    GROUP BY
      key

    This translates to the following (pseudo) logical plan:

    Aggregate(
       key = ['key]
       functions = [COUNT(DISTINCT 'cat1) FILTER (WHERE 'id > 1),
                    COUNT(DISTINCT 'cat2) FILTER (WHERE 'id > 2),
                    sum('value) FILTER (WHERE 'id > 3)]
       output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
      LocalTableScan [...]

    This rule rewrites this logical plan to the following (pseudo) logical plan:

    Aggregate(
       key = ['key]
       functions = [count('cat1) FILTER (WHERE 'gid = 1 and 'max_cond1),
                    count('cat2) FILTER (WHERE 'gid = 2 and 'max_cond2),
                    first('total) ignore nulls FILTER (WHERE 'gid = 0)]
       output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
      Aggregate(
         key = ['key, 'cat1, 'cat2, 'gid]
         functions = [max('cond1), max('cond2), sum('value) FILTER (WHERE 'id > 3)]
         output = ['key, 'cat1, 'cat2, 'gid, 'max_cond1, 'max_cond2, 'total])
        Expand(
           projections = [('key, null, null, 0, null, null, cast('value as bigint), 'id),
                          ('key, 'cat1, null, 1, 'id > 1, null, null, null),
                          ('key, null, 'cat2, 2, null, 'id > 2, null, null)]
           output = ['key, 'cat1, 'cat2, 'gid, 'cond1, 'cond2, 'value, 'id])
          LocalTableScan [...]

    The rule does the following things here:

    1. Expand the data. There are three aggregation groups in this query: i. the non-distinct group; ii. the distinct 'cat1 group; iii. the distinct 'cat2 group. An Expand operator is inserted to expand the child data for each group. The expand nulls out all unused columns for the given group; this must be done in order to ensure correctness later on. Groups can be identified by a group id (gid) column added by the expand operator. If a distinct group has a filter clause, the expand also evaluates the filter and outputs its result (e.g. cond1), which is later used to compute the global conditions (e.g. max_cond1) equivalent to the filter clauses.
    2. De-duplicate the distinct paths and aggregate the non-distinct path. The group by clause of this aggregate consists of the original group by clause, all the requested distinct columns and the group id. Both the de-duplication of the distinct columns and the aggregation of the non-distinct group take advantage of the fact that we group by the group id (gid) and that we have nulled out all non-relevant columns for the given group. If a distinct group has a filter clause, we use max to aggregate the filter results (e.g. cond1) from the previous step; these aggregates output the global conditions (e.g. max_cond1) equivalent to the filter clauses.
    3. Aggregate the distinct groups and combine this with the results of the non-distinct aggregation. In this step we use the group id and the global conditions to filter the inputs of the aggregate functions. If a global condition (e.g. max_cond1) is true, at least one row of the distinct value satisfies the filter, so that distinct value should be included in the aggregate function. The results of the non-distinct group are 'aggregated' using the first operator; it might be more elegant to use the native UDAF merge mechanism for this in the future.

    This rule duplicates the input data two or more times (# distinct groups + an optional non-distinct group). This puts quite a bit of memory pressure on the aggregate and exchange operators involved. Keeping the number of distinct groups as low as possible should be a priority; we could improve this rule by applying more advanced expression canonicalization techniques.
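    The Expand + double-aggregation strategy can be simulated with plain Scala collections on the first example's data (a toy model, not Spark code): each input row fans out into one row per aggregation group tagged with a gid, irrelevant columns are nulled out, and the distinct counts then fall out of counting first-level groups per gid.

```scala
case class Row(key: String, cat1: String, cat2: String, value: Int)
val data = Seq(
  Row("a", "ca1", "cb1", 10), Row("a", "ca1", "cb2", 5), Row("b", "ca1", "cb1", 13))

// Expand: one output row per group, as (key, cat1, cat2, gid, value).
val expanded = data.flatMap { r =>
  Seq((r.key, null: String, null: String, 0, Some(r.value)), // gid 0: non-distinct group
      (r.key, r.cat1,       null: String, 1, None),          // gid 1: distinct cat1 group
      (r.key, null: String, r.cat2,       2, None))          // gid 2: distinct cat2 group
}

// First aggregate: group by (key, cat1, cat2, gid). Distinct values are
// de-duplicated here; the non-distinct group (gid 0) sums the values.
val firstAgg = expanded.groupBy(t => (t._1, t._2, t._3, t._4))
  .map { case (k, rows) => (k, rows.flatMap(_._5).sum) }

// Second aggregate: per key, a distinct count is a plain count of the
// first-level groups with the matching gid.
val result = firstAgg.groupBy(_._1._1).map { case (key, rows) =>
  val cat1Cnt = rows.count(_._1._4 == 1)
  val cat2Cnt = rows.count(_._1._4 == 2)
  val total   = rows.collect { case (k, s) if k._4 == 0 => s }.sum
  key -> ((cat1Cnt, cat2Cnt, total))
}
```

    For key "a" this yields cat1_cnt = 1, cat2_cnt = 2, total = 15, matching the original aggregate.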

  82. object RewriteExceptAll extends Rule[LogicalPlan]

    Replaces logical Except operator using a combination of Union, Aggregate and Generate operator.

    Input Query :

    SELECT c1 FROM ut1 EXCEPT ALL SELECT c1 FROM ut2

    Rewritten Query:

    SELECT c1
    FROM (
      SELECT replicate_rows(sum_val, c1)
        FROM (
          SELECT c1, sum_val
            FROM (
              SELECT c1, sum(vcol) AS sum_val
                FROM (
                  SELECT 1L as vcol, c1 FROM ut1
                  UNION ALL
                  SELECT -1L as vcol, c1 FROM ut2
               ) AS union_all
             GROUP BY union_all.c1
           )
         WHERE sum_val > 0
        )
    )
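
    The rewritten query's sum-of-signed-counts idea can be mirrored with plain collections (an illustrative sketch, not Spark code): tag left rows +1 and right rows -1, sum per value, and replicate each value sum_val times when the sum is positive.

```scala
// EXCEPT ALL via signed counts, mirroring the rewritten query above.
def exceptAll[A](left: Seq[A], right: Seq[A]): Seq[A] = {
  val tagged = left.map(v => (v, 1L)) ++ right.map(v => (v, -1L)) // vcol
  tagged.groupBy(_._1).toSeq.flatMap { case (v, vs) =>
    val sumVal = vs.map(_._2).sum                 // sum(vcol) per value
    if (sumVal > 0) Seq.fill(sumVal.toInt)(v)     // replicate_rows(sum_val, v)
    else Seq.empty
  }
}
```
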
  83. object RewriteIntersectAll extends Rule[LogicalPlan]

    Replaces logical Intersect operator using a combination of Union, Aggregate and Generate operator.

    Input Query :

    SELECT c1 FROM ut1 INTERSECT ALL SELECT c1 FROM ut2

    Rewritten Query:

    SELECT c1
    FROM (
         SELECT replicate_row(min_count, c1)
         FROM (
              SELECT c1, If (vcol1_cnt > vcol2_cnt, vcol2_cnt, vcol1_cnt) AS min_count
              FROM (
                   SELECT   c1, count(vcol1) as vcol1_cnt, count(vcol2) as vcol2_cnt
                   FROM (
                         SELECT true as vcol1, null as vcol2, c1 FROM ut1
                        UNION ALL
                        SELECT null as vcol1, true as vcol2, c1 FROM ut2
                        ) AS union_all
                   GROUP BY c1
                   HAVING vcol1_cnt >= 1 AND vcol2_cnt >= 1
                   )
              )
          )
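
    Likewise, the min-count idea behind the INTERSECT ALL rewrite can be mirrored with plain collections (an illustrative sketch, not Spark code): count each value's occurrences on both sides and replicate it min(cnt1, cnt2) times.

```scala
// INTERSECT ALL via per-side counts, mirroring the rewritten query above.
def intersectAll[A](left: Seq[A], right: Seq[A]): Seq[A] = {
  val lCnt = left.groupBy(identity).map { case (v, vs) => v -> vs.size }  // vcol1_cnt
  val rCnt = right.groupBy(identity).map { case (v, vs) => v -> vs.size } // vcol2_cnt
  lCnt.toSeq.flatMap { case (v, n) =>
    val m = rCnt.getOrElse(v, 0)
    Seq.fill(math.min(n, m))(v) // replicate_row(min_count, v)
  }
}
```
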
  84. object RewriteLateralSubquery extends Rule[LogicalPlan]

    This rule rewrites LateralSubquery expressions into joins.

  85. object RewriteNonCorrelatedExists extends Rule[LogicalPlan]

    Rewrites a non-correlated EXISTS subquery to use ScalarSubquery.

    WHERE EXISTS (SELECT A FROM TABLE B WHERE COL1 > 10)

    will be rewritten to

    WHERE (SELECT 1 FROM (SELECT A FROM TABLE B WHERE COL1 > 10) LIMIT 1) IS NOT NULL

  86. object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper

    This rule rewrites predicate sub-queries into left semi/anti joins.

    This rule rewrites predicate sub-queries into left semi/anti joins. The following predicates are supported: a. EXISTS/NOT EXISTS will be rewritten as semi/anti join, unresolved conditions in Filter will be pulled out as the join conditions. b. IN/NOT IN will be rewritten as semi/anti join, unresolved conditions in the Filter will be pulled out as join conditions, value = selected column will also be used as join condition.

  87. object SimpleTestOptimizer extends SimpleTestOptimizer

    An optimizer used in test code.

    To ensure extendability, we leave the standard rules in the abstract optimizer, while specific rules go to the subclasses.

  88. object SimplifyBinaryComparison extends Rule[LogicalPlan] with PredicateHelper with ConstraintHelper

    Simplifies binary comparisons with semantically-equal expressions: 1) Replace '<=>' with 'true' literal.

    Simplifies binary comparisons with semantically-equal expressions: 1) Replace '<=>' with 'true' literal. 2) Replace '=', '<=', and '>=' with 'true' literal if both operands are non-nullable. 3) Replace '<' and '>' with 'false' literal if both operands are non-nullable.

  89. object SimplifyCaseConversionExpressions extends Rule[LogicalPlan]

    Removes the inner case conversion expressions that are unnecessary because the inner conversion is overwritten by the outer one.

  90. object SimplifyCasts extends Rule[LogicalPlan]

    Removes Casts that are unnecessary because the input is already the correct type.

  91. object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper

    Simplifies conditional expressions (if / case).

  92. object SimplifyConditionalsInPredicate extends Rule[LogicalPlan]

    A rule that converts conditional expressions to predicate expressions, if possible, in the search condition of the WHERE/HAVING/ON(JOIN) clauses, which contain an implicit Boolean operator "(search condition) = TRUE".

    A rule that converts conditional expressions to predicate expressions, if possible, in the search condition of the WHERE/HAVING/ON(JOIN) clauses, which contain an implicit Boolean operator "(search condition) = TRUE". After this converting, we can potentially push the filter down to the data source. This rule is null-safe.

    Supported cases are:

    • IF(cond, trueVal, false) => AND(cond, trueVal)
    • IF(cond, trueVal, true) => OR(NOT(cond), trueVal)
    • IF(cond, false, falseVal) => AND(NOT(cond), falseVal)
    • IF(cond, true, falseVal) => OR(cond, falseVal)
    • CASE WHEN cond THEN trueVal ELSE false END => AND(cond, trueVal)
    • CASE WHEN cond THEN trueVal END => AND(cond, trueVal)
    • CASE WHEN cond THEN trueVal ELSE null END => AND(cond, trueVal)
    • CASE WHEN cond THEN trueVal ELSE true END => OR(NOT(cond), trueVal)
    • CASE WHEN cond THEN false ELSE elseVal END => AND(NOT(cond), elseVal)
    • CASE WHEN cond THEN true ELSE elseVal END => OR(cond, elseVal)
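    The If cases can be sketched as a small pattern rewrite over toy boolean types (illustrative, not Spark's expression classes):

```scala
// Toy boolean expression tree.
sealed trait B
case object T extends B
case object F extends B
case class V(name: String) extends B   // an arbitrary boolean column/expr
case class Not(b: B) extends B
case class AndB(l: B, r: B) extends B
case class OrB(l: B, r: B) extends B
case class IfB(c: B, t: B, e: B) extends B

// Convert a conditional into an equivalent predicate where possible.
def toPredicate(b: B): B = b match {
  case IfB(c, t, F) => AndB(c, t)     // IF(cond, trueVal, false) => AND(cond, trueVal)
  case IfB(c, t, T) => OrB(Not(c), t) // IF(cond, trueVal, true)  => OR(NOT(cond), trueVal)
  case IfB(c, F, e) => AndB(Not(c), e)// IF(cond, false, falseVal)=> AND(NOT(cond), falseVal)
  case IfB(c, T, e) => OrB(c, e)      // IF(cond, true, falseVal) => OR(cond, falseVal)
  case other        => other
}
```

    The resulting AND/OR predicates can then be pushed down to the data source, which the original IF could not.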

  93. object SimplifyExtractValueOps extends Rule[LogicalPlan]

    Simplify redundant CreateNamedStruct, CreateArray and CreateMap expressions.

  94. object SpecialDatetimeValues extends Rule[LogicalPlan]

    Replaces casts of special datetime strings with their date/timestamp values if the input strings are foldable.

  95. object StarSchemaDetection extends PredicateHelper with SQLConfHelper

    Encapsulates star-schema detection logic.

  96. object TransposeWindow extends Rule[LogicalPlan]

    Transpose Adjacent Window Expressions.

    Transpose adjacent Window expressions: if the partition spec of the parent Window expression is compatible with the partition spec of the child Window expression, transpose them.

  97. object UnwrapCastInBinaryComparison extends Rule[LogicalPlan]

    Unwrap casts in binary comparison or In/InSet operations with patterns like the following:

    • BinaryComparison(Cast(fromExp, toType), Literal(value, toType))
    • BinaryComparison(Literal(value, toType), Cast(fromExp, toType))
    • In(Cast(fromExp, toType), Seq(Literal(v1, toType), Literal(v2, toType), ...))
    • InSet(Cast(fromExp, toType), Set(v1, v2, ...))

    This rule optimizes expressions with the above pattern by either replacing the cast with simpler constructs, or moving the cast from the expression side to the literal side, which enables them to be optimized away later and pushed down to data sources.

    Currently this only handles cases where:

    1. fromType (of fromExp) and toType are of numeric types (i.e., short, int, float, decimal, etc.)
    2. fromType can be safely coerced to toType without precision loss (e.g., short to int, int to long, but not long to int)

    If the above conditions are satisfied, the rule checks whether the literal value is within the range (min, max), where min and max are the minimum and maximum values of fromType, respectively. If so, we may safely cast value to fromType and are thus able to move the cast to the literal side. That is:

    cast(fromExp, toType) op value ==> fromExp op cast(value, fromType)

    Note there are some exceptions to the above: if casting from value to fromType causes rounding up or down, the above conversion will no longer be valid. Instead, the rule does the following:

    if casting value to fromType causes rounding up:

    • cast(fromExp, toType) > value ==> fromExp >= cast(value, fromType)
    • cast(fromExp, toType) >= value ==> fromExp >= cast(value, fromType)
    • cast(fromExp, toType) === value ==> if(isnull(fromExp), null, false)
    • cast(fromExp, toType) <=> value ==> false (if fromExp is deterministic)
    • cast(fromExp, toType) <= value ==> fromExp < cast(value, fromType)
    • cast(fromExp, toType) < value ==> fromExp < cast(value, fromType)

    Similarly for the case when casting value to fromType causes rounding down.

    If the value is not within the range (min, max), the rule breaks the scenario into different cases and tries to replace each with simpler constructs.

    if value > max, the cases are of following:

    • cast(fromExp, toType) > value ==> if(isnull(fromExp), null, false)
    • cast(fromExp, toType) >= value ==> if(isnull(fromExp), null, false)
    • cast(fromExp, toType) === value ==> if(isnull(fromExp), null, false)
    • cast(fromExp, toType) <=> value ==> false (if fromExp is deterministic)
    • cast(fromExp, toType) <= value ==> if(isnull(fromExp), null, true)
    • cast(fromExp, toType) < value ==> if(isnull(fromExp), null, true)

    if value == max, the cases are of following:

    • cast(fromExp, toType) > value ==> if(isnull(fromExp), null, false)
    • cast(fromExp, toType) >= value ==> fromExp == max
    • cast(fromExp, toType) === value ==> fromExp == max
    • cast(fromExp, toType) <=> value ==> fromExp <=> max
    • cast(fromExp, toType) <= value ==> if(isnull(fromExp), null, true)
    • cast(fromExp, toType) < value ==> fromExp =!= max

    Similarly for the cases when value == min and value < min.

    Further, the above if(isnull(fromExp), null, false) is represented using conjunction and(isnull(fromExp), null), to enable further optimization and filter pushdown to data sources. Similarly, if(isnull(fromExp), null, true) is represented with or(isnotnull(fromExp), null).

    For In/InSet operations, the rule first transforms the expression into a sequence of equality comparisons: Seq(EqualTo(Cast(fromExp, toType), Literal(v1, toType)), EqualTo(Cast(fromExp, toType), Literal(v2, toType)), ...), and then applies the same BinaryComparison rules shown above to optimize each EqualTo.
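    The core range check can be sketched for one concrete case (a hedged illustration, not Spark's implementation): an Int column cast to Long and compared with a Long literal via ">". An in-range literal lets the cast move to the literal side; an out-of-range literal collapses the comparison to a constant (modulo the null handling described above).

```scala
// Outcome of unwrapping cast(intCol as long) > longLiteral.
sealed trait Unwrapped
case class CompareWith(intLiteral: Int) extends Unwrapped   // intCol > cast(value, int)
case class ConstantResult(value: Boolean) extends Unwrapped // constant, modulo nulls

def unwrapGreaterThan(longLiteral: Long): Unwrapped =
  if (longLiteral > Int.MaxValue) ConstantResult(false)     // value > max: never greater
  else if (longLiteral < Int.MinValue) ConstantResult(true) // value < min: always greater
  else CompareWith(longLiteral.toInt)                       // in range: move the cast
```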

Ungrouped