object MergeScalarSubqueries extends Rule[LogicalPlan]
This rule tries to merge multiple non-correlated ScalarSubquery expressions so that multiple scalar values are computed once.
The process is as follows:
- While traversing the plan, the rule tries to merge each ScalarSubquery plan into the cache
of already seen subquery plans. If the merge is possible, the cache is updated with the merged
subquery plan; if not, the new subquery plan is added to the cache.
During this first traversal each ScalarSubquery expression is replaced with a temporary
ScalarSubqueryReference pointing to its cached version.
The cache uses a flag to track whether a cache entry is the result of merging 2 or more
plans or is a plan that was seen only once.
Merged plans in the cache get a "Header" that contains the list of attributes that form the scalar
return value of a merged subquery.
- A second traversal checks whether there are merged subqueries in the cache and builds a WithCTE
node from these queries. The CTERelationDef nodes contain the merged subquery in the
following form:
Project(Seq(CreateNamedStruct(name1, attribute1, ...) AS mergedValue), mergedSubqueryPlan)
and the definitions are flagged as hosting a subquery that can return at most one row.
During the second traversal ScalarSubqueryReference expressions are either transformed to a
GetStructField(ScalarSubquery(CTERelationRef(...))) expression, if they point to a merged
subquery, or restored to the original ScalarSubquery.
E.g. the following query:
SELECT (SELECT avg(a) FROM t), (SELECT sum(a) FROM t)
is optimized from:
== Optimized Logical Plan ==
Project [scalar-subquery#242 [] AS scalarsubquery()#253, scalar-subquery#243 [] AS scalarsubquery()#254L]
:  :- Aggregate [avg(a#244) AS avg(a)#247]
:  :  +- Project [a#244]
:  :     +- Relation default.t[a#244,b#245] parquet
:  +- Aggregate [sum(a#251) AS sum(a)#250L]
:     +- Project [a#251]
:        +- Relation default.t[a#251,b#252] parquet
+- OneRowRelation
to:
== Optimized Logical Plan ==
Project [scalar-subquery#242 [].avg(a) AS scalarsubquery()#253, scalar-subquery#243 [].sum(a) AS scalarsubquery()#254L]
:  :- Project [named_struct(avg(a), avg(a)#247, sum(a), sum(a)#250L) AS mergedValue#260]
:  :  +- Aggregate [avg(a#244) AS avg(a)#247, sum(a#244) AS sum(a)#250L]
:  :     +- Project [a#244]
:  :        +- Relation default.t[a#244,b#245] parquet
:  +- Project [named_struct(avg(a), avg(a)#247, sum(a), sum(a)#250L) AS mergedValue#260]
:     +- Aggregate [avg(a#244) AS avg(a)#247, sum(a#244) AS sum(a)#250L]
:        +- Project [a#244]
:           +- Relation default.t[a#244,b#245] parquet
+- OneRowRelation
== Physical Plan ==
*(1) Project [Subquery scalar-subquery#242, [id=#125].avg(a) AS scalarsubquery()#253, ReusedSubquery Subquery scalar-subquery#242, [id=#125].sum(a) AS scalarsubquery()#254L]
:  :- Subquery scalar-subquery#242, [id=#125]
:  :  +- *(2) Project [named_struct(avg(a), avg(a)#247, sum(a), sum(a)#250L) AS mergedValue#260]
:  :     +- *(2) HashAggregate(keys=[], functions=[avg(a#244), sum(a#244)], output=[avg(a)#247, sum(a)#250L])
:  :        +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#120]
:  :           +- *(1) HashAggregate(keys=[], functions=[partial_avg(a#244), partial_sum(a#244)], output=[sum#262, count#263L, sum#264L])
:  :              +- *(1) ColumnarToRow
:  :                 +- FileScan parquet default.t[a#244] ...
:  +- ReusedSubquery Subquery scalar-subquery#242, [id=#125]
+- *(1) Scan OneRowRelation[]
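The two traversals described above can be illustrated with a small toy model. This is only a sketch under strong assumptions, not Spark's implementation: ToySubquery, Entry, Ref, firstPass and secondPass are hypothetical names, a "plan" is reduced to a table name plus one aggregate, and two plans count as mergeable simply when they read the same table.

```scala
// Toy model only: every type and function here is a hypothetical stand-in,
// none of them are Spark classes.
object MergeSketch {
  // A scalar subquery reduced to the table it reads and the single aggregate it computes.
  final case class ToySubquery(table: String, aggregate: String)

  // A cache entry: the aggregates of the (possibly merged) plan and the merged flag.
  final case class Entry(table: String, aggregates: Vector[String], merged: Boolean)

  // Temporary reference left in the outer query during the first traversal:
  // the index of the cache entry and the position of the value inside it.
  final case class Ref(cacheIndex: Int, fieldIndex: Int)

  // First traversal: merge each subquery into the cache, or add it as a new entry.
  def firstPass(subqueries: Seq[ToySubquery]): (Vector[Entry], Vector[Ref]) =
    subqueries.foldLeft((Vector.empty[Entry], Vector.empty[Ref])) {
      case ((cache, refs), sq) =>
        cache.indexWhere(_.table == sq.table) match {
          case -1 =>
            // Nothing to merge with: cache the plan as seen only once.
            val entry = Entry(sq.table, Vector(sq.aggregate), merged = false)
            (cache :+ entry, refs :+ Ref(cache.size, 0))
          case i =>
            val entry = cache(i)
            val existing = entry.aggregates.indexOf(sq.aggregate)
            // Reuse an identical aggregate, otherwise append a new one.
            val aggregates =
              if (existing >= 0) entry.aggregates else entry.aggregates :+ sq.aggregate
            val field = if (existing >= 0) existing else entry.aggregates.size
            (cache.updated(i, entry.copy(aggregates = aggregates, merged = true)),
              refs :+ Ref(i, field))
        }
    }

  // Second traversal: references to merged entries become a field access on the
  // shared result, the others are restored to their original subquery.
  def secondPass(cache: Vector[Entry], refs: Vector[Ref]): Vector[String] =
    refs.map { ref =>
      val entry = cache(ref.cacheIndex)
      val agg = entry.aggregates(ref.fieldIndex)
      if (entry.merged) s"merged#${ref.cacheIndex}.$agg"
      else s"(SELECT $agg FROM ${entry.table})"
    }

  def main(args: Array[String]): Unit = {
    val subqueries = Seq(
      ToySubquery("t", "avg(a)"),
      ToySubquery("t", "sum(a)"),
      ToySubquery("u", "max(c)"))
    val (cache, refs) = firstPass(subqueries)
    secondPass(cache, refs).foreach(println)
  }
}
```

In this toy run the two subqueries over t end up sharing cache entry 0, while the subquery over u is restored unchanged because its entry was never merged.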
Type Members
- case class Header(attributes: Seq[Attribute], plan: LogicalPlan, merged: Boolean, references: Set[Int]) extends Product with Serializable
An item in the cache of merged scalar subqueries.
- attributes
Attributes that form the struct scalar return value of a merged subquery.
- plan
The plan of a merged scalar subquery.
- merged
A flag to identify if this item is the result of merging subqueries. Please note that
attributes.size == 1 doesn't always mean that the plan is not merged, as there can be subqueries that are different (checkIdenticalPlans is false) due to an extra Project node in one of them. In that case attributes.size remains 1 after merging, but the merged flag becomes true.
- references
A set of subquery indexes in the cache to track all (including transitive) nested subqueries.
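One plausible way to maintain such a transitive reference set is sketched below. It is an illustration only: ToyHeader and add are hypothetical names, and the sketch assumes that nested subqueries are always cached before the subquery that contains them, so every direct reference already carries its own transitive set.

```scala
// Hypothetical, simplified model: only the references field of a cache item is kept.
object ReferencesSketch {
  final case class ToyHeader(references: Set[Int])

  // Appends an item whose plan directly contains the subqueries at the given cache indexes.
  // Assumption of this sketch: every index in `direct` is already present in `cache`.
  def add(cache: Vector[ToyHeader], direct: Set[Int]): Vector[ToyHeader] = {
    // Direct references plus everything those entries reference themselves.
    val transitive = direct ++ direct.flatMap(i => cache(i).references)
    cache :+ ToyHeader(transitive)
  }

  def main(args: Array[String]): Unit = {
    val c0 = add(Vector.empty, Set.empty) // item 0 has no nested subqueries
    val c1 = add(c0, Set(0))              // item 1 nests item 0
    val c2 = add(c1, Set(1))              // item 2 nests item 1, and transitively item 0
    println(c2(2).references == Set(0, 1))
  }
}
```

A set like this makes it cheap to check whether a candidate cache item is nested inside the subquery being cached, in which case merging the two would not make sense.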
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def apply(plan: LogicalPlan): LogicalPlan
- Definition Classes
- MergeScalarSubqueries → Rule
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def conf: SQLConf
The active config object within the current scope. See SQLConf.get for more information.
- Definition Classes
- SQLConfHelper
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- lazy val ruleId: RuleId
- Attributes
- protected
- Definition Classes
- Rule
- val ruleName: String
Name for this rule, automatically inferred based on class name.
- Definition Classes
- Rule
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()