object MergeScalarSubqueries extends Rule[LogicalPlan]
This rule tries to merge multiple non-correlated ScalarSubquery expressions so that multiple scalar values are computed only once.
The process is as follows:
- While traversing the plan, the rule tries to merge each ScalarSubquery plan into the cache
of already seen subquery plans. If a merge is possible, the cache is updated with the merged
subquery plan; if not, the new subquery plan is added to the cache.
During this first traversal each ScalarSubquery expression is replaced with a temporary
ScalarSubqueryReference that points to its cached version.
The cache uses a flag to keep track of whether a cache entry is the result of merging two or more
plans, or a plan that was seen only once.
Merged plans in the cache get a "Header" that contains the list of attributes that form the scalar
return value of a merged subquery.
- A second traversal checks whether there are merged subqueries in the cache and builds a WithCTE
node from these queries. The CTERelationDef nodes contain the merged subquery in the
following form:
Project(Seq(CreateNamedStruct(name1, attribute1, ...) AS mergedValue), mergedSubqueryPlan)
and the definitions are flagged as hosting a subquery that can return at most one row.
During the second traversal, each ScalarSubqueryReference expression is either transformed to a
GetStructField(ScalarSubquery(CTERelationRef(...))) expression, if it points to a merged
subquery, or restored to the original ScalarSubquery otherwise.
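The first traversal described above can be illustrated with a minimal sketch under heavy simplifying assumptions: a subquery is modeled as a single aggregate over one table, two subqueries count as mergeable when they read the same table, and the cache is a plain ArrayBuffer. ScalarPlan, CacheEntry, SubqueryRef and FirstTraversal are hypothetical names; Spark's real merge check works on logical plans and is not reproduced here.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy model: a non-correlated scalar subquery is reduced to one
// aggregate expression evaluated over one table, e.g. ScalarPlan("t", "avg(a)").
final case class ScalarPlan(table: String, agg: String)

// Simplified cache entry: the aggregates forming the struct return value,
// the table the (possibly merged) plan reads, and the "merged" flag.
final case class CacheEntry(attributes: Vector[String], table: String, merged: Boolean)

// Stand-in for ScalarSubqueryReference: cache index plus position in the header.
final case class SubqueryRef(subqueryIndex: Int, headerIndex: Int)

object FirstTraversal {
  def cacheSubquery(plan: ScalarPlan, cache: ArrayBuffer[CacheEntry]): SubqueryRef = {
    // Toy merge condition (an assumption): plans over the same table can be merged.
    val i = cache.indexWhere(_.table == plan.table)
    if (i == -1) {
      // Nothing to merge with: cache the plan as a new, unmerged entry.
      cache += CacheEntry(Vector(plan.agg), plan.table, merged = false)
      SubqueryRef(cache.length - 1, 0)
    } else {
      val entry = cache(i)
      if (!entry.merged && entry.attributes == Vector(plan.agg)) {
        // Identical plan seen before: reuse the entry, which stays unmerged.
        SubqueryRef(i, 0)
      } else {
        // Merge: extend the header if the aggregate is new and set the flag.
        val existing = entry.attributes.indexOf(plan.agg)
        val headerIndex = if (existing >= 0) existing else entry.attributes.length
        val attributes =
          if (existing >= 0) entry.attributes else entry.attributes :+ plan.agg
        cache(i) = CacheEntry(attributes, entry.table, merged = true)
        SubqueryRef(i, headerIndex)
      }
    }
  }

  // Replaces each subquery, in traversal order, with a reference into the cache.
  def run(subqueries: Seq[ScalarPlan]): (Seq[SubqueryRef], Vector[CacheEntry]) = {
    val cache = ArrayBuffer.empty[CacheEntry]
    val refs = subqueries.toVector.map(p => cacheSubquery(p, cache))
    (refs, cache.toVector)
  }
}
```

For two subqueries over t computing avg(a) and sum(a), run returns SubqueryRef(0, 0) and SubqueryRef(0, 1), both pointing into one merged entry whose header is Vector("avg(a)", "sum(a)"). Two identical subqueries share one entry that keeps merged = false, mirroring the flag's role described above.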
For example, the following query:
SELECT (SELECT avg(a) FROM t), (SELECT sum(a) FROM t)
is optimized from:
Optimized Logical Plan
Project [scalar-subquery#242 [] AS scalarsubquery()#253, scalar-subquery#243 [] AS scalarsubquery()#254L]
:  :- Aggregate [avg(a#244) AS avg(a)#247]
:  :  +- Project [a#244]
:  :     +- Relation default.t[a#244,b#245] parquet
:  +- Aggregate [sum(a#251) AS sum(a)#250L]
:     +- Project [a#251]
:        +- Relation default.t[a#251,b#252] parquet
+- OneRowRelation
to:
Optimized Logical Plan
Project [scalar-subquery#242 [].avg(a) AS scalarsubquery()#253, scalar-subquery#243 [].sum(a) AS scalarsubquery()#254L]
:  :- Project [named_struct(avg(a), avg(a)#247, sum(a), sum(a)#250L) AS mergedValue#260]
:  :  +- Aggregate [avg(a#244) AS avg(a)#247, sum(a#244) AS sum(a)#250L]
:  :     +- Project [a#244]
:  :        +- Relation default.t[a#244,b#245] parquet
:  +- Project [named_struct(avg(a), avg(a)#247, sum(a), sum(a)#250L) AS mergedValue#260]
:     +- Aggregate [avg(a#244) AS avg(a)#247, sum(a#244) AS sum(a)#250L]
:        +- Project [a#244]
:           +- Relation default.t[a#244,b#245] parquet
+- OneRowRelation
Physical Plan
*(1) Project [Subquery scalar-subquery#242, [id=#125].avg(a) AS scalarsubquery()#253, ReusedSubquery Subquery scalar-subquery#242, [id=#125].sum(a) AS scalarsubquery()#254L]
:  :- Subquery scalar-subquery#242, [id=#125]
:  :  +- *(2) Project [named_struct(avg(a), avg(a)#247, sum(a), sum(a)#250L) AS mergedValue#260]
:  :     +- *(2) HashAggregate(keys=[], functions=[avg(a#244), sum(a#244)], output=[avg(a)#247, sum(a)#250L])
:  :        +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#120]
:  :           +- *(1) HashAggregate(keys=[], functions=[partial_avg(a#244), partial_sum(a#244)], output=[sum#262, count#263L, sum#264L])
:  :              +- *(1) ColumnarToRow
:  :                 +- FileScan parquet default.t[a#244] ...
:  +- ReusedSubquery Subquery scalar-subquery#242, [id=#125]
+- *(1) Scan OneRowRelation[]
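The second traversal, which produces field accesses such as scalar-subquery#242 [].avg(a) in the plans above, can be sketched with a hypothetical, string-based model. CachedSubquery, SubqueryReference, MergedCteDef and the Rewritten cases are illustrative stand-ins, not Spark classes, and using the cache index as the CTE id is an assumption made only for this sketch.

```scala
// Hypothetical simplified output of the first traversal (names are illustrative).
final case class CachedSubquery(attributes: Vector[String], plan: String, merged: Boolean)
final case class SubqueryReference(subqueryIndex: Int, headerIndex: Int, original: String)

// Simplified stand-in for a CTERelationDef that hosts a merged subquery.
final case class MergedCteDef(id: Int, structFields: Vector[String], plan: String, atMostOneRow: Boolean)

// What a reference becomes after the second traversal.
sealed trait Rewritten
// Stand-in for GetStructField(ScalarSubquery(CTERelationRef(cteId)), ordinal).
final case class StructFieldOfCte(cteId: Int, ordinal: Int, name: String) extends Rewritten
// Stand-in for the original, unmodified ScalarSubquery.
final case class RestoredSubquery(original: String) extends Rewritten

object SecondTraversal {
  // One definition per merged entry; here the CTE id is simply the cache index.
  def buildCteDefs(cache: Vector[CachedSubquery]): Vector[MergedCteDef] =
    cache.zipWithIndex.collect {
      case (entry, i) if entry.merged =>
        MergedCteDef(i, entry.attributes, entry.plan, atMostOneRow = true)
    }

  // Merged entries are read through a struct field; others are restored as-is.
  def rewrite(ref: SubqueryReference, cache: Vector[CachedSubquery]): Rewritten = {
    val entry = cache(ref.subqueryIndex)
    if (entry.merged) {
      StructFieldOfCte(ref.subqueryIndex, ref.headerIndex, entry.attributes(ref.headerIndex))
    } else {
      RestoredSubquery(ref.original)
    }
  }
}
```

With a merged entry whose header is [avg(a), sum(a)], the reference (0, 1) becomes StructFieldOfCte(0, 1, "sum(a)"), the analogue of the .sum(a) access in the example; an unmerged entry yields no CTE definition and its reference is restored unchanged.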
Type Members
- case class Header(attributes: Seq[Attribute], plan: LogicalPlan, merged: Boolean, references: Set[Int]) extends Product with Serializable
An item in the cache of merged scalar subqueries.
- attributes
Attributes that form the struct scalar return value of a merged subquery.
- plan
The plan of a merged scalar subquery.
- merged
A flag to identify whether this item is the result of merging subqueries. Note that
attributes.size == 1 doesn't always mean that the plan is not merged, as there can be subqueries that are different (checkIdenticalPlans is false) due to an extra Project node in one of them. In that case attributes.size remains 1 after merging, but the merged flag becomes true.
- references
A set of subquery indexes in the cache to track all (including transitive) nested subqueries.
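The references field can be illustrated with a small sketch of computing all (including transitive) nested subquery indexes for a new subquery. HeaderSketch and ReferenceTracking are hypothetical names, the sketch assumes every cache entry already stores its own transitive set, and the use shown in mergeCandidates (never merging a subquery into an entry it depends on) is an assumption about why the set is tracked, not a statement of Spark's code.

```scala
// Hypothetical, stripped-down Header keeping only the field discussed here.
final case class HeaderSketch(plan: String, references: Set[Int])

object ReferenceTracking {
  // Direct references plus everything those entries already reference. One level of
  // lookup suffices because every entry is assumed to store its own transitive set.
  def transitiveReferences(direct: Set[Int], cache: IndexedSeq[HeaderSketch]): Set[Int] =
    direct ++ direct.flatMap(i => cache(i).references)

  // Assumed use: a new subquery is not merged into an entry it depends on,
  // which would otherwise make the merged plan reference itself.
  def mergeCandidates(direct: Set[Int], cache: IndexedSeq[HeaderSketch]): Seq[Int] = {
    val deps = transitiveReferences(direct, cache)
    cache.indices.filterNot(deps.contains)
  }
}
```

For a cache where entry 1 references entry 0 and entry 2 references entries 0 and 1, a new subquery that directly references entry 2 transitively depends on all three, so none of them is a merge candidate.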
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def apply(plan: LogicalPlan): LogicalPlan
- Definition Classes
- MergeScalarSubqueries → Rule
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- def conf: SQLConf
The active config object within the current scope. See SQLConf.get for more information.
- Definition Classes
- SQLConfHelper
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- lazy val ruleId: RuleId
- Attributes
- protected
- Definition Classes
- Rule
- val ruleName: String
Name for this rule, automatically inferred based on class name.
- Definition Classes
- Rule
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
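The ruleName member inherited from Rule is documented as automatically inferred based on the class name. A minimal sketch of one way such inference can work, assuming the name is taken from the runtime JVM class name and that the trailing "$" the Scala compiler appends to an object's class name is dropped; RuleSketch and MyRule are hypothetical names, not Spark's Rule.

```scala
// Hypothetical stand-in for Rule; only the name inference is sketched.
abstract class RuleSketch {
  // A Scala `object` compiles to a JVM class whose name ends in "$"; drop that suffix.
  val ruleName: String = getClass.getName.stripSuffix("$")
}

// Hypothetical rule used only to show the inferred name.
object MyRule extends RuleSketch
```

Under this sketch, an object such as MergeScalarSubqueries would report its fully qualified class name without the trailing "$".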