public class HiveAggregateInsertIncrementalRewritingRule
extends HiveAggregateIncrementalRewritingRuleBase<HiveAggregateIncrementalRewritingRuleBase.IncrementalComputePlan>
This rule will perform a rewriting to prepare the plan for incremental
view maintenance in case there exist aggregation operator, so we can
avoid the INSERT OVERWRITE and use a MERGE statement instead.
In particular, the INSERT OVERWRITE maintenance will look like this
(in SQL):
INSERT OVERWRITE mv
SELECT a, b, SUM(s) as s, SUM(c) AS c
FROM (
SELECT * from mv --OLD DATA
UNION ALL
SELECT a, b, SUM(x) AS s, COUNT(*) AS c --NEW DATA
FROM TAB_A
JOIN TAB_B ON (TAB_A.a = TAB_B.z)
WHERE TAB_A.ROW_ID > 5
GROUP BY a, b) inner_subq
GROUP BY a, b;
We need to transform that into:
MERGE INTO mv
USING (
SELECT a, b, SUM(x) AS s, COUNT(*) AS c --NEW DATA
FROM TAB_A
JOIN TAB_B ON (TAB_A.a = TAB_B.z)
WHERE TAB_A.ROW_ID > 5
GROUP BY a, b) source
ON (mv.a <=> source.a AND mv.b <=> source.b)
WHEN MATCHED AND mv.c + source.c <> 0
THEN UPDATE SET mv.s = mv.s + source.s, mv.c = mv.c + source.c
WHEN NOT MATCHED
THEN INSERT VALUES (source.a, source.b, s, c);
To be precise, we need to convert it into a MERGE rewritten as:
FROM (select *, true flag from mv) mv right outer join _source_ source
ON (mv.a <=> source.a AND mv.b <=> source.b)
INSERT INTO TABLE mv
SELECT source.a, source.b, s, c
WHERE mv.flag IS NULL
INSERT INTO TABLE mv
SELECT mv.ROW__ID, source.a, source.b,
CASE WHEN mv.s IS NULL AND source.s IS NULL THEN NULL
WHEN mv.s IS NULL THEN source.s
WHEN source.s IS NULL THEN mv.s
ELSE mv.s + source.s END,
<same CASE statement for mv.c + source.c like above>
WHERE mv.flag
SORT BY mv.ROW__ID;
- See Also:
CalcitePlanner