- Direct Known Subclasses:
- VectorPTFEvaluatorBytesCountDistinct, VectorPTFEvaluatorDecimalCountDistinct, VectorPTFEvaluatorDoubleCountDistinct, VectorPTFEvaluatorLongCountDistinct, VectorPTFEvaluatorTimestampCountDistinct
public abstract class VectorPTFEvaluatorCountDistinct
extends VectorPTFEvaluatorCount
This class evaluates count(distinct column) for a PTF group. The simplest case is one where
the distinct keyword is applied to the partitioning column itself, e.g.:
SELECT
txt1,
txt2,
count(distinct txt1) over(partition by txt1) as n,
count(distinct txt2) over(partition by txt2) as m
FROM example;
In this case, the framework is still expected to ensure sorting
on the key (say, txt1 for the first Reducer stage), but the original
VectorPTFEvaluatorCount is not aware that a distinct keyword was applied
to the key column. This case is simple: within a partition the key holds a
single value, so such a function should return 1 every time. However, that is
just a corner case. The realistic scenario is one where the distinct column
differs from the partitioning column. In such cases, a real count
distinct implementation is needed:
SELECT
txt1,
txt2,
count(distinct txt2) over(partition by txt1) as n,
count(distinct txt1) over(partition by txt2) as m
FROM example;
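The idea behind a per-partition count distinct can be sketched as follows. This is a minimal illustration, not the Hive implementation: the class CountDistinctSketch and its methods evaluateBatch, getResult and resetForNewPartition are hypothetical names, and plain String arrays stand in for vectorized column batches. It assumes rows arrive grouped by partition key, collects the non-null values of the distinct column in a set, and clears that set whenever a new partition starts.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for illustration; not the Hive evaluator itself.
final class CountDistinctSketch {
    // Distinct non-null values seen so far in the current partition.
    private final Set<String> seen = new HashSet<>();

    // Feed one batch of column values belonging to the current partition.
    // A null element models SQL NULL, which count(distinct ...) ignores.
    void evaluateBatch(String[] values) {
        for (String v : values) {
            if (v != null) {
                seen.add(v);
            }
        }
    }

    // Number of distinct non-null values seen in the current partition.
    long getResult() {
        return seen.size();
    }

    // Must be called when the partition key changes, so that values
    // from the previous partition do not leak into the next one.
    void resetForNewPartition() {
        seen.clear();
    }
}
```

A partition may span several batches, so the set has to outlive a single evaluateBatch call and be cleared only at a partition boundary. When the distinct column is the partitioning column itself, every non-null value in the partition is identical and the sketch yields 1, which matches the corner case described above.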