| Modifier and Type | Method and Description |
|---|---|
Table |
aggregate(Aggregate... aggregates)
Aggregates the group of columns represented by indices
Usage:
aggregate(count(), max(2),...);
example:
input : 1, 1, 1
1, 2, 1
2, 4, 5
table.groupBy(0, 2).count()
col0, col1
output: 1, 1
1, 2
2, 1 ==> aggregated count
|
Table |
aggregateWindows(WindowAggregate... windowAggregates)
Computes row-based window aggregation functions on the Table/projection,
based on windows specified in the argument.
|
Table |
aggregateWindowsOverTimeRanges(WindowAggregate... windowAggregates)
Computes time-range-based window aggregation functions on the Table/projection,
based on windows specified in the argument.
|
public Table aggregate(Aggregate... aggregates)
public Table aggregateWindows(WindowAggregate... windowAggregates)
WindowAggregate argument,
indicating:
1. the AggregateOp,
2. the number of rows preceding and following the current row, within a window,
3. the minimum number of observations within the defined window
This method returns a Table instance, with one result column for each specified
window aggregation.
In this example, for the following input:
[ // user_id, sales_amt
{ "user1", 10 },
{ "user2", 20 },
{ "user1", 20 },
{ "user1", 10 },
{ "user2", 30 },
{ "user2", 80 },
{ "user1", 50 },
{ "user1", 60 },
{ "user2", 40 }
]
Partitioning (grouping) by `user_id` yields the following `sales_amt` vector
(with 2 groups, one for each distinct `user_id`):
[ 10, 20, 10, 50, 60, 20, 30, 80, 40 ]
<-------user1-------->|<------user2------->
The SUM aggregation is applied with 1 preceding and 1 following
row, with a minimum of 1 period. The aggregation window is thus 3 rows wide,
yielding the following column:
[ 30, 40, 80, 120, 110, 50, 130, 150, 120 ]windowAggregates - the window-aggregations to be performedIllegalArgumentException - if the window arguments are not of type
WindowOptions.FrameType.ROWS,
i.e. a timestamp column is specified for a window-aggregation.public Table aggregateWindowsOverTimeRanges(WindowAggregate... windowAggregates)
WindowAggregate argument,
indicating:
1. the AggregateOp,
2. the index for the timestamp column to base the window definitions on
2. the number of DAYS preceding and following the current row's date, to consider in the window
3. the minimum number of observations within the defined window
This method returns a Table instance, with one result column for each specified
window aggregation.
In this example, for the following input:
[ // user, sales_amt, YYYYMMDD (date)
{ "user1", 10, 20200101 },
{ "user2", 20, 20200101 },
{ "user1", 20, 20200102 },
{ "user1", 10, 20200103 },
{ "user2", 30, 20200101 },
{ "user2", 80, 20200102 },
{ "user1", 50, 20200107 },
{ "user1", 60, 20200107 },
{ "user2", 40, 20200104 }
]
Partitioning (grouping) by `user_id`, and ordering by `date` yields the following `sales_amt` vector
(with 2 groups, one for each distinct `user_id`):
Date :(202001-) [ 01, 02, 03, 07, 07, 01, 01, 02, 04 ]
Input: [ 10, 20, 10, 50, 60, 20, 30, 80, 40 ]
<-------user1-------->|<---------user2--------->
The SUM aggregation is applied, with 1 day preceding, and 1 day following, with a minimum of 1 period.
The aggregation window is thus 3 *days* wide, yielding the following output column:
Results: [ 30, 40, 30, 110, 110, 130, 130, 130, 40 ]windowAggregates - the window-aggregations to be performedIllegalArgumentException - if the window arguments are not of type
WindowOptions.FrameType.RANGE,
i.e. the timestamp-column was not specified for the aggregation.Copyright © 2020. All rights reserved.