brickhouse.udf.collect
Class GroupCountUDF
java.lang.Object
org.apache.hadoop.hive.ql.exec.UDF
brickhouse.udf.collect.GroupCountUDF
public class GroupCountUDF
extends org.apache.hadoop.hive.ql.exec.UDF
GroupCountUDF returns a sequence number for each row within a run of rows
that share the same grouping value.
This lets a query count the rows in each grouping and cap each grouping
at a fixed number of rows.
The example relies on rows arriving grouped by key, which the distribute by
and sort by clauses in the subquery arrange.
For example, the number of records per ks_uid can be capped with something like:
    select
      ks_uid, val, group_count(ks_uid) as rank
    from
      ( select ks_uid, val from table1
        distribute by ks_uid
        sort by ks_uid, val ) ordered_keys
    where group_count( ks_uid ) < 100
Methods inherited from class org.apache.hadoop.hive.ql.exec.UDF:
getRequiredFiles, getRequiredJars, getResolver, setResolver
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
GroupCountUDF
public GroupCountUDF()
evaluate
public Integer evaluate(String grouping)
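This page does not show how evaluate is implemented. As a rough sketch only, the behaviour described above can be modelled in plain Java by remembering the previous grouping value and restarting a counter whenever that value changes. The class name GroupCountSketch, the zero-based start of the sequence, and the handling of null keys are all assumptions made for illustration, not details confirmed by the source; the sketch deliberately has no Hive dependency.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for the UDF; not the actual Brickhouse implementation.
class GroupCountSketch {
    private boolean started = false;   // false until the first row is seen
    private String lastGrouping = null; // grouping value of the previous row
    private int count = 0;             // position within the current run

    // Returns the position of this row within the current run of equal keys.
    // Assumption: the sequence starts at 0 for the first row of each run.
    Integer evaluate(String grouping) {
        if (!started || !Objects.equals(lastGrouping, grouping)) {
            // First row overall, or the key changed: start a new run.
            started = true;
            lastGrouping = grouping;
            count = 0;
        } else {
            // Same key as the previous row: advance the sequence.
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Keys already sorted, as the "sort by ks_uid" subquery would deliver them.
        List<String> sortedKeys = Arrays.asList("u1", "u1", "u1", "u2", "u2");
        GroupCountSketch counter = new GroupCountSketch();
        int cap = 2; // keep at most 2 rows per key
        for (String key : sortedKeys) {
            int rank = counter.evaluate(key);
            if (rank < cap) {
                System.out.println(key + " rank=" + rank);
            }
        }
    }
}
```

Because the counter in this sketch only compares each row with the one before it, a key that reappears after a different key starts again from the beginning. That is why the example query distributes and sorts by ks_uid before calling group_count.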
Copyright © 2013. All rights reserved.