brickhouse.udf.collect
Class GroupCountUDF

java.lang.Object
  extended by org.apache.hadoop.hive.ql.exec.UDF
      extended by brickhouse.udf.collect.GroupCountUDF

public class GroupCountUDF
extends org.apache.hadoop.hive.ql.exec.UDF

GroupCountUDF assigns a sequence number to each row among the rows that share the same grouping value. This makes it possible to count the rows in a grouping and to cap the number of rows kept per grouping.

For example, we can cap the number of records per ks_uid with a query like:

    select ks_uid, val, group_count(ks_uid) as rank
    from (
      select ks_uid, val
      from table1
      distribute by ks_uid
      sort by ks_uid, val
    ) ordered_keys
    where group_count(ks_uid) < 100
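
The counting behaviour described above can be sketched in plain Java. This is an illustration only, not the source of GroupCountUDF: the class name GroupCounterSketch and its fields are hypothetical, the zero-based numbering is an assumption, and the class deliberately does not extend org.apache.hadoop.hive.ql.exec.UDF so that it runs without Hive on the classpath. It also assumes that rows of one grouping arrive consecutively, which is what the distribute by / sort by subquery in the example arranges.

```java
// Hypothetical stand-in for a per-grouping row counter (not the Brickhouse source).
class GroupCounterSketch {
    // Grouping value seen on the previous call; null before the first call.
    private String lastGrouping = null;
    // Sequence number to hand out for the next row of the current grouping.
    private int nextCount = 0;

    // Returns 0 for the first row of a grouping, 1 for the second, and so on.
    // Assumption: numbering is zero-based and restarts whenever the value changes.
    // In this sketch a null grouping always restarts the count at 0.
    public Integer evaluate(String grouping) {
        if (lastGrouping == null || !lastGrouping.equals(grouping)) {
            lastGrouping = grouping;
            nextCount = 0;
        }
        return nextCount++;
    }

    public static void main(String[] args) {
        GroupCounterSketch counter = new GroupCounterSketch();
        // Keys must already be sorted so equal values are adjacent.
        String[] sortedKeys = {"a", "a", "a", "b", "b"};
        for (String key : sortedKeys) {
            System.out.println(key + " -> " + counter.evaluate(key));
        }
    }
}
```

Because the counter only remembers the previous grouping value, unsorted input would restart the count every time the value changes, which is why the example query sorts by ks_uid before calling group_count.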


Constructor Summary
GroupCountUDF()
           
 
Method Summary
 Integer evaluate(String grouping)
           
 
Methods inherited from class org.apache.hadoop.hive.ql.exec.UDF
getRequiredFiles, getRequiredJars, getResolver, setResolver
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GroupCountUDF

public GroupCountUDF()
Method Detail

evaluate

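public Integer evaluate(String grouping)
Returns the sequence number of the current row among the rows that have the same grouping value.
Parameters:
grouping - the value that identifies the grouping, such as ks_uid in the example above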


Copyright © 2013. All rights reserved.