brickhouse.hbase
Class CachedGetUDF

java.lang.Object
  extended by org.apache.hadoop.hive.ql.udf.generic.GenericUDF
      extended by brickhouse.hbase.CachedGetUDF

public class CachedGetUDF
extends org.apache.hadoop.hive.ql.udf.generic.GenericUDF

Loads data from HBase and caches it locally in memory for faster access. Similar to using a distributed map, except that the values are stored in HBase and can be sharded across multiple nodes, so one can process elements that would not fit into memory on a single node. This may be useful in situations where you would potentially have a Cartesian product (Bayesian topic assignment, similarity clustering) and want to avoid an extra join. One can cache strings, or arbitrary Hive structures, by storing values as JSON strings and using a template object similar to the one used in the from_json UDF. An example would be storing a map as a bag-of-words, or an array<string> to store a sketch-set.
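
The read-through caching idea described above can be illustrated with a minimal sketch. This is not the brickhouse implementation: the class name ReadThroughCacheSketch, its loader function, and the in-memory map standing in for an HBase table are all hypothetical, chosen so the example runs without a cluster. It only shows the pattern of fetching a value once from a backing store and serving later lookups from local memory.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not part of brickhouse.hbase.
class ReadThroughCacheSketch {
    // Local in-memory cache of values already fetched.
    private final Map<String, String> cache = new HashMap<>();
    // Assumed stand-in for a remote lookup (for example, an HBase Get).
    private final Function<String, String> loader;
    // Counts how often the backing store was actually consulted.
    private int loads = 0;

    ReadThroughCacheSketch(Function<String, String> loader) {
        this.loader = loader;
    }

    // Returns the cached value if present; otherwise loads and caches it.
    // A null result from the loader is cached as well, so a missing key
    // is looked up only once.
    String get(String key) {
        if (cache.containsKey(key)) {
            return cache.get(key);
        }
        String value = loader.apply(key);
        loads++;
        cache.put(key, value);
        return value;
    }

    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        // A plain map plays the role of the remote table in this sketch.
        Map<String, String> fakeStore = new HashMap<>();
        fakeStore.put("doc1", "{\"hive\":2,\"hbase\":1}"); // bag-of-words as JSON

        ReadThroughCacheSketch cached = new ReadThroughCacheSketch(fakeStore::get);
        cached.get("doc1"); // first call goes to the backing store
        cached.get("doc1"); // second call is served from memory
        System.out.println(cached.loadCount()); // prints 1
    }
}
```

In this sketch the value is kept as a JSON string, mirroring the description's suggestion of storing Hive structures as JSON; converting it back into a map or array would be a separate step.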


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.hive.ql.udf.generic.GenericUDF
org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredJavaObject, org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
 
Constructor Summary
CachedGetUDF()
           
 
Method Summary
 Object evaluate(org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject[] arg0)
           
 String getDisplayString(String[] arg0)
           
 Object getValue(String key)
           
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector initialize(org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector[] parameters)
          User should pass in a constant Map of HBase parameters, the String key to look up,
 
Methods inherited from class org.apache.hadoop.hive.ql.udf.generic.GenericUDF
getRequiredFiles, getRequiredJars, initializeAndFoldConstants
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CachedGetUDF

public CachedGetUDF()

Method Detail

evaluate

public Object evaluate(org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject[] arg0)
                throws org.apache.hadoop.hive.ql.metadata.HiveException
Specified by:
evaluate in class org.apache.hadoop.hive.ql.udf.generic.GenericUDF
Throws:
org.apache.hadoop.hive.ql.metadata.HiveException

getValue

public Object getValue(String key)

getDisplayString

public String getDisplayString(String[] arg0)
Specified by:
getDisplayString in class org.apache.hadoop.hive.ql.udf.generic.GenericUDF

initialize

public org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector initialize(org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector[] parameters)
                                                                         throws org.apache.hadoop.hive.ql.exec.UDFArgumentException
User should pass in a constant Map of HBase parameters, the String key to look up,

Specified by:
initialize in class org.apache.hadoop.hive.ql.udf.generic.GenericUDF
Throws:
org.apache.hadoop.hive.ql.exec.UDFArgumentException


Copyright © 2013. All rights reserved.