|
Class Summary |
| BloomAndUDF |
|
| BloomContainsUDF |
Returns true if the Bloom filter (probably) contains the string |
| BloomFactory |
Utility class for construction and serialization of BloomFilters ... |
| BloomFilter |
Implements a Bloom filter, as defined by Bloom in 1970. |
| BloomNotUDF |
|
| BloomOrUDF |
|
| BloomUDAF |
Constructs a BloomFilter by aggregating over keys.
Uses the Hadoop util BloomFilter class.
Use with bloom_contains( key, bloomfile ):
insert overwrite local directory 'bloomfile'
select bloom( ks_uid )
from big_table
where premise = true;
add file bloomfile;
select ks_uid
from other_big_table
where bloom_contains( key, distributed_bloom('bloomfile') ); |
| BloomUDAF.BloomUDAFEvaluator |
|
| DistributedBloomUDF |
UDF to access a Bloom filter stored in a file in the distributed cache.
Assumes the file is a tab-separated file of name-value pairs
that has been placed in the distributed cache using the "add file" command.
Example:
INSERT OVERWRITE LOCAL DIRECTORY 'mybloom' select bloom(key) from my_map_table where premise=true;
ADD FILE mybloom;
select *
from my_big_table
where bloom_contains( key, distributed_bloom('mybloom') ) == true; |
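The "(probably)" in the BloomContainsUDF description reflects how a Bloom filter answers membership queries: a key that was added is always reported as present, but a key that was never added may occasionally be reported as present too. The sketch below illustrates that behavior with a minimal, self-contained filter. It is not the code used by these classes, which per the BloomUDAF description rely on the Hadoop util BloomFilter. The class name TinyBloom, its parameters, and the hashing scheme are hypothetical and chosen only for illustration.

```java
import java.util.BitSet;

// Hypothetical illustration only; not the Hadoop or library implementation.
class TinyBloom {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    TinyBloom(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Second hash function (FNV-1a over the string's chars), used for double hashing.
    private static int fnv1a(String key) {
        int h = 0x811c9dc5;
        for (int j = 0; j < key.length(); j++) {
            h ^= key.charAt(j);
            h *= 0x01000193;
        }
        return h;
    }

    // Bit position for the i-th hash of a key: (h1 + i * h2) mod numBits.
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = fnv1a(key);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Sets every bit position derived from the key.
    void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(key, i));
        }
    }

    // False means definitely absent; true means only "probably present".
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TinyBloom filter = new TinyBloom(1024, 3);
        filter.add("alice");
        filter.add("bob");
        // Added keys are always reported as present.
        System.out.println(filter.mightContain("alice"));
        // A key that was never added is usually reported absent,
        // but a false positive is possible.
        System.out.println(filter.mightContain("carol"));
    }
}
```

Because a filter can report false positives but never false negatives, a query such as the bloom_contains examples above is safe as a pre-filter: it may let a few extra rows through, but it does not drop rows whose keys were aggregated into the filter.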