public class SortMergeJoinExample extends TezExampleBase
This example illustrates the difference between SortMergeJoinExample and HashJoinExample. HashJoinExample requires one dataset (hashFile) to be small enough to fit into memory. SortMergeJoinExample does not load either dataset into memory. Instead, it sorts the output of both datasets before feeding them to SortMergeJoinExample.SortMergeJoinProcessor, just like the sort phase before reduce in traditional MapReduce. Because both inputs are already sorted, SortMergeJoinExample.SortMergeJoinProcessor can then advance the iterators of the two inputs in step to find the joined keys.

The data requirements also differ: HashJoinExample requires the keys in the hashFile to be unique, while SortMergeJoinExample requires the keys in both datasets to be unique.

| Modifier and Type | Class and Description |
|---|---|
| static class | SortMergeJoinExample.SortMergeJoinProcessor: Joins 2 inputs which have already been sorted. |
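To make the contrast in the class description concrete, here is a minimal, self-contained Java sketch of both strategies over in-memory lists of string keys. It is not the Tez API and not the code of SortMergeJoinExample.SortMergeJoinProcessor: the class `SortMergeJoinSketch` and its methods are hypothetical. It assumes non-null keys, inputs already sorted in ascending order for the merge, and unique keys on each side, matching the requirements stated above. The sort-merge variant holds only the current element of each input, while the hash variant must first load one whole side into a `HashSet`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; not part of Apache Tez. */
final class SortMergeJoinSketch {

    private static String nextOrNull(Iterator<String> it) {
        return it.hasNext() ? it.next() : null;
    }

    /**
     * Sort-merge join on keys. Assumes both iterators yield non-null keys
     * sorted ascending, with no duplicate keys within either input.
     * Only the current key of each input is held at any time.
     */
    static List<String> sortMergeJoin(Iterator<String> left, Iterator<String> right) {
        List<String> joined = new ArrayList<>();
        String l = nextOrNull(left);
        String r = nextOrNull(right);
        while (l != null && r != null) {
            int cmp = l.compareTo(r);
            if (cmp == 0) {
                // Matching key: emit it and advance both sides (correct only because keys are unique).
                joined.add(l);
                l = nextOrNull(left);
                r = nextOrNull(right);
            } else if (cmp < 0) {
                // Left key is smaller, so it cannot match anything later on the right.
                l = nextOrNull(left);
            } else {
                // Right key is smaller, so it cannot match anything later on the left.
                r = nextOrNull(right);
            }
        }
        return joined;
    }

    /**
     * Hash join on keys, for contrast: the whole hash side is loaded into
     * memory, then the other side is streamed and probed against it.
     */
    static List<String> hashJoin(Iterable<String> hashSide, Iterable<String> streamSide) {
        Set<String> table = new HashSet<>();
        for (String key : hashSide) {
            table.add(key); // entire hash side held in memory
        }
        List<String> joined = new ArrayList<>();
        for (String key : streamSide) {
            if (table.contains(key)) {
                joined.add(key);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> left = List.of("apple", "banana", "cherry", "kiwi");
        List<String> right = List.of("banana", "date", "kiwi", "plum");
        System.out.println(sortMergeJoin(left.iterator(), right.iterator())); // [banana, kiwi]
        System.out.println(hashJoin(left, right));                            // [banana, kiwi]
    }
}
```

Running `main` prints `[banana, kiwi]` twice. If the inputs arrive unsorted, sorting them first (for example with `List.sort`) plays the role of the MapReduce-style sort phase described above. Advancing both iterators on a match is only correct because keys are unique; with duplicate keys, a merge join would have to buffer each run of equal keys on one side.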
Fields inherited from class TezExampleBase: COUNTER_LOG, DISABLE_SPLIT_GROUPING, GENERATE_SPLIT_IN_CLIENT, LEAVE_AM_RUNNING, LOCAL_MODE, RECONNECT_APP_ID

| Constructor and Description |
|---|
| SortMergeJoinExample() |
| Modifier and Type | Method and Description |
|---|---|
| static void | main(String[] args) |
| protected void | printUsage(): Print usage instructions for this example |
| protected int | runJob(String[] args, org.apache.tez.dag.api.TezConfiguration tezConf, org.apache.tez.client.TezClient tezClient): Create and execute the actual DAG for the example |
| protected int | validateArgs(String[] otherArgs): Validate the arguments |
Methods inherited from class TezExampleBase: getAppId, isCountersLog, isDisableSplitGrouping, isGenerateSplitInClient, printExtraOptionsUsage, run, run, runDag

protected void printUsage()
Description copied from class: TezExampleBase. Print usage instructions for this example.
Specified by: printUsage in class TezExampleBase

protected int runJob(String[] args, org.apache.tez.dag.api.TezConfiguration tezConf, org.apache.tez.client.TezClient tezClient) throws Exception
Description copied from class: TezExampleBase. Create and execute the actual DAG for the example.
Specified by: runJob in class TezExampleBase
Parameters: args - arguments for execution; tezConf - the Tez configuration instance to be used while processing the DAG; tezClient - the Tez client instance to use to run the DAG if any custom monitoring is required. Otherwise, the utility method TezExampleBase.runDag(org.apache.tez.dag.api.DAG, boolean, org.slf4j.Logger) should be used.
Throws: IOException, org.apache.tez.dag.api.TezException, Exception

protected int validateArgs(String[] otherArgs)
Description copied from class: TezExampleBase. Validate the arguments.
Specified by: validateArgs in class TezExampleBase
Parameters: otherArgs - arguments, if any

Copyright © 2019 Apache Software Foundation. All rights reserved.