public class PairedDataHelper<P>
extends java.lang.Object
implements java.io.Serializable
A helper class that Step implementations can use when processing paired data (e.g. train and test sets). It distinguishes a primary and a secondary connection/data type: the secondary data for a given set number typically has to be processed using a result generated from the corresponding primary data. The helper ensures that the secondary connection/data is processed only once the corresponding primary has completed.

Users of this helper must provide an implementation of the PairedProcessor inner interface. Its processPrimary() method is called to process the primary data/connection and return a result; processSecondary() is called to deal with the secondary connection/data. The result for a particular primary set number can be retrieved by calling getIndexedPrimaryResult() with that set number.
This class also provides an arbitrary storage mechanism for additional results beyond the primary type of result. It also takes care of invoking processing() and finished() on the client step's StepManager.
    public class MyFunkyStep extends BaseStep
      implements PairedDataHelper.PairedProcessor<MyFunkyMainResult> {
      ...
      protected PairedDataHelper<MyFunkyMainResult> m_helper;
      ...
      public void stepInit() {
        m_helper = new PairedDataHelper<MyFunkyMainResult>(this, this,
          StepManager.[CON_WHATEVER_YOUR_PRIMARY_CONNECTION_IS],
          StepManager.[CON_WHATEVER_YOUR_SECONDARY_CONNECTION_IS]);
        ...
      }

      public void processIncoming(Data data) throws WekaException {
        // delegate to our helper to handle primary/secondary synchronization
        // issues
        m_helper.process(data);
      }

      public MyFunkyMainResult processPrimary(Integer setNum, Integer maxSetNum,
        Data data, PairedDataHelper<MyFunkyMainResult> helper) throws WekaException {
        SomeDataTypeToProcess someData = data.getPrimaryPayload();
        MyFunkyMainResult processor = new MyFunkyMainResult();
        // do some processing using MyFunkyMainResult and SomeDataTypeToProcess
        ...
        // output some data to downstream steps if necessary
        ...
        return processor;
      }

      public void processSecondary(Integer setNum, Integer maxSetNum, Data data,
        PairedDataHelper<MyFunkyMainResult> helper) throws WekaException {
        SomeDataTypeToProcess someData = data.getPrimaryPayload();
        // get the MyFunkyMainResult for this set number
        MyFunkyMainResult result = helper.getIndexedPrimaryResult(setNum);
        // do some stuff with the result and the secondary data
        ...
        // output some data to downstream steps if necessary
      }
    }
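The ordering guarantee described above (secondary data for a set is held back until the corresponding primary result exists) can be illustrated without any Weka classes. The following is a minimal stand-alone sketch under stated assumptions, not the helper's actual implementation: `PairingSketch`, its nested `Processor` interface, the `boolean isPrimary` flag and the `String` payloads are all hypothetical stand-ins for `PairedDataHelper`, `PairedProcessor`, connection types and `Data`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-alone sketch of primary/secondary pairing; NOT Weka's implementation. */
class PairingSketch<P> {

  /** Hypothetical, simplified stand-in for PairedDataHelper.PairedProcessor. */
  interface Processor<P> {
    P processPrimary(int setNum, String payload);

    void processSecondary(int setNum, String payload, P primaryResult);
  }

  private final Processor<P> processor;
  // primary results indexed by set number (compare m_primaryResultMap)
  private final Map<Integer, P> primaryResults = new HashMap<>();
  // secondary payloads that arrived before their primary (compare m_secondaryDataMap)
  private final Map<Integer, String> pendingSecondary = new HashMap<>();

  PairingSketch(Processor<P> processor) {
    this.processor = processor;
  }

  /** Routes one incoming payload; a single lock keeps the sketch simple. */
  synchronized void process(boolean isPrimary, int setNum, String payload) {
    if (isPrimary) {
      P result = processor.processPrimary(setNum, payload);
      primaryResults.put(setNum, result);
      // release a secondary payload that was waiting for this set, if any
      String waiting = pendingSecondary.remove(setNum);
      if (waiting != null) {
        processor.processSecondary(setNum, waiting, result);
      }
    } else if (primaryResults.containsKey(setNum)) {
      processor.processSecondary(setNum, payload, primaryResults.get(setNum));
    } else {
      // primary not computed yet: buffer the secondary payload
      pendingSecondary.put(setNum, payload);
    }
  }

  /** Runs a small scenario and returns the order in which the callbacks fired. */
  static List<String> demo() {
    final List<String> log = new ArrayList<>();
    PairingSketch<Integer> helper = new PairingSketch<>(new Processor<Integer>() {
      @Override
      public Integer processPrimary(int setNum, String payload) {
        log.add("primary " + setNum);
        return payload.length(); // the "result" here is just the payload length
      }

      @Override
      public void processSecondary(int setNum, String payload, Integer primaryResult) {
        log.add("secondary " + setNum + " using " + primaryResult);
      }
    });
    helper.process(false, 1, "test-1");   // arrives early, so it is buffered
    helper.process(true, 1, "train-1");   // primary runs, then the buffered secondary
    helper.process(true, 2, "train-22");  // primary first
    helper.process(false, 2, "test-2");   // secondary is processed immediately
    return log;
  }
}
```

In this sketch the secondary callback for set 1 fires only after the primary for set 1 has returned, even though the secondary payload arrived first. The real helper additionally tracks completed pairs and notifies the owning step's StepManager, which the sketch leaves out.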
| Modifier and Type | Class and Description |
|---|---|
| `static interface` | `PairedDataHelper.PairedProcessor<P>` Interface for processors of paired data to implement. |
| Modifier and Type | Field and Description |
|---|---|
| `protected java.util.Map<java.lang.String,java.util.Map<java.lang.Integer,java.lang.Object>>` | `m_namedIndexedStore` Storage of arbitrary indexed results computed during execution of PairedProcessor.processPrimary() |
| `protected Step` | `m_ownerStep` The step that owns this helper |
| `protected java.lang.String` | `m_primaryConType` The type of connection to route to PairedProcessor.processPrimary() |
| `protected java.util.Map<java.lang.Integer,P>` | `m_primaryResultMap` Storage of the indexed primary result |
| `protected PairedDataHelper.PairedProcessor` | `m_processor` The PairedProcessor implementation that will do the actual work |
| `protected java.lang.String` | `m_secondaryConType` The type of connection to route to PairedProcessor.processSecondary() |
| `protected java.util.Map<java.lang.Integer,Data>` | `m_secondaryDataMap` Holds the secondary data objects, if they arrive before the corresponding primary has been computed |
| `protected java.util.concurrent.atomic.AtomicInteger` | `m_setCount` Keep track of completed primary/secondary pairs |
| Constructor and Description |
|---|
| `PairedDataHelper(Step owner, PairedDataHelper.PairedProcessor processor, java.lang.String primaryConType, java.lang.String secondaryConType)` Constructor |
| Modifier and Type | Method and Description |
|---|---|
| `void` | `addIndexedValueToNamedStore(java.lang.String storeName, java.lang.Integer index, java.lang.Object value)` Adds a value to a named store with the given index. |
| `void` | `createNamedIndexedStore(java.lang.String name)` Create an indexed store with a given name |
| `P` | `getIndexedPrimaryResult(int index)` Retrieve the primary result corresponding to a given set number |
| `<T> T` | `getIndexedValueFromNamedStore(java.lang.String storeName, java.lang.Integer index)` Gets an indexed value from a named store |
| `boolean` | `isFinished()` Return true if there is no further processing to be done |
| `void` | `process(Data data)` Initiate routing and processing for a particular data object |
| `void` | `reset()` Reset the helper. |
protected java.util.Map<java.lang.String,java.util.Map<java.lang.Integer,java.lang.Object>> m_namedIndexedStore
protected java.util.Map<java.lang.Integer,P> m_primaryResultMap
protected java.util.Map<java.lang.Integer,Data> m_secondaryDataMap
protected java.lang.String m_primaryConType
protected java.lang.String m_secondaryConType
protected transient PairedDataHelper.PairedProcessor m_processor
protected transient Step m_ownerStep
protected transient java.util.concurrent.atomic.AtomicInteger m_setCount
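Because the class implements java.io.Serializable while m_processor, m_ownerStep and m_setCount are declared transient, standard Java serialization will not carry those three fields across a serialize/deserialize round trip. The sketch below demonstrates that general Java behaviour on a hypothetical class (`TransientSketch`, with made-up field names and values); it says nothing about how or when the helper itself re-establishes these fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical class illustrating transient fields; not part of Weka. */
class TransientSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  // ordinary field: written to and restored from the serialized form
  String conType = "somePrimaryConnection"; // hypothetical value

  // transient field: skipped by serialization, null after deserialization
  transient AtomicInteger setCount = new AtomicInteger(3);

  /** Serializes the object to memory and reads it back. */
  static TransientSketch roundTrip(TransientSketch in) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(in);
      }
      try (ObjectInputStream ois =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (TransientSketch) ois.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      // wrapped so that callers need not declare checked exceptions
      throw new IllegalStateException(e);
    }
  }
}
```

After `roundTrip`, the copy's `conType` is intact while `setCount` is null, since field initializers are not re-run when a Serializable object is deserialized. Code that relies on transient state therefore has to rebuild it after loading.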
public PairedDataHelper(Step owner, PairedDataHelper.PairedProcessor processor, java.lang.String primaryConType, java.lang.String secondaryConType)
owner - the owner step
processor - the PairedProcessor implementation
primaryConType - the primary connection type
secondaryConType - the secondary connection type

public void process(Data data) throws WekaException
data - the data object to process
WekaException - if a problem occurs

public P getIndexedPrimaryResult(int index)
index - the set number of the result to get

public void reset()

public boolean isFinished()

public void createNamedIndexedStore(java.lang.String name)
name - the name of the store to create

public <T> T getIndexedValueFromNamedStore(java.lang.String storeName, java.lang.Integer index)
T - the type of the value
storeName - the name of the store to retrieve from
index - the index of the value to get

public void addIndexedValueToNamedStore(java.lang.String storeName, java.lang.Integer index, java.lang.Object value)
storeName - the name of the store to add to
index - the index to associate with the value
value - the value to store
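
The three named-store methods operate on a map of maps, as the declared type of m_namedIndexedStore suggests: store name, then index, then value. The following stand-alone sketch mirrors those method names on a hypothetical class (`NamedIndexedStoreSketch`) to show the data structure. Creating a missing store on demand in `addIndexedValueToNamedStore`, returning null for a missing store or index, and the use of ConcurrentHashMap are assumptions of this sketch, not documented behaviour of the helper.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Stand-alone sketch of a named, indexed store; NOT Weka's implementation. */
class NamedIndexedStoreSketch {

  // store name -> (index -> value), same shape as m_namedIndexedStore
  private final Map<String, Map<Integer, Object>> stores = new ConcurrentHashMap<>();

  /** Creates an empty store unless one with this name already exists. */
  void createNamedIndexedStore(String name) {
    stores.putIfAbsent(name, new ConcurrentHashMap<Integer, Object>());
  }

  /** Adds a value under the given index (assumption: store is created on demand). */
  void addIndexedValueToNamedStore(String storeName, Integer index, Object value) {
    stores.computeIfAbsent(storeName, k -> new ConcurrentHashMap<Integer, Object>())
      .put(index, value);
  }

  /** Returns the stored value cast to T, or null if store or index is missing. */
  @SuppressWarnings("unchecked")
  <T> T getIndexedValueFromNamedStore(String storeName, Integer index) {
    Map<Integer, Object> store = stores.get(storeName);
    return store == null ? null : (T) store.get(index);
  }
}
```

A processPrimary() implementation could use such a store to keep, per set number, extra artefacts that processSecondary() later needs alongside the primary result. The generic return type spares the caller a cast, at the cost of an unchecked conversion if the requested type does not match what was stored.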