@KFStep(name="SubstringLabeler", category="Tools", toolTipText="Label instances according to substring matches in String attributes The user can specify the attributes to match against and associated label to create by defining \'match\' rules. A new attribute is appended to the data to contain the label. Rules are applied in order when processing instances, and the label associated with the first matching rule is applied. Non-matching instances can either receive a missing value for the label attribute or be \'consumed\' (i.e. they are not output).", iconPath="weka/gui/knowledgeflow/icons/DefaultFilter.gif")
public class SubstringLabeler extends BaseStep
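
The rule semantics described in the tooltip text (rules are tried in order, the first matching rule supplies the label, and non-matching instances either receive a missing value or are dropped) can be sketched in plain Java. This is a minimal illustration that does not use any Weka classes: `FirstMatchSketch`, `Rule`, `label`, and `labelAll` are hypothetical names, plain substring containment is assumed as the match test, and `null` stands in for a missing value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only: not the Weka implementation.
final class FirstMatchSketch {

  /** Hypothetical rule: a substring to look for and the label to assign. */
  static final class Rule {
    final String substring;
    final String label;

    Rule(String substring, String label) {
      this.substring = substring;
      this.label = label;
    }
  }

  /** Returns the label of the first rule whose substring occurs in text, or null. */
  static String label(List<Rule> rules, String text) {
    for (Rule r : rules) {
      if (text.contains(r.substring)) {
        return r.label; // first matching rule wins
      }
    }
    return null; // stands in for a missing value
  }

  /**
   * Labels every text. Non-matching texts are dropped when consumeNonMatching
   * is true; otherwise they are kept with a null label.
   */
  static List<String[]> labelAll(List<Rule> rules, List<String> texts,
      boolean consumeNonMatching) {
    List<String[]> out = new ArrayList<>();
    for (String t : texts) {
      String l = label(rules, t);
      if (l == null && consumeNonMatching) {
        continue; // "consumed": not output at all
      }
      out.add(new String[] { t, l });
    }
    return out;
  }

  public static void main(String[] args) {
    List<Rule> rules =
        Arrays.asList(new Rule("error", "problem"), new Rule("err", "maybe"));
    List<String> texts = Arrays.asList("disk error", "stderr output", "all good");
    for (String[] row : labelAll(rules, texts, false)) {
      System.out.println(row[0] + " -> " + row[1]);
    }
  }
}
```

Because rules are checked in order, placing the more specific rule ("error") before the more general one ("err") is what keeps the general rule from shadowing it.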

| Modifier and Type | Field and Description |
|---|---|
| protected Add | m_addFilter: Add filter for adding the new attribute |
| protected java.lang.String | m_attName: Name of the new attribute |
| protected boolean | m_consumeNonMatchingInstances: For multi-valued labeled rules, whether or not to consume non-matching instances or output them with missing value for the match attribute. |
| protected boolean | m_isReset: Step has been reset - i.e. start of processing? |
| protected java.lang.String | m_matchDetails: Internally encoded list of match rules |
| protected SubstringLabelerRules | m_matches: Encapsulates our match rules |
| protected boolean | m_nominalBinary: Whether to make the binary match/non-match attribute a nominal (rather than numeric) binary attribute. |
| protected boolean | m_streaming: Streaming instances? |
| protected Data | m_streamingData: Reusable data object for output |

Fields inherited from class BaseStep: m_stepIsResourceIntensive, m_stepManager, m_stepName

| Constructor and Description |
|---|
| SubstringLabeler() |

| Modifier and Type | Method and Description |
|---|---|
| boolean | getConsumeNonMatching(): Get whether instances that do not match any of the rules should be "consumed" rather than output with a missing value set for the new attribute. |
| java.lang.String | getCustomEditorForStep(): Return the fully qualified name of a custom editor component (JComponent) to use for editing the properties of the step. |
| java.util.List<java.lang.String> | getIncomingConnectionTypes(): Get a list of incoming connection types that this step can accept. |
| java.lang.String | getMatchAttributeName(): Get the name of the new attribute that is created to indicate the match |
| java.lang.String | getMatchDetails(): Get the internally encoded list of match rules |
| boolean | getNominalBinary(): Get whether the new attribute created should be a nominal binary attribute rather than a numeric binary attribute. |
| java.util.List<java.lang.String> | getOutgoingConnectionTypes(): Get a list of outgoing connection types that this step can produce. |
| Instances | outputStructureForConnectionType(java.lang.String connectionName): If possible, get the output structure for the named connection type as a header-only set of instances. |
| protected void | processBatch(Data data): Process a batch data object |
| void | processIncoming(Data data): Process an incoming data payload (if the step accepts incoming connections) |
| protected void | processStreaming(Data data): Processes a streaming data object |
| void | setConsumeNonMatching(boolean consume): Set whether instances that do not match any of the rules should be "consumed" rather than output with a missing value set for the new attribute. |
| void | setMatchAttributeName(java.lang.String name): Set the name of the new attribute that is created to indicate the match |
| void | setMatchDetails(java.lang.String details): Set internally encoded list of match rules |
| void | setNominalBinary(boolean nom): Set whether the new attribute created should be a nominal binary attribute rather than a numeric binary attribute. |
| void | stepInit(): Initialize the step |

Methods inherited from class BaseStep: environmentSubstitute, getDefaultSettings, getInteractiveViewers, getInteractiveViewersImpls, getName, getStepManager, globalInfo, isResourceIntensive, isStopRequested, setName, setStepIsResourceIntensive, setStepManager, setStepMustRunSingleThreaded, start, stepMustRunSingleThreaded, stop

protected java.lang.String m_matchDetails
protected transient SubstringLabelerRules m_matches
protected boolean m_nominalBinary
protected boolean m_consumeNonMatchingInstances
protected Add m_addFilter
protected java.lang.String m_attName
protected boolean m_isReset
protected Data m_streamingData
protected boolean m_streaming
@ProgrammaticProperty
public void setMatchDetails(java.lang.String details)
Parameters: details - the list of match rules
public java.lang.String getMatchDetails()
@OptionMetadata(displayName="Make a nominal binary attribute", description="Whether to encode the new attribute as nominal when it is binary (as opposed to numeric)", displayOrder=1)
public void setNominalBinary(boolean nom)
Parameters: nom - true if the attribute should be a nominal binary one
public boolean getNominalBinary()
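
The nominal-binary option only affects how a binary match/non-match attribute is encoded. The following sketch shows the two encodings side by side in plain Java. It uses no Weka classes; `BinaryEncodingSketch` and its methods are hypothetical, and the concrete values (1.0/0.0 for numeric, the labels "1"/"0" for nominal) are assumptions chosen for illustration rather than values confirmed by this page.

```java
// Illustrative model only: plain Java, no Weka classes.
final class BinaryEncodingSketch {

  /** Numeric encoding of a match flag (values assumed for illustration). */
  static double asNumeric(boolean matched) {
    return matched ? 1.0 : 0.0;
  }

  /** Nominal encoding of a match flag (the labels "1" and "0" are assumed). */
  static String asNominal(boolean matched) {
    return matched ? "1" : "0";
  }

  /** Picks the encoding according to the nominalBinary option. */
  static Object encode(boolean matched, boolean nominalBinary) {
    if (nominalBinary) {
      return asNominal(matched);
    }
    return Double.valueOf(asNumeric(matched));
  }

  public static void main(String[] args) {
    System.out.println(encode(true, true));
    System.out.println(encode(true, false));
  }
}
```

A nominal encoding is typically the better fit when the new attribute will serve as a class label, while a numeric one can be fed directly into arithmetic.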
@OptionMetadata(displayName="Consume non matching instances", description="Instances that do not match any rules will be consumed, rather than being output with a missing value for the new attribute", displayOrder=2)
public void setConsumeNonMatching(boolean consume)
Parameters: consume - true if non-matching instances should be consumed by the component.
public boolean getConsumeNonMatching()
@OptionMetadata(displayName="Name of the new attribute", description="Name to give the new attribute", displayOrder=0)
public void setMatchAttributeName(java.lang.String name)
Parameters: name - the name of the new attribute
public java.lang.String getMatchAttributeName()
public void stepInit() throws WekaException
Throws: WekaException - if a problem occurs
public java.util.List<java.lang.String> getIncomingConnectionTypes()
public java.util.List<java.lang.String> getOutgoingConnectionTypes()
public void processIncoming(Data data) throws WekaException
Specified by: processIncoming in interface BaseStepExtender
Specified by: processIncoming in interface Step
Overrides: processIncoming in class BaseStep
Parameters: data - the data to process
Throws: WekaException - if a problem occurs
protected void processStreaming(Data data) throws WekaException
Parameters: data - the data to process
Throws: WekaException - if a problem occurs
protected void processBatch(Data data) throws WekaException
Parameters: data - the data to process
Throws: WekaException - if a problem occurs
public Instances outputStructureForConnectionType(java.lang.String connectionName) throws WekaException
Specified by: outputStructureForConnectionType in interface Step
Overrides: outputStructureForConnectionType in class BaseStep
Parameters: connectionName - the name of the connection type to get the output structure for
Throws: WekaException - if a problem occurs
public java.lang.String getCustomEditorForStep()
Specified by: getCustomEditorForStep in interface Step
Overrides: getCustomEditorForStep in class BaseStep
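
The m_isReset and m_streaming fields, together with processIncoming, processStreaming, and processBatch, suggest a pattern in which the first payload after a reset decides the processing mode and later payloads are dispatched accordingly. The sketch below models that pattern in plain Java under stated assumptions: it uses no Weka classes, `DispatchSketch` and its string payloads are hypothetical, and treating a connection named "instance" as the streaming case is an assumption rather than something this page states.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: plain Java, no Weka classes.
final class DispatchSketch {
  private boolean isReset = true; // plays the role of m_isReset
  private boolean streaming;      // plays the role of m_streaming
  final List<String> log = new ArrayList<>();

  /** Called at the start of a run, in the spirit of stepInit(). */
  void init() {
    isReset = true;
    streaming = false;
  }

  /** Hypothetical entry point; "instance" is assumed to mean streaming. */
  void processIncoming(String connectionName, String payload) {
    if (isReset) {
      // First payload after a reset: decide the mode exactly once.
      streaming = "instance".equals(connectionName);
      isReset = false;
      log.add("setup streaming=" + streaming);
    }
    if (streaming) {
      processStreaming(payload);
    } else {
      processBatch(payload);
    }
  }

  void processStreaming(String payload) {
    log.add("stream:" + payload);
  }

  void processBatch(String payload) {
    log.add("batch:" + payload);
  }

  public static void main(String[] args) {
    DispatchSketch step = new DispatchSketch();
    step.processIncoming("instance", "row1");
    step.processIncoming("instance", "row2");
    System.out.println(step.log);
  }
}
```

Deferring the setup to the first payload lets a step configure itself from the actual incoming data structure instead of requiring it up front.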