public class InterquartileRange extends SimpleBatchFilter implements WeightedAttributesHandler
-D Turns on output of debugging information.
-R <col1,col2-col4,...> Specifies the list of columns on which to base outlier/extreme value detection. If an instance is considered an outlier/extreme value in at least one of those attributes, it is tagged accordingly. 'first' and 'last' are valid indexes. (default: none)
-O <num> The factor for outlier detection. (default: 3)
-E <num> The factor for extreme values detection. (default: 2*Outlier Factor)
-E-as-O Tags extreme values also as outliers. (default: off)
-P Generates Outlier/ExtremeValue pair for each numeric attribute in the range, not just a single indicator pair for all the attributes. (default: off)
-M Generates an additional attribute 'Offset' per Outlier/ExtremeValue pair that contains the multiplier by which the value is off the median: value = median + 'multiplier' * IQR. Note: implicitly sets '-P'. (default: off)
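
As a rough illustration of how the -O and -E factors translate into thresholds, the sketch below computes Q1, Q3, and the IQR for one numeric column and derives the four thresholds Q1 - EVF*IQR, Q1 - OF*IQR, Q3 + OF*IQR, and Q3 + EVF*IQR. The class `IqrThresholdSketch` and its methods are hypothetical helpers, not part of Weka, and the quartile convention used here (median of the lower and upper halves) is an assumption; the filter's own quartile computation may differ.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Weka: derives IQR-based thresholds for one column. */
class IqrThresholdSketch {

    /** Median of sorted[from..to), with 'to' exclusive. */
    static double median(double[] sorted, int from, int to) {
        int n = to - from;
        int mid = from + n / 2;
        return (n % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /**
     * Returns {lowerExtreme, lowerOutlier, upperOutlier, upperExtreme}.
     * ASSUMPTION: Q1 and Q3 are the medians of the lower and upper halves.
     */
    static double[] thresholds(double[] values, double of, double evf) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double[] s = values.clone();
        Arrays.sort(s);
        int n = s.length;
        double q1 = median(s, 0, n / 2);        // lower half (middle element skipped if n is odd)
        double q3 = median(s, (n + 1) / 2, n);  // upper half
        double iqr = q3 - q1;
        return new double[] {
            q1 - evf * iqr,  // lower extreme value threshold
            q1 - of * iqr,   // lower outlier threshold
            q3 + of * iqr,   // upper outlier threshold
            q3 + evf * iqr   // upper extreme value threshold
        };
    }

    public static void main(String[] args) {
        double of = 3.0;       // -O default
        double evf = 2 * of;   // -E default: 2 * outlier factor
        double[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        // Q1 = 2.5, Q3 = 6.5, IQR = 4.0
        System.out.println(Arrays.toString(thresholds(data, of, evf)));
        // prints [-21.5, -9.5, 18.5, 30.5]
    }
}
```

With the default factors, the extreme value thresholds sit twice as far from the quartiles as the outlier thresholds.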
Thanks to Dale for a few brainstorming sessions.

| Modifier and Type | Class and Description |
|---|---|
| static class | InterquartileRange.ValueType: enum for obtaining the various determined IQR values. |

| Modifier and Type | Field and Description |
|---|---|
| protected int[] | m_AttributeIndices: the generated indices (only for performance reasons) |
| protected Range | m_Attributes: the attribute range to work on |
| protected boolean | m_DetectionPerAttribute: whether to generate Outlier/ExtremeValue attributes for each attribute instead of a general one |
| protected boolean | m_ExtremeValuesAsOutliers: whether extreme values are also tagged as outliers |
| protected double | m_ExtremeValuesFactor: the factor for detecting extreme values, by default 2*m_OutlierFactor |
| protected double[] | m_IQR: the interquartile range |
| protected double[] | m_LowerExtremeValue: the lower extreme value threshold (= Q1 - EVF*IQR) |
| protected double[] | m_LowerOutlier: the lower outlier threshold (= Q1 - OF*IQR) |
| protected double[] | m_Median: the median |
| protected int[] | m_OutlierAttributePosition: the position of the outlier attribute |
| protected double | m_OutlierFactor: the factor for detecting outliers |
| protected boolean | m_OutputOffsetMultiplier: whether to add another attribute called "Offset" that lists the 'multiplier' by which the outlier/extreme value is away from the median, i.e., value = median + 'multiplier' * IQR; automatically enables m_DetectionPerAttribute |
| protected double[] | m_UpperExtremeValue: the upper extreme value threshold (= Q3 + EVF*IQR) |
| protected double[] | m_UpperOutlier: the upper outlier threshold (= Q3 + OF*IQR) |
| static int | NON_NUMERIC: indicator for non-numeric attributes |

Fields inherited from class Filter: m_Debug, m_DoNotCheckCapabilities, m_FirstBatchDone, m_InputRelAtts, m_InputStringAtts, m_NewBatch, m_OutputRelAtts, m_OutputStringAtts

| Constructor and Description |
|---|
| InterquartileRange() |

| Modifier and Type | Method and Description |
|---|---|
| java.lang.String | attributeIndicesTipText(): Returns the tip text for this property. |
| protected double | calculateMultiplier(Instance inst, int index): Returns the multiplier of the IQR by which the instance is off the median for this particular attribute. |
| protected void | computeThresholds(Instances instances): Computes the thresholds for outliers and extreme values. |
| java.lang.String | detectionPerAttributeTipText(): Returns the tip text for this property. |
| protected Instances | determineOutputFormat(Instances inputFormat): Determines the output format based on the input format and returns it. |
| java.lang.String | extremeValuesAsOutliersTipText(): Returns the tip text for this property. |
| java.lang.String | extremeValuesFactorTipText(): Returns the tip text for this property. |
| java.lang.String | getAttributeIndices(): Gets the current range selection. |
| Capabilities | getCapabilities(): Returns the Capabilities of this filter. |
| boolean | getDetectionPerAttribute(): Gets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false"). |
| boolean | getExtremeValuesAsOutliers(): Gets whether extreme values are also tagged as outliers. |
| double | getExtremeValuesFactor(): Gets the factor for determining the thresholds for extreme values. |
| java.lang.String[] | getOptions(): Gets the current settings of the filter. |
| double | getOutlierFactor(): Gets the factor for determining the thresholds for outliers. |
| boolean | getOutputOffsetMultiplier(): Gets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier by which the value is off the median: value = median + 'multiplier' * IQR. |
| java.lang.String | getRevision(): Returns the revision string. |
| double[] | getValues(InterquartileRange.ValueType type): Returns the values for the specified type. |
| java.lang.String | globalInfo(): Returns a string describing this filter. |
| protected boolean | isExtremeValue(Instance inst): Returns whether the instance is an extreme value or not. |
| protected boolean | isExtremeValue(Instance inst, int index): Returns whether the instance has an extreme value in the specified attribute or not. |
| protected boolean | isOutlier(Instance inst): Returns whether the instance is an outlier or not. |
| protected boolean | isOutlier(Instance inst, int index): Returns whether the instance has an outlier in the specified attribute or not. |
| java.util.Enumeration<Option> | listOptions(): Returns an enumeration describing the available options. |
| static void | main(java.lang.String[] args): Main method for testing this class. |
| java.lang.String | outlierFactorTipText(): Returns the tip text for this property. |
| java.lang.String | outputOffsetMultiplierTipText(): Returns the tip text for this property. |
| protected Instances | process(Instances instances): Processes the given data (may change the provided dataset) and returns the modified version. |
| void | setAttributeIndices(java.lang.String value): Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used). |
| void | setAttributeIndicesArray(int[] value): Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used). |
| void | setDetectionPerAttribute(boolean value): Sets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false"). |
| void | setExtremeValuesAsOutliers(boolean value): Sets whether extreme values are also tagged as outliers. |
| void | setExtremeValuesFactor(double value): Sets the factor for determining the thresholds for extreme values. |
| void | setOptions(java.lang.String[] options): Parses a list of options for this object. |
| void | setOutlierFactor(double value): Sets the factor for determining the thresholds for outliers. |
| void | setOutputOffsetMultiplier(boolean value): Sets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier by which the value is off the median: value = median + 'multiplier' * IQR. |

Methods inherited from class SimpleBatchFilter: allowAccessToFullInputFormat, batchFinished, hasImmediateOutputFormat, input
Methods inherited from class SimpleFilter: reset, setInputFormat
Methods inherited from class Filter: batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper

public static final int NON_NUMERIC
protected Range m_Attributes
protected int[] m_AttributeIndices
protected double m_OutlierFactor
protected double m_ExtremeValuesFactor
protected boolean m_ExtremeValuesAsOutliers
protected double[] m_UpperExtremeValue
protected double[] m_UpperOutlier
protected double[] m_LowerOutlier
protected double[] m_IQR
protected double[] m_Median
protected double[] m_LowerExtremeValue
protected boolean m_DetectionPerAttribute
protected int[] m_OutlierAttributePosition
protected boolean m_OutputOffsetMultiplier
public java.lang.String globalInfo()
globalInfo in class SimpleFilter
public java.util.Enumeration<Option> listOptions()
listOptions in interface OptionHandler
listOptions in class Filter
public void setOptions(java.lang.String[] options) throws java.lang.Exception
-D Turns on output of debugging information.
-R <col1,col2-col4,...> Specifies the list of columns on which to base outlier/extreme value detection. If an instance is considered an outlier/extreme value in at least one of those attributes, it is tagged accordingly. 'first' and 'last' are valid indexes. (default: none)
-O <num> The factor for outlier detection. (default: 3)
-E <num> The factor for extreme values detection. (default: 2*Outlier Factor)
-E-as-O Tags extreme values also as outliers. (default: off)
-P Generates Outlier/ExtremeValue pair for each numeric attribute in the range, not just a single indicator pair for all the attributes. (default: off)
-M Generates an additional attribute 'Offset' per Outlier/ExtremeValue pair that contains the multiplier by which the value is off the median: value = median + 'multiplier' * IQR. Note: implicitly sets '-P'. (default: off)
setOptions in interface OptionHandler
setOptions in class Filter
options - the list of options as an array of strings
java.lang.Exception - if an option is not supported
public java.lang.String[] getOptions()
getOptions in interface OptionHandler
getOptions in class Filter
public java.lang.String attributeIndicesTipText()
public java.lang.String getAttributeIndices()
public void setAttributeIndices(java.lang.String value)
value - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1.
java.lang.IllegalArgumentException - if an invalid range list is supplied
public void setAttributeIndicesArray(int[] value)
value - an array containing indexes of attributes to work on. Since the array will typically come from a program, attributes are indexed from 0.
java.lang.IllegalArgumentException - if an invalid set of ranges is supplied
public java.lang.String outlierFactorTipText()
public void setOutlierFactor(double value)
value - the factor.
public double getOutlierFactor()
public java.lang.String extremeValuesFactorTipText()
public void setExtremeValuesFactor(double value)
value - the factor.
public double getExtremeValuesFactor()
public java.lang.String extremeValuesAsOutliersTipText()
public void setExtremeValuesAsOutliers(boolean value)
value - whether or not to tag extreme values also as outliers.
public boolean getExtremeValuesAsOutliers()
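
To make the effect of this flag concrete, here is a minimal sketch of tagging a single value against the four thresholds. `OutlierTagSketch` is a hypothetical class, not the filter's implementation. It assumes strict comparisons at the thresholds and that, with the flag off, a value beyond the extreme value threshold is tagged as an extreme value only.

```java
/** Hypothetical helper, not part of Weka: tags one value against IQR thresholds. */
class OutlierTagSketch {
    final double lowerExtreme;
    final double lowerOutlier;
    final double upperOutlier;
    final double upperExtreme;
    final boolean extremeValuesAsOutliers;

    OutlierTagSketch(double q1, double q3, double of, double evf,
                     boolean extremeValuesAsOutliers) {
        double iqr = q3 - q1;
        this.lowerExtreme = q1 - evf * iqr;
        this.lowerOutlier = q1 - of * iqr;
        this.upperOutlier = q3 + of * iqr;
        this.upperExtreme = q3 + evf * iqr;
        this.extremeValuesAsOutliers = extremeValuesAsOutliers;
    }

    /** ASSUMPTION: strict inequality at the threshold. */
    boolean isExtremeValue(double v) {
        return v < lowerExtreme || v > upperExtreme;
    }

    /** Extreme values count as outliers only when the flag is set. */
    boolean isOutlier(double v) {
        boolean beyondOutlier = v < lowerOutlier || v > upperOutlier;
        return beyondOutlier && (extremeValuesAsOutliers || !isExtremeValue(v));
    }

    public static void main(String[] args) {
        // Q1 = 10, Q3 = 20, so IQR = 10; thresholds are -50, -20, 50, 80.
        OutlierTagSketch off = new OutlierTagSketch(10, 20, 3, 6, false);
        OutlierTagSketch on = new OutlierTagSketch(10, 20, 3, 6, true);
        System.out.println(off.isOutlier(60) + " " + off.isExtremeValue(60));   // true false
        System.out.println(off.isOutlier(100) + " " + off.isExtremeValue(100)); // false true
        System.out.println(on.isOutlier(100) + " " + on.isExtremeValue(100));   // true true
    }
}
```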
public java.lang.String detectionPerAttributeTipText()
public void setDetectionPerAttribute(boolean value)
value - whether or not to generate indicator attribute pairs for each numeric attribute.
public boolean getDetectionPerAttribute()
public java.lang.String outputOffsetMultiplierTipText()
public void setOutputOffsetMultiplier(boolean value)
value - whether or not to generate the additional attribute.
public boolean getOutputOffsetMultiplier()
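
The 'Offset' value follows directly from the relation value = median + 'multiplier' * IQR: solving for the multiplier gives (value - median) / IQR. The sketch below shows that arithmetic only. `OffsetMultiplierSketch` is a hypothetical helper, and how the filter itself treats an IQR of zero is not covered here.

```java
/** Hypothetical helper, not part of Weka: solves value = median + multiplier * IQR. */
class OffsetMultiplierSketch {

    /** Returns the number of IQRs by which the value lies above (+) or below (-) the median. */
    static double multiplier(double value, double median, double iqr) {
        // ASSUMPTION: iqr is non-zero; a zero IQR would yield Infinity or NaN here.
        return (value - median) / iqr;
    }

    public static void main(String[] args) {
        double median = 15.0;
        double iqr = 10.0;
        double m = multiplier(60.0, median, iqr);
        System.out.println(m);                // prints 4.5
        System.out.println(median + m * iqr); // prints 60.0
    }
}
```

A negative multiplier indicates a value below the median.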
public Capabilities getCapabilities()
getCapabilities in interface CapabilitiesHandler
getCapabilities in class Filter
See also: Capabilities
protected Instances determineOutputFormat(Instances inputFormat) throws java.lang.Exception
determineOutputFormat in class SimpleFilter
inputFormat - the input format to base the output format on
java.lang.Exception - in case the determination goes wrong
See also: SimpleBatchFilter.hasImmediateOutputFormat(), SimpleBatchFilter.batchFinished()
protected void computeThresholds(Instances instances)
instances - the data to work on
public double[] getValues(InterquartileRange.ValueType type)
type - the type of values to return
protected boolean isOutlier(Instance inst, int index)
inst - the instance to test
index - the attribute index
protected boolean isOutlier(Instance inst)
inst - the instance to test
protected boolean isExtremeValue(Instance inst, int index)
inst - the instance to test
index - the attribute index
protected boolean isExtremeValue(Instance inst)
inst - the instance to test
protected double calculateMultiplier(Instance inst, int index)
inst - the instance to test
index - the attribute index
protected Instances process(Instances instances) throws java.lang.Exception
process in class SimpleFilter
instances - the data to process
java.lang.Exception - in case the processing goes wrong
See also: SimpleBatchFilter.batchFinished()
public java.lang.String getRevision()
getRevision in interface RevisionHandler
getRevision in class Filter
public static void main(java.lang.String[] args)
args - should contain arguments to the filter: use -h for help