public class Discretize extends Filter implements SupervisedFilter, OptionHandler, WeightedInstancesHandler, WeightedAttributesHandler, TechnicalInformationHandler
@inproceedings{Fayyad1993,
author = {Usama M. Fayyad and Keki B. Irani},
booktitle = {Thirteenth International Joint Conference on Artificial Intelligence},
pages = {1022-1027},
publisher = {Morgan Kaufmann Publishers},
title = {Multi-interval discretization of continuous-valued attributes for classification learning},
volume = {2},
year = {1993}
}
@inproceedings{Kononenko1995,
author = {Igor Kononenko},
booktitle = {14th International Joint Conference on Artificial Intelligence},
pages = {1034-1040},
title = {On Biases in Estimating Multi-Valued Attributes},
year = {1995},
PS = {http://ai.fri.uni-lj.si/papers/kononenko95-ijcai.ps.gz}
}
Valid options are:
-R <col1,col2-col4,...> Specifies list of columns to Discretize. First and last are valid indexes. (default none)
-V Invert matching sense of column indexes.
-D Output binary attributes for discretized attributes.
-Y Use bin numbers rather than ranges for discretized attributes.
-E Use better encoding of split point for MDL.
-K Use Kononenko's MDL criterion.
-precision <integer> Precision for bin boundary labels. (default = 6 decimal places).
-spread-attribute-weight When generating binary attributes, spread weight of old attribute across new attributes. Do not give each new attribute the old weight.
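The -R option takes 1-based column indexes, with first and last accepted as aliases, and -V inverts the matching sense. The Java sketch below illustrates those semantics on a simplified grammar (comma-separated indexes or a-b spans). It is a hypothetical stand-in written for illustration, not Weka's Range class; the names RangeListSketch, resolve and select are invented here.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of 1-based range-list selection in the style of -R/-V.
 * Not weka.core.Range; supports only "n", "a-b", "first" and "last".
 */
public class RangeListSketch {

    /** Resolves one token ("3", "first", "last") to a 0-based index. */
    static int resolve(String token, int numAttributes) {
        String t = token.trim();
        if (t.equalsIgnoreCase("first")) return 0;
        if (t.equalsIgnoreCase("last")) return numAttributes - 1;
        int oneBased = Integer.parseInt(t);
        if (oneBased < 1 || oneBased > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + t);
        }
        return oneBased - 1; // user-facing indexes start at 1
    }

    /** Returns a mask where mask[i] is true if attribute i (0-based) is selected. */
    static boolean[] select(String rangeList, int numAttributes, boolean invert) {
        boolean[] mask = new boolean[numAttributes];
        for (String part : rangeList.split(",")) {
            if (part.trim().isEmpty()) continue;
            String[] ends = part.split("-", 2);
            int lo = resolve(ends[0], numAttributes);
            int hi = ends.length == 2 ? resolve(ends[1], numAttributes) : lo;
            for (int i = lo; i <= hi; i++) mask[i] = true;
        }
        if (invert) { // -V: flip the matching sense of the selection
            for (int i = 0; i < numAttributes; i++) mask[i] = !mask[i];
        }
        return mask;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(select("first,3-last", 5, false)));
        // [true, false, true, true, true]
        System.out.println(Arrays.toString(select("first,3-last", 5, true)));
        // [false, true, false, false, false]
    }
}
```

The sketch ignores attribute types; per the setAttributeIndices entry, the filter itself only discretizes the numeric attributes among the selection.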
| Modifier and Type | Field and Description |
|---|---|
| protected int | m_BinRangePrecision: Precision for bin range labels. |
| protected double[][] | m_CutPoints: Stores the current cut points. |
| protected Range | m_DiscretizeCols: Stores which columns to discretize. |
| protected boolean | m_MakeBinary: Output binary attributes for discretized attributes. |
| protected boolean | m_SpreadAttributeWeight: Whether to spread attribute weight when creating binary attributes. |
| protected boolean | m_UseBetterEncoding: Use better encoding of split point for MDL. |
| protected boolean | m_UseBinNumbers: Use bin numbers rather than ranges for discretized attributes. |
| protected boolean | m_UseKononenko: Use Kononenko's MDL criterion instead of Fayyad and Irani's. |
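The m_CutPoints field holds one double[] of cut points per attribute. A minimal sketch of how such an array can map a numeric value to a bin index follows. It assumes the cut points are sorted ascending, that bins are right-closed intervals (-inf, c0], (c0, c1], ..., (c(k-1), inf), and that a null entry means the attribute is not discretized; these are illustrative assumptions, not a description of Weka's internals. CutPointLookupSketch and binIndex are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical sketch: map values to bins given per-attribute cut points. */
public class CutPointLookupSketch {

    /**
     * Returns the 0-based bin of value. Assumes cutPoints is sorted ascending
     * and bins are right-closed; null or empty means a single bin (index 0).
     */
    static int binIndex(double[] cutPoints, double value) {
        if (cutPoints == null) {
            return 0;
        }
        int pos = Arrays.binarySearch(cutPoints, value);
        // An exact hit belongs to the bin that this cut point closes;
        // otherwise the insertion point is the bin index.
        return pos >= 0 ? pos : -(pos + 1);
    }

    public static void main(String[] args) {
        // One array per attribute, as in a double[][]; null = not discretized.
        double[][] cutPoints = { {2.45, 4.75}, null };
        System.out.println(binIndex(cutPoints[0], 1.0));  // 0
        System.out.println(binIndex(cutPoints[0], 2.45)); // 0
        System.out.println(binIndex(cutPoints[0], 3.0));  // 1
        System.out.println(binIndex(cutPoints[0], 9.9));  // 2
        System.out.println(binIndex(cutPoints[1], 9.9));  // 0
    }
}
```

Binary search keeps the lookup logarithmic in the number of cut points, which only matters when an attribute ends up with many bins.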
Fields inherited from class Filter: m_Debug, m_DoNotCheckCapabilities, m_FirstBatchDone, m_InputRelAtts, m_InputStringAtts, m_NewBatch, m_OutputRelAtts, m_OutputStringAtts

| Constructor and Description |
|---|
| Discretize(): Constructs the filter and initialises it. |
| Modifier and Type | Method and Description |
|---|---|
| java.lang.String | attributeIndicesTipText(): Returns the tip text for this property. |
| boolean | batchFinished(): Signifies that this batch of input to the filter is finished. |
| java.lang.String | binRangePrecisionTipText(): Returns the tip text for this property. |
| protected void | calculateCutPoints(): Generates the cut points for each attribute. |
| protected void | calculateCutPointsByMDL(int index, Instances data): Sets the cut points for a single attribute using MDL. |
| protected void | convertInstance(Instance instance): Converts a single instance. |
| java.lang.String | getAttributeIndices(): Gets the current range selection. |
| int | getBinRangePrecision(): Gets the precision for bin boundaries. |
| java.lang.String | getBinRangesString(int attributeIndex): Gets the bin ranges string for an attribute. |
| Capabilities | getCapabilities(): Returns the Capabilities of this filter. |
| double[] | getCutPoints(int attributeIndex): Gets the cut points for an attribute. |
| boolean | getInvertSelection(): Gets whether the matching sense of the column selection is inverted. |
| boolean | getMakeBinary(): Gets whether binary attributes should be made for discretized ones. |
| java.lang.String[] | getOptions(): Gets the current settings of the filter. |
| java.lang.String | getRevision(): Returns the revision string. |
| boolean | getSpreadAttributeWeight(): Gets whether, when generating binary attributes, the weight of the old attribute is spread across the new attributes. |
| TechnicalInformation | getTechnicalInformation(): Returns a TechnicalInformation object containing detailed information about the technical background of this class, e.g., the paper or book this class is based on. |
| boolean | getUseBetterEncoding(): Gets whether better encoding is to be used for MDL. |
| boolean | getUseBinNumbers(): Gets whether bin numbers rather than ranges should be used for discretized attributes. |
| boolean | getUseKononenko(): Gets whether Kononenko's MDL criterion is to be used. |
| java.lang.String | globalInfo(): Returns a string describing this filter. |
| boolean | input(Instance instance): Inputs an instance for filtering. |
| java.lang.String | invertSelectionTipText(): Returns the tip text for this property. |
| java.util.Enumeration<Option> | listOptions(): Gets an enumeration describing the available options. |
| static void | main(java.lang.String[] argv): Main method for testing this class. |
| java.lang.String | makeBinaryTipText(): Returns the tip text for this property. |
| void | setAttributeIndices(java.lang.String rangeList): Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized). |
| void | setAttributeIndicesArray(int[] attributes): Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized). |
| void | setBinRangePrecision(int p): Sets the precision for bin boundaries. |
| boolean | setInputFormat(Instances instanceInfo): Sets the format of the input instances. |
| void | setInvertSelection(boolean invert): Sets whether the matching sense of the column selection is inverted. |
| void | setMakeBinary(boolean makeBinary): Sets whether binary attributes should be made for discretized ones. |
| void | setOptions(java.lang.String[] options): Parses a given list of options. |
| protected void | setOutputFormat(): Sets the output format. |
| void | setSpreadAttributeWeight(boolean p): Sets whether, when generating binary attributes, the weight of the old attribute is spread across the new attributes. |
| void | setUseBetterEncoding(boolean useBetterEncoding): Sets whether better encoding is to be used for MDL. |
| void | setUseBinNumbers(boolean useBinNumbers): Sets whether bin numbers rather than ranges should be used for discretized attributes. |
| void | setUseKononenko(boolean useKon): Sets whether Kononenko's MDL criterion is to be used. |
| java.lang.String | spreadAttributeWeightTipText(): Returns the tip text for this property. |
| java.lang.String | useBetterEncodingTipText(): Returns the tip text for this property. |
| java.lang.String | useBinNumbersTipText(): Returns the tip text for this property. |
| java.lang.String | useKononenkoTipText(): Returns the tip text for this property. |
Methods inherited from class Filter: batchFilterFile, bufferInput, copyValues, copyValues, debugTipText, doNotCheckCapabilitiesTipText, filterFile, flushInput, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputFormatPeek, outputPeek, postExecution, preExecution, push, push, resetQueue, run, runFilter, setDebug, setDoNotCheckCapabilities, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper

protected Range m_DiscretizeCols
protected double[][] m_CutPoints
protected boolean m_MakeBinary
protected boolean m_UseBinNumbers
protected boolean m_UseBetterEncoding
protected boolean m_UseKononenko
protected int m_BinRangePrecision
protected boolean m_SpreadAttributeWeight
public java.util.Enumeration<Option> listOptions()
Specified by: listOptions in interface OptionHandler. Overrides: listOptions in class Filter.
public void setOptions(java.lang.String[] options) throws java.lang.Exception
-R <col1,col2-col4,...> Specifies list of columns to Discretize. First and last are valid indexes. (default none)
-V Invert matching sense of column indexes.
-D Output binary attributes for discretized attributes.
-Y Use bin numbers rather than ranges for discretized attributes.
-E Use better encoding of split point for MDL.
-K Use Kononenko's MDL criterion.
-precision <integer> Precision for bin boundary labels. (default = 6 decimal places).
-spread-attribute-weight When generating binary attributes, spread weight of old attribute across new attributes. Do not give each new attribute the old weight.
Specified by: setOptions in interface OptionHandler. Overrides: setOptions in class Filter.
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported
public java.lang.String[] getOptions()
Specified by: getOptions in interface OptionHandler. Overrides: getOptions in class Filter.
public Capabilities getCapabilities()
Specified by: getCapabilities in interface CapabilitiesHandler. Overrides: getCapabilities in class Filter.
See Also: Capabilities
public boolean setInputFormat(Instances instanceInfo) throws java.lang.Exception
Overrides: setInputFormat in class Filter.
Parameters: instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored; only the structure is required).
Throws: java.lang.Exception - if the input format can't be set successfully
public boolean input(Instance instance)
public boolean batchFinished() throws java.lang.InterruptedException
Overrides: batchFinished in class Filter.
Throws: java.lang.InterruptedException
Throws: java.lang.IllegalStateException - if no input structure has been defined
public java.lang.String globalInfo()
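A supervised filter needs the class labels of a whole batch before it can choose cut points. Assuming the usual Weka protocol for this case (instances passed to input() are buffered, and input() returns false, until batchFinished() ends the first batch; later instances are converted immediately), the behaviour can be sketched as below. BatchProtocolSketch is hypothetical: it works on single doubles rather than Instance objects and uses the mean of the first batch as a placeholder for the MDL cut points.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/**
 * Hypothetical sketch of the buffer-then-flush protocol of a supervised filter.
 * Assumption: the single cut point is the mean of the first batch (MDL stand-in).
 */
public class BatchProtocolSketch {
    private final List<Double> buffer = new ArrayList<>();
    private final Queue<Integer> outputQueue = new ArrayDeque<>();
    private Double cutPoint = null; // null until the first batch is finished

    /** Returns true if a converted value is immediately available via output(). */
    public boolean input(double value) {
        if (cutPoint != null) {
            outputQueue.add(convert(value));
            return true;
        }
        buffer.add(value); // first batch: hold values until cut points exist
        return false;
    }

    /** Computes the cut point on the first batch, then converts buffered values. */
    public boolean batchFinished() {
        if (cutPoint == null) {
            double sum = 0;
            for (double v : buffer) sum += v;
            cutPoint = buffer.isEmpty() ? 0.0 : sum / buffer.size();
        }
        for (double v : buffer) outputQueue.add(convert(v));
        buffer.clear();
        return !outputQueue.isEmpty();
    }

    /** Returns the next converted value, or null if none is pending. */
    public Integer output() {
        return outputQueue.poll();
    }

    private int convert(double value) {
        return value <= cutPoint ? 0 : 1; // bin 0 is right-closed at the cut point
    }

    public static void main(String[] args) {
        BatchProtocolSketch f = new BatchProtocolSketch();
        System.out.println(f.input(1.0));      // false: buffered
        System.out.println(f.input(3.0));      // false: buffered
        System.out.println(f.batchFinished()); // true: cut point 2.0 computed
        System.out.println(f.output() + " " + f.output()); // 0 1
        System.out.println(f.input(2.5));      // true: converted immediately
        System.out.println(f.output());        // 1
    }
}
```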
public TechnicalInformation getTechnicalInformation()
Specified by: getTechnicalInformation in interface TechnicalInformationHandler.
public java.lang.String spreadAttributeWeightTipText()
public void setSpreadAttributeWeight(boolean p)
Parameters: p - whether weight is spread
public boolean getSpreadAttributeWeight()
public java.lang.String binRangePrecisionTipText()
public void setBinRangePrecision(int p)
Parameters: p - the precision for bin boundaries
public int getBinRangePrecision()
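The -precision option (setBinRangePrecision) controls how many decimal places appear in bin boundary labels, 6 by default. The sketch below shows the idea with BigDecimal rounding. The label shape (-inf-2.45], (2.45-4.75], (4.75-inf) and the stripping of trailing zeros are assumptions for illustration, so Weka's exact label text may differ. BinLabelSketch, fmt and rangeLabels are hypothetical names.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;

/** Hypothetical sketch of range labels for sorted cut points at a given precision. */
public class BinLabelSketch {

    /** Rounds to at most `precision` decimal places and drops trailing zeros. */
    static String fmt(double v, int precision) {
        return BigDecimal.valueOf(v)
                .setScale(precision, RoundingMode.HALF_UP)
                .stripTrailingZeros()
                .toPlainString();
    }

    /**
     * Builds labels (-inf-c0], (c0-c1], ..., (c(k-1)-inf).
     * Assumption: with no cut points this returns a single "(-inf-inf)" label.
     */
    static String[] rangeLabels(double[] cuts, int precision) {
        String[] labels = new String[cuts.length + 1];
        for (int i = 0; i <= cuts.length; i++) {
            String lo = i == 0 ? "-inf" : fmt(cuts[i - 1], precision);
            String hi = i == cuts.length ? "inf" : fmt(cuts[i], precision);
            String close = i == cuts.length ? ")" : "]";
            labels[i] = "(" + lo + "-" + hi + close;
        }
        return labels;
    }

    public static void main(String[] args) {
        double[] cuts = {2.4500001, 4.75};
        System.out.println(Arrays.toString(rangeLabels(cuts, 6)));
        // [(-inf-2.45], (2.45-4.75], (4.75-inf)]
        System.out.println(Arrays.toString(rangeLabels(cuts, 1)));
        // [(-inf-2.5], (2.5-4.8], (4.8-inf)]
    }
}
```

Lowering the precision shortens the labels but can make two nearby cut points print identically, so a small value is only safe when the cut points are well separated.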
public java.lang.String makeBinaryTipText()
public boolean getMakeBinary()
public void setMakeBinary(boolean makeBinary)
Parameters: makeBinary - if binary attributes are to be made
public java.lang.String useBinNumbersTipText()
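With -D (setMakeBinary(true)) each discretized attribute is replaced by binary attributes, and -spread-attribute-weight spreads the old attribute's weight across them instead of giving each new attribute the old weight. The sketch assumes one indicator per cut point that is 1 when the value lies above that cut point, and an even split of the weight across the generated attributes; both are assumptions made for illustration. BinaryEncodingSketch, encode and newAttributeWeight are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical sketch of cumulative binary encoding and weight spreading. */
public class BinaryEncodingSketch {

    /** Assumption: one indicator per cut point, 1 if value > cutPoints[j]. */
    static int[] encode(double[] cutPoints, double value) {
        int[] bits = new int[cutPoints.length];
        for (int j = 0; j < cutPoints.length; j++) {
            bits[j] = value > cutPoints[j] ? 1 : 0;
        }
        return bits;
    }

    /** Assumption: spreading divides the old weight evenly; otherwise it is copied. */
    static double newAttributeWeight(double oldWeight, int numNewAttributes, boolean spread) {
        if (numNewAttributes <= 0) {
            throw new IllegalArgumentException("need at least one new attribute");
        }
        return spread ? oldWeight / numNewAttributes : oldWeight;
    }

    public static void main(String[] args) {
        double[] cuts = {2.45, 4.75};
        System.out.println(Arrays.toString(encode(cuts, 1.0))); // [0, 0]
        System.out.println(Arrays.toString(encode(cuts, 3.0))); // [1, 0]
        System.out.println(Arrays.toString(encode(cuts, 5.0))); // [1, 1]
        System.out.println(newAttributeWeight(1.0, cuts.length, true));  // 0.5
        System.out.println(newAttributeWeight(1.0, cuts.length, false)); // 1.0
    }
}
```

With this cumulative code a value in bin b (counting from 0) sets exactly the first b indicators, so the ordering of the bins survives in the binary attributes.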
public boolean getUseBinNumbers()
public void setUseBinNumbers(boolean useBinNumbers)
Parameters: useBinNumbers - if bin numbers should be used
public java.lang.String useKononenkoTipText()
public boolean getUseKononenko()
public void setUseKononenko(boolean useKon)
Parameters: useKon - true if Kononenko's criterion is to be used
public java.lang.String useBetterEncodingTipText()
public boolean getUseBetterEncoding()
public void setUseBetterEncoding(boolean useBetterEncoding)
Parameters: useBetterEncoding - true if better encoding is to be used
public java.lang.String invertSelectionTipText()
public boolean getInvertSelection()
public void setInvertSelection(boolean invert)
Parameters: invert - the new invert setting
public java.lang.String attributeIndicesTipText()
public java.lang.String getAttributeIndices()
public void setAttributeIndices(java.lang.String rangeList)
Parameters: rangeList - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1.
Throws: java.lang.IllegalArgumentException - if an invalid range list is supplied
public void setAttributeIndicesArray(int[] attributes)
Parameters: attributes - an array containing indexes of attributes to discretize. Since the array will typically come from a program, attributes are indexed from 0.
Throws: java.lang.IllegalArgumentException - if an invalid set of ranges is supplied
public double[] getCutPoints(int attributeIndex)
Parameters: attributeIndex - the index (from 0) of the attribute to get the cut points of
public java.lang.String getBinRangesString(int attributeIndex)
Parameters: attributeIndex - the index (from 0) of the attribute to get the bin ranges string of
protected void calculateCutPoints() throws java.lang.InterruptedException
Throws: java.lang.InterruptedException
protected void calculateCutPointsByMDL(int index, Instances data) throws java.lang.InterruptedException
Parameters: index - the index of the attribute to set cut points for; data - the data to work with
Throws: java.lang.InterruptedException
protected void setOutputFormat()
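calculateCutPointsByMDL sets the cut points of one attribute using MDL. The following sketch is a simplified, self-contained reimplementation of the Fayyad and Irani (1993) recursive entropy-based splitting with their MDL stopping rule, written only to illustrate the technique: the best boundary minimizes the weighted class entropy, and it is kept only if the information gain exceeds (log2(N - 1) + Δ) / N, where Δ = log2(3^k - 2) - (k·Ent(S) - k1·Ent(S1) - k2·Ent(S2)) and k, k1, k2 count the classes present in each set. It ignores instance weights and missing values, does not implement the -E or -K variants, and places each cut point midway between adjacent distinct values, which is an assumption. MdlpSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative Fayyad & Irani style MDL discretization; not Weka's code.
 * Assumptions: no weights or missing values, midpoint cut placement.
 */
public class MdlpSketch {

    static double log2(double x) { return Math.log(x) / Math.log(2); }

    /** Class entropy in bits of a count vector summing to n. */
    static double entropy(int[] counts, int n) {
        double h = 0;
        for (int c : counts) {
            if (c > 0) h -= (double) c / n * log2((double) c / n);
        }
        return h;
    }

    /** Number of classes with a nonzero count. */
    static int numPresent(int[] counts) {
        int k = 0;
        for (int c : counts) if (c > 0) k++;
        return k;
    }

    /** Recursively collects accepted cut points for sorted values[from, to). */
    static void split(double[] values, int[] labels, int from, int to, int numClasses, List<Double> cuts) {
        int n = to - from;
        if (n < 2) return;
        int[] total = new int[numClasses];
        for (int i = from; i < to; i++) total[labels[i]]++;
        double entS = entropy(total, n);

        // Candidate cuts lie between adjacent distinct values; keep the one
        // with the lowest weighted class entropy.
        int[] left = new int[numClasses], bestLeft = null;
        int bestIdx = -1;
        double bestInfo = Double.MAX_VALUE;
        for (int i = from; i < to - 1; i++) {
            left[labels[i]]++;
            if (values[i] == values[i + 1]) continue;
            int nl = i - from + 1;
            int[] right = new int[numClasses];
            for (int c = 0; c < numClasses; c++) right[c] = total[c] - left[c];
            double info = (nl * entropy(left, nl) + (n - nl) * entropy(right, n - nl)) / n;
            if (info < bestInfo) { bestInfo = info; bestIdx = i; bestLeft = left.clone(); }
        }
        if (bestIdx < 0) return; // all values identical: no candidate cut

        int n1 = bestIdx - from + 1, n2 = n - n1;
        int[] bestRight = new int[numClasses];
        for (int c = 0; c < numClasses; c++) bestRight[c] = total[c] - bestLeft[c];
        double ent1 = entropy(bestLeft, n1), ent2 = entropy(bestRight, n2);
        int k = numPresent(total), k1 = numPresent(bestLeft), k2 = numPresent(bestRight);

        // MDL rule: accept the split only if Gain > (log2(N - 1) + Delta) / N.
        double gain = entS - bestInfo;
        double delta = log2(Math.pow(3, k) - 2) - (k * entS - k1 * ent1 - k2 * ent2);
        if (gain <= (log2(n - 1) + delta) / n) return;

        split(values, labels, from, bestIdx + 1, numClasses, cuts); // lower cuts first
        cuts.add((values[bestIdx] + values[bestIdx + 1]) / 2);      // assumed midpoint
        split(values, labels, bestIdx + 1, to, numClasses, cuts);
    }

    /** Sorts by value and returns the accepted cut points in ascending order. */
    static double[] cutPoints(double[] values, int[] labels, int numClasses) {
        Integer[] order = new Integer[values.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));
        double[] v = new double[values.length];
        int[] y = new int[values.length];
        for (int i = 0; i < order.length; i++) { v[i] = values[order[i]]; y[i] = labels[order[i]]; }
        List<Double> cuts = new ArrayList<>();
        split(v, y, 0, v.length, numClasses, cuts);
        return cuts.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        double[] x = {3, 1, 8, 10, 2, 7, 5, 9, 4, 6};
        int[] y = {0, 0, 1, 1, 0, 1, 0, 1, 0, 1}; // class 0 iff value <= 5
        System.out.println(Arrays.toString(cutPoints(x, y, 2))); // [5.5]
        int[] alt = {0, 1, 0, 1, 0, 1};           // alternating classes
        System.out.println(Arrays.toString(cutPoints(new double[] {1, 2, 3, 4, 5, 6}, alt, 2))); // []
    }
}
```

In the alternating example the best split gains only about 0.19 bits against a threshold of roughly 0.85 bits, so the MDL rule leaves the attribute in a single bin.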
protected void convertInstance(Instance instance)
Parameters: instance - the instance to convert
public java.lang.String getRevision()
Specified by: getRevision in interface RevisionHandler. Overrides: getRevision in class Filter.
public static void main(java.lang.String[] argv)
Parameters: argv - should contain arguments to the filter: use -h for help