public class CheckClusterer extends CheckScheme
java weka.clusterers.CheckClusterer -W clusterer_name -- clusterer_options
CheckClusterer reports on the following:
weka.clusterers.AbstractClustererTest uses this class to
test all the clusterers. Any changes here must also be checked in that
abstract test class.
Valid options are:
-D Turn on debugging output.
-S Silent mode - prints nothing to stdout.
-N <num> The number of instances in the datasets (default 20).
-nominal <num> The number of nominal attributes (default 2).
-nominal-values <num> The number of values for nominal attributes (default 1).
-numeric <num> The number of numeric attributes (default 1).
-string <num> The number of string attributes (default 1).
-date <num> The number of date attributes (default 1).
-relational <num> The number of relational attributes (default 1).
-num-instances-relational <num> The number of instances in relational/bag attributes (default 10).
-words <comma-separated-list> The words to use in string attributes.
-word-separators <chars> The word separators to use in string attributes.
-W Full name of the clusterer analyzed. eg: weka.clusterers.SimpleKMeans (default weka.clusterers.SimpleKMeans)
Options specific to clusterer weka.clusterers.SimpleKMeans:
-N <num> number of clusters. (default 2).
-V Display std. deviations for centroids.
-M Replace missing values with mean/mode.
-S <num> Random number seed. (default 10)
Options after -- are passed to the designated clusterer.
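The `--` separator matters because flags such as -N and -S mean different things to CheckClusterer and to the clusterer under test. The following is a minimal sketch of how such an argument list could be split; `OptionPartitionSketch` and `partition` are hypothetical names used for illustration and are not part of Weka.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Weka. */
final class OptionPartitionSketch {

    /**
     * Splits args at the first standalone "--".
     * Returns {checkerOptions, clustererOptions}; the "--" itself is dropped.
     */
    static String[][] partition(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if ("--".equals(args[i])) {
                return new String[][] {
                    Arrays.copyOfRange(args, 0, i),
                    Arrays.copyOfRange(args, i + 1, args.length)
                };
            }
        }
        // No separator found: every option belongs to the checker.
        return new String[][] { args.clone(), new String[0] };
    }

    public static void main(String[] argv) {
        String[] args = {
            "-W", "weka.clusterers.SimpleKMeans", "-N", "20", // checker: 20 instances
            "--", "-N", "2", "-S", "10"                       // clusterer: 2 clusters, seed 10
        };
        String[][] parts = partition(args);
        System.out.println("checker:   " + Arrays.toString(parts[0]));
        System.out.println("clusterer: " + Arrays.toString(parts[1]));
    }
}
```

In this sketch the first -N (number of instances) stays with the checker, while the second -N (number of clusters) and -S (seed) are handed to the clusterer.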
See also: TestInstances
Nested classes inherited from class CheckScheme: CheckScheme.PostProcessor

| Modifier and Type | Field and Description |
|---|---|
| protected Clusterer | m_Clusterer - The clusterer to be examined |
Fields inherited from class CheckScheme: m_ClasspathProblems, m_NumDate, m_NumInstances, m_NumInstancesRelational, m_NumNominal, m_NumNumeric, m_NumRelational, m_NumString, m_PostProcessor, m_Words, m_WordSeparators

| Constructor and Description |
|---|
| CheckClusterer() - default constructor |

| Modifier and Type | Method and Description |
|---|---|
| protected void | addMissing(Instances data, int level, boolean predictorMissing) - Add missing values to a dataset. |
| protected boolean[] | canHandleMissing(boolean nominalPredictor, boolean numericPredictor, boolean stringPredictor, boolean datePredictor, boolean relationalPredictor, boolean multiInstance, boolean predictorMissing, int missingLevel) - Checks basic missing value handling of the scheme. |
| protected boolean[] | canHandleZeroTraining(boolean nominalPredictor, boolean numericPredictor, boolean stringPredictor, boolean datePredictor, boolean relationalPredictor, boolean multiInstance) - Checks whether the scheme can handle zero training instances. |
| protected boolean[] | canPredict(boolean nominalPredictor, boolean numericPredictor, boolean stringPredictor, boolean datePredictor, boolean relationalPredictor, boolean multiInstance) - Checks basic prediction of the scheme, for simple non-troublesome datasets. |
| protected boolean[] | canTakeOptions() - Checks whether the scheme can take command line options. |
| protected boolean[] | correctBuildInitialisation(boolean nominalPredictor, boolean numericPredictor, boolean stringPredictor, boolean datePredictor, boolean relationalPredictor, boolean multiInstance) - Checks whether the scheme correctly initialises models when buildClusterer is called. |
| protected boolean[] | datasetIntegrity(boolean nominalPredictor, boolean numericPredictor, boolean stringPredictor, boolean datePredictor, boolean relationalPredictor, boolean multiInstance, boolean predictorMissing) - Checks whether the scheme alters the training dataset during training. |
| protected boolean[] | declaresSerialVersionUID() - Tests for a serialVersionUID. |
| void | doTests() - Begin the tests, reporting results to System.out. |
| Clusterer | getClusterer() - Gets the clusterer being tested. |
| java.lang.String[] | getOptions() - Gets the current settings of the CheckClusterer. |
| java.lang.String | getRevision() - Returns the revision string. |
| protected boolean[] | instanceWeights(boolean nominalPredictor, boolean numericPredictor, boolean stringPredictor, boolean datePredictor, boolean relationalPredictor, boolean multiInstance) - Checks whether the clusterer can handle instance weights. |
| java.util.Enumeration<Option> | listOptions() - Returns an enumeration describing the available options. |
| static void | main(java.lang.String[] args) - Test method for this class. |
| protected Instances | makeTestDataset(int seed, int numInstances, int numNominal, int numNumeric, int numString, int numDate, int numRelational, boolean multiInstance) - Make a simple set of instances with variable position of the class attribute, which can later be modified for use in specific tests. |
| protected boolean[] | multiInstanceHandler() - Checks whether the scheme handles multi-instance data. |
| protected void | printAttributeSummary(boolean nominalPredictor, boolean numericPredictor, boolean stringPredictor, boolean datePredictor, boolean relationalPredictor, boolean multiInstance) - Print out a short summary string for the dataset characteristics. |
| protected boolean[] | runBasicTest(boolean nominalPredictor, boolean numericPredictor, boolean stringPredictor, boolean datePredictor, boolean relationalPredictor, boolean multiInstance, int missingLevel, boolean predictorMissing, int numTrain, java.util.Vector<java.lang.String> accepts) - Runs a test on the datasets with the given characteristics. |
| protected void | runTests(boolean weighted, boolean multiInstance, boolean updateable) - Run a battery of tests. |
| void | setClusterer(Clusterer newClusterer) - Set the clusterer for testing. |
| void | setOptions(java.lang.String[] options) - Parses a given list of options. |
| protected boolean[] | updateableClusterer() - Checks whether the scheme can build models incrementally. |
| protected boolean[] | updatingEquality(boolean nominalPredictor, boolean numericPredictor, boolean stringPredictor, boolean datePredictor, boolean relationalPredictor, boolean multiInstance) - Checks whether an updateable scheme produces the same model when trained incrementally as when batch trained. |
| protected boolean[] | weightedInstancesHandler() - Checks whether the scheme says it can handle instance weights. |
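The idea behind a check such as datasetIntegrity (does training alter the training data?) can be illustrated without Weka. The sketch below assumes a dataset is a plain `double[][]` and the "trainer" is any `Consumer<double[][]>`; `DatasetIntegritySketch`, `deepCopy`, and `leavesDataIntact` are hypothetical names, and this is not the code CheckClusterer itself runs.

```java
import java.util.Arrays;
import java.util.function.Consumer;

/** Hypothetical stand-in for a dataset-integrity check; not Weka code. */
final class DatasetIntegritySketch {

    /** Deep-copies a 2-D array so the original can be compared later. */
    static double[][] deepCopy(double[][] data) {
        double[][] copy = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            copy[i] = data[i].clone();
        }
        return copy;
    }

    /** Returns true if running the trainer leaves the data unchanged. */
    static boolean leavesDataIntact(double[][] data, Consumer<double[][]> trainer) {
        double[][] snapshot = deepCopy(data); // state before training
        trainer.accept(data);                 // "train" on the original
        return Arrays.deepEquals(snapshot, data);
    }

    public static void main(String[] args) {
        double[][] data = { {1.0, 2.0}, {3.0, 4.0} };
        // A well-behaved trainer only reads the data.
        System.out.println(leavesDataIntact(data, d -> { double sum = d[0][0] + d[1][1]; }));
        // A misbehaving trainer rescales a value in place.
        System.out.println(leavesDataIntact(data, d -> d[0][0] /= 10.0));
    }
}
```

The comparison is made against a snapshot taken before training, so any in-place modification, however small, makes the check fail.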
Methods inherited from class CheckScheme: addMissing, arrayToList, attributeTypeToString, compareDatasets, getNumDate, getNumInstances, getNumInstancesRelational, getNumNominal, getNumNumeric, getNumRelational, getNumString, getPostProcessor, getWords, getWordSeparators, hasClasspathProblems, listToArray, process, setNumDate, setNumInstances, setNumInstancesRelational, setNumNominal, setNumNumeric, setNumRelational, setNumString, setPostProcessor, setWords, setWordSeparators

protected Clusterer m_Clusterer
public java.util.Enumeration<Option> listOptions()
listOptions in interface OptionHandler
listOptions in class CheckScheme
public void setOptions(java.lang.String[] options)
throws java.lang.Exception
-D Turn on debugging output.
-S Silent mode - prints nothing to stdout.
-N <num> The number of instances in the datasets (default 20).
-nominal <num> The number of nominal attributes (default 2).
-nominal-values <num> The number of values for nominal attributes (default 1).
-numeric <num> The number of numeric attributes (default 1).
-string <num> The number of string attributes (default 1).
-date <num> The number of date attributes (default 1).
-relational <num> The number of relational attributes (default 1).
-num-instances-relational <num> The number of instances in relational/bag attributes (default 10).
-words <comma-separated-list> The words to use in string attributes.
-word-separators <chars> The word separators to use in string attributes.
-W Full name of the clusterer analyzed. eg: weka.clusterers.SimpleKMeans (default weka.clusterers.SimpleKMeans)
Options specific to clusterer weka.clusterers.SimpleKMeans:
-N <num> number of clusters. (default 2).
-V Display std. deviations for centroids.
-M Replace missing values with mean/mode.
-S <num> Random number seed. (default 10)
setOptions in interface OptionHandler
setOptions in class CheckScheme
options - the list of options as an array of strings
java.lang.Exception - if an option is not supported
public java.lang.String[] getOptions()
getOptions in interface OptionHandler
getOptions in class CheckScheme
public void doTests()
doTests in class CheckScheme
public void setClusterer(Clusterer newClusterer)
newClusterer - the Clusterer to use.
public Clusterer getClusterer()
protected void runTests(boolean weighted,
boolean multiInstance,
boolean updateable)
weighted - true if the clusterer says it handles weights
multiInstance - true if the clusterer is a multi-instance clusterer
updateable - true if the clusterer is updateable
protected boolean[] canTakeOptions()
protected boolean[] updateableClusterer()
protected boolean[] weightedInstancesHandler()
protected boolean[] multiInstanceHandler()
protected boolean[] declaresSerialVersionUID()
protected boolean[] canPredict(boolean nominalPredictor,
boolean numericPredictor,
boolean stringPredictor,
boolean datePredictor,
boolean relationalPredictor,
boolean multiInstance)
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
stringPredictor - if true use string predictor attributes
datePredictor - if true use date predictor attributes
relationalPredictor - if true use relational predictor attributes
multiInstance - whether multi-instance is needed
protected boolean[] canHandleZeroTraining(boolean nominalPredictor,
boolean numericPredictor,
boolean stringPredictor,
boolean datePredictor,
boolean relationalPredictor,
boolean multiInstance)
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
stringPredictor - if true use string predictor attributes
datePredictor - if true use date predictor attributes
relationalPredictor - if true use relational predictor attributes
multiInstance - whether multi-instance is needed
protected boolean[] correctBuildInitialisation(boolean nominalPredictor,
boolean numericPredictor,
boolean stringPredictor,
boolean datePredictor,
boolean relationalPredictor,
boolean multiInstance)
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
stringPredictor - if true use string predictor attributes
datePredictor - if true use date predictor attributes
relationalPredictor - if true use relational predictor attributes
multiInstance - whether multi-instance is needed
protected boolean[] canHandleMissing(boolean nominalPredictor,
boolean numericPredictor,
boolean stringPredictor,
boolean datePredictor,
boolean relationalPredictor,
boolean multiInstance,
boolean predictorMissing,
int missingLevel)
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
stringPredictor - if true use string predictor attributes
datePredictor - if true use date predictor attributes
relationalPredictor - if true use relational predictor attributes
multiInstance - whether multi-instance is needed
predictorMissing - true if the missing values may be in the predictors
missingLevel - the percentage of missing values
protected boolean[] instanceWeights(boolean nominalPredictor,
boolean numericPredictor,
boolean stringPredictor,
boolean datePredictor,
boolean relationalPredictor,
boolean multiInstance)
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
stringPredictor - if true use string predictor attributes
datePredictor - if true use date predictor attributes
relationalPredictor - if true use relational predictor attributes
multiInstance - whether multi-instance is needed
protected boolean[] datasetIntegrity(boolean nominalPredictor,
boolean numericPredictor,
boolean stringPredictor,
boolean datePredictor,
boolean relationalPredictor,
boolean multiInstance,
boolean predictorMissing)
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
stringPredictor - if true use string predictor attributes
datePredictor - if true use date predictor attributes
relationalPredictor - if true use relational predictor attributes
multiInstance - whether multi-instance is needed
predictorMissing - true if we know the clusterer can handle (at least) moderate missing predictor values
protected boolean[] updatingEquality(boolean nominalPredictor,
boolean numericPredictor,
boolean stringPredictor,
boolean datePredictor,
boolean relationalPredictor,
boolean multiInstance)
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
stringPredictor - if true use string predictor attributes
datePredictor - if true use date predictor attributes
relationalPredictor - if true use relational predictor attributes
multiInstance - whether multi-instance is needed
protected boolean[] runBasicTest(boolean nominalPredictor,
boolean numericPredictor,
boolean stringPredictor,
boolean datePredictor,
boolean relationalPredictor,
boolean multiInstance,
int missingLevel,
boolean predictorMissing,
int numTrain,
java.util.Vector<java.lang.String> accepts)
nominalPredictor - if true use nominal predictor attributes
numericPredictor - if true use numeric predictor attributes
stringPredictor - if true use string predictor attributes
datePredictor - if true use date predictor attributes
relationalPredictor - if true use relational predictor attributes
multiInstance - whether multi-instance is needed
missingLevel - the percentage of missing values
predictorMissing - true if the missing values may be in the predictors
numTrain - the number of instances in the training set
accepts - the acceptable string in an exception
protected void addMissing(Instances data, int level, boolean predictorMissing)
data - the instances to add missing values to
level - the level of missing values to add (if positive, this is the probability that a value will be set to missing; if negative, all but one value will be set to missing (not yet implemented))
predictorMissing - if true, predictor attributes will be modified
protected Instances makeTestDataset(int seed, int numInstances, int numNominal, int numNumeric, int numString, int numDate, int numRelational, boolean multiInstance) throws java.lang.Exception
seed - the random number seed
numInstances - the number of instances to generate
numNominal - the number of nominal attributes
numNumeric - the number of numeric attributes
numString - the number of string attributes
numDate - the number of date attributes
numRelational - the number of relational attributes
multiInstance - whether the dataset should be a multi-instance dataset
java.lang.Exception - if the dataset couldn't be generated
See also: TestInstances.CLASS_IS_LAST
protected void printAttributeSummary(boolean nominalPredictor,
boolean numericPredictor,
boolean stringPredictor,
boolean datePredictor,
boolean relationalPredictor,
boolean multiInstance)
nominalPredictor - true if nominal predictor attributes are present
numericPredictor - true if numeric predictor attributes are present
stringPredictor - true if string predictor attributes are present
datePredictor - true if date predictor attributes are present
relationalPredictor - true if relational predictor attributes are present
multiInstance - whether multi-instance is needed
public java.lang.String getRevision()
public static void main(java.lang.String[] args)
args - the commandline options
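
The `level` parameter of addMissing is documented above as a probability, and missingLevel as a percentage. As a rough illustration of that reading, the sketch below marks each value as missing with probability level/100. It assumes a dataset is a plain `double[][]` and that NaN stands for "missing"; `AddMissingSketch` and its explicit `seed` argument are hypothetical, and this is not Weka's implementation.

```java
import java.util.Random;

/** Hypothetical illustration of a percentage-based missing-value pass; not Weka's code. */
final class AddMissingSketch {

    /**
     * Sets each cell to NaN with probability level/100.
     * Assumptions: level is a percentage in 0..100 and NaN marks "missing".
     * A negative level sets nothing here, since that case is documented
     * as not yet implemented.
     * Returns the number of cells set to missing.
     */
    static int addMissing(double[][] data, int level, long seed) {
        Random random = new Random(seed);
        int count = 0;
        for (double[] row : data) {
            for (int j = 0; j < row.length; j++) {
                // nextInt(100) is uniform on 0..99, so this fires level% of the time.
                if (random.nextInt(100) < level) {
                    row[j] = Double.NaN;
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[][] data = new double[20][3]; // 20 instances, 3 attributes
        int missing = addMissing(data, 20, 1L);
        System.out.println(missing + " of 60 values set to missing");
    }
}
```

A fixed seed keeps the pattern of missing values reproducible between runs, which is useful when a failing check has to be re-run.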