Package ai.libs.jaicore.basic.kvstore
Class KVStoreStatisticsUtil
- java.lang.Object
  - ai.libs.jaicore.basic.kvstore.KVStoreStatisticsUtil
public class KVStoreStatisticsUtil extends java.lang.Object

This utility can be used to compute statistics and to carry out significance tests. In particular, implementations of three different significance tests are provided:
- t-test - requirements: the data must follow a normal distribution and must be sampled independently from the two populations.
- Wilcoxon signed-rank test - requirements: the sample differences d_i = x_i,1 - x_i,2 have to be iid and symmetric.
- MannWhitneyU - requirements: all observations from both groups are independent of each other, and responses are (at least) ordinal, i.e. one can say which one is better.
-
-
Method Summary
- static java.util.Map<java.lang.String,org.apache.commons.math3.stat.descriptive.DescriptiveStatistics> averageRank(KVStoreCollection groupedAll, java.lang.String sampleIDs, java.lang.String rank)
  Computes a statistic of average rankings for sampleIDs.
- static void best(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues)
  For each setting (field setting), this method finds the best mean value among all the sampleIDs, averaging the sampledValues (minimization).
- static void best(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output)
  For each setting (field setting), this method finds the best mean value among all the sampleIDs, averaging the sampledValues (minimization).
- static void best(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output, boolean minimize)
  For each setting (field setting), this method finds the best mean value among all the sampleIDs, averaging the sampledValues (minimization).
- static void best(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.util.Set<java.lang.String> sampleIDsToConsider, java.lang.String output, boolean minimize)
  For each setting (field setting), this method finds the best mean value among all the sampleIDs, averaging the sampledValues (minimization).
- static void bestFilter(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues)
  This method searches for the best performing KVStores and afterwards projects the collection to the subset of best KVStores per setting.
- static void bestFilter(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output)
  This method searches for the best performing KVStores and afterwards projects the collection to the subset of best KVStores per setting.
- static void bestTTest(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output)
  Computes a t-test for each setting comparing the best population to the others.
- static void bestWilcoxonSignedRankTest(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String pairingIndices, java.lang.String sampledValues, java.lang.String output)
- static void mannWhitneyU(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String nameOfTestPopulation, java.lang.String output)
  Computes a (pair-wise) 1-to-n MannWhitneyU statistic to compare a single sample from one population to each other sample of the other populations.
- static void rank(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues)
- static void rank(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output)
- static void rank(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output, boolean minimize)
- static void tTest(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String nameOfTestPopulation, java.lang.String output)
  Carries out a t-test (which requires the tested populations to stem from a normal distribution) to make a pair-wise 1-to-n test.
- static KVStoreCollection wilcoxonSignedRankTest(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleIDs, java.lang.String pairingIndex, java.lang.String sampledValues, java.lang.String nameOfTestPopulation, java.lang.String output)
  Computes a 1-to-n Wilcoxon signed-rank test to compare a single sample to each other sample of the collection.
-
-
-
Method Detail
-
best
public static void best(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues)
For each setting (field setting), this method finds the best mean value among all the sampleIDs, averaging the sampledValues (minimization).
- Parameters:
  collection - The collection of KVStores.
  setting - The field name of the setting description, e.g. dataset.
  sampleID - The field name of the IDs for the different populations, e.g. algorithm.
  sampledValues - The field name of the values sampled from the populations, e.g. error rates.
-
rank
public static void rank(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues)
-
rank
public static void rank(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output)
-
rank
public static void rank(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output, boolean minimize)
-
best
public static void best(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output, boolean minimize)
For each setting (field setting), this method finds the best mean value among all the sampleIDs, averaging the sampledValues (minimization).
- Parameters:
  collection - The collection of KVStores.
  setting - The field name of the setting description, e.g. dataset.
  sampleID - The field name of the IDs for the different populations, e.g. algorithm.
  sampledValues - The field name of the values sampled from the populations, e.g. error rates.
  output - The name of the field in which to store the result.
  minimize - Whether a smaller value is better.
-
best
public static void best(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output)
For each setting (field setting), this method finds the best mean value among all the sampleIDs, averaging the sampledValues (minimization).
- Parameters:
  collection - The collection of KVStores.
  setting - The field name of the setting description, e.g. dataset.
  sampleID - The field name of the IDs for the different populations, e.g. algorithm.
  sampledValues - The field name of the values sampled from the populations, e.g. error rates.
  output - The name of the field in which to store the result.
-
best
public static void best(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.util.Set<java.lang.String> sampleIDsToConsider, java.lang.String output, boolean minimize)
For each setting (field setting), this method finds the best mean value among all the sampleIDs, averaging the sampledValues (minimization).
- Parameters:
  collection - The collection of KVStores.
  setting - The field name of the setting description, e.g. dataset.
  sampleID - The field name of the IDs for the different populations, e.g. algorithm.
  sampledValues - The field name of the values sampled from the populations, e.g. error rates.
  sampleIDsToConsider - The set of sample IDs to be considered in the comparison.
  output - The name of the field in which to store the result.
  minimize - Whether a smaller value is better.
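The selection described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the jaicore implementation: the class BestMeanSketch, its Row type, and the method bestPerSetting are hypothetical stand-ins for KVStores and KVStoreCollection, and the sketch returns the winning sample ID per setting instead of writing to an output field.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in: plain objects replace KVStores; not the jaicore code.
final class BestMeanSketch {

    // One observation: setting (e.g. dataset), sampleID (e.g. algorithm), value (e.g. error rate).
    static final class Row {
        final String setting;
        final String sampleID;
        final double value;

        Row(String setting, String sampleID, double value) {
            this.setting = setting;
            this.sampleID = sampleID;
            this.value = value;
        }
    }

    // Returns, for each setting, the sampleID with the best mean value among the IDs to consider.
    static Map<String, String> bestPerSetting(List<Row> rows, Set<String> idsToConsider, boolean minimize) {
        // setting -> sampleID -> {sum, count}
        Map<String, Map<String, double[]>> sums = new TreeMap<>();
        for (Row r : rows) {
            if (!idsToConsider.contains(r.sampleID)) {
                continue; // populations outside the considered set are ignored
            }
            double[] acc = sums.computeIfAbsent(r.setting, k -> new TreeMap<>())
                    .computeIfAbsent(r.sampleID, k -> new double[2]);
            acc[0] += r.value;
            acc[1] += 1;
        }
        Map<String, String> best = new TreeMap<>();
        for (Map.Entry<String, Map<String, double[]>> perSetting : sums.entrySet()) {
            String bestID = null;
            double bestMean = Double.NaN;
            for (Map.Entry<String, double[]> e : perSetting.getValue().entrySet()) {
                double mean = e.getValue()[0] / e.getValue()[1];
                if (bestID == null || (minimize ? mean < bestMean : mean > bestMean)) {
                    bestID = e.getKey();
                    bestMean = mean;
                }
            }
            best.put(perSetting.getKey(), bestID);
        }
        return best;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("d1", "A", 0.10), new Row("d1", "A", 0.20),
                new Row("d1", "B", 0.30), new Row("d1", "B", 0.40),
                new Row("d1", "C", 0.05),
                new Row("d2", "A", 0.50), new Row("d2", "B", 0.20));
        Set<String> ids = new HashSet<>(Arrays.asList("A", "B"));
        System.out.println(bestPerSetting(rows, ids, true));  // {d1=A, d2=B}
        System.out.println(bestPerSetting(rows, ids, false)); // {d1=B, d2=A}
    }
}
```

Sample ID C has the lowest value on d1 but is skipped because it is not in the set of IDs to consider, which mirrors the role of sampleIDsToConsider.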
-
wilcoxonSignedRankTest
public static KVStoreCollection wilcoxonSignedRankTest(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleIDs, java.lang.String pairingIndex, java.lang.String sampledValues, java.lang.String nameOfTestPopulation, java.lang.String output)
Computes a 1-to-n Wilcoxon signed-rank test to compare a single sample to each other sample of the collection. For the significance test, a pair-wise signed-rank test is used to test the hypothesis that the two considered related samples stem from the same distribution (H_0).
- Parameters:
  collection - The collection of KVStores to carry out the Wilcoxon signed-rank test for.
  setting - The field name of the setting description for each of which the Wilcoxon test has to be computed, e.g. dataset.
  sampleIDs - The field name of the identifier of what is to be compared, e.g. the different approaches.
  pairingIndex - The field name of the index according to which samples are internally paired, e.g. seed for the random object.
  nameOfTestPopulation - The value of the targetOfComparison field that is to be used as the 1 sample which is compared to the n other samples.
  output - The field name where to put the results of the significance tests.
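The pairing and ranking behind a signed-rank test can be sketched as follows. This is only an illustration of the test statistic, assuming the textbook procedure (discard zero differences, give tied absolute differences their average rank); the class SignedRankSketch and its methods are hypothetical, and the source does not state how the library computes the statistic or the p-value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the Wilcoxon signed-rank sums; not the jaicore code.
final class SignedRankSketch {

    // Builds a sample whose pairing index (e.g. the seed) is simply the position of the value.
    static Map<Integer, Double> sample(double... values) {
        Map<Integer, Double> m = new TreeMap<>();
        for (int i = 0; i < values.length; i++) {
            m.put(i, values[i]);
        }
        return m;
    }

    // Returns {W+, W-}: the rank sums of the positive and negative differences x - y.
    static double[] signedRankSums(Map<Integer, Double> x, Map<Integer, Double> y) {
        // Pair both samples via their shared pairing index.
        List<Double> diffs = new ArrayList<>();
        for (Map.Entry<Integer, Double> e : x.entrySet()) {
            Double other = y.get(e.getKey());
            if (other == null) {
                continue; // no partner for this pairing index
            }
            double d = e.getValue() - other;
            if (d != 0.0) {
                diffs.add(d); // zero differences are discarded
            }
        }
        // Order by absolute difference so that positions correspond to ranks.
        diffs.sort((a, b) -> Double.compare(Math.abs(a), Math.abs(b)));
        double wPlus = 0;
        double wMinus = 0;
        int i = 0;
        while (i < diffs.size()) {
            int j = i;
            while (j + 1 < diffs.size() && Math.abs(diffs.get(j + 1)) == Math.abs(diffs.get(i))) {
                j++; // extend the block of tied absolute differences
            }
            double avgRank = (i + j) / 2.0 + 1; // average of the 1-based ranks i+1 .. j+1
            for (int k = i; k <= j; k++) {
                if (diffs.get(k) > 0) {
                    wPlus += avgRank;
                } else {
                    wMinus += avgRank;
                }
            }
            i = j + 1;
        }
        return new double[] {wPlus, wMinus};
    }

    public static void main(String[] args) {
        double[] w = signedRankSums(sample(10, 12, 9, 15), sample(8, 13, 9, 11));
        System.out.println("W+ = " + w[0] + ", W- = " + w[1]); // W+ = 5.0, W- = 1.0
    }
}
```

In the example the differences are 2, -1, 0 and 4; the zero is dropped, the remaining absolute values receive ranks 2, 1 and 3, and the two rank sums add up to n(n+1)/2 = 6 for n = 3.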
-
mannWhitneyU
public static void mannWhitneyU(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String nameOfTestPopulation, java.lang.String output)
Computes a (pair-wise) 1-to-n MannWhitneyU statistic to compare a single sample from one population to each other sample of the other populations. For the significance test, the MannWhitneyU test statistic is used to test the hypothesis that the two considered samples stem from the same distribution (H_0). As a result, the tested KVStores have a significance-test result value in the output field. The output is to be interpreted as how the other population compares to the test population: if the value is superior, the other population is significantly better than the tested population, and vice versa.
- Parameters:
  collection - The collection of KVStores to carry out the MannWhitneyU test for.
  setting - The field name of the setting description for each of which the MannWhitneyU has to be computed, e.g. dataset.
  sampleID - The field name of the identifier of what is to be compared, e.g. the different approaches.
  pairingIndex - The field name of the index according to which samples are internally paired, e.g. seed for the random object.
  nameOfTestPopulation - The value of the targetOfComparison field that is to be used as the 1 sample which is compared to the n other samples.
  output - The field name where to put the results of the significance tests.
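The 1-to-n comparison can be illustrated with the U statistic itself. The sketch below assumes the standard pair-counting definition (a pair counts 1 if the first value is larger and 0.5 if both are equal); the class MannWhitneySketch and the method names are hypothetical, and the significance decision that the library writes to the output field is not reproduced here.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a 1-to-n Mann-Whitney U computation; not the jaicore code.
final class MannWhitneySketch {

    // U statistic of sample a against sample b: pairs with a-value > b-value count 1, ties count 0.5.
    static double uStatistic(double[] a, double[] b) {
        double u = 0;
        for (double x : a) {
            for (double y : b) {
                if (x > y) {
                    u += 1;
                } else if (x == y) {
                    u += 0.5;
                }
            }
        }
        return u;
    }

    // Compares every other sample against the test population and returns one U value per sample.
    static Map<String, Double> oneToN(Map<String, double[]> samples, String nameOfTestPopulation) {
        double[] test = samples.get(nameOfTestPopulation);
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, double[]> e : samples.entrySet()) {
            if (!e.getKey().equals(nameOfTestPopulation)) {
                result.put(e.getKey(), uStatistic(e.getValue(), test));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, double[]> samples = new TreeMap<>();
        samples.put("A", new double[] {2, 4}); // test population
        samples.put("B", new double[] {1, 2, 3});
        samples.put("C", new double[] {5, 6});
        System.out.println(oneToN(samples, "A")); // {B=1.5, C=4.0}
    }
}
```

For any two samples of sizes n and m, the U value of the first against the second and the U value of the second against the first sum to n * m, which is a quick sanity check on such a computation.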
-
bestTTest
public static void bestTTest(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output)
Computes a t-test for each setting comparing the best population to the others.
- Parameters:
  collection - The collection of KVStores.
  setting - The field name of the setting description, e.g. dataset.
  sampleID - The field name of the IDs for the different populations, e.g. algorithm.
  sampledValues - The field name of the values sampled from the populations, e.g. error rates.
  output - The name of the field in which to store the result.
-
tTest
public static void tTest(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String nameOfTestPopulation, java.lang.String output)
Carries out a t-test (which requires the tested populations to stem from a normal distribution) to make a pair-wise 1-to-n test.
- Parameters:
  collection - The collection of KVStores.
  setting - The field name of the setting description, e.g. dataset.
  sampleID - The field name of the IDs for the different populations, e.g. algorithm.
  sampledValues - The field name of the values sampled from the populations, e.g. error rates.
  nameOfTestPopulation - The value of the targetOfComparison field that is to be used as the 1 sample which is compared to the n other samples.
  output - The name of the field in which to store the result.
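A two-sample t statistic for one such pair can be sketched as below. The source does not say which t-test variant is used, so this sketch assumes Welch's unequal-variance form; the class WelchTSketch and its methods are hypothetical and compute only the statistic, not the p-value or the entry written to the output field.

```java
// Hypothetical sketch of a two-sample t statistic (Welch's form assumed); not the jaicore code.
final class WelchTSketch {

    static double mean(double[] v) {
        double sum = 0;
        for (double x : v) {
            sum += x;
        }
        return sum / v.length;
    }

    // Unbiased sample variance (divides by n - 1); requires at least two values.
    static double variance(double[] v) {
        double m = mean(v);
        double sq = 0;
        for (double x : v) {
            sq += (x - m) * (x - m);
        }
        return sq / (v.length - 1);
    }

    // t = (mean(a) - mean(b)) / sqrt(var(a)/n_a + var(b)/n_b)
    static double tStatistic(double[] a, double[] b) {
        double standardError = Math.sqrt(variance(a) / a.length + variance(b) / b.length);
        return (mean(a) - mean(b)) / standardError;
    }

    public static void main(String[] args) {
        double[] testPopulation = {1, 2, 3};
        double[][] others = {{3, 4, 5}, {1, 2, 4}};
        // 1-to-n: the test population is compared against each other sample in turn.
        for (double[] other : others) {
            System.out.println(tStatistic(testPopulation, other));
        }
    }
}
```

For the first pair both variances are 1 and the means differ by 2, so the statistic equals -2 / sqrt(2/3), i.e. the negative square root of 6.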
-
bestFilter
public static void bestFilter(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues)
This method searches for the best performing KVStores and afterwards projects the collection to the subset of best KVStores per setting.
- Parameters:
  collection - The collection of KVStores.
  setting - The field name of the setting description, e.g. dataset.
  sampleID - The field name of the IDs for the different populations, e.g. algorithm.
  sampledValues - The field name of the values sampled from the populations, e.g. error rates.
-
bestFilter
public static void bestFilter(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output)
This method searches for the best performing KVStores and afterwards projects the collection to the subset of best KVStores per setting.
- Parameters:
  collection - The collection of KVStores.
  setting - The field name of the setting description, e.g. dataset.
  sampleID - The field name of the IDs for the different populations, e.g. algorithm.
  sampledValues - The field name of the values sampled from the populations, e.g. error rates.
  output - The name of the field in which to store the result.
-
averageRank
public static java.util.Map<java.lang.String,org.apache.commons.math3.stat.descriptive.DescriptiveStatistics> averageRank(KVStoreCollection groupedAll, java.lang.String sampleIDs, java.lang.String rank)
Computes a statistic of average rankings for sampleIDs.
- Parameters:
  groupedAll - The collection of KVStores to compute the average rank for the respective sampleIDs.
  sampleIDs - The name of the field distinguishing the different samples.
  rank - The name of the field containing the rank information.
- Returns:
  A java.util.Map from java.lang.String keys to DescriptiveStatistics objects, as given by the method signature.
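Aggregating ranks per sample ID can be sketched with the standard library alone. The assumptions here are loud ones: a java.util.Map<String, String> stands in for a KVStore, java.util.DoubleSummaryStatistics stands in for the Commons Math DescriptiveStatistics type, and the result is assumed to be keyed by sample ID, which the source does not state explicitly. The class AverageRankSketch and its helper store are hypothetical.

```java
import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: plain maps replace KVStores; not the jaicore code.
final class AverageRankSketch {

    // Collects the values of the rank field per value of the sampleIDs field.
    static Map<String, DoubleSummaryStatistics> averageRank(
            List<Map<String, String>> stores, String sampleIDs, String rank) {
        Map<String, DoubleSummaryStatistics> stats = new TreeMap<>();
        for (Map<String, String> store : stores) {
            stats.computeIfAbsent(store.get(sampleIDs), k -> new DoubleSummaryStatistics())
                    .accept(Double.parseDouble(store.get(rank)));
        }
        return stats;
    }

    // Builds one stand-in store with a dataset, an algorithm and a rank field.
    static Map<String, String> store(String dataset, String algorithm, String rank) {
        Map<String, String> s = new HashMap<>();
        s.put("dataset", dataset);
        s.put("algorithm", algorithm);
        s.put("rank", rank);
        return s;
    }

    public static void main(String[] args) {
        List<Map<String, String>> stores = new ArrayList<>();
        stores.add(store("d1", "A", "1"));
        stores.add(store("d1", "B", "2"));
        stores.add(store("d2", "A", "1"));
        stores.add(store("d2", "B", "3"));
        Map<String, DoubleSummaryStatistics> stats = averageRank(stores, "algorithm", "rank");
        System.out.println("A: " + stats.get("A").getAverage()); // A: 1.0
        System.out.println("B: " + stats.get("B").getAverage()); // B: 2.5
    }
}
```

Returning a statistics object per sample ID, as the documented signature does, gives access to more than the mean, for example the number of ranked settings or the minimum and maximum rank.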
-
bestWilcoxonSignedRankTest
public static void bestWilcoxonSignedRankTest(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String pairingIndices, java.lang.String sampledValues, java.lang.String output)
-
-