Class KVStoreStatisticsUtil


  • public class KVStoreStatisticsUtil
    extends java.lang.Object
    Utility for computing statistics and carrying out significance tests. Implementations of three significance tests are provided:
    t-test - requirements: the data must be normally distributed and sampled independently from the two populations.
    Wilcoxon signed-rank test - requirements: the paired differences d_i = x_i,1 - x_i,2 must be iid and symmetrically distributed.
    MannWhitneyU - requirements: all observations from both groups are independent of each other, and the responses are at least ordinal, i.e. one can say which of two observations is better.
    • Method Summary

      Modifier and Type Method Description
      static java.util.Map<java.lang.String,​org.apache.commons.math3.stat.descriptive.DescriptiveStatistics> averageRank​(KVStoreCollection groupedAll, java.lang.String sampleIDs, java.lang.String rank)
      Computes a statistic of average rankings for sampleIDs.
      static void best​(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues)
      For each setting, finds the best (lowest) mean of the sampledValues among all sampleIDs.
      static void best​(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output)
      For each setting, finds the best (lowest) mean of the sampledValues among all sampleIDs.
      static void best​(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output, boolean minimize)
      For each setting, finds the best mean of the sampledValues among all sampleIDs; minimize selects whether lower or higher means count as better.
      static void best​(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.util.Set<java.lang.String> sampleIDsToConsider, java.lang.String output, boolean minimize)
      For each setting, finds the best mean of the sampledValues among the sampleIDs in sampleIDsToConsider; minimize selects whether lower or higher means count as better.
      static void bestFilter​(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues)
      Searches for the best-performing KVStores and then projects the collection onto the best KVStores per setting.
      static void bestFilter​(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output)
      Searches for the best-performing KVStores and then projects the collection onto the best KVStores per setting.
      static void bestTTest​(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output)
      Computes a t-test for each setting comparing the best population to the others.
      static void bestWilcoxonSignedRankTest​(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String pairingIndices, java.lang.String sampledValues, java.lang.String output)  
      static void mannWhitneyU​(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String nameOfTestPopulation, java.lang.String output)
      Computes a (pair-wise) 1-to-n MannWhitneyU statistic to compare a single sample from one population to each other sample of the other populations.
      static void rank​(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues)  
      static void rank​(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output)  
      static void rank​(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String output, boolean minimize)  
      static void tTest​(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleID, java.lang.String sampledValues, java.lang.String nameOfTestPopulation, java.lang.String output)
      Carries out a t-test (which requires the tested populations to stem from a normal distribution) to make a pair-wise 1-to-n test.
      static KVStoreCollection wilcoxonSignedRankTest​(KVStoreCollection collection, java.lang.String setting, java.lang.String sampleIDs, java.lang.String pairingIndex, java.lang.String sampledValues, java.lang.String nameOfTestPopulation, java.lang.String output)
      Computes a 1-to-n Wilcoxon signed rank test to compare a single sample to each other sample of the collection.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • best

        public static void best​(KVStoreCollection collection,
                                java.lang.String setting,
                                java.lang.String sampleID,
                                java.lang.String sampledValues)
        For each setting, finds the best (lowest) mean of the sampledValues among all sampleIDs.
        Parameters:
        collection - The collection of KVStores.
        setting - The field name of the setting description, e.g. dataset.
        sampleID - The field name of the ids for the different populations, e.g. algorithm.
        sampledValues - The field name of the values sampled from the populations, e.g. error rates.
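        The selection described above can be illustrated with plain Java collections. This is a minimal sketch, not the library's implementation: the Sample class is a hypothetical stand-in for a KVStore, and bestPerSetting is an illustrative name that returns the result instead of writing it into an output field.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BestMeanSketch {
    // Hypothetical stand-in for a KVStore: one sampled value for one (setting, sampleID) pair.
    static final class Sample {
        final String setting;
        final String sampleId;
        final double value;

        Sample(String setting, String sampleId, double value) {
            this.setting = setting;
            this.sampleId = sampleId;
            this.value = value;
        }
    }

    // For each setting, returns the sampleID whose sampled values have the lowest mean.
    static Map<String, String> bestPerSetting(List<Sample> samples) {
        // setting -> sampleID -> running statistics of the sampled values
        Map<String, Map<String, DoubleSummaryStatistics>> stats = new TreeMap<>();
        for (Sample s : samples) {
            stats.computeIfAbsent(s.setting, k -> new TreeMap<>())
                 .computeIfAbsent(s.sampleId, k -> new DoubleSummaryStatistics())
                 .accept(s.value);
        }
        Map<String, String> best = new TreeMap<>();
        for (Map.Entry<String, Map<String, DoubleSummaryStatistics>> e : stats.entrySet()) {
            String bestId = null;
            double bestMean = Double.POSITIVE_INFINITY;
            for (Map.Entry<String, DoubleSummaryStatistics> c : e.getValue().entrySet()) {
                double mean = c.getValue().getAverage();
                if (mean < bestMean) { // minimization: a lower mean is better
                    bestMean = mean;
                    bestId = c.getKey();
                }
            }
            best.put(e.getKey(), bestId);
        }
        return best;
    }
}
```

        With setting = dataset and sampleID = algorithm, the returned map would name the algorithm with the lowest mean error rate on each dataset.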
      • rank

        public static void rank​(KVStoreCollection collection,
                                java.lang.String setting,
                                java.lang.String sampleID,
                                java.lang.String sampledValues)
      • rank

        public static void rank​(KVStoreCollection collection,
                                java.lang.String setting,
                                java.lang.String sampleID,
                                java.lang.String sampledValues,
                                java.lang.String output)
      • rank

        public static void rank​(KVStoreCollection collection,
                                java.lang.String setting,
                                java.lang.String sampleID,
                                java.lang.String sampledValues,
                                java.lang.String output,
                                boolean minimize)
      • best

        public static void best​(KVStoreCollection collection,
                                java.lang.String setting,
                                java.lang.String sampleID,
                                java.lang.String sampledValues,
                                java.lang.String output,
                                boolean minimize)
        For each setting, finds the best mean of the sampledValues among all sampleIDs; minimize selects whether lower or higher means count as better.
        Parameters:
        collection - The collection of KVStores.
        setting - The field name of the setting description, e.g. dataset.
        sampleID - The field name of the ids for the different populations, e.g. algorithm.
        sampledValues - The field name of the values sampled from the populations, e.g. error rates.
        output - The name of the field in which to store the result.
        minimize - Whether lower values are better (true) or higher values are better (false).
      • best

        public static void best​(KVStoreCollection collection,
                                java.lang.String setting,
                                java.lang.String sampleID,
                                java.lang.String sampledValues,
                                java.lang.String output)
        For each setting, finds the best (lowest) mean of the sampledValues among all sampleIDs.
        Parameters:
        collection - The collection of KVStores.
        setting - The field name of the setting description, e.g. dataset.
        sampleID - The field name of the ids for the different populations, e.g. algorithm.
        sampledValues - The field name of the values sampled from the populations, e.g. error rates.
        output - The name of the field in which to store the result.
      • best

        public static void best​(KVStoreCollection collection,
                                java.lang.String setting,
                                java.lang.String sampleID,
                                java.lang.String sampledValues,
                                java.util.Set<java.lang.String> sampleIDsToConsider,
                                java.lang.String output,
                                boolean minimize)
        For each setting, finds the best mean of the sampledValues among the sampleIDs in sampleIDsToConsider; minimize selects whether lower or higher means count as better.
        Parameters:
        collection - The collection of KVStores.
        setting - The field name of the setting description, e.g. dataset.
        sampleID - The field name of the ids for the different populations, e.g. algorithm.
        sampledValues - The field name of the values sampled from the populations, e.g. error rates.
        sampleIDsToConsider - The set of sample IDs which are to be considered in the comparison.
        output - The name of the field in which to store the result.
        minimize - Whether lower values are better (true) or higher values are better (false).
      • wilcoxonSignedRankTest

        public static KVStoreCollection wilcoxonSignedRankTest​(KVStoreCollection collection,
                                                               java.lang.String setting,
                                                               java.lang.String sampleIDs,
                                                               java.lang.String pairingIndex,
                                                               java.lang.String sampledValues,
                                                               java.lang.String nameOfTestPopulation,
                                                               java.lang.String output)
        Computes a 1-to-n Wilcoxon signed rank test to compare a single sample to each other sample of the collection. For the significance test, a pair-wise signed rank test is used to test the null hypothesis (H_0) that the two considered related samples stem from the same distribution.
        Parameters:
        collection - The collection of KVStores to carry out the Wilcoxon signed rank test for.
        setting - The field name of the setting description for each of which the Wilcoxon signed rank test has to be computed, e.g. dataset.
        sampleIDs - The field name of the identifier of what is to be compared, e.g. the different approaches.
        pairingIndex - The field name of the index according to which samples are internally paired, e.g. seed for the random object.
        sampledValues - The field name of the values sampled from the populations, e.g. error rates.
        nameOfTestPopulation - The value of the targetOfComparison field that is to be used as the 1 sample which is compared to the n other samples.
        output - The field name where to put the results of the significance tests.
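        To make the role of the pairing concrete, the sketch below computes the signed-rank statistic W+ for two samples that are already aligned by pairing index (for example, position i in both arrays belongs to the same seed). It is an illustrative, self-contained version under those assumptions: the class and method names are hypothetical, it stops at the statistic and does not derive a p-value, and it is not the code this class uses.

```java
import java.util.ArrayList;
import java.util.List;

class WilcoxonSketch {
    // Sum of the ranks of the positive differences x[i] - y[i].
    // Zero differences are dropped; tied absolute differences share their average rank.
    static double wPlus(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("samples must be paired");
        }
        List<Double> diffs = new ArrayList<>();
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            if (d != 0.0) {
                diffs.add(d);
            }
        }
        // Order the differences by absolute value so that list positions correspond to ranks.
        diffs.sort((a, b) -> Double.compare(Math.abs(a), Math.abs(b)));

        double wPlus = 0.0;
        int i = 0;
        while (i < diffs.size()) {
            // Find the end of the current block of tied absolute differences.
            int j = i;
            double abs = Math.abs(diffs.get(i).doubleValue());
            while (j + 1 < diffs.size() && Math.abs(diffs.get(j + 1).doubleValue()) == abs) {
                j++;
            }
            double avgRank = ((i + 1) + (j + 1)) / 2.0; // ranks are 1-based
            for (int k = i; k <= j; k++) {
                if (diffs.get(k) > 0) {
                    wPlus += avgRank;
                }
            }
            i = j + 1;
        }
        return wPlus;
    }
}
```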
      • mannWhitneyU

        public static void mannWhitneyU​(KVStoreCollection collection,
                                        java.lang.String setting,
                                        java.lang.String sampleID,
                                        java.lang.String sampledValues,
                                        java.lang.String nameOfTestPopulation,
                                        java.lang.String output)
        Computes a (pair-wise) 1-to-n MannWhitneyU statistic to compare a single sample from one population to each sample of the other populations. The MannWhitneyU test statistic is used to test the null hypothesis (H_0) that the two considered samples stem from the same distribution. As a result, the tested KVStores carry a significance-test result value in the output field. The output describes how the other population compares to the test population: if the value is superior, the other population is significantly better than the test population, and vice versa.
        Parameters:
        collection - The collection of KVStores to carry out the MannWhitneyU test for.
        setting - The field name of the setting description for each of which the MannWhitneyU has to be computed, e.g. dataset.
        sampleID - The field name of the identifier of what is to be compared, e.g. the different approaches.
        sampledValues - The field name of the values sampled from the populations, e.g. error rates.
        nameOfTestPopulation - The value of the targetOfComparison field that is to be used as the 1 sample which is compared to the n other samples.
        output - The field name where to put the results of the significance tests.
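        For orientation, the U statistic behind such a comparison can be computed by counting pairs across the two samples. The following is a minimal sketch with hypothetical names; it only yields the statistic for one pair of samples and does not reproduce how this class derives or stores the significance result.

```java
class MannWhitneySketch {
    // U statistic of sample x against sample y: the number of pairs (x[i], y[j])
    // with x[i] > y[j], where ties contribute 0.5 each.
    static double u(double[] x, double[] y) {
        double u = 0.0;
        for (double xi : x) {
            for (double yj : y) {
                if (xi > yj) {
                    u += 1.0;
                } else if (xi == yj) {
                    u += 0.5;
                }
            }
        }
        return u;
    }
}
```

        The two directions always add up to the number of pairs, u(x, y) + u(y, x) = x.length * y.length, so a value close to 0 or close to that product indicates that one sample tends to dominate the other.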
      • bestTTest

        public static void bestTTest​(KVStoreCollection collection,
                                     java.lang.String setting,
                                     java.lang.String sampleID,
                                     java.lang.String sampledValues,
                                     java.lang.String output)
        Computes a t-test for each setting comparing the best population to the others.
        Parameters:
        collection - The collection of KVStores.
        setting - The field name of the setting description, e.g. dataset.
        sampleID - The field name of the ids for the different populations, e.g. algorithm.
        sampledValues - The field name of the values sampled from the populations, e.g. error rates.
        output - The name of the field in which to store the result.
      • tTest

        public static void tTest​(KVStoreCollection collection,
                                 java.lang.String setting,
                                 java.lang.String sampleID,
                                 java.lang.String sampledValues,
                                 java.lang.String nameOfTestPopulation,
                                 java.lang.String output)
        Carries out a t-test (which requires the tested populations to stem from a normal distribution) to make a pair-wise 1-to-n test.
        Parameters:
        collection - The collection of KVStores.
        setting - The field name of the setting description, e.g. dataset.
        sampleID - The field name of the ids for the different populations, e.g. algorithm.
        sampledValues - The field name of the values sampled from the populations, e.g. error rates.
        nameOfTestPopulation - The value of the targetOfComparison field that is to be used as the 1 sample which is compared to the n other samples.
        output - The name of the field in which to store the result.
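        The documentation does not state which t-test variant is applied. As an illustration only, the sketch below computes the two-sample t statistic in Welch's form (unequal variances), which is an assumption on our part; the class and method names are hypothetical.

```java
class WelchTSketch {
    // Arithmetic mean of the values.
    static double mean(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x;
        }
        return sum / v.length;
    }

    // Unbiased sample variance (divides by n - 1); requires at least two values.
    static double variance(double[] v) {
        double m = mean(v);
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += (x - m) * (x - m);
        }
        return sumSq / (v.length - 1);
    }

    // Welch's t statistic for two independent samples a and b.
    static double tStatistic(double[] a, double[] b) {
        double standardError = Math.sqrt(variance(a) / a.length + variance(b) / b.length);
        return (mean(a) - mean(b)) / standardError;
    }
}
```

        In a 1-to-n comparison, a would hold the sampled values of the test population and b those of one of the n other populations within the same setting.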
      • bestFilter

        public static void bestFilter​(KVStoreCollection collection,
                                      java.lang.String setting,
                                      java.lang.String sampleID,
                                      java.lang.String sampledValues)
        Searches for the best-performing KVStores and then projects the collection onto the best KVStores per setting.
        Parameters:
        collection - The collection of KVStores.
        setting - The field name of the setting description, e.g. dataset.
        sampleID - The field name of the ids for the different populations, e.g. algorithm.
        sampledValues - The field name of the values sampled from the populations, e.g. error rates.
      • bestFilter

        public static void bestFilter​(KVStoreCollection collection,
                                      java.lang.String setting,
                                      java.lang.String sampleID,
                                      java.lang.String sampledValues,
                                      java.lang.String output)
        Searches for the best-performing KVStores and then projects the collection onto the best KVStores per setting.
        Parameters:
        collection - The collection of KVStores.
        setting - The field name of the setting description, e.g. dataset.
        sampleID - The field name of the ids for the different populations, e.g. algorithm.
        sampledValues - The field name of the values sampled from the populations, e.g. error rates.
        output - The name of the field in which to store the result.
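        The projection step can be pictured as follows. This is a hedged sketch over a hypothetical Sample class rather than real KVStores, and it assumes that "best" means the lowest mean per setting. Unlike bestFilter, which returns void, the illustrative filterToBest returns a new list instead of modifying its input.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class BestFilterSketch {
    // Hypothetical stand-in for a KVStore.
    static final class Sample {
        final String setting;
        final String sampleId;
        final double value;

        Sample(String setting, String sampleId, double value) {
            this.setting = setting;
            this.sampleId = sampleId;
            this.value = value;
        }
    }

    // Keeps only the entries that belong to the sampleID with the lowest mean in their setting.
    static List<Sample> filterToBest(List<Sample> samples) {
        // setting -> sampleID -> mean of the sampled values
        Map<String, Map<String, Double>> means = samples.stream().collect(
            Collectors.groupingBy((Sample s) -> s.setting,
                Collectors.groupingBy((Sample s) -> s.sampleId,
                    Collectors.averagingDouble((Sample s) -> s.value))));

        // setting -> sampleID with the lowest mean
        Map<String, String> best = new HashMap<>();
        means.forEach((setting, byId) -> best.put(setting,
            Collections.min(byId.entrySet(), Map.Entry.comparingByValue()).getKey()));

        return samples.stream()
            .filter(s -> s.sampleId.equals(best.get(s.setting)))
            .collect(Collectors.toList());
    }
}
```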
      • averageRank

        public static java.util.Map<java.lang.String,​org.apache.commons.math3.stat.descriptive.DescriptiveStatistics> averageRank​(KVStoreCollection groupedAll,
                                                                                                                                        java.lang.String sampleIDs,
                                                                                                                                        java.lang.String rank)
        Computes a statistic of average rankings for sampleIDs.
        Parameters:
        groupedAll - The collection of KVStores to compute the average rank for the respective sampleIDs.
        sampleIDs - The name of the field distinguishing the different samples.
        rank - The name of the field containing the rank information.
        Returns:
        A map from each value of the sampleIDs field to the DescriptiveStatistics of its ranks.
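        The aggregation can be sketched with the standard library alone. In the example below, each KVStore is modeled as a plain Map of field names to string values, and java.util.DoubleSummaryStatistics stands in for Commons Math's DescriptiveStatistics; both substitutions are assumptions made to keep the sketch self-contained.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AverageRankSketch {
    // Collects the values of the rank field per value of the sampleIDs field.
    // getAverage() on each result gives the average rank of that sample ID.
    static Map<String, DoubleSummaryStatistics> averageRank(
            List<Map<String, String>> stores, String sampleIDs, String rank) {
        Map<String, DoubleSummaryStatistics> stats = new TreeMap<>();
        for (Map<String, String> store : stores) {
            stats.computeIfAbsent(store.get(sampleIDs), k -> new DoubleSummaryStatistics())
                 .accept(Double.parseDouble(store.get(rank)));
        }
        return stats;
    }
}
```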
      • bestWilcoxonSignedRankTest

        public static void bestWilcoxonSignedRankTest​(KVStoreCollection collection,
                                                      java.lang.String setting,
                                                      java.lang.String sampleID,
                                                      java.lang.String pairingIndices,
                                                      java.lang.String sampledValues,
                                                      java.lang.String output)