Class AbstractKMeansQualityMeasure<O extends elki.data.NumberVector>

  • All Implemented Interfaces:
    KMeansQualityMeasure<O>
    Direct Known Subclasses:
    AkaikeInformationCriterion, AkaikeInformationCriterionXMeans, BayesianInformationCriterion, BayesianInformationCriterionXMeans, BayesianInformationCriterionZhao, WithinClusterVariance

    public abstract class AbstractKMeansQualityMeasure<O extends elki.data.NumberVector>
    extends java.lang.Object
    implements KMeansQualityMeasure<O>
    Base class for evaluating clusterings by information criteria (such as AIC or BIC). Provides helper functions (e.g., maximum likelihood calculation) to its subclasses.
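To illustrate how an information criterion combines a log-likelihood with a parameter count, here is a minimal standalone sketch using the textbook forms of AIC and BIC. The class and method names are hypothetical and not part of ELKI. Sign and scaling conventions vary between publications, so the subclasses listed above may use a different form.

```java
/** Standalone illustration; hypothetical names, not part of ELKI. */
final class InformationCriteriaSketch {
  /** Textbook AIC: 2p - 2 ln L (lower is better in this form). */
  static double aic(double logLikelihood, int numParameters) {
    return 2.0 * numParameters - 2.0 * logLikelihood;
  }

  /** Textbook BIC: p ln n - 2 ln L (lower is better in this form). */
  static double bic(double logLikelihood, int numParameters, int numPoints) {
    return numParameters * Math.log(numPoints) - 2.0 * logLikelihood;
  }

  public static void main(String[] args) {
    double logLikelihood = -120.0; // hypothetical value for demonstration
    System.out.println("AIC = " + aic(logLikelihood, 5));
    System.out.println("BIC = " + bic(logLikelihood, 5, 100));
  }
}
```

Both criteria penalize model complexity. BIC penalizes each parameter by ln n instead of 2, so it prefers fewer clusters than AIC once n exceeds about 7 points.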

    References:

    The use of information-theoretic criteria for evaluating k-means was popularized by X-means (see BayesianInformationCriterionXMeans):

    D. Pelleg, A. Moore
    X-means: Extending K-means with Efficient Estimation of the Number of Clusters
    Proc. 17th Int. Conf. on Machine Learning (ICML 2000)

    A different version of the log-likelihood is derived in the following paper (see BayesianInformationCriterionZhao):

    Q. Zhao, M. Xu, P. Fränti
    Knee Point Detection on Bayesian Information Criterion
    20th IEEE International Conference on Tools with Artificial Intelligence

    A longer derivation (but with a sign mistake) can be found in:

    A. Foglia, B. Hancock
    Notes on Bayesian Information Criterion Calculation for X-Means Clustering
    https://github.com/bobhancock/goxmeans/blob/master/doc/BIC_notes.pdf

    Since:
    0.7.0
    Author:
    Tibor Goldschwendt, Erich Schubert
    • Method Summary

      Modifier and Type Method Description
      static double logLikelihood​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation, Clustering<? extends MeanModel> clustering, elki.distance.NumberVectorDistance<?> distance)
      Computes the log-likelihood of an entire clustering.
      static int numberOfFreeParameters​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation, Clustering<? extends MeanModel> clustering)
      Compute the number of free parameters.
      static int numPoints​(Clustering<? extends MeanModel> clustering)
      Compute the number of points in a given set of clusters. This may be fewer than in the complete data set, for example when X-means evaluates only part of a clustering.
      static double varianceContributionOfCluster​(Cluster<? extends MeanModel> cluster, elki.distance.NumberVectorDistance<?> distance, elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
      Variance contribution of a single cluster.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • AbstractKMeansQualityMeasure

        public AbstractKMeansQualityMeasure()
    • Method Detail

      • numPoints

        public static int numPoints​(Clustering<? extends MeanModel> clustering)
        Compute the number of points in a given set of clusters. This may be fewer than in the complete data set, for example when X-means evaluates only part of a clustering.
        Parameters:
        clustering - Clustering to analyze
        Returns:
        Number of points
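The point count is the sum of the cluster sizes, not the size of the data relation. The following standalone sketch shows this with plain arrays of member indices instead of ELKI's Clustering type. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Standalone illustration; hypothetical names, not part of ELKI. */
final class NumPointsSketch {
  /** Sums the sizes of the given clusters (each cluster is an array of member indices). */
  static int numPoints(List<int[]> clusters) {
    int n = 0;
    for (int[] members : clusters) {
      n += members.length;
    }
    return n;
  }

  public static void main(String[] args) {
    // Two child clusters obtained by splitting one parent cluster;
    // the full data set may contain many more points than these five.
    List<int[]> children = Arrays.asList(new int[] { 0, 1, 2 }, new int[] { 5, 6 });
    System.out.println(numPoints(children));
  }
}
```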
      • varianceContributionOfCluster

        public static double varianceContributionOfCluster​(Cluster<? extends MeanModel> cluster,
                                                           elki.distance.NumberVectorDistance<?> distance,
                                                           elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
        Variance contribution of a single cluster.

        If possible, this information is reused from the clustering process (when a KMeansModel is returned).

        Parameters:
        cluster - Cluster to access
        distance - Distance function
        relation - Data relation
        Returns:
        Cluster variance
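When the value cannot be reused from the clustering process, it has to be recomputed from the data. The sketch below assumes that the "variance contribution" is the sum of squared Euclidean distances from each member to the cluster mean. That reading is an assumption, as are the names and the plain-array data layout, and none of it is ELKI code.

```java
/** Standalone illustration; hypothetical names, not part of ELKI. */
final class VarianceContributionSketch {
  /**
   * Sum of squared Euclidean distances of the points to the given mean.
   * Assumes every point has the same dimensionality as the mean.
   */
  static double varianceContribution(double[][] points, double[] mean) {
    double sum = 0.0;
    for (double[] p : points) {
      for (int d = 0; d < mean.length; d++) {
        double diff = p[d] - mean[d];
        sum += diff * diff;
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    double[][] points = { { 0.0, 0.0 }, { 2.0, 0.0 } };
    double[] mean = { 1.0, 0.0 };
    System.out.println(varianceContribution(points, mean));
  }
}
```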
      • logLikelihood

        @Reference(authors="A. Foglia, B. Hancock",
                   title="Notes on Bayesian Information Criterion Calculation for X-Means Clustering",
                   booktitle="Online",
                   url="https://github.com/bobhancock/goxmeans/blob/master/doc/BIC_notes.pdf",
                   bibkey="web/FogliaH12")
        public static double logLikelihood​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation,
                                           Clustering<? extends MeanModel> clustering,
                                           elki.distance.NumberVectorDistance<?> distance)
        Computes log likelihood of an entire clustering.

        This version is intended to correct some mistakes in the X-means publication; experimentally, however, the corrections do not make much of a difference.

        Parameters:
        relation - Data relation
        clustering - Clustering
        distance - Distance function
        Returns:
        Log-likelihood
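For orientation, the following standalone sketch computes a clustering log-likelihood under the spherical Gaussian model described in the X-means paper, with a single variance estimate shared by all clusters. It is an assumption-laden illustration and not ELKI's implementation, which may differ in its details. The class name, method name, and the choice to pass cluster sizes and the total squared error directly are all hypothetical.

```java
/** Standalone illustration; hypothetical names, not part of ELKI. */
final class SphericalLogLikelihoodSketch {
  /**
   * @param sizes sizes of the clusters (each assumed to be non-empty)
   * @param totalSquaredError sum of squared distances of all points to their cluster means
   * @param dim dimensionality of the data
   */
  static double logLikelihood(int[] sizes, double totalSquaredError, int dim) {
    final int k = sizes.length;
    int n = 0;
    for (int s : sizes) {
      n += s;
    }
    if (n <= k || totalSquaredError <= 0.0) {
      throw new IllegalArgumentException("variance estimate is undefined");
    }
    // Bias-corrected shared variance estimate: SSE / (n - k)
    final double logVariance = Math.log(totalSquaredError / (n - k));
    double ll = 0.0;
    for (int ni : sizes) {
      ll += ni * Math.log(ni) - ni * Math.log(n) // cluster membership term
          - 0.5 * ni * Math.log(2.0 * Math.PI) // Gaussian normalization
          - 0.5 * ni * dim * logVariance // variance term
          - 0.5 * (ni - k); // residual term
    }
    return ll;
  }

  public static void main(String[] args) {
    System.out.println(logLikelihood(new int[] { 2, 2 }, 2.0, 1));
  }
}
```

With the cluster sizes fixed, a smaller total squared error yields a larger log-likelihood, which is why this quantity must be combined with a parameter penalty before comparing clusterings with different numbers of clusters.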
      • numberOfFreeParameters

        public static int numberOfFreeParameters​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation,
                                                 Clustering<? extends MeanModel> clustering)
        Compute the number of free parameters.
        Parameters:
        relation - Data relation (for dimensionality)
        clustering - Set of clusters
        Returns:
        Number of free parameters
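As a rough guide to what such a count contains, the sketch below follows the X-means paper's description of a spherical Gaussian model: k - 1 class probabilities, k * dim centroid coordinates, and one shared variance estimate. This is an assumption about the model and ELKI's own count may differ. The names are hypothetical.

```java
/** Standalone illustration; hypothetical names, not part of ELKI. */
final class FreeParametersSketch {
  /**
   * @param k number of clusters
   * @param dim dimensionality of the data
   */
  static int numberOfFreeParameters(int k, int dim) {
    return (k - 1) // class probabilities (they sum to one)
        + k * dim // centroid coordinates
        + 1; // one shared variance estimate (assumption)
  }

  public static void main(String[] args) {
    System.out.println(numberOfFreeParameters(3, 2));
  }
}
```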