Class BIRCHLloydKMeans

  • All Implemented Interfaces:
    elki.Algorithm, ClusteringAlgorithm<Clustering<MeanModel>>

    @Reference(authors="T. Zhang, R. Ramakrishnan, M. Livny",title="BIRCH: An Efficient Data Clustering Method for Very Large Databases",booktitle="Proc. 1996 ACM SIGMOD International Conference on Management of Data",url="https://doi.org/10.1145/233269.233324",bibkey="DBLP:conf/sigmod/ZhangRL96") @Reference(authors="T. Zhang, R. Ramakrishnan, M. Livny",title="BIRCH: A New Data Clustering Algorithm and Its Applications",booktitle="Data Min. Knowl. Discovery",url="https://doi.org/10.1023/A:1009783824328",bibkey="DBLP:journals/datamine/ZhangRL97")
    public class BIRCHLloydKMeans
    extends java.lang.Object
    implements ClusteringAlgorithm<Clustering<MeanModel>>
    BIRCH-based clustering algorithm that simply treats the leafs of the CFTree as clusters.

    References:

    T. Zhang, R. Ramakrishnan, M. Livny
    BIRCH: An Efficient Data Clustering Method for Very Large Databases Proc. 1996 ACM SIGMOD International Conference on Management of Data

    T. Zhang, R. Ramakrishnan, M. Livny
    BIRCH: A New Data Clustering Algorithm and Its Applications Data. Min. Knowl. Discovery

    Since:
    0.7.5
    Author:
    Erich Schubert
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  BIRCHLloydKMeans.Par
      Parameterization class.
      • Nested classes/interfaces inherited from interface elki.Algorithm

        elki.Algorithm.Utils
    • Field Summary

      Fields 
      Modifier and Type Field Description
      (package private) CFTree.Factory cffactory
      CFTree factory.
      (package private) BIRCHKMeansPlusPlus initialization
      k-means++ initialization
      (package private) int k
      Number of cluster centers to initialize.
      private static elki.logging.Logging LOG
      Class logger.
      (package private) int maxiter
      Maximum number of iterations.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      private int assignToNearestCluster​(int[] assignment, double[][] means, double[][] cfmeans, ClusteringFeature[] cfs, int[] weights)
      Assign each element to nearest cluster.
      private double[] calculateVariances​(int[] assignment, double[][] means, ClusteringFeature[] cfs, int[] weights)
      Calculate variance of clusters based on clustering features.
      protected double distance​(double[] x, double[] y)
      Compute a distance (and count the distance computations).
      protected double distance​(elki.data.NumberVector x, double[] y)
      Compute a distance (and count the distance computations).
      elki.data.type.TypeInformation[] getInputTypeRestriction()  
      private double[][] kmeans​(double[][] cfmeans, ClusteringFeature[] cfs, int[] assignment, int[] weights)
      Perform k-means clustering.
      private double[][] means​(int[] assignment, double[][] means, ClusteringFeature[] cfs, int[] weights)
      Calculate means of clusters.
      Clustering<KMeansModel> run​(elki.database.relation.Relation<elki.data.NumberVector> relation)
      Run the clustering algorithm.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        Class logger.
      • k

        int k
        Number of cluster centers to initialize.
      • maxiter

        int maxiter
        Maximum number of iterations.
    • Constructor Detail

      • BIRCHLloydKMeans

        public BIRCHLloydKMeans​(CFTree.Factory cffactory,
                                int k,
                                int maxiter,
                                BIRCHKMeansPlusPlus initialization)
        Constructor.
        Parameters:
        cffactory - CFTree factory
        k - Number of clusters
        maxiter - Maximum number of iterations
        initialization - Initialization method for k-means
    • Method Detail

      • getInputTypeRestriction

        public elki.data.type.TypeInformation[] getInputTypeRestriction()
        Specified by:
        getInputTypeRestriction in interface elki.Algorithm
      • run

        public Clustering<KMeansModel> run​(elki.database.relation.Relation<elki.data.NumberVector> relation)
        Run the clustering algorithm.
        Parameters:
        relation - Input data
        Returns:
        Clustering
      • kmeans

        private double[][] kmeans​(double[][] cfmeans,
                                  ClusteringFeature[] cfs,
                                  int[] assignment,
                                  int[] weights)
        Perform k-means clustering.
        Parameters:
        cfmeans - Cluster feature means
        cfs - Cluster features
        assignment - Cluster assignment of each CF
        weights - Cluster weight output
        Returns:
        Cluster means
      • means

        private double[][] means​(int[] assignment,
                                 double[][] means,
                                 ClusteringFeature[] cfs,
                                 int[] weights)
        Calculate means of clusters.
        Parameters:
        assignment - Cluster assignment
        means - Means of clusters
        cfs - Clustering features
        Returns:
        Means of clusters.
      • assignToNearestCluster

        private int assignToNearestCluster​(int[] assignment,
                                           double[][] means,
                                           double[][] cfmeans,
                                           ClusteringFeature[] cfs,
                                           int[] weights)
        Assign each element to nearest cluster.
        Parameters:
        assignment - Current cluster assignment
        means - k-means cluster means
        cfmeans - Cluster feature means
        cfs - Cluster features
        weights - Cluster weights (output)
        Returns:
        Number of reassigned elements
      • distance

        protected double distance​(elki.data.NumberVector x,
                                  double[] y)
        Compute a distance (and count the distance computations).
        Parameters:
        x - First object
        y - Second object
        Returns:
        Distance
      • distance

        protected double distance​(double[] x,
                                  double[] y)
        Compute a distance (and count the distance computations).
        Parameters:
        x - First object
        y - Second object
        Returns:
        Distance
      • calculateVariances

        private double[] calculateVariances​(int[] assignment,
                                            double[][] means,
                                            ClusteringFeature[] cfs,
                                            int[] weights)
        Calculate variance of clusters based on clustering features.

        The result is only correct after updating the means!

        Parameters:
        assignment - Cluster assignment of CFs
        means - Cluster means
        cfs - CF leaves
        weights - Cluster weights
        Returns:
        Per-cluster variances