Class AbstractKMeans.Instance

    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected elki.database.datastore.WritableIntegerDataStore assignment
      A mapping of elements to cluster ids.
      protected java.util.List<elki.database.ids.ModifiableDBIDs> clusters
      Store the elements per cluster.
      private elki.distance.NumberVectorDistance<?> df
      Distance function.
      protected long diststat
      Number of distance computations.
      protected boolean isSquared
      Indicates whether the distance function is squared.
      protected int k
      Number of clusters.
      protected java.lang.String key
      Key for statistics logging.
      protected double[][] means
      Cluster means.
      protected elki.database.relation.Relation<? extends elki.data.NumberVector> relation
      Data relation.
      protected double[] varsum
      Sum of squared deviations in each cluster.
    • Constructor Summary

      Constructors 
      Constructor Description
      Instance​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation, elki.distance.NumberVectorDistance<?> df, double[][] means)
      Constructor.
    • Method Summary

      Modifier and Type Method Description
      protected int assignToNearestCluster()
      Assign each object to the nearest cluster.
      Clustering<KMeansModel> buildResult()
      Build a standard k-means result, with known cluster variance sums.
      Clustering<KMeansModel> buildResult​(boolean varstat, elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
      Build the result, recomputing the cluster variance if varstat is set to true.
      protected void computeSquaredSeparation​(double[][] cost)
      Initial separation of means.
      protected void copyMeans​(double[][] src, double[][] dst)
      Copy means.
      protected double distance​(double[] x, double[] y)
      Compute the squared distance (and count the distance computations).
      protected double distance​(elki.data.NumberVector x, double[] y)
      Compute the squared distance (and count the distance computations).
      protected double distance​(elki.data.NumberVector x, elki.data.NumberVector y)
      Compute the squared distance (and count the distance computations).
      protected abstract elki.logging.Logging getLogger()
      Get the class logger.
      protected void initialSeperation​(double[][] cdist)
      Initial separation of means.
      protected abstract int iterate​(int iteration)
      Main loop function.
      protected void meansFromSums​(double[][] dst, double[][] sums, double[][] prev)
      Compute means from cluster sums by averaging.
      protected void movedDistance​(double[][] means, double[][] newmeans, double[] dists)
      Maximum distance moved.
      protected void recomputeSeperation​(double[] sep, double[][] cdist)
      Recompute the separation of cluster means.
      protected void recomputeVariance​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
      Recompute the cluster variances.
      void run​(int maxiter)
      Run the clustering.
      protected double sqrtdistance​(double[] x, double[] y)
      Compute the distance (and count the distance computations).
      protected double sqrtdistance​(elki.data.NumberVector x, double[] y)
      Compute the distance (and count the distance computations).
      protected double sqrtdistance​(elki.data.NumberVector x, elki.data.NumberVector y)
      Compute the distance (and count the distance computations).
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • means

        protected double[][] means
        Cluster means.
      • clusters

        protected java.util.List<elki.database.ids.ModifiableDBIDs> clusters
        Store the elements per cluster.
      • assignment

        protected elki.database.datastore.WritableIntegerDataStore assignment
        A mapping of elements to cluster ids.
      • varsum

        protected double[] varsum
        Sum of squared deviations in each cluster.
      • relation

        protected elki.database.relation.Relation<? extends elki.data.NumberVector> relation
        Data relation.
      • diststat

        protected long diststat
        Number of distance computations.
      • df

        private final elki.distance.NumberVectorDistance<?> df
        Distance function.
      • k

        protected final int k
        Number of clusters.
      • isSquared

        protected final boolean isSquared
        Indicates whether the distance function is squared.
      • key

        protected java.lang.String key
        Key for statistics logging.
    • Constructor Detail

      • Instance

        public Instance​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation,
                        elki.distance.NumberVectorDistance<?> df,
                        double[][] means)
        Constructor.
        Parameters:
        relation - Relation to process
        df - Distance function
        means - Initial means
    • Method Detail

      • distance

        protected double distance​(elki.data.NumberVector x,
                                  elki.data.NumberVector y)
        Compute the squared distance (and count the distance computations).
        Parameters:
        x - First object
        y - Second object
        Returns:
        Distance
      • distance

        protected double distance​(elki.data.NumberVector x,
                                  double[] y)
        Compute the squared distance (and count the distance computations).
        Parameters:
        x - First object
        y - Second object
        Returns:
        Distance
      • distance

        protected double distance​(double[] x,
                                  double[] y)
        Compute the squared distance (and count the distance computations).
        Parameters:
        x - First object
        y - Second object
        Returns:
        Distance
      • sqrtdistance

        protected double sqrtdistance​(elki.data.NumberVector x,
                                      elki.data.NumberVector y)
        Compute the distance (and count the distance computations). If the distance is squared, also compute the square root.
        Parameters:
        x - First object
        y - Second object
        Returns:
        Distance
      • sqrtdistance

        protected double sqrtdistance​(elki.data.NumberVector x,
                                      double[] y)
        Compute the distance (and count the distance computations). If the distance is squared, also compute the square root.
        Parameters:
        x - First object
        y - Second object
        Returns:
        Distance
      • sqrtdistance

        protected double sqrtdistance​(double[] x,
                                      double[] y)
        Compute the distance (and count the distance computations). If the distance is squared, also compute the square root.
        Parameters:
        x - First object
        y - Second object
        Returns:
        Distance
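
        The difference between the distance and sqrtdistance families can be sketched in plain Java. This is only an illustration under stated assumptions, not the ELKI implementation: the class CountingDistance and its raw helper are hypothetical, and the underlying distance is assumed to be (squared) Euclidean. Only the names diststat and isSquared are taken from the fields documented above.

```java
/** Hypothetical stand-in, not the ELKI class: counts distance computations. */
final class CountingDistance {
  /** Number of distance computations so far (mirrors the diststat field). */
  long diststat = 0;

  /** Whether the underlying distance is squared (mirrors the isSquared field). */
  final boolean isSquared;

  CountingDistance(boolean isSquared) {
    this.isSquared = isSquared;
  }

  /** Underlying distance; assumed here to be Euclidean or squared Euclidean. */
  private double raw(double[] x, double[] y) {
    double sum = 0;
    for (int i = 0; i < x.length; i++) {
      double d = x[i] - y[i];
      sum += d * d;
    }
    return isSquared ? sum : Math.sqrt(sum);
  }

  /** Distance exactly as the distance function defines it, counted once. */
  double distance(double[] x, double[] y) {
    ++diststat;
    return raw(x, y);
  }

  /** Counted once; takes the square root only if the distance is squared. */
  double sqrtdistance(double[] x, double[] y) {
    ++diststat;
    double d = raw(x, y);
    return isSquared ? Math.sqrt(d) : d;
  }
}
```

        With a squared distance, distance yields the squared value and sqrtdistance the plain one; both calls increase the counter by one.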
      • run

        public void run​(int maxiter)
        Run the clustering.
        Parameters:
        maxiter - Maximum number of iterations
      • iterate

        protected abstract int iterate​(int iteration)
        Main loop function.
        Parameters:
        iteration - Iteration number (beginning at 1)
        Returns:
        Number of reassigned points
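
        How run and iterate might fit together can be sketched as a simple driver loop. This is a minimal sketch, not the ELKI code: the class RunLoopSketch is hypothetical, and it assumes that the loop stops as soon as iterate reports zero reassigned points and that a non-positive maxiter means "no limit". The sketch also returns the number of iterations for illustration, whereas the documented run method returns nothing.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical driver loop; not the ELKI implementation. */
final class RunLoopSketch {
  /**
   * Calls iterate with 1, 2, 3, ... until it reports no reassignments
   * or maxiter iterations have been performed.
   *
   * @param iterate maps the iteration number to the number of reassigned points
   * @param maxiter maximum number of iterations (non-positive: unlimited, an assumption)
   * @return number of iterations performed
   */
  static int run(IntUnaryOperator iterate, int maxiter) {
    int iteration = 0;
    while (maxiter <= 0 || iteration < maxiter) {
      ++iteration; // Iteration numbers begin at 1.
      if (iterate.applyAsInt(iteration) == 0) {
        break; // Converged: no point changed its cluster.
      }
    }
    return iteration;
  }
}
```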
      • meansFromSums

        protected void meansFromSums​(double[][] dst,
                                     double[][] sums,
                                     double[][] prev)
        Compute means from cluster sums by averaging.
        Parameters:
        dst - Output means
        sums - Input sums
        prev - Previous means, to handle empty clusters
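
        The averaging step, including the fallback for empty clusters, can be illustrated as follows. This is a sketch under assumptions: the class name and the sizes parameter are hypothetical (the documented method takes no cluster sizes, so it presumably reads them from the clusters field), and copying the previous mean is one plausible reading of "to handle empty clusters".

```java
/** Hypothetical illustration of averaging cluster sums; not the ELKI code. */
final class MeansFromSumsSketch {
  /**
   * @param dst output means
   * @param sums per-cluster coordinate sums
   * @param prev previous means, used for empty clusters
   * @param sizes cluster sizes (hypothetical parameter of this sketch)
   */
  static void meansFromSums(double[][] dst, double[][] sums, double[][] prev, int[] sizes) {
    for (int i = 0; i < dst.length; i++) {
      if (sizes[i] == 0) {
        // Empty cluster: keep the previous mean instead of dividing by zero.
        System.arraycopy(prev[i], 0, dst[i], 0, prev[i].length);
        continue;
      }
      for (int d = 0; d < dst[i].length; d++) {
        dst[i][d] = sums[i][d] / sizes[i];
      }
    }
  }
}
```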
      • copyMeans

        protected void copyMeans​(double[][] src,
                                 double[][] dst)
        Copy means.
        Parameters:
        src - Source values
        dst - Destination values
      • assignToNearestCluster

        protected int assignToNearestCluster()
        Assign each object to the nearest cluster.
        Returns:
        Number of objects reassigned
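
        A brute-force version of the assignment step might look like the sketch below. It is an illustration only: the class AssignSketch is hypothetical, the data and the assignment are plain arrays rather than a Relation and a WritableIntegerDataStore, and squared Euclidean distance is assumed.

```java
/** Hypothetical brute-force assignment step; not the ELKI implementation. */
final class AssignSketch {
  /**
   * Assigns every point to its nearest mean.
   *
   * @param data points, one row per object
   * @param means current cluster means
   * @param assignment cluster index per point, updated in place
   * @return number of points whose assignment changed
   */
  static int assignToNearestCluster(double[][] data, double[][] means, int[] assignment) {
    int changed = 0;
    for (int p = 0; p < data.length; p++) {
      int best = 0;
      double bestDist = Double.POSITIVE_INFINITY;
      for (int c = 0; c < means.length; c++) {
        double dist = 0; // Squared Euclidean distance (an assumption).
        for (int d = 0; d < data[p].length; d++) {
          double diff = data[p][d] - means[c][d];
          dist += diff * diff;
        }
        if (dist < bestDist) {
          bestDist = dist;
          best = c;
        }
      }
      if (assignment[p] != best) {
        assignment[p] = best;
        ++changed;
      }
    }
    return changed;
  }
}
```

        A return value of zero indicates that no point changed its cluster, which is the usual convergence signal for k-means.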
      • recomputeSeperation

        protected void recomputeSeperation​(double[] sep,
                                           double[][] cdist)
        Recompute the separation of cluster means.

        Used by Elkan's variant and Exponion.

        Parameters:
        sep - Output array of separation
        cdist - Center-to-center distances (scaled as sqrt/2)
      • initialSeperation

        protected void initialSeperation​(double[][] cdist)
        Initial separation of means. Used by Elkan, SimplifiedElkan.
        Parameters:
        cdist - Pairwise separation output (as sqrt/2)
      • computeSquaredSeparation

        protected void computeSquaredSeparation​(double[][] cost)
        Initial separation of means. Used by Hamerly, Exponion, and Annulus.
        Parameters:
        cost - Pairwise separation output (as squared/4)
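
        The two scalings used by the separation helpers above are related: half the distance, squared, equals a quarter of the squared distance. The following sketch illustrates both forms. All names in it are hypothetical, squared Euclidean distance is assumed as the underlying distance, and the reading of "separation" as the smallest scaled distance to any other center is an assumption based on the usual formulation of Elkan's algorithm, not something this page states.

```java
/** Hypothetical illustration of the separation scalings; not the ELKI code. */
final class SeparationSketch {
  /** Squared Euclidean distance, the assumed underlying distance. */
  static double squared(double[] x, double[] y) {
    double sum = 0;
    for (int d = 0; d < x.length; d++) {
      double diff = x[d] - y[d];
      sum += diff * diff;
    }
    return sum;
  }

  /** cost[i][j] = squared distance / 4 (symmetric, zero diagonal). */
  static void quarterSquared(double[][] means, double[][] cost) {
    for (int i = 0; i < means.length; i++) {
      cost[i][i] = 0;
      for (int j = 0; j < i; j++) {
        cost[i][j] = cost[j][i] = 0.25 * squared(means[i], means[j]);
      }
    }
  }

  /** cdist[i][j] = distance / 2, the square root of the value above. */
  static void halfDistance(double[][] means, double[][] cdist) {
    for (int i = 0; i < means.length; i++) {
      cdist[i][i] = 0;
      for (int j = 0; j < i; j++) {
        cdist[i][j] = cdist[j][i] = 0.5 * Math.sqrt(squared(means[i], means[j]));
      }
    }
  }

  /** sep[i] = smallest cdist[i][j] over all j != i (assumed meaning of separation). */
  static void nearestSeparation(double[] sep, double[][] cdist) {
    for (int i = 0; i < sep.length; i++) {
      double min = Double.POSITIVE_INFINITY;
      for (int j = 0; j < cdist.length; j++) {
        if (j != i && cdist[i][j] < min) {
          min = cdist[i][j];
        }
      }
      sep[i] = min;
    }
  }
}
```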
      • movedDistance

        protected void movedDistance​(double[][] means,
                                     double[][] newmeans,
                                     double[] dists)
        Maximum distance moved.

        Used by Hamerly, Elkan, and derived classes.

        Parameters:
        means - Old means
        newmeans - New means
        dists - Distances moved (output)
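
        The per-cluster movement computation can be sketched as follows. This is a hedged illustration: the class name is hypothetical, plain Euclidean distance is assumed, and returning the maximum is a choice of this sketch (the signature documented above returns nothing and writes only to dists).

```java
/** Hypothetical illustration of the moved-distance step; not the ELKI code. */
final class MovedDistanceSketch {
  /**
   * Fills dists[i] with the Euclidean distance between means[i] and newmeans[i].
   *
   * @return the largest distance any mean moved (a choice of this sketch)
   */
  static double movedDistance(double[][] means, double[][] newmeans, double[] dists) {
    double max = 0;
    for (int i = 0; i < means.length; i++) {
      double sum = 0;
      for (int d = 0; d < means[i].length; d++) {
        double diff = means[i][d] - newmeans[i][d];
        sum += diff * diff;
      }
      dists[i] = Math.sqrt(sum);
      if (dists[i] > max) {
        max = dists[i];
      }
    }
    return max;
  }
}
```

        Bound-based variants such as Hamerly's and Elkan's typically use these distances to loosen their per-point bounds after the means have been updated.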
      • buildResult

        public Clustering<KMeansModel> buildResult()
        Build a standard k-means result, with known cluster variance sums.

        Note: this expects the varsum field to be correct!

        Returns:
        Clustering result
      • buildResult

        public Clustering<KMeansModel> buildResult​(boolean varstat,
                                                   elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
        Build the result, recomputing the cluster variance if varstat is set to true.
        Parameters:
        varstat - Recompute cluster variance
        relation - Data relation (only needed if varstat is set)
        Returns:
        Clustering result
      • recomputeVariance

        protected void recomputeVariance​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
        Recompute the cluster variances.
        Parameters:
        relation - Data relation
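
        The quantity stored in varsum, the sum of squared deviations in each cluster, can be recomputed from the assignment and the means as in the sketch below. The class and method names are hypothetical, and plain arrays stand in for the data relation and the assignment store.

```java
/** Hypothetical recomputation of per-cluster variance sums; not the ELKI code. */
final class VarianceSketch {
  /**
   * @param data points, one row per object
   * @param assignment cluster index of each point
   * @param means cluster means
   * @return sum of squared deviations from the mean, per cluster
   */
  static double[] sumOfSquaredDeviations(double[][] data, int[] assignment, double[][] means) {
    double[] varsum = new double[means.length];
    for (int p = 0; p < data.length; p++) {
      int c = assignment[p];
      for (int d = 0; d < data[p].length; d++) {
        double diff = data[p][d] - means[c][d];
        varsum[c] += diff * diff;
      }
    }
    return varsum;
  }
}
```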
      • getLogger

        protected abstract elki.logging.Logging getLogger()
        Get the class logger.
        Returns:
        Logger