Class AlternatingKMedoids<O>

  • Type Parameters:
    O - vector datatype
    All Implemented Interfaces:
    elki.Algorithm, ClusteringAlgorithm<Clustering<MedoidModel>>, KMedoidsClustering<O>

    @Priority(-100)
    @Reference(authors="F. E. Maranzana",title="On the location of supply points to minimize transport costs",booktitle="Journal of the Operational Research Society 15.3",url="https://doi.org/10.1057/jors.1964.47",bibkey="doi:10.1057/jors.1964.47") @Reference(authors="A. P. Reynolds, G. Richards, B. de la Iglesia, V. J. Rayward-Smith",title="Clustering Rules: A Comparison of Partitioning and Hierarchical Clustering Algorithms",booktitle="J. Math. Model. Algorithms 5(4)",url="https://doi.org/10.1007/s10852-005-9022-1",bibkey="DBLP:journals/jmma/ReynoldsRIR06") @Reference(authors="H.-S. Park, C.-H. Jun",title="A simple and fast algorithm for K-medoids clustering",booktitle="Expert Systems with Applications 36(2)",url="https://doi.org/10.1016/j.eswa.2008.01.039",bibkey="DBLP:journals/eswa/ParkJ09")
    public class AlternatingKMedoids<O>
    extends java.lang.Object
    implements KMedoidsClustering<O>
    A k-medoids clustering algorithm, implemented as EM-style batch algorithm; known in literature as the "alternate" method.

    In contrast to PAM, which will in each iteration update one medoid with one (arbitrary) non-medoid, this implementation follows the EM pattern. In the expectation step, the best medoid from the cluster members is chosen; in the M-step, the objects are reassigned to their nearest medoid.

    This implementation evolved naturally from EM and k-means algorithms. We then came across a similar approach was published by Park and Jun, and made this code match their suggestions (in particular, the default initialization is as proposed by them, despite its shortcomings). But similar ideas were already discussed by Reynolds et al. as a side note, and can be further traced back to Maranzana.

    In our experiments, it tends to be much faster than PAM, but also find less good solutions, as the medoids are only chosen from the cluster members. This aligns with findings of Reynolds et al. and can be explained with the requirement of the new medoid to cover the entire cluster. Similar observations were also made by Teitz and Bart, 1967, who described its performance as "erratic".

    Reference:

    F. E. Maranzana
    On the location of supply points to minimize transport costs
    Journal of the Operational Research Society 15.3

    A. P. Reynolds, G. Richards, B. de la Iglesia, V. J. Rayward-Smith
    Clustering Rules: A Comparison of Partitioning and Hierarchical Clustering Algorithms
    J. Math. Model. Algorithms 5(4)

    H.-S. Park, C.-H. Jun
    A simple and fast algorithm for K-medoids clustering
    Expert Systems with Applications 36(2)

    Since:
    0.5.0
    Author:
    Erich Schubert
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  AlternatingKMedoids.Par<V>
      Parameterization class.
      • Nested classes/interfaces inherited from interface elki.Algorithm

        elki.Algorithm.Utils
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected elki.distance.Distance<? super O> distance
      Distance function used.
      protected KMedoidsInitialization<O> initializer
      Method to choose initial means.
      protected int k
      Number of clusters to find.
      private static java.lang.String KEY
      Key for statistics logging.
      private static elki.logging.Logging LOG
      The logger for this class.
      protected int maxiter
      Maximum number of iterations.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      elki.data.type.TypeInformation[] getInputTypeRestriction()  
      Clustering<MedoidModel> run​(elki.database.relation.Relation<O> relation)
      Run k-medoids clustering.
      Clustering<MedoidModel> run​(elki.database.relation.Relation<O> relation, int k, elki.database.query.distance.DistanceQuery<? super O> distQ)
      Run k-medoids clustering with a given distance query.
      Not a very elegant API, but needed for some types of nested k-medoids.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        The logger for this class.
      • KEY

        private static final java.lang.String KEY
        Key for statistics logging.
      • distance

        protected elki.distance.Distance<? super O> distance
        Distance function used.
      • k

        protected int k
        Number of clusters to find.
      • maxiter

        protected int maxiter
        Maximum number of iterations.
    • Constructor Detail

      • AlternatingKMedoids

        public AlternatingKMedoids​(elki.distance.Distance<? super O> distance,
                                   int k,
                                   int maxiter,
                                   KMedoidsInitialization<O> initializer)
        Constructor.
        Parameters:
        distance - distance function
        k - k parameter
        maxiter - Maxiter parameter
        initializer - Function to generate the initial means
    • Method Detail

      • getInputTypeRestriction

        public elki.data.type.TypeInformation[] getInputTypeRestriction()
        Specified by:
        getInputTypeRestriction in interface elki.Algorithm
      • run

        public Clustering<MedoidModel> run​(elki.database.relation.Relation<O> relation,
                                           int k,
                                           elki.database.query.distance.DistanceQuery<? super O> distQ)
        Description copied from interface: KMedoidsClustering
        Run k-medoids clustering with a given distance query.
        Not a very elegant API, but needed for some types of nested k-medoids.
        Specified by:
        run in interface KMedoidsClustering<O>
        Parameters:
        relation - relation to use
        k - Number of clusters
        distQ - Distance query to use
        Returns:
        result