Class FastMSC<O>

  • Type Parameters:
    O - Object type to cluster
    All Implemented Interfaces:
    elki.Algorithm, ClusteringAlgorithm<Clustering<MedoidModel>>, KMedoidsClustering<O>
    Direct Known Subclasses:
    FasterMSC

    @Reference(authors="Lars Lenssen and Erich Schubert",
               title="Clustering by Direct Optimization of the Medoid Silhouette",
               booktitle="Int. Conf. on Similarity Search and Applications, SISAP 2022",
               url="https://doi.org/10.1007/978-3-031-17849-8_15",
               bibkey="DBLP:conf/sisap/LenssenS22")
    public class FastMSC<O>
    extends PAMMEDSIL<O>
    Fast Medoid Silhouette Clustering.

    This clustering algorithm searches for a clustering that optimizes the "medoid silhouette", an approximation to the silhouette, using a swap-based heuristic similar to PAM. By also caching the distance to the third nearest center (FastPAM only cached up to the second nearest), the run time per iteration is reduced to O(n²). This is acceptable for many use cases, and the solution often has a better silhouette than those found by other clustering methods.
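
The quantity being optimized can be illustrated with a small standalone sketch. It is not the ELKI implementation: the class name MedoidSilhouetteSketch, the method medoidSilhouette, and the use of a precomputed distance matrix are assumptions made for illustration only. The sketch assumes the medoid silhouette of a point is 1 - a/b, where a and b are the distances to its nearest and second nearest medoid (with 0/0 treated as 0, as documented for loss), averaged over all points.

```java
// Illustrative sketch only; names and data layout are hypothetical, not ELKI API.
class MedoidSilhouetteSketch {
  /**
   * Average medoid silhouette for a given set of medoids.
   *
   * @param dist symmetric pairwise distance matrix
   * @param medoids indexes of the medoids (at least two)
   * @return mean of 1 - a/b over all points
   */
  static double medoidSilhouette(double[][] dist, int[] medoids) {
    if (medoids.length < 2) {
      throw new IllegalArgumentException("At least two medoids are required.");
    }
    double lossSum = 0;
    for (int i = 0; i < dist.length; i++) {
      // Find the nearest (a) and second nearest (b) medoid distance.
      double a = Double.POSITIVE_INFINITY, b = Double.POSITIVE_INFINITY;
      for (int m : medoids) {
        double d = dist[i][m];
        if (d < a) {
          b = a;
          a = d;
        } else if (d < b) {
          b = d;
        }
      }
      // Loss a/b, defined as 0 when both distances are 0.
      lossSum += (b > 0) ? a / b : 0;
    }
    return 1 - lossSum / dist.length;
  }

  public static void main(String[] args) {
    // Four points on a line: 0, 1, 10, 11.
    double[] x = { 0, 1, 10, 11 };
    double[][] dist = new double[x.length][x.length];
    for (int i = 0; i < x.length; i++) {
      for (int j = 0; j < x.length; j++) {
        dist[i][j] = Math.abs(x[i] - x[j]);
      }
    }
    // One medoid per group scores higher than two medoids in the same group.
    System.out.println(medoidSilhouette(dist, new int[] { 0, 2 }));
    System.out.println(medoidSilhouette(dist, new int[] { 0, 1 }));
  }
}
```

A swap-based heuristic would repeatedly replace one medoid with a non-medoid whenever that increases this value; the caching described above is what avoids recomputing the two nearest distances from scratch for every candidate swap.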

    Reference:

    Lars Lenssen and Erich Schubert
    Clustering by Direct Optimization of the Medoid Silhouette
    Int. Conf. on Similarity Search and Applications, SISAP 2022

    Author:
    Erich Schubert
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      protected class  FastMSC.Instance
      FastMSC clustering instance for a particular data set.
      protected class  FastMSC.Instance2
      Simplified FastMSC clustering instance for k=2.
      static class  FastMSC.Par<O>
      Parameterization class.
      protected static class  FastMSC.Record
      Data stored per point.
      • Nested classes/interfaces inherited from interface elki.Algorithm

        elki.Algorithm.Utils
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static elki.logging.Logging LOG
      The logger for this class.
    • Constructor Summary

      Constructors 
      Constructor Description
      FastMSC​(elki.distance.Distance<? super O> distance, int k, int maxiter, KMedoidsInitialization<O> initializer)
      Constructor.
    • Method Summary

      Modifier and Type Method Description
      protected elki.logging.Logging getLogger()
      Get the static class logger.
      protected static double loss​(double a, double b)
      Loss function used - here simply a/b, 0 if a=b=0.
      Clustering<MedoidModel> run​(elki.database.relation.Relation<O> relation, int k, elki.database.query.distance.DistanceQuery<? super O> distQ)
      Run k-medoids clustering with a given distance query.
      Not a very elegant API, but needed for some types of nested k-medoids.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        The logger for this class.
    • Constructor Detail

      • FastMSC

        public FastMSC​(elki.distance.Distance<? super O> distance,
                       int k,
                       int maxiter,
                       KMedoidsInitialization<O> initializer)
        Constructor.
        Parameters:
        distance - Distance function
        k - Number of clusters
        maxiter - Maximum number of iterations
        initializer - Initialization
    • Method Detail

      • run

        public Clustering<MedoidModel> run​(elki.database.relation.Relation<O> relation,
                                           int k,
                                           elki.database.query.distance.DistanceQuery<? super O> distQ)
        Description copied from interface: KMedoidsClustering
        Run k-medoids clustering with a given distance query.
        Not a very elegant API, but needed for some types of nested k-medoids.
        Specified by:
        run in interface KMedoidsClustering<O>
        Overrides:
        run in class PAMMEDSIL<O>
        Parameters:
        relation - relation to use
        k - Number of clusters
        distQ - Distance query to use
        Returns:
        clustering result
      • loss

        protected static final double loss​(double a,
                                           double b)
        Loss function used - here simply a/b, 0 if a=b=0.
        Parameters:
        a - distance to nearest
        b - distance to second nearest
        Returns:
        loss, a/b or 0.
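
The documented behavior of this loss can be sketched as follows. This is a minimal reimplementation written from the description above, not the ELKI source; the class name LossSketch is hypothetical. It assumes a <= b, since a is the distance to the nearest and b the distance to the second nearest medoid.

```java
// Illustrative sketch only; written from the description, not copied from ELKI.
class LossSketch {
  /**
   * Ratio of nearest to second nearest distance.
   *
   * @param a distance to nearest medoid
   * @param b distance to second nearest medoid (expected to satisfy a <= b)
   * @return a/b, or 0 if both distances are 0
   */
  static double loss(double a, double b) {
    // Guard the 0/0 case, which would otherwise yield NaN.
    return (a == 0 && b == 0) ? 0 : a / b;
  }

  public static void main(String[] args) {
    System.out.println(loss(1, 4)); // point well inside its cluster
    System.out.println(loss(3, 3)); // point equally close to two medoids
    System.out.println(loss(0, 0)); // duplicate of two medoids
  }
}
```

A loss near 0 means the point is much closer to its own medoid than to any other, and a loss of 1 means it sits on the boundary between two clusters.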
      • getLogger

        protected elki.logging.Logging getLogger()
        Description copied from class: PAM
        Get the static class logger.
        Overrides:
        getLogger in class PAMMEDSIL<O>