Class LMCLUS

  • All Implemented Interfaces:
    elki.Algorithm, ClusteringAlgorithm<Clustering<Model>>

    @Reference(authors="R. Haralick, R. Harpaz",
               title="Linear manifold clustering in high dimensional spaces by stochastic search",
               booktitle="Pattern Recognition volume 40, Issue 10",
               url="https://doi.org/10.1016/j.patcog.2007.01.020",
               bibkey="DBLP:journals/pr/HaralickH07")
    public class LMCLUS
    extends java.lang.Object
    implements ClusteringAlgorithm<Clustering<Model>>
    Linear manifold clustering in high dimensional spaces by stochastic search.

    Reference:

    R. Haralick, R. Harpaz
    Linear manifold clustering in high dimensional spaces by stochastic search
    In: Pattern Recognition volume 40, Issue 10

    Implementation note: the LMCLUS algorithm seems to lack good stopping criterions. We can't entirely reproduce the good results from the original publication, in particular not on noisy data. But the questionable parts are as in the original publication, associated thesis and published source code. The minimum cluster size however can serve as a hidden stopping criterion.

    Since:
    0.5.0
    Author:
    Ernst Waas, Erich Schubert
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  LMCLUS.Par
      Parameterization class
      private static class  LMCLUS.Separation
      Class to represent a linear manifold separation
      • Nested classes/interfaces inherited from interface elki.Algorithm

        elki.Algorithm.Utils
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static int BINS
      Histogram resolution
      private static elki.logging.Logging LOG
      The logger for this class.
      private static double LOG_NOT_FROM_ONE_CLUSTER_PROBABILITY
      Epsilon
      private int maxLMDim
      Maximum cluster dimensionality
      private int minsize
      Minimum cluster size
      private elki.utilities.random.RandomFactory rnd
      Random factory
      private int samplingLevel
      Number of sampling rounds to find a good split
      private double sensitivityThreshold
      The current threshold value calculated by the findSeperation Method.
    • Constructor Summary

      Constructors 
      Constructor Description
      LMCLUS​(int maxdim, int minsize, int samplingLevel, double sensitivityThreshold, elki.utilities.random.RandomFactory rnd)
      Constructor.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      private double deviation​(double[] delta, double[][] beta)
      Deviation from a manifold described by beta.
      private double[] findAndEvaluateThreshold​(elki.utilities.datastructures.histogram.DoubleDynamicHistogram histogram)
      Evaluate the histogram to find a suitable threshold
      private LMCLUS.Separation findSeparation​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation, elki.database.ids.DBIDs currentids, int dimension, java.util.Random r)
      This method samples a number of linear manifolds an tries to determine which the one with the best cluster is.
      private double[][] generateOrthonormalBasis​(java.util.List<double[]> vectors)
      This Method generates an orthonormal basis from a set of Vectors.
      elki.data.type.TypeInformation[] getInputTypeRestriction()  
      Clustering<Model> run​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
      The main LMCLUS (Linear manifold clustering algorithm) is processed in this method.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        The logger for this class.
      • LOG_NOT_FROM_ONE_CLUSTER_PROBABILITY

        private static final double LOG_NOT_FROM_ONE_CLUSTER_PROBABILITY
        Epsilon
      • sensitivityThreshold

        private final double sensitivityThreshold
        The current threshold value calculated by the findSeperation Method.
      • maxLMDim

        private final int maxLMDim
        Maximum cluster dimensionality
      • minsize

        private final int minsize
        Minimum cluster size
      • samplingLevel

        private final int samplingLevel
        Number of sampling rounds to find a good split
      • rnd

        private final elki.utilities.random.RandomFactory rnd
        Random factory
    • Constructor Detail

      • LMCLUS

        public LMCLUS​(int maxdim,
                      int minsize,
                      int samplingLevel,
                      double sensitivityThreshold,
                      elki.utilities.random.RandomFactory rnd)
        Constructor.
        Parameters:
        maxdim - Maximum dimensionality
        minsize - Minimum cluster size
        samplingLevel - Sampling level
        sensitivityThreshold - Threshold
        rnd - Random factory
    • Method Detail

      • getInputTypeRestriction

        public elki.data.type.TypeInformation[] getInputTypeRestriction()
        Specified by:
        getInputTypeRestriction in interface elki.Algorithm
      • run

        public Clustering<Model> run​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
        The main LMCLUS (Linear manifold clustering algorithm) is processed in this method.

        The algorithm samples random linear manifolds and tries to find clusters in it. It calculates a distance histogram searches for a threshold and partitions the points in two groups the ones in the cluster and everything else. Then the best fitting linear manifold is searched and registered as a cluster. The process is started over until all points are clustered. The last cluster should contain all the outliers (or the whole data if no clusters have been found.)

        Parameters:
        relation - Relation
        Returns:
        Clustering result
      • deviation

        private double deviation​(double[] delta,
                                 double[][] beta)
        Deviation from a manifold described by beta.
        Parameters:
        delta - Delta from origin vector
        beta - Manifold
        Returns:
        Deviation score
      • findSeparation

        private LMCLUS.Separation findSeparation​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation,
                                                 elki.database.ids.DBIDs currentids,
                                                 int dimension,
                                                 java.util.Random r)
        This method samples a number of linear manifolds an tries to determine which the one with the best cluster is.
         A number of sample points according to the dimension of the linear manifold are taken.
         The basis (B) and the origin(o) of the manifold are calculated.
         A distance histogram using  the distance function ||x-o|| -||B^t*(x-o)|| is generated.
         The best threshold is searched using the elevate threshold function.
         The overall goodness of the threshold is determined.
         The process is redone until a specific number of samples is taken.
         
        Parameters:
        relation - The vector relation
        currentids - Current DBIDs
        dimension - the dimension of the linear manifold to sample.
        r - Random generator
        Returns:
        the overall goodness of the separation. The values origin basis and threshold are returned indirectly over class variables.
      • generateOrthonormalBasis

        private double[][] generateOrthonormalBasis​(java.util.List<double[]> vectors)
        This Method generates an orthonormal basis from a set of Vectors. It uses the established Gram-Schmidt algorithm for orthonormalisation:
         u_1 = v_1
         u_k = v_k -proj_u1(v_k)...proj_u(k-1)(v_k);
        
         Where proj_u(v) = <v,u>/<u,u> *u
         
        Parameters:
        vectors - The set of vectors to generate the orthonormal basis from
        Returns:
        the orthonormal basis generated by this method.
        Throws:
        java.lang.RuntimeException - if the given vectors are not linear independent.
      • findAndEvaluateThreshold

        private double[] findAndEvaluateThreshold​(elki.utilities.datastructures.histogram.DoubleDynamicHistogram histogram)
        Evaluate the histogram to find a suitable threshold
        Parameters:
        histogram - Histogram to evaluate
        Returns:
        Position and goodness