Class PROCLUS

  • All Implemented Interfaces:
    elki.Algorithm, ClusteringAlgorithm<Clustering<SubspaceModel>>, SubspaceClusteringAlgorithm<SubspaceModel>

    @Title("PROCLUS: PROjected CLUStering")
    @Description("Algorithm to find subspace clusters in high dimensional spaces.")
    @Reference(authors="C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park",
               title="Fast Algorithms for Projected Clustering",
               booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '99)",
               url="https://doi.org/10.1145/304181.304188",
               bibkey="doi:10.1145/304181.304188")
    public class PROCLUS
    extends AbstractProjectedClustering<Clustering<SubspaceModel>>
    implements SubspaceClusteringAlgorithm<SubspaceModel>
    PROCLUS (PROjected CLUStering) is an algorithm for finding subspace clusters in high-dimensional spaces.

    Reference:

    C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park
    Fast Algorithms for Projected Clustering
    Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '99).

    Since:
    0.1
    Author:
    Elke Achtert
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      private static class  PROCLUS.DoubleIntInt
      Simple triple.
      static class  PROCLUS.Par
      Parameterization class.
      private static class  PROCLUS.PROCLUSCluster
      Encapsulates the attributes of a cluster.
      • Nested classes/interfaces inherited from interface elki.Algorithm

        elki.Algorithm.Utils
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static elki.logging.Logging LOG
      The logger for this class.
      private int m_i
      Multiplier for the initial number of medoids.
      private elki.utilities.random.RandomFactory rnd
      Random generator
    • Constructor Summary

      Constructors 
      Constructor Description
      PROCLUS​(int k, int k_i, int l, int m_i, elki.utilities.random.RandomFactory rnd)
      Java constructor.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      private java.util.ArrayList<PROCLUS.PROCLUSCluster> assignPoints​(elki.database.ids.ArrayDBIDs m_current, long[][] dimensions, elki.database.relation.Relation<? extends elki.data.NumberVector> database)
      Assigns the objects to the clusters.
      private double avgDistance​(double[] centroid, elki.database.ids.DBIDs objectIDs, elki.database.relation.Relation<? extends elki.data.NumberVector> database, int dimension)
      Computes the average distance of the objects to the centroid along the specified dimension.
      private elki.database.ids.DBIDs computeBadMedoids​(elki.database.ids.ArrayDBIDs m_current, java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, int threshold)
      Computes the bad medoids: a medoid is bad if its cluster contains fewer objects than the specified threshold.
      private long[][] computeDimensionMap​(java.util.List<PROCLUS.DoubleIntInt> z_ijs, int dim, int numc)
      Compute the dimension map.
      private elki.database.ids.ArrayDBIDs computeM_current​(elki.database.ids.DBIDs m, elki.database.ids.DBIDs m_best, elki.database.ids.DBIDs m_bad, java.util.Random random)
      Computes the set of medoids in the current iteration.
      private java.util.List<PROCLUS.DoubleIntInt> computeZijs​(double[][] averageDistances, int dim)
      Compute the z_ij values.
      private double evaluateClusters​(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, long[][] dimensions, elki.database.relation.Relation<? extends elki.data.NumberVector> database)
      Evaluates the quality of the clusters.
      private java.util.List<PROCLUS.PROCLUSCluster> finalAssignment​(java.util.List<elki.utilities.pairs.Pair<double[],​long[]>> dimensions, elki.database.relation.Relation<? extends elki.data.NumberVector> database)
      Refinement step to assign the objects to the final clusters.
      private long[][] findDimensions​(elki.database.ids.ArrayDBIDs medoids, elki.database.relation.Relation<? extends elki.data.NumberVector> relation, elki.database.query.distance.DistanceQuery<? extends elki.data.NumberVector> distance, elki.database.query.range.RangeSearcher<elki.database.ids.DBIDRef> rangeQuery)
      Determines the set of correlated dimensions for each medoid in the specified medoid set.
      private java.util.List<elki.utilities.pairs.Pair<double[],​long[]>> findDimensions​(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters, elki.database.relation.Relation<? extends elki.data.NumberVector> database)
      Refinement step that determines the set of correlated dimensions for each cluster centroid.
      elki.data.type.TypeInformation[] getInputTypeRestriction()  
      private elki.database.datastore.DataStore<elki.database.ids.DBIDs> getLocalities​(elki.database.ids.DBIDs medoids, elki.database.query.distance.DistanceQuery<? extends elki.data.NumberVector> distance, elki.database.query.range.RangeSearcher<elki.database.ids.DBIDRef> rangeQuery)
      Computes the localities of the specified medoids: for each medoid m the objects in the sphere centered at m with radius minDist are determined, where minDist is the minimum distance between medoid m and any other medoid m_i.
      private elki.database.ids.ArrayDBIDs greedy​(elki.database.query.distance.DistanceQuery<? extends elki.data.NumberVector> distance, elki.database.ids.DBIDs sampleSet, int m, java.util.Random random)
      Returns a piercing set of m medoids from the specified sample set.
      private elki.database.ids.ArrayDBIDs initialSet​(elki.database.ids.DBIDs sampleSet, int k, java.util.Random random)
      Returns a set of k elements from the specified sample set.
      private double manhattanSegmentalDistance​(elki.data.NumberVector o1, double[] o2, long[] dimensions)
      Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.
      private double manhattanSegmentalDistance​(elki.data.NumberVector o1, elki.data.NumberVector o2, long[] dimensions)
      Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.
      <V extends elki.data.NumberVector>
      Clustering<SubspaceModel>
      run​(elki.database.relation.Relation<V> relation)
      Performs the PROCLUS algorithm on the given relation.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        The logger for this class.
      • m_i

        private int m_i
        Multiplier for the initial number of medoids.
      • rnd

        private elki.utilities.random.RandomFactory rnd
        Random generator
    • Constructor Detail

      • PROCLUS

        public PROCLUS​(int k,
                       int k_i,
                       int l,
                       int m_i,
                       elki.utilities.random.RandomFactory rnd)
        Java constructor.
        Parameters:
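        k - the number of clusters to find
        k_i - the multiplier for the initial number of seeds
        l - the average dimensionality of the clusters to find
        m_i - the multiplier for the initial number of medoids
        rnd - the random generator factory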
    • Method Detail

      • getInputTypeRestriction

        public elki.data.type.TypeInformation[] getInputTypeRestriction()
        Specified by:
        getInputTypeRestriction in interface elki.Algorithm
      • run

        public <V extends elki.data.NumberVector> Clustering<SubspaceModel> run​(elki.database.relation.Relation<V> relation)
        Performs the PROCLUS algorithm on the given relation.
        Parameters:
        relation - Relation to process
      • greedy

        private elki.database.ids.ArrayDBIDs greedy​(elki.database.query.distance.DistanceQuery<? extends elki.data.NumberVector> distance,
                                                    elki.database.ids.DBIDs sampleSet,
                                                    int m,
                                                    java.util.Random random)
        Returns a piercing set of m medoids from the specified sample set.
        Parameters:
        distance - the distance function
        sampleSet - the sample set
        m - the number of medoids to be returned
        random - random number generator
        Returns:
        a piercing set of m medoids from the specified sample set
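        The following is a minimal sketch of a greedy farthest-first selection, the technique the PROCLUS paper describes for building a piercing set. It is not the ELKI implementation: it assumes plain double[] points and Euclidean distance in place of DBIDs and a DistanceQuery, and the class GreedyPiercingSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class GreedyPiercingSketch {
  /** Euclidean distance (an assumption; ELKI passes the distance in as a query). */
  static double dist(double[] a, double[] b) {
    double sum = 0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /** Returns m indices: a random start, then repeatedly the point farthest from all picks so far. */
  static List<Integer> greedy(double[][] sample, int m, Random random) {
    int n = sample.length;
    m = Math.min(m, n);
    List<Integer> medoids = new ArrayList<>();
    if (m == 0) {
      return medoids;
    }
    boolean[] chosen = new boolean[n];
    double[] minDist = new double[n]; // distance to the nearest chosen medoid
    int first = random.nextInt(n);
    medoids.add(first);
    chosen[first] = true;
    for (int i = 0; i < n; i++) {
      minDist[i] = dist(sample[i], sample[first]);
    }
    while (medoids.size() < m) {
      int best = -1;
      for (int i = 0; i < n; i++) {
        if (!chosen[i] && (best < 0 || minDist[i] > minDist[best])) {
          best = i;
        }
      }
      medoids.add(best);
      chosen[best] = true;
      for (int i = 0; i < n; i++) {
        minDist[i] = Math.min(minDist[i], dist(sample[i], sample[best]));
      }
    }
    return medoids;
  }
}
```

        Because each new medoid maximizes its distance to the medoids already chosen, the result tends to spread over the sample, which makes it likely that every true cluster contributes at least one candidate.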
      • initialSet

        private elki.database.ids.ArrayDBIDs initialSet​(elki.database.ids.DBIDs sampleSet,
                                                        int k,
                                                        java.util.Random random)
        Returns a set of k elements from the specified sample set.
        Parameters:
        sampleSet - the sample set
        k - the number of samples to be returned
        random - random number generator
        Returns:
        a set of k elements from the specified sample set
      • computeM_current

        private elki.database.ids.ArrayDBIDs computeM_current​(elki.database.ids.DBIDs m,
                                                              elki.database.ids.DBIDs m_best,
                                                              elki.database.ids.DBIDs m_bad,
                                                              java.util.Random random)
        Computes the set of medoids in the current iteration.
        Parameters:
        m - the medoids
        m_best - the best set of medoids found so far
        m_bad - the bad medoids
        random - random number generator
        Returns:
        m_current, the set of medoids in the current iteration
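        One plausible reading of this step, following the PROCLUS paper, is that m_current is m_best with every bad medoid swapped for a random candidate from m that is not yet in use. The sketch below illustrates that reading under stated assumptions: integer indices stand in for DBIDs, and CurrentMedoidsSketch and computeCurrent are hypothetical names, not ELKI API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class CurrentMedoidsSketch {
  /** Keeps the good medoids of mBest and replaces each bad one with a random unused candidate from m. */
  static List<Integer> computeCurrent(List<Integer> m, List<Integer> mBest, Set<Integer> mBad, Random random) {
    // Candidates are all medoids in m that are not part of the best set (bad ones included in mBest are excluded too).
    List<Integer> pool = new ArrayList<>(m);
    pool.removeAll(mBest);
    List<Integer> current = new ArrayList<>(mBest.size());
    for (Integer medoid : mBest) {
      if (mBad.contains(medoid) && !pool.isEmpty()) {
        // Draw a replacement without repetition.
        current.add(pool.remove(random.nextInt(pool.size())));
      } else {
        current.add(medoid);
      }
    }
    return current;
  }
}
```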
      • getLocalities

        private elki.database.datastore.DataStore<elki.database.ids.DBIDs> getLocalities​(elki.database.ids.DBIDs medoids,
                                                                                         elki.database.query.distance.DistanceQuery<? extends elki.data.NumberVector> distance,
                                                                                         elki.database.query.range.RangeSearcher<elki.database.ids.DBIDRef> rangeQuery)
        Computes the localities of the specified medoids: for each medoid m the objects in the sphere centered at m with radius minDist are determined, where minDist is the minimum distance between medoid m and any other medoid m_i.
        Parameters:
        medoids - the ids of the medoids
        distance - the distance function
        rangeQuery - the range searcher used to retrieve the objects within the radius
        Returns:
        a mapping of the medoid's id to its locality
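        The locality definition above can be sketched as follows. This is an illustration only: it assumes double[] points, Euclidean distance, a linear scan instead of a RangeSearcher, and a closed sphere (distance <= minDist); LocalitySketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class LocalitySketch {
  /** Euclidean distance (an assumption; ELKI passes the distance in as a query). */
  static double dist(double[] a, double[] b) {
    double sum = 0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /** For each medoid index, returns the indices of all points within minDist of that medoid. */
  static List<List<Integer>> localities(double[][] data, int[] medoids) {
    List<List<Integer>> result = new ArrayList<>();
    for (int m : medoids) {
      // minDist: distance from medoid m to its nearest other medoid.
      double minDist = Double.POSITIVE_INFINITY;
      for (int other : medoids) {
        if (other != m) {
          minDist = Math.min(minDist, dist(data[m], data[other]));
        }
      }
      // Linear scan standing in for a range query; the medoid itself is included.
      List<Integer> locality = new ArrayList<>();
      for (int i = 0; i < data.length; i++) {
        if (dist(data[m], data[i]) <= minDist) {
          locality.add(i);
        }
      }
      result.add(locality);
    }
    return result;
  }
}
```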
      • findDimensions

        private long[][] findDimensions​(elki.database.ids.ArrayDBIDs medoids,
                                        elki.database.relation.Relation<? extends elki.data.NumberVector> relation,
                                        elki.database.query.distance.DistanceQuery<? extends elki.data.NumberVector> distance,
                                        elki.database.query.range.RangeSearcher<elki.database.ids.DBIDRef> rangeQuery)
        Determines the set of correlated dimensions for each medoid in the specified medoid set.
        Parameters:
        medoids - the set of medoids
        relation - the relation containing the objects
        distance - the distance function
        rangeQuery - the range searcher used to retrieve the objects within the radius
        Returns:
        the set of correlated dimensions for each medoid in the specified medoid set
      • findDimensions

        private java.util.List<elki.utilities.pairs.Pair<double[],​long[]>> findDimensions​(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters,
                                                                                                elki.database.relation.Relation<? extends elki.data.NumberVector> database)
        Refinement step that determines the set of correlated dimensions for each cluster centroid.
        Parameters:
        clusters - the list of clusters
        database - the database containing the objects
        Returns:
        the set of correlated dimensions for each specified cluster centroid
      • computeZijs

        private java.util.List<PROCLUS.DoubleIntInt> computeZijs​(double[][] averageDistances,
                                                                 int dim)
        Compute the z_ij values.
        Parameters:
        averageDistances - Average distances
        dim - Number of dimensions
        Returns:
        z_ij values
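        In the PROCLUS paper, the z_ij values standardize the average distance X_ij of medoid i along dimension j: Z_ij = (X_ij - Y_i) / sigma_i, where Y_i is the mean of X_ij over all dimensions and sigma_i is the sample standard deviation. The sketch below follows that formula; it returns a plain matrix rather than a list of PROCLUS.DoubleIntInt triples, and the zero guard for sigma_i is an assumption added here. ZijSketch and zScores are hypothetical names.

```java
final class ZijSketch {
  /** Returns z[i][j] = (x[i][j] - mean_i) / sigma_i for each row i of average distances. */
  static double[][] zScores(double[][] x) {
    double[][] z = new double[x.length][];
    for (int i = 0; i < x.length; i++) {
      int d = x[i].length;
      double mean = 0;
      for (double v : x[i]) {
        mean += v;
      }
      mean /= d;
      double sq = 0;
      for (double v : x[i]) {
        sq += (v - mean) * (v - mean);
      }
      // Sample standard deviation over the d dimensions (divisor d - 1, as in the paper).
      double sigma = Math.sqrt(sq / (d - 1));
      z[i] = new double[d];
      for (int j = 0; j < d; j++) {
        // Assumption: a row without spread (or with d == 1) yields 0 instead of NaN.
        z[i][j] = sigma > 0 ? (x[i][j] - mean) / sigma : 0.0;
      }
    }
    return z;
  }
}
```

        A strongly negative z_ij means that the points near medoid i are unusually close to it along dimension j, which makes j a good candidate for that medoid's subspace.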
      • computeDimensionMap

        private long[][] computeDimensionMap​(java.util.List<PROCLUS.DoubleIntInt> z_ijs,
                                             int dim,
                                             int numc)
        Compute the dimension map.
        Parameters:
        z_ijs - z_ij values
        dim - Number of dimensions
        numc - Number of clusters
        Returns:
        Bitmap of dimensions used
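        The paper selects k * l dimensions in total by taking the smallest z_ij values, subject to every cluster receiving at least two dimensions. The sketch below shows one way to do that in two passes. It is an assumption-laden illustration, not the ELKI code: it takes the z matrix and l directly, uses a single long per cluster (so fewer than 64 dimensions) instead of a long[][] bitmap, and assumes at least two dimensions; DimensionMapSketch and dimensionMasks are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class DimensionMapSketch {
  /** Returns one bit mask per cluster; bit j is set if dimension j was selected for that cluster. */
  static long[] dimensionMasks(double[][] z, int l) {
    int k = z.length;
    int dim = z[0].length;
    // All (cluster, dimension) pairs, sorted by ascending z value.
    List<int[]> order = new ArrayList<>();
    for (int i = 0; i < k; i++) {
      for (int j = 0; j < dim; j++) {
        order.add(new int[] { i, j });
      }
    }
    order.sort(Comparator.comparingDouble(p -> z[p[0]][p[1]]));

    long[] masks = new long[k];
    int[] counts = new int[k];
    int picked = 0;
    // Pass 1: give every cluster its two smallest z values.
    for (int[] p : order) {
      if (counts[p[0]] < 2) {
        masks[p[0]] |= 1L << p[1];
        counts[p[0]]++;
        picked++;
      }
    }
    // Pass 2: fill up to k * l with the globally smallest remaining values.
    int target = Math.max(k * l, 2 * k);
    for (int[] p : order) {
      if (picked >= target) {
        break;
      }
      if ((masks[p[0]] & (1L << p[1])) == 0) {
        masks[p[0]] |= 1L << p[1];
        picked++;
      }
    }
    return masks;
  }
}
```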
      • assignPoints

        private java.util.ArrayList<PROCLUS.PROCLUSCluster> assignPoints​(elki.database.ids.ArrayDBIDs m_current,
                                                                         long[][] dimensions,
                                                                         elki.database.relation.Relation<? extends elki.data.NumberVector> database)
        Assigns the objects to the clusters.
        Parameters:
        m_current - Current centers
        dimensions - set of correlated dimensions for each medoid of the cluster
        database - the database containing the objects
        Returns:
        the assignments of the objects to the clusters
      • finalAssignment

        private java.util.List<PROCLUS.PROCLUSCluster> finalAssignment​(java.util.List<elki.utilities.pairs.Pair<double[],​long[]>> dimensions,
                                                                       elki.database.relation.Relation<? extends elki.data.NumberVector> database)
        Refinement step to assign the objects to the final clusters.
        Parameters:
        dimensions - pair containing the centroid and the set of correlated dimensions for the centroid
        database - the database containing the objects
        Returns:
        the assignments of the objects to the clusters
      • manhattanSegmentalDistance

        private double manhattanSegmentalDistance​(elki.data.NumberVector o1,
                                                  elki.data.NumberVector o2,
                                                  long[] dimensions)
        Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.
        Parameters:
        o1 - the first object
        o2 - the second object
        dimensions - the dimensions to be considered
        Returns:
        the Manhattan segmental distance between o1 and o2 relative to the specified dimensions
      • manhattanSegmentalDistance

        private double manhattanSegmentalDistance​(elki.data.NumberVector o1,
                                                  double[] o2,
                                                  long[] dimensions)
        Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.
        Parameters:
        o1 - the first object
        o2 - the second object
        dimensions - the dimensions to be considered
        Returns:
        the Manhattan segmental distance between o1 and o2 relative to the specified dimensions
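        The PROCLUS paper defines the Manhattan segmental distance relative to a dimension set D as the sum of |o1_d - o2_d| over all d in D, divided by |D|. A minimal sketch of that definition follows, assuming double[] vectors and a long[] bit set in which bit d marks dimension d as selected. SegmentalDistanceSketch is a hypothetical name, and returning 0 for an empty dimension set is an assumption.

```java
final class SegmentalDistanceSketch {
  /** Average absolute difference over the dimensions whose bit is set in dims. */
  static double manhattanSegmental(double[] o1, double[] o2, long[] dims) {
    double sum = 0;
    int count = 0;
    for (int d = 0; d < o1.length; d++) {
      int word = d >>> 6; // each long holds 64 dimensions
      if (word < dims.length && (dims[word] & (1L << (d & 63))) != 0) {
        sum += Math.abs(o1[d] - o2[d]);
        count++;
      }
    }
    // Assumption: an empty dimension set yields distance 0.
    return count > 0 ? sum / count : 0.0;
  }
}
```

        Dividing by |D| normalizes the distance, so clusters with different numbers of selected dimensions remain comparable.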
      • evaluateClusters

        private double evaluateClusters​(java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters,
                                        long[][] dimensions,
                                        elki.database.relation.Relation<? extends elki.data.NumberVector> database)
        Evaluates the quality of the clusters.
        Parameters:
        clusters - the clusters to be evaluated
        dimensions - the dimensions associated with each cluster
        database - the database holding the objects
        Returns:
        a measure for the cluster quality
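        The quality measure in the PROCLUS paper is the size-weighted mean, over all clusters, of the average distance to the cluster centroid along the cluster's selected dimensions; lower values indicate tighter clusters. The sketch below computes that objective under stated assumptions: clusters are given as arrays of double[] points, each dimension set is a single long bit mask (fewer than 64 dimensions), and ClusterQualitySketch and its methods are hypothetical names. Whether the ELKI method matches this formula term for term is not confirmed by this page.

```java
final class ClusterQualitySketch {
  /** Tests bit d of a single-word bit mask (assumes fewer than 64 dimensions). */
  static boolean uses(long mask, int d) {
    return (mask & (1L << d)) != 0;
  }

  /** Returns sum_i |C_i| * w_i / N, where w_i is the mean per-dimension average distance to the centroid. */
  static double evaluate(double[][][] clusters, long[] dimMasks) {
    double weighted = 0;
    int n = 0;
    for (int c = 0; c < clusters.length; c++) {
      double[][] pts = clusters[c];
      if (pts.length == 0) {
        continue;
      }
      int dim = pts[0].length;
      double[] centroid = new double[dim];
      for (double[] p : pts) {
        for (int d = 0; d < dim; d++) {
          centroid[d] += p[d] / pts.length;
        }
      }
      double sum = 0;
      int used = 0;
      for (int d = 0; d < dim; d++) {
        if (!uses(dimMasks[c], d)) {
          continue;
        }
        // Average distance of the cluster's points to the centroid along dimension d.
        double avg = 0;
        for (double[] p : pts) {
          avg += Math.abs(p[d] - centroid[d]);
        }
        sum += avg / pts.length;
        used++;
      }
      if (used > 0) {
        weighted += pts.length * (sum / used);
      }
      n += pts.length;
    }
    return n > 0 ? weighted / n : 0.0;
  }
}
```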
      • avgDistance

        private double avgDistance​(double[] centroid,
                                   elki.database.ids.DBIDs objectIDs,
                                   elki.database.relation.Relation<? extends elki.data.NumberVector> database,
                                   int dimension)
        Computes the average distance of the objects to the centroid along the specified dimension.
        Parameters:
        centroid - the centroid
        objectIDs - the set of object IDs
        database - the database holding the objects
        dimension - the dimension for which the average distance is computed
        Returns:
        the average distance of the objects to the centroid along the specified dimension
      • computeBadMedoids

        private elki.database.ids.DBIDs computeBadMedoids​(elki.database.ids.ArrayDBIDs m_current,
                                                          java.util.ArrayList<PROCLUS.PROCLUSCluster> clusters,
                                                          int threshold)
        Computes the bad medoids: a medoid is bad if its cluster contains fewer objects than the specified threshold.
        Parameters:
        m_current - Current medoids
        clusters - the clusters
        threshold - the threshold
        Returns:
        the bad medoids
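        The rule stated above reduces to a size check per cluster. A minimal sketch, assuming cluster sizes are given as an int array indexed like the medoids, and using the hypothetical names BadMedoidsSketch and badMedoids:

```java
import java.util.ArrayList;
import java.util.List;

final class BadMedoidsSketch {
  /** Returns the indices of all medoids whose cluster has fewer than threshold objects. */
  static List<Integer> badMedoids(int[] clusterSizes, int threshold) {
    List<Integer> bad = new ArrayList<>();
    for (int i = 0; i < clusterSizes.length; i++) {
      if (clusterSizes[i] < threshold) {
        bad.add(i);
      }
    }
    return bad;
  }
}
```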