Class ORCLUS

  • All Implemented Interfaces:
    elki.Algorithm, ClusteringAlgorithm<Clustering<Model>>

    @Title("ORCLUS: Arbitrarily ORiented projected CLUSter generation")
    @Description("Algorithm to find correlation clusters in high dimensional spaces.")
    @Reference(authors="C. C. Aggarwal, P. S. Yu",
               title="Finding Generalized Projected Clusters in High Dimensional Spaces",
               booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '00)",
               url="https://doi.org/10.1145/342009.335383",
               bibkey="DBLP:conf/sigmod/AggarwalY00")
    public class ORCLUS
    extends AbstractProjectedClustering<Clustering<Model>>
    ORCLUS: Arbitrarily ORiented projected CLUSter generation.

    Reference:

    C. C. Aggarwal, P. S. Yu
    Finding Generalized Projected Clusters in High Dimensional Spaces
    Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '00).

    Since:
    0.1
    Author:
    Elke Achtert
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      private static class  ORCLUS.ORCLUSCluster
      Encapsulates the attributes of a cluster.
      static class  ORCLUS.Par
      Parameterization class.
      private static class  ORCLUS.ProjectedEnergy
      Encapsulates the projected energy for a cluster.
      • Nested classes/interfaces inherited from interface elki.Algorithm

        elki.Algorithm.Utils
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private double alpha
      Holds the value of ORCLUS.Par.ALPHA_ID.
      private static elki.logging.Logging LOG
      The logger for this class.
      private elki.math.linearalgebra.pca.PCARunner pca
      The PCA utility object.
      private elki.utilities.random.RandomFactory rnd
      Factory for the random generator.
    • Constructor Summary

      Constructors 
      Constructor Description
      ORCLUS​(int k, int k_i, int l, double alpha, elki.utilities.random.RandomFactory rnd, elki.math.linearalgebra.pca.PCARunner pca)
      Constructs the algorithm from explicit parameter values.
    • Method Summary

      Modifier and Type Method Description
      private void assign​(elki.database.relation.Relation<? extends elki.data.NumberVector> database, java.util.List<ORCLUS.ORCLUSCluster> clusters)
      Creates a partitioning of the database by assigning each object to its closest seed.
      private double[][] findBasis​(elki.database.relation.Relation<? extends elki.data.NumberVector> database, ORCLUS.ORCLUSCluster cluster, int dim)
      Finds the basis of the subspace of dimensionality dim for the specified cluster.
      elki.data.type.TypeInformation[] getInputTypeRestriction()  
      private java.util.List<ORCLUS.ORCLUSCluster> initialSeeds​(elki.database.relation.Relation<? extends elki.data.NumberVector> database, int k)
      Initializes the list of seeds with a random sample of size k.
      private void merge​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation, java.util.List<ORCLUS.ORCLUSCluster> clusters, int k_new, int d_new, elki.logging.progress.IndefiniteProgress cprogress)
      Reduces the number of seeds to k_new.
      private ORCLUS.ProjectedEnergy projectedEnergy​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation, ORCLUS.ORCLUSCluster c_i, ORCLUS.ORCLUSCluster c_j, int i, int j, int dim)
      Computes the projected energy of the specified clusters.
      Clustering<Model> run​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
      Performs the ORCLUS algorithm on the given database.
      private ORCLUS.ORCLUSCluster union​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation, ORCLUS.ORCLUSCluster c1, ORCLUS.ORCLUSCluster c2, int dim)
      Returns the union of the two specified clusters.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        The logger for this class.
      • rnd

        private elki.utilities.random.RandomFactory rnd
        Factory for the random generator.
      • pca

        private elki.math.linearalgebra.pca.PCARunner pca
        The PCA utility object.
    • Constructor Detail

      • ORCLUS

        public ORCLUS​(int k,
                      int k_i,
                      int l,
                      double alpha,
                      elki.utilities.random.RandomFactory rnd,
                      elki.math.linearalgebra.pca.PCARunner pca)
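        Constructs the algorithm from explicit parameter values.
        Parameters:
        k - the number of clusters to find
        k_i - the multiplier for the initial number of seeds
        l - the dimensionality of the clusters to find
        alpha - the factor for reducing the number of current clusters in each iteration
        rnd - the random generator factory
        pca - the PCA runner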
    • Method Detail

      • getInputTypeRestriction

        public elki.data.type.TypeInformation[] getInputTypeRestriction()
      • run

        public Clustering<Model> run​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
        Performs the ORCLUS algorithm on the given database.
        Parameters:
        relation - the relation holding the objects to cluster
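
The interplay of k, k_i, l and alpha can be illustrated with a small sketch of the reduction schedule described in the referenced paper: the algorithm starts with k_i * k seeds in the full-dimensional space, then repeatedly shrinks the number of seeds by the factor alpha and the subspace dimensionality by a matching factor beta until k seeds remain. The class name OrclusScheduleSketch, the method schedule, the formula for beta and the rounding are assumptions made for illustration; this is not the ELKI source, and the real loop would also reassign objects and recompute each seed's basis in every step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of ELKI.
class OrclusScheduleSketch {
  /**
   * Lists the {seeds, dimensionality} pair reached after each reduction step.
   * dim is the dimensionality of the input data (assumed dim >= l and k_i >= 1).
   */
  static List<int[]> schedule(int k, int k_i, int l, double alpha, int dim) {
    if (alpha <= 0 || alpha >= 1) {
      throw new IllegalArgumentException("alpha must be in (0, 1)");
    }
    int k_c = k_i * k; // start with more seeds than requested clusters
    int dim_c = dim;   // start in the full-dimensional space
    // Assumed coupling: beta shrinks dim_c so that it reaches l about when k_c reaches k.
    double beta = Math.exp(-Math.log(dim_c / (double) l) * Math.log(1 / alpha)
        / Math.log(k_c / (double) k));
    List<int[]> steps = new ArrayList<>();
    while (k_c > k) {
      // (assign objects to seeds and recompute each seed's basis here)
      k_c = (int) Math.max(k, k_c * alpha);
      dim_c = (int) Math.max(l, dim_c * beta);
      // (merge seeds down to k_c, using subspaces of dimensionality dim_c)
      steps.add(new int[] { k_c, dim_c });
    }
    return steps;
  }

  public static void main(String[] args) {
    for (int[] s : schedule(3, 30, 2, 0.5, 20)) {
      System.out.println("seeds=" + s[0] + " dim=" + s[1]);
    }
  }
}
```

A smaller alpha reaches k in fewer iterations but merges more seeds per step, which is the usual trade-off between runtime and cluster quality.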
      • initialSeeds

        private java.util.List<ORCLUS.ORCLUSCluster> initialSeeds​(elki.database.relation.Relation<? extends elki.data.NumberVector> database,
                                                                  int k)
        Initializes the list of seeds with a random sample of size k.
        Parameters:
        database - the database holding the objects
        k - the size of the random sample
        Returns:
        the initial seed list
      • assign

        private void assign​(elki.database.relation.Relation<? extends elki.data.NumberVector> database,
                            java.util.List<ORCLUS.ORCLUSCluster> clusters)
        Creates a partitioning of the database by assigning each object to its closest seed.
        Parameters:
        database - the database holding the objects
        clusters - the list of clusters to which the objects should be assigned
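
As a rough illustration of the assignment step, the sketch below assigns each point to its closest seed. It assumes, following the referenced paper, that the distance to a seed is measured after projecting onto that seed's own subspace, whose basis vectors are given as matrix rows. The class OrclusAssignSketch and its methods are hypothetical names and do not mirror the private ELKI implementation.

```java
// Hypothetical helper for illustration only; not part of ELKI.
class OrclusAssignSketch {
  /** Projects v onto the subspace spanned by the rows of basis. */
  static double[] project(double[][] basis, double[] v) {
    double[] p = new double[basis.length];
    for (int r = 0; r < basis.length; r++) {
      double sum = 0;
      for (int c = 0; c < v.length; c++) {
        sum += basis[r][c] * v[c];
      }
      p[r] = sum;
    }
    return p;
  }

  /** Squared Euclidean distance between a and b inside the given subspace. */
  static double projectedSqDist(double[][] basis, double[] a, double[] b) {
    double[] pa = project(basis, a), pb = project(basis, b);
    double sum = 0;
    for (int i = 0; i < pa.length; i++) {
      double d = pa[i] - pb[i];
      sum += d * d;
    }
    return sum;
  }

  /** Returns, for each point, the index of the seed with the smallest projected distance. */
  static int[] assign(double[][] points, double[][] seeds, double[][][] bases) {
    int[] label = new int[points.length];
    for (int p = 0; p < points.length; p++) {
      double best = Double.POSITIVE_INFINITY;
      for (int s = 0; s < seeds.length; s++) {
        double d = projectedSqDist(bases[s], points[p], seeds[s]);
        if (d < best) {
          best = d;
          label[p] = s;
        }
      }
    }
    return label;
  }

  public static void main(String[] args) {
    double[][] points = { { 0, 5 }, { 4, 3 } };
    double[][] seeds = { { 0, 0 }, { 4, 4 } };
    // Seed 0 only "sees" the x axis, seed 1 only the y axis.
    double[][][] bases = { { { 1, 0 } }, { { 0, 1 } } };
    System.out.println(java.util.Arrays.toString(assign(points, seeds, bases)));
  }
}
```

In this example the point (0, 5) goes to the seed at the origin even though the seed at (4, 4) is closer in the full space, because each seed ignores the directions outside its own subspace.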
      • findBasis

        private double[][] findBasis​(elki.database.relation.Relation<? extends elki.data.NumberVector> database,
                                     ORCLUS.ORCLUSCluster cluster,
                                     int dim)
        Finds the basis of the subspace of dimensionality dim for the specified cluster.
        Parameters:
        database - the database to run the algorithm on
        cluster - the cluster
        dim - the dimensionality of the subspace
        Returns:
        matrix defining the basis of the subspace for the specified cluster
      • merge

        private void merge​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation,
                           java.util.List<ORCLUS.ORCLUSCluster> clusters,
                           int k_new,
                           int d_new,
                           elki.logging.progress.IndefiniteProgress cprogress)
        Reduces the number of seeds to k_new.
        Parameters:
        relation - the database holding the objects
        clusters - the set of current seeds
        k_new - the new number of seeds
        d_new - the new dimensionality of the subspaces for each seed
        cprogress - the progress indicator to update while merging
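
One way to picture the reduction is a greedy loop that repeatedly merges the pair of clusters whose union has the lowest energy until only k_new clusters remain, as the referenced paper describes. The sketch below assumes exactly that strategy with a pluggable energy function; GreedyMergeSketch, merge and variance are hypothetical names, the one-dimensional variance merely stands in for the projected energy, and a production implementation would cache pairwise energies instead of recomputing them in every round.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical helper for illustration only; not part of ELKI.
class GreedyMergeSketch {
  /** Merges clusters in place until at most kNew remain, always picking the cheapest union. */
  static <T> void merge(List<List<T>> clusters, int kNew, ToDoubleFunction<List<T>> energy) {
    while (clusters.size() > kNew) {
      int bestI = -1, bestJ = -1;
      double best = Double.POSITIVE_INFINITY;
      for (int i = 0; i < clusters.size(); i++) {
        for (int j = i + 1; j < clusters.size(); j++) {
          List<T> union = new ArrayList<>(clusters.get(i));
          union.addAll(clusters.get(j));
          double e = energy.applyAsDouble(union);
          if (e < best) {
            best = e;
            bestI = i;
            bestJ = j;
          }
        }
      }
      if (bestI < 0) {
        break; // no mergeable pair left
      }
      List<T> merged = new ArrayList<>(clusters.get(bestI));
      merged.addAll(clusters.get(bestJ));
      clusters.set(bestI, merged);
      clusters.remove(bestJ); // removes by index; bestJ > bestI, so bestI stays valid
    }
  }

  /** Stand-in energy: variance of one-dimensional values. */
  static double variance(List<Double> xs) {
    double mean = 0;
    for (double x : xs) {
      mean += x;
    }
    mean /= xs.size();
    double sum = 0;
    for (double x : xs) {
      sum += (x - mean) * (x - mean);
    }
    return sum / xs.size();
  }

  public static void main(String[] args) {
    List<List<Double>> clusters = new ArrayList<>();
    clusters.add(new ArrayList<>(List.of(0.0, 0.1)));
    clusters.add(new ArrayList<>(List.of(0.2)));
    clusters.add(new ArrayList<>(List.of(10.0)));
    clusters.add(new ArrayList<>(List.of(10.3)));
    merge(clusters, 2, GreedyMergeSketch::variance);
    System.out.println(clusters);
  }
}
```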
      • projectedEnergy

        private ORCLUS.ProjectedEnergy projectedEnergy​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation,
                                                       ORCLUS.ORCLUSCluster c_i,
                                                       ORCLUS.ORCLUSCluster c_j,
                                                       int i,
                                                       int j,
                                                       int dim)
        Computes the projected energy of the specified clusters. The projected energy is the mean squared distance of the points to the centroid of the union cluster c of c_i and c_j, after all points in c are projected onto the subspace of c.
        Parameters:
        relation - the relation holding the objects
        c_i - the first cluster
        c_j - the second cluster
        i - the index of cluster c_i in the cluster list
        j - the index of cluster c_j in the cluster list
        dim - the dimensionality of the clusters
        Returns:
        the projected energy of the specified clusters
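
The definition of the projected energy can be made concrete with a short sketch: given the points of the union cluster and a basis of its subspace (rows are basis vectors), it averages the squared distances to the centroid after projection. The class ProjectedEnergySketch and its methods are hypothetical, and the basis is taken as an input here; in ORCLUS it would come from findBasis.

```java
// Hypothetical helper for illustration only; not part of ELKI.
class ProjectedEnergySketch {
  /** Component-wise mean of the given points. */
  static double[] centroid(double[][] points) {
    double[] c = new double[points[0].length];
    for (double[] p : points) {
      for (int d = 0; d < c.length; d++) {
        c[d] += p[d];
      }
    }
    for (int d = 0; d < c.length; d++) {
      c[d] /= points.length;
    }
    return c;
  }

  /** Mean squared distance to the centroid, measured inside the subspace spanned by the basis rows. */
  static double projectedEnergy(double[][] unionPoints, double[][] basis) {
    double[] c = centroid(unionPoints);
    double sum = 0;
    for (double[] p : unionPoints) {
      for (double[] b : basis) {
        // Coordinate of (p - c) along basis vector b.
        double coord = 0;
        for (int d = 0; d < p.length; d++) {
          coord += b[d] * (p[d] - c[d]);
        }
        sum += coord * coord;
      }
    }
    return sum / unionPoints.length;
  }

  public static void main(String[] args) {
    double[][] pts = { { 0, 1 }, { 0, -1 }, { 10, 1 }, { 10, -1 } };
    System.out.println(projectedEnergy(pts, new double[][] { { 0, 1 } }));
    System.out.println(projectedEnergy(pts, new double[][] { { 1, 0 } }));
  }
}
```

The four example points are stretched along the x axis, so projecting onto the y axis yields a much lower energy than projecting onto the x axis. A low energy for the union of two clusters signals that they fit a common low-variance subspace.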
      • union

        private ORCLUS.ORCLUSCluster union​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation,
                                           ORCLUS.ORCLUSCluster c1,
                                           ORCLUS.ORCLUSCluster c2,
                                           int dim)
        Returns the union of the two specified clusters.
        Parameters:
        relation - the database holding the objects
        c1 - the first cluster
        c2 - the second cluster
        dim - the dimensionality of the union cluster
        Returns:
        the union of the two specified clusters