Class SUBCLU<V extends elki.data.NumberVector>

  • Type Parameters:
    V - the type of NumberVector handled by this algorithm
    All Implemented Interfaces:
    elki.Algorithm, ClusteringAlgorithm<Clustering<SubspaceModel>>, SubspaceClusteringAlgorithm<SubspaceModel>

    @Title("SUBCLU: Density connected Subspace Clustering")
    @Description("Algorithm to detect arbitrarily shaped and positioned clusters in subspaces. SUBCLU delivers for each subspace the same clusters DBSCAN would have found, when applied to this subspace seperately.")
    @Reference(authors="Karin Kailing, Hans-Peter Kriegel, Peer Kr\u00f6ger",
               title="Density Connected Subspace Clustering for High Dimensional Data",
               booktitle="Proc. SIAM Int. Conf. on Data Mining (SDM\'04)",
               url="https://doi.org/10.1137/1.9781611972740.23",
               bibkey="DBLP:conf/sdm/KroegerKK04")
    public class SUBCLU<V extends elki.data.NumberVector>
    extends java.lang.Object
    implements SubspaceClusteringAlgorithm<SubspaceModel>
    Implementation of the SUBCLU algorithm, an algorithm to detect arbitrarily shaped and positioned clusters in subspaces. SUBCLU delivers for each subspace the same clusters DBSCAN would have found, when applied to this subspace separately.

    Note: this implementation does not yet implemented the suggested indexing based on inverted files, and that we also only query subsets of them, this makes the implementation as a reusable index challenging.

    The original paper is not very clear on which clusters to return, as any subspace cluster must be part of a lower-dimensional projected cluster, so these results would be highly redundant. In this implementation, we only include points in clusters that are not already part of sub-clusters (note that this does not remove overlap of independent subspaces).

    Reference:

    Karin Kailing, Hans-Peter Kriegel, Peer Kröger
    Density Connected Subspace Clustering for High Dimensional Data
    Proc. SIAM Int. Conf. on Data Mining (SDM'04)

    Since:
    0.3
    Author:
    Elke Achtert
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  SUBCLU.Par<V extends elki.data.NumberVector>
      Parameterization class.
      • Nested classes/interfaces inherited from interface elki.Algorithm

        elki.Algorithm.Utils
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected elki.distance.subspace.DimensionSelectingSubspaceDistance<V> distance
      The distance function to determine the distance between objects.
      protected double epsilon
      Maximum radius of the neighborhood to be considered.
      private static elki.logging.Logging LOG
      The logger for this class.
      protected int mindim
      Minimum dimensionality.
      protected int minpts
      Minimum number of points.
    • Constructor Summary

      Constructors 
      Constructor Description
      SUBCLU​(elki.distance.subspace.DimensionSelectingSubspaceDistance<V> distance, double epsilon, int minpts, int mindim)
      Constructor.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      private Subspace bestSubspace​(java.util.List<Subspace> subspaces, Subspace candidate, java.util.TreeMap<Subspace,​java.util.List<Cluster<Model>>> clusterMap)
      Determines the d-dimensional subspace of the (d+1) -dimensional candidate with minimal number of objects in the cluster.
      private boolean checkLower​(Subspace candidate, java.util.List<Subspace> subspaces)
      Perform Apriori-style pruning.
      private java.util.List<Subspace> generateSubspaceCandidates​(java.util.List<Subspace> subspaces)
      Generates d+1-dimensional subspace candidates from the specified d-dimensional subspaces.
      elki.data.type.TypeInformation[] getInputTypeRestriction()  
      Clustering<SubspaceModel> run​(elki.database.relation.Relation<V> relation)
      Performs the SUBCLU algorithm on the given database.
      private java.util.List<Cluster<Model>> runDBSCAN​(elki.database.relation.Relation<V> relation, elki.database.ids.DBIDs ids, Subspace subspace)
      Runs the DBSCAN algorithm on the specified partition of the database in the given subspace.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        The logger for this class.
      • distance

        protected elki.distance.subspace.DimensionSelectingSubspaceDistance<V extends elki.data.NumberVector> distance
        The distance function to determine the distance between objects.
      • epsilon

        protected double epsilon
        Maximum radius of the neighborhood to be considered.
      • minpts

        protected int minpts
        Minimum number of points.
      • mindim

        protected int mindim
        Minimum dimensionality.
    • Constructor Detail

      • SUBCLU

        public SUBCLU​(elki.distance.subspace.DimensionSelectingSubspaceDistance<V> distance,
                      double epsilon,
                      int minpts,
                      int mindim)
        Constructor.
        Parameters:
        distance - Distance function
        epsilon - Epsilon value
        minpts - Minpts value
        mindim - Minimum dimensionality
    • Method Detail

      • getInputTypeRestriction

        public elki.data.type.TypeInformation[] getInputTypeRestriction()
        Specified by:
        getInputTypeRestriction in interface elki.Algorithm
      • run

        public Clustering<SubspaceModel> run​(elki.database.relation.Relation<V> relation)
        Performs the SUBCLU algorithm on the given database.
        Parameters:
        relation - Relation to process
        Returns:
        Clustering result
      • runDBSCAN

        private java.util.List<Cluster<Model>> runDBSCAN​(elki.database.relation.Relation<V> relation,
                                                         elki.database.ids.DBIDs ids,
                                                         Subspace subspace)
        Runs the DBSCAN algorithm on the specified partition of the database in the given subspace. If parameter ids is null DBSCAN will be applied to the whole database.
        Parameters:
        relation - the database holding the objects to run DBSCAN on
        ids - the IDs of the database defining the partition to run DBSCAN on - if this parameter is null DBSCAN will be applied to the whole database
        subspace - the subspace to run DBSCAN on
        Returns:
        the clustering result of the DBSCAN run
      • generateSubspaceCandidates

        private java.util.List<Subspace> generateSubspaceCandidates​(java.util.List<Subspace> subspaces)
        Generates d+1-dimensional subspace candidates from the specified d-dimensional subspaces.
        Parameters:
        subspaces - the d-dimensional subspaces
        Returns:
        the d+1-dimensional subspace candidates
      • checkLower

        private boolean checkLower​(Subspace candidate,
                                   java.util.List<Subspace> subspaces)
        Perform Apriori-style pruning.
        Parameters:
        candidate - Current candidate
        subspaces - Subspaces
        Returns:
        true if all lower-dimensional subspaces exist
      • bestSubspace

        private Subspace bestSubspace​(java.util.List<Subspace> subspaces,
                                      Subspace candidate,
                                      java.util.TreeMap<Subspace,​java.util.List<Cluster<Model>>> clusterMap)
        Determines the d-dimensional subspace of the (d+1) -dimensional candidate with minimal number of objects in the cluster.
        Parameters:
        subspaces - the list of d-dimensional subspaces containing clusters
        candidate - the (d+1)-dimensional candidate subspace
        clusterMap - the mapping of subspaces to clusters
        Returns:
        the d-dimensional subspace of the (d+1) -dimensional candidate with minimal number of objects in the cluster