Class SLINKHDBSCANLinearMemory<O>

  • All Implemented Interfaces:
    elki.Algorithm, HierarchicalClusteringAlgorithm

    @Reference(authors="R. J. G. B. Campello, D. Moulavi, J. Sander",
               title="Density-Based Clustering Based on Hierarchical Density Estimates",
               booktitle="Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining (PAKDD)",
               url="https://doi.org/10.1007/978-3-642-37456-2_14",
               bibkey="DBLP:conf/pakdd/CampelloMS13")
    public class SLINKHDBSCANLinearMemory<O>
    extends AbstractHDBSCAN<O>
    implements HierarchicalClusteringAlgorithm
    Linear memory implementation of HDBSCAN clustering based on SLINK.

    Because no distance matrix is built, memory usage is only linear in the number of objects. The cost is roughly double the runtime (unless indexes are used): all kNN distances are computed first to obtain the core distances, and the pairwise distances are then computed again while building the spanning tree.
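
The first pass (kNN distances) can be illustrated with a minimal sketch. It is not ELKI code: the class `CoreDistanceSketch`, the plain `double[][]` input, the Euclidean distance, and the convention that a point counts as its own first neighbor are all assumptions made for illustration. The point is that a single reusable row of length n suffices, so no n x n matrix is ever stored.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of ELKI: brute-force core distances in O(n) memory. */
class CoreDistanceSketch {
  /** Euclidean distance between two points of equal dimensionality. */
  static double distance(double[] a, double[] b) {
    double sum = 0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /**
   * Core distance of each point: distance to its minPts-th nearest neighbor,
   * counting the point itself as the first neighbor (assumed convention).
   * Requires minPts >= 1.
   */
  static double[] coreDistances(double[][] data, int minPts) {
    int n = data.length;
    double[] core = new double[n];
    double[] row = new double[n]; // one reusable row instead of an n x n matrix
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        row[j] = distance(data[i], data[j]);
      }
      Arrays.sort(row); // row[0] is the distance of the point to itself
      core[i] = row[Math.min(minPts, n) - 1];
    }
    return core;
  }
}
```

Every pairwise distance computed here is discarded after use, which is why the spanning-tree pass has to compute the distances a second time.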

    This version uses the SLINK algorithm to directly produce the pointer representation expected by the extraction methods. The SLINK algorithm is closely related to Prim's minimum spanning tree algorithm, but produces the more compact pointer representation instead of an edge list.
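
To make the pointer representation concrete, here is a small sketch of how it can be read back as a merge sequence. It assumes the usual SLINK convention: for each object i, `lambda[i]` is the height at which i joins a cluster containing a later object, `pi[i]` is that target object, and the last object has `lambda = +infinity`. The class `PointerRepresentationSketch` and the int-indexed arrays are hypothetical stand-ins, not ELKI's data stores.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper, not part of ELKI: reads merges from a pointer representation. */
class PointerRepresentationSketch {
  /** One merge: 'object' joins the cluster of 'target' at 'height'. */
  static final class Merge {
    final int object;
    final int target;
    final double height;

    Merge(int object, int target, double height) {
      this.object = object;
      this.target = target;
      this.height = height;
    }
  }

  /** Returns the merges implied by (pi, lambda), sorted by increasing height. */
  static List<Merge> merges(int[] pi, double[] lambda) {
    List<Merge> out = new ArrayList<>();
    for (int i = 0; i < pi.length; i++) {
      // The final object carries lambda = +infinity and never merges "upward".
      if (lambda[i] < Double.POSITIVE_INFINITY) {
        out.add(new Merge(i, pi[i], lambda[i]));
      }
    }
    // List.sort is stable, so ties keep their original index order.
    out.sort(Comparator.comparingDouble((Merge m) -> m.height));
    return out;
  }
}
```

Two arrays of length n encode the whole hierarchy, which is the compactness referred to above; an edge list would need an explicit pair of endpoints per edge.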

    This implementation does not include the cluster extraction discussed as Step 4 of the referenced paper. That functionality is instead provided by HDBSCANHierarchyExtraction. For this reason, self-edges are not included either.

    Reference:

    R. J. G. B. Campello, D. Moulavi, J. Sander
    Density-Based Clustering Based on Hierarchical Density Estimates
    Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining (PAKDD)

    Since:
    0.7.0
    Author:
    Erich Schubert
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static elki.logging.Logging LOG
      Class logger.
    • Constructor Summary

      Constructors 
      Constructor Description
      SLINKHDBSCANLinearMemory​(elki.distance.Distance<? super O> distance, int minPts)
      Constructor.
    • Method Summary

      Methods 
      Modifier and Type Method Description
      elki.data.type.TypeInformation[] getInputTypeRestriction()  
      protected elki.logging.Logging getLogger()
      Get the (STATIC) logger for this class.
      ClusterMergeHistory run​(elki.database.relation.Relation<O> relation)
      Run the algorithm.
      private void step2​(elki.database.ids.DBIDRef id, elki.database.ids.DBIDs processedIDs, elki.database.query.distance.DistanceQuery<? super O> distQuery, elki.database.datastore.DoubleDataStore coredists, elki.database.datastore.WritableDoubleDataStore m)
      Second step: Determine the pairwise distances from all objects in the pointer representation to the new object with the specified id.
      private void step3​(elki.database.ids.DBIDRef id, elki.database.datastore.WritableDBIDDataStore pi, elki.database.datastore.WritableDoubleDataStore lambda, elki.database.ids.DBIDs processedIDs, elki.database.datastore.WritableDoubleDataStore m)
      Third step: Determine the values for P (pi) and L (lambda).
      private void step4​(elki.database.ids.DBIDRef id, elki.database.datastore.WritableDBIDDataStore pi, elki.database.datastore.WritableDoubleDataStore lambda, elki.database.ids.DBIDs processedIDs)
      Fourth step: Update the clusters if necessary.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        Class logger.
    • Constructor Detail

      • SLINKHDBSCANLinearMemory

        public SLINKHDBSCANLinearMemory​(elki.distance.Distance<? super O> distance,
                                        int minPts)
        Constructor.
        Parameters:
        distance - Distance function
        minPts - Minimum number of points, used to compute the core distances
    • Method Detail

      • getInputTypeRestriction

        public elki.data.type.TypeInformation[] getInputTypeRestriction()
        Specified by:
        getInputTypeRestriction in interface elki.Algorithm
        Overrides:
        getInputTypeRestriction in class AbstractHDBSCAN<O>
      • run

        public ClusterMergeHistory run​(elki.database.relation.Relation<O> relation)
        Run the algorithm.
        Parameters:
        relation - Relation containing the objects to cluster
        Returns:
        Clustering hierarchy
      • step2

        private void step2​(elki.database.ids.DBIDRef id,
                           elki.database.ids.DBIDs processedIDs,
                           elki.database.query.distance.DistanceQuery<? super O> distQuery,
                           elki.database.datastore.DoubleDataStore coredists,
                           elki.database.datastore.WritableDoubleDataStore m)
        Second step: Determine the pairwise distances from all objects in the pointer representation to the new object with the specified id.
        Parameters:
        id - the id of the object to be inserted into the pointer representation
        processedIDs - the already processed ids
        distQuery - Distance query
        coredists - Core distances
        m - Data store
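
As a rough illustration of this step, the sketch below fills `m` with the distance from each already processed object to the new object. It assumes that the distance used is the HDBSCAN mutual reachability distance, that is, the maximum of the two core distances and the raw distance. The class `Step2Sketch`, the array-based signature, and the convention that objects `0..id-1` are the processed ones are hypothetical simplifications of the data-store-based method.

```java
/** Hypothetical array-based sketch of step 2, not the ELKI implementation. */
class Step2Sketch {
  /** Euclidean distance between two points of equal dimensionality. */
  static double distance(double[] a, double[] b) {
    double sum = 0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /**
   * For every processed object i (assumed to be 0..id-1), store in m[i] the
   * mutual reachability distance between i and the new object id.
   */
  static void step2(int id, double[][] data, double[] coredists, double[] m) {
    double coreNew = coredists[id];
    for (int i = 0; i < id; i++) {
      double raw = distance(data[id], data[i]);
      // Mutual reachability: no pair is closer than either core distance.
      m[i] = Math.max(raw, Math.max(coreNew, coredists[i]));
    }
  }
}
```

The raw distance is computed on demand and only the n values in `m` are kept, consistent with the linear memory bound.
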
      • step3

        private void step3​(elki.database.ids.DBIDRef id,
                           elki.database.datastore.WritableDBIDDataStore pi,
                           elki.database.datastore.WritableDoubleDataStore lambda,
                           elki.database.ids.DBIDs processedIDs,
                           elki.database.datastore.WritableDoubleDataStore m)
        Third step: Determine the values for P (pi) and L (lambda).
        Parameters:
        id - the id of the object to be inserted into the pointer representation
        pi - Pi data store
        lambda - Lambda data store
        processedIDs - the already processed ids
        m - Data store
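
The update rule of this step can be sketched with plain arrays, following the classic SLINK formulation in which P corresponds to `pi`, L to `lambda`, and M to `m`. The class `Step3Sketch` and the convention that objects `0..id-1` are the processed ones are assumptions for illustration; the actual method operates on ELKI data stores.

```java
/** Hypothetical array-based sketch of SLINK step 3, not the ELKI implementation. */
class Step3Sketch {
  /**
   * Updates pi and lambda for the processed objects 0..id-1, given the
   * distances m[i] from each of them to the new object id.
   */
  static void step3(int id, int[] pi, double[] lambda, double[] m) {
    for (int i = 0; i < id; i++) {
      double li = lambda[i];
      double mi = m[i];
      int pI = pi[i]; // read the old pointer before it may be overwritten
      if (li >= mi) {
        // The new object is reached at or below the old merge height:
        // pass the old height on to the old target, then point i at id.
        m[pI] = Math.min(m[pI], li);
        lambda[i] = mi;
        pi[i] = id;
      } else {
        // i keeps its pointer; its old target may now be reachable via i.
        m[pI] = Math.min(m[pI], mi);
      }
    }
  }
}
```

For example, with `pi = {1, 1, 2}`, `lambda = {1, inf, inf}` and `m = {3, 2, 0}`, inserting object 2 leaves object 0 unchanged and redirects object 1 to the new object at height 2.
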
      • step4

        private void step4​(elki.database.ids.DBIDRef id,
                           elki.database.datastore.WritableDBIDDataStore pi,
                           elki.database.datastore.WritableDoubleDataStore lambda,
                           elki.database.ids.DBIDs processedIDs)
        Fourth step: Update the clusters if necessary.
        Parameters:
        id - the id of the current object
        pi - Pi data store
        lambda - Lambda data store
        processedIDs - the already processed ids
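
A minimal array-based sketch of this final relabeling pass, again following the classic SLINK formulation: if an object's merge height is not below that of its target, the target has already been absorbed into the new object's cluster by that height, so the pointer is redirected to the new object. `Step4Sketch` and the convention that objects `0..id-1` are the processed ones are hypothetical.

```java
/** Hypothetical array-based sketch of SLINK step 4, not the ELKI implementation. */
class Step4Sketch {
  /** Redirects stale pointers of the processed objects 0..id-1 to the new object id. */
  static void step4(int id, int[] pi, double[] lambda) {
    for (int i = 0; i < id; i++) {
      // If i merges no earlier than its target does, the target is no longer
      // the last object of that cluster at height lambda[i]; id is.
      if (lambda[i] >= lambda[pi[i]]) {
        pi[i] = id;
      }
    }
  }
}
```

For example, with `pi = {1, 2, 2}` and `lambda = {5, 1, inf}`, object 0 points at object 1, which itself merges at height 1; the pass redirects object 0 to object 2.
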
      • getLogger

        protected elki.logging.Logging getLogger()
        Description copied from class: AbstractHDBSCAN
        Get the (STATIC) logger for this class.
        Specified by:
        getLogger in class AbstractHDBSCAN<O>
        Returns:
        the static logger