Class HDBSCANLinearMemory<O>

  • Type Parameters:
    O - Object type
    All Implemented Interfaces:
    elki.Algorithm, HierarchicalClusteringAlgorithm

    @Title("HDBSCAN: Hierarchical Density-Based Spatial Clustering of Applications with Noise")
    @Description("Density-Based Clustering Based on Hierarchical Density Estimates")
    @Reference(authors="R. J. G. B. Campello, D. Moulavi, J. Sander",
               title="Density-Based Clustering Based on Hierarchical Density Estimates",
               booktitle="Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining (PAKDD)",
               url="https://doi.org/10.1007/978-3-642-37456-2_14",
               bibkey="DBLP:conf/pakdd/CampelloMS13")
    public class HDBSCANLinearMemory<O>
    extends AbstractHDBSCAN<O>
    implements HierarchicalClusteringAlgorithm
    Linear memory implementation of HDBSCAN clustering.

    By not building a distance matrix, memory usage is reduced to linear in the number of objects, at the cost of roughly double the runtime (unless indexes are used): all kNN distances are computed first (to obtain the core distances), and distances are then recomputed while building the spanning tree.
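
    The two-pass trade-off can be illustrated with a minimal standalone sketch. It is not the ELKI implementation: the class CoreDistanceSketch and its methods are hypothetical names, a brute-force scan stands in for the kNN query, Euclidean distance is assumed, and each point is assumed to count as its own nearest neighbor. Only one reusable row of n distances and the n core distances are kept in memory; pairwise distances are recomputed on demand.

```java
import java.util.Arrays;

/** Hypothetical illustration class; not part of ELKI. */
final class CoreDistanceSketch {
  /** Euclidean distance (an assumption; any distance function could be substituted). */
  static double euclidean(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /**
   * Pass 1: the core distance of each point is the distance to its minPts-th
   * nearest neighbor, counting the point itself (assumption). Requires minPts >= 1.
   */
  static double[] coreDistances(double[][] data, int minPts) {
    int n = data.length;
    double[] core = new double[n];
    double[] row = new double[n]; // single reusable buffer: O(n) memory
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        row[j] = euclidean(data[i], data[j]);
      }
      Arrays.sort(row);
      core[i] = row[Math.min(minPts, n) - 1];
    }
    return core;
  }

  /** Pass 2 helper: mutual reachability distance, recomputed instead of stored. */
  static double mutualReachability(double[][] data, double[] core, int a, int b) {
    return Math.max(euclidean(data[a], data[b]), Math.max(core[a], core[b]));
  }

  public static void main(String[] args) {
    double[][] data = { { 0 }, { 1 }, { 3 }, { 7 } };
    double[] core = coreDistances(data, 2);
    System.out.println("core distances: " + Arrays.toString(core));
    System.out.println("mreach(0, 2) = " + mutualReachability(data, core, 0, 2));
  }
}
```

    Each pairwise distance is evaluated once in the first pass and again whenever the second pass asks for it, which is where the roughly doubled runtime comes from when no index accelerates the kNN search.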

    This implementation follows the HDBSCAN publication more closely than SLINKHDBSCANLinearMemory by computing the minimum spanning tree with Prim's algorithm (instead of SLINK, although the two are remarkably similar). To produce the preferred internal format of hierarchical clusterings (the compact pointer representation introduced in SLINK), a postprocessing conversion is required.
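
    Prim's algorithm needs only O(n) working memory when edge weights are computed on demand. The following is a minimal sketch under stated assumptions, not ELKI's code: PrimLinearMemorySketch and the Dist callback are hypothetical names, the callback is assumed to return the mutual reachability distance of two objects, and the conversion to the pointer representation is omitted.

```java
import java.util.Arrays;

/** Hypothetical illustration class; not part of ELKI. */
final class PrimLinearMemorySketch {
  /** Hypothetical callback returning the distance between objects a and b. */
  interface Dist {
    double between(int a, int b);
  }

  /**
   * Prim's algorithm in O(n^2) time and O(n) memory.
   * Fills parent[i] with the tree neighbor of object i (-1 for the root,
   * object 0) and returns the weight of each edge (i, parent[i]).
   * No self-edges are added.
   */
  static double[] prim(int n, Dist dist, int[] parent) {
    double[] best = new double[n]; // cheapest known edge into the tree
    boolean[] inTree = new boolean[n];
    if (n == 0) {
      return best;
    }
    Arrays.fill(best, Double.POSITIVE_INFINITY);
    Arrays.fill(parent, -1);
    best[0] = 0; // the root has no incoming edge
    inTree[0] = true;
    int current = 0;
    for (int step = 1; step < n; step++) {
      int next = -1;
      for (int j = 0; j < n; j++) {
        if (inTree[j]) {
          continue;
        }
        // Distance is computed on demand; nothing is cached beyond best[j].
        double d = dist.between(current, j);
        if (d < best[j]) {
          best[j] = d;
          parent[j] = current;
        }
        if (next < 0 || best[j] < best[next]) {
          next = j;
        }
      }
      inTree[next] = true;
      current = next;
    }
    return best;
  }

  public static void main(String[] args) {
    double[] x = { 0, 1, 3, 7 };    // 1-d example points
    double[] core = { 1, 1, 2, 4 }; // core distances for minPts = 2
    int[] parent = new int[x.length];
    double[] weight = prim(x.length,
        (a, b) -> Math.max(Math.abs(x[a] - x[b]), Math.max(core[a], core[b])),
        parent);
    System.out.println("parent: " + Arrays.toString(parent));
    System.out.println("weight: " + Arrays.toString(weight));
  }
}
```

    The arrays best, parent and inTree are the only state that grows with the data set size, so memory stays linear while every candidate edge weight is recomputed from the data.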

    This implementation does not include the cluster extraction described as Step 4 of the publication; extraction is provided as a separate step. For this reason, self-edges are not included either.

    Reference:

    R. J. G. B. Campello, D. Moulavi, J. Sander
    Density-Based Clustering Based on Hierarchical Density Estimates
    Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining (PAKDD)

    Since:
    0.7.0
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        Class logger.
    • Constructor Detail

      • HDBSCANLinearMemory

        public HDBSCANLinearMemory(elki.distance.Distance<? super O> distance,
                                   int minPts)
        Constructor.
        Parameters:
        distance - Distance function
        minPts - Minimum number of points used to compute core distances
    • Method Detail

      • getInputTypeRestriction

        public elki.data.type.TypeInformation[] getInputTypeRestriction()
        Specified by:
        getInputTypeRestriction in interface elki.Algorithm
        Overrides:
        getInputTypeRestriction in class AbstractHDBSCAN<O>
      • run

        public ClusterDensityMergeHistory run(elki.database.relation.Relation<O> relation)
        Run the algorithm.
        Parameters:
        relation - Relation
        Returns:
        Clustering hierarchy
      • getLogger

        protected elki.logging.Logging getLogger()
        Description copied from class: AbstractHDBSCAN
        Get the (STATIC) logger for this class.
        Specified by:
        getLogger in class AbstractHDBSCAN<O>
        Returns:
        the static logger