Class CanopyPreClustering<O>

  • Type Parameters:
    O - Object type
    All Implemented Interfaces:
    elki.Algorithm, ClusteringAlgorithm<Clustering<PrototypeModel<O>>>

    @Reference(authors="A. McCallum, K. Nigam, L. H. Ungar",
               title="Efficient Clustering of High Dimensional Data Sets with Application to Reference Matching",
               booktitle="Proc. 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining",
               url="https://doi.org/10.1145/347090.347123",
               bibkey="DBLP:conf/kdd/McCallumNU00")
    public class CanopyPreClustering<O>
    extends java.lang.Object
    implements ClusteringAlgorithm<Clustering<PrototypeModel<O>>>
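    Canopy pre-clustering is a simple preprocessing step for clustering. It repeatedly takes one of the remaining objects as a canopy center, adds every object within distance t1 of that center to the canopy, and removes every object within distance t2 from the remaining candidates. Objects between t2 and t1 stay available, so canopies may overlap. The canopies are a cheap first partitioning that restricts the distance computations of a more expensive clustering algorithm.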

    Reference:

    A. McCallum, K. Nigam, L. H. Ungar
    Efficient Clustering of High Dimensional Data Sets with Application to Reference Matching
    Proc. 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining

    Since:
    0.6.0
    Author:
    Erich Schubert
    • Nested Class Summary

      • Nested classes/interfaces inherited from interface elki.Algorithm

        elki.Algorithm.Utils
    • Field Summary

      Fields
      Modifier and Type                          Field     Description
      private elki.distance.Distance<? super O>  distance  Distance function used.
      private static elki.logging.Logging        LOG       Class logger.
      private double                             t1        Threshold for inclusion.
      private double                             t2        Threshold for removal.
    • Constructor Summary

      Constructors
      Constructor                                                                            Description
      CanopyPreClustering(elki.distance.Distance<? super O> distance, double t1, double t2)  Constructor.
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        Class logger.
      • distance

        private elki.distance.Distance<? super O> distance
        Distance function used.
      • t1

        private double t1
        Threshold for inclusion: objects within this distance of a canopy center become members of that canopy.
      • t2

        private double t2
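        Threshold for removal: objects within this distance of a canopy center are removed from the remaining candidates, so they cannot start or join later canopies.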
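        A minimal sketch of how the two thresholds classify a single candidate object relative to the current canopy center, following the canopy method of McCallum et al. The class `CanopyThresholds`, the `Outcome` enum and the `classify` helper are hypothetical names, not part of ELKI, and counting a distance exactly equal to a threshold as inside it is an assumption.

```java
/**
 * Hypothetical illustration (not ELKI code) of how the inclusion threshold t1
 * and the removal threshold t2 classify a candidate relative to a canopy center.
 */
public class CanopyThresholds {
    /** Possible outcomes for one candidate object. */
    enum Outcome { SKIP, INCLUDE, INCLUDE_AND_REMOVE }

    /**
     * Classifies a candidate by its distance to the current canopy center.
     * Assumes t2 <= t1; counting a distance equal to a threshold as inside it is an assumption.
     */
    static Outcome classify(double dist, double t1, double t2) {
        if (dist > t1) {
            return Outcome.SKIP;               // beyond t1: not a member of this canopy
        }
        if (dist <= t2) {
            return Outcome.INCLUDE_AND_REMOVE; // within t2: member, and dropped from the candidates
        }
        return Outcome.INCLUDE;                // between t2 and t1: member, but stays a candidate
    }

    public static void main(String[] args) {
        double t1 = 2.0;
        double t2 = 1.0;
        for (double d : new double[] {0.5, 1.5, 2.5}) {
            System.out.println(d + " -> " + classify(d, t1, t2));
        }
    }
}
```

        Only the middle band (t2, t1] produces objects that belong to a canopy yet remain candidates; this band is what allows canopies to overlap.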
    • Constructor Detail

      • CanopyPreClustering

        public CanopyPreClustering(elki.distance.Distance<? super O> distance,
                                   double t1,
                                   double t2)
        Constructor.
        Parameters:
        distance - Distance function
        t1 - Inclusion threshold (the loose threshold)
        t2 - Removal threshold (the tight threshold, at most t1)
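        McCallum et al. use a loose inclusion threshold that is larger than the tight removal threshold. A caller might check this relationship before constructing the algorithm. The sketch below uses a hypothetical helper, `validateThresholds`, that is not part of ELKI. It accepts t1 == t2; in that case every included object is also removed, so the canopies no longer overlap.

```java
/** Hypothetical parameter check; not part of ELKI's API. */
public final class CanopyParams {
    private CanopyParams() {
    }

    /**
     * Rejects thresholds where the inclusion threshold t1 is smaller than the
     * removal threshold t2. The negated comparison also rejects NaN values.
     */
    static void validateThresholds(double t1, double t2) {
        if (!(t1 >= t2)) {
            throw new IllegalArgumentException(
                "t1 (" + t1 + ") must not be smaller than t2 (" + t2 + ")");
        }
    }

    public static void main(String[] args) {
        validateThresholds(2.0, 1.0); // accepted: loose t1, tight t2
        validateThresholds(1.0, 1.0); // accepted, but canopies can no longer overlap
        try {
            validateThresholds(1.0, 2.0);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```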
    • Method Detail

      • run

        public Clustering<PrototypeModel<O>> run(elki.database.relation.Relation<O> relation)
        Run the canopy clustering algorithm on the given relation.
        Parameters:
        relation - Relation to process
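        The canopy loop can be sketched in plain Java as follows. This is not ELKI's implementation. It assumes `double[]` vectors with Euclidean distance instead of a `Relation` and a `Distance` function. It deterministically picks the first remaining object as the next center, whereas the original paper picks one at random. It also returns index lists instead of a `Clustering` of `PrototypeModel`s. `CanopySketch`, `euclidean` and `canopies` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Minimal canopy clustering sketch (not ELKI's code) on double[] vectors.
 * Each returned list holds the indices of one canopy; its first index is the center.
 */
public class CanopySketch {
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static List<List<Integer>> canopies(double[][] points, double t1, double t2) {
        // Insertion-ordered candidate set, so the next center is the first remaining index.
        Set<Integer> candidates = new LinkedHashSet<>();
        for (int i = 0; i < points.length; i++) {
            candidates.add(i);
        }
        List<List<Integer>> result = new ArrayList<>();
        while (!candidates.isEmpty()) {
            // Take the next remaining object as the canopy center and remove it.
            Iterator<Integer> first = candidates.iterator();
            int center = first.next();
            first.remove();
            List<Integer> canopy = new ArrayList<>();
            canopy.add(center);
            // Compare the center to every remaining candidate.
            for (Iterator<Integer> iter = candidates.iterator(); iter.hasNext();) {
                int j = iter.next();
                double dist = euclidean(points[center], points[j]);
                if (dist > t1) {
                    continue; // beyond the inclusion threshold
                }
                canopy.add(j); // within t1: member of this canopy
                if (dist <= t2) {
                    iter.remove(); // within t2: cannot start or join later canopies
                }
            }
            result.add(canopy);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] points = {{0}, {0.5}, {1.5}, {3}, {3.2}};
        System.out.println(canopies(points, 2.0, 1.0)); // prints [[0, 1, 2], [2, 3, 4], [3, 4]]
    }
}
```

        With t1 = 2.0 and t2 = 1.0, object 2 lies at distance 1.5 from center 0. It therefore joins the first canopy but remains a candidate, and later becomes the center of the second canopy. This is how the canopies come to overlap.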
      • getInputTypeRestriction

        public elki.data.type.TypeInformation[] getInputTypeRestriction()
        Specified by:
        getInputTypeRestriction in interface elki.Algorithm