Class KDTreeEM

  • All Implemented Interfaces:
    elki.Algorithm, ClusteringAlgorithm<Clustering<EMModel>>

    @Description("Gaussian mixture modeling accelerated using a kd-tree")
    @Reference(authors="Andrew W. Moore",
               booktitle="Advances in Neural Information Processing Systems 11 (NIPS 1998)",
               title="Very Fast EM-based Mixture Model Clustering using Multiresolution kd-trees",
               bibkey="DBLP:conf/nips/Moore98")
    public class KDTreeEM
    extends java.lang.Object
    implements ClusteringAlgorithm<Clustering<EMModel>>
    Clustering by expectation maximization (EM algorithm), also known as Gaussian mixture modeling (GMM), computed on a kd-tree. If supported, it tries to prune during the computation.

    Reference:

    A. W. Moore:
    Very Fast EM-based Mixture Model Clustering using Multiresolution kd-trees.
    Advances in Neural Information Processing Systems 11 (NIPS 1998)

    Since:
    0.8.0
    Author:
    Robert Gehde
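For orientation, the expectation step that EM repeats can be sketched in plain Java for a one-dimensional mixture. This is not ELKI code: the class `EStepSketch` and its methods are hypothetical names introduced here, and the sketch ignores the kd-tree acceleration entirely.

```java
// Hypothetical illustration; not part of ELKI and not the KDTreeEM implementation.
final class EStepSketch {
  /** Density of a univariate normal distribution N(mean, var) at x. */
  static double gaussian(double x, double mean, double var) {
    double d = x - mean;
    return Math.exp(-0.5 * d * d / var) / Math.sqrt(2 * Math.PI * var);
  }

  /** Soft assignment of x: posterior probability of each mixture component. */
  static double[] responsibilities(double x, double[] weight, double[] mean, double[] var) {
    double[] r = new double[weight.length];
    double sum = 0;
    for (int i = 0; i < r.length; i++) {
      r[i] = weight[i] * gaussian(x, mean[i], var[i]);
      sum += r[i];
    }
    // Normalize so the entries sum to 1 (assumes sum > 0, i.e. no underflow).
    for (int i = 0; i < r.length; i++) {
      r[i] /= sum;
    }
    return r;
  }

  /** Hard assignment: index of the most probable component. */
  static int argmax(double[] r) {
    int best = 0;
    for (int i = 1; i < r.length; i++) {
      if (r[i] > r[best]) {
        best = i;
      }
    }
    return best;
  }
}
```

The normalized array corresponds to a soft assignment, and `argmax` to a hard one.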
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      (package private) static class  KDTreeEM.KDTree
      KDTree class with the statistics needed for EM clustering.
      static class  KDTreeEM.Par
      Parameterization class.
      • Nested classes/interfaces inherited from interface elki.Algorithm

        elki.Algorithm.Utils
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private double delta
      Delta parameter
      protected boolean exactAssign
      Perform exact cluster assignments
      private double ipiPow
      Gaussian scaling factor for likelihood.
      private int k
      number of models
      private static elki.logging.Logging LOG
      Logging object
      private int maxiter
      maximum number of iterations
      private double mbw
      minimum relative size of leaf nodes
      private TextbookMultivariateGaussianModelFactory mfactory
      Factory for producing the initial cluster model.
      private int miniter
      minimum number of iterations
      private java.util.List<TextbookMultivariateGaussianModel> models
      Current clusters.
      private java.util.List<TextbookMultivariateGaussianModel> newmodels
      Models for next iteration.
      private boolean soft
      Retain soft assignments.
      static elki.data.type.SimpleTypeInformation<double[]> SOFT_TYPE
      Soft assignment result type.
      private elki.math.linearalgebra.ConstrainedQuadraticProblemSolver solver
      Solver for quadratic problems
      protected elki.database.ids.ArrayModifiableDBIDs sorted
      kd-tree object order
      private double tau
      tau, low for precise, high for fast results.
      private double tauClass
      Drop a class if its maximum weight in the bounding box is lower than tauClass * wmin_max, where wmin_max is the largest of the minimum weights of all classes.
      private double[] wsum
      Cluster weights
    • Constructor Summary

      Constructors 
      Constructor Description
      KDTreeEM​(int k, double mbw, double tau, double tauclass, double delta, TextbookMultivariateGaussianModelFactory mfactory, int miniter, int maxiter, boolean soft, boolean exactAssign)
      Constructor.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      private double[] analyseDimWidth​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
      Helper method to retrieve the widths of all data in all dimensions.
      private void calculateModelLimits​(KDTreeEM.KDTree node, TextbookMultivariateGaussianModel model, double[] minpnt, double[] maxpnt, double[] ret)
      Calculates the model limits inside this node by translating the Gaussian model into a quadratic function.
      private int[] checkStoppingCondition​(KDTreeEM.KDTree node, int[] indices)
      This method checks the stopping conditions given in the paper and thereby determines the dimensions that will be considered for the child trees.
      elki.data.type.TypeInformation[] getInputTypeRestriction()  
      private double makeStats​(KDTreeEM.KDTree node, int[] indices, elki.database.datastore.WritableDataStore<double[]> probs)
      Calculates the statistics on the kd-tree needed for the calculation of the new models
      Clustering<EMModel> run​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
      Calculates the EM clustering with the given values by calling makeStats and calculating the new models from the results.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        Logging object
      • soft

        private boolean soft
        Retain soft assignments.
      • delta

        private double delta
        Delta parameter
      • SOFT_TYPE

        public static final elki.data.type.SimpleTypeInformation<double[]> SOFT_TYPE
        Soft assignment result type.
      • k

        private int k
        number of models
      • mbw

        private double mbw
        minimum relative size of leaf nodes
      • tau

        private double tau
        tau, low for precise, high for fast results.
      • tauClass

        private double tauClass
        Drop a class if its maximum weight in the bounding box is lower than tauClass * wmin_max, where wmin_max is the largest of the minimum weights of all classes.
      • miniter

        private int miniter
        minimum number of iterations
      • maxiter

        private int maxiter
        maximum number of iterations
      • sorted

        protected elki.database.ids.ArrayModifiableDBIDs sorted
        kd-tree object order
      • solver

        private elki.math.linearalgebra.ConstrainedQuadraticProblemSolver solver
        Solver for quadratic problems
      • ipiPow

        private double ipiPow
        Gaussian scaling factor for likelihood.
      • wsum

        private double[] wsum
        Cluster weights
      • exactAssign

        protected boolean exactAssign
        Perform exact cluster assignments
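The class-dropping rule described for tauClass can be sketched as follows. This is a minimal illustration, not the code used by this class: `ClassPruningSketch`, `keepClasses`, and the arrays `wmin` and `wmax` (per-class minimum and maximum weight inside one bounding box) are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical illustration of the tauClass rule; not part of ELKI.
final class ClassPruningSketch {
  /**
   * Keeps class i unless wmax[i] is lower than tauClass * wminMax, where
   * wminMax is the largest entry of wmin.
   *
   * @return indices of the classes that are kept, in ascending order
   */
  static int[] keepClasses(double[] wmin, double[] wmax, double tauClass) {
    double wminMax = 0;
    for (double w : wmin) {
      wminMax = Math.max(wminMax, w);
    }
    int[] kept = new int[wmax.length];
    int n = 0;
    for (int i = 0; i < wmax.length; i++) {
      if (wmax[i] >= tauClass * wminMax) {
        kept[n++] = i;
      }
    }
    return Arrays.copyOf(kept, n);
  }
}
```

With `wmin = {0.6, 0.001, 0.1}`, `wmax = {0.9, 0.002, 0.3}` and `tauClass = 0.01`, the threshold is 0.006, so only the class at index 1 is dropped.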
    • Constructor Detail

      • KDTreeEM

        public KDTreeEM​(int k,
                        double mbw,
                        double tau,
                        double tauclass,
                        double delta,
                        TextbookMultivariateGaussianModelFactory mfactory,
                        int miniter,
                        int maxiter,
                        boolean soft,
                        boolean exactAssign)
        Constructor.
        Parameters:
        k - number of classes
        mbw - minimum relative size of leaf nodes
        tau - pruning parameter
        tauclass - pruning parameter for single classes
        delta - delta parameter
        mfactory - EM cluster model factory
        miniter - Minimum number of iterations
        maxiter - Maximum number of iterations
        soft - Include soft assignments
        exactAssign - Perform exact assignments at the end
    • Method Detail

      • run

        public Clustering<EMModel> run​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
        Calculates the EM clustering with the given values by calling makeStats and calculating the new models from the results.
        Parameters:
        relation - Data Relation
        Returns:
        the KDTreeEM clustering result
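The outer iteration of an EM run can be sketched as a loop bounded by miniter and maxiter. This page does not say how delta is used; the sketch assumes it is a convergence threshold on the change in log likelihood between two passes. `EmLoopSketch`, `iterate`, and the `step` callback are hypothetical names.

```java
import java.util.function.DoubleSupplier;

// Hypothetical illustration; the role of delta is an assumption, not taken from this page.
final class EmLoopSketch {
  /**
   * Repeats one EM pass until the log likelihood changes by less than delta
   * (checked only once miniter passes have run) or maxiter passes are done.
   *
   * @param step one EM pass that returns the current log likelihood
   * @return number of passes performed
   */
  static int iterate(DoubleSupplier step, int miniter, int maxiter, double delta) {
    double previous = Double.NEGATIVE_INFINITY;
    int it = 0;
    while (it < maxiter) {
      double current = step.getAsDouble();
      it++;
      if (it >= miniter && Math.abs(current - previous) < delta) {
        break; // converged according to the assumed criterion
      }
      previous = current;
    }
    return it;
  }
}
```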
      • analyseDimWidth

        private double[] analyseDimWidth​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
        Helper method to retrieve the widths of all data in all dimensions.
        Parameters:
        relation - Relation to analyze
        Returns:
        width of each dimension
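The idea behind this helper can be sketched on a plain `double[][]` instead of an ELKI relation. `DimWidthSketch` and `widths` are hypothetical names, and the sketch assumes a non-empty data set in which every row has the same length.

```java
// Hypothetical illustration on plain arrays; not the ELKI implementation.
final class DimWidthSketch {
  /** Returns max - min of the data in each dimension. */
  static double[] widths(double[][] data) {
    double[] min = data[0].clone();
    double[] max = data[0].clone();
    for (double[] row : data) {
      for (int d = 0; d < row.length; d++) {
        min[d] = Math.min(min[d], row[d]);
        max[d] = Math.max(max[d], row[d]);
      }
    }
    double[] width = new double[min.length];
    for (int d = 0; d < width.length; d++) {
      width[d] = max[d] - min[d];
    }
    return width;
  }
}
```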
      • checkStoppingCondition

        private int[] checkStoppingCondition​(KDTreeEM.KDTree node,
                                             int[] indices)
        This method checks the stopping conditions given in the paper and thereby determines the dimensions that will be considered for the child trees. If it returns a non-empty subset of the input dimension set, the missing dimensions were dropped because their weight was too small. If it returns null, the expected error of all remaining models is small enough to treat this node as a leaf node.
        Parameters:
        node - kd-tree node
        indices - list of indices to check
        Returns:
        indices that are not pruned, null if everything was pruned
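The return contract (null for a leaf, otherwise the indices to keep refining) can be sketched as below. The inequality is an assumption modeled on the kind of test described in the referenced paper, where a node is pruned when the spread between the maximum and minimum weight of every class is small relative to that class's total weight; the exact condition used by this class is not given on this page. `StoppingSketch`, `check`, and the array parameters are hypothetical.

```java
// Hypothetical illustration; the pruning inequality is an assumption.
final class StoppingSketch {
  /**
   * @param indices classes still under consideration
   * @param wmin per-class minimum weight inside the node
   * @param wmax per-class maximum weight inside the node
   * @param wtotal per-class total weight over the whole data set
   * @param tau pruning parameter, low for precise and high for fast results
   * @return null if the node can be treated as a leaf, otherwise the indices
   */
  static int[] check(int[] indices, double[] wmin, double[] wmax, double[] wtotal, double tau) {
    for (int i : indices) {
      if (wmax[i] - wmin[i] >= tau * wtotal[i]) {
        return indices; // at least one class is still too loosely bounded
      }
    }
    return null; // every remaining class is bounded tightly enough
  }
}
```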
      • calculateModelLimits

        private void calculateModelLimits​(KDTreeEM.KDTree node,
                                          TextbookMultivariateGaussianModel model,
                                          double[] minpnt,
                                          double[] maxpnt,
                                          double[] ret)
        Calculates the model limits inside this node by translating the Gaussian model into a quadratic function.
        Parameters:
        node - kd-tree node to calculate the limits in
        model - model to calculate the limits for
        minpnt - result array for argmin
        maxpnt - result array for argmax
        ret - Return array
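For intuition, the special case of an axis-aligned (diagonal covariance) Gaussian needs no quadratic solver, because the squared Mahalanobis distance is a sum of independent per-dimension terms. The sketch below covers only that special case and is not the method used by this class. It assumes that argmin and argmax refer to the points of lowest and highest density inside the box; `ModelLimitsSketch`, `limits`, `boxMin`, and `boxMax` are hypothetical names.

```java
// Hypothetical illustration for diagonal covariance only; not the ELKI implementation.
final class ModelLimitsSketch {
  /**
   * Fills maxpnt with the point of highest density inside the box and minpnt
   * with the point of lowest density. With diagonal covariance the variances
   * rescale each dimension by a positive factor, so they do not change where
   * the extrema lie and are not needed here.
   */
  static void limits(double[] boxMin, double[] boxMax, double[] mean, double[] minpnt, double[] maxpnt) {
    for (int d = 0; d < mean.length; d++) {
      // Highest density: the point closest to the mean, i.e. the mean clamped to the box.
      maxpnt[d] = Math.max(boxMin[d], Math.min(boxMax[d], mean[d]));
      // Lowest density: the box boundary that lies farther from the mean.
      minpnt[d] = (mean[d] - boxMin[d] > boxMax[d] - mean[d]) ? boxMin[d] : boxMax[d];
    }
  }
}
```

With full covariance matrices the dimensions interact, which is presumably why the class holds a ConstrainedQuadraticProblemSolver.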
      • makeStats

        private double makeStats​(KDTreeEM.KDTree node,
                                 int[] indices,
                                 elki.database.datastore.WritableDataStore<double[]> probs)
        Calculates the statistics on the kd-tree needed for the calculation of the new models
        Parameters:
        node - next node
        indices - list of indices to use in calculation, initially all
        probs - cluster assignment
        Returns:
        log likelihood of the model
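The benefit of keeping statistics in the tree nodes can be sketched as follows: a node stores its point count and coordinate sums, parents are built from children without revisiting points, and a node treated as a leaf contributes to a class as a whole. This is a simplified illustration with hypothetical names (`NodeStatsSketch`, `accumulate`); it omits the covariance statistics and the log likelihood that a full implementation would also need.

```java
// Hypothetical illustration of per-node sufficient statistics; not the KDTreeEM.KDTree class.
final class NodeStatsSketch {
  final int size;     // number of points in the node
  final double[] sum; // per-dimension sum of the points

  /** Builds the summary of a leaf from its points (assumes a non-empty array). */
  NodeStatsSketch(double[][] points) {
    size = points.length;
    sum = new double[points[0].length];
    for (double[] p : points) {
      for (int d = 0; d < sum.length; d++) {
        sum[d] += p[d];
      }
    }
  }

  /** Builds the summary of a parent from its two children. */
  NodeStatsSketch(NodeStatsSketch a, NodeStatsSketch b) {
    size = a.size + b.size;
    sum = new double[a.sum.length];
    for (int d = 0; d < sum.length; d++) {
      sum[d] = a.sum[d] + b.sum[d];
    }
  }

  /**
   * Adds this node's contribution to one class, assuming every point in the
   * node has the same weight w for that class.
   *
   * @param meanAcc running weighted coordinate sum of the class, updated in place
   * @return the weight added to the class total
   */
  double accumulate(double w, double[] meanAcc) {
    for (int d = 0; d < sum.length; d++) {
      meanAcc[d] += w * sum[d];
    }
    return w * size;
  }
}
```

Dividing the accumulated coordinate sum by the accumulated weight gives the new mean of the class.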
      • getInputTypeRestriction

        public elki.data.type.TypeInformation[] getInputTypeRestriction()
        Specified by:
        getInputTypeRestriction in interface elki.Algorithm