Class KDEOS<O>

  • Type Parameters:
    O - Object type
    All Implemented Interfaces:
    elki.Algorithm, OutlierAlgorithm

    @Title("KDEOS: Kernel Density Estimator Outlier Score")
    @Reference(authors="Erich Schubert, Arthur Zimek, Hans-Peter Kriegel",
               title="Generalized Outlier Detection with Flexible Kernel Density Estimates",
               booktitle="Proc. 14th SIAM International Conference on Data Mining (SDM 2014)",
               url="https://doi.org/10.1137/1.9781611973440.63",
               bibkey="DBLP:conf/sdm/SchubertZK14")
    public class KDEOS<O>
    extends java.lang.Object
    implements OutlierAlgorithm
    Generalized Outlier Detection with Flexible Kernel Density Estimates.

    This is an outlier detection method inspired by LOF, but based on kernel density estimation (KDE) from statistics. Unfortunately, kernel density estimation itself becomes difficult for higher-dimensional data. In this case the kdeos.idim parameter can be useful: it either disables the dimensionality adjustment completely (0) or sets it to a lower dimensionality than that of the data representation. This may sound like a hack at first, but real data often has a lower intrinsic dimensionality and is merely embedded in a higher-dimensional representation. Adjusting the kernel to the dimensionality of the representation seems to yield worse results than using the lower, intrinsic dimensionality.

    If your data set contains many duplicates, use the kdeos.kernel.minbw parameter to set a minimum kernel bandwidth. This may improve results in such cases, because it prevents kernels from degenerating to single points.
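
    The effect of the two parameters described above can be illustrated with a small, self-contained sketch. It is not the ELKI implementation: the class KernelWeightSketch, the method kernelWeight, and the choice of a Gaussian kernel are assumptions made for illustration only. The sketch clamps the bandwidth from below and divides by the bandwidth raised to the power idim, so that idim = 0 switches the dimensionality adjustment off.

```java
// Illustrative sketch only; names and formula are assumptions, not ELKI code.
final class KernelWeightSketch {
  /** Standard Gaussian kernel (assumed kernel choice for this sketch). */
  static double gaussian(double u) {
    return Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
  }

  /**
   * Contribution of one neighbor at distance dist to a density estimate.
   * The bandwidth is clamped from below by minBandwidth, and the result is
   * divided by h^idim; with idim = 0 this divisor is 1 (no adjustment).
   */
  static double kernelWeight(double dist, double bandwidth, double minBandwidth, int idim) {
    double h = Math.max(bandwidth, minBandwidth);
    return gaussian(dist / h) / Math.pow(h, idim);
  }

  public static void main(String[] args) {
    // Duplicate points: the raw bandwidth is 0, the clamp keeps it at 0.5.
    System.out.println(kernelWeight(0.0, 0.0, 0.5, 2));
    // Same situation with the dimensionality adjustment disabled.
    System.out.println(kernelWeight(0.0, 0.0, 0.5, 0));
    // Without a minimum bandwidth the kernel degenerates (0 / 0 yields NaN).
    System.out.println(kernelWeight(0.0, 0.0, 0.0, 2));
  }
}
```

    With a bandwidth of zero and no minimum bandwidth, the weight in this sketch is undefined, which is the degenerate case the minimum bandwidth is meant to avoid.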

    Reference:

    Erich Schubert, Arthur Zimek, Hans-Peter Kriegel
    Generalized Outlier Detection with Flexible Kernel Density Estimates
    Proc. 14th SIAM International Conference on Data Mining (SDM 2014)

    Since:
    0.7.0
    Author:
    Erich Schubert
    • Nested Class Summary

      • Nested classes/interfaces inherited from interface elki.Algorithm

        elki.Algorithm.Utils
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static double CUTOFF
      Significance cutoff when computing kernel density.
      protected elki.distance.Distance<? super O> distance
      Distance function used.
      protected int idim
      Intrinsic dimensionality.
      protected elki.math.statistics.kernelfunctions.KernelDensityFunction kernel
      Kernel function to use for density estimation.
      protected int kmax
      Maximum number of neighbors to use.
      protected int kmin
      Minimum number of neighbors to use.
      private static elki.logging.Logging LOG
      Class logger.
      protected double minBandwidth
      Kernel minimum bandwidth.
      protected double scale
      Kernel scaling parameter.
    • Constructor Summary

      Constructors 
      Constructor Description
      KDEOS​(elki.distance.Distance<? super O> distance, int kmin, int kmax, elki.math.statistics.kernelfunctions.KernelDensityFunction kernel, double minBandwidth, double scale, int idim)
      Constructor.
    • Method Summary

      Modifier and Type Method Description
      protected void computeOutlierScores​(elki.database.query.knn.KNNSearcher<elki.database.ids.DBIDRef> knnq, elki.database.ids.DBIDs ids, elki.database.datastore.WritableDataStore<double[]> densities, elki.database.datastore.WritableDoubleDataStore kdeos, elki.math.DoubleMinMax minmax)
      Compute the final KDEOS scores.
      private int dimensionality​(elki.database.relation.Relation<O> rel)
      Ugly hack to allow using this implementation without having a well-defined dimensionality.
      protected void estimateDensities​(elki.database.relation.Relation<O> rel, elki.database.query.knn.KNNSearcher<elki.database.ids.DBIDRef> knnq, elki.database.ids.DBIDs ids, elki.database.datastore.WritableDataStore<double[]> densities)
      Perform the kernel density estimation step.
      elki.data.type.TypeInformation[] getInputTypeRestriction()  
      OutlierResult run​(elki.database.relation.Relation<O> rel)
      Run the KDEOS outlier detection algorithm.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        Class logger.
      • CUTOFF

        private static final double CUTOFF
        Significance cutoff when computing kernel density.
        See Also:
        Constant Field Values
      • distance

        protected elki.distance.Distance<? super O> distance
        Distance function used.
      • kernel

        protected elki.math.statistics.kernelfunctions.KernelDensityFunction kernel
        Kernel function to use for density estimation.
      • kmin

        protected int kmin
        Minimum number of neighbors to use.
      • kmax

        protected int kmax
        Maximum number of neighbors to use.
      • scale

        protected double scale
        Kernel scaling parameter.
      • minBandwidth

        protected double minBandwidth
        Kernel minimum bandwidth.
      • idim

        protected int idim
        Intrinsic dimensionality.
    • Constructor Detail

      • KDEOS

        public KDEOS​(elki.distance.Distance<? super O> distance,
                     int kmin,
                     int kmax,
                     elki.math.statistics.kernelfunctions.KernelDensityFunction kernel,
                     double minBandwidth,
                     double scale,
                     int idim)
        Constructor.
        Parameters:
        distance - Distance function
        kmin - Minimum number of neighbors
        kmax - Maximum number of neighbors
        kernel - Kernel function
        minBandwidth - Minimum bandwidth
        scale - Kernel scaling parameter
        idim - Intrinsic dimensionality (0 disables the dimensionality adjustment; see the class description)
    • Method Detail

      • getInputTypeRestriction

        public elki.data.type.TypeInformation[] getInputTypeRestriction()
        Specified by:
        getInputTypeRestriction in interface elki.Algorithm
      • run

        public OutlierResult run​(elki.database.relation.Relation<O> rel)
        Run the KDEOS outlier detection algorithm.
        Parameters:
        rel - Relation to process
        Returns:
        Outlier detection result
      • estimateDensities

        protected void estimateDensities​(elki.database.relation.Relation<O> rel,
                                         elki.database.query.knn.KNNSearcher<elki.database.ids.DBIDRef> knnq,
                                         elki.database.ids.DBIDs ids,
                                         elki.database.datastore.WritableDataStore<double[]> densities)
        Perform the kernel density estimation step.
        Parameters:
        rel - Relation to query
        knnq - kNN query
        ids - IDs to process
        densities - Density storage
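
        The density storage holds a double[] per object, which suggests one estimate per neighborhood size between kmin and kmax. The following sketch shows how such an array could be filled from a sorted list of neighbor distances. It is a simplified illustration under stated assumptions, not the ELKI code: the class MultiKDensitySketch, the Gaussian kernel, the bandwidth rule (scale times the distance to the k-th neighbor) and the one-dimensional normalization are all hypothetical choices.

```java
// Illustrative sketch only; the bandwidth rule and normalization are assumptions.
final class MultiKDensitySketch {
  /** Standard Gaussian kernel (assumed kernel choice for this sketch). */
  static double gaussian(double u) {
    return Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
  }

  /**
   * Compute one density estimate per k in kmin..kmax.
   *
   * @param sortedDists ascending distances to the neighbors (index 0 = nearest),
   *        with at least kmax entries
   * @return array of length kmax - kmin + 1, entry i belongs to k = kmin + i
   */
  static double[] densities(double[] sortedDists, int kmin, int kmax, double scale, double minBandwidth) {
    double[] dens = new double[kmax - kmin + 1];
    for (int k = kmin; k <= kmax; k++) {
      // Assumed bandwidth: scaled distance to the k-th neighbor, clamped from below.
      double h = Math.max(scale * sortedDists[k - 1], minBandwidth);
      double sum = 0;
      for (int i = 0; i < k; i++) {
        sum += gaussian(sortedDists[i] / h) / h; // one-dimensional normalization
      }
      dens[k - kmin] = sum / k; // average kernel contribution of the k neighbors
    }
    return dens;
  }

  public static void main(String[] args) {
    double[] d = densities(new double[] { 1.0, 2.0 }, 1, 2, 1.0, 0.0);
    System.out.println(d[0] + " " + d[1]);
  }
}
```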
      • dimensionality

        private int dimensionality​(elki.database.relation.Relation<O> rel)
        Ugly hack to allow using this implementation without having a well-defined dimensionality.
        Parameters:
        rel - Data relation
        Returns:
        Dimensionality
      • computeOutlierScores

        protected void computeOutlierScores​(elki.database.query.knn.KNNSearcher<elki.database.ids.DBIDRef> knnq,
                                            elki.database.ids.DBIDs ids,
                                            elki.database.datastore.WritableDataStore<double[]> densities,
                                            elki.database.datastore.WritableDoubleDataStore kdeos,
                                            elki.math.DoubleMinMax minmax)
        Compute the final KDEOS scores.
        Parameters:
        knnq - kNN query
        ids - IDs to process
        densities - Density estimates
        kdeos - Score outputs
        minmax - Minimum and maximum scores
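
        As an outlier score inspired by LOF, KDEOS compares the density of an object with the densities of its neighbors. The sketch below shows one way such a comparison can be expressed: a z-score per neighborhood size, averaged over all sizes. It assumes this standardization for illustration and leaves out any final normalization of the scores; the class ZScoreSketch and its methods are hypothetical and not part of ELKI.

```java
// Illustrative sketch only; the scoring rule shown here is an assumption.
final class ZScoreSketch {
  /**
   * z-score of an object's density relative to its neighbors' densities.
   * Positive values mean the object is less dense than its neighborhood.
   * Returns 0 when the neighbor densities have no variance.
   */
  static double zScore(double own, double[] neighbors) {
    double mean = 0;
    for (double d : neighbors) {
      mean += d;
    }
    mean /= neighbors.length;
    double var = 0;
    for (double d : neighbors) {
      var += (d - mean) * (d - mean);
    }
    double std = Math.sqrt(var / neighbors.length);
    return std > 0 ? (mean - own) / std : 0.0;
  }

  /**
   * Average the z-scores over several neighborhood sizes.
   * own[i] and neighbors[i] hold the densities for the i-th value of k.
   */
  static double averageScore(double[] own, double[][] neighbors) {
    double sum = 0;
    for (int i = 0; i < own.length; i++) {
      sum += zScore(own[i], neighbors[i]);
    }
    return sum / own.length;
  }

  public static void main(String[] args) {
    double[][] nb = { { 2.0, 4.0 }, { 2.0, 4.0 } };
    System.out.println(averageScore(new double[] { 1.0, 3.0 }, nb));
  }
}
```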