Class ReferenceBasedOutlierDetection

  • All Implemented Interfaces:
    elki.Algorithm, OutlierAlgorithm

    @Title("An Efficient Reference-based Approach to Outlier Detection in Large Datasets")
    @Description("Computes kNN distances approximately, using reference points with various reference point strategies.")
    @Reference(authors="Y. Pei, O. R. Zaiane, Y. Gao",
               title="An Efficient Reference-based Approach to Outlier Detection in Large Datasets",
               booktitle="Proc. 6th IEEE Int. Conf. on Data Mining (ICDM \'06)",
               url="https://doi.org/10.1109/ICDM.2006.17",
               bibkey="DBLP:conf/icdm/PeiZG06")
    public class ReferenceBasedOutlierDetection
    extends java.lang.Object
    implements OutlierAlgorithm
    Reference-Based Outlier Detection algorithm, an algorithm that computes kNN distances approximately, using reference points.

    kNN distances are approximated by the difference between the objects' distances to a reference point. The approximation is only reliable when the distance function satisfies the triangle inequality; the algorithm can still process non-metric distances, but the quality of the result may then suffer.
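
    The role of the triangle inequality can be illustrated with a small, self-contained sketch. It is not ELKI code: the class and method names are hypothetical, and Euclidean distance on plain double arrays is assumed. For any metric, |d(x, r) - d(y, r)| never exceeds d(x, y), so the difference in reference distances is a lower bound on the true distance.

```java
/** Illustrative sketch only; names are hypothetical and not part of ELKI. */
final class ReferenceBoundSketch {
  /** Euclidean distance between two vectors of equal length. */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.;
    for(int i = 0; i < a.length; i++) {
      final double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Lower bound on d(x, y) derived from a single reference point r. */
  static double referenceBound(double[] x, double[] y, double[] r) {
    return Math.abs(euclidean(x, r) - euclidean(y, r));
  }

  public static void main(String[] args) {
    double[] x = { 0., 0. }, y = { 3., 4. }, r = { 0., 10. };
    System.out.println("true distance:   " + euclidean(x, y));
    System.out.println("reference bound: " + referenceBound(x, y, r));
  }
}
```

    With a non-metric distance the bound no longer holds in general, which is why the approximation quality is not guaranteed in that case.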

    Reference:

    Y. Pei, O. R. Zaiane, Y. Gao
    An Efficient Reference-Based Approach to Outlier Detection in Large Datasets
    Proc. 6th IEEE Int. Conf. on Data Mining (ICDM '06)

    Since:
    0.3
    Author:
    Lisa Reichert, Erich Schubert
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  ReferenceBasedOutlierDetection.Par
      Parameterization class.
      • Nested classes/interfaces inherited from interface elki.Algorithm

        elki.Algorithm.Utils
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected elki.distance.NumberVectorDistance<? super elki.data.NumberVector> distance
      Distance function used.
      protected int k
      Holds the number of neighbors to use for density estimation.
      protected elki.utilities.referencepoints.ReferencePointsHeuristic refp
      Stores the reference point strategy.
    • Constructor Summary

      Constructors 
      Constructor Description
      ReferenceBasedOutlierDetection​(int k, elki.distance.NumberVectorDistance<? super elki.data.NumberVector> distance, elki.utilities.referencepoints.ReferencePointsHeuristic refp)
      Constructor with parameters.
    • Method Summary

      Methods 
      Modifier and Type Method Description
      protected double computeDensity​(elki.database.ids.DoubleDBIDList referenceDists, elki.database.ids.DoubleDBIDListIter iter, int index)
      Computes the density of an object.
      protected elki.database.ids.DoubleDBIDList computeDistanceVector​(elki.data.NumberVector refPoint, elki.database.relation.Relation<? extends elki.data.NumberVector> database, elki.database.query.distance.PrimitiveDistanceQuery<? super elki.data.NumberVector> distFunc)
      Computes for each object the distance to one reference point.
      elki.data.type.TypeInformation[] getInputTypeRestriction()  
      OutlierResult run​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
      Run the algorithm on the given relation.
      protected void updateDensities​(elki.database.datastore.WritableDoubleDataStore rbod_score, elki.database.ids.DoubleDBIDList referenceDists)
      Update the density estimates for each object.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • distance

        protected elki.distance.NumberVectorDistance<? super elki.data.NumberVector> distance
        Distance function used.
      • k

        protected int k
        Holds the number of neighbors to use for density estimation.
      • refp

        protected elki.utilities.referencepoints.ReferencePointsHeuristic refp
        Stores the reference point strategy.
    • Constructor Detail

      • ReferenceBasedOutlierDetection

        public ReferenceBasedOutlierDetection​(int k,
                                              elki.distance.NumberVectorDistance<? super elki.data.NumberVector> distance,
                                              elki.utilities.referencepoints.ReferencePointsHeuristic refp)
        Constructor with parameters.
        Parameters:
        k - number of neighbors
        distance - distance function
        refp - Reference points heuristic
    • Method Detail

      • getInputTypeRestriction

        public elki.data.type.TypeInformation[] getInputTypeRestriction()
        Specified by:
        getInputTypeRestriction in interface elki.Algorithm
      • run

        public OutlierResult run​(elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
        Run the algorithm on the given relation.
        Parameters:
        relation - Relation to process
        Returns:
        Outlier result
      • computeDistanceVector

        protected elki.database.ids.DoubleDBIDList computeDistanceVector​(elki.data.NumberVector refPoint,
                                                                         elki.database.relation.Relation<? extends elki.data.NumberVector> database,
                                                                         elki.database.query.distance.PrimitiveDistanceQuery<? super elki.data.NumberVector> distFunc)
        Computes the distance of each object to one reference point. The result is a one-dimensional representation of the data set.
        Parameters:
        refPoint - Reference Point Feature Vector
        database - database to work on
        distFunc - Distance function to use
        Returns:
        list containing, for each database object, the object id and its distance to the reference point
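
        A rough sketch of this step in plain Java, independent of the ELKI data structures: the Entry class, the use of double arrays and the Euclidean distance are all assumptions made for illustration. The sketch also assumes that the list is sorted by distance, so that objects with similar reference distances end up adjacent.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative sketch only; not the ELKI implementation. */
final class DistanceVectorSketch {
  /** Pair of (object index, distance to the reference point). */
  static final class Entry {
    final int id;
    final double dist;

    Entry(int id, double dist) {
      this.id = id;
      this.dist = dist;
    }
  }

  /** Distance of every object to refPoint, sorted by ascending distance. */
  static Entry[] computeDistanceVector(double[] refPoint, double[][] data) {
    Entry[] out = new Entry[data.length];
    for(int i = 0; i < data.length; i++) {
      double sum = 0.;
      for(int d = 0; d < refPoint.length; d++) {
        final double diff = data[i][d] - refPoint[d];
        sum += diff * diff;
      }
      out[i] = new Entry(i, Math.sqrt(sum));
    }
    // Sorting makes "similar reference distance" equivalent to "nearby position".
    Arrays.sort(out, Comparator.comparingDouble((Entry e) -> e.dist));
    return out;
  }
}
```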
      • updateDensities

        protected void updateDensities​(elki.database.datastore.WritableDoubleDataStore rbod_score,
                                       elki.database.ids.DoubleDBIDList referenceDists)
        Update the density estimates for each object.
        Parameters:
        rbod_score - Density storage
        referenceDists - Distances from current reference point
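
        How the per-reference-point densities are combined is not spelled out here. The following sketch assumes the rule from the cited paper: each object keeps the minimum density observed over all reference points, and the final score is 1 minus that density divided by the largest density in the data set. Array-based storage and all names are hypothetical stand-ins for the ELKI data store.

```java
/** Illustrative sketch only; aggregation rule assumed from the cited paper. */
final class DensityUpdateSketch {
  /**
   * Keep, per object, the smallest density seen so far.
   * ids[i] is the object whose density for the current reference point is
   * densities[i]; score must be initialized to Double.POSITIVE_INFINITY.
   */
  static void updateDensities(double[] score, int[] ids, double[] densities) {
    for(int i = 0; i < ids.length; i++) {
      score[ids[i]] = Math.min(score[ids[i]], densities[i]);
    }
  }

  /**
   * Turn minimum densities into outlier scores in [0, 1].
   * Assumes all densities are finite and at least one is positive.
   */
  static double[] toOutlierScores(double[] score) {
    double max = 0.;
    for(double s : score) {
      max = Math.max(max, s);
    }
    double[] out = new double[score.length];
    for(int i = 0; i < score.length; i++) {
      out[i] = 1. - score[i] / max; // low density gives a score close to 1
    }
    return out;
  }
}
```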
      • computeDensity

        protected double computeDensity​(elki.database.ids.DoubleDBIDList referenceDists,
                                        elki.database.ids.DoubleDBIDListIter iter,
                                        int index)
        Computes the density of an object. The density is derived from the distances to the k nearest neighbors, where both the neighbors and their distances are approximated: instead of a regular nearest-neighbor search, the neighbors of an object are the objects that have a similar distance to the reference point, i.e. the k objects that lie closest to it in the reference distance vector.
        Parameters:
        referenceDists - vector of the reference distances
        iter - Iterator to this list (will be reused)
        index - index of the current object
        Returns:
        density for one object and reference point
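
        The neighbor search in the sorted reference distance vector can be sketched as a two-sided scan outward from the object's position. This is a simplified illustration on a plain sorted array, not the ELKI code; the names are hypothetical, and the density is assumed to be the inverse of the mean approximate distance (k divided by the sum), following the cited paper.

```java
/** Illustrative sketch only; density formula assumed from the cited paper. */
final class ReferenceDensitySketch {
  /**
   * Approximate density of the object at position index.
   *
   * @param sortedDists reference distances in ascending order
   * @param index position of the current object in sortedDists
   * @param k number of neighbors to use
   * @return number of neighbors found divided by the sum of their
   *         approximate distances (infinite if all neighbors are ties)
   */
  static double computeDensity(double[] sortedDists, int index, int k) {
    final double x = sortedDists[index];
    int lef = index - 1, rig = index + 1;
    double sum = 0.;
    int found = 0;
    while(found < k && (lef >= 0 || rig < sortedDists.length)) {
      // Gap to the next candidate on each side; infinite if that side is exhausted.
      final double lefD = lef >= 0 ? x - sortedDists[lef] : Double.POSITIVE_INFINITY;
      final double rigD = rig < sortedDists.length ? sortedDists[rig] - x : Double.POSITIVE_INFINITY;
      if(lefD <= rigD) {
        sum += lefD;
        lef--;
      }
      else {
        sum += rigD;
        rig++;
      }
      found++;
    }
    return found / sum;
  }
}
```

        Because the list is sorted, each density costs only O(k) steps once the distance vector for a reference point has been built.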