Package elki.outlier.distance
Class ReferenceBasedOutlierDetection
- java.lang.Object
  - elki.outlier.distance.ReferenceBasedOutlierDetection
- All Implemented Interfaces:
  elki.Algorithm, OutlierAlgorithm
@Title("An Efficient Reference-based Approach to Outlier Detection in Large Datasets")
@Description("Computes kNN distances approximately, using reference points with various reference point strategies.")
@Reference(authors="Y. Pei, O. R. Zaiane, Y. Gao", title="An Efficient Reference-based Approach to Outlier Detection in Large Datasets", booktitle="Proc. 6th IEEE Int. Conf. on Data Mining (ICDM \'06)", url="https://doi.org/10.1109/ICDM.2006.17", bibkey="DBLP:conf/icdm/PeiZG06")
public class ReferenceBasedOutlierDetection extends java.lang.Object implements OutlierAlgorithm
Reference-Based Outlier Detection, an algorithm that computes kNN distances approximately, using reference points.
kNN distances are approximated by the difference in distance from a reference point. For this approximation to be of high quality, the triangle inequality is required; the algorithm can nevertheless also process non-metric distances.
Reference:
Y. Pei, O. R. Zaiane, Y. Gao
An Efficient Reference-Based Approach to Outlier Detection in Large Datasets
Proc. IEEE Int. Conf. on Data Mining (ICDM'06)
- Since:
- 0.3
- Author:
- Lisa Reichert, Erich Schubert
-
-
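The approximation described above rests on the reverse triangle inequality: for a metric d and a reference point r, |d(x, r) - d(y, r)| <= d(x, y), so the difference in reference distances never overestimates the true distance. The following is a minimal sketch in plain Java, independent of ELKI. The class and method names are hypothetical, and Euclidean distance is assumed purely for illustration.

```java
/** Hypothetical illustration of the reference-point bound; not part of ELKI. */
final class ReferenceBoundSketch {
  /** Euclidean distance between two vectors of equal length. */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.;
    for(int i = 0; i < a.length; i++) {
      final double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Approximates d(x, y) by the difference of the distances to a reference point. */
  static double referenceDifference(double[] x, double[] y, double[] ref) {
    return Math.abs(euclidean(x, ref) - euclidean(y, ref));
  }

  public static void main(String[] args) {
    double[] x = { 0, 0 }, y = { 3, 4 }, ref = { 0, 10 };
    double exact = euclidean(x, y);
    double approx = referenceDifference(x, y, ref);
    // For a metric distance, approx <= exact always holds.
    System.out.println("exact=" + exact + " approx=" + approx);
  }
}
```

With a non-metric distance the bound is no longer guaranteed, which is why the approximation quality may degrade in that case.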
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
static class    ReferenceBasedOutlierDetection.Par    Parameterization class.
-
Field Summary
Fields
Modifier and Type    Field    Description
protected elki.distance.NumberVectorDistance<? super elki.data.NumberVector>    distance    Distance function used.
protected int    k    Holds the number of neighbors to use for density estimation.
protected elki.utilities.referencepoints.ReferencePointsHeuristic    refp    Stores the reference point strategy.
-
Constructor Summary
Constructors
Constructor    Description
ReferenceBasedOutlierDetection(int k, elki.distance.NumberVectorDistance<? super elki.data.NumberVector> distance, elki.utilities.referencepoints.ReferencePointsHeuristic refp)    Constructor with parameters.
-
Method Summary
All Methods    Instance Methods    Concrete Methods
Modifier and Type    Method    Description
protected double    computeDensity(elki.database.ids.DoubleDBIDList referenceDists, elki.database.ids.DoubleDBIDListIter iter, int index)    Computes the density of an object.
protected elki.database.ids.DoubleDBIDList    computeDistanceVector(elki.data.NumberVector refPoint, elki.database.relation.Relation<? extends elki.data.NumberVector> database, elki.database.query.distance.PrimitiveDistanceQuery<? super elki.data.NumberVector> distFunc)    Computes for each object the distance to one reference point.
elki.data.type.TypeInformation[]    getInputTypeRestriction()
OutlierResult    run(elki.database.relation.Relation<? extends elki.data.NumberVector> relation)    Run the algorithm on the given relation.
protected void    updateDensities(elki.database.datastore.WritableDoubleDataStore rbod_score, elki.database.ids.DoubleDBIDList referenceDists)    Update the density estimates for each object.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.outlier.OutlierAlgorithm
autorun
-
-
-
-
Field Detail
-
distance
protected elki.distance.NumberVectorDistance<? super elki.data.NumberVector> distance
Distance function used.
-
k
protected int k
Holds the number of neighbors to use for density estimation.
-
refp
protected elki.utilities.referencepoints.ReferencePointsHeuristic refp
Stores the reference point strategy.
-
-
Constructor Detail
-
ReferenceBasedOutlierDetection
public ReferenceBasedOutlierDetection(int k, elki.distance.NumberVectorDistance<? super elki.data.NumberVector> distance, elki.utilities.referencepoints.ReferencePointsHeuristic refp)
Constructor with parameters.
- Parameters:
k - number of neighbors
distance - distance function
refp - reference points heuristic
-
-
Method Detail
-
getInputTypeRestriction
public elki.data.type.TypeInformation[] getInputTypeRestriction()
- Specified by:
getInputTypeRestriction in interface elki.Algorithm
-
run
public OutlierResult run(elki.database.relation.Relation<? extends elki.data.NumberVector> relation)
Run the algorithm on the given relation.
- Parameters:
relation - Relation to process
- Returns:
- Outlier result
-
computeDistanceVector
protected elki.database.ids.DoubleDBIDList computeDistanceVector(elki.data.NumberVector refPoint, elki.database.relation.Relation<? extends elki.data.NumberVector> database, elki.database.query.distance.PrimitiveDistanceQuery<? super elki.data.NumberVector> distFunc)
Computes for each object the distance to one reference point (a one-dimensional representation of the data set).
- Parameters:
refPoint - Reference point feature vector
database - Database to work on
distFunc - Distance function to use
- Returns:
- array containing the distance to one reference point for each database object and the object id
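As an illustration of this one-dimensional representation, the sketch below computes the distance of every object to a single reference point and keeps the object index alongside each distance. It uses plain arrays rather than ELKI's DoubleDBIDList. The names DistanceVectorSketch, Entry, and distanceVector are hypothetical, Euclidean distance is assumed, and sorting the result by distance is an assumption made here so that objects with similar reference distances end up adjacent.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical illustration of a reference distance vector; not part of ELKI. */
final class DistanceVectorSketch {
  /** Pair of (distance to the reference point, object index). */
  static final class Entry {
    final double dist;
    final int id;

    Entry(double dist, int id) {
      this.dist = dist;
      this.id = id;
    }
  }

  /** Euclidean distance between two vectors of equal length. */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.;
    for(int i = 0; i < a.length; i++) {
      final double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Distances of all objects to one reference point, sorted ascending (assumption). */
  static Entry[] distanceVector(double[] refPoint, double[][] data) {
    Entry[] result = new Entry[data.length];
    for(int i = 0; i < data.length; i++) {
      result[i] = new Entry(euclidean(data[i], refPoint), i);
    }
    Arrays.sort(result, Comparator.comparingDouble((Entry e) -> e.dist));
    return result;
  }

  public static void main(String[] args) {
    double[][] data = { { 5, 0 }, { 1, 0 }, { 3, 0 } };
    for(Entry e : distanceVector(new double[] { 0, 0 }, data)) {
      System.out.println("id=" + e.id + " dist=" + e.dist);
    }
  }
}
```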
-
updateDensities
protected void updateDensities(elki.database.datastore.WritableDoubleDataStore rbod_score, elki.database.ids.DoubleDBIDList referenceDists)
Update the density estimates for each object.
- Parameters:
rbod_score - Density storage
referenceDists - Distances from current reference point
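The documentation does not state how the stored estimates are combined across reference points. One plausible reading, following the cited paper, is that each object keeps the minimum density seen over all reference points. The sketch below shows that update on plain arrays; the class and method names are hypothetical, and the minimum rule is an assumption rather than a statement about ELKI's implementation.

```java
import java.util.Arrays;

/** Hypothetical illustration of a per-object minimum update; not part of ELKI. */
final class MinDensityUpdateSketch {
  /** Keeps, for each object, the smaller of the stored and the new density (assumed rule). */
  static void updateMin(double[] score, double[] densities) {
    for(int i = 0; i < score.length; i++) {
      if(densities[i] < score[i]) {
        score[i] = densities[i];
      }
    }
  }

  public static void main(String[] args) {
    double[] score = new double[3];
    // Start from +infinity so the first reference point always wins.
    Arrays.fill(score, Double.POSITIVE_INFINITY);
    updateMin(score, new double[] { 0.5, 2.0, 1.0 }); // first reference point
    updateMin(score, new double[] { 0.8, 0.4, 1.5 }); // second reference point
    System.out.println(Arrays.toString(score)); // prints [0.5, 0.4, 1.0]
  }
}
```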
-
computeDensity
protected double computeDensity(elki.database.ids.DoubleDBIDList referenceDists, elki.database.ids.DoubleDBIDListIter iter, int index)
Computes the density of an object. The density of an object is based on the distances to its k nearest neighbors. Neighbors and distances are computed approximately: instead of a regular nearest-neighbor search, the nearest neighbors of an object are those objects that have a similar distance to a reference point, i.e., the objects that lie close to it in the reference distance vector.
- Parameters:
referenceDists - vector of the reference distances
iter - Iterator to this list (will be reused)
index - index of the current object
- Returns:
- density for one object and reference point
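The neighbor search in the reference distance vector can be sketched as a two-pointer scan over a sorted array: starting at the current object, step left or right to whichever neighbor has the smaller difference in reference distance, k times. The sketch assumes the density is the inverse of the mean of these k differences, as in the cited paper. It works on a plain sorted double array instead of a DoubleDBIDList, and the names ReferenceDensitySketch and density are hypothetical.

```java
/** Hypothetical illustration of the approximate kNN density; not part of ELKI. */
final class ReferenceDensitySketch {
  /**
   * @param sortedDists reference distances, sorted ascending
   * @param index position of the current object in sortedDists
   * @param k number of neighbors, with 1 <= k < sortedDists.length
   * @return k divided by the sum of the k smallest distance differences
   *         (infinite if all k neighbors have the same reference distance)
   */
  static double density(double[] sortedDists, int index, int k) {
    final int size = sortedDists.length;
    if(k < 1 || k >= size) {
      throw new IllegalArgumentException("need 1 <= k < number of objects");
    }
    final double x = sortedDists[index];
    int left = index - 1, right = index + 1;
    double sum = 0.;
    for(int i = 0; i < k; i++) {
      // Difference to the next candidate on each side; infinite if exhausted.
      final double dl = left >= 0 ? x - sortedDists[left] : Double.POSITIVE_INFINITY;
      final double dr = right < size ? sortedDists[right] - x : Double.POSITIVE_INFINITY;
      if(dl < dr) {
        sum += dl;
        left--;
      }
      else {
        sum += dr;
        right++;
      }
    }
    return k / sum;
  }

  public static void main(String[] args) {
    double[] dists = { 1, 2, 4, 7, 11 };
    // Object at index 2 (distance 4): nearest differences are 2 and 3.
    System.out.println(density(dists, 2, 2)); // prints 0.4
  }
}
```

Because the array is sorted, each call touches only k entries around the current position, which is what makes the approach cheaper than an exact kNN search.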
-
-