Class KDEOS<O>
- java.lang.Object
-
- elki.outlier.lof.KDEOS<O>
-
- Type Parameters:
O - Object type
- All Implemented Interfaces:
elki.Algorithm, OutlierAlgorithm
@Title("KDEOS: Kernel Density Estimator Outlier Score")
@Reference(authors="Erich Schubert, Arthur Zimek, Hans-Peter Kriegel", title="Generalized Outlier Detection with Flexible Kernel Density Estimates", booktitle="Proc. 14th SIAM International Conference on Data Mining (SDM 2014)", url="https://doi.org/10.1137/1.9781611973440.63", bibkey="DBLP:conf/sdm/SchubertZK14")
public class KDEOS<O> extends java.lang.Object implements OutlierAlgorithm
Generalized Outlier Detection with Flexible Kernel Density Estimates.
This is an outlier detection method inspired by LOF, but using kernel density estimation (KDE) from statistics. Unfortunately, for higher-dimensional data, kernel density estimation itself becomes difficult. In that case, the kdeos.idim parameter can be useful: it allows either disabling the dimensionality adjustment completely (0) or setting it to a lower dimensionality than that of the data representation. This may sound like a hack at first, but real data often has a lower intrinsic dimensionality and is merely embedded in a higher-dimensional representation. Adjusting the kernel to the dimensionality of the representation seems to yield worse results than using the lower, intrinsic dimensionality.
If your data set has many duplicates, the kdeos.kernel.minbw parameter sets a minimum kernel bandwidth, which may improve results in these cases, as it prevents kernels from degenerating to single points.
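The effect of the two parameters described above can be illustrated with a minimal, self-contained sketch. It is not the ELKI implementation: the class `KdeosKernelSketch`, the method `contribution`, and the choice of a Gaussian kernel are assumptions made for illustration only. It follows the class description in treating an intrinsic dimensionality of 0 as "no dimensionality adjustment".

```java
// Hypothetical illustration, not ELKI code.
class KdeosKernelSketch {
  /**
   * Density contribution of one neighbor at distance dist.
   * The bandwidth is clamped to minBandwidth so that duplicates
   * (bandwidth 0) do not collapse the kernel to a single point.
   */
  static double contribution(double dist, double bandwidth, double minBandwidth, int idim) {
    double h = Math.max(bandwidth, minBandwidth); // minimum bandwidth clamp
    double u = dist / h;
    // Gaussian kernel (assumed here; the kernel is configurable in KDEOS)
    double kernel = Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
    // idim == 0: skip the dimensionality adjustment entirely
    return idim > 0 ? kernel / Math.pow(h, idim) : kernel;
  }

  public static void main(String[] args) {
    // A duplicate point: distance 0 and bandwidth 0, rescued by the clamp.
    System.out.println(contribution(0.0, 0.0, 0.5, 2));
    System.out.println(contribution(0.0, 0.0, 0.5, 0));
  }
}
```

Without the clamp, the first call would divide zero by zero and yield NaN, which is the degenerate case the minimum bandwidth is meant to avoid.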
Reference:
Erich Schubert, Arthur Zimek, Hans-Peter Kriegel
Generalized Outlier Detection with Flexible Kernel Density Estimates
Proc. 14th SIAM International Conference on Data Mining (SDM 2014)
- Since:
- 0.7.0
- Author:
- Erich Schubert
-
-
Field Summary
Modifier and Type | Field | Description
private static double | CUTOFF | Significance cutoff when computing kernel density.
protected elki.distance.Distance<? super O> | distance | Distance function used.
protected int | idim | Intrinsic dimensionality.
protected elki.math.statistics.kernelfunctions.KernelDensityFunction | kernel | Kernel function to use for density estimation.
protected int | kmax | Maximum number of neighbors to use.
protected int | kmin | Minimum number of neighbors to use.
private static elki.logging.Logging | LOG | Class logger.
protected double | minBandwidth | Kernel minimum bandwidth.
protected double | scale | Kernel scaling parameter.
-
Method Summary
Modifier and Type | Method | Description
protected void | computeOutlierScores(elki.database.query.knn.KNNSearcher<elki.database.ids.DBIDRef> knnq, elki.database.ids.DBIDs ids, elki.database.datastore.WritableDataStore<double[]> densities, elki.database.datastore.WritableDoubleDataStore kdeos, elki.math.DoubleMinMax minmax) | Compute the final KDEOS scores.
private int | dimensionality(elki.database.relation.Relation<O> rel) | Ugly hack to allow using this implementation without having a well-defined dimensionality.
protected void | estimateDensities(elki.database.relation.Relation<O> rel, elki.database.query.knn.KNNSearcher<elki.database.ids.DBIDRef> knnq, elki.database.ids.DBIDs ids, elki.database.datastore.WritableDataStore<double[]> densities) | Perform the kernel density estimation step.
elki.data.type.TypeInformation[] | getInputTypeRestriction() |
OutlierResult | run(elki.database.relation.Relation<O> rel) | Run the KDEOS outlier detection algorithm.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface elki.outlier.OutlierAlgorithm
autorun
-
Field Detail
-
LOG
private static final elki.logging.Logging LOG
Class logger.
-
CUTOFF
private static final double CUTOFF
Significance cutoff when computing kernel density.
- See Also:
- Constant Field Values
-
distance
protected elki.distance.Distance<? super O> distance
Distance function used.
-
kernel
protected elki.math.statistics.kernelfunctions.KernelDensityFunction kernel
Kernel function to use for density estimation.
-
kmin
protected int kmin
Minimum number of neighbors to use.
-
kmax
protected int kmax
Maximum number of neighbors to use.
-
scale
protected double scale
Kernel scaling parameter.
-
minBandwidth
protected double minBandwidth
Kernel minimum bandwidth.
-
idim
protected int idim
Intrinsic dimensionality.
-
-
Constructor Detail
-
KDEOS
public KDEOS(elki.distance.Distance<? super O> distance, int kmin, int kmax, elki.math.statistics.kernelfunctions.KernelDensityFunction kernel, double minBandwidth, double scale, int idim)
Constructor.
- Parameters:
distance - Distance function
kmin - Minimum number of neighbors
kmax - Maximum number of neighbors
kernel - Kernel function
minBandwidth - Minimum bandwidth
scale - Kernel scaling parameter
idim - Intrinsic dimensionality (use 0 to use real dimensionality)
-
-
Method Detail
-
getInputTypeRestriction
public elki.data.type.TypeInformation[] getInputTypeRestriction()
- Specified by:
getInputTypeRestriction in interface elki.Algorithm
-
run
public OutlierResult run(elki.database.relation.Relation<O> rel)
Run the KDEOS outlier detection algorithm.
- Parameters:
rel - Relation to process
- Returns:
- Outlier detection result
-
estimateDensities
protected void estimateDensities(elki.database.relation.Relation<O> rel, elki.database.query.knn.KNNSearcher<elki.database.ids.DBIDRef> knnq, elki.database.ids.DBIDs ids, elki.database.datastore.WritableDataStore<double[]> densities)
Perform the kernel density estimation step.
- Parameters:
rel - Relation to query
knnq - kNN query
ids - IDs to process
densities - Density storage
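Since the densities are stored as a double[] per object and the class is configured with a neighbor range (kmin to kmax), one plausible reading is that one density estimate is kept per neighborhood size. The sketch below illustrates that idea under stated assumptions; it is not the ELKI implementation. The class `KdeosDensitySketch`, the Gaussian kernel, and the bandwidth rule (k-distance times `scale`, clamped to `minBandwidth`) are all hypothetical choices for illustration.

```java
// Hypothetical illustration, not ELKI code.
class KdeosDensitySketch {
  /**
   * Estimate one density per k in [kmin, kmax] for a single query point.
   * sortedDists holds the distances to its nearest neighbors in ascending
   * order and must contain at least kmax entries.
   */
  static double[] estimateDensities(double[] sortedDists, int kmin, int kmax,
      double scale, double minBandwidth, int idim) {
    double[] dens = new double[kmax - kmin + 1];
    for (int k = kmin; k <= kmax; k++) {
      // Assumed bandwidth rule: scaled k-distance, never below minBandwidth
      double h = Math.max(scale * sortedDists[k - 1], minBandwidth);
      double sum = 0;
      for (int i = 0; i < k; i++) {
        double u = sortedDists[i] / h;
        sum += Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI); // Gaussian kernel
      }
      // Average over the k neighbors, adjusted by h^idim (idim = 0: no adjustment)
      dens[k - kmin] = sum / (k * Math.pow(h, idim));
    }
    return dens;
  }

  public static void main(String[] args) {
    double[] dists = { 0.0, 1.0, 2.0 }; // query point itself at distance 0
    double[] d = estimateDensities(dists, 2, 3, 1.0, 0.0, 1);
    System.out.println(java.util.Arrays.toString(d));
  }
}
```

Keeping one value per k lets a later step compare densities at matching neighborhood sizes instead of committing to a single k.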
-
dimensionality
private int dimensionality(elki.database.relation.Relation<O> rel)
Ugly hack to allow using this implementation without having a well-defined dimensionality.
- Parameters:
rel - Data relation
- Returns:
- Dimensionality
-
computeOutlierScores
protected void computeOutlierScores(elki.database.query.knn.KNNSearcher<elki.database.ids.DBIDRef> knnq, elki.database.ids.DBIDs ids, elki.database.datastore.WritableDataStore<double[]> densities, elki.database.datastore.WritableDoubleDataStore kdeos, elki.math.DoubleMinMax minmax)
Compute the final KDEOS scores.
- Parameters:
knnq - kNN query
ids - IDs to process
densities - Density estimates
kdeos - Score outputs
minmax - Minimum and maximum scores
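As a rough illustration of how per-k density estimates might be turned into a score, the sketch below compares a point's density with the densities of its neighbors via a z-score and averages over all k. This is a simplified sketch under assumptions, not the ELKI implementation: the class `KdeosScoreSketch` is hypothetical, a single fixed neighbor set is used for every k, and the mapping of the averaged value to the final output score is not covered.

```java
// Hypothetical illustration, not ELKI code.
class KdeosScoreSketch {
  /**
   * own[j] is the point's density for the j-th neighborhood size;
   * neighbors[i][j] is the density of neighbor i for the same size.
   * Positive results mean the point is less dense than its neighbors.
   */
  static double averageZScore(double[] own, double[][] neighbors) {
    double total = 0;
    for (int j = 0; j < own.length; j++) {
      double mean = 0;
      for (double[] n : neighbors) {
        mean += n[j];
      }
      mean /= neighbors.length;
      double var = 0;
      for (double[] n : neighbors) {
        double diff = n[j] - mean;
        var += diff * diff;
      }
      double std = Math.sqrt(var / neighbors.length);
      // Guard against zero variance (e.g. all neighbors identical)
      total += std > 0 ? (mean - own[j]) / std : 0;
    }
    return total / own.length;
  }

  public static void main(String[] args) {
    double[] own = { 1.0 };
    double[][] neighbors = { { 2.0 }, { 4.0 } };
    System.out.println(averageZScore(own, neighbors));
  }
}
```

Standardizing against the local neighborhood makes the value comparable across regions of different density, which is the same motivation as in LOF.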
-
-