Class PerturbationFilter<V extends elki.data.NumberVector>

  • All Implemented Interfaces:
    elki.datasource.filter.ObjectFilter

    @Title("Data Perturbation for Outlier Detection Ensembles")
    @Description("A filter to perturb a dataset on read by an additive noise component, implemented for use in an outlier ensemble (this reference).")
    @Reference(authors="A. Zimek, R. J. G. B. Campello, J. Sander",
               title="Data Perturbation for Outlier Detection Ensembles",
               booktitle="Proc. 26th International Conference on Scientific and Statistical Database Management (SSDBM), Aalborg, Denmark, 2014",
               url="https://doi.org/10.1145/2618243.2618257",
               bibkey="DBLP:conf/ssdbm/ZimekCS14")
    public class PerturbationFilter<V extends elki.data.NumberVector>
    extends AbstractVectorConversionFilter<V,V>
    A filter to perturb the values by adding micro-noise.

    The noise is generated attribute-wise, either from a Gaussian distribution with mean 0 and a specified standard deviation, or from a uniform distribution with a specified range. The standard deviation or the range can be scaled, attribute-wise, to a given percentage of the attribute's original standard deviation in the data (assuming a Gaussian distribution there), or to a percentage of the attribute's extension (maximumValue - minimumValue).
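
    The attribute-wise scaling described above can be illustrated with a minimal, ELKI-independent sketch. The class name PerturbationSketch, the Noise enum, and the perturb method are hypothetical and are not part of the ELKI API. The sketch assumes that the percentage is given as a fraction (0.01 for 1%) and that the uniform noise is centered at 0 with a total width equal to the scaled reference; the actual filter may use different conventions.

```java
import java.util.Random;

// Hypothetical illustration only; not the ELKI implementation.
final class PerturbationSketch {
  /** Kind of additive noise (illustrative names). */
  enum Noise { GAUSSIAN, UNIFORM }

  /**
   * Perturb one vector attribute-wise.
   *
   * @param values original attribute values
   * @param reference per-attribute scaling reference, e.g. the attribute's
   *        standard deviation or its extension (max - min)
   * @param percentage fraction of the reference used as noise scale (assumption)
   * @param noise noise distribution to draw from
   * @param rnd one random generator per attribute
   * @return a new array holding the perturbed values
   */
  static double[] perturb(double[] values, double[] reference, double percentage, Noise noise, Random[] rnd) {
    double[] out = new double[values.length];
    for (int d = 0; d < values.length; d++) {
      double scale = percentage * reference[d];
      double eps;
      if (noise == Noise.GAUSSIAN) {
        // Gaussian noise with mean 0 and standard deviation "scale".
        eps = rnd[d].nextGaussian() * scale;
      } else {
        // Uniform noise in [-scale/2, scale/2), i.e. total width "scale".
        eps = (rnd[d].nextDouble() - 0.5) * scale;
      }
      out[d] = values[d] + eps;
    }
    return out;
  }

  public static void main(String[] args) {
    double[] values = { 5.0, -3.0 };
    double[] extension = { 2.0, 10.0 }; // assumed max - min per attribute
    Random[] rnd = { new Random(1L), new Random(2L) };
    double[] noisy = perturb(values, extension, 0.1, Noise.UNIFORM, rnd);
    System.out.println(noisy[0] + ", " + noisy[1]);
  }
}
```

    With a percentage of 0, the output equals the input, which is a quick sanity check for any such scaling scheme.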

    This filter is potentially useful in many settings, but was implemented for the following publication:

    Reference:

    A. Zimek, R. J. G. B. Campello, J. Sander
    Data Perturbation for Outlier Detection Ensembles
    Proc. 26th Int. Conf. on Scientific and Statistical Database Management (SSDBM 2014)

    Since:
    0.7.0
    Author:
    Arthur Zimek
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        Class logger
      • RANDOM

        private final java.util.Random RANDOM
        Random object to generate the attribute-wise seeds for the noise.
      • percentage

        private double percentage
        Percentage used to scale the random noise relative to the spread of the corresponding attribute in the data (its standard deviation or its extension, depending on the scaling reference).
      • mvs

        private elki.math.MeanVarianceMinMax[] mvs
        Temporary storage used during initialization.
      • scalingreferencevalues

        private double[] scalingreferencevalues
        Stores the scaling reference in each dimension.
      • randomPerAttribute

        private java.util.Random[] randomPerAttribute
        The random objects to generate noise distributions independently for each attribute.
      • maxima

        private double[] maxima
        Stores the maximum in each dimension.
      • minima

        private double[] minima
        Stores the minimum in each dimension.
      • dimensionality

        private int dimensionality
        Stores the dimensionality from the preprocessing.
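
    The RANDOM and randomPerAttribute fields suggest a common pattern: one master generator produces a seed for each attribute, so that the noise is drawn independently per attribute while remaining reproducible for a fixed seed. The following is a minimal sketch of that pattern using only java.util.Random. The class SeedSketch and the method perAttribute are hypothetical names, and the sketch is not taken from the ELKI source.

```java
import java.util.Random;

// Hypothetical illustration only; not the ELKI implementation.
final class SeedSketch {
  /**
   * Create one independent generator per attribute.
   *
   * @param seed master seed, or null to seed from the system default
   * @param dim number of attributes
   * @return array of per-attribute generators
   */
  static Random[] perAttribute(Long seed, int dim) {
    Random master = (seed == null) ? new Random() : new Random(seed);
    Random[] result = new Random[dim];
    for (int d = 0; d < dim; d++) {
      // Each attribute gets its own generator, seeded from the master.
      result[d] = new Random(master.nextLong());
    }
    return result;
  }

  public static void main(String[] args) {
    Random[] a = perAttribute(42L, 3);
    Random[] b = perAttribute(42L, 3);
    // The same master seed yields the same per-attribute streams.
    System.out.println(a[0].nextDouble() == b[0].nextDouble());
  }
}
```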
    • Constructor Detail

      • PerturbationFilter

        public PerturbationFilter​(java.lang.Long seed,
                                  double percentage,
                                  PerturbationFilter.ScalingReference scalingreference,
                                  double[] minima,
                                  double[] maxima,
                                  PerturbationFilter.NoiseDistribution noisedistribution)
        Constructor.
        Parameters:
        seed - Seed value, may be null for a random seed.
        percentage - Relative amount of jitter to add
        scalingreference - Scaling reference
        minima - Preset minimum values. May be null.
        maxima - Preset maximum values. May be null.
        noisedistribution - Nature of the noise distribution.
    • Method Detail

      • prepareStart

        protected boolean prepareStart​(elki.data.type.SimpleTypeInformation<V> in)
        Description copied from class: AbstractConversionFilter
        Return "true" when the normalization needs initialization (two-pass filtering!).
        Overrides:
        prepareStart in class AbstractConversionFilter<V extends elki.data.NumberVector,​V extends elki.data.NumberVector>
        Parameters:
        in - Input type information
        Returns:
        true if an initialization pass over the data is needed, false otherwise
      • prepareProcessInstance

        protected void prepareProcessInstance​(V featureVector)
        Description copied from class: AbstractConversionFilter
        Process a single object during initialization.
        Overrides:
        prepareProcessInstance in class AbstractConversionFilter<V extends elki.data.NumberVector,​V extends elki.data.NumberVector>
        Parameters:
        featureVector - Object to process
      • getInputTypeRestriction

        protected elki.data.type.SimpleTypeInformation<? super V> getInputTypeRestriction()
        Description copied from class: AbstractConversionFilter
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in class AbstractConversionFilter<V extends elki.data.NumberVector,​V extends elki.data.NumberVector>
        Returns:
        Type restriction
      • filterSingleObject

        protected V filterSingleObject​(V featureVector)
        Description copied from class: AbstractConversionFilter
        Normalize a single instance. You can implement this by throwing an UnsupportedOperationException if you override both public "normalize" functions.
        Specified by:
        filterSingleObject in class AbstractConversionFilter<V extends elki.data.NumberVector,​V extends elki.data.NumberVector>
        Parameters:
        featureVector - Database object to normalize
        Returns:
        Normalized database object
      • convertedType

        protected elki.data.type.SimpleTypeInformation<? super V> convertedType​(elki.data.type.SimpleTypeInformation<V> in)
        Description copied from class: AbstractConversionFilter
        Get the output type from the input type after conversion.
        Specified by:
        convertedType in class AbstractConversionFilter<V extends elki.data.NumberVector,​V extends elki.data.NumberVector>
        Parameters:
        in - input type restriction
        Returns:
        output type restriction
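
    The methods above follow a two-pass scheme: an initialization pass sees every object once, and a second pass converts each object. For a filter like this one, the first pass would plausibly collect per-attribute statistics (minimum, maximum, mean, variance) from which the scaling reference is derived. The sketch below shows such a first pass with Welford's online update. The names TwoPassSketch, Stats, and firstPass are hypothetical, the use of the sample standard deviation is an assumption, and ELKI's own MeanVarianceMinMax class may behave differently.

```java
// Hypothetical illustration only; not the ELKI implementation.
final class TwoPassSketch {
  /** Running statistics for one attribute. */
  static final class Stats {
    long n;
    double mean, m2;
    double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;

    /** Add one observation (Welford's online update). */
    void put(double x) {
      n++;
      double delta = x - mean;
      mean += delta / n;
      m2 += delta * (x - mean);
      min = Math.min(min, x);
      max = Math.max(max, x);
    }

    /** Sample standard deviation (assumption: n - 1 in the denominator). */
    double stddev() {
      return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0;
    }

    /** Extension of the attribute: maximum minus minimum. */
    double extension() {
      return max - min;
    }
  }

  /** First pass: feed every vector into the per-attribute statistics. */
  static Stats[] firstPass(double[][] data, int dim) {
    Stats[] stats = new Stats[dim];
    for (int d = 0; d < dim; d++) {
      stats[d] = new Stats();
    }
    for (double[] vec : data) {
      for (int d = 0; d < dim; d++) {
        stats[d].put(vec[d]);
      }
    }
    return stats;
  }

  public static void main(String[] args) {
    double[][] data = { { 1, 10 }, { 2, 20 }, { 3, 30 } };
    Stats[] stats = firstPass(data, 2);
    System.out.println(stats[0].stddev() + " " + stats[0].extension());
  }
}
```

    Once the first pass has finished, either stddev() or extension() can serve as the per-attribute scaling reference for the second pass.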