Class SOS<O>

  • Type Parameters:
    O - Object type
    All Implemented Interfaces:
    elki.Algorithm, OutlierAlgorithm

    @Title("SOS: Stochastic Outlier Selection")
    @Reference(authors="J. Janssens, F. Husz\u00e1r, E. Postma, J. van den Herik",
               title="Stochastic Outlier Selection",
               booktitle="TiCC TR 2012\u2013001",
               url="https://www.tilburguniversity.edu/upload/b7bac5b2-9b00-402a-9261-7849aa019fbb_sostr.pdf",
               bibkey="tr/tilburg/JanssensHPv12")
    public class SOS<O>
    extends java.lang.Object
    implements OutlierAlgorithm
    Stochastic Outlier Selection.

    Reference:

    J. Janssens, F. Huszár, E. Postma, J. van den Herik
    Stochastic Outlier Selection
    TiCC TR 2012–001

    Since:
    0.7.5
    Author:
    Erich Schubert
    • Nested Class Summary

      • Nested classes/interfaces inherited from interface elki.Algorithm

        elki.Algorithm.Utils
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected elki.distance.Distance<? super O> distance
      Distance function used.
      private static elki.logging.Logging LOG
      Class logger.
      protected double perplexity
      Perplexity
      protected static double PERPLEXITY_ERROR
      Threshold for optimizing perplexity.
      protected static int PERPLEXITY_MAXITER
      Maximum number of iterations when optimizing perplexity.
    • Constructor Summary

      Constructors 
      Constructor Description
      SOS​(elki.distance.Distance<? super O> distance, double h)
      Constructor.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      protected static double computeH​(elki.database.ids.DBIDRef ignore, elki.database.ids.DoubleDBIDListIter it, double[] p, double mbeta)
      Compute H (observed perplexity) for row i, and the row pij_i.
      static double computePi​(elki.database.ids.DBIDRef ignore, elki.database.ids.DoubleDBIDListIter it, double[] p, double perplexity, double logPerp)
      Compute row p[i], using binary search on the kernel bandwidth sigma to obtain the desired perplexity.
      protected static double estimateInitialBeta​(elki.database.ids.DBIDRef ignore, elki.database.ids.DoubleDBIDListIter it, double perplexity)
      Estimate beta from the distances in a row.
      elki.data.type.TypeInformation[] getInputTypeRestriction()  
      static void nominateNeighbors​(elki.database.ids.DBIDIter ignore, elki.database.ids.DBIDArrayIter di, double[] p, double norm, elki.database.datastore.WritableDoubleDataStore scores)
      Vote for neighbors not being outliers.
      OutlierResult run​(elki.database.relation.Relation<O> relation)
      Run the algorithm.
      static double sumOfProbabilities​(elki.database.ids.DBIDIter ignore, elki.database.ids.DBIDArrayIter di, double[] p)
      Compute the sum of probabilities, stop at first 0, ignore query object.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        Class logger.
      • PERPLEXITY_ERROR

        protected static final double PERPLEXITY_ERROR
        Threshold for optimizing perplexity.
        See Also:
        Constant Field Values
      • PERPLEXITY_MAXITER

        protected static final int PERPLEXITY_MAXITER
        Maximum number of iterations when optimizing perplexity.
        See Also:
        Constant Field Values
      • distance

        protected elki.distance.Distance<? super O> distance
        Distance function used.
      • perplexity

        protected double perplexity
        Perplexity
    • Constructor Detail

      • SOS

        public SOS​(elki.distance.Distance<? super O> distance,
                   double h)
        Constructor.
        Parameters:
        distance - Distance function
        h - Perplexity
    • Method Detail

      • getInputTypeRestriction

        public elki.data.type.TypeInformation[] getInputTypeRestriction()
        Specified by:
        getInputTypeRestriction in interface elki.Algorithm
      • run

        public OutlierResult run​(elki.database.relation.Relation<O> relation)
        Run the algorithm.
        Parameters:
        relation - data relation
        Returns:
        outlier detection result
      • sumOfProbabilities

        public static double sumOfProbabilities​(elki.database.ids.DBIDIter ignore,
                                                elki.database.ids.DBIDArrayIter di,
                                                double[] p)
        Compute the sum of probabilities, stop at first 0, ignore query object. Note: while SOS ensures the 'ignore' object is not added in the first place, KNNSOS cannot do so efficiently (yet).
        Parameters:
        ignore - Object to ignore.
        di - Object list
        p - Probabilities
        Returns:
        Sum.
      • nominateNeighbors

        public static void nominateNeighbors​(elki.database.ids.DBIDIter ignore,
                                             elki.database.ids.DBIDArrayIter di,
                                             double[] p,
                                             double norm,
                                             elki.database.datastore.WritableDoubleDataStore scores)
        Vote for neighbors not being outliers. The key method of SOS.
        Parameters:
        ignore - Object to ignore
        di - Neighbor object IDs.
        p - Probabilities
        norm - Normalization factor (1/sum)
        scores - Output score storage
      • computePi

        public static double computePi​(elki.database.ids.DBIDRef ignore,
                                       elki.database.ids.DoubleDBIDListIter it,
                                       double[] p,
                                       double perplexity,
                                       double logPerp)
        Compute row p[i], using binary search on the kernel bandwidth sigma to obtain the desired perplexity.
        Parameters:
        ignore - Object to skip
        it - Distances iterator
        p - Output row
        perplexity - Desired perplexity
        logPerp - Log of desired perplexity
        Returns:
        Beta
      • estimateInitialBeta

        @Reference(authors="Erich Schubert, Michael Gertz",
                   title="Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier Detection: A Remedy Against the Curse of Dimensionality?",
                   booktitle="Proc. Int. Conf. Similarity Search and Applications, SISAP 2017",
                   url="https://doi.org/10.1007/978-3-319-68474-1_13",
                   bibkey="DBLP:conf/sisap/SchubertG17")
        protected static double estimateInitialBeta​(elki.database.ids.DBIDRef ignore,
                                                    elki.database.ids.DoubleDBIDListIter it,
                                                    double perplexity)
        Estimate beta from the distances in a row.

        This lacks a thorough mathematical argument, but is a handcrafted heuristic to avoid numerical problems. The average distance is usually too large, so we scale the average distance by 2*N/perplexity. Then estimate beta as 1/x.

        Parameters:
        ignore - Object to skip
        it - Distance iterator
        perplexity - Desired perplexity
        Returns:
        Estimated beta.
      • computeH

        protected static double computeH​(elki.database.ids.DBIDRef ignore,
                                         elki.database.ids.DoubleDBIDListIter it,
                                         double[] p,
                                         double mbeta)
        Compute H (observed perplexity) for row i, and the row pij_i.
        Parameters:
        ignore - Object to skip
        it - Distances list
        p - Output probabilities
        mbeta - -1. / (2 * sigma * sigma)
        Returns:
        Observed perplexity