Class APRIORI

  • All Implemented Interfaces:
    elki.Algorithm

    @Title("APRIORI: Algorithm for Mining Association Rules")
    @Description("Searches for frequent itemsets")
    @Reference(authors="R. Agrawal, R. Srikant",
               title="Fast Algorithms for Mining Association Rules",
               booktitle="Proc. 20th Int. Conf. on Very Large Data Bases (VLDB \'94)",
               url="http://www.vldb.org/conf/1994/P487.PDF",
               bibkey="DBLP:conf/vldb/AgrawalS94")
    public class APRIORI
    extends AbstractFrequentItemsetAlgorithm
    The APRIORI algorithm for Mining Association Rules.

    Reference:

    R. Agrawal, R. Srikant
    Fast Algorithms for Mining Association Rules
    In Proc. 20th Int. Conf. on Very Large Data Bases (VLDB '94)

    This implementation uses some simple optimizations for 1- and 2-itemsets, but does not implement the original hash tree (yet, please contribute).

    Note: this algorithm scales well to a large number of transactions, but not so well to a large number of frequent itemsets (items). For best results, use domain-specific preprocessing to aggregate items into groups. Use statistics logging to keep track of candidate set sizes.

    Since:
    0.1
    Author:
    Arthur Zimek, Erich Schubert
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  APRIORI.Par
      Parameterization class.
      • Nested classes/interfaces inherited from interface elki.Algorithm

        elki.Algorithm.Utils
    • Constructor Summary

      Constructors 
      Constructor Description
      APRIORI​(double minfreq)
      Constructor with minimum frequency.
      APRIORI​(double minfreq, int minlength, int maxlength)
      Constructor with minimum frequency.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      protected java.util.List<Itemset> aprioriGenerate​(java.util.List<? extends Itemset> supported, int length, int dim)
      Prunes a given set of candidates to keep only those BitSets where all subsets of bits flipping one bit are frequent already.
      private int binarySearch​(java.util.List<SparseItemset> candidates, SparseItemset scratch, int begin, int end)
      Binary-search for the next-larger element.
      protected java.util.List<OneItemset> buildFrequentOneItemsets​(elki.database.relation.Relation<? extends elki.data.SparseFeatureVector<?>> relation, int dim, int needed)
      Build the 1-itemsets.
      protected java.util.List<SparseItemset> buildFrequentTwoItemsets​(java.util.List<OneItemset> oneitems, elki.database.relation.Relation<elki.data.BitVector> relation, int dim, int needed, elki.database.ids.DBIDs ids, elki.database.ids.ArrayModifiableDBIDs survivors)
      Build the 2-itemsets.
      private java.lang.StringBuilder debugDumpCandidates​(java.lang.StringBuilder msg, java.util.List<? extends Itemset> candidates, elki.data.type.VectorFieldTypeInformation<elki.data.BitVector> meta)
      Debug method: output all itemsets.
      protected java.util.List<? extends Itemset> frequentItemsets​(java.util.List<? extends Itemset> candidates, elki.database.relation.Relation<elki.data.BitVector> relation, int needed, elki.database.ids.DBIDs ids, elki.database.ids.ArrayModifiableDBIDs survivors, int length)
      Returns the frequent BitSets out of the given BitSets with respect to the given database.
      protected java.util.List<SparseItemset> frequentItemsetsSparse​(java.util.List<SparseItemset> candidates, elki.database.relation.Relation<elki.data.BitVector> relation, int needed, elki.database.ids.DBIDs ids, elki.database.ids.ArrayModifiableDBIDs survivors, int length)
      Returns the frequent BitSets out of the given BitSets with respect to the given database.
      elki.data.type.TypeInformation[] getInputTypeRestriction()  
      private boolean initializeSearchItemset​(elki.data.BitVector bv, int[] scratchi, int[] iters)
      Initialize the scratch itemset.
      private boolean nextSearchItemset​(elki.data.BitVector bv, int[] scratchi, int[] iters)
      Advance scratch itemset to the next.
      FrequentItemsetsResult run​(elki.database.relation.Relation<elki.data.BitVector> relation)
      Performs the APRIORI algorithm on the given database.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        The logger for this class.
      • STAT

        private final java.lang.String STAT
        Statistics logging prefix.
    • Constructor Detail

      • APRIORI

        public APRIORI​(double minfreq,
                       int minlength,
                       int maxlength)
        Constructor with minimum frequency.
        Parameters:
        minfreq - Minimum frequency
        minlength - Minimum length
        maxlength - Maximum length
      • APRIORI

        public APRIORI​(double minfreq)
        Constructor with minimum frequency.
        Parameters:
        minfreq - Minimum frequency
    • Method Detail

      • getInputTypeRestriction

        public elki.data.type.TypeInformation[] getInputTypeRestriction()
      • run

        public FrequentItemsetsResult run​(elki.database.relation.Relation<elki.data.BitVector> relation)
        Performs the APRIORI algorithm on the given database.
        Parameters:
        relation - the Relation to process
        Returns:
        the AprioriResult learned by this APRIORI
      • buildFrequentOneItemsets

        protected java.util.List<OneItemset> buildFrequentOneItemsets​(elki.database.relation.Relation<? extends elki.data.SparseFeatureVector<?>> relation,
                                                                      int dim,
                                                                      int needed)
        Build the 1-itemsets.
        Parameters:
        relation - Data relation
        dim - Maximum dimensionality
        needed - Minimum support needed
        Returns:
        1-itemsets
      • buildFrequentTwoItemsets

        protected java.util.List<SparseItemset> buildFrequentTwoItemsets​(java.util.List<OneItemset> oneitems,
                                                                         elki.database.relation.Relation<elki.data.BitVector> relation,
                                                                         int dim,
                                                                         int needed,
                                                                         elki.database.ids.DBIDs ids,
                                                                         elki.database.ids.ArrayModifiableDBIDs survivors)
        Build the 2-itemsets.
        Parameters:
        oneitems - Frequent 1-itemsets
        relation - Data relation
        dim - Maximum dimensionality
        needed - Minimum support needed
        ids - Objects to process
        survivors - Output: objects that had at least two 1-frequent items.
        Returns:
        Frequent 2-itemsets
      • aprioriGenerate

        protected java.util.List<Itemset> aprioriGenerate​(java.util.List<? extends Itemset> supported,
                                                          int length,
                                                          int dim)
        Prunes a given set of candidates to keep only those BitSets where all subsets of bits flipping one bit are frequent already.
        Parameters:
        supported - Support map
        length - Itemset length
        dim - Dimensionality
        Returns:
        itemsets that cannot be pruned by apriori
      • frequentItemsets

        protected java.util.List<? extends Itemset> frequentItemsets​(java.util.List<? extends Itemset> candidates,
                                                                     elki.database.relation.Relation<elki.data.BitVector> relation,
                                                                     int needed,
                                                                     elki.database.ids.DBIDs ids,
                                                                     elki.database.ids.ArrayModifiableDBIDs survivors,
                                                                     int length)
        Returns the frequent BitSets out of the given BitSets with respect to the given database.
        Parameters:
        candidates - the candidates to be evaluated
        relation - the database to evaluate the candidates on
        needed - Minimum support needed
        ids - Objects to process
        survivors - Output: objects that had at least two 1-frequent items.
        length - Itemset length
        Returns:
        Itemsets with sufficient support
      • frequentItemsetsSparse

        protected java.util.List<SparseItemset> frequentItemsetsSparse​(java.util.List<SparseItemset> candidates,
                                                                       elki.database.relation.Relation<elki.data.BitVector> relation,
                                                                       int needed,
                                                                       elki.database.ids.DBIDs ids,
                                                                       elki.database.ids.ArrayModifiableDBIDs survivors,
                                                                       int length)
        Returns the frequent BitSets out of the given BitSets with respect to the given database. Optimized implementation for SparseItemset.
        Parameters:
        candidates - the candidates to be evaluated
        relation - the database to evaluate the candidates on
        needed - Minimum support needed
        ids - Objects to process
        survivors - Output: objects that had at least two 1-frequent items.
        length - Itemset length
        Returns:
        Itemsets with sufficient support
      • initializeSearchItemset

        private boolean initializeSearchItemset​(elki.data.BitVector bv,
                                                int[] scratchi,
                                                int[] iters)
        Initialize the scratch itemset.
        Parameters:
        bv - Bit vector data source
        scratchi - Scratch itemset
        iters - Iterator array
        Returns:
        true if the itemset had minimum length
      • nextSearchItemset

        private boolean nextSearchItemset​(elki.data.BitVector bv,
                                          int[] scratchi,
                                          int[] iters)
        Advance scratch itemset to the next.
        Parameters:
        bv - Bit vector data source
        scratchi - Scratch itemset
        iters - Iterator array
        Returns:
        true if the itemset had minimum length
      • binarySearch

        private int binarySearch​(java.util.List<SparseItemset> candidates,
                                 SparseItemset scratch,
                                 int begin,
                                 int end)
        Binary-search for the next-larger element.
        Parameters:
        candidates - Candidates to search for
        scratch - Scratch space
        begin - Search interval begin
        end - Search interval end
        Returns:
        Position of first equal-or-larger element
      • debugDumpCandidates

        private java.lang.StringBuilder debugDumpCandidates​(java.lang.StringBuilder msg,
                                                            java.util.List<? extends Itemset> candidates,
                                                            elki.data.type.VectorFieldTypeInformation<elki.data.BitVector> meta)
        Debug method: output all itemsets.
        Parameters:
        msg - Output buffer
        candidates - Itemsets to dump
        meta - Metadata for item labels
        Returns:
        Output buffer