Class FPGrowth

  • All Implemented Interfaces:
    elki.Algorithm

    @Reference(authors="J. Han, J. Pei, Y. Yin",
               title="Mining frequent patterns without candidate generation",
               booktitle="Proc. ACM SIGMOD Int. Conf. Management of Data (SIGMOD 2000)",
               url="https://doi.org/10.1145/342009.335372",
               bibkey="DBLP:conf/sigmod/HanPY00")
    @Priority(200)
    public class FPGrowth
    extends AbstractFrequentItemsetAlgorithm
    FP-Growth is an algorithm for mining the frequent itemsets by using a compressed representation of the database called FPGrowth.FPTree.

    FP-Growth first sorts items by the overall frequency, since having high frequent items appear first in the tree leads to a much smaller tree since frequent subsets will likely share the same path in the tree. FP-Growth is beneficial when you have a lot of (near-) duplicate transactions, and are using a not too high support threshold, as it only prunes single items, not item combinations.

    This implementation is in-memory only, and has not yet been carefully optimized.

    The worst case memory use probably is \(O(\min(n\cdot l,i^l))\) where i is the number of items, l the average itemset length, and n the number of items. The worst case scenario is when every item is frequent, and every transaction is unique. The resulting tree will then be larger than the original data.

    Reference:

    J. Han, J. Pei, Y. Yin
    Mining frequent patterns without candidate generation
    In Proc. ACM SIGMOD Int. Conf. Management of Data (SIGMOD 2000)

    Since:
    0.7.0
    Author:
    Erich Schubert
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  FPGrowth.FPNode
      A single node of the FP tree.
      static class  FPGrowth.FPTree
      FP-Tree data structure
      static class  FPGrowth.Par
      Parameterization class.
      • Nested classes/interfaces inherited from interface elki.Algorithm

        elki.Algorithm.Utils
    • Constructor Summary

      Constructors 
      Constructor Description
      FPGrowth​(double minsupp, int minlength, int maxlength)
      Constructor.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      private FPGrowth.FPTree buildFPTree​(elki.database.relation.Relation<elki.data.BitVector> relation, int[] iidx, int items)
      Build the actual FP-tree structure.
      private int[] buildIndex​(int[] counts, int[] positions, int minsupp)
      Build a forward map, item id (dimension) to frequency position
      private int[] countItemSupport​(elki.database.relation.Relation<elki.data.BitVector> relation, int dim)
      Count the support of each 1-item.
      elki.data.type.TypeInformation[] getInputTypeRestriction()  
      FrequentItemsetsResult run​(elki.database.relation.Relation<elki.data.BitVector> relation)
      Run the FP-Growth algorithm
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        Class logger.
      • STAT

        private static final java.lang.String STAT
        Prefix for statistics.
    • Constructor Detail

      • FPGrowth

        public FPGrowth​(double minsupp,
                        int minlength,
                        int maxlength)
        Constructor.
        Parameters:
        minsupp - Minimum support (relative or absolute)
        minlength - Minimum length
        maxlength - Maximum length
    • Method Detail

      • getInputTypeRestriction

        public elki.data.type.TypeInformation[] getInputTypeRestriction()
      • run

        public FrequentItemsetsResult run​(elki.database.relation.Relation<elki.data.BitVector> relation)
        Run the FP-Growth algorithm
        Parameters:
        relation - Bit vector relation
        Returns:
        Frequent patterns found
      • countItemSupport

        private int[] countItemSupport​(elki.database.relation.Relation<elki.data.BitVector> relation,
                                       int dim)
        Count the support of each 1-item.
        Parameters:
        relation - Data
        dim - Maximum dimensionality
        Returns:
        Item counts
      • buildFPTree

        private FPGrowth.FPTree buildFPTree​(elki.database.relation.Relation<elki.data.BitVector> relation,
                                            int[] iidx,
                                            int items)
        Build the actual FP-tree structure.
        Parameters:
        relation - Data
        iidx - Inverse index (dimension to item rank)
        items - Number of items
        Returns:
        FP-tree
      • buildIndex

        private int[] buildIndex​(int[] counts,
                                 int[] positions,
                                 int minsupp)
        Build a forward map, item id (dimension) to frequency position
        Parameters:
        counts - Item counts
        positions - Position index (output)
        minsupp - Minimum support
        Returns:
        Forward index