Class MinHashingTransformer

  • All Implemented Interfaces:
    ISingleAttributeTransformer

    public class MinHashingTransformer
    extends java.lang.Object
    implements ISingleAttributeTransformer
    Converts sets of multi-value features into short signatures. The feature value is first binarized, i.e. encoded as a 0/1 vector, and MinHashing is then applied to that vector. If two multi-value feature sets are very similar with respect to Jaccard similarity, their signatures will also be similar with high probability; that probability depends on the chosen signature length.
    For a signature of length n, n permutations are created, and the i-th element of the signature is the index at which the i-th permutation finds the first 1 in the 0/1 vector.
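    The procedure described above can be sketched in a few lines. This is a minimal illustration, not the source of MinHashingTransformer: the class name MinHashSketch, its methods, and the convention that permutations[i][k] names the vector index visited at step k are all assumptions made for the example. An all-zero vector is mapped to -1 here, which is also an assumption.

```java
import java.util.Arrays;

// Illustrative sketch only; MinHashSketch is a hypothetical class,
// not part of the documented library.
final class MinHashSketch {

    /**
     * Computes a MinHash signature for a 0/1 vector.
     * Assumed convention: permutations[i][k] is the vector index
     * that the i-th permutation visits at step k.
     */
    static int[] signature(int[] binaryVector, int[][] permutations) {
        int[] sig = new int[permutations.length];
        for (int i = 0; i < permutations.length; i++) {
            sig[i] = -1; // assumed marker for a vector that contains no 1
            for (int k = 0; k < permutations[i].length; k++) {
                if (binaryVector[permutations[i][k]] == 1) {
                    sig[i] = k; // first step at which a 1 is found
                    break;
                }
            }
        }
        return sig;
    }

    /** Fraction of positions at which two signatures agree (estimates Jaccard similarity). */
    static double agreement(int[] a, int[] b) {
        int same = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) {
                same++;
            }
        }
        return (double) same / a.length;
    }

    public static void main(String[] args) {
        int[][] perms = { {0, 1, 2, 3, 4}, {4, 3, 2, 1, 0}, {2, 0, 4, 1, 3} };
        int[] x = {0, 1, 1, 0, 0}; // encodes the set {1, 2}
        int[] y = {0, 1, 1, 1, 0}; // encodes the set {1, 2, 3}
        int[] sigX = signature(x, perms);
        int[] sigY = signature(y, perms);
        System.out.println(Arrays.toString(sigX)); // [1, 2, 0]
        System.out.println(Arrays.toString(sigY)); // [1, 1, 0]
        // Two of the three positions agree; the true Jaccard similarity is also 2/3.
        System.out.println(agreement(sigX, sigY));
    }
}
```

    With only three permutations the estimate matching the true Jaccard similarity is a coincidence of this small example; longer signatures make the estimate more reliable.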
    • Constructor Summary

      Constructors 
      Constructor Description
      MinHashingTransformer(int[][] permutations)
      Constructor where the user gives predefined permutations.
      MinHashingTransformer(int domainSize, int signatureLength, long seed)
      Constructor where suitable permutations are created randomly.
    • Constructor Detail

      • MinHashingTransformer

        public MinHashingTransformer(int[][] permutations)
        Constructor where the user gives predefined permutations.
        Parameters:
        permutations - Predefined permutations. The number of permutations defines the length of the signature that MinHashing creates, and each permutation must have a length equal to the domain size.
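        A permutations argument of the required shape could be built as sketched below. The helper names randomPermutations and isPermutation are hypothetical, and the assumption that each row must contain every index from 0 to domainSize - 1 exactly once is inferred from the parameter description rather than confirmed by it. The Fisher-Yates shuffle with a seeded java.util.Random is one plausible way to obtain reproducible permutations; whether the seeded constructor works this way internally is not stated in this documentation.

```java
import java.util.Random;

// Illustrative sketch only; PermutationSketch is a hypothetical helper,
// not part of the documented library.
final class PermutationSketch {

    /**
     * Builds signatureLength random permutations of 0..domainSize-1.
     * The same seed always yields the same permutations.
     */
    static int[][] randomPermutations(int domainSize, int signatureLength, long seed) {
        Random rng = new Random(seed);
        int[][] perms = new int[signatureLength][domainSize];
        for (int i = 0; i < signatureLength; i++) {
            // Start from the identity permutation.
            for (int k = 0; k < domainSize; k++) {
                perms[i][k] = k;
            }
            // Fisher-Yates shuffle, walking from the end of the row.
            for (int k = domainSize - 1; k > 0; k--) {
                int j = rng.nextInt(k + 1);
                int tmp = perms[i][k];
                perms[i][k] = perms[i][j];
                perms[i][j] = tmp;
            }
        }
        return perms;
    }

    /** Checks that p contains every value in 0..p.length-1 exactly once. */
    static boolean isPermutation(int[] p) {
        boolean[] seen = new boolean[p.length];
        for (int v : p) {
            if (v < 0 || v >= p.length || seen[v]) {
                return false;
            }
            seen[v] = true;
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] perms = randomPermutations(10, 4, 42L);
        for (int[] p : perms) {
            System.out.println(isPermutation(p)); // true for every row
        }
        // The array could then be passed on, e.g. new MinHashingTransformer(perms).
    }
}
```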
      • MinHashingTransformer

        public MinHashingTransformer(int domainSize,
                                     int signatureLength,
                                     long seed)
        Constructor where suitable permutations are created randomly.
        Parameters:
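        domainSize - Size of the domain, i.e. the length of each generated permutation.
        signatureLength - Length of the signature, i.e. the number of permutations to generate.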