Class SetDigest

java.lang.Object
io.trino.type.setdigest.SetDigest

public class SetDigest extends Object
For background on the MinHash algorithm, see "On the resemblance and containment of documents" by Andrei Z. Broder and the Wikipedia section on the single-hash-function variant: http://en.wikipedia.org/wiki/MinHash#Variant_with_a_single_hash_function
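The single-hash-function variant referenced above can be illustrated with a small standalone sketch. This is not the SetDigest implementation: the class name SingleHashMinHash, its methods, and the choice of hash mixer are all assumptions made for illustration. The idea is to keep only the k smallest hash values of each set, then estimate the Jaccard index from the k smallest hashes of the combined sketches.

```java
import java.util.TreeSet;

// Hypothetical illustration of single-hash MinHash (a "bottom-k" sketch).
// None of these names come from SetDigest.
final class SingleHashMinHash {
    // Assumed hash: the 64-bit finalizer from MurmurHash3. Any well-mixed
    // 64-bit hash would serve; this one is a bijection on long values.
    static long hash(long value) {
        long h = value;
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Builds the sketch: the k smallest hash values among the inputs.
    static TreeSet<Long> sketch(long[] values, int k) {
        TreeSet<Long> smallest = new TreeSet<>();
        for (long v : values) {
            smallest.add(hash(v));
            if (smallest.size() > k) {
                smallest.pollLast(); // drop the largest retained hash
            }
        }
        return smallest;
    }

    // Estimates |A ∩ B| / |A ∪ B| from two sketches built with the same k.
    // Looks at the k smallest hashes of the union of both sketches and counts
    // how many of them appear in both. Returns 0.0 when both sketches are
    // empty (a convention chosen for this example).
    static double estimateJaccard(TreeSet<Long> a, TreeSet<Long> b, int k) {
        TreeSet<Long> union = new TreeSet<>(a);
        union.addAll(b);
        int taken = 0;
        int shared = 0;
        for (long h : union) {
            if (taken == k) {
                break;
            }
            taken++;
            if (a.contains(h) && b.contains(h)) {
                shared++;
            }
        }
        return taken == 0 ? 0.0 : (double) shared / taken;
    }
}
```

When both sets hold fewer than k distinct values, the sketches retain every hash and the estimate equals the exact Jaccard index. For example, the sets {1..10} and {6..15} with k = 100 share 5 of 15 distinct values, so the result is 1/3. Larger sets yield an approximation whose accuracy improves with k.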
  • Field Details

  • Constructor Details

    • SetDigest

      public SetDigest()
    • SetDigest

      public SetDigest(int maxHashes, int numHllBuckets)
    • SetDigest

      public SetDigest(int maxHashes, io.airlift.stats.cardinality.HyperLogLog hll, it.unimi.dsi.fastutil.longs.Long2ShortSortedMap minhash)
  • Method Details

    • newInstance

      public static SetDigest newInstance(io.airlift.slice.Slice serialized)
    • serialize

      public io.airlift.slice.Slice serialize()
    • getHll

      public io.airlift.stats.cardinality.HyperLogLog getHll()
    • estimatedInMemorySize

      public int estimatedInMemorySize()
    • estimatedSerializedSize

      public int estimatedSerializedSize()
    • isExact

      public boolean isExact()
    • cardinality

      public long cardinality()
    • exactIntersectionCardinality

      public static long exactIntersectionCardinality(SetDigest a, SetDigest b)
    • jaccardIndex

      public static double jaccardIndex(SetDigest a, SetDigest b)
    • add

      public void add(long value)
    • add

      public void add(io.airlift.slice.Slice value)
    • mergeWith

      public void mergeWith(SetDigest other)
    • getHashCounts

      public Map<Long,Short> getHashCounts()
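The Map<Long,Short> returned by getHashCounts suggests that each retained hash value carries a small occurrence count, and the maxHashes constructor parameter suggests a cap on how many hashes are retained. This page does not document how mergeWith combines two such maps, so the following is only a hypothetical sketch of one plausible approach: sum the counts for matching hashes, saturate at Short.MAX_VALUE, and keep the maxHashes smallest keys. The class name HashCountMergeSketch and the method merge are invented for this example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of merging two bottom-k hash-count maps.
// This is an assumption about the technique, not SetDigest's actual code.
final class HashCountMergeSketch {
    static TreeMap<Long, Short> merge(Map<Long, Short> a, Map<Long, Short> b, int maxHashes) {
        // TreeMap keeps keys sorted, so the largest hash is always last.
        TreeMap<Long, Short> merged = new TreeMap<>(a);
        for (Map.Entry<Long, Short> entry : b.entrySet()) {
            int sum = merged.getOrDefault(entry.getKey(), (short) 0) + entry.getValue();
            // Assumed behavior: saturate rather than overflow the short counter.
            merged.put(entry.getKey(), (short) Math.min(sum, Short.MAX_VALUE));
        }
        // Trim back to the maxHashes smallest hash values.
        while (merged.size() > maxHashes) {
            merged.pollLastEntry();
        }
        return merged;
    }
}
```

Keeping only the smallest hashes after the merge preserves the bottom-k property: the result is the same sketch that would have been built by adding every value from both inputs to a single digest, as long as both inputs were built with the same hash function and the same cap.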