Class CompactLabelToOrdinal

java.lang.Object
org.apache.lucene.facet.taxonomy.writercache.LabelToOrdinal
org.apache.lucene.facet.taxonomy.writercache.CompactLabelToOrdinal

public class CompactLabelToOrdinal extends LabelToOrdinal
This is a memory-efficient LabelToOrdinal implementation that uses a CharBlockArray to store all labels and a configurable number of HashArrays to reference the labels.
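
The idea of keeping every label in one char store, rather than as separate String objects, can be illustrated with the small sketch below. It assumes a simple [length][chars...] layout and uses a single doubling char[] where CharBlockArray uses blocks. The class LabelBuffer and its methods are hypothetical, not Lucene API.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: all labels live back-to-back in one char buffer and are
 * referenced by int offsets. Assumed layout: [length][chars...], so a label may
 * be at most Character.MAX_VALUE chars long. Lucene's CharBlockArray is block-based;
 * this single-array version is a simplification.
 */
final class LabelBuffer {
    private char[] buf = new char[16];
    private int size = 0;

    /** Appends a label and returns the offset that references it. */
    int append(CharSequence label) {
        int len = label.length();
        if (len > Character.MAX_VALUE) {
            throw new IllegalArgumentException("label too long");
        }
        ensureCapacity(size + 1 + len);
        int offset = size;
        buf[size++] = (char) len; // length prefix
        for (int i = 0; i < len; i++) {
            buf[size++] = label.charAt(i);
        }
        return offset;
    }

    /** Compares the stored label at offset with a candidate without allocating a String. */
    boolean equalsAt(int offset, CharSequence label) {
        int len = buf[offset];
        if (len != label.length()) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (buf[offset + 1 + i] != label.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    /** Materializes the label at offset (only needed when a String is really wanted). */
    String labelAt(int offset) {
        return new String(buf, offset + 1, buf[offset]);
    }

    private void ensureCapacity(int needed) {
        if (needed > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(needed, buf.length * 2));
        }
    }
}
```

Because a hash array then only needs to hold int offsets (or ordinals), the per-label cost is a few chars plus an int, with no per-label object header.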

Since the HashArrays don't handle collisions, a CollisionMap is used to store the colliding labels.
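
As a rough illustration of how several collision-free hash arrays can fall back to a separate collision map, consider the sketch below. It is not Lucene's HashArray or CollisionMap: the class name, the "-1 means empty" convention, the shared capacity across arrays, and the use of a plain HashMap as the collision map are all assumptions made for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: each hash array holds at most one ordinal per slot and never
 * probes further. A label whose slot is taken in every array goes to the collision map.
 */
final class HashArraysWithCollisionMap {
    private final int[][] hashArrays;                                  // slot -> ordinal, -1 = empty
    private final List<String> labelsByOrdinal = new ArrayList<>();    // stand-in for the label store
    private final Map<String, Integer> collisionMap = new HashMap<>(); // stand-in for CollisionMap

    HashArraysWithCollisionMap(int capacity, int numHashArrays) {
        hashArrays = new int[numHashArrays][];
        for (int i = 0; i < numHashArrays; i++) {
            hashArrays[i] = new int[capacity];
            Arrays.fill(hashArrays[i], -1);
        }
    }

    private static int slot(String label, int length) {
        return (label.hashCode() & 0x7fffffff) % length;
    }

    /** Returns the ordinal of label, or -1 if it was never added. */
    int getOrdinal(String label) {
        for (int[] arr : hashArrays) {
            int ord = arr[slot(label, arr.length)];
            if (ord == -1) {
                return -1; // nothing is ever removed, so an empty slot ends the search
            }
            if (labelsByOrdinal.get(ord).equals(label)) {
                return ord;
            }
        }
        Integer ord = collisionMap.get(label);
        return ord == null ? -1 : ord;
    }

    /** Adds label with the next free ordinal; returns the existing ordinal if already present. */
    int addLabel(String label) {
        int existing = getOrdinal(label);
        if (existing != -1) {
            return existing;
        }
        int ord = labelsByOrdinal.size();
        labelsByOrdinal.add(label);
        for (int[] arr : hashArrays) {
            int s = slot(label, arr.length);
            if (arr[s] == -1) {
                arr[s] = ord;
                return ord;
            }
        }
        collisionMap.put(label, ord); // every candidate slot was occupied by another label
        return ord;
    }

    int collisions() {
        return collisionMap.size();
    }
}
```

With a capacity of 1 and two hash arrays, the first two labels fill both slots and a third distinct label lands in the collision map, which makes the fallback easy to observe.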

This data structure grows by adding a new HashArray whenever the number of collisions in the CollisionMap exceeds loadFactor * LabelToOrdinal.getMaxOrdinal(). Growing also reinserts all colliding labels into the HashArrays, which may reduce the number of collisions. To set the loadFactor, see CompactLabelToOrdinal(int, float, int).
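
The growth rule can be sketched as follows, assuming unique labels, a new hash array twice the size of the previous one, and a simple counter standing in for getMaxOrdinal(). GrowthSketch and its methods are hypothetical and only illustrate the trigger (collisions > loadFactor * maxOrdinal) and the reinsertion step; they are not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the growth policy: when collisions exceed
 * loadFactor * maxOrdinal, add a new hash array and re-insert every colliding label.
 * Labels are assumed to be unique; the doubling size is an assumption.
 */
final class GrowthSketch {
    private final float loadFactor;
    private final List<String[]> hashArrays = new ArrayList<>(); // slot -> label, null = empty
    private final List<String> collisions = new ArrayList<>();   // stand-in for CollisionMap
    private int maxOrdinal = 0;                                   // stand-in for getMaxOrdinal()

    GrowthSketch(int initialCapacity, float loadFactor, int numHashArrays) {
        if (initialCapacity < 1 || numHashArrays < 1) {
            throw new IllegalArgumentException("need at least one non-empty hash array");
        }
        this.loadFactor = loadFactor;
        for (int i = 0; i < numHashArrays; i++) {
            hashArrays.add(new String[initialCapacity]);
        }
    }

    /** Tries each hash array in turn; returns false if every candidate slot is taken. */
    private boolean place(String label) {
        for (String[] arr : hashArrays) {
            int slot = (label.hashCode() & 0x7fffffff) % arr.length;
            if (arr[slot] == null) {
                arr[slot] = label;
                return true;
            }
        }
        return false;
    }

    void add(String label) {
        maxOrdinal++;
        if (!place(label)) {
            collisions.add(label);
        }
        if (collisions.size() > loadFactor * maxOrdinal) {
            grow();
        }
    }

    /** Adds a hash array twice the size of the last one, then re-inserts the colliding labels. */
    private void grow() {
        int newCapacity = hashArrays.get(hashArrays.size() - 1).length * 2;
        hashArrays.add(new String[newCapacity]);
        List<String> pending = new ArrayList<>(collisions);
        collisions.clear();
        for (String label : pending) {
            if (!place(label)) {
                collisions.add(label); // still colliding after growth
            }
        }
    }

    int numHashArrays() {
        return hashArrays.size();
    }

    int numCollisions() {
        return collisions.size();
    }

    int storedInArrays() {
        int n = 0;
        for (String[] arr : hashArrays) {
            for (String s : arr) {
                if (s != null) {
                    n++;
                }
            }
        }
        return n;
    }
}
```

A smaller loadFactor tolerates fewer collisions and so triggers growth earlier, trading memory for shorter collision-map lookups. The check uses a single `if` rather than a loop so that labels with identical hash codes cannot make growth run forever.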

This data structure has a much lower memory footprint (~30%) compared to a Java HashMap<String, Integer>. It also uses only a small fraction of the objects a HashMap would, which limits GC overhead. Ingestion was also ~50% faster than with a HashMap for 3M unique labels.

  • Field Details

    • DefaultLoadFactor

      public static final float DefaultLoadFactor
      Default maximum load factor.
  • Constructor Details

    • CompactLabelToOrdinal

      public CompactLabelToOrdinal(int initialCapacity, float loadFactor, int numHashArrays)
      Sole constructor.
  • Method Details