Class GraphIndexBuilder<T>

java.lang.Object
io.github.jbellis.jvector.graph.GraphIndexBuilder<T>
Type Parameters:
T - the type of vector

public class GraphIndexBuilder<T> extends Object
Builder for a concurrent GraphIndex. See GraphIndex for a high-level overview, and the comments on `addGraphNode` for details on the concurrent building approach.
  • Constructor Details

    • GraphIndexBuilder

      public GraphIndexBuilder(RandomAccessVectorValues<T> vectorValues, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, float neighborOverflow, float alpha)
      Creates a builder that reads all the vectors from vectorValues and builds a graph connecting them by their dense ordinals, using the given hyperparameter settings. The constructor itself does not return a graph; build() and getGraph() return the OnHeapGraphIndex.
      Parameters:
      vectorValues - the vectors whose relations are represented by the graph. Must provide a different view over those vectors than the one passed to addGraphNode.
      M - the maximum number of connections a node can have
      beamWidth - the size of the beam search to use when finding nearest neighbors.
      neighborOverflow - the ratio of extra neighbors to allow temporarily when inserting a node. Larger values will build more efficiently, but use more memory.
      alpha - how aggressive the pruning for diverse neighbors should be. Set alpha > 1.0 to allow longer edges. If alpha = 1.0, the equivalent of the lowest level of an HNSW graph is created, which is usually not what you want.
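
To make the effect of alpha concrete, the following is a minimal, self-contained sketch of an alpha-relaxed diversity rule in the style of DiskANN's RobustPrune. It is not this class's implementation: the class name `AlphaPruneSketch`, the method `selectDiverse`, and the use of Euclidean distance are all hypothetical, and the library itself works with a VectorSimilarityFunction rather than raw distances. The sketch only shows why alpha = 1.0 reproduces an HNSW-style neighbor heuristic while alpha > 1.0 keeps longer edges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the library's API.
final class AlphaPruneSketch {
    // Plain Euclidean distance (an assumption; the library scores with a similarity function).
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Selects up to maxDegree neighbors for node.
     * candidates must already be sorted by ascending distance to node.
     * A candidate c is dropped when some already-selected neighbor s satisfies
     * alpha * d(c, s) <= d(c, node), i.e. s already "covers" c.
     */
    static List<double[]> selectDiverse(double[] node, List<double[]> candidates,
                                        int maxDegree, double alpha) {
        List<double[]> selected = new ArrayList<>();
        for (double[] c : candidates) {
            if (selected.size() >= maxDegree) {
                break;
            }
            boolean diverse = true;
            for (double[] s : selected) {
                if (alpha * distance(c, s) <= distance(c, node)) {
                    diverse = false;
                    break;
                }
            }
            if (diverse) {
                selected.add(c);
            }
        }
        return selected;
    }
}
```

With a node at 0 and candidates at 1.0 and 1.8 on a line, alpha = 1.0 keeps only the candidate at 1.0, because the farther candidate is closer to it than to the node. Raising alpha to 3.0 keeps both, which is the "longer edges" behavior described for alpha > 1.0.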
  • Method Details

    • build

      public OnHeapGraphIndex<T> build()
    • complete

      public void complete()
    • getGraph

      public OnHeapGraphIndex<T> getGraph()
    • insertsInProgress

      public int insertsInProgress()
      Number of inserts in progress, across all threads.
    • addGraphNode

      public long addGraphNode(int node, RandomAccessVectorValues<T> vectors)
      Inserts a node with the given vector value into the graph.

      To allow correctness under concurrency, we track in-progress updates in a ConcurrentSkipListSet. After adding ourselves, we take a snapshot of this set, and consider all other in-progress updates as neighbor candidates.

      Parameters:
      node - the node ID to add
      vectors - the set of vectors
      Returns:
      an estimate of the number of extra bytes used by the graph after adding the given node
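
The in-progress tracking described for addGraphNode can be sketched in isolation as below. This is a simplified illustration under stated assumptions, not the builder's actual code: the class `InProgressTracker` and its methods `begin` and `end` are hypothetical names, and the real method also performs the graph search and edge updates that are omitted here. The point it shows is the ordering: each insert registers itself before taking its snapshot, so of any two overlapping inserts, the one that registered later is guaranteed to see the other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical illustration only; not part of the library's API.
final class InProgressTracker {
    private final ConcurrentSkipListSet<Integer> inProgress = new ConcurrentSkipListSet<>();

    /**
     * Registers node as in progress, then returns the other nodes that were
     * in progress when the snapshot was taken. Callers would treat these as
     * additional neighbor candidates.
     */
    List<Integer> begin(int node) {
        inProgress.add(node); // register first, snapshot second
        List<Integer> others = new ArrayList<>();
        // Iteration over a ConcurrentSkipListSet is weakly consistent: it never
        // throws ConcurrentModificationException and includes every element
        // that was present when iteration started and has not been removed.
        for (int n : inProgress) {
            if (n != node) {
                others.add(n);
            }
        }
        return others;
    }

    /** Marks the insert of node as finished. */
    void end(int node) {
        inProgress.remove(node);
    }

    /** Number of inserts currently registered, across all threads. */
    int insertsInProgress() {
        return inProgress.size();
    }
}
```

A caller would wrap each insert as `begin(node)`, do the work, then `end(node)` in a `finally` block so that a failed insert does not stay registered.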
    • scoreBetween

      protected float scoreBetween(T v1, T v2)