HNSW + RaBitQ
HNSW is a popular graph-based index. HNSW + RaBitQ consumes the more memory than IVF + RaBitQ because it needs to store the edges of every vertex in a graph (e.g., 32 edges = 1,024 bits). In terms of the time-accuracy trade-off, HNSW + RaBitQ and IVF + RaBitQ perform differently across datasets—sometimes the former works better, and sometimes the latter does. This document describes how the library integrates HNSW with RaBitQ to support efficient vector search.
Index Construction
We build the HNSW graph by incrementally inserting new elements following the standard HNSW routine. Currently, we support building the index using raw data vectors and storing the corresponding quantization codes.
Users can invoke:
HierarchicalNSW::construct(size_t cluster_num,
const float* centroids,
size_t data_num,
const float* data,
PID* cluster_ids,
size_t num_threads = 0,
bool faster = false);
- data: Pointer to the raw data vectors.
- data_num: The number of data vectors.
- centroids: Centroids computed by K-means clustering on the raw data vectors (we recommend
cluster_num = 16
). - cluster_ids: Array of length
data_num
where each entry indicates the centroid ID (0–15) for the corresponding data vector. - num_threads: Number of threads to use (default: 0, which auto-selects).
- faster: If
true
, enales fast quantizer.
During construction, we first rotate the centroids and then insert each element one by one. For each element:
- Update the graph structure (edges) by searching with raw vectors and pruning.
- Quantize the rotated vector and store its quantization code.
Data Layout
Each indexed element is stored in the following layout:
[number of edges]
[edges]
[cluster ID]
[external label]
[BinData (1-bit * dim + factors)]
[ExData (ex-bits * dim + factors)]
Querying
Users can invoke:
std::vector<std::vector<std::pair<float, PID>>> HierarchicalNSW::search(const float* queries,
size_t query_num,
size_t TOPK,
size_t efSearch,
size_t thread_num);
- queries: Pointer to the raw query vectors.
- query_num: The number of query vectors.
- TOPK: The number of nearest neighbors to search.
- efSearch: The size of the candidate set for searching HNSW base layer.
- thread_num: Number of threads to use. Each query is processed by one thread.
We first pre-process the query:
- Rotate the raw query vector.
- Compute distances between the rotated query and all rotated centroids.
- Encapsulate the query into a
query_wrapper
for subsequent search.
Upper Layers
In the upper layers of HNSW, we compute the 1-bit estimated distance (using BinData
) to quickly locate the entry point for the next layer.
Base Layer
In the base layer, we apply an adaptive re-ranking strategy:
- candidate_set: Elements to be visited.
- boundedKNN: Current best TOPK candidates.
Repeat until candidate_set
is empty:
- Extract the nearest element
e
fromcandidate_set
. - Visit all unvisited neighbors of
e
. - For each neighbor:
- Compute the 1-bit lower-bound distance (along with 1-bit estimated distance) using
BinData
. - If
boundedKNN
has fewer than TOPK elements, or if the 1-bit lower-bound is smaller than the full-bits estimated distance of the current farthest element inboundedKNN
:- Refine the distance estimate using
ExData
to obtain the full-bits estimated distance. - Update
boundedKNN
.
- Refine the distance estimate using
- Insert the neighbor into
candidate_set
with its (possibly refined) estimated distance.
The search terminates when candidate_set
is empty.