Quick Start

RaBitQ Quantizer

The RaBitQ Library offers simple interfaces, making it a drop-in replacement for scalar and binary quantization. The interface offers two underlying implementations of RaBitQ: one delivers optimal accuracy with longer quantization time, while the other provides near-optimal accuracy with significantly faster quantization.

The library provides advanced data formats for supporting efficient distance estimation. The details can be found in Quantizer.

Example Code in C++

#include <random>
#include <vector>

#include "quantization/rabitq.hpp"

int main() {
    size_t dim = 128;

    std::vector<float> vector(dim);
    static std::random_device rd;
    static std::mt19937 gen(rd());
    std::normal_distribution<float> dist(0, 1);

    // generate a random vector
    for (size_t i = 0; i < dim; ++i) {
        vector[i] = dist(gen);
    }

    size_t bits = 8;                  // num of bits for total code
    std::vector<uint32_t> code(dim);  // code
    float delta;                      // delta for scalar quantization
    float vl;                         // lower value for scalar quantization

    // scalar quantization
    rabitqlib::quant::quantize_scalar(vector.data(), dim, bits, code.data(), delta, vl);

    // faster version, must init a config struct first
    rabitqlib::quant::RabitqConfig config = rabitqlib::quant::faster_config(dim, bits);
    rabitqlib::quant::quantize_scalar(
        vector.data(), dim, bits, code.data(), delta, vl, config
    );

    // reconstruct  
    float * reconstructed_data = new float [padded_dim];
    rabitqlib::quant::reconstruct_vec(code, delta, vl, padded_dim, reconstructed_data);

    return 0;
}

RaBitQ + IVF

IVF is a classical clustering-based ANN index. IVF + RaBitQ consumes minimum memory across IVF, HNSW and QG. Powered by FastScan, it also achieves promising time-accuracy trade-off. To use RaBitQ + IVF, users need to cluster raw data vectors (e.g., using Kmeans), then to quantize each cluster and construct the IVF. The following is an example of using RaBitQ + IVF to search ANN on the deep1M dataset.

Dataset downloading and clustering

Use the following commands in the shell to download the deep1M dataset, generate clustering information, and store it on disk.

wget http://www.cse.cuhk.edu.hk/systems/hash/gqr/dataset/deep1M.tar.gz
tar -zxvf deep1M.tar.gz
python python/ivf.py deep1M/deep1M_base.fvecs 4096 deep1M/deep1M_centroids_4096.fvecs deep1M/deep1M_clusterids_4096.ivecs

Example Code in C++ for index construction

The following codes show how to load deep1M's vector data, centroids information, and cluster ids from disk, build the IVF + RaBitQ index, and finally save the index to disk.

#include <cstdint>
#include <iostream>

#include "defines.hpp"
#include "index/ivf/ivf.hpp"
#include "utils/io.hpp"
#include "utils/stopw.hpp"

using PID = rabitqlib::PID;
using index_type = rabitqlib::ivf::IVF;
using data_type = rabitqlib::RowMajorArray<float>;
using gt_type = rabitqlib::RowMajorArray<uint32_t>;

int main(int argc, char** argv) {
    if (argc < 6) {
        std::cerr << "Usage: " << argv[0] << " <arg1> <arg2> <arg3> <arg4> <arg5>\n"
                  << "arg1: path for data file, format .fvecs\n"
                  << "arg2: path for centroids file generated by ivf.py\n"
                  << "arg3: path for cluster ids file generated by ivf.py\n"
                  << "arg4: total number of bits for quantization\n"
                  << "arg5: path for saving index\n"
                  << "arg6: if use faster quantization (\"true\" or \"false\"), false by "
                     "default\n";
        exit(1);
    }

    bool faster_quant = false;
    if (argc > 6) {
        std::string faster_str(argv[6]);
        if (faster_str == "true") {
            faster_quant = true;
            std::cout << "Using faster quantize for indexing...\n";
        }
    }

    char* data_file = argv[1];
    char* centroids_file = argv[2];
    char* cids_file = argv[3];
    size_t total_bits = atoi(argv[4]);
    char* index_file = argv[5];

    data_type data;
    data_type centroids;
    gt_type cids;

    rabitqlib::load_vecs<float, data_type>(data_file, data);
    rabitqlib::load_vecs<float, data_type>(centroids_file, centroids);
    rabitqlib::load_vecs<PID, gt_type>(cids_file, cids);

    size_t num_points = data.rows();
    size_t dim = data.cols();
    size_t k = centroids.rows();

    std::cout << "data loaded\n";
    std::cout << "\tN: " << num_points << '\n';
    std::cout << "\tDIM: " << dim << '\n';

    rabitqlib::StopW stopw;
    index_type ivf(num_points, dim, k, total_bits);
    ivf.construct(data.data(), centroids.data(), cids.data(), faster_quant);
    float miniutes = stopw.get_elapsed_mili() / 1000 / 60;
    std::cout << "ivf constructed \n";
    ivf.save(index_file);

    std::cout << "Indexing time " << miniutes << '\n';

    return 0;
}

After compilation (suppose it is compiled to an executable named ivf_build), run the following command to build the IVF:

./ivf_build deep1M/deep1M_base.fvecs deep1M/deep1M_centroids_4096.fvecs deep1M/deep1M_clusterids_4096.ivecs 4 deep1M/deep1M_rabitqlib_ivf_4.index true

This will build an IVF that uses 4 (1+3) bits to quantize each vector for the deep1M dataset through RaBitQ.

Example Code in C++ for querying

After building the index, you can execute queries on it. The following codes show how to load ivf index and query from disk, execute queries, and compare the results to the groundtruth.

#include <iostream>
#include <vector>

#include "defines.hpp"
#include "index/ivf/ivf.hpp"
#include "utils/io.hpp"
#include "utils/stopw.hpp"
#include "utils/tools.hpp"

using PID = rabitqlib::PID;
using index_type = rabitqlib::ivf::IVF;
using data_type = rabitqlib::RowMajorArray<float>;
using gt_type = rabitqlib::RowMajorArray<uint32_t>;

static std::vector<size_t> get_nprobes(
    const index_type& ivf,
    const std::vector<size_t>& all_nprobes,
    data_type& query,
    gt_type& gt
);

static size_t topk = 10;
static size_t test_round = 5;

int main(int argc, char** argv) {
    if (argc < 4) {
        std::cerr << "Usage: " << argv[0] << " <arg1> <arg2> <arg3> <arg4>\n"
                  << "arg1: path for index \n"
                  << "arg2: path for query file, format .fvecs\n"
                  << "arg3: path for groundtruth file format .ivecs\n"
                  << "arg4: whether use high accuracy fastscan, (\"true\" or \"false\"), "
                     "true by default\n\n";
        exit(1);
    }

    char* index_file = argv[1];
    char* query_file = argv[2];
    char* gt_file = argv[3];
    bool use_hacc = true;

    if (argc > 4) {
        std::string hacc_str(argv[4]);
        if (hacc_str == "false") {
            use_hacc = false;
            std::cout << "Do not use Hacc FastScan\n";
        }
    }

    data_type query;
    gt_type gt;
    rabitqlib::load_vecs<float, data_type>(query_file, query);
    rabitqlib::load_vecs<uint32_t, gt_type>(gt_file, gt);
    size_t nq = query.rows();
    size_t total_count = nq * topk;

    index_type ivf;
    ivf.load(index_file);

    std::vector<size_t> all_nprobes;
    all_nprobes.push_back(5);
    for (size_t i = 10; i < 200; i += 10) {
        all_nprobes.push_back(i);
    }
    for (size_t i = 200; i < 400; i += 40) {
        all_nprobes.push_back(i);
    }
    for (size_t i = 400; i <= 1500; i += 100) {
        all_nprobes.push_back(i);
    }
    for (size_t i = 2000; i <= 4000; i += 500) {
        all_nprobes.push_back(i);
    }

    all_nprobes.push_back(6000);
    all_nprobes.push_back(10000);
    all_nprobes.push_back(15000);

    rabitqlib::StopW stopw;

    auto nprobes = get_nprobes(ivf, all_nprobes, query, gt);
    size_t length = nprobes.size();

    std::vector<std::vector<float>> all_qps(test_round, std::vector<float>(length));
    std::vector<std::vector<float>> all_recall(test_round, std::vector<float>(length));

    for (size_t r = 0; r < test_round; r++) {
        for (size_t l = 0; l < length; ++l) {
            size_t nprobe = nprobes[l];
            size_t total_correct = 0;
            float total_time = 0;
            std::vector<PID> results(topk);
            for (size_t i = 0; i < nq; i++) {
                stopw.reset();
                ivf.search(&query(i, 0), topk, nprobe, results.data(), use_hacc);
                total_time += stopw.get_elapsed_micro();
                for (size_t j = 0; j < topk; j++) {
                    for (size_t k = 0; k < topk; k++) {
                        if (gt(i, k) == results[j]) {
                            total_correct++;
                            break;
                        }
                    }
                }
            }
            float qps = static_cast<float>(nq) / (total_time / 1e6F);
            float recall =
                static_cast<float>(total_correct) / static_cast<float>(total_count);

            all_qps[r][l] = qps;
            all_recall[r][l] = recall;
        }
    }

    auto avg_qps = rabitqlib::horizontal_avg(all_qps);
    auto avg_recall = rabitqlib::horizontal_avg(all_recall);

    std::cout << "nprobe\tQPS\trecall" << '\n';

    for (size_t i = 0; i < length; ++i) {
        size_t nprobe = nprobes[i];
        float qps = avg_qps[i];
        float recall = avg_recall[i];

        std::cout << nprobe << '\t' << qps << '\t' << recall << '\n';
    }

    return 0;
}

static std::vector<size_t> get_nprobes(
    const index_type& ivf,
    const std::vector<size_t>& all_nprobes,
    data_type& query,
    gt_type& gt
) {
    size_t nq = query.rows();
    size_t total_count = topk * nq;
    float old_recall = 0;
    std::vector<size_t> nprobes;

    for (auto nprobe : all_nprobes) {
        nprobes.push_back(nprobe);

        size_t total_correct = 0;
        std::vector<PID> results(topk);
        for (size_t i = 0; i < nq; i++) {
            ivf.search(&query(i, 0), topk, nprobe, results.data());
            for (size_t j = 0; j < topk; j++) {
                for (size_t k = 0; k < topk; k++) {
                    if (gt(i, k) == results[j]) {
                        total_correct++;
                        break;
                    }
                }
            }
        }
        float recall = static_cast<float>(total_correct) / static_cast<float>(total_count);
        if (recall > 0.997 || recall - old_recall < 1e-5) {
            break;
        }
        old_recall = recall;
    }

    return nprobes;
}

To execute queries on deep1M, run the following command for the compiled codes (suppose that it is named ivf_query):

./ivf_query deep1M/deep1M_rabitqlib_ivf_4.index deep1M/deep1M_query.fvecs deep1M/deep1M_groundtruth.ivecs

RaBitQ + HNSW

HNSW is a popular graph-based index. HNSW + RaBitQ consumes the more memory than IVF + RaBitQ because it needs to store the edges of every vertex in a graph (e.g., 32 edges = 1,024 bits). In terms of the time-accuracy trade-off, HNSW + RaBitQ and IVF + RaBitQ perform differently across datasets—sometimes the former works better, and sometimes the latter does. RaBitQ + HNSW receives raw data vectors as inputs. It first conducts KMeans using a Python script. The centroid vectors will be used in the normalization of data vectors for improving accuracy.

Perform Clustering using Faiss

First, conduct Kmeans clustering on raw data vectors to get centroid vectors. We recommend 16 centroids(clusters). This will produce two files: centroids file and cluster ids file.

Example Code in C++ for index construction

Second, load raw data, centroids, and cluster ids files to build the index. Index file is then saved.

#include <cstdint>
#include <iostream>

#include "index/hnsw/hnsw.hpp"
#include "utils/io.hpp"
#include "utils/stopw.hpp"

using PID = rabitqlib::PID;
using index_type = rabitqlib::hnsw::HierarchicalNSW;
using data_type = rabitqlib::RowMajorArray<float>;
using gt_type = rabitqlib::RowMajorArray<uint32_t>;

int main(int argc, char* argv[]) {
    if (argc < 8) {
        std::cerr << "Usage: " << argv[0]
                  << " <arg1> <arg2> <arg3> <arg4> <arg5> <arg6> <arg7> <arg8>\n"
                  << "arg1: path for data file, format .fvecs\n"
                  << "arg2: path for centroids file, format .fvecs\n"
                  << "arg3: path for cluster ids file, format .ivecs\n"
                  << "arg4: m (degree bound) for hnsw\n"
                  << "arg5: ef for indexing \n"
                  << "arg6: total number of bits for quantization\n"
                  << "arg7: path for saving index\n"
                  << "arg8: metric type (\"l2\" or \"ip\")\n"
                  << "arg9: if use faster quantization (\"true\" or \"false\"), false by "
                     "default\n";
        exit(1);
    }

    char* data_file = argv[1];
    char* centroid_file = argv[2];
    char* cid_file = argv[3];
    size_t m = atoi(argv[4]);
    size_t ef = atoi(argv[5]);
    size_t total_bits = atoi(argv[6]);
    char* index_file = argv[7];

    rabitqlib::MetricType metric_type = rabitqlib::METRIC_L2;
    if (argc > 8) {
        std::string metric_str(argv[8]);
        if (metric_str == "ip" || metric_str == "IP") {
            metric_type = rabitqlib::METRIC_IP;
        }
    }
    if (metric_type == rabitqlib::METRIC_IP) {
        std::cout << "Metric Type: IP\n";
    } else if (metric_type == rabitqlib::METRIC_L2) {
        std::cout << "Metric Type: L2\n";
    }

    bool faster_quant = false;
    if (argc > 9) {
        std::string faster_str(argv[9]);
        if (faster_str == "true") {
            faster_quant = true;
            std::cout << "Using faster quantize for indexing...\n";
        }
    }

    data_type data;
    data_type centroids;
    gt_type cluster_id;

    rabitqlib::load_vecs<float, data_type>(data_file, data);
    rabitqlib::load_vecs<float, data_type>(centroid_file, centroids);
    rabitqlib::load_vecs<uint32_t, gt_type>(cid_file, cluster_id);

    size_t num_points = data.rows();
    size_t dim = data.cols();

    size_t random_seed = 100;  // by default 100
    auto* hnsw = new rabitqlib::hnsw::HierarchicalNSW(
        num_points, dim, total_bits, m, ef, random_seed, metric_type
    );

    rabitqlib::StopW stopw;
    stopw.reset();

    hnsw->construct(
        centroids.rows(),
        centroids.data(),
        num_points,
        data.data(),
        cluster_id.data(),
        0,
        faster_quant
    );

    float total_time = stopw.get_elapsed_micro();
    total_time /= 1e6;

    std::cout << "indexing time = " << total_time << "s" << '\n';
    hnsw->save(index_file);

    std::cout << "index saved..." << '\n';

    return 0;
}

Example Code in C++ for querying

Third, load index, query and groundtruth files to test ANN Search.

#include <iostream>
#include <vector>

#include "index/hnsw/hnsw.hpp"
#include "utils/io.hpp"
#include "utils/stopw.hpp"

std::vector<size_t> efs = {10,  20,  40,  50,  60,  80,  100, 150,  170,  190, 200,
                           250, 300, 400, 500, 600, 700, 800, 1000, 1500, 2000};

size_t test_round = 3;
size_t topk = 10;

using PID = rabitqlib::PID;
using index_type = rabitqlib::hnsw::HierarchicalNSW;
using data_type = rabitqlib::RowMajorArray<float>;
using gt_type = rabitqlib::RowMajorArray<uint32_t>;

int main(int argc, char* argv[]) {
    if (argc < 4) {
        std::cerr << "Usage: " << argv[0] << " <arg1> <arg2> <arg3> <arg4>\n"
                  << "arg1: path for index \n"
                  << "arg2: path for query file, format .fvecs\n"
                  << "arg3: path for groundtruth file format .ivecs\n"
                  << "arg4: metric type (\"l2\" or \"ip\")\n";
        exit(1);
    }

    char* index_file = argv[1];
    char* query_file = argv[2];
    char* gt_file = argv[3];

    data_type query;
    gt_type gt;
    rabitqlib::load_vecs<float, data_type>(query_file, query);
    rabitqlib::load_vecs<uint32_t, gt_type>(gt_file, gt);
    size_t nq = query.rows();
    size_t total_count = nq * topk;

    index_type hnsw;
    rabitqlib::MetricType metric_type = rabitqlib::METRIC_L2;
    if (argc > 4) {
        std::string metric_str(argv[4]);
        if (metric_str == "ip" || metric_str == "IP") {
            metric_type = rabitqlib::METRIC_IP;
        }
    }
    if (metric_type == rabitqlib::METRIC_IP) {
        std::cout << "Metric Type: IP\n";
    } else if (metric_type == rabitqlib::METRIC_L2) {
        std::cout << "Metric Type: L2\n";
    }

    hnsw.load(index_file, metric_type);

    rabitqlib::StopW stopw;

    auto nefs = efs;

    size_t length = nefs.size();

    std::vector<std::vector<float>> all_qps(test_round, std::vector<float>(length));
    std::vector<std::vector<float>> all_recall(test_round, std::vector<float>(length));

    std::cout << "search start >.....\n";

    for (size_t i_probe = 0; i_probe < length; ++i_probe) {
        for (size_t r = 0; r < test_round; r++) {
            size_t ef = nefs[i_probe];
            size_t total_correct = 0;
            float total_time = 0;

            auto start = std::chrono::high_resolution_clock::now();

            std::vector<std::vector<std::pair<float, PID>>> res =
                hnsw.search(query.data(), nq, topk, ef, 1);

            auto end = std::chrono::high_resolution_clock::now();

            float elapsed_us =
                std::chrono::duration<float, std::micro>(end - start).count();

            total_time += elapsed_us;

            for (size_t i = 0; i < nq; i++) {
                for (size_t j = 0; j < topk; j++) {
                    for (size_t k = 0; k < topk; k++) {
                        if (gt(i, k) == res[i][j].second) {
                            total_correct++;
                            break;
                        }
                    }
                }
            }

            float qps = static_cast<float>(nq) / ((total_time) / 1e6F);

            float recall =
                static_cast<float>(total_correct) / static_cast<float>(total_count);

            all_qps[r][i_probe] = qps;
            all_recall[r][i_probe] = recall;
        }
    }

    auto avg_qps = rabitqlib::horizontal_avg(all_qps);
    auto avg_recall = rabitqlib::horizontal_avg(all_recall);

    std::cout << "EF\tQPS\tRecall\t"

                 "\n";
    for (size_t i = 0; i < avg_qps.size(); ++i) {
        std::cout << efs[i] << '\t' << avg_qps[i] << '\t' << avg_recall[i] << '\t' << '\n';
    }
}

RaBitQ + QG (SymphonyQG)

QG is a graph-based index originated from the NGT library. Different from HNSW, it creates multiple quantization codes for every vector and carefully re-organizes their layout to minimize random memory accesses in querying. RaBitQ + QG in developped from our research project SymphonyQG. Unlike IVF + RaBitQ and HNSW + RaBitQ, which consumes less memory than the raw datasets, RaBitQ + QG consumes more memory to pursue the best time-accuracy trade-off.

Example Code in C++

Combine graph-based index with RaBitQ and FastScan.

#include <random>
#include <vector>

#include "index/symqg/qg.hpp"
#include "index/symqg/qg_builder.hpp"
#include "utils/stopw.hpp"

using PID = rabitqlib::PID;

int main() {
    size_t dim = 128;
    size_t N = 16000;

    std::vector<float> data(dim * N);

    static std::random_device rd;
    static std::mt19937 gen(rd());
    std::normal_distribution<float> dist(0, 1);

    // generate N random data vectors
    for (size_t i = 0; i < dim * N; i++) {
        data[i] = dist(gen);
    }

    size_t degree = 32;  // degree bound for graph
    size_t ef = 200;     // size of search window for indexing

    rabitqlib::symqg::QuantizedGraph<float> qg(N, dim, degree);  // init index

    rabitqlib::symqg::QGBuilder builder(qg, ef, data.data());  // builder of index

    rabitqlib::StopW stopw;
    stopw.reset();

    builder.build();  // construct index

    float total_time = stopw.get_elapsed_micro();
    total_time /= 1e6;

    std::cout << "indexing time = " << total_time << "s" << '\n';

    size_t nq = 10;
    std::vector<float> query(nq * dim);

    // take first nq data vectors as query vectors
    for (size_t i = 0; i < nq * dim; i++) {
        query[i] = data[i];
    }

    size_t topk = 10;
    size_t ef_search = 100;

    qg.set_ef(ef_search);
    std::vector<PID> results(topk);

    for (size_t qid = 0; qid < nq; qid++) {
        qg.search(query.data() + (dim * qid), topk, results.data());
        std::cout << "query " << qid << "'s " << topk << "NNs:" << '\n';
        for (size_t i = 0; i < topk; i++) {
            std::cout << "{ID: " << results[i] << "}  ";
        }
        std::cout << '\n';
    }

    return 0;
}