
Compact Storage of Codes

This section describes the compact storage of codes. RaBitQLib supports quantizing codes with bit widths of 1, 2, 3, 4, 5, 6, 7 and 8. All of these bit widths except 8 are not byte-aligned, so we design a specialized compact storage format for the code vector at each bit width. We pad the dimensionality to a multiple of 64 for ease of alignment. The implementation can be found in rabitqlib/quantization/pack_excode.hpp.
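The padding rule can be illustrated with a small sketch. The helpers `pad_to_64` and `compact_code_bytes` are hypothetical and not part of RaBitQLib. They only restate the arithmetic: because 64 × bits is divisible by 8 for every supported width, each block of 64 dimensions occupies exactly 8 × bits bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round dim up to the next multiple of 64.
inline std::size_t pad_to_64(std::size_t dim) {
    return (dim + 63) / 64 * 64;
}

// Hypothetical helper: bytes needed for one compact code of the given
// bit width (1..8) after padding. 64 * bits is always divisible by 8,
// so every 64-dimension block occupies exactly 8 * bits bytes.
inline std::size_t compact_code_bytes(std::size_t dim, std::size_t bits) {
    assert(bits >= 1 && bits <= 8);
    return pad_to_64(dim) / 64 * (8 * bits);
}
```

For example, `compact_code_bytes(100, 3)` pads 100 dimensions to 128 and returns 2 × 24 = 48 bytes.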

Example

```cpp
#include <rabitqlib/quantization/pack_excode.hpp>

#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

int main() {
    size_t dim = 768;  // already a multiple of 64
    size_t bits = 4;   // bit width of each code

    // Generate a random 4-bit value (0-15) for each dimension
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, (1 << bits) - 1);
    std::vector<uint8_t> code(dim);
    for (size_t i = 0; i < dim; ++i) {
        code[i] = static_cast<uint8_t>(dist(rng));
    }

    // The compact code occupies dim * bits / 8 bytes
    std::vector<uint8_t> compact_code(dim * bits / 8);

    rabitqlib::quant::rabitq_impl::ex_bits::packing_rabitqplus_code(
        code.data(), compact_code.data(), dim, bits
    );
    return 0;
}
```

The following describes the compact storage format for each block of 64 dimensions.

1-bit

The code stores the binary values of the 64 dimensions sequentially, occupying 8 bytes.
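A minimal scalar sketch of the 1-bit format follows. It assumes, since the text above does not say, that dimension i is stored in bit i % 8 of byte i / 8 (least-significant bit first). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 1-bit packer: 64 binary values -> 8 bytes.
// ASSUMPTION: dimension i goes to bit (i % 8) of byte (i / 8).
inline void pack_1bit_block(const uint8_t* code, uint8_t* out) {
    for (std::size_t b = 0; b < 8; ++b) {
        uint8_t byte = 0;
        for (std::size_t k = 0; k < 8; ++k) {
            assert(code[b * 8 + k] <= 1);
            byte |= static_cast<uint8_t>(code[b * 8 + k] << k);
        }
        out[b] = byte;
    }
}

// Read back the binary value of dimension i from a packed block.
inline uint8_t unpack_1bit(const uint8_t* packed, std::size_t i) {
    return static_cast<uint8_t>((packed[i / 8] >> (i % 8)) & 1u);
}
```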

2-bit

[Figure: 2-bit storage layout]

The code of 64 dimensions is stored in a 16-byte array. Each row in the figure represents a byte. The 0th byte stores the 2-bit codes of the 0th, 16th, 32nd and 48th dimensions; the 1st byte stores those of the 1st, 17th, 33rd and 49th dimensions; and so on, so that the i-th byte stores the codes of dimensions i, i+16, i+32 and i+48. This layout allows efficient unpacking with SIMD: the codes of 16 consecutive dimensions can be extracted at once by shifting and masking with SSE.
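The 2-bit layout can be sketched in scalar C++ as follows. The function names are hypothetical, and the bit position of each code within a byte is an assumption: the text fixes which four dimensions share a byte, but not their order, so the sketch places dimension i + 16k in bits 2k and 2k + 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar sketch of the 2-bit layout for one 64-dim block.
// Byte i (0 <= i < 16) holds dimensions i, i+16, i+32 and i+48.
// ASSUMPTION: dimension i + 16*k occupies bits [2k, 2k+1] of byte i.
inline void pack_2bit_block(const uint8_t* code, uint8_t* out) {
    for (std::size_t i = 0; i < 16; ++i) {
        uint8_t byte = 0;
        for (std::size_t k = 0; k < 4; ++k) {
            uint8_t v = code[i + 16 * k];
            assert(v < 4);
            byte |= static_cast<uint8_t>(v << (2 * k));
        }
        out[i] = byte;
    }
}

// Shifting all 16 bytes right by 2k and masking with 0x03 recovers
// dimensions 16k .. 16k+15, which is what makes the layout SIMD-friendly.
inline void unpack_2bit_block(const uint8_t* packed, uint8_t* code) {
    for (std::size_t k = 0; k < 4; ++k) {
        for (std::size_t i = 0; i < 16; ++i) {
            code[16 * k + i] = static_cast<uint8_t>((packed[i] >> (2 * k)) & 0x03);
        }
    }
}
```

Because the 16 bytes of a block fit in one 128-bit SSE register, each iteration of the outer loop in `unpack_2bit_block` corresponds to one shift plus one byte mask in SIMD form.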

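3-bit = 2-bit + 1-bit

A 3-bit code is split into a 2-bit part and a 1-bit part, which are stored in the 2-bit and 1-bit formats, respectively, for a total of 16 + 8 = 24 bytes per 64 dimensions.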
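A sketch of the split behind the odd bit widths (3, 5 and 7) is shown below. It assumes that the low bits go to the 2-, 4- or 6-bit part and the most significant bit goes to the 1-bit part; the document does not state which bits go where, and `split_odd_code` and `merge_odd_code` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: split each odd-width code (3, 5 or 7 bits) into a
// (bits - 1)-bit part and a 1-bit part, to be packed with the 2-, 4- or
// 6-bit format and the 1-bit format, respectively.
// ASSUMPTION: the low (bits - 1) bits form the multi-bit part and the
// most significant bit forms the 1-bit part.
struct SplitCode {
    std::vector<uint8_t> low;   // values in [0, 2^(bits-1))
    std::vector<uint8_t> high;  // values in {0, 1}
};

inline SplitCode split_odd_code(const std::vector<uint8_t>& code, std::size_t bits) {
    assert(bits == 3 || bits == 5 || bits == 7);
    const uint8_t low_mask = static_cast<uint8_t>((1u << (bits - 1)) - 1);
    SplitCode out;
    out.low.reserve(code.size());
    out.high.reserve(code.size());
    for (uint8_t v : code) {
        assert(static_cast<unsigned>(v) < (1u << bits));
        out.low.push_back(static_cast<uint8_t>(v & low_mask));
        out.high.push_back(static_cast<uint8_t>(v >> (bits - 1)));
    }
    return out;
}

// Recombine the two parts into the original code value.
inline uint8_t merge_odd_code(uint8_t low, uint8_t high, std::size_t bits) {
    return static_cast<uint8_t>((high << (bits - 1)) | low);
}
```

The same split, with the 4-bit or 6-bit format in place of the 2-bit one, covers the 5-bit and 7-bit cases.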

4-bit

[Figure: 4-bit storage layout]

The code of 64 dimensions is stored in a 32-byte array. The 0th byte stores the 4-bit codes of the 0th and 16th dimensions; the 1st byte stores those of the 1st and 17th dimensions; and so on. As with the 2-bit format, this layout allows efficient unpacking with SIMD by shifting and masking with SSE.
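The following sketch packs one 64-dimension block in the 4-bit format and unpacks it with SSE2 intrinsics when they are available, falling back to scalar code otherwise. It assumes that bytes 16 to 31 repeat the pattern for dimensions 32 to 63 and that the lower-numbered dimension of each pair sits in the low nibble; neither detail is stated above, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical sketch of the 4-bit layout for one 64-dim block (32 bytes).
// ASSUMPTIONS: byte 16*g + i (g in {0, 1}, 0 <= i < 16) holds dimension
// 32g + i in its low nibble and dimension 32g + i + 16 in its high nibble.
inline void pack_4bit_block(const uint8_t* code, uint8_t* out) {
    for (std::size_t g = 0; g < 2; ++g) {
        for (std::size_t i = 0; i < 16; ++i) {
            uint8_t lo = code[32 * g + i];
            uint8_t hi = code[32 * g + i + 16];
            assert(lo < 16 && hi < 16);
            out[16 * g + i] = static_cast<uint8_t>(lo | (hi << 4));
        }
    }
}

// Unpack 16 packed bytes into the 32 codes of one group.
inline void unpack_4bit_group(const uint8_t* packed16, uint8_t* code32) {
#if defined(__SSE2__)
    const __m128i mask = _mm_set1_epi8(0x0F);
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(packed16));
    // Low nibbles: mask only.
    __m128i lo = _mm_and_si128(v, mask);
    // High nibbles: shift each 16-bit lane right by 4, then mask; the mask
    // discards the bits that crossed over from the neighbouring byte.
    __m128i hi = _mm_and_si128(_mm_srli_epi16(v, 4), mask);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(code32), lo);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(code32 + 16), hi);
#else
    for (std::size_t i = 0; i < 16; ++i) {
        code32[i] = static_cast<uint8_t>(packed16[i] & 0x0F);
        code32[i + 16] = static_cast<uint8_t>(packed16[i] >> 4);
    }
#endif
}
```

SSE2 has no per-byte shift, so `_mm_srli_epi16` shifts 16-bit lanes and each byte picks up four stray bits from its neighbour; the subsequent `0x0F` mask removes them, which is why a shift and a byte-wise mask are all the unpacking needs.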

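5-bit = 4-bit + 1-bit

A 5-bit code is split into a 4-bit part and a 1-bit part, which are stored in the 4-bit and 1-bit formats, respectively, for a total of 32 + 8 = 40 bytes per 64 dimensions.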

6-bit

[Figure: 6-bit storage layout]

The code of 64 dimensions is stored in a 48-byte array, organized as three 16-byte groups:

  • The first 16 bytes store the 6-bit codes of the 0th to 15th dimensions and the upper 2 bits of the codes of the 32nd to 47th dimensions.
  • The second 16 bytes store the 6-bit codes of the 16th to 31st dimensions and the upper 2 bits of the codes of the 48th to 63rd dimensions.
  • The third 16 bytes store the lower 4 bits of the codes of the 32nd to 47th dimensions and the lower 4 bits of the codes of the 48th to 63rd dimensions.

This layout also allows efficient unpacking with SIMD by shifting and masking with SSE.
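A scalar sketch of the 6-bit format follows. The grouping of values into bytes matches the description above, but the bit positions inside each byte are assumptions, as are the function names: the 6-bit codes sit in the low bits, the upper 2 bits of dimensions 32 to 63 sit in bits 6 and 7, and dimensions 32 to 47 take the low nibble of the third group.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar sketch of the 6-bit layout for one 64-dim block (48 bytes).
// ASSUMED bit positions (the text fixes only which values share a byte):
//   byte i    (0..15): bits 0-5 = dim i,    bits 6-7 = upper 2 bits of dim 32+i
//   byte 16+i (0..15): bits 0-5 = dim 16+i, bits 6-7 = upper 2 bits of dim 48+i
//   byte 32+i (0..15): bits 0-3 = lower 4 bits of dim 32+i,
//                      bits 4-7 = lower 4 bits of dim 48+i
inline void pack_6bit_block(const uint8_t* code, uint8_t* out) {
    for (std::size_t i = 0; i < 64; ++i) assert(code[i] < 64);
    for (std::size_t i = 0; i < 16; ++i) {
        uint8_t a = code[i], b = code[16 + i];
        uint8_t c = code[32 + i], d = code[48 + i];
        out[i]      = static_cast<uint8_t>(a | ((c >> 4) << 6));
        out[16 + i] = static_cast<uint8_t>(b | ((d >> 4) << 6));
        out[32 + i] = static_cast<uint8_t>((c & 0x0F) | ((d & 0x0F) << 4));
    }
}

inline void unpack_6bit_block(const uint8_t* packed, uint8_t* code) {
    for (std::size_t i = 0; i < 16; ++i) {
        code[i]      = static_cast<uint8_t>(packed[i] & 0x3F);
        code[16 + i] = static_cast<uint8_t>(packed[16 + i] & 0x3F);
        // Reassemble dims 32..63 from their upper 2 bits and lower 4 bits.
        code[32 + i] = static_cast<uint8_t>(((packed[i] >> 6) << 4) | (packed[32 + i] & 0x0F));
        code[48 + i] = static_cast<uint8_t>(((packed[16 + i] >> 6) << 4) | (packed[32 + i] >> 4));
    }
}
```

Under these assumptions, dimensions 0 to 31 decode with a single mask, while dimensions 32 to 63 combine a shift of the first two groups with a mask or shift of the third group.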

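7-bit = 6-bit + 1-bit

A 7-bit code is split into a 6-bit part and a 1-bit part, which are stored in the 6-bit and 1-bit formats, respectively, for a total of 48 + 8 = 56 bytes per 64 dimensions.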

8-bit

8-bit codes are byte-aligned: each dimension simply occupies one byte, so no specialized format is needed.