Quantization compresses high-dimensional float vectors into a smaller, approximate representation: instead of storing every vector as float32 or float64 values, each vector is stored in compressed form without giving up much search quality. Use quantization when:
- You have a large dataset with relatively high-dimensional vectors (512, 768, 1024+)
- Index build time and query latency matter
LanceDB offers several quantization-backed vector index types:

- IVF_PQ: Inverted File index with Product Quantization (default). See the vector indexing guide for IVF_PQ examples.
- IVF_RQ: Inverted File index with RaBitQ quantization (binary, 1 bit per dimension). Requires vector dimensions divisible by 8. See below for details.
- IVF_HNSW_SQ: IVF partitions with an HNSW graph per partition plus Scalar Quantization. Strong recall/latency/size trade-off for most workloads.
- IVF_HNSW_PQ: IVF partitions with an HNSW graph per partition plus Product Quantization. Prefer when PQ-level compression matters and you still want HNSW-style in-partition search.
Each index name encodes which structure partitions the vectors (IVF_* vs. IVF_HNSW_*) and which quantizer compresses them (PQ, RQ, or SQ). IVF_PQ is the default and works well in many cases. For more drastic compression, RaBitQ (IVF_RQ) is a reasonable option. For higher recall at low latency, the HNSW-backed variants are usually the right pick. The “Choose the Right Index” table on the vector indexing page is the canonical decision tool.
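For instance, the index type is selected via the index_type argument to create_index. A minimal sketch, assuming the synchronous Python SDK, a hypothetical local database path and table name, and a vector column named "vector":

```python
import lancedb

# Hypothetical connection and table; adjust the path, table name,
# and vector column to your setup.
db = lancedb.connect("/tmp/lancedb-demo")
tbl = db.open_table("docs")

# Default: IVF partitions + Product Quantization.
tbl.create_index(index_type="IVF_PQ")

# Alternatives (one index build per call):
# tbl.create_index(index_type="IVF_RQ")        # RaBitQ, 1 bit per dimension
# tbl.create_index(index_type="IVF_HNSW_SQ")   # HNSW per partition + Scalar Quantization
```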
RaBitQ quantization
RaBitQ is a binary quantization method that represents each normalized embedding using 1 bit per dimension, plus a couple of small corrective scalars. In practice, a 1,024-dimensional float32 vector that would normally take 4 KB compresses to roughly 128 bytes of sign bits plus a few small corrective scalars, while still maintaining reasonable recall.
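The arithmetic behind that example, assuming float32 corrective scalars:

```python
dim = 1024
uncompressed = dim * 4            # 4,096 bytes: one float32 (4 bytes) per dimension
rabitq_code = dim // 8            # 128 bytes: 1 bit per dimension, packed
correctives = 2 * 4               # two float32 corrective scalars
print(uncompressed / (rabitq_code + correctives))   # roughly 30x smaller
```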
How RaBitQ works
- Embeddings are grouped around centroids (as in other IVF indexes).
- Each residual vector is normalized and mapped to the nearest vertex of a randomly rotated hypercube on the unit sphere.
- The sign pattern of that vector is stored as bits (1 bit per dimension).
- Two small corrective factors are stored:
  - The distance from the original vector to its centroid
  - The dot product between the normalized vector and its quantized version (see the sketch after this list)
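To make these steps concrete, here is a small NumPy sketch of the encoding side. It illustrates the idea only and is not LanceDB's internal implementation; the dimension, vector, and centroid below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64  # toy size; must be divisible by 8 for IVF_RQ

# Made-up database vector and IVF centroid (stand-ins for real data).
vec = rng.normal(size=dim).astype(np.float32)
centroid = rng.normal(size=dim).astype(np.float32)

# 1) Residual relative to the centroid, then normalize it.
residual = vec - centroid
dist_to_centroid = float(np.linalg.norm(residual))   # corrective factor 1
o = residual / dist_to_centroid

# 2) Random rotation: an orthonormal matrix from a QR decomposition.
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
rotated = rotation @ o

# 3) Keep only the sign pattern: 1 bit per dimension.
bits = rotated > 0
code = np.where(bits, 1.0, -1.0) / np.sqrt(dim)      # nearest hypercube vertex on the unit sphere

# 4) Corrective factor 2: dot product between the rotated,
#    normalized vector and its quantized version.
dot_correction = float(rotated @ code)

print(np.packbits(bits).nbytes, "bytes of code,", dist_to_centroid, dot_correction)
```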
Compared with IVF_PQ, RaBitQ:
- Avoids training expensive PQ codebooks
- Builds indexes faster and handles updates more easily
- Maintains or improves recall at high dimensionality under the same storage budget
Using RaBitQ
You can create a RaBitQ-backed vector index by setting index_type="IVF_RQ" when calling create_index.
When using IVF_RQ, vector dimensions must be divisible by 8. The num_bits parameter controls how many bits per dimension are used:
1 bit is the classic RaBitQ setting, but you can set it to 2, 4, or 8 bits (at higher computational cost) to improve fidelity for better precision or recall. It is also possible to tune the number of IVF partitions in IVF_RQ, just as you would for IVF_PQ.
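A minimal sketch of building such an index, assuming the synchronous Python SDK; the database path, table name, and parameter values are hypothetical, and passing the tuning options as keyword arguments (rather than, say, a config object) is an assumption about the SDK version in use:

```python
import lancedb

db = lancedb.connect("/tmp/lancedb-demo")   # hypothetical local database path
tbl = db.open_table("docs")                 # assumes 768-dim vectors (divisible by 8)

# Build a RaBitQ-backed index. distance_type, num_partitions, and num_bits
# are described in the parameter list below; values here are illustrative.
tbl.create_index(
    index_type="IVF_RQ",
    distance_type="cosine",
    num_partitions=256,
    num_bits=1,          # classic RaBitQ: 1 bit per dimension
)
```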
The full list of parameters for the algorithm is given below.
distance_type: Literal["l2", "cosine", "dot"], defaults to "l2"
The distance metric to use for similarity comparison. Choose "l2" for Euclidean, "cosine" for cosine similarity, or "dot" for dot product.

num_partitions: Optional[int], defaults to None
Number of IVF partitions (affects index build time and query accuracy). More partitions can improve recall but may increase build time.

num_bits: int, defaults to 1
Bits per dimension for quantization (1 is standard RaBitQ). Higher values improve fidelity at the cost of more storage and computation.

max_iterations: int, defaults to 50
Maximum number of iterations for training the quantizer. Increase for larger datasets or to improve quantization quality.

sample_rate: int, defaults to 256
Number of samples per partition during training. Higher values may improve accuracy but increase training time.

target_partition_size: Optional[int], defaults to None
Target number of vectors per partition. Adjust to control partition granularity and memory usage.