LanceDB’s multivector support enables you to store and search multiple vector embeddings for a single item. This capability is particularly valuable when working with late-interaction models like ColBERT and ColPaLi, which generate multiple embeddings per document. In this tutorial, you’ll create a table with multiple vector embeddings per document and learn how to perform multivector search. For more end-to-end examples, see the VectorDB recipes repository.
Multivector Support
Each item in your dataset can have a column containing multiple vectors, which LanceDB can efficiently index and search. When performing a search, you can query with either a single vector embedding or multiple vector embeddings.
Computing Similarity
MaxSim (Maximum Similarity) is a key concept in late-interaction models that:
- Computes the maximum similarity between each query embedding and all document embeddings
- Sums these maximum similarities to get the final relevance score
- Effectively captures fine-grained semantic matches between query and document tokens
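The scoring rule above can be sketched in a few lines of plain Python. This is only an illustration of the MaxSim idea using cosine similarity, not LanceDB's internal implementation; the helper names `cosine` and `maxsim` and the toy vectors are made up for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def maxsim(query_vectors, doc_vectors):
    """For each query vector, take its best match among the document
    vectors, then sum those per-query maxima."""
    return sum(
        max(cosine(q, d) for d in doc_vectors)
        for q in query_vectors
    )


# Toy data (hypothetical): two query vectors, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0]]  # has a perfect match for each query vector
doc_b = [[1.0, 1.0]]              # one vector, 45 degrees from both query vectors

score_a = maxsim(query, doc_a)  # 1.0 + 1.0 = 2.0
score_b = maxsim(query, doc_b)  # about 0.707 + 0.707 = 1.414
```

Here `doc_a` scores higher because every query vector finds an exact directional match, which is the fine-grained, per-embedding matching that MaxSim is designed to reward.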
Using Multivector Search
1. Setup
Connect to LanceDB and import the required libraries.
2. Define Schema
Define a schema that specifies a multivector field. A multivector field is a nested list structure in which each document contains multiple vectors. In this case, we’ll create a schema with:
- An ID field as an integer (int64)
- A vector field that is a list of lists of float32 values
  - The outer list represents multiple vectors per document
  - Each inner list is a 256-dimensional vector
  - Using float32 for memory efficiency while maintaining precision
3. Generate Multivectors
Generate sample data where each document contains multiple vector embeddings, which can represent different aspects or views of the same document. In this example, we create 1024 documents where each document has 2 random vectors of dimension 256, simulating a real-world scenario where you might have multiple embeddings per item.
4. Create a Table
Create a table with the defined schema and sample data, which will store multiple vectors per document for similarity search.
5. Build an Index
Only cosine similarity is supported for multivector search operations. For faster search, build the standard IVF_PQ index over your vectors.