Example: Hybrid Search
1. Setup
Import the necessary libraries and dependencies for working with LanceDB, OpenAI embeddings, and reranking.
2. Connect to LanceDB Cloud
Establish a connection to your LanceDB instance, with different options for Cloud, Enterprise, and Open Source deployments.
3. Configure Embedding Model
Set up the embedding model that converts text into vector representations for semantic search.
4. Create Table & Schema
Define the data structure for your documents, including both the text content and its vector representation.
5. Add Data
Insert sample documents into your table, which will be used for both semantic and keyword search.
6. Build Full Text Index
Create a full-text search index on the text column to enable keyword-based search capabilities.
7. Set Reranker [Optional]
Initialize the reranker that will combine and rank results from both semantic and keyword search. By default, LanceDB uses the RRF reranker, but you can choose other rerankers such as Cohere, CrossEncoder, or others listed in the integrations section.
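To make the default reranker less of a black box, here is a minimal pure-Python sketch of reciprocal rank fusion (RRF) as the technique is generally described. It is an illustration, not LanceDB's implementation: the function name `reciprocal_rank_fusion`, the sample document IDs, and the constant `k=60` (a commonly used value) are all assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into (doc_id, score) pairs.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. `k` is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc_id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a vector search and a keyword search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that rank well in both lists (`doc_a` and `doc_b` here) rise above documents that appear in only one, which is why RRF works without comparing the raw vector and keyword scores directly.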
8. Hybrid Search
Perform a hybrid search query that combines semantic similarity with keyword matching, using the specified reranker to merge and rank the results.
9. Hybrid Search - Explicit Vector and Text Query pattern
You can also pass the vector and text query explicitly. This is useful if you’re not using the embedding API or if you’re using a separate embedder service.
More on Reranking
You can perform hybrid search in LanceDB by combining the results of semantic and full-text search via a reranking algorithm of your choice. LanceDB comes with built-in rerankers, and you can also implement your own custom reranker. By default, LanceDB uses `RRFReranker()`, which computes a reciprocal rank fusion score to combine and rerank the results of semantic and full-text search. You can customize its hyperparameters as needed. Here’s how you can use any of the available rerankers:
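As a hedged sketch, the helper below wraps the hybrid query pattern so that a reranker is applied only when one is supplied. The helper name `hybrid_search` and its parameters are hypothetical. It assumes `table` is an opened LanceDB table whose `search(..., query_type="hybrid")` call returns a query builder with `rerank`, `limit`, and `to_pandas` methods; check the API reference for your installed version before relying on these signatures.

```python
def hybrid_search(table, query, reranker=None, normalize="score", limit=10):
    """Run a hybrid query on `table`, optionally with a non-default reranker.

    `table` is assumed to be a LanceDB table with both a vector column
    and a full-text index on the searched text column.
    """
    builder = table.search(query, query_type="hybrid")
    if reranker is not None:
        # For example: reranker = lancedb.rerankers.RRFReranker()
        builder = builder.rerank(reranker=reranker, normalize=normalize)
    return builder.limit(limit).to_pandas()
```

Calling `hybrid_search(table, "flower moon")` leaves the default reranker in place, while passing `reranker=...` swaps in any other reranker object.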
| Argument | Type | Default | Description |
|---|---|---|---|
| `normalize` | `str` | `"score"` | The method used to normalize the scores. Can be `rank` or `score`. If `rank`, the scores are converted to ranks and then normalized. If `score`, the scores are normalized directly. |
| `reranker` | `Reranker` | `RRFReranker()` | The reranker to use. If not specified, the default reranker is used. |
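The difference between the two `normalize` modes can be illustrated with a small pure-Python sketch. The table does not give the exact normalization formula, so min-max scaling is an assumption here, and the function names `min_max` and `normalize_scores` are hypothetical. The point is only to show how rank-based normalization discards score magnitudes while score-based normalization keeps them.

```python
def min_max(values):
    """Scale values linearly into [0, 1]; constant input maps to all 1.0."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalize_scores(scores, method="score"):
    """Normalize relevance scores (higher is better) by 'score' or 'rank'."""
    if method == "score":
        return min_max(scores)
    if method == "rank":
        # Rank 1 goes to the highest score, rank 2 to the next, and so on.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        ranks = [0] * len(scores)
        for position, index in enumerate(order, start=1):
            ranks[index] = position
        # Negate so the best rank ends up at 1.0 after scaling.
        return min_max([-r for r in ranks])
    raise ValueError(f"unknown method: {method!r}")


raw = [10.0, 9.0, 1.0]  # hypothetical scores with one clear outlier
by_score = normalize_scores(raw, method="score")
by_rank = normalize_scores(raw, method="rank")
print(by_score)
print(by_rank)
```

With `score`, the second result stays close to the first because their raw scores are close. With `rank`, the results are evenly spaced regardless of how far apart the raw scores were.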