What this tutorial shows
If you are using NVIDIA RAG Blueprints and want to evaluate LanceDB in that stack, this tutorial gives you a concrete starting point. It shows how to use LanceDB as the retrieval layer for a Docker-based NVIDIA RAG deployment through a small, script-driven reference integration: LanceDB OSS is embedded directly in the NVIDIA containers, the collection is prepared ahead of time, and the RAG server retrieves from it for search and generation. The example is intentionally retrieval-only, but it also includes hybrid search and reranker selection, so you can see how LanceDB fits into a realistic NVIDIA retrieval workflow.

How NVIDIA organizes vector databases
NVIDIA’s RAG Blueprint documentation effectively describes three patterns for vector database support.
- There are built-in backends such as Milvus, where NVIDIA already owns both ingestion and retrieval.
- There are built-in alternatives such as Elasticsearch, where NVIDIA still owns the end-to-end flow but switches the backend through configuration.
- Then, there is the custom vector database path, where you implement a `VDBRag` backend yourself and register it in NVIDIA’s factory.
Deployment model
This reference integration uses LanceDB OSS as an embedded retrieval library, not as a separate database service. In practice, `APP_VECTORSTORE_NAME` is set to `lancedb`, `APP_VECTORSTORE_URL` points to a local filesystem path inside the NVIDIA containers, the LanceDB collection is prepared ahead of time, and the NVIDIA RAG server loads the LanceDB adapter to retrieve directly from that local dataset.
What the recipe contains
The recipe at `examples/nvidia-rag-blueprint-lancedb`
is organized around a small number of practical pieces. The data-prep script builds a demo
LanceDB collection from scratch, generates embeddings through the LanceDB embedding registry, and
creates a full-text index so hybrid retrieval works immediately. The adapter file shows the
retrieval-only integration point for NVIDIA RAG Blueprint, while the Docker override and NVIDIA
change guide show the minimal configuration and source changes needed to run the example against
NVIDIA’s containers.
End-to-end flow
1. Prepare the LanceDB collection
From the recipe directory, run the data-prep script, `prepare_lancedb.py`. It produces:
- a local LanceDB dataset under `data/`
- a collection named `nvidia_blueprint_demo`
- automatic embeddings generated at ingest time
- an FTS index for hybrid search
2. Patch the NVIDIA blueprint
Follow the instructions in the recipe’s `nvidia_blueprint_changes.md`.
The essential changes are:
- add LanceDB dependencies to the NVIDIA environment
- copy `lancedb_vdb.py` into the NVIDIA source tree
- register the `lancedb` branch in NVIDIA’s VDB factory
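The registration step might look roughly like the following. This is a hypothetical sketch: NVIDIA's real factory lives in its own source tree with its own module and function names, and `make_vdb_backend` and `LanceDBRag` are illustrative stand-ins; `lancedb_vdb` is the adapter file copied in the previous step.

```python
# Hypothetical sketch of the lancedb branch in NVIDIA's VDB factory.
# Function and class names are illustrative, not NVIDIA's actual code.
def make_vdb_backend(name: str, url: str):
    if name == "lancedb":
        # lancedb_vdb.py is the adapter copied into the NVIDIA source tree;
        # the URL is the local filesystem path of the prepared dataset.
        from lancedb_vdb import LanceDBRag
        return LanceDBRag(uri=url)
    raise ValueError(f"unsupported vector store: {name}")
```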
3. Start the Docker deployment
Set the absolute path to the recipe directory, then configure the deployment with:
- `APP_VECTORSTORE_NAME=lancedb`
- `APP_VECTORSTORE_URL=/opt/lancedb-recipe/data`
- `COLLECTION_NAME=nvidia_blueprint_demo`
- `APP_VECTORSTORE_SEARCHTYPE=hybrid`
- `LANCEDB_RERANKER=mrr`
Verifying the integration
Search
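A quick retrieval smoke test can be made against the RAG server's search endpoint. The host, port, and path below are assumptions based on common NVIDIA RAG Blueprint deployments, not values taken from this recipe; check your compose file and the rag-server API docs for the actual endpoint.

```shell
# Hypothetical search call; adjust host, port, and path to your deployment.
curl -s -X POST http://localhost:8081/v1/search \
  -H 'Content-Type: application/json' \
  -d '{"query": "What does the demo collection contain?", "top_k": 4}'
```

A healthy response should return chunks drawn from the `nvidia_blueprint_demo` collection configured above.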
Generate
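Generation can be checked the same way. The endpoint path and the payload shape below are assumptions (an OpenAI-style `messages` list, as used by recent blueprint versions); verify both against your rag-server's API reference before relying on them.

```shell
# Hypothetical generate call; verify the payload schema against your
# deployment's rag-server API docs.
curl -s -X POST http://localhost:8081/v1/generate \
  -H 'Content-Type: application/json' \
  -d '{
        "messages": [{"role": "user", "content": "Summarize the demo documents."}],
        "use_knowledge_base": true
      }'
```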
Hybrid retrieval and rerankers
This example is meant to prove more than a trivial vector lookup.
- LanceDB hybrid retrieval combines vector search with full-text search
- the recipe creates the FTS index as part of dataset prep
- the adapter supports `RRFReranker`, `MRRReranker`, and `CrossEncoderReranker`
- the default example uses `MRRReranker`, not a plain weighted linear combination
How this can be extended
The current example follows NVIDIA’s custom retrieval-only backend path. In practice, that means the LanceDB collection is created ahead of time and NVIDIA RAG Blueprint is then pointed at that existing collection for search and generation. The sample data in `prepare_lancedb.py` exists
only to make that flow runnable end to end: it creates a small local collection, inserts a few
documents, generates embeddings, and builds an FTS index so the NVIDIA side has something real to
query.
A fuller integration is possible. NVIDIA’s custom `VDBRag` interface also supports the pattern used
by built-in backends such as Milvus and Elasticsearch, where NVIDIA owns both ingestion and
retrieval. To make LanceDB work that way, a complete LanceDB backend would need to implement the
ingestion methods NVIDIA documents, especially `create_collection` and `write_to_index`, along with
the retrieval and collection-management methods expected by the rest of the stack.
The open work is in defining how NVIDIA’s ingestor should write
records into LanceDB, how that storage is shared between the ingestor and the RAG server, and how
document and collection metadata should be exposed so the broader NVIDIA APIs behave correctly.
Until those pieces exist, this example should be read as: prepare LanceDB first, then let NVIDIA
retrieve from it.