Step 1: Import Required Libraries
First, import the necessary LanceDB components:
- lancedb: the main module for database connections and operations
- LanceModel: a Pydantic model for defining table schemas
- Vector: a field type for storing vector embeddings
- get_registry(): access to the embedding function registry, which holds all supported embedding functions as well as any custom ones registered by the user
Step 2: Connect to LanceDB Cloud
Establish a connection to your LanceDB instance:
Step 3: Initialize the Embedding Function
Choose and configure your embedding model:
- Change "sentence-transformers" to another provider, such as "openai" or "cohere".
- Modify the model name to use a different embedding model.
- Set device="cuda" for GPU acceleration, if available.
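To make the registry idea concrete, here is a toy, stdlib-only model of the lookup pattern: providers are registered under a name, looked up with get_registry().get(name), and instantiated with create(...) keyword arguments such as a model name and a device. The get(...).create(...) chain mirrors how LanceDB's registry is called, but everything below (ToyRegistry, the "toy-hash" provider, HashEmbeddings and its bag-of-characters vectors) is hypothetical and is not LanceDB's implementation.

```python
from typing import Callable, Dict, List

# Hypothetical toy registry: illustrates the lookup pattern only,
# it is NOT LanceDB's implementation.
_REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records an embedding provider under a name."""
    def wrap(cls: type) -> type:
        _REGISTRY[name] = cls
        return cls
    return wrap


class ToyRegistry:
    def get(self, name: str) -> type:
        # Raises KeyError for providers that were never registered.
        return _REGISTRY[name]


def get_registry() -> ToyRegistry:
    return ToyRegistry()


@register("toy-hash")
class HashEmbeddings:
    """Hypothetical provider: maps text to a fixed-size bag-of-characters vector."""

    def __init__(self, name: str = "toy-8d", device: str = "cpu"):
        self.name = name
        self.device = device  # recorded only; this toy never uses a GPU

    @classmethod
    def create(cls, **kwargs) -> "HashEmbeddings":
        return cls(**kwargs)

    def ndims(self) -> int:
        return 8

    def compute(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.ndims()
            for ch in text.lower():
                vec[ord(ch) % self.ndims()] += 1.0  # count characters per bucket
            vectors.append(vec)
        return vectors


model = get_registry().get("toy-hash").create(name="toy-8d", device="cpu")
print(model.ndims(), model.device)  # prints: 8 cpu
```

Swapping providers in this toy means changing the string passed to get(...), which is the same knob the step above describes for "sentence-transformers", "openai", or "cohere".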
Step 4: Define Your Schema
Create a Pydantic model that defines your table structure:
- SourceField(): marks the field whose content will be embedded
- VectorField(): marks the field that stores the embeddings
- model.ndims(): sets the vector dimensions to match your embedding model
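In LanceDB these roles are declared on a LanceModel subclass using the embedding function's SourceField() and VectorField(). As a stdlib-only illustration of what the two markers and ndims() communicate, the sketch below tags dataclass fields with a hypothetical "role" in their metadata and fixes the vector size with a hypothetical NDIMS constant standing in for model.ndims(). It is a model of the idea, not LanceDB's schema machinery.

```python
from dataclasses import dataclass, field, fields
from typing import List

NDIMS = 4  # hypothetical; stands in for model.ndims()


def source_field():
    # Marks the column whose text gets embedded (the role of SourceField()).
    return field(metadata={"role": "source"})


def vector_field():
    # Marks the column that receives the embedding (the role of VectorField()).
    return field(default_factory=list, metadata={"role": "vector", "ndims": NDIMS})


@dataclass
class Document:
    text: str = source_field()
    vector: List[float] = vector_field()


def roles(schema) -> dict:
    """Return {role: column_name} for the marked columns of a dataclass schema."""
    return {f.name if False else f.metadata["role"]: f.name
            for f in fields(schema) if "role" in f.metadata}


def check_dims(doc: Document) -> bool:
    """True if the vector column holds exactly NDIMS values."""
    return len(doc.vector) == NDIMS


print(roles(Document))  # prints: {'source': 'text', 'vector': 'vector'}
```

The fixed dimension matters because every stored vector, and every query vector later, must have the same length for similarity comparisons to make sense.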
Step 5: Create Table and Ingest Data
Create a table with your schema and add data:
The table.add() call automatically:
- Takes the text from each document
- Generates embeddings using your chosen model
- Stores both the original text and the vector embeddings
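The three steps table.add() performs can be modeled in a few lines. This is a toy, in-memory sketch: ToyTable and the bag-of-characters embed() function are hypothetical stand-ins for a LanceDB table and a real embedding model, chosen only so the example runs with the standard library.

```python
from typing import Dict, List


def embed(text: str, ndims: int = 8) -> List[float]:
    """Hypothetical stand-in for an embedding model: bag-of-characters counts."""
    vec = [0.0] * ndims
    for ch in text.lower():
        vec[ord(ch) % ndims] += 1.0
    return vec


class ToyTable:
    """In-memory table that embeds the source column on add (toy model only)."""

    def __init__(self, source: str = "text", vector: str = "vector"):
        self.source, self.vector = source, vector
        self.rows: List[Dict] = []

    def add(self, docs: List[Dict]) -> None:
        for doc in docs:
            row = dict(doc)                              # take the original fields
            row[self.vector] = embed(doc[self.source])   # generate the embedding
            self.rows.append(row)                        # store text and vector together


table = ToyTable()
table.add([{"text": "hello world"}, {"text": "goodbye"}])
print(len(table.rows), len(table.rows[0]["vector"]))  # prints: 2 8
```

The caller only supplies text; the vector column is filled in by add(), which is the convenience the step above describes.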
Step 6: Query with Automatic Embedding
Note: On LanceDB Cloud, automatic query embedding is not supported, so you need to pass the query's embedding vector directly.
Search your data using natural language queries. The search:
- Automatically converts your query text to an embedding
- Finds the most similar vectors in your table
- Returns the matching documents
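A stdlib-only sketch of the search flow: a text query is embedded first, then rows are ranked by similarity to the query vector. Passing a precomputed vector instead of a string skips the embedding step, which corresponds to what the note above asks for on LanceDB Cloud. The embed() function, the cosine-similarity ranking, and search() are hypothetical stand-ins; LanceDB's own metric and indexing may differ.

```python
import math
from typing import Dict, List, Union


def embed(text: str, ndims: int = 8) -> List[float]:
    """Hypothetical stand-in for an embedding model: bag-of-characters counts."""
    vec = [0.0] * ndims
    for ch in text.lower():
        vec[ord(ch) % ndims] += 1.0
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(rows: List[Dict], query: Union[str, List[float]], k: int = 1) -> List[str]:
    # A string is embedded first (automatic query embedding);
    # a list of floats is used as-is (manual query embedding).
    qvec = embed(query) if isinstance(query, str) else query
    ranked = sorted(rows, key=lambda r: cosine(qvec, r["vector"]), reverse=True)
    return [r["text"] for r in ranked[:k]]


docs = ["hello world", "goodbye", "hello there"]
rows = [{"text": t, "vector": embed(t)} for t in docs]
print(search(rows, "hello"))  # prints: ['hello world']
```

Calling search(rows, embed("goodbye")) passes the vector directly and returns ['goodbye'], since that row's vector is identical to the query.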