Embedding function registry
You can get a supported embedding function from the registry and then use it in your table schema. Once configured, the embedding function automatically generates embeddings when you insert data into the table. When you query the table, you can provide a query string or other input, and the embedding function generates an embedding for it.
Using an embedding function
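The registry flow (look up a function by name, attach it to a table, embed on insert and again on query) can be pictured with a toy model. This is a standard-library sketch, not LanceDB code: `TOY_REGISTRY`, `ToyTable`, and `toy_embed` are hypothetical names, and the letter-count "embedding" is only a stand-in for a real model.

```python
import math
from typing import Callable, Dict, List

Vec = List[float]
Embedder = Callable[[str], Vec]


def toy_embed(text: str) -> Vec:
    """Stand-in for a real model: unit-length letter-frequency vector (a-z)."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


# Hypothetical registry: maps a name to an embedding function.
TOY_REGISTRY: Dict[str, Embedder] = {"toy-letters": toy_embed}


class ToyTable:
    """Holds rows and embeds text with the function it was configured with."""

    def __init__(self, embedder: Embedder) -> None:
        self.embedder = embedder
        self.rows: List[dict] = []

    def add(self, texts: List[str]) -> None:
        # Insert path: the vector column is filled in automatically.
        for text in texts:
            self.rows.append({"text": text, "vector": self.embedder(text)})

    def search(self, query: str, limit: int = 1) -> List[dict]:
        # Query path: the query string is embedded with the same function.
        q = self.embedder(query)

        def score(row: dict) -> float:
            # Dot product of unit vectors, i.e. cosine similarity.
            return sum(a * b for a, b in zip(q, row["vector"]))

        return sorted(self.rows, key=score, reverse=True)[:limit]


table = ToyTable(TOY_REGISTRY["toy-letters"])
table.add(["hello world", "goodbye world"])
best = table.search("hello world")[0]
```

The caller only ever passes text. Because the table owns the embedding function, the same function is applied on both the insert path and the query path, which is what keeps stored vectors and query vectors comparable.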
The .create() method accepts several arguments to configure the embedding function’s behavior. max_retries is a special argument that applies to all providers.
| Argument | Type | Description |
|---|---|---|
| name | str | The name of the model to use (e.g., text-embedding-3-small). |
| max_retries | int | The maximum number of times to retry on a failed API request. Defaults to 7. |
| Argument | Type | Description |
|---|---|---|
| batch_size | int | The number of inputs to process in a single batch. Provider-specific. |
| api_key | str | The API key for the embedding provider. Can also be set via environment variables. |
| device | str | The device to run the model on (e.g., "cpu", "cuda"). Defaults to automatic detection. |
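To make max_retries and batch_size more concrete, here is a standard-library sketch of the control flow those two arguments suggest. It is not LanceDB's implementation: `embed_in_batches` and `flaky_provider` are hypothetical names, and reading max_retries as "retries after the first attempt" is an assumption, as is the choice of `ConnectionError` as the retryable failure.

```python
from typing import Callable, List, Sequence

Vec = List[float]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vec]],
    batch_size: int = 2,
    max_retries: int = 7,
) -> List[Vec]:
    """Embed texts batch by batch, retrying a failed batch up to max_retries times."""
    vectors: List[Vec] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        for attempt in range(max_retries + 1):  # first try plus the retries
            try:
                vectors.extend(embed_batch(batch))
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise  # retry budget exhausted, surface the error
                # A real client would typically wait (with backoff) here.
    return vectors


calls = {"count": 0}


def flaky_provider(batch: List[str]) -> List[Vec]:
    """Hypothetical provider: fails on its first call, then succeeds."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("simulated transient failure")
    # One-dimensional placeholder "embedding": the text length.
    return [[float(len(text))] for text in batch]


result = embed_in_batches(
    ["a", "bb", "ccc"], flaky_provider, batch_size=2, max_retries=7
)
```

In this run the first batch fails once and is retried, so the provider is called three times for two batches. A larger batch size means fewer provider calls, but each failure then costs a larger retry.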
Embedding model providers
LanceDB supports most popular embedding providers.
Text embeddings
| Provider | Model ID | Default Model |
|---|---|---|
| OpenAI | openai | text-embedding-ada-002 |
| Sentence Transformers | sentence-transformers | all-MiniLM-L6-v2 |
| Hugging Face | huggingface | colbert-ir/colbertv2.0 |
| Cohere | cohere | embed-english-v3.0 |
| … | … | … |
Multimodal embedding
| Provider | Model ID | Supported Inputs |
|---|---|---|
| OpenCLIP | open-clip | Text, Images |
| ImageBind | imagebind | Text, Images, Audio, Video |
| … | … | … |
Embedding function on LanceDB Cloud
When you use embedding functions with LanceDB Cloud, embeddings are generated on the client side at ingestion time and then stored in the cloud. Model inference on the cloud side is not yet supported, so query embeddings are not generated automatically during search. Instead, generate the embedding for each query manually with the same embedding function and pass the resulting vector to the search function.
Custom Embedding Functions
You can implement your own embedding function by inheriting from TextEmbeddingFunction (for text) or EmbeddingFunction (for multimodal data).
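As a rough illustration of what a custom text embedder has to provide, the sketch below assumes the subclass contract is an ndims() method that reports the vector length plus a generate_embeddings(texts) method that returns one vector per input. That contract is an assumption here, not something stated above. `HashingTextEmbedder` is a hypothetical name, and it is written as a plain class so it runs with the standard library alone; in LanceDB it would inherit from TextEmbeddingFunction.

```python
import hashlib
import math
from typing import List, Sequence


class HashingTextEmbedder:
    """Toy text embedder built on SHA-256 bytes (carries no semantic meaning).

    Plain class for illustration; a real custom function would inherit
    from TextEmbeddingFunction instead.
    """

    def __init__(self, dim: int = 8) -> None:
        if not 1 <= dim <= 32:
            raise ValueError("dim must be 1..32 (SHA-256 yields 32 bytes)")
        self.dim = dim

    def ndims(self) -> int:
        # Vector length, needed to size the vector column.
        return self.dim

    def generate_embeddings(self, texts: Sequence[str]) -> List[List[float]]:
        # One fixed-length vector per input string, in input order.
        return [self._embed_one(text) for text in texts]

    def _embed_one(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        raw = [byte / 255.0 for byte in digest[: self.dim]]
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        return [x / norm for x in raw]  # scaled to unit length


embedder = HashingTextEmbedder(dim=8)
vectors = embedder.generate_embeddings(["hello world", "goodbye world"])
```

The two properties worth preserving in a real implementation are determinism (the same text always maps to the same vector) and a fixed dimensionality that matches what ndims() reports.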