The `sie-lancedb` package registers SIE as a first-class embedding function in LanceDB’s embeddings registry, so embeddings are computed automatically on insert and search. You need a running SIE instance; see the Superlinked quickstart for deployment options.
Installation
Registered functions
Importing `sie_lancedb` registers two embedding functions in LanceDB’s registry:
| Name | Purpose |
|---|---|
| "sie" | Dense text embeddings |
| "sie-multivector" | ColBERT-style late interaction with MaxSim scoring |
Parameters accepted by `.create()`:
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Any of 85+ SIE-supported models (e.g. `BAAI/bge-m3`, `NovaSearch/stella_en_400M_v5`, `jinaai/jina-colbert-v2`) |
| `base_url` | str | URL of the SIE endpoint (e.g. `http://localhost:8080`) |
Usage
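Declare the table schema with the embedding function’s `SourceField` / `VectorField` declarations; LanceDB then embeds the source column on insert and the query text on search.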
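The mechanism is small enough to sketch without LanceDB or SIE. The block below is a toy stand-in, not the LanceDB or `sie-lancedb` API: `REGISTRY`, `register`, `ToyEmbedding`, and `ToyTable` are hypothetical names, and a SHA-256 hash replaces the call to the SIE endpoint so the sketch runs offline. It shows the three steps the integration automates: looking a function up by name, configuring it with `.create(model=..., base_url=...)`, and embedding the source column on insert and the query text on search.

```python
import hashlib
import math
from typing import Callable, Dict, List

# Hypothetical toy registry: maps a string key to an embedding-function class.
REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Store a class under a string key, the way "sie" is a registry key."""
    def wrap(cls: type) -> type:
        REGISTRY[name] = cls
        return cls
    return wrap


@register("toy-sie")
class ToyEmbedding:
    """Hypothetical stand-in; a real function would call the SIE endpoint."""

    def __init__(self, model: str, base_url: str, dim: int = 8) -> None:
        # Only recorded here; nothing is sent to base_url.
        self.model, self.base_url, self.dim = model, base_url, dim

    @classmethod
    def create(cls, **kwargs) -> "ToyEmbedding":
        return cls(**kwargs)

    def embed(self, texts: List[str]) -> List[List[float]]:
        out = []
        for t in texts:
            # Deterministic fake embedding: first `dim` hash bytes, L2-normalized.
            digest = hashlib.sha256(t.encode()).digest()
            vec = [b / 255.0 for b in digest[: self.dim]]
            norm = math.sqrt(sum(x * x for x in vec)) or 1.0
            out.append([x / norm for x in vec])
        return out


class ToyTable:
    """Embeds the source column on add() and the query string on search()."""

    def __init__(self, func: ToyEmbedding, source: str = "text") -> None:
        self.func, self.source = func, source
        self.rows: List[dict] = []

    def add(self, rows: List[dict]) -> None:
        vectors = self.func.embed([r[self.source] for r in rows])
        for row, vec in zip(rows, vectors):
            self.rows.append({**row, "vector": vec})

    def search(self, query: str, k: int = 3) -> List[dict]:
        q = self.func.embed([query])[0]
        # Dot product of unit vectors = cosine similarity; highest first.
        scored = [(sum(a * b for a, b in zip(q, r["vector"])), r) for r in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [r for _, r in scored[:k]]


func = REGISTRY["toy-sie"].create(model="BAAI/bge-m3", base_url="http://localhost:8080")
table = ToyTable(func)
table.add([{"text": "hello"}, {"text": "world"}])
hits = table.search("hello", k=1)
```

In real code the function would presumably come from LanceDB’s registry (`get_registry().get("sie").create(...)` via `lancedb.embeddings`), and the schema, not a hand-written table class, would decide which column gets embedded.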
Hybrid search with reranker
SIEReranker plugs into LanceDB’s hybrid search pipeline. It uses SIE’s cross-encoder score() to rerank combined vector + full-text search results. You need a full-text search index on the column first:
Attach the reranker to a hybrid query with `.rerank()`.
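To make the two-stage flow concrete, here is a pure-Python sketch of what a hybrid reranker does with its inputs: union the vector and full-text candidates, score every (query, document) pair in one batch, and keep the top k. It is not `SIEReranker`’s implementation. `overlap_score` is a hypothetical stand-in for SIE’s cross-encoder `score()` (it only counts shared tokens), and the `id` / `text` column names are assumptions.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def rerank_hybrid(
    query: str,
    vector_hits: List[Row],
    fts_hits: List[Row],
    score: Callable[[str, List[str]], List[float]],
    text_column: str = "text",
    k: int = 5,
) -> List[Row]:
    # Union both candidate lists, keeping the first occurrence of each id.
    merged: Dict[object, Row] = {}
    for row in vector_hits + fts_hits:
        merged.setdefault(row["id"], row)
    candidates = list(merged.values())
    # One batched call scores every (query, document) pair on a single scale.
    scores = score(query, [str(r[text_column]) for r in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    return [{**row, "_relevance_score": s} for s, row in ranked[:k]]


def overlap_score(query: str, docs: List[str]) -> List[float]:
    """Hypothetical cross-encoder stand-in: fraction of query tokens in each doc."""
    q = set(query.lower().split())
    return [len(q & set(d.lower().split())) / (len(q) or 1) for d in docs]


vector_hits = [{"id": 1, "text": "lancedb stores vectors"}, {"id": 2, "text": "cats and dogs"}]
fts_hits = [{"id": 2, "text": "cats and dogs"}, {"id": 3, "text": "hybrid search in lancedb"}]
top = rerank_hybrid("hybrid search lancedb", vector_hits, fts_hits, overlap_score, k=2)
```

Rescoring is what the second stage adds: vector similarities and full-text scores are computed independently and are not directly comparable, while a single pairwise scorer ranks every candidate on one scale.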
ColBERT / multivector
SIEMultiVectorEmbeddingFunction (registered as "sie-multivector") works with LanceDB’s native MultiVector type and MaxSim scoring for ColBERT and ColPali models:
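MaxSim, the scoring rule behind `"sie-multivector"`, is short to state in code: each query token vector is matched against its most similar document token vector, and those per-token maxima are summed. The sketch below computes it in plain Python with cosine similarity; the 2-D token vectors are made up, and LanceDB computes this natively, so this is only for intuition.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the best match in the doc."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(sum(a * b for a, b in zip(qt, dt)) for dt in d) for qt in q)


query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # one token per query "aspect"
doc_b = [[1.0, 1.0]]              # a single blended token
score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
```

Because each query token picks its own best match, `doc_a` (which has a close match for every query token) outscores `doc_b`, whose single blended token only partially matches each one.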
Entity extraction
`SIEExtractor` adds entity extraction to LanceDB’s data-enrichment workflows. It extracts entities from a text column and merges the results back as a structured Arrow column, which enables filtered search on the extracted entities:
The `entities` column stores structured Arrow data (`list<struct<text, label, score, start, end, bbox>>`), so you can filter on extracted entities in queries.
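The sketch below shows that row shape as plain Python dicts and a filter over it: keep rows that contain at least one entity with a given label above a score threshold. It does not touch Arrow or LanceDB; in practice the filter would run against the Arrow column. The labels (`"person"`, `"location"`), scores, and sample texts are invented for illustration, and `bbox` is simply left as `None`.

```python
from typing import List, Optional, TypedDict


class Entity(TypedDict):
    # Mirrors the struct fields of the entities column.
    text: str
    label: str
    score: float
    start: Optional[int]
    end: Optional[int]
    bbox: Optional[List[float]]


def rows_with_entity(rows: List[dict], label: str, min_score: float = 0.5) -> List[dict]:
    """Keep rows with at least one entity of `label` scoring >= min_score."""
    return [
        row for row in rows
        if any(e["label"] == label and e["score"] >= min_score for e in row["entities"])
    ]


# Made-up sample rows; start/end are character offsets into `text`.
rows = [
    {"id": 1, "text": "Ada Lovelace worked in London.", "entities": [
        Entity(text="Ada Lovelace", label="person", score=0.97, start=0, end=12, bbox=None),
        Entity(text="London", label="location", score=0.91, start=23, end=29, bbox=None),
    ]},
    {"id": 2, "text": "The meeting is on Friday.", "entities": []},
    {"id": 3, "text": "Grace Hopper gave a talk.", "entities": [
        Entity(text="Grace Hopper", label="person", score=0.4, start=0, end=12, bbox=None),
    ]},
]

people = rows_with_entity(rows, "person")
```

Keeping `score` alongside each entity lets the threshold be chosen at query time instead of being fixed when the extraction runs.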