We support lightning-fast vector search on massive-scale data. The following performance data shows search latency on a 1M dataset with a warmed-up cache.

Percentile    Latency
P50           25ms
P90           26ms
P99           35ms
Max           49ms

Beyond latency, you can tune the following parameters to improve search quality.

  • nprobes: the number of partitions to search (probe)
  • refine factor: a multiplier to control how many additional rows are taken during the refine step
  • distance range: return only vectors whose distance to the query falls within a given range
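
As a rough illustration of how these parameters fit together, the sketch below chains them onto a single query. It assumes a recent LanceDB Python client in which the query builder exposes nprobes(), refine_factor(), and distance_range(); the helper name tuned_search and its default values are hypothetical placeholders, not recommendations.

```python
def tuned_search(table, query_vector, nprobes=20, refine_factor=10,
                 max_distance=None, limit=5):
    """Run a vector search with the tuning parameters listed above.

    `table` is assumed to be an already-opened LanceDB table. The default
    values are illustrative only and should be tuned per dataset.
    """
    query = (
        table.search(query_vector)
        .nprobes(nprobes)              # number of partitions to probe
        .refine_factor(refine_factor)  # multiplier for rows taken in the refine step
    )
    if max_distance is not None:
        # Restrict results to vectors within the given distance of the query
        query = query.distance_range(upper_bound=max_distance)
    return query.limit(limit).to_pandas()
```

Higher nprobes and refine factor values generally trade latency for search quality, so it may be worth measuring both when changing them.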

Metadata filtering can be combined with vector search, with query latency as low as 65ms on a 15M dataset. See our benchmark tests for more details.

import lancedb
from datasets import load_dataset

# Connect to LanceDB
db = lancedb.connect(
  uri="db://your-project-slug",
  api_key="your-api-key",
  region="us-east-1"
)

# Load query vector from dataset
query_dataset = load_dataset("sunhaozhepy/ag_news_sbert_keywords_embeddings", split="test[5000:5001]")
print(f"Query keywords: {query_dataset[0]['keywords']}")
query_embed = query_dataset["keywords_embeddings"][0]

# Open table and perform search
table_name = "lancedb-cloud-quickstart"
table = db.open_table(table_name)

# Vector search with filters (pre-filtering is the default)
search_results = (
    table.search(query_embed)
    .where("label > 2")
    .select(["text", "keywords", "label"])
    .limit(5)
    .to_pandas()
)

print("Search results (with pre-filtering):")
print(search_results)

By default, the filter is applied before the vector search (pre-filtering). This narrows the search space, which can reduce query latency on very large datasets. Post-filtering is also available: it applies the filter to the results returned by the vector search. You can use post-filtering as follows:

results_post_filtered = (
    table.search(query_embed)
    .where("label > 1", prefilter=False)
    .select(["text", "keywords", "label"])
    .limit(5)
    .to_pandas()
)

print("Vector search results with post-filter:")
print(results_post_filtered)
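
To make the difference between the two modes concrete, here is a toy brute-force sketch in plain Python. It is not how LanceDB executes queries; the rows, the query vector, and the helper names are all made up for illustration. It shows that post-filtering can return fewer rows than the requested limit, because the filter runs only on the nearest neighbors already selected.

```python
import math

# Hypothetical toy data: five rows with a label and a 2-D vector
rows = [
    {"id": 0, "label": 1, "vector": [0.0, 0.1]},
    {"id": 1, "label": 3, "vector": [0.9, 0.9]},
    {"id": 2, "label": 1, "vector": [0.05, 0.0]},
    {"id": 3, "label": 3, "vector": [0.2, 0.2]},
    {"id": 4, "label": 2, "vector": [0.1, 0.1]},
]
query = [0.0, 0.0]

def nearest(candidates, k):
    """Return the k candidates closest to the query (Euclidean distance)."""
    return sorted(candidates, key=lambda r: math.dist(r["vector"], query))[:k]

def keep(row):
    """The metadata filter, equivalent to "label > 2"."""
    return row["label"] > 2

def prefilter_search(k):
    # Filter first, then take the k nearest among the surviving rows
    return nearest([r for r in rows if keep(r)], k)

def postfilter_search(k):
    # Take the k nearest overall, then drop rows that fail the filter
    return [r for r in nearest(rows, k) if keep(r)]

pre_ids = [r["id"] for r in prefilter_search(2)]
post_ids = [r["id"] for r in postfilter_search(2)]
print("pre-filter ids: ", pre_ids)
print("post-filter ids:", post_ids)
```

In this toy data the two nearest rows both fail the filter, so the post-filtered result is empty while the pre-filtered one still returns two rows.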
