LanceDB is performant by default. This page covers performance best practices that matter when you want to ensure you get the right performance for a specific workload. Use the table below to jump to the area relevant to what you’re working on.
| When you’re working on… | Read |
| --- | --- |
| Loading data into a table | Ingestion |
| Running filtered or vector queries at scale | Indexing |
| Iterating over large result sets (training, export, migration) | Querying |
| Keeping a long-lived dataset healthy | Maintenance |
| Inspecting query plans | Diagnostics |
When using Python with multiprocessing, use spawn rather than fork. LanceDB is multi-threaded internally, and fork plus a multi-threaded process is unsafe.
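A minimal sketch of forcing spawn, assuming an illustrative docs table in a local database at data/db:
Python
import multiprocessing as mp

def count_rows(uri):
    import lancedb                     # import and connect inside the child process
    db = lancedb.connect(uri)
    return db.open_table("docs").count_rows()

if __name__ == "__main__":
    ctx = mp.get_context("spawn")      # never fork a process that may have started threads
    with ctx.Pool(2) as pool:
        print(pool.map(count_rows, ["data/db", "data/db"]))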

Ingestion

If ingestion is taking longer than expected on a large dataset, the cause is almost always how add() is called: each call commits a new version and a new fragment, so a per-row loop pays that per-call overhead at every row. The best practice is to pick the ingestion mode that matches your data shape — bulk ingestion when the data is already materialized, or iterator ingestion when it’s streamed or computed on the fly.
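As a sketch of the difference (rows here is an illustrative list of row dicts):
Python
# Anti-pattern: one commit and one new fragment per row
for row in rows:
    table.add([row])

# Better: a single commit for the whole batch
table.add(rows)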
Why is merge_insert() significantly slower than add()? A merge has to scan existing data to find matches on the join key (or look them up via a scalar index, if one exists), then delete and reinsert any updated rows in a single transaction; add() simply appends new fragments. Use add() for pure appends, and reach for merge_insert() only when you need upsert or conditional-insert logic. When you do use it, build a scalar index on the join column first; otherwise the match step falls back to a full column scan, which is the dominant cost at scale.
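For example, a sketch of an upsert keyed on a hypothetical id column, with the scalar index built first:
Python
table.create_scalar_index("id")              # turns the match step into an index lookup

(table.merge_insert("id")
      .when_matched_update_all()
      .when_not_matched_insert_all()
      .execute(new_rows))                    # new_rows: any data add() would accept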

Bulk ingestion: for data you already have

For materialized inputs (Arrow Tables, DataFrames, pyarrow.dataset(...)), LanceDB auto-parallelizes the write across workers.
Python
import pyarrow.dataset as ds

table.add(arrow_table)                              # in-memory Arrow Table
table.add(df)                                       # pandas DataFrame
table.add(ds.dataset("data/", format="parquet"))    # streams from disk, still parallelized
Streaming from pyarrow.dataset (the ds.dataset(...) call above) is the right choice for large file-based loads: it reads Parquet/CSV without loading everything into memory, and still preserves auto-parallelism. For very large initial loads, create the table empty first; passing data directly to create_table(name, data) skips the auto-parallel path.
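A sketch of that pattern, assuming an existing db connection and a pre-defined Arrow schema:
Python
table = db.create_table("docs", schema=schema)      # empty table, schema only
table.add(ds.dataset("data/", format="parquet"))    # add() takes the auto-parallel path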

Iterator ingestion: for streaming or computed-on-the-fly data

If your dataset doesn’t fit in memory all at once (for example, when rows are computed on the fly, such as generating embeddings as you ingest, or pulled from a streaming source), materializing the full table before calling add() will run you out of memory. The best practice is to pass an iterator of pyarrow.RecordBatch instead: LanceDB consumes one batch at a time, so peak memory stays bounded by the batch size you yield.
Python
import pyarrow as pa

def stream():
    for raw in source:                              # source yields dicts of column lists
        vectors = list(model.encode(raw["text"]))   # list of 1-D vectors; pyarrow can't ingest a 2-D ndarray directly
        yield pa.RecordBatch.from_pydict({**raw, "vector": vectors})

table.add(stream())
Streaming inputs don’t auto-parallelize, so yield reasonably large batches (several thousand rows or more) rather than single-row ones.
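One way to keep batches large is to chunk the source before yielding; a sketch with an illustrative chunk size of 4,096 rows:
Python
import itertools
import pyarrow as pa

def stream(batch_rows=4096):
    it = iter(source)                               # source yields row dicts here
    while chunk := list(itertools.islice(it, batch_rows)):
        texts = [r["text"] for r in chunk]
        yield pa.RecordBatch.from_pydict({
            "text": texts,
            "vector": list(model.encode(texts)),    # embed the whole chunk at once
        })

table.add(stream())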

Indexing

Vector indexes

If vector search latency climbs with table size (queries that ran in milliseconds on a small table stretch to seconds at millions of rows), the cause is the default brute-force scan over every vector. That works fine below ~100K vectors, but past that you should build a dedicated vector index. Pick the type by your data shape:
| Index | When to use |
| --- | --- |
| IVF_PQ | General-purpose default; what Enterprise builds automatically. Higher accuracy than RQ at small dimensions (≤ 256). |
| IVF_RQ | Maximum compression on high-dim vectors, faster builds than PQ. |
| IVF_HNSW_SQ | Best recall/latency for unfiltered search; higher latency variance under selective filters. |
| IVF_FLAT | Required for binary vectors with hamming. |
The distance metric is fixed once the index is built, so pick it based on how the embedding model was trained: cosine for unnormalized embeddings, dot for already-normalized embeddings (best performance), l2 as the general-purpose default, or hamming for binary vectors. For parameter tuning, see Vector Indexing.
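A sketch using the synchronous Python API; num_partitions and num_sub_vectors are illustrative IVF_PQ tuning values, and parameter names can vary slightly between client versions:
Python
table.create_index(
    metric="cosine",              # fixed once built; match your embedding model
    vector_column_name="vector",
    num_partitions=256,           # IVF partition count
    num_sub_vectors=96,           # PQ sub-vector count
)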

Scalar indexes

If filtered queries slow down as the table grows — even when the filter is selective — the cause is a full column scan: without a scalar index, LanceDB evaluates the where(...) predicate on every row, and the same applies to merge_insert join keys. The best practice is to build a scalar index on every column you filter or join on, picking the type by the column’s shape:
| Index | Best for |
| --- | --- |
| BTREE (default) | Numeric, string, temporal columns with mostly distinct values |
| BITMAP | Boolean and low-cardinality columns (< ~1,000 distinct values) |
| LABEL_LIST | List<T> columns queried with array_has_any / array_has_all |
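Hedged sketches for each type (column names are illustrative):
Python
table.create_scalar_index("year")                            # BTREE is the default
table.create_scalar_index("in_stock", index_type="BITMAP")   # low-cardinality column
table.create_scalar_index("tags", index_type="LABEL_LIST")   # List<T> column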
See Scalar Indexing.

Full-text indexes

If your full-text index is much larger than expected, or takes longer than expected to build, the cause is usually phrase-query flags being enabled when they aren’t needed: with_position=True and remove_stop_words=False both significantly inflate index size and build time. The best practice is to keep the defaults for most workloads, and enable those flags only when you actually need to search for phrases. See FTS Indexing configuration options for the full set of options.
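For example (the text column is illustrative):
Python
# Defaults: compact index, fast build
table.create_fts_index("text")

# Only when you actually need phrase queries:
# table.create_fts_index("text", with_position=True, replace=True)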

Compaction and cleanup

Two things accumulate on a long-lived table as more and more data gets added to it:
  • Many small fragments build up as you write, slowing down queries that have to scan across more files. Compaction merges them back into larger fragments.
  • Old versions build up as the table changes, growing disk usage beyond the live data size (LanceDB retains them for time-travel and rollback). Cleanup prunes versions older than a retention window.
In OSS, you run them yourself via optimize(), which bundles both into a single call:
Python
from datetime import timedelta
table.optimize()                                       # 7-day default retention
table.optimize(cleanup_older_than=timedelta(days=1))   # reclaim space sooner
The best practice is to run optimize() after large writes or on a schedule. It also bundles incremental index updates — see Reindexing for the breakdown. Updates also move rows out of the vector index, so they remain searchable but unindexed — rebuild the index after large update batches. LanceDB Enterprise handles both compaction and cleanup automatically.
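A sketch of a post-write maintenance pass, assuming a vector index named vector_idx and illustrative index parameters:
Python
table.optimize()                                   # compaction, cleanup, incremental index updates

stats = table.index_stats("vector_idx")
if stats.num_unindexed_rows > 0:                   # large update batches leave rows unindexed
    table.create_index(metric="cosine",
                       vector_column_name="vector",
                       replace=True)               # full rebuild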

Querying

Three knobs materially affect query latency, memory use, and recall. Be deliberate about each one on every query:
1. Project and limit explicitly

If queries return more data than you need or take longer than expected, the cause is usually projecting more columns than necessary, or letting the result count run unbounded. The best practice is to always pass both select() and limit():
Python
table.search(emb).select(["id", "title"]).limit(20)
2. Stay on pre-filter unless fewer matches are acceptable

If a query with prefilter=False returns fewer than limit results — sometimes zero — the cause is that post-filter applies the predicate after the top-k is selected, so only candidates that already passed the search are filtered. Pre-filter is the default and guarantees every result satisfies the predicate. Switch to prefilter=False only when fewer-than-limit results are acceptable. See Filtering.
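A sketch of both modes with an illustrative predicate:
Python
# Pre-filter (default): every returned row satisfies the predicate
table.search(emb).where("year > 2000").limit(20).to_list()

# Post-filter: predicate applied after top-k; may return fewer than 20 rows
table.search(emb).where("year > 2000", prefilter=False).limit(20).to_list()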
3. Tune for recall one knob at a time

If recall is lower than expected, the cause is that the search-time knobs limit how many candidates the index considers. Adjust based on your index type (see the sketch after this list):
  • Quantized indexes (PQ, RQ, SQ): refine_factor pulls extra candidates and re-scores them on full vectors.
  • HNSW-backed indexes: ef at search time. Start at 1.5 × k, raise toward 10 × k if recall is short.
  • IVF candidate breadth: nprobes is auto-tuned; override only when a selective pre-filter leaves too few neighbors.
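A sketch combining the IVF and quantization knobs on one query (values are illustrative; the HNSW ef knob is exposed differently across client versions and is set analogously):
Python
(table.search(emb)
      .nprobes(32)          # widen IVF candidate breadth
      .refine_factor(10)    # pull 10x the candidates and re-score on full vectors
      .limit(10)
      .to_list())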
For hybrid search, the default RRFReranker() combines vector and FTS results into a single ranking via reciprocal rank fusion. See Hybrid Search.
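For example, assuming the table has both an FTS index and an embedding function registered (so the string query can be embedded automatically):
Python
from lancedb.rerankers import RRFReranker

results = (table.search("lunar landings", query_type="hybrid")
                .rerank(RRFReranker())
                .limit(10)
                .to_pandas())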

Avoid materializing the whole table

If you need every row in the table (for training, export, or migration), calling to_pandas() or to_arrow() will run you out of memory on any non-trivial dataset — both materialize the full table at once. The best practice is to iterate via table.search(...) or table.query(...), which work the same way in both LanceDB OSS and Enterprise. In OSS, you can also stream batches through the underlying Lance dataset directly — useful when you need filter pushdown or fragment-level parallelism via ds.scanner(...):
Python
ds = table.to_lance()
for batch in ds.to_batches(columns=["id", "text"], batch_size=10000):
    process(batch)
API disparity between OSS and Enterprise: LanceDB OSS exposes to_pandas(), to_arrow(), and table.to_lance() for direct Lance dataset access, while Enterprise’s RemoteTable exposes none of these, only table.search(...) and table.query(...). Go through search() and query() wherever you can to keep your code portable across both.

Diagnostics

To analyze a slow query, inspect what the query engine actually did and the state of the indexes it touched. These two tools surface that information — use them in this order:
Python
print(table.search(emb).where("year > 2000").limit(10).analyze_plan())
print(table.index_stats("vector_idx"))   # num_unindexed_rows should be ~0
analyze_plan() returns the execution plan with per-stage timings, so you can see where the query actually spent its time; index_stats() reports index coverage, where num_unindexed_rows should be ~0. In the analyze_plan() output, look for:
| Plan pattern | Fix |
| --- | --- |
| LanceScan with high bytes_read / iops | Add a missing index, project columns with select(), or check that the dataset has been compacted |
| Multiple sequential filters | Reorder filter conditions |
Optimize Query Performance walks through a fully worked before/after example, including how KNNVectorDistance and output_batches change once indexes are in place.

Where to go next

Optimize Query Performance

Read execution plans, find the bottleneck.

Vector Indexing

Index types, parameters, and tuning in depth.

Filtering

Pre- vs post-filter, scalar indexes, list columns.

Enterprise Benchmarks

Benchmark methodology and reference latency numbers.