LanceDB is performant by default. This page covers performance best practices that matter when you want to ensure you get the right performance for a specific workload. Use the table below to jump to the area relevant to what you’re working on.
| When you’re working on… | Read |
|---|---|
| Loading data into a table | Ingestion |
| Running filtered or vector queries at scale | Indexing |
| Iterating over large result sets (training, export, migration) | Querying |
| Keeping a long-lived dataset healthy | Maintenance |
| Inspecting query plans | Diagnostics |
When using Python with multiprocessing, use spawn rather than fork. LanceDB is multi-threaded internally, and forking a multi-threaded process is unsafe.

Ingestion
If ingestion is taking longer than expected on a large dataset, the cause is almost always how add() is called: each call commits a new version and a new fragment, so a per-row loop pays that per-call overhead at every row. The best practice is to pick the ingestion mode that matches your data shape — bulk ingestion when the data is already materialized, or iterator ingestion when it’s streamed or computed on the fly.
Why merge_insert() is significantly slower than add()

A merge has to scan existing data to find matches on the join key (or look them up via a scalar index, if one exists), and then delete and reinsert any updated rows in a single transaction; add() simply appends new fragments. Use add() for pure appends, and reach for merge_insert() only when you need upsert or conditional-insert logic. When you do use it, build a scalar index on the join column first — otherwise the match step falls back to a full column scan, which is the dominant cost at scale.

Bulk ingestion: for data you already have
For materialized inputs (Arrow Tables, DataFrames, pyarrow.dataset(...)), LanceDB auto-parallelizes the write across workers.
pyarrow.dataset(...) is the right choice for large file-based loads: it streams Parquet/CSV without loading everything into memory, and still preserves auto-parallelism. For very large initial loads, create the table empty first; passing data directly to create_table(name, data) skips the auto-parallel path.
Iterator ingestion: for streaming or computed-on-the-fly data
If your dataset doesn’t fit in memory all at once (for example, when rows are computed on the fly, such as generating embeddings as you ingest, or pulled from a streaming source), materializing the full table before calling add() will cause you to run out of memory. The best practice is to pass an iterator of pyarrow.RecordBatch instead. LanceDB consumes one batch at a time, so peak memory stays bounded by the batch size you yield.
Indexing
Vector indexes
If vector search latency climbs with table size (queries that ran in milliseconds on a small table take seconds as it grows to millions of rows), the cause is the default brute-force scan over every vector. That works fine below ~100K vectors, but past that you should build a dedicated vector index. Pick the type by your data shape:

| Index | When to use |
|---|---|
| IVF_PQ | General-purpose default; what Enterprise builds automatically. Higher accuracy than RQ at small dimensions (≤ 256). |
| IVF_RQ | Maximum compression on high-dim vectors, faster builds than PQ. |
| IVF_HNSW_SQ | Best recall/latency for unfiltered search; higher latency variance under selective filters. |
| IVF_FLAT | Required for binary vectors with hamming. |
Choose the distance metric to match your vectors: cosine (unnormalized vectors), dot (already-normalized vectors, best performance), l2 (general-purpose, the default), hamming (binary vectors). For parameter tuning, see Vector Indexing.
Scalar indexes
If filtered queries slow down as the table grows — even when the filter is selective — the cause is a full column scan: without a scalar index, LanceDB evaluates the where(...) predicate on every row, and the same applies to merge_insert join keys. The best practice is to build a scalar index on every column you filter or join on, picking the type by the column’s shape:
| Index | Best for |
|---|---|
| BTREE (default) | Numeric, string, temporal columns with mostly distinct values |
| BITMAP | Boolean and low-cardinality columns (< ~1,000 distinct values) |
| LABEL_LIST | List<T> columns queried with array_has_any / array_has_all |
Full-text search
If your full-text index is much larger than expected, or takes longer than expected to build, the cause is usually phrase-query flags being enabled when they aren’t needed: with_position=True and remove_stop_words=False both significantly inflate index size and build time. The best practice is to keep the defaults for most workloads, and only enable those flags when you actually need to search for phrases. See FTS Indexing configuration options for the full set of options.
Compaction and cleanup
Two things accumulate on a long-lived table as more and more data gets added to it:

- Many small fragments build up as you write, slowing down queries that have to scan across more files. Compaction merges them back into larger fragments.
- Old versions build up as the table changes, growing disk usage beyond the live data size (LanceDB retains them for time-travel and rollback). Cleanup prunes versions older than a retention window.

Both are handled by optimize(), which bundles them into a single call:
Run optimize() after large writes or on a schedule. It also bundles incremental index updates — see Reindexing for the breakdown. Updates also move rows out of the vector index: they remain searchable but unindexed, so rebuild the index after large update batches.
LanceDB Enterprise handles both compaction and cleanup automatically.
Querying
Three knobs materially affect query latency, memory use, and recall. Be deliberate about each one on every query.

Project and limit explicitly
If queries return more data than you need or take longer than expected, the cause is usually projecting more columns than necessary, or letting the result count run unbounded. The best practice is to always pass both select() and limit():
Stay on pre-filter unless fewer matches are acceptable
If a query with prefilter=False returns fewer than limit results — sometimes zero — the cause is that post-filtering applies the predicate after the top-k is selected, so only candidates that already passed the search are filtered. Pre-filtering is the default and guarantees every result satisfies the predicate. Switch to prefilter=False only when fewer-than-limit results are acceptable. See Filtering.

Tune for recall one knob at a time
If recall is lower than expected, the cause is that the search-time knobs limit how many candidates the index considers. Adjust based on your index type:
- Quantized indexes (PQ, RQ, SQ): refine_factor pulls extra candidates and re-scores them on full vectors.
- HNSW-backed indexes: ef at search time. Start at 1.5 × k, raise toward 10 × k if recall is short.
- IVF candidate breadth: nprobes is auto-tuned; override only when a selective pre-filter leaves too few neighbors.
For hybrid search, RRFReranker() combines vector and FTS results into a single ranking via reciprocal rank fusion. See Hybrid Search.
Avoid materializing the whole table
If you need every row in the table (for training, export, or migration), calling to_pandas() or to_arrow() will run you out of memory on any non-trivial dataset — both materialize the full table at once. The best practice is to iterate via table.search(...) or table.query(...), which work the same way in both LanceDB OSS and Enterprise.
In OSS, you can also stream batches through the underlying Lance dataset directly — useful when you need filter pushdown or fragment-level parallelism via ds.scanner(...):
API disparity between OSS and Enterprise

LanceDB OSS exposes to_pandas(), to_arrow(), and table.to_lance() for direct Lance dataset access. Enterprise’s RemoteTable exposes none of these; only table.search(...) and table.query(...) are available. To keep your code portable across both, go through search() and query().

Diagnostics
To analyze a slow query, inspect what the query engine actually did and the state of the indexes it touched. These two tools surface that information — use them in this order:
analyze_plan() returns the execution plan with per-stage timings, so you can see where the query actually spent its time. index_stats() shows how many rows are still unindexed — you want num_unindexed_rows to be ~0. In analyze_plan, look for:
| Plan pattern | Fix |
|---|---|
| LanceScan with high bytes_read / iops | Add a missing index, project columns with select(), or check that the dataset has been compacted |
| Multiple sequential filters | Reorder filter conditions |
Re-run the plan after each change: entries like KNNVectorDistance and output_batches change once indexes are in place.
Where to go next
Optimize Query Performance
Read execution plans, find the bottleneck.
Vector Indexing
Index types, parameters, and tuning in depth.
Filtering
Pre- vs post-filter, scalar indexes, list columns.
Enterprise Benchmarks
Benchmark methodology and reference latency numbers.