LanceDB is performant by default. This page covers performance best practices that matter when you want to ensure you get the right performance for a specific workload. Use the table below to jump to the area relevant to what you’re working on.
| When you’re working on… | Read |
| --- | --- |
| Loading data into a table | Ingestion |
| Running filtered or vector queries at scale | Indexing |
| Iterating over large result sets (training, export, migration) | Querying |
| Keeping a long-lived dataset healthy | Maintenance |
| Inspecting query plans | Diagnostics |
When using Python with multiprocessing, use spawn rather than fork. LanceDB is multi-threaded internally, and fork plus a multi-threaded process is unsafe.
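A minimal sketch of forcing spawn, assuming an illustrative docs table in a local database at data/db:
Python
import multiprocessing as mp

def count_rows(uri):
    import lancedb                     # import and connect inside the child process
    db = lancedb.connect(uri)
    return db.open_table("docs").count_rows()

if __name__ == "__main__":
    ctx = mp.get_context("spawn")      # never fork a process that may have started threads
    with ctx.Pool(2) as pool:
        print(pool.map(count_rows, ["data/db", "data/db"]))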

Ingestion

If ingestion is taking longer than expected on a large dataset, the cause is almost always how add() is called: each call commits a new version and a new fragment, so a per-row loop pays that per-call overhead at every row. The best practice is to pick the ingestion mode that matches your data shape — bulk ingestion when the data is already materialized, or iterator ingestion when it’s streamed or computed on the fly.
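As a sketch of the difference (rows here is an illustrative list of row dicts):
Python
# Anti-pattern: one commit and one new fragment per row
for row in rows:
    table.add([row])

# Better: a single commit for the whole batch
table.add(rows)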
Why is merge_insert() significantly slower than add()? A merge has to scan existing data to find matches on the join key (or look them up via a scalar index, if one exists), then delete and reinsert any updated rows in a single transaction; add() simply appends new fragments. Use add() for pure appends, and reach for merge_insert() only when you need upsert or conditional-insert logic. When you do use it, build a scalar index on the join column first; otherwise the match step falls back to a full column scan, which is the dominant cost at scale.
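For example, a sketch of an upsert keyed on a hypothetical id column, with the scalar index built first:
Python
table.create_scalar_index("id")              # turns the match step into an index lookup

(table.merge_insert("id")
      .when_matched_update_all()
      .when_not_matched_insert_all()
      .execute(new_rows))                    # new_rows: any data add() would accept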

Bulk ingestion: for data you already have

For materialized inputs (Arrow Tables, DataFrames, pyarrow.dataset(...)), LanceDB auto-parallelizes the write across workers.
Python
import pyarrow.dataset as ds

table.add(arrow_table)                              # in-memory Arrow Table
table.add(df)                                       # pandas DataFrame
table.add(ds.dataset("data/", format="parquet"))    # streams from disk, still parallelized
Streaming from pyarrow.dataset (the ds.dataset(...) call above) is the right choice for large file-based loads: it reads Parquet/CSV without loading everything into memory, and still preserves auto-parallelism. For very large initial loads, create the table empty first; passing data directly to create_table(name, data) skips the auto-parallel path.
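A sketch of that pattern, assuming an existing db connection and a pre-defined Arrow schema:
Python
table = db.create_table("docs", schema=schema)      # empty table, schema only
table.add(ds.dataset("data/", format="parquet"))    # add() takes the auto-parallel path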

Iterator ingestion: for streaming or computed-on-the-fly data

If your dataset doesn’t fit in memory all at once (for example, when rows are computed on the fly, such as generating embeddings as you ingest, or pulled from a streaming source), materializing the full table before calling add() will run you out of memory. The best practice is to pass an iterator of pyarrow.RecordBatch instead: LanceDB consumes one batch at a time, so peak memory stays bounded by the batch size you yield.
Python
import pyarrow as pa

def stream():
    for raw in source:                              # source yields dicts of column lists
        vectors = list(model.encode(raw["text"]))   # list of 1-D vectors; pyarrow can't ingest a 2-D ndarray directly
        yield pa.RecordBatch.from_pydict({**raw, "vector": vectors})

table.add(stream())
Streaming inputs don’t auto-parallelize, so yield reasonably large batches (several thousand rows or more) rather than single-row ones.
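One way to keep batches large is to chunk the source before yielding; a sketch with an illustrative chunk size of 4,096 rows:
Python
import itertools
import pyarrow as pa

def stream(batch_rows=4096):
    it = iter(source)                               # source yields row dicts here
    while chunk := list(itertools.islice(it, batch_rows)):
        texts = [r["text"] for r in chunk]
        yield pa.RecordBatch.from_pydict({
            "text": texts,
            "vector": list(model.encode(texts)),    # embed the whole chunk at once
        })

table.add(stream())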

Indexing

Vector indexes

If vector search latency climbs with table size (queries that ran in milliseconds on a small table stretch to seconds at millions of rows), the cause is the default brute-force scan over every vector. That works fine below ~100K vectors, but past that you should build a dedicated vector index. Pick the type by your data shape:
| Index | When to use |
| --- | --- |
| IVF_PQ | General-purpose default; what Enterprise builds automatically. Higher accuracy than RQ at small dimensions (≤ 256). |
| IVF_RQ | Maximum compression on high-dim vectors, faster builds than PQ. |
| IVF_HNSW_SQ | Best recall/latency for unfiltered search; higher latency variance under selective filters. |
| IVF_FLAT | Required for binary vectors with hamming. |
The distance metric is fixed once the index is built, so pick it based on how the embedding model was trained: cosine for unnormalized embeddings, dot for already-normalized embeddings (best performance), l2 as the general-purpose default, or hamming for binary vectors. For parameter tuning, see Vector Indexing.
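A sketch using the synchronous Python API; num_partitions and num_sub_vectors are illustrative IVF_PQ tuning values, and parameter names can vary slightly between client versions:
Python
table.create_index(
    metric="cosine",              # fixed once built; match your embedding model
    vector_column_name="vector",
    num_partitions=256,           # IVF partition count
    num_sub_vectors=96,           # PQ sub-vector count
)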

Scalar indexes

If filtered queries slow down as the table grows — even when the filter is selective — the cause is a full column scan: without a scalar index, LanceDB evaluates the where(...) predicate on every row, and the same applies to merge_insert join keys. The best practice is to build a scalar index on every column you filter or join on, picking the type by the column’s shape:
| Index | Best for |
| --- | --- |
| BTREE (default) | Numeric, string, temporal columns with mostly distinct values |
| BITMAP | Boolean and low-cardinality columns (< ~1,000 distinct values) |
| LABEL_LIST | List<T> columns queried with array_has_any / array_has_all |
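Hedged sketches for each type (column names are illustrative):
Python
table.create_scalar_index("year")                            # BTREE is the default
table.create_scalar_index("in_stock", index_type="BITMAP")   # low-cardinality column
table.create_scalar_index("tags", index_type="LABEL_LIST")   # List<T> column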
See Scalar Indexing.

Full-text indexes

If your full-text index is much larger than expected, or takes longer than expected to build, the cause is usually phrase-query flags being enabled when they aren’t needed: with_position=True and remove_stop_words=False both significantly inflate index size and build time. The best practice is to keep the defaults for most workloads, and enable those flags only when you actually need to search for phrases. See FTS Indexing configuration options for the full set of options.
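For example (the text column is illustrative):
Python
# Defaults: compact index, fast build
table.create_fts_index("text")

# Only when you actually need phrase queries:
# table.create_fts_index("text", with_position=True, replace=True)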

Compaction and cleanup

Two things accumulate on a long-lived table as more and more data gets added to it:
  • Many small fragments build up as you write, slowing down queries that have to scan across more files. Compaction merges them back into larger fragments.
  • Old versions build up as the table changes, growing disk usage beyond the live data size (LanceDB retains them for time-travel and rollback). Cleanup prunes versions older than a retention window.
In OSS, you run them yourself via optimize(), which bundles both into a single call:
Python
from datetime import timedelta
table.optimize()                                       # 7-day default retention
table.optimize(cleanup_older_than=timedelta(days=1))   # reclaim space sooner
The best practice is to run optimize() after large writes or on a schedule. It also bundles incremental index updates — see Reindexing for the breakdown. Updates also move rows out of the vector index, so they remain searchable but unindexed — rebuild the index after large update batches. LanceDB Enterprise handles both compaction and cleanup automatically.
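A sketch of a post-write maintenance pass, assuming a vector index named vector_idx and illustrative index parameters:
Python
table.optimize()                                   # compaction, cleanup, incremental index updates

stats = table.index_stats("vector_idx")
if stats.num_unindexed_rows > 0:                   # large update batches leave rows unindexed
    table.create_index(metric="cosine",
                       vector_column_name="vector",
                       replace=True)               # full rebuild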

Querying

Three knobs materially affect query latency, memory use, and recall. Be deliberate about each one on every query:
1. Project and limit explicitly

If queries return more data than you need or take longer than expected, the cause is usually projecting more columns than necessary, or letting the result count run unbounded. The best practice is to always pass both select() and limit():
Python
table.search(emb).select(["id", "title"]).limit(20)
2. Stay on pre-filter unless fewer matches are acceptable

If a query with prefilter=False returns fewer than limit results — sometimes zero — the cause is that post-filter applies the predicate after the top-k is selected, so only candidates that already passed the search are filtered. Pre-filter is the default and guarantees every result satisfies the predicate. Switch to prefilter=False only when fewer-than-limit results are acceptable. See Filtering.
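A sketch of both modes with an illustrative predicate:
Python
# Pre-filter (default): every returned row satisfies the predicate
table.search(emb).where("year > 2000").limit(20).to_list()

# Post-filter: predicate applied after top-k; may return fewer than 20 rows
table.search(emb).where("year > 2000", prefilter=False).limit(20).to_list()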
3. Tune for recall one knob at a time

If recall is lower than expected, the cause is that the search-time knobs limit how many candidates the index considers. Adjust based on your index type (see the sketch after this list):
  • Quantized indexes (PQ, RQ, SQ): refine_factor pulls extra candidates and re-scores them on full vectors.
  • HNSW-backed indexes: ef at search time. Start at 1.5 × k, raise toward 10 × k if recall is short.
  • IVF candidate breadth: nprobes is auto-tuned; override only when a selective pre-filter leaves too few neighbors.
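A sketch combining the IVF and quantization knobs on one query (values are illustrative; the HNSW ef knob is exposed differently across client versions and is set analogously):
Python
(table.search(emb)
      .nprobes(32)          # widen IVF candidate breadth
      .refine_factor(10)    # pull 10x the candidates and re-score on full vectors
      .limit(10)
      .to_list())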
For hybrid search, the default RRFReranker() combines vector and FTS results into a single ranking via reciprocal rank fusion. See Hybrid Search.
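For example, assuming the table has both an FTS index and an embedding function registered (so the string query can be embedded automatically):
Python
from lancedb.rerankers import RRFReranker

results = (table.search("lunar landings", query_type="hybrid")
                .rerank(RRFReranker())
                .limit(10)
                .to_pandas())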

Avoid materializing the whole table

If you need every row in the table (for training, export, or migration), calling to_pandas() or to_arrow() will run you out of memory on any non-trivial dataset — both materialize the full table at once. The best practice is to iterate via table.search(...) or table.query(...), which work the same way in both LanceDB OSS and Enterprise. In OSS, you can also stream batches through the underlying Lance dataset directly — useful when you need filter pushdown or fragment-level parallelism via ds.scanner(...):
Python
ds = table.to_lance()
for batch in ds.to_batches(columns=["id", "text"], batch_size=10000):
    process(batch)
API disparity between OSS and Enterprise: LanceDB OSS exposes to_pandas(), to_arrow(), and table.to_lance() for direct Lance dataset access, while Enterprise’s RemoteTable exposes none of these, only table.search(...) and table.query(...). Go through search() and query() wherever you can to keep your code portable across both.

Diagnostics

To analyze a slow query, inspect what the query engine actually did and the state of the indexes it touched. These two tools surface that information — use them in this order:
Python
print(table.search(emb).where("year > 2000").limit(10).analyze_plan())
print(table.index_stats("vector_idx"))   # num_unindexed_rows should be ~0
analyze_plan() returns the execution plan with per-stage timings, so you can see where the query actually spent its time; index_stats() reports index coverage, where num_unindexed_rows should be ~0. In the analyze_plan() output, look for:
| Plan pattern | Fix |
| --- | --- |
| LanceScan with high bytes_read / iops | Add a missing index, project columns with select(), or check that the dataset has been compacted |
| Multiple sequential filters | Reorder filter conditions |
Optimize Query Performance walks through a fully worked before/after example, including how KNNVectorDistance and output_batches change once indexes are in place.

Where to go next

Optimize Query Performance

Read execution plans, find the bottleneck.

Vector Indexing

Index types, parameters, and tuning in depth.

Filtering

Pre- vs post-filter, scalar indexes, list columns.

Enterprise Benchmarks

Benchmark methodology and reference latency numbers.