Scalar indexes organize data by scalar attributes (e.g., numbers, categories) and enable fast filtering of vector data. They accelerate retrieval of scalar data associated with vectors, thus enhancing query performance. LanceDB supports four types of scalar indexes:
  • BTREE: Stores column data in sorted order for binary search. Best for columns with many unique values.
  • BITMAP: Uses bitmaps to track value presence. Ideal for columns with few unique values (e.g., categories, tags).
  • LABEL_LIST: Special index for List<T> and LargeList<T> columns of primitive values supporting array_has_all and array_has_any queries.
  • FM: FM-Index over string or binary columns that accelerates substring search via contains(col, 'needle').
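To make the difference between the first two index types concrete, here is a toy, standard-library-only sketch of the underlying ideas: sorted order plus binary search (BTREE-like) and one bitmask per distinct value (BITMAP-like). It is a conceptual illustration only, not LanceDB's implementation; the column contents and the helper names range_lookup and rows_in are made up for this example.

```python
from bisect import bisect_left, bisect_right

# Toy columns: list position is the row id.
prices = [30, 10, 20, 10, 40, 20]
categories = ["a", "b", "a", "c", "b", "a"]

# BTREE-like: keep (value, row_id) pairs sorted, then binary-search a range.
sorted_pairs = sorted((value, rid) for rid, value in enumerate(prices))
keys = [value for value, _ in sorted_pairs]


def range_lookup(lo, hi):
    """Row ids with lo <= value <= hi, located by two binary searches."""
    start = bisect_left(keys, lo)
    end = bisect_right(keys, hi)
    return sorted(rid for _, rid in sorted_pairs[start:end])


# BITMAP-like: one integer bitmask per distinct value; bit i set means row i matches.
bitmaps = {}
for rid, cat in enumerate(categories):
    bitmaps[cat] = bitmaps.get(cat, 0) | (1 << rid)


def rows_in(*values):
    """Row ids whose category is any of `values` (bitwise OR of the bitmaps)."""
    mask = 0
    for value in values:
        mask |= bitmaps.get(value, 0)
    return [rid for rid in range(len(categories)) if (mask >> rid) & 1]


print(range_lookup(10, 20))  # [1, 2, 3, 5]
print(rows_in("a", "c"))     # [0, 2, 3, 5]
```

The sorted structure pays off when there are many distinct values to search among, while the bitmap approach stays small only when the number of distinct values is low, which matches the guidance in the list above.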

Choosing the Right Index Type

Data Type | Filter | Index Type
Numeric, String, Temporal | <, =, >, in, between, is null | BTREE
Boolean, numbers or strings with fewer than 1,000 unique values | <, =, >, in, between, is null | BITMAP
List of low-cardinality numbers or strings | array_has_any, array_has_all | LABEL_LIST
String or binary (Utf8, LargeUtf8, Binary, LargeBinary) | contains(col, 'needle') | FM
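For the LABEL_LIST row, the semantics of the two list filters can be illustrated with a small inverted index from label to row ids. This is a conceptual sketch in plain Python under the assumption that "any" means at least one shared label and "all" means every requested label is present; the data and the helpers has_any and has_all are hypothetical and do not call LanceDB.

```python
from collections import defaultdict

# Toy list column: list position is the row id.
rows = [
    ["fiction", "classic"],
    ["science"],
    ["fiction", "science"],
    [],
]

# Inverted index: label -> set of row ids that carry it.
label_to_rows = defaultdict(set)
for rid, tags in enumerate(rows):
    for tag in tags:
        label_to_rows[tag].add(rid)


def has_any(labels):
    """Rows whose list shares at least one label with `labels` (set union)."""
    result = set()
    for label in labels:
        result |= label_to_rows.get(label, set())
    return sorted(result)


def has_all(labels):
    """Rows whose list contains every label in `labels` (set intersection)."""
    if not labels:
        # Every row trivially contains all of zero labels.
        return list(range(len(rows)))
    sets = [label_to_rows.get(label, set()) for label in labels]
    return sorted(set.intersection(*sets))


print(has_any(["classic", "science"]))  # [0, 1, 2]
print(has_all(["fiction", "science"]))  # [2]
```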

Scalar Index Operations

1. Build the Index

You can create multiple scalar indexes within a table. By default, the index type is BTREE, but you can configure another type, such as BITMAP.
If you are using LanceDB Enterprise, the create_scalar_index API returns immediately, but the building of the scalar index is asynchronous. To wait until all data is fully indexed, you can specify the wait_timeout parameter on create_scalar_index() or call wait_for_index() on the table.
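A minimal sketch of the build step, assuming `table` is an already-opened LanceDB table from the synchronous Python SDK. The column name publisher and the helper name build_scalar_indexes are hypothetical; book_id is the column used in the search section of this page.

```python
def build_scalar_indexes(table):
    """Create one BTREE and one BITMAP scalar index on an open LanceDB table."""
    # BTREE is the default index type, so no index_type argument is needed.
    table.create_scalar_index("book_id")
    # Assumed low-cardinality column: request a BITMAP index explicitly.
    table.create_scalar_index("publisher", index_type="BITMAP")
```

On LanceDB Enterprise these calls return before the index is fully built, so follow them with a wait (see the next step) if later code depends on the index being complete.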

2. Check Index Status

wait_for_index(...) waits until the named scalar indexes exist and index_stats(...) reports num_unindexed_rows == 0. If a table is receiving steady writes, that fully indexed state may not stabilize before the timeout.
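The check described above might look like the following sketch. It assumes the synchronous Python SDK, that wait_for_index accepts a list of index names plus a timedelta timeout, and that index_stats returns an object exposing num_unindexed_rows; verify these against the SDK version you use. The helper name wait_until_indexed and the 60-second default are illustrative choices.

```python
from datetime import timedelta


def wait_until_indexed(table, index_names, seconds=60):
    """Wait for the named indexes, then report unindexed rows per index."""
    # Expected to raise if the indexes are not fully built within the timeout.
    table.wait_for_index(index_names, timeout=timedelta(seconds=seconds))
    # After a successful wait, each count should be 0.
    return {
        name: table.index_stats(name).num_unindexed_rows
        for name in index_names
    }
```

For a table under steady writes, prefer a generous timeout and treat a timeout error as "not yet caught up" rather than as a failed index build.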

3. Update the Index

Updating the table data (adding, deleting, or modifying records) requires that you also update the scalar index. This can be done by calling optimize, which will trigger an update to the existing scalar index.
New data added after creating the scalar index will still appear in search results if optimize is not used, but with increased latency due to a flat search on the unindexed portion. LanceDB Enterprise automates the optimize process, minimizing the impact on search speed.
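As a sketch of the update step on a self-managed (non-Enterprise) table, the pattern is simply "write, then optimize". It assumes `table` is an open LanceDB table from the synchronous Python SDK and `new_rows` is data in a form that table.add accepts; the helper name append_and_reindex is hypothetical.

```python
def append_and_reindex(table, new_rows):
    """Append rows, then fold them into the existing scalar indexes."""
    # New rows are searchable immediately, but via a flat scan of the unindexed part.
    table.add(new_rows)
    # optimize() updates existing indexes so the new rows are covered too.
    table.optimize()
```

Calling optimize after every small write can be wasteful; batching several writes before one optimize call is usually a reasonable trade-off.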

4. Run Indexed Searches

A scan that filters on the column book_id will be faster if that column has a scalar index.
Scalar indexes can also speed up scans that combine a vector search or full-text search with a prefilter.
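The two query shapes described here can be sketched as follows, assuming `table` is an open LanceDB table from the synchronous Python SDK with a scalar index on book_id and a vector column that matches `query_vector`. The filter values, the limits, and the helper name indexed_searches are illustrative.

```python
def indexed_searches(table, query_vector):
    """Run a filtered scan and a prefiltered vector search against `table`."""
    # Plain scan: the scalar index on book_id narrows the rows to read.
    by_id = table.search().where("book_id = 2").limit(10).to_list()

    # Vector search with a prefilter: the filter is applied before the
    # nearest-neighbor search, so the scalar index helps here as well.
    nearest = (
        table.search(query_vector)
        .where("book_id != 3", prefilter=True)
        .limit(10)
        .to_list()
    )
    return by_id, nearest
```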

Indexing nested fields

Scalar indexes can target a scalar field inside a struct by passing its full dotted path. The path is preserved end to end: it’s the value you pass to create_scalar_index, it’s what list_indices() reports under columns, and it’s the column reference you use in filter predicates.
# Schema: pa.struct([pa.field("user_id", pa.int32())]) stored under the `metadata` column.
table.create_scalar_index("metadata.user_id", name="metadata_user_id_idx")

# The same dotted path works in WHERE clauses.
table.search().where("metadata.user_id = 42").limit(1).to_list()
Nested paths follow Lance field-path semantics: dot-separate each struct field from root to leaf (for example, metadata.author.name). The same convention applies to FTS and vector indexes.
The FM index is a scalar index built over string or binary columns that accelerates substring lookups expressed as contains(col, 'needle'). Unlike the tokenized FTS index, which matches whole words after tokenization, the FM-Index matches arbitrary substrings of the raw bytes, so it works well for URLs, file paths, identifiers, log lines, or any column where you search for a fragment rather than a word.
Use the FM-Index when:
  • Filters use contains(col, 'needle') (substring), not equality or word search.
  • The column is Utf8, LargeUtf8, Binary, or LargeBinary.
  • You want substring matches without paying for tokenization, language analysis, or BM25 scoring.
Pick FTS instead when you need word-level relevance ranking, phrase queries, or language-aware tokenization.
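The difference between substring matching and word matching can be seen in a few lines of plain Python. This sketch does not use LanceDB: `in` stands in for contains(col, 'needle'), and a simple lowercase alphanumeric tokenizer without stemming stands in for FTS tokenization (real FTS tokenizers may behave differently). The sample strings and helper names are made up.

```python
import re

docs = [
    "GET /api/v2/users/42/profile",
    "user profile updated",
    "https://example.com/profiles?id=7",
]


def substring_match(needle):
    """Row ids whose raw text contains `needle` anywhere (FM-style)."""
    return [i for i, text in enumerate(docs) if needle in text]


def word_match(word):
    """Row ids where `word` is a whole token after simple tokenization."""
    return [
        i for i, text in enumerate(docs)
        if word.lower() in re.findall(r"[a-z0-9]+", text.lower())
    ]


print(substring_match("profile"))  # [0, 1, 2]
print(word_match("profile"))       # [0, 1]
print(substring_match("rs/42"))    # [0]
print(word_match("rs/42"))         # []
```

The fragment rs/42 crosses token boundaries, so only the substring search can find it; that is the kind of query the FM-Index is meant for.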

Create an FM-Index

Build an FM-Index with the async create_index API by passing the Fm config in Python or Index.fm() in TypeScript. In Rust, use Index::Fm(FmIndexBuilder::default()).
from lancedb.index import Fm

await tbl.create_index("text", config=Fm())
After the index is built, substring filters use it automatically:
table.search().where("contains(text, 'needle')").limit(10).to_pandas()
list_indices() reports the index type as "Fm".

Index UUID Columns

LanceDB supports scalar indexes on UUID columns (stored as FixedSizeBinary(16)), enabling efficient lookups and filtering on UUID-based primary keys.
To use FixedSizeBinary, ensure you have:
  • Python SDK version 0.22.0 or later
  • TypeScript SDK version 0.19.0 or later

1. Define UUID Type

2. Generate UUID Data
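A minimal sketch for this step using only the Python standard library: a UUID's raw bytes are exactly 16 bytes long, which is the width a FixedSizeBinary(16) column expects. The helper name new_uuid_bytes is hypothetical, and how the bytes are loaded into a table is outside this sketch.

```python
import uuid


def new_uuid_bytes():
    """Return a random UUID as its 16-byte binary form."""
    return uuid.uuid4().bytes


# Generate a few ids and confirm each one fits a 16-byte fixed-size column.
ids = [new_uuid_bytes() for _ in range(3)]
assert all(len(value) == 16 for value in ids)

# The bytes round-trip back to a UUID object for display or logging.
restored = uuid.UUID(bytes=ids[0])
print(restored)
```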

3. Create Table with UUID Column

4. Create and Wait for the Index

5. Perform Operations with the UUID Index

Index nested fields

You can build a scalar index on a field inside a struct column by passing the canonical dot-separated path to create_index. This is useful when filters target attributes nested under a metadata-style column, for example metadata.user_id or metadata.event.type. If a literal segment of the path itself contains a dot (for example a column named user.id nested inside metadata), wrap that segment in backticks so LanceDB can tell the dot apart from the path separator: metadata.`user.id`. list_indices() echoes the same canonical path back, so the column you pass in round-trips through index metadata regardless of nesting depth or escaping.
Composite indexes that cover multiple columns aren’t supported yet. Each create_index call must target a single (possibly nested) field path.
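The backtick escaping rule can be illustrated with a small path splitter. This is a hypothetical helper written for this page, not LanceDB's own parser: it splits on dots that are outside backticks and keeps dots inside backticks as part of the segment.

```python
def split_field_path(path):
    """Split a dotted field path into segments, honoring backtick escapes."""
    segments = []
    current = []
    in_backticks = False
    for ch in path:
        if ch == "`":
            # Toggle escaping; the backtick itself is not part of the name.
            in_backticks = not in_backticks
        elif ch == "." and not in_backticks:
            # An unescaped dot ends the current segment.
            segments.append("".join(current))
            current = []
        else:
            current.append(ch)
    if in_backticks:
        raise ValueError(f"unterminated backtick in path: {path!r}")
    segments.append("".join(current))
    return segments


print(split_field_path("metadata.event.type"))  # ['metadata', 'event', 'type']
print(split_field_path("metadata.`user.id`"))   # ['metadata', 'user.id']
```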