> ## Documentation Index
> Fetch the complete documentation index at: https://docs.lancedb.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Scalar Indexes

> Learn how to use scalar indexes in LanceDB for efficient metadata filtering and query optimization.

export const ScalarIndexNestedFields = "import pyarrow as pa\nfrom lancedb.index import BTree\n\nmetadata_type = pa.struct(\n    [\n        pa.field(\"user_id\", pa.int32()),\n        pa.field(\"user.id\", pa.int32()),\n    ]\n)\ndata = pa.Table.from_arrays(\n    [\n        pa.array([1, 2, 3], type=pa.int32()),\n        pa.array(\n            [\n                {\"user_id\": 10, \"user.id\": 100},\n                {\"user_id\": 20, \"user.id\": 200},\n                {\"user_id\": 30, \"user.id\": 300},\n            ],\n            type=metadata_type,\n        ),\n    ],\n    names=[\"user_id\", \"metadata\"],\n)\ntable = await db.create_table(\"nested_scalar_index\", data)\n\n# Index a nested struct field.\nawait table.create_index(\n    \"metadata.user_id\", config=BTree(), name=\"nested_user_id_idx\"\n)\n\n# Escape literal dots inside a segment with backticks.\nawait table.create_index(\n    \"metadata.`user.id`\", config=BTree(), name=\"escaped_user_id_idx\"\n)\n\n# `columns` is returned as the canonical path you passed in.\nfor index in await table.list_indices():\n    print(index.name, index.columns)\n# nested_user_id_idx  ['metadata.user_id']\n# escaped_user_id_idx ['metadata.`user.id`']\n";

export const ScalarIndexUuidUpsert = "new_users = [\n    {\"id\": uuid.uuid4().bytes, \"name\": \"Hannah D.\"},\n    {\"id\": uuid.uuid4().bytes, \"name\": \"Ian B.\"},\n]\n# Insert or update using the UUID index\ntable.merge_insert(\n    \"id\"\n).when_matched_update_all().when_not_matched_insert_all().execute(new_users)\n";

export const ScalarIndexUuidWait = "index_name = \"id_idx\"\ntable.create_scalar_index(\"id\")\ntable.wait_for_index([index_name])\n";

export const ScalarIndexUuidTable = "table_name = \"index-on-uuid\"\n\nuuid_array = pa.array(uuids, pa.uuid())\nname_array = pa.array(names, pa.string())\nschema = pa.schema(\n    [\n        pa.field(\"id\", pa.uuid()),\n        pa.field(\"name\", pa.string()),\n    ]\n)\ndata_table = pa.Table.from_arrays([uuid_array, name_array], schema=schema)\ntable = db.create_table(table_name, data=data_table, mode=\"overwrite\")\n";

export const ScalarIndexUuidData = "def generate_random_names():\n    base_names = [\"Alice\", \"Bob\", \"Carla\", \"David\", \"Eve\", \"Frank\", \"Grace\"]\n    letter = random.choice(string.ascii_uppercase)\n    return f\"{random.choice(base_names)} {letter}.\"\n\ndef generate_uuids(num_items):\n    return [uuid.uuid4().bytes for _ in range(num_items)]\n\n# Generate some UUIDs and random names\nn = 7\nuuids = generate_uuids(n)\nnames = [generate_random_names() for _ in range(n)]\n";

export const ScalarIndexUuidType = "import pyarrow as pa\n";

export const ScalarIndexPrefilter = "table = db.open_table(\"book_with_embeddings\")\ntable.search([1.2] * 2).where(\"book_id != 3\").limit(10).to_pandas()\n";

export const ScalarIndexFilter = "table = db.open_table(\"books\")\nresult = table.search().where(\"book_id = 2\").limit(10).to_pandas()\n";

export const ScalarIndexOptimize = "table.add([{\"vector\": [7, 8], \"book_id\": 4}])\ntable.optimize()\n";

export const ScalarIndexWait = "index_name = \"label_idx\"\ntable.wait_for_index([index_name])\n";

export const ScalarIndexBuild = "tbl = db.open_table(\"scalar_index_build\")\ntbl.create_scalar_index(\"book_id\")\ntbl.create_scalar_index(\"publisher\", index_type=\"BITMAP\")\n";

Scalar indexes organize data by scalar attributes (e.g., numbers, categories) and enable fast filtering of vector data. They accelerate retrieval of scalar data associated with vectors, thus enhancing query performance.

LanceDB supports four types of scalar indexes:

* `BTREE`: Stores column data in sorted order for binary search. Best for columns with many unique values.
* `BITMAP`: Uses bitmaps to track value presence. Ideal for columns with few unique values (e.g., categories, tags).
* `LABEL_LIST`: Special index for `List<T>` and `LargeList<T>` columns of primitive values supporting `array_contains_all` and `array_contains_any` queries.
* `FM`: FM-Index over string or binary columns that accelerates substring search via `contains(col, 'needle')`.

## Choosing the Right Index Type

| Data Type                                                       | Filter                                    | Index Type   |
| :-------------------------------------------------------------- | :---------------------------------------- | :----------- |
| Numeric, String, Temporal                                       | `<`, `=`, `>`, `in`, `between`, `is null` | `BTREE`      |
| Boolean, numbers or strings with fewer than 1,000 unique values | `<`, `=`, `>`, `in`, `between`, `is null` | `BITMAP`     |
| List of low cardinality of numbers or strings                   | `array_has_any`, `array_has_all`          | `LABEL_LIST` |
| String or binary (`Utf8`, `LargeUtf8`, `Binary`, `LargeBinary`) | `contains(col, 'needle')`                 | `FM`         |

## Scalar Index Operations

### 1. Build the Index

You can create multiple scalar indexes within a table. By default, the index will be `BTREE`, but you can always configure another type like `BITMAP`

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {ScalarIndexBuild}
  </CodeBlock>
</CodeGroup>

<Note title="LanceDB Enterprise">
  If you are using LanceDB Enterprise, the `create_scalar_index` API returns immediately, but the building of the scalar index is asynchronous. To wait until all data is fully indexed, you can specify the `wait_timeout` parameter on `create_scalar_index()` or call `wait_for_index()` on the table.
</Note>

### 2. Check Index Status

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {ScalarIndexWait}
  </CodeBlock>
</CodeGroup>

`wait_for_index(...)` waits until the named scalar indexes exist and `index_stats(...)` reports `num_unindexed_rows == 0`. If a table is receiving steady writes, that fully indexed state may not stabilize before the timeout.

### 3. Update the Index

Updating the table data (adding, deleting, or modifying records) requires that you also update the scalar index. This can be done by calling `optimize`, which will trigger an update to the existing scalar index.

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {ScalarIndexOptimize}
  </CodeBlock>
</CodeGroup>

<Note title="LanceDB Enterprise">
  New data added after creating the scalar index will still appear in search results if optimize is not used, but with increased latency due to a flat search on the unindexed portion. LanceDB Enterprise automates the optimize process, minimizing the impact on search speed.
</Note>

### 4. Run Indexed Searches

The following scan will be faster if the column `book_id` has a scalar index:

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {ScalarIndexFilter}
  </CodeBlock>
</CodeGroup>

Scalar indexes can also speed up scans containing a vector search or full text search, and a prefilter:

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {ScalarIndexPrefilter}
  </CodeBlock>
</CodeGroup>

## Indexing nested fields

Scalar indexes can target a scalar field inside a struct by passing its full dotted path. The path is preserved end to end: it's the value you pass to `create_scalar_index`, it's what `list_indices()` reports under `columns`, and it's the column reference you use in filter predicates.

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
# Schema: pa.struct([pa.field("user_id", pa.int32())]) stored under the `metadata` column.
table.create_scalar_index("metadata.user_id", name="metadata_user_id_idx")

# The same dotted path works in WHERE clauses.
table.search().where("metadata.user_id = 42").limit(1).to_list()
```

<Note>
  Nested paths follow Lance field-path semantics: dot-separate each struct field from root to leaf (for example, `metadata.author.name`). The same convention applies to FTS and vector indexes.
</Note>

## FM-Index for substring search

The `FM` index is a scalar index built over string or binary columns that
accelerates substring lookups expressed as `contains(col, 'needle')`. Unlike the
tokenized [FTS index](/indexing/fts-index), which matches whole words after
tokenization, the FM-Index matches arbitrary substrings of the raw bytes — so it
works well for URLs, file paths, identifiers, log lines, or any column where you
search for a fragment rather than a word.

Use the FM-Index when:

* Filters use `contains(col, 'needle')` (substring), not equality or word search.
* The column is `Utf8`, `LargeUtf8`, `Binary`, or `LargeBinary`.
* You want substring matches without paying for tokenization, language analysis,
  or BM25 scoring.

Pick `FTS` instead when you need word-level relevance ranking, phrase queries,
or language-aware tokenization.

### Create an FM-Index

Build an FM-Index with the async `create_index` API by passing the `Fm` config in
Python or `Index.fm()` in TypeScript. In Rust, use `Index::Fm(FmIndexBuilder::default())`.

<CodeGroup>
  ```python Python icon="python" theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
  from lancedb.index import Fm

  await tbl.create_index("text", config=Fm())
  ```

  ```typescript TypeScript icon="square-js" theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
  import * as lancedb from "@lancedb/lancedb";
  import { Index } from "@lancedb/lancedb";

  await tbl.createIndex("text", { config: Index.fm() });
  ```

  ```rust Rust icon="rust" theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
  use lancedb::index::Index;
  use lancedb::index::scalar::FmIndexBuilder;

  tbl.create_index(&["text"], Index::Fm(FmIndexBuilder::default()))
      .execute()
      .await?;
  ```
</CodeGroup>

After the index is built, substring filters use it automatically:

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
table.search().where("contains(text, 'needle')").limit(10).to_pandas()
```

`list_indices()` reports the index type as `"Fm"`.

## Index UUID Columns

LanceDB supports scalar indexes on UUID columns (stored as `FixedSizeBinary(16)`), enabling efficient lookups and filtering on UUID-based primary keys.

<Note>
  **To use `FixedSizeBinary`, ensure you have:**

  * Python SDK version `0.22.0` or later
  * TypeScript SDK version `0.19.0` or later
</Note>

### 1. Define UUID Type

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {ScalarIndexUuidType}
  </CodeBlock>
</CodeGroup>

### 2. Generate UUID Data

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {ScalarIndexUuidData}
  </CodeBlock>
</CodeGroup>

### 3. Create Table with UUID Column

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {ScalarIndexUuidTable}
  </CodeBlock>
</CodeGroup>

### 4. Create and Wait for the Index

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {ScalarIndexUuidWait}
  </CodeBlock>
</CodeGroup>

### 5. Perform Operations with the UUID Index

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {ScalarIndexUuidUpsert}
  </CodeBlock>
</CodeGroup>

## Index nested fields

You can build a scalar index on a field inside a struct column by passing the
canonical dot-separated path to `create_index`. This is useful when filters
target attributes nested under a `metadata`-style column, for example
`metadata.user_id` or `metadata.event.type`.

If a literal segment of the path itself contains a dot (for example a column
named `user.id` nested inside `metadata`), wrap that segment in backticks so
LanceDB can tell the dot apart from the path separator: `` metadata.`user.id` ``.

`list_indices()` echoes the same canonical path back, so the column you pass in
round-trips through index metadata regardless of nesting depth or escaping.

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {ScalarIndexNestedFields}
  </CodeBlock>
</CodeGroup>

<Note>
  Composite indexes that cover multiple columns aren't supported yet. Each
  `create_index` call must target a single (possibly nested) field path.
</Note>
