Our enterprise solution efficiently manages updates across millions of tables, supporting several hundred transactions per second (TPS) per table.

LanceDB supports various data modification operations:

  • Basic Operations: Update and delete existing data
  • Merge Insert Operations: Upsert, insert-if-not-exists, and replace range

Basic Operations

Update

Update existing rows that match a condition.

import lancedb

# Connect to LanceDB
db = lancedb.connect(
  uri="db://your-project-slug",
  api_key="your-api-key",
  region="us-east-1"
)

data = [
    {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
    {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
    {"vector": [2.9, 18.2], "item": "zoo", "price": 30.0},
]

table = db.create_table("update_table_example", data, mode="overwrite")

# Update with direct values
table.update(where="price < 20.0", values={"vector": [2, 2], "item": "foo-updated"})

# Update using SQL expression
table.update(where="price < 20.0", values_sql={"price": "price * 1.1"})

When rows are updated, LanceDB will update the existing index on the column. The updated data is available for search immediately.

Check out the details of update from Python SDK and TypeScript SDK.

Delete

Remove rows that match a condition.

import lancedb

# Connect to LanceDB
db = lancedb.connect(
  uri="db://your-project-slug",
  api_key="your-api-key",
  region="us-east-1"
)
table = db.open_table("update_table_example")

# delete data
predicate = "price = 30.0"
table.delete(predicate)

Delete operations are permanent and cannot be undone. Always ensure you have backups or are certain before deleting data.

Check out the details of delete from Python SDK and TypeScript SDK.

Merge Operations

The merge insert command is a flexible API that can be used to perform upsert, insert_if_not_exists, and replace_range operations.

Use scalar indices to speed up merge insert

The merge insert command performs a join between the input data and the target table on the key you provide. This requires scanning that entire column, which can be expensive for large tables. To speed up this operation, create a scalar index on the join column, which will allow LanceDB to find matches without scanning the whole table.

Read more about scalar indices in the Build a Scalar Index guide.

Embedding Functions

Like the create table and add APIs, the merge insert API will automatically compute embeddings if the table has an embedding definition in its schema. If the input data doesn’t contain the source column, or the vector column is already filled, the embeddings won’t be computed.

Upsert

upsert updates rows if they exist and inserts them if they don’t. To do this with merge insert, enable both when_matched_update_all() and when_not_matched_insert_all().

import lancedb

# Connect to LanceDB
db = lancedb.connect(
  uri="db://your-project-slug",
  api_key="your-api-key",
  region="us-east-1"
)

# Create example table
users_table_name = "users_example"
table = db.create_table(
    users_table_name,
    [
        {"id": 0, "name": "Alice"},
        {"id": 1, "name": "Bob"},
    ],
    mode="overwrite",
)
print(f"Created users table with {table.count_rows()} rows")

# Prepare data for upsert
new_users = [
    {"id": 1, "name": "Bobby"},  # Will update existing record
    {"id": 2, "name": "Charlie"},  # Will insert new record
]

# Upsert by id
(
    users_table.merge_insert("id")
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute(new_users)
)

# Verify results - should be 3 records total
print(f"Total users: {users_table.count_rows()}")  # 3

Insert-if-not-exists

This will only insert rows that do not have a match in the target table, thus preventing duplicate rows. To do this with merge insert, enable just when_not_matched_insert_all().

import lancedb

# Connect to LanceDB
db = lancedb.connect(
  uri="db://your-project-slug",
  api_key="your-api-key",
  region="us-east-1"
)

# Create example table
table = db.create_table(
    "domains",
    [
        {"domain": "google.com", "name": "Google"},
        {"domain": "github.com", "name": "GitHub"},
    ],
)

# Prepare new data - one existing and one new record
new_domains = [
    {"domain": "google.com", "name": "Google"},
    {"domain": "facebook.com", "name": "Facebook"},
]

# Insert only if domain doesn't exist
table.merge_insert("domain").when_not_matched_insert_all().execute(new_domains)

# Verify count - should be 3 (original 2 plus 1 new)
print(f"Total domains: {table.count_rows()}")  # 3

Replace range

You can also replace a range of rows in the target table with the input data. For example, if you have a table of document chunks, where each chunk has both a doc_id and a chunk_id, you can replace all chunks for a given doc_id with updated chunks.

This can be tricky otherwise because if you try to use upsert when the new data has fewer chunks you will end up with extra chunks. To avoid this, add another clause to delete any chunks for the document that are not in the new data, with when_not_matched_by_source_delete.

import lancedb

# Connect to LanceDB
db = lancedb.connect(
  uri="db://your-project-slug",
  api_key="your-api-key",
  region="us-east-1"
)

# Create example table with document chunks
table = db.create_table(
    "chunks",
    [
        {"doc_id": 0, "chunk_id": 0, "text": "Hello"},
        {"doc_id": 0, "chunk_id": 1, "text": "World"},
        {"doc_id": 1, "chunk_id": 0, "text": "Foo"},
        {"doc_id": 1, "chunk_id": 1, "text": "Bar"},
        {"doc_id": 2, "chunk_id": 0, "text": "Baz"},
    ],
)

# New data - replacing all chunks for doc_id 1 with just one chunk
new_chunks = [
    {"doc_id": 1, "chunk_id": 0, "text": "Zoo"},
]

# Replace all chunks for doc_id 1
(
    table.merge_insert(["doc_id"])
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .when_not_matched_by_source_delete("doc_id > 1")
    .execute(new_chunks)
)

# Verify count for doc_id >= 1 - should be 2 instead of original 3
print(f"Chunks for doc_id >= 1: {table.count_rows('doc_id >= 1')}")  # 2

For more detailed information, refer to the merge_insert from Python SDK and mergeInsert from TypeScript SDK