Update Data
Our enterprise solution efficiently manages updates across millions of tables, supporting several hundred transactions per second (TPS) per table.
LanceDB supports various data modification operations:
- Basic Operations: Update and delete existing data
- Merge Insert Operations: Upsert, insert-if-not-exists, and replace range
Basic Operations
Update
Update existing rows that match a condition.
When rows are updated, LanceDB will update the existing index on the column. The updated data is available for search immediately.
Check out the details of update
from Python SDK and TypeScript SDK.
Delete
Remove rows that match a condition.
Delete operations are permanent and cannot be undone. Always ensure you have backups or are certain before deleting data.
Check out the details of delete
from Python SDK and TypeScript SDK.
Merge Operations
The merge insert command is a flexible API that can be used to perform upsert, insert_if_not_exists, and replace_range operations.
Use scalar indices to speed up merge insert
The merge insert command performs a join between the input data and the target
table on
the key you provide. This requires scanning that entire column, which can be
expensive for large tables. To speed up this operation, create a scalar index
on the join column, which will allow LanceDB to find matches without scanning the whole table.
Read more about scalar indices in the Build a Scalar Index guide.
Embedding Functions
Like the create table and add APIs, the merge insert API will automatically compute embeddings if the table has an embedding definition in its schema. If the input data doesn’t contain the source column, or the vector column is already filled, the embeddings won’t be computed.
Upsert
upsert
updates rows if they exist and inserts them if they don’t. To do this with merge insert,
enable both when_matched_update_all()
and when_not_matched_insert_all()
.
Insert-if-not-exists
This will only insert rows that do not have a match in the target table, thus
preventing duplicate rows. To do this with merge insert, enable just
when_not_matched_insert_all()
.
Replace range
You can also replace a range of rows in the target table with the input data.
For example, if you have a table of document chunks, where each chunk has both
a doc_id
and a chunk_id
, you can replace all chunks for a given doc_id
with updated chunks.
This can be tricky otherwise because if you try to use upsert
when the new data has fewer
chunks you will end up with extra chunks. To avoid this, add another clause to delete any chunks
for the document that are not in the new data, with when_not_matched_by_source_delete
.
For more detailed information, refer to the merge_insert
from Python SDK
and mergeInsert from TypeScript SDK