Overview
Geneva backfills operate on a point-in-time snapshot of your table. When other operations modify the table during or between backfills, conflicts can occur. Geneva >=0.9.0 automatically handles most conflict scenarios, reducing unnecessary recomputation and enabling graceful recovery.
Safe Operations During Backfill
These operations can safely run while a backfill is in progress:
| Operation | Why It’s Safe |
|---|
merge_insert (Insert-only) | Creates new fragments without modifying existing ones |
add() / append data | Creates new fragments without modifying existing ones |
Read operations (search, to_arrow) | Read-only, no fragment modification |
| Adding new columns | Schema change only, no fragment rewrite |
Operations That Cause Conflicts
These operations can conflict with running backfills:
| Operation | What Happens |
|---|
compact_files() / optimize() | Reorganizes fragments, invalidating the backfill’s snapshot |
merge_insert with updates | Modifies existing rows, causing fragment conflicts |
delete() | Modifies existing fragments |
When a conflict occurs, affected fragments fail gracefully. The backfill completes what it can, and you can re-run it to process the remaining rows.
How Geneva Handles Conflicts
Concurrent Backfills on Different Columns
When multiple backfills run on the same table but different columns, Geneva handles version conflicts automatically:
- Each backfill writes to different column files (field IDs)
- If a commit conflict occurs, Geneva retries at the latest version
- The retry merges the new column data without overwriting other columns
This is controlled by the GENEVA_VERSION_CONFLICT_MAX_RETRIES environment variable (default: 10). See Advanced Configuration for details.
Compaction Between Backfills
When you run compaction between backfills (not during), Geneva handles it efficiently:
| Scenario | Behavior |
|---|
| Backfill, compact, re-backfill (same UDF) | Already-computed rows are skipped via WHERE <col> IS NULL |
| Partial backfill, compact, resume | Incremental processing continues from where it left off |
Backfill, alter_columns (new UDF), re-backfill | Full reprocessing with new UDF (intentional) |
Geneva’s default behavior is to skip rows that already have values (WHERE <col> IS NULL). This means compaction doesn’t cause unnecessary recomputation.
Recovery Steps
When a conflict occurs during a backfill:
- Wait for any concurrent operations (compaction, updates) to complete
- Re-run the backfill:
tbl.backfill("column_name")
- Only uncomputed rows will be processed (rows with NULL values in the target column)
Checkpoints from the previous run are preserved, so you won’t lose progress on successfully computed rows.
Best Practices
Sequence Your Operations
For the smoothest experience, sequence your operations:
1. Complete all data ingestion
2. Run backfill to compute UDF columns
3. Run compaction/optimization
Use Insert-Only Operations During Backfill
If you need to add data while a backfill is running, use insert-only operations:
# Safe: INSERT-only merge_insert
tbl.merge_insert("id").when_not_matched_insert_all().execute(new_data)
# Unsafe: Updates to existing rows
tbl.merge_insert("id").when_matched_update_all().execute(data) # May conflict
Monitor Backfill Progress
Use async backfills to monitor progress and handle errors:
fut = tbl.backfill_async("column_name")
while not fut.done():
time.sleep(1)
# Check for errors before subsequent operations
result = fut.result()
Disable Auto-Compaction During Large Backfills
If using LanceDB Cloud or Enterprise with auto-compaction enabled, consider disabling it during large backfill operations to avoid conflicts.