# Advanced Configuration

> Learn about environment variables for configuring Geneva behavior.

Geneva reads a set of environment variables, most of them prefixed with `GENEVA_`, that configure advanced behavior and fine-tune system settings.

<Tip>
  All `GENEVA_` environment variables are optional and have sensible defaults. Only set them if you need to override the default behavior.
</Tip>

## Admission Control

Admission control validates cluster resources before starting jobs to prevent failures due to insufficient resources.

| Variable                    | Default | Description                                                                                                                                         |
| --------------------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `GENEVA_ADMISSION__CHECK`   | `true`  | Enable admission control checks. Set to `false` to skip all checks.                                                                                 |
| `GENEVA_ADMISSION__STRICT`  | `true`  | If `true`, reject the job with `ResourcesUnavailableError` when resources are insufficient. If `false`, log a warning but allow the job to proceed. |
| `GENEVA_ADMISSION__TIMEOUT` | `3.0`   | Timeout in seconds for Ray API calls during admission control checks. Prevents hanging when the cluster is in a bad state.                          |
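
As an illustration, the variables in this table could be set from Python before a job is submitted. This is a minimal sketch of a "warn-only" profile. It assumes Geneva reads these values from the process environment when it loads its configuration, so they are set before Geneva is imported. The `10.0` second timeout is an arbitrary example value, not a recommendation.

```python
import os

# Hypothetical "warn-only" profile: keep admission checks on, but log a
# warning instead of raising ResourcesUnavailableError.
# Assumption: Geneva reads these from the process environment, so they are
# set before importing or initializing Geneva.
os.environ["GENEVA_ADMISSION__CHECK"] = "true"
os.environ["GENEVA_ADMISSION__STRICT"] = "false"
# Example value only: allow slower Ray API responses than the 3.0 s default.
os.environ["GENEVA_ADMISSION__TIMEOUT"] = "10.0"

# Import and use Geneva after this point.
```

Environment variable values are always strings, which is why the boolean and the float are written as `"false"` and `"10.0"`.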

## Commit and Retry Configuration

Control retry behavior for commits and version conflicts.

| Variable                              | Default | Description                                                                                                                                                                                                      |
| ------------------------------------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `GENEVA_COMMIT_MAX_RETRIES`           | `12`    | Maximum number of retries for commit operations. With exponential backoff (1s, 2s, 4s, 8s, 16s, then capped at 16s), 12 retries give \~2.5 minutes of total wait time before giving up.                          |
| `GENEVA_VERSION_CONFLICT_MAX_RETRIES` | `10`    | Maximum number of retries for version conflicts during commit. Version conflicts occur when concurrent backfills commit to the same fragments. Prevents infinite loops when concurrent commits keep conflicting. |
| `GENEVA_WRITER_STALL_IDLE_ROUNDS`     | `6`     | Number of idle rounds (5s each) before considering a writer stalled during drain. With many concurrent backfills, resource contention can slow writers without them being truly stalled.                         |
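
The wait times quoted in the table can be checked with a little arithmetic. The sketch below is not Geneva's implementation: it simply reproduces the schedule described above (doubling from 1 second, capped at 16 seconds) to show how the retry count translates into total wait time. The function name `commit_backoff_schedule` is hypothetical.

```python
def commit_backoff_schedule(
    max_retries: int = 12, base: float = 1.0, cap: float = 16.0
) -> list[float]:
    """Per-retry wait in seconds: doubles from `base`, never exceeds `cap`."""
    return [min(base * 2**i, cap) for i in range(max_retries)]


schedule = commit_backoff_schedule()
total_wait = sum(schedule)  # 1 + 2 + 4 + 8 + (8 * 16) = 143 seconds

# Stall detection window: idle rounds multiplied by the 5 s round length.
stall_window = 6 * 5  # 30 seconds with the default of 6 rounds

print(f"commit retries wait up to {total_wait:.0f}s (~{total_wait / 60:.1f} min)")
print(f"writer considered stalled after {stall_window}s of idle rounds")
```

Under these assumptions, raising `GENEVA_COMMIT_MAX_RETRIES` by one adds 16 seconds of maximum wait, and each extra idle round adds 5 seconds before a writer is treated as stalled.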

## Lance Retry Configuration

This section configures retry logic for Lance I/O operations. Operations that raise `OSError`, `ValueError`, or `RuntimeError("Too many concurrent writers")` are retried with exponential backoff and jitter.

| Variable                          | Default | Description                                                                              |
| --------------------------------- | ------- | ---------------------------------------------------------------------------------------- |
| `GENEVA_RETRY_LANCE_ATTEMPTS`     | `7`     | Maximum number of retry attempts for Lance I/O operations.                               |
| `GENEVA_RETRY_LANCE_INITIAL_SECS` | `0.5`   | Initial wait time in seconds for exponential backoff when retrying Lance I/O operations. |
| `GENEVA_RETRY_LANCE_MAX_SECS`     | `120.0` | Maximum wait time in seconds for exponential backoff when retrying Lance I/O operations. |

## Checkpoint Storage

<Warning>
  Checkpoint storage configuration is **experimental**. The environment variable names and behavior may change in a future release.
</Warning>

Configure where Geneva stores checkpoint data during job execution. Checkpoints enable fault-tolerant processing by saving intermediate results so that failed jobs can resume without reprocessing completed work.

By default, Geneva stores checkpoints in a `_ckp/` subdirectory inside the table's own storage location. This means checkpoints share the same bucket and IOPS budget as the table data. You can override this to store checkpoints in a separate location.

| Variable                              | Default               | Description                                                                                                                                                                                              |
| ------------------------------------- | --------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `JOB__CHECKPOINT__OBJECT_STORE__PATH` | *(table dir)*`/_ckp/` | URI where checkpoint data is stored. When set, overrides the default in-table checkpoint location. Accepts any URI supported by Lance (e.g., `gs://bucket/path/checkpoints`, `s3://bucket/checkpoints`). |

<Tip>
  This variable maps to the config path `job.checkpoint.object_store.path`. It can also be set via config files in `.config/` or `pyproject.toml` under the `[geneva]` section.
</Tip>
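
The mapping in the tip above suggests a naming convention: double underscores separate config levels, single underscores stay inside a key, and everything is lowercased. The helper below is a hypothetical illustration of that convention, inferred from this one example. It is not a Geneva API.

```python
def env_to_config_path(name: str) -> str:
    """Convert an env var name to a dotted config path.

    Assumed convention: `__` separates levels, `_` stays within a key.
    """
    return ".".join(part.lower() for part in name.split("__"))


print(env_to_config_path("JOB__CHECKPOINT__OBJECT_STORE__PATH"))
```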

### Why use a separate checkpoint path?

At scale, checkpoint I/O and data I/O compete for the same object store IOPS budget when they share a bucket prefix. Setting `JOB__CHECKPOINT__OBJECT_STORE__PATH` to a **different bucket or prefix** decouples checkpoint I/O from data I/O, giving each its own IOPS budget and preventing shared-prefix rate limiting.

```bash theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
# Example: separate checkpoint bucket from dataset storage
export JOB__CHECKPOINT__OBJECT_STORE__PATH=gs://my-checkpoints-bucket/ckpts
```

Equivalent programmatic configuration:

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
from geneva.config import override_config_kv

override_config_kv({
    "job.checkpoint.object_store.path": "gs://my-checkpoints-bucket/ckpts",
})
```

## Other Configuration

| Variable                      | Default     | Description                                                                                                                                        |
| ----------------------------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `GENEVA_RAY_INIT_MAX_RETRIES` | `5`         | Maximum number of retry attempts for `ray.init()` connection failures. Useful when connecting to Ray clusters that may be temporarily unavailable. |
| `GENEVA_K8S_AUTH_MAX_RETRIES` | `3`         | Maximum number of retries for Kubernetes authentication operations. Must be at least 1.                                                            |
| `GENEVA_CONFIG_DIR`           | `./.config` | Directory path where Geneva looks for configuration files (`.yaml`, `.json`, `.toml`). Can be an absolute or relative path.                        |
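
To check which files a given `GENEVA_CONFIG_DIR` would expose, a small script along these lines can help. It is a sketch that mirrors the description in the table (the `./.config` default and the three file extensions), not Geneva's actual discovery logic. The function name `find_config_files` is hypothetical.

```python
import os
from pathlib import Path

# Extensions listed in the table above.
CONFIG_SUFFIXES = {".yaml", ".json", ".toml"}


def find_config_files(env=None) -> list[Path]:
    """List candidate config files in GENEVA_CONFIG_DIR (default ./.config)."""
    env = os.environ if env is None else env
    config_dir = Path(env.get("GENEVA_CONFIG_DIR", "./.config"))
    if not config_dir.is_dir():
        return []
    return sorted(
        p for p in config_dir.iterdir()
        if p.is_file() and p.suffix in CONFIG_SUFFIXES
    )


for path in find_config_files():
    print(path)
```

A relative `GENEVA_CONFIG_DIR` is resolved against the current working directory here, so the result depends on where the process is started.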
