# LeRobot X-VLA Soft-Fold

<Card title="View on Hugging Face" icon="https://mintcdn.com/lancedb-bcbb4faf/6L0IRVkfdlgMU1Pw/static/assets/logo/huggingface-logo.svg?fit=max&auto=format&n=6L0IRVkfdlgMU1Pw&q=85&s=da940a105a40440f0cd1224d3fa4ae52" href="https://huggingface.co/datasets/lance-format/lerobot-xvla-soft-fold" width="640" height="640" data-path="static/assets/logo/huggingface-logo.svg">
  Source dataset card and downloadable files for `lance-format/lerobot-xvla-soft-fold`.
</Card>

A Lance-formatted version of [`lerobot/xvla-soft-fold`](https://huggingface.co/datasets/lerobot/xvla-soft-fold) — a multi-camera robotics dataset from the [X-VLA](https://thu-air-dream.github.io/X-VLA/) project — packaged as three Lance tables for efficient frame-level training, episode-level trajectory loading, and direct access to the original encoded videos. Available directly from the Hub at `hf://datasets/lance-format/lerobot-xvla-soft-fold/data`.

* **1,542 episodes**
* **2,852,512 frames** at **20 FPS**
* **3 camera streams per episode** — `cam_high`, `cam_left_wrist`, `cam_right_wrist`
* **Robot state and action vectors** aligned to frame timestamps
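
As a quick sanity check on these headline numbers, the sketch below derives the mean episode length and the total recorded time per camera from the counts above. It assumes every frame is sampled at exactly 20 FPS; the variable names are illustrative and not part of the dataset.

```python
# Headline figures from the dataset summary above.
NUM_EPISODES = 1_542
NUM_FRAMES = 2_852_512
FPS = 20  # assumed constant across all episodes

mean_frames_per_episode = NUM_FRAMES / NUM_EPISODES
mean_episode_seconds = mean_frames_per_episode / FPS
total_hours = NUM_FRAMES / FPS / 3600

print(f"{mean_frames_per_episode:.0f} frames per episode on average")
print(f"{mean_episode_seconds:.1f} s per episode on average")
print(f"{total_hours:.1f} h of recording per camera stream")
```

Under that assumption an average episode runs roughly 1,850 frames (about 92 seconds), and each camera stream covers just under 40 hours.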

## Key features

* **Three-table layout** — `frames`, `episodes`, `videos` — so frame-level training, episode-level trajectory work, and raw video access live side-by-side without scattered parquet shards or sidecar MP4 directories.
* **Per-camera inline MP4 segments** in `episodes.lance`, with `from_timestamp` / `to_timestamp` bounds per camera and per episode, surfaced as lazy `BlobFile` handles via `take_blobs` so metadata scans never read the bytes.
* **Frame-level observations and actions** in `frames.lance` with stable `episode_index`, `frame_index`, and `index` columns for joining or temporal iteration.
* **Source MP4 provenance** in `videos.lance` (`relative_path`, `filename`, `file_size_bytes`, `sha256`) alongside the raw bytes, for integrity checks or custom decode pipelines.

## Tables

| Table            | Rows      | Purpose                                                                              |
| ---------------- | --------- | ------------------------------------------------------------------------------------ |
| `frames.lance`   | 2,852,512 | Per-frame observations, actions, episode/task indices                                |
| `episodes.lance` | 1,542     | Full per-episode trajectories plus per-camera MP4 segment blobs and timestamp bounds |
| `videos.lance`   | 104       | Raw source MP4 files (one row per source MP4) with file-level provenance             |

Use `frames.lance` for low-level training (loss-per-timestep, state-conditioned policies). Use `episodes.lance` when you need the full trajectory and the matching per-camera video segments together. Use `videos.lance` when you want direct access to the original encoded video files.

## Schemas

### `frames.lance`

| Column              | Type            | Notes                               |
| ------------------- | --------------- | ----------------------------------- |
| `observation_state` | `list<float32>` | Robot state vector for that frame   |
| `action`            | `list<float32>` | Action vector for that frame        |
| `time_stamp`        | `float`         | Original source timestamp field     |
| `timestamp`         | `float`         | Canonical frame timestamp (seconds) |
| `frame_index`       | `int64`         | Frame index within episode          |
| `episode_index`     | `int64`         | Parent episode id                   |
| `index`             | `int64`         | Global frame index                  |
| `task_index`        | `int64`         | Task id                             |

### `episodes.lance`

| Column                                              | Type                          | Notes                                    |
| --------------------------------------------------- | ----------------------------- | ---------------------------------------- |
| `episode_index`                                     | `int64`                       | Episode id                               |
| `task_index`                                        | `int64`                       | Task id                                  |
| `fps`                                               | `int32`                       | Frame rate of the episode video segments |
| `timestamps`                                        | `list<float32>`               | Per-frame timestamps                     |
| `actions`                                           | `list<list<float32>>`         | Per-frame action vectors                 |
| `observation_state`                                 | `list<list<float32>>`         | Per-frame robot state vectors            |
| `observation_images_cam_high_video_blob`            | `large_binary` (blob-encoded) | Inline MP4 segment for `cam_high`        |
| `observation_images_cam_high_from_timestamp`        | `float64`                     | `cam_high` segment start time            |
| `observation_images_cam_high_to_timestamp`          | `float64`                     | `cam_high` segment end time              |
| `observation_images_cam_left_wrist_video_blob`      | `large_binary` (blob-encoded) | Inline MP4 segment for `cam_left_wrist`  |
| `observation_images_cam_left_wrist_from_timestamp`  | `float64`                     | `cam_left_wrist` segment start time      |
| `observation_images_cam_left_wrist_to_timestamp`    | `float64`                     | `cam_left_wrist` segment end time        |
| `observation_images_cam_right_wrist_video_blob`     | `large_binary` (blob-encoded) | Inline MP4 segment for `cam_right_wrist` |
| `observation_images_cam_right_wrist_from_timestamp` | `float64`                     | `cam_right_wrist` segment start time     |
| `observation_images_cam_right_wrist_to_timestamp`   | `float64`                     | `cam_right_wrist` segment end time       |
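
The timestamp bounds and the per-frame `timestamps` list describe the same episode, so they can be cross-checked. The sketch below is one way to do that on a single row fetched as a Python dict. It assumes the bounds are in seconds and that the segment covers the same frames as `timestamps`; neither is confirmed by this card, and the helper names are hypothetical.

```python
def expected_frame_count(from_ts: float, to_ts: float, fps: int) -> int:
    """Approximate number of frames spanned by a segment, given its bounds in seconds."""
    if to_ts < from_ts:
        raise ValueError("segment end precedes segment start")
    return round((to_ts - from_ts) * fps)


def bounds_match_trajectory(row: dict, camera: str = "cam_high", tolerance: int = 1) -> bool:
    """Compare the segment length implied by the bounds with len(row['timestamps'])."""
    prefix = f"observation_images_{camera}"
    expected = expected_frame_count(
        row[f"{prefix}_from_timestamp"],
        row[f"{prefix}_to_timestamp"],
        row["fps"],
    )
    # Allow an off-by-one, since the end bound may or may not be inclusive.
    return abs(expected - len(row["timestamps"])) <= tolerance
```

A row that fails this check is worth inspecting by hand before its video segment is decoded for training.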

### `videos.lance`

| Column                      | Type                          | Notes                           |
| --------------------------- | ----------------------------- | ------------------------------- |
| `camera_angle`              | `string`                      | Camera key (e.g. `cam_high`)    |
| `chunk_index`, `file_index` | `int32`                       | IDs parsed from the source path |
| `relative_path`, `filename` | `string`                      | Provenance                      |
| `file_size_bytes`           | `int64`                       | Source MP4 size                 |
| `sha256`                    | `string`                      | SHA256 of the MP4 bytes         |
| `video_blob`                | `large_binary` (blob-encoded) | Raw source MP4 bytes            |
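
The `sha256` column makes an integrity check straightforward. The sketch below hashes any file-like object in chunks, so a large MP4 never has to sit in memory at once. It assumes the column stores a hex digest; the function names are illustrative, and the demo hashes in-memory bytes rather than a real blob.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a file-like object chunk by chunk and return the hex digest."""
    digest = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        digest.update(chunk)
    return digest.hexdigest()


def blob_matches_row(stream: BinaryIO, expected_sha256: str) -> bool:
    """Compare a blob's digest with the value from the `sha256` column."""
    return sha256_of_stream(stream) == expected_sha256.strip().lower()


# Demo on in-memory bytes standing in for a `video_blob` handle.
demo_bytes = b"abc"
demo_digest = hashlib.sha256(demo_bytes).hexdigest()
print(blob_matches_row(io.BytesIO(demo_bytes), demo_digest))
```

The same helper should work on the lazy handles returned by `take_blobs`, since they expose a file-like `read`.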

## Pre-built indices

None bundled. Build indices on a local copy if a workload calls for them — e.g., a `BTREE` on `frames.episode_index` for fast per-episode lookup, or a vector index after attaching observation embeddings via Evolve.
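
A minimal sketch of the `BTREE` case, assuming a local copy of the dataset and LanceDB's `create_scalar_index` table method. The wrapper function name is hypothetical; the usage lines are left as comments because they need the downloaded data.

```python
def build_episode_lookup_index(frames_table) -> None:
    """Create a BTREE scalar index on the `episode_index` column.

    `frames_table` is expected to be a LanceDB table opened from a local,
    writable copy of `frames`.
    """
    frames_table.create_scalar_index("episode_index", index_type="BTREE", replace=True)


# Usage against a local copy (not executed here):
#   import lancedb
#   db = lancedb.connect("./lerobot-xvla-soft-fold/data")
#   build_episode_lookup_index(db.open_table("frames"))
```

Building the index commits a new table version, so it has to run against a writable copy rather than the Hub URI.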

## Why Lance?

1. **Blazing Fast Random Access**: Optimized for fetching scattered rows, making it ideal for random sampling, real-time ML serving, and interactive applications without performance degradation.
2. **Native Multimodal Support**: Store text, embeddings, and other data types together in a single file. Large binary objects are loaded lazily, and vectors are optimized for fast similarity search.
3. **Native Index Support**: Lance comes with fast, on-disk, scalable vector and FTS indexes that sit right alongside the dataset on the Hub, so you can share not only your data but also your embeddings and indexes without your users needing to recompute them.
4. **Efficient Data Evolution**: Add new columns and backfill data without rewriting the entire dataset. This is perfect for evolving ML features, adding new embeddings, or introducing moderation tags over time.
5. **Versatile Querying**: Supports combining vector similarity search, full-text search, and SQL-style filtering in a single query, accelerated by on-disk indexes.
6. **Data Versioning**: Every mutation commits a new version; previous versions remain intact on disk. Tags pin a snapshot by name, so retrieval systems and training runs can reproduce against an exact slice of history.

## Load with `datasets.load_dataset`

You can load Lance datasets through the standard Hugging Face `datasets` interface, which suits pipelines that already speak `Dataset` / `IterableDataset` and cases where you only want a quick streaming sample. Each Lance table is a separate `datasets` config.

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
import datasets

hf_ds = datasets.load_dataset("lance-format/lerobot-xvla-soft-fold", split="frames", streaming=True)
for row in hf_ds.take(3):
    print(row["episode_index"], row["frame_index"], row["action"])
```

## Load with LanceDB

LanceDB is the embedded retrieval library built on top of the Lance format ([docs](https://lancedb.com/docs)), and is the interface most users interact with. Each `.lance` directory under `data/` is a table; open it by name. The same handles are used by the Search, Curate, Evolve, Train, Versioning, and Materialize-a-subset sections below.

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
import lancedb

db = lancedb.connect("hf://datasets/lance-format/lerobot-xvla-soft-fold/data")

frames    = db.open_table("frames")
episodes  = db.open_table("episodes")
videos    = db.open_table("videos")
print(len(frames), len(episodes), len(videos))
```

## Load with Lance

`pylance` is the Python binding for the Lance format and works directly with the format's lower-level APIs. Reach for it when you want to inspect dataset internals — schema, scanner, fragments, the list of pre-built indices — or when you need the blob-level `take_blobs` entry point that streams MP4 bytes lazily from inline storage.

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
import lance

ds = lance.dataset("hf://datasets/lance-format/lerobot-xvla-soft-fold/data/frames.lance")
print(ds.count_rows(), ds.schema.names)
print(ds.list_indices())
```
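
To illustrate the lazy blob access mentioned above, the sketch below reads only the first few bytes of one video segment. It assumes the keyword form `take_blobs(column, indices=[...])` found in recent `pylance` releases (older releases used a different argument order, so check your installed version) and that the returned handles are file-like. The helper names are hypothetical.

```python
def read_segment_header(dataset, row_index: int, column: str, nbytes: int = 12) -> bytes:
    """Read the first `nbytes` of one blob without loading the whole MP4.

    `dataset` is expected to be a lance dataset opened on `episodes.lance`.
    """
    # take_blobs returns one lazy, file-like handle per requested row.
    blobs = dataset.take_blobs(column, indices=[row_index])
    with blobs[0] as blob:
        return blob.read(nbytes)


def looks_like_mp4(header: bytes) -> bool:
    """MP4 files typically carry the ASCII tag 'ftyp' at byte offset 4."""
    return len(header) >= 8 and header[4:8] == b"ftyp"


# Usage (not executed here):
#   import lance
#   ds = lance.dataset("hf://datasets/lance-format/lerobot-xvla-soft-fold/data/episodes.lance")
#   header = read_segment_header(ds, 0, "observation_images_cam_high_video_blob")
#   print(looks_like_mp4(header))
```

Because the handle is read incrementally, a header check like this touches a few bytes rather than the full segment.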

> **Tip — for production use, download locally first.** Streaming from the Hub works for exploration, but heavy random access to video segments and any kind of indexed search are dramatically faster against a local copy. The full dataset is **>50 GB**, so ensure you have sufficient disk space:
>
> ```bash theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
> hf download lance-format/lerobot-xvla-soft-fold --repo-type dataset --local-dir ./lerobot-xvla-soft-fold
> ```
>
> Then point Lance or LanceDB at `./lerobot-xvla-soft-fold/data`. For most workflows, the Materialize-a-subset section at the end of this card is a better starting point than downloading the full corpus.

## Search

This dataset does not ship a vector index out of the box — observation states are low-dimensional and most robotics workflows look up by index rather than by similarity. The bundled identifier columns (`episode_index`, `task_index`, `frame_index`) make exact lookups a single filtered scan. The example below pulls the first few frames of episode 30 from the frames table.

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
import lancedb

db = lancedb.connect("hf://datasets/lance-format/lerobot-xvla-soft-fold/data")
frames = db.open_table("frames")

slice_ = (
    frames.search()
    .where("episode_index = 30 AND frame_index < 10", prefilter=True)
    .select(["episode_index", "frame_index", "timestamp", "action", "observation_state"])
    .limit(10)
    .to_list()
)
for r in slice_:
    print(r["frame_index"], r["timestamp"], r["action"])
```
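
When the same lookup is needed for many episodes or frame ranges, it can help to generate the predicate instead of hand-writing it. The helper below is a small sketch of that idea; its name and signature are illustrative, and it only builds the string passed to `.where(...)`.

```python
def frame_window_filter(episode_index: int, start: int, stop: int) -> str:
    """Build a SQL predicate selecting frames [start, stop) of one episode."""
    if not (0 <= start < stop):
        raise ValueError("expected 0 <= start < stop")
    # int() guards against interpolating non-numeric input into the predicate.
    return (
        f"episode_index = {int(episode_index)} "
        f"AND frame_index >= {int(start)} AND frame_index < {int(stop)}"
    )


print(frame_window_filter(30, 0, 10))
```

The resulting string can be passed to `.where(...)` in place of the literal predicate used above.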

For similarity-style search across states or actions, attach an embedding column via Evolve and build an `IVF_PQ` index on it. For visual similarity over rendered frames, the pre-extracted-frames pattern in Train below produces a table that can carry a learned image embedding alongside the pixels.

## Curate

A typical curation pass for a robotics workflow starts with an episode-level filter — pick episodes with a particular task, length, or initial condition — and then either iterates frames or pulls the matching video segments. Stacking predicates inside a single filtered scan keeps the result small and explicit, and the bounded `.limit(...)` makes it cheap to inspect.

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
import lancedb

db = lancedb.connect("hf://datasets/lance-format/lerobot-xvla-soft-fold/data")
episodes = db.open_table("episodes")

ep_rows = (
    episodes.search()
    .where("task_index = 0 AND fps = 20", prefilter=True)
    .select([
        "episode_index",
        "observation_images_cam_high_from_timestamp",
        "observation_images_cam_high_to_timestamp",
    ])
    .limit(20)
    .with_row_id(True)
    .to_list()
)
print(f"{len(ep_rows)} episodes selected")
for r in ep_rows[:3]:
    print(
        f"  ep {r['episode_index']}  "
        f"{r['observation_images_cam_high_from_timestamp']:.2f}s → "
        f"{r['observation_images_cam_high_to_timestamp']:.2f}s"
    )
```

This scan reads none of the per-camera segment columns. The MP4 segments live in the blob-encoded `_video_blob` columns and stay on disk until something explicitly asks for them, which makes "find me the right episodes" a metadata-only operation against a multi-million-frame corpus.

## Evolve

Lance stores each column independently, so a new column can be appended without rewriting the existing data. The lightest form is a SQL expression: derive the new column from columns that already exist, and Lance computes it once and persists it. The example below adds two columns to the episodes table, `episode_duration_s` and `is_long_episode`, both derived from the existing `cam_high` timestamp bounds.

> **Note:** Mutations require a local copy of the dataset, since the Hub mount is read-only. See the Materialize-a-subset section at the end of this card for a streaming pattern that downloads only the rows and columns you need.

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
import lancedb

db = lancedb.connect("./lerobot-xvla-soft-fold/data")  # local copy required for writes
episodes = db.open_table("episodes")

episodes.add_columns({
    "episode_duration_s": (
        "observation_images_cam_high_to_timestamp - "
        "observation_images_cam_high_from_timestamp"
    ),
    "is_long_episode": (
        "(observation_images_cam_high_to_timestamp - "
        " observation_images_cam_high_from_timestamp) > 120.0"
    ),
})
```

If the values you want to attach already live in another table (offline reward labels, classifier predictions, learned observation embeddings), merge them in by joining on the appropriate key — `index` for frames or `episode_index` for episodes:

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
import pyarrow as pa

ep_labels = pa.table({
    "episode_index": pa.array([0, 1, 2]),
    "outcome": pa.array(["success", "partial", "success"]),
})
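# `left_on` names the join key in `episodes`; the same name is used on the right side by default.
episodes.merge(ep_labels, left_on="episode_index")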
```

The original columns and the inline video blobs are untouched, so existing code that does not reference the new columns continues to work unchanged. For column values that require a Python computation (e.g., running a visual encoder over the decoded video frames), Lance provides a batch-UDF API — see the [Lance data evolution docs](https://lance.org/guide/data_evolution/).
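
One consequence of merging a partial label table is worth spelling out. Assuming the merge behaves as a left join keyed on `episode_index`, which is how the Lance merge operation is documented, every episode is kept and episodes without a label get a null value. The pure-Python sketch below mimics that behavior on a toy example; the function name is hypothetical and no Lance code runs here.

```python
def preview_label_merge(episode_indices, labels):
    """Mimic a left join: every episode is kept, unlabeled ones get None."""
    return [
        {"episode_index": ep, "outcome": labels.get(ep)}
        for ep in episode_indices
    ]


labels = {0: "success", 1: "partial", 2: "success"}
preview = preview_label_merge(range(5), labels)
unlabeled = [row["episode_index"] for row in preview if row["outcome"] is None]
print(f"{len(unlabeled)} of {len(preview)} episodes have no outcome label")
```

Downstream filters on the new column should therefore account for nulls, for example with an explicit `outcome IS NOT NULL` predicate.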

## Train

A common pattern for vision-language-action training is to pre-extract decoded frame pixels once into a derived LanceDB table and train against that table with the regular projection-based dataloader. The derived table has one row per frame, with the per-frame `action` and `observation_state` already joined in and one column per camera holding the decoded image. `take_blobs` is what makes the extraction step tractable: each episode's per-camera MP4 segment is randomly addressable in `episodes.lance` (the `*_from_timestamp` / `*_to_timestamp` columns give the segment bounds), so the extraction pass can read only the bytes it needs and write decoded frames into a fresh table without an external file store.

Other workflows project the `*_video_blob` columns from `episodes.lance` directly and decode at the batch boundary, or skip pixels entirely and train a state-only policy on `frames.lance`; the right shape is workload-specific. The training loop is the same `Permutation.identity(tbl).select_columns(...)` snippet in every case; only the source table and the column list change.
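
Whichever shape you pick, the extraction pass needs the right column names for the cameras it will decode. The helper below is a small sketch that builds the projection from the `episodes.lance` schema listed earlier; the function and constant names are illustrative.

```python
CAMERAS = ("cam_high", "cam_left_wrist", "cam_right_wrist")
TRAJECTORY_COLUMNS = [
    "episode_index", "task_index", "fps",
    "timestamps", "actions", "observation_state",
]


def extraction_columns(cameras=CAMERAS, include_trajectory: bool = True) -> list[str]:
    """Column projection for an extraction pass over `episodes.lance`."""
    columns = list(TRAJECTORY_COLUMNS) if include_trajectory else []
    for cam in cameras:
        if cam not in CAMERAS:
            raise ValueError(f"unknown camera: {cam!r}")
        prefix = f"observation_images_{cam}"
        columns += [
            f"{prefix}_video_blob",
            f"{prefix}_from_timestamp",
            f"{prefix}_to_timestamp",
        ]
    return columns


print(extraction_columns(["cam_high"], include_trajectory=False))
```

Restricting the list to the cameras a model conditions on keeps the other video blobs from being read at all.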

For a state-only policy, the frames table is already in the right shape — no pre-extraction needed:

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
import lancedb
from lancedb.permutation import Permutation
from torch.utils.data import DataLoader

db = lancedb.connect("hf://datasets/lance-format/lerobot-xvla-soft-fold/data")
frames = db.open_table("frames")

train_ds = Permutation.identity(frames).select_columns(["observation_state", "action"])
loader = DataLoader(train_ds, batch_size=256, shuffle=True, num_workers=4)
```

For a vision-language-action policy, train against a pre-extracted frames-with-pixels table that joins each frame's three decoded camera images to its `action` and `observation_state`. Picking the cameras the model actually conditions on is then a column projection — `cam_high` alone, all three, or any subset:

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
import lancedb
from lancedb.permutation import Permutation
from torch.utils.data import DataLoader

db = lancedb.connect("./lerobot-xvla-frames")   # local table produced by the one-time extraction
tbl = db.open_table("train")

train_ds = Permutation.identity(tbl).select_columns(
    ["cam_high", "cam_left_wrist", "cam_right_wrist", "observation_state", "action"]
)
loader = DataLoader(train_ds, batch_size=32, shuffle=True, num_workers=4)
```

The inline `_video_blob` storage and `take_blobs` still earn their place outside of the training loop — visualizing an episode in a notebook, sampling for human review, one-off evaluation, and the pre-extraction step itself — but they are not the dataloader.

## Versioning

Every mutation to a Lance table, whether it adds a column, merges labels, or builds an index, commits a new version. Each of `frames`, `episodes`, and `videos` is versioned independently, so a column added to `frames` does not bump the version of `episodes`. You can list versions and inspect the history directly from the Hub copy; creating new tags requires a local copy since tags are writes.

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
import lancedb

db = lancedb.connect("hf://datasets/lance-format/lerobot-xvla-soft-fold/data")
frames = db.open_table("frames")

print("frames version:", frames.version)
print("history:", frames.list_versions())
print("tags:", frames.tags.list())
```

Once you have a local copy, tag the table for reproducibility:

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
local_db = lancedb.connect("./lerobot-xvla-soft-fold/data")
local_frames = local_db.open_table("frames")
local_frames.tags.create("xvla-v1", local_frames.version)
```

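Check out a tag or a version number on an open table handle, against either the Hub copy or a local one. A tag resolves only on the copy where it was created:

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
frames_v1 = local_db.open_table("frames")
frames_v1.checkout("xvla-v1")   # pin to the tag created on the local copy

frames_v5 = db.open_table("frames")
frames_v5.checkout(5)           # pin to a specific version number
```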

Pinning supports two workflows. A policy locked to `xvla-v1` keeps reproducing the same behavior while the dataset evolves in parallel. A training experiment pinned to the same tag can be rerun later against the exact same frames and segments, so changes in metrics reflect model changes rather than data drift.

## Materialize a subset

At >50 GB across three tables and millions of frames, few workflows want the full corpus on local disk. The practical entry point is to stream a filtered query through `.to_batches()` into a new local table; only the projected columns and matching row groups cross the wire, and the bytes never fully materialize in Python memory — including the per-camera `_video_blob` columns on `episodes.lance`, which stream through Arrow record batches rather than being assembled in a single buffer.

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
import lancedb

remote_db = lancedb.connect("hf://datasets/lance-format/lerobot-xvla-soft-fold/data")
remote_episodes = remote_db.open_table("episodes")

batches = (
    remote_episodes.search()
    .where("task_index = 0 AND episode_index < 50")
    .select([
        "episode_index", "task_index", "fps", "timestamps", "actions", "observation_state",
        "observation_images_cam_high_video_blob",
        "observation_images_cam_high_from_timestamp",
        "observation_images_cam_high_to_timestamp",
    ])
    .to_batches()
)

local_db = lancedb.connect("./xvla-task0-subset")
local_db.create_table("episodes", batches)
```

The resulting `./xvla-task0-subset` is a first-class LanceDB database. The snippets in the Evolve, Train, and Versioning sections above work against it once `lancedb.connect(...)` points at `./xvla-task0-subset` instead of the Hub URI or the full local copy, provided the tables they open have been materialized. The same pattern applies to `frames` and `videos`: narrow each table to the rows your workload needs, and the resulting database stays small enough to index and iterate cheaply.

## Source & license

Converted from [`lerobot/xvla-soft-fold`](https://huggingface.co/datasets/lerobot/xvla-soft-fold) (LeRobot v3.0 dataset format), originally released as part of the [X-VLA](https://thu-air-dream.github.io/X-VLA/) project. Apache 2.0.

## Citation

```
@article{zheng2025xvla,
  title={X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model},
  author={Zheng and others},
  journal={arXiv preprint arXiv:2510.10274},
  year={2025}
}

@misc{cadene2024lerobot,
  title={LeRobot: State-of-the-art Machine Learning for Real-World Robotics in PyTorch},
  author={R{\'e}mi Cadene and Simon Alibert and Alexander Soare and Quentin Gallou{\'e}dec and Adil Zouitine and Steven Palma and Pepijn Kooijmans and Michel Aractingi and Mustafa Shukor and Martino Russi and Francesco Capuano and Caroline Pascal and Jade Choghari and Jess Moss and Thomas Wolf},
  year={2024},
  url={https://github.com/huggingface/lerobot}
}
```
