View on Hugging Face: source dataset card and downloadable files for lance-format/pascal-voc-2012-segmentation-lance.
A Lance-format conversion of nateraw/pascal-voc-2012: 2,913 image/mask pairs with CLIP image embeddings stored inline and a pre-built IVF_PQ ANN index.
Why segmentation?
VOC 2012 ships several tasks (classification, detection, segmentation, action). We focus on the semantic segmentation subset because every row carries a paired mask image and the dataset is small enough to convert quickly with full embeddings, which makes it useful as a smoke test or a small benchmark.
Splits
| Split | Rows |
|---|---|
| `train.lance` | 1,464 |
| `validation.lance` | 1,449 |
Schema
| Column | Type | Notes |
|---|---|---|
| `id` | int64 | Row index within the split |
| `image` | large_binary | Inline JPEG bytes |
| `mask` | large_binary | Inline PNG bytes: class id per pixel (0=background, 1-20=VOC classes, 255=void) |
| `image_emb` | fixed_size_list<float32, 512> | OpenCLIP ViT-B-32 image embedding (cosine-normalized) |
The 20 VOC classes (ids 1-20, in order) are: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor.
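The mask encoding described in the schema can be made concrete with a small lookup table. The sketch below assumes the class list above is in id order (1 to 20), as in the standard VOC labeling; the names `VOC_CLASSES` and `class_name` are illustrative and not part of the dataset.

```python
# Pixel value -> label, following the schema:
# 0 = background, 1-20 = VOC classes, 255 = void.
VOC_CLASSES = [
    "background",
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
VOID_ID = 255


def class_name(pixel_id: int) -> str:
    """Map one mask pixel value to a human-readable label."""
    if pixel_id == VOID_ID:
        return "void"
    if 0 <= pixel_id < len(VOC_CLASSES):
        return VOC_CLASSES[pixel_id]
    raise ValueError(f"unexpected mask value: {pixel_id}")
```

Void pixels are usually excluded from losses and metrics, which is why they get their own label here instead of being folded into background.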
Pre-built indices
- `IVF_PQ` on `image_emb` (`metric=cosine`)
Note: the small dataset size (≤1,464 rows per split) is below Lance’s default partition count, so the helper falls back to a smaller `num_partitions` automatically. For higher recall, build the index with `num_partitions=16` against a local copy.
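Rebuilding the index on a local copy might look like the sketch below. It assumes the `lance` Python package (installed as `pylance`) and a local path supplied by the caller. `num_sub_vectors=32` is an illustrative choice, not a value taken from the dataset card; it only needs to divide the 512-dimensional embedding evenly.

```python
def ivf_pq_params(dim: int = 512, num_partitions: int = 16,
                  num_sub_vectors: int = 32) -> dict:
    """Keyword arguments for an IVF_PQ index on `image_emb`."""
    if dim % num_sub_vectors != 0:
        raise ValueError("num_sub_vectors must divide the embedding dimension")
    return {
        "index_type": "IVF_PQ",
        "metric": "cosine",
        "num_partitions": num_partitions,
        "num_sub_vectors": num_sub_vectors,  # assumption, see lead-in
        "replace": True,  # overwrite the pre-built index
    }


def rebuild_index(local_path: str) -> None:
    """Rebuild the vector index of a local split with the parameters above."""
    import lance  # imported here so the parameter helper works without pylance

    ds = lance.dataset(local_path)
    ds.create_index("image_emb", **ivf_pq_params())
```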
Quick start
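A minimal sketch of opening a split with the `lance` Python package, assuming the repository is first downloaded with `huggingface_hub.snapshot_download`. The local directory name and the helper names are hypothetical; the helper searches for `<split>.lance` instead of assuming a particular folder layout inside the repository.

```python
from pathlib import Path

REPO_ID = "lance-format/pascal-voc-2012-segmentation-lance"


def find_split(root, split: str) -> Path:
    """Locate the `<split>.lance` directory anywhere under `root`."""
    matches = sorted(Path(root).rglob(f"{split}.lance"))
    if not matches:
        raise FileNotFoundError(f"no {split}.lance under {root}")
    return matches[0]


def open_split(split: str = "train", local_dir: str = "pascal-voc-lance"):
    """Download the dataset repository (if needed) and open one split."""
    # Third-party imports are local so find_split stays usable on its own.
    import lance
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=REPO_ID, repo_type="dataset", local_dir=local_dir
    )
    return lance.dataset(str(find_split(root, split)))


# Usage (downloads the repository on first call):
#   ds = open_split("train")
#   print(ds.schema)
#   print(ds.count_rows())  # the Splits table lists 1,464 rows for train
```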
Load with LanceDB
These tables can also be consumed by LanceDB, the multimodal lakehouse and embedded search library built on top of Lance, for simplified vector search and other queries.
Working with images and masks
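Because `image` and `mask` are stored as encoded bytes, a row has to be decoded before use. The sketch below assumes Pillow and NumPy are installed and that `row` is a dict keyed by the schema's column names; the function names are illustrative.

```python
import io
from collections import Counter

VOID_ID = 255  # "void" pixels, per the schema


def decode_pair(row: dict):
    """Decode one row into (RGB image array, class-id mask array)."""
    import numpy as np
    from PIL import Image

    image = np.array(Image.open(io.BytesIO(row["image"])).convert("RGB"))
    # The mask is deliberately not converted: converting a palette PNG
    # to RGB would replace the class ids with display colors.
    mask = np.array(Image.open(io.BytesIO(row["mask"])))
    return image, mask


def class_fractions(pixel_ids) -> dict:
    """Share of labeled pixels per class id, ignoring void (255)."""
    counts = Counter(int(p) for p in pixel_ids if int(p) != VOID_ID)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cid: n / total for cid, n in sorted(counts.items())}


# Usage, with `ds` an opened Lance dataset:
#   row = ds.take([0], columns=["image", "mask"]).to_pylist()[0]
#   image, mask = decode_pair(row)
#   print(image.shape, mask.shape, class_fractions(mask.ravel()))
```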
Vector search example
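A sketch of a nearest-neighbor query with the `lance` Python package, assuming `ds` is an already opened split and NumPy is installed. It reuses a stored embedding as the query. The normalization step is a no-op for stored embeddings and matters only if you substitute an external vector, which should match the cosine-normalized `image_emb` column. Helper names are illustrative.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def similar_ids(ds, row_index: int = 0, k: int = 5):
    """Return the ids of the k rows nearest to the embedding at row_index."""
    import numpy as np  # local import keeps l2_normalize dependency-free

    emb = ds.take([row_index], columns=["image_emb"])["image_emb"][0].as_py()
    query = np.asarray(l2_normalize(emb), dtype=np.float32)
    hits = ds.to_table(
        columns=["id"],
        nearest={"column": "image_emb", "q": query, "k": k},
    )
    return hits["id"].to_pylist()
```

Since the query is itself a row of the dataset, that row should normally appear among the results.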
LanceDB vector search
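The same kind of query through LanceDB might look like the sketch below. It assumes the `lancedb` Python package and a local copy of the data. LanceDB treats a directory as a database and each `<name>.lance` folder inside it as a table, so the helper derives both from the split's path. Function names are hypothetical, and older `lancedb` releases expose the distance setting as `.metric(...)` instead of `.distance_type(...)`.

```python
from pathlib import Path


def db_and_table(split_path):
    """Split '<dir>/<name>.lance' into (database directory, table name)."""
    p = Path(split_path)
    if p.suffix != ".lance":
        raise ValueError(f"expected a .lance path, got {split_path}")
    return str(p.parent), p.stem


def search_similar(split_path, query, k: int = 5):
    """Return the k nearest rows (as dicts) for a 512-dim query vector."""
    import lancedb  # local import so db_and_table works without lancedb

    uri, name = db_and_table(split_path)
    table = lancedb.connect(uri).open_table(name)
    return (
        table.search(query, vector_column_name="image_emb")
        .distance_type("cosine")
        .limit(k)
        .select(["id"])
        .to_list()
    )
```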
Why Lance?
- One dataset carries images + masks + embeddings + indices — no sidecar files.
- On-disk vector and full-text indices live next to the data, so search works on local copies and on the Hub.
- Schema evolution: add columns (instance masks, alternate embeddings, model predictions) without rewriting the data.
Source & license
Converted from nateraw/pascal-voc-2012. The Pascal VOC dataset is released under its own custom license; please review it before redistribution.