View on Hugging Face: source dataset card and downloadable files for lance-format/coco-detection-2017-lance.
A Lance conversion of detection-datasets/coco, with 123,287 images and the full per-image list of bounding boxes, category labels, and CLIP image embeddings, all stored inline.
Why this version?
Object detection datasets typically split images, annotations, and embeddings across multiple files (often three different formats: JPEG, JSON, NumPy). Lance keeps all of it in one tabular dataset:
- one row per image,
- the JPEG bytes, the bounding box list, the category labels, and the CLIP image embedding all live as columns on the same row,
- `IVF_PQ` on the embedding column lets you do visual similarity search without leaving the dataset, and
- `LABEL_LIST` on `categories_present` lets you filter to “images containing a dog and a frisbee” in milliseconds.
Splits
| Split | Rows |
|---|---|
| `train.lance` | 117,000+ |
| `val.lance` | 4,950+ |
(From the detection-datasets/coco redistribution; box counts: ~860k train / ~37k val.)
Schema
| Column | Type | Notes |
|---|---|---|
| `id` | `int64` | Row index within split |
| `image` | `large_binary` | Inline JPEG bytes |
| `image_id` | `int64` | COCO image id |
| `width`, `height` | `int32` | Image dimensions in pixels |
| `bboxes` | `list<list<float32, 4>>` | Each box is `[x_min, y_min, x_max, y_max]` in absolute pixel coords |
| `categories` | `list<int32>` | COCO 80-class id (0-79) |
| `category_names` | `list<string>` | Human-readable class name per object (e.g. person, dog, …) |
| `areas` | `list<float32>` | Bounding-box area (pixels²) |
| `num_objects` | `int32` | Number of annotated objects in the image |
| `categories_present` | `list<string>` | Deduped class names; feeds the `LABEL_LIST` index for fast filtering |
| `image_emb` | `fixed_size_list<float32, 512>` | OpenCLIP ViT-B-32 image embedding (cosine-normalized) |
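To make the box layout in the schema concrete, here is a small sketch that converts one `[x_min, y_min, x_max, y_max]` box to COCO's `[x, y, width, height]` form and computes a plain width × height area. The helper names and the sample box are hypothetical, and this card does not confirm that the stored `areas` values are computed exactly this way.

```python
from typing import List, Sequence


def xyxy_to_xywh(box: Sequence[float]) -> List[float]:
    """Convert [x_min, y_min, x_max, y_max] to [x, y, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def box_area(box: Sequence[float]) -> float:
    """Area in square pixels of an [x_min, y_min, x_max, y_max] box.

    Degenerate boxes (x_max < x_min or y_max < y_min) count as zero area.
    """
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


# Made-up box for illustration, not taken from the dataset.
example = [10.0, 20.0, 110.0, 70.0]
print(xyxy_to_xywh(example))  # [10.0, 20.0, 100.0, 50.0]
print(box_area(example))      # 5000.0
```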
Pre-built indices
- `IVF_PQ` on `image_emb`: `metric=cosine`
- `BTREE` on `image_id`, `num_objects`
- `LABEL_LIST` on `categories_present`: supports `array_has_any` / `array_has_all` predicates
Quick start
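A minimal sketch of opening one split with the `lance` Python package (installed as `pylance`). The `hf://` URI and the `data/<split>.lance` layout inside the repository are assumptions not confirmed by this page, and `split_uri` and `open_split` are hypothetical helper names.

```python
DATASET_REPO = "lance-format/coco-detection-2017-lance"


def split_uri(split: str, repo: str = DATASET_REPO) -> str:
    """Build an hf:// URI for one split.

    Assumption: splits are stored as data/<split>.lance inside the repo.
    """
    if split not in ("train", "val"):
        raise ValueError(f"unknown split: {split!r}")
    return f"hf://datasets/{repo}/data/{split}.lance"


def open_split(split: str = "train"):
    """Open a split as a lance dataset (needs `pip install pylance` and network access)."""
    import lance  # imported lazily so split_uri works without lance installed

    return lance.dataset(split_uri(split))
```

With the package installed, `ds = open_split("val")` followed by `ds.count_rows()` and `ds.schema` should report the row count and the columns listed in the schema table.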
Load with LanceDB
These tables can also be consumed by LanceDB, the multimodal lakehouse and embedded search library built on top of Lance, for simplified vector search and other queries.
Tip: for production use, download locally first.
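The sketch below follows the tip: it downloads the repository once with `huggingface_hub.snapshot_download`, then opens a split as a LanceDB table from the local copy. It assumes the splits sit under `data/` as `<split>.lance` directories, so that LanceDB sees them as tables named `train` and `val`. The helper names and the default local directory are hypothetical.

```python
from pathlib import Path

REPO_ID = "lance-format/coco-detection-2017-lance"


def table_path(root: Path, split: str) -> Path:
    """Expected on-disk location of one split (assumed layout: data/<split>.lance)."""
    return Path(root) / "data" / f"{split}.lance"


def download_dataset(local_dir: str = "./coco-detection-2017-lance") -> Path:
    """Download the whole dataset repo to local_dir and return its path."""
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return Path(snapshot_download(repo_id=REPO_ID, repo_type="dataset", local_dir=local_dir))


def open_table(root: Path, split: str = "train"):
    """Open a downloaded split as a LanceDB table."""
    import lancedb  # pip install lancedb

    if not table_path(root, split).exists():
        raise FileNotFoundError(table_path(root, split))
    # LanceDB treats each <name>.lance directory under the connection path as a table.
    db = lancedb.connect(str(Path(root) / "data"))
    return db.open_table(split)
```

A typical session would call `root = download_dataset()` once and then `tbl = open_table(root, "val")`.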
Read one annotated image
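A sketch of reading a single row and pairing each box with its class name, assuming `ds` is a dataset already opened with `lance.dataset(...)`. The column names come from the schema table. The helper names are hypothetical, and decoding the JPEG assumes Pillow is installed.

```python
import io
from typing import Dict, List, Tuple

ROW_COLUMNS = ["image", "image_id", "width", "height", "bboxes", "category_names"]


def read_row(ds, index: int = 0) -> Dict:
    """Fetch one row by position and return it as a plain Python dict."""
    # take() returns a pyarrow Table with exactly the requested rows and columns.
    return ds.take([index], columns=ROW_COLUMNS).to_pylist()[0]


def labeled_boxes(row: Dict) -> List[Tuple[str, List[float]]]:
    """Pair each class name with its [x_min, y_min, x_max, y_max] box."""
    return list(zip(row["category_names"], row["bboxes"]))


def decode_image(row: Dict):
    """Decode the inline JPEG bytes into a PIL image (requires Pillow)."""
    from PIL import Image

    return Image.open(io.BytesIO(row["image"]))
```

For example, `for name, box in labeled_boxes(read_row(ds, 0)): print(name, box)` lists the annotations of the first image.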
Filter by classes (LABEL_LIST index)
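A sketch of the "dog and frisbee" filter, assuming `ds` is an open lance dataset. `classes_predicate` is a hypothetical helper that builds an `array_has_all` / `array_has_any` filter string over `categories_present`. The `['a', 'b']` array-literal syntax is the form I would expect Lance's SQL filters to accept, so treat it as an assumption and check it against your installed version.

```python
from typing import Iterable


def classes_predicate(
    names: Iterable[str],
    require_all: bool = True,
    column: str = "categories_present",
) -> str:
    """Build a SQL filter matching rows whose label list contains the given classes."""
    fn = "array_has_all" if require_all else "array_has_any"
    # Escape single quotes so class names cannot break out of the string literal.
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in names)
    return f"{fn}({column}, [{quoted}])"


def find_images(ds, names: Iterable[str], limit: int = 10):
    """Return up to `limit` rows containing all of the given classes."""
    return ds.to_table(
        filter=classes_predicate(names),
        columns=["image_id", "categories_present", "num_objects"],
        limit=limit,
    )


print(classes_predicate(["dog", "frisbee"]))
# array_has_all(categories_present, ['dog', 'frisbee'])
```

Passing `require_all=False` switches the predicate to `array_has_any`, which matches images containing at least one of the classes.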
Filter by classes with LanceDB
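The same filter can be sketched against a LanceDB table, assuming `tbl` was opened with `db.open_table(...)`. The function name and the predicate constant are illustrative, and the predicate syntax carries the same caveat as any hand-written SQL filter: verify it against your LanceDB version.

```python
from typing import Dict, List

# Illustrative predicate: images that contain both a dog and a frisbee.
DOG_AND_FRISBEE = "array_has_all(categories_present, ['dog', 'frisbee'])"


def find_images_lancedb(tbl, predicate: str = DOG_AND_FRISBEE, limit: int = 10) -> List[Dict]:
    """Run a filter-only query (no vector) and return matching rows as dicts."""
    return (
        tbl.search()        # no query vector: a plain scan with a filter
        .where(predicate)
        .select(["image_id", "categories_present", "num_objects"])
        .limit(limit)
        .to_list()
    )
```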
Visual similarity search
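A sketch of nearest-neighbour search over `image_emb`, using the embedding of an existing row as the query. It assumes `ds` is an open lance dataset and that NumPy is installed; `similar_images` is a hypothetical helper name. The pure-Python `cosine_similarity` is included only to show the metric the index is built with. Because the stored embeddings are cosine-normalized, it reduces to a dot product for them.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similar_images(ds, row_index: int = 0, k: int = 5):
    """Return the k rows whose embeddings are nearest to that of row `row_index`."""
    import numpy as np

    emb = ds.take([row_index], columns=["image_emb"]).column("image_emb")[0].as_py()
    query = np.asarray(emb, dtype=np.float32)
    return ds.to_table(
        nearest={"column": "image_emb", "q": query, "k": k},
        columns=["image_id", "category_names"],
    )
```

Since the query vector comes from the dataset itself, the query row should normally appear among its own nearest neighbours.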
LanceDB visual similarity search
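The LanceDB equivalent can be sketched as follows, assuming `tbl` is an open LanceDB table and `query_vector` is a 512-dimensional embedding from the same OpenCLIP ViT-B-32 model. The function name is hypothetical.

```python
from typing import Dict, List, Sequence


def similar_images_lancedb(tbl, query_vector: Sequence[float], k: int = 5) -> List[Dict]:
    """Return the k nearest rows to `query_vector` on the image_emb column."""
    return (
        tbl.search(query_vector, vector_column_name="image_emb")
        .select(["image_id", "category_names"])
        .limit(k)
        .to_list()
    )
```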
Why Lance?
- One dataset carries images + boxes + categories + areas + embeddings + indices — no JSON sidecars.
- On-disk vector and label-list indices live next to the data, so filters and ANN search work on local copies and on the Hub.
- Schema evolution: add columns (segmentation polygons, keypoints, panoptic ids, fresh embeddings) without rewriting the data.
Source & license
Converted from detection-datasets/coco. COCO annotations are released under CC BY 4.0; the underlying images are subject to Flickr terms of service. See the COCO Terms of Use before redistribution.