View on Hugging Face: source dataset card and downloadable files for lance-format/oxford-pets-lance.
A Lance-formatted version of the Oxford-IIIT Pet dataset (7,390 cat and dog photos across 37 breeds), sourced from pcuenq/oxford-pets.
Schema
| Column | Type | Notes |
|---|---|---|
| id | int64 | Row index |
| image | large_binary | Inline JPEG bytes |
| label_name | string | One of 37 breeds, underscore-spaced (british_shorthair, golden_retriever, …) |
| is_dog | bool | true for dog breeds, false for cat breeds |
| path | string? | Original filename in the source dataset |
| image_emb | fixed_size_list<float32, 512> | OpenCLIP ViT-B-32 embedding (cosine-normalized) |
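Because `image` holds the JPEG bytes inline and `path` is nullable, writing a row back out as a file takes only a few lines. The sketch below is one possible approach, not part of the dataset tooling: `save_image`, `JPEG_MAGIC`, and the output directory are hypothetical names introduced here for illustration.

```python
from pathlib import Path
from typing import Optional

# JPEG files begin with the two-byte Start Of Image marker.
JPEG_MAGIC = b"\xff\xd8"


def save_image(image: bytes, row_id: int, path: Optional[str], out_dir: str) -> Path:
    """Write one row's inline JPEG bytes to out_dir and return the file path.

    Hypothetical helper: the arguments mirror the `image`, `id`, and `path`
    columns of the schema above.
    """
    if not image.startswith(JPEG_MAGIC):
        raise ValueError(f"row {row_id}: bytes do not look like a JPEG")
    # `path` is nullable in the schema, so fall back to the row id.
    name = Path(path).name if path else f"{row_id}.jpg"
    target = Path(out_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(image)
    return target
```

A row fetched with `ds.take([0], columns=["id", "image", "path"]).to_pylist()[0]` could be passed in as `save_image(row["image"], row["id"], row["path"], "out")`.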
Pre-built indices
IVF_PQ on image_emb — metric=cosine
BITMAP on label_name and is_dog
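IVF_PQ is an approximate nearest-neighbor index, so it helps to have the exact ranking it approximates in mind. The following is a minimal brute-force sketch of cosine top-k in pure Python, using toy 2-D vectors in place of the 512-D `image_emb` column. The function names are illustrative and this is not how Lance implements search.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_exact(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Score every vector against the query and return the k most similar.

    Returns (row_index, similarity) pairs, most similar first.
    """
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for embeddings; real rows would come from the image_emb column.
toy_vectors = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0], [-1.0, 0.0]]
nearest = top_k_exact([1.0, 0.0], toy_vectors, k=2)
```

Since the stored embeddings are cosine-normalized, the denominator is approximately 1 for real rows and the score reduces to a dot product. The brute-force version scans every row, which is the cost the index avoids.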
Quick start
import lance
ds = lance.dataset("hf://datasets/lance-format/oxford-pets-lance/data/train.lance")
print(ds.count_rows(), ds.schema.names, ds.list_indices())
Load with LanceDB
These tables can also be consumed by LanceDB, the multimodal lakehouse and embedded search library built on top of Lance, for simplified vector search and other queries.
import lancedb
db = lancedb.connect("hf://datasets/lance-format/oxford-pets-lance/data")
tbl = db.open_table("train")
print(f"LanceDB table opened with {len(tbl)} images")
Filter — only dogs, only golden retrievers, etc.
import lance
ds = lance.dataset("hf://datasets/lance-format/oxford-pets-lance/data/train.lance")
dogs = ds.scanner(filter="is_dog = true", columns=["label_name"], limit=5).to_table()
goldens = ds.scanner(filter="label_name = 'golden_retriever'", columns=["id"], limit=5).to_table()
Filter with LanceDB
import lancedb
db = lancedb.connect("hf://datasets/lance-format/oxford-pets-lance/data")
tbl = db.open_table("train")
dogs = tbl.search().where("is_dog = true").select(["label_name"]).limit(5).to_list()
goldens = tbl.search().where("label_name = 'golden_retriever'").select(["id"]).limit(5).to_list()
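The filter strings above are SQL-style predicates, so they can be assembled programmatically when the breed or flag comes from a variable. This is a small sketch under the assumption that string literals follow standard SQL quoting (a single quote is escaped by doubling it); `eq_filter` and `all_of` are hypothetical helpers, not part of Lance or LanceDB.

```python
def eq_filter(column: str, value) -> str:
    """Build a `column = literal` predicate for a bool, number, or string."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return f"{column} = {'true' if value else 'false'}"
    if isinstance(value, (int, float)):
        return f"{column} = {value}"
    # Assumption: standard SQL escaping, where ' inside a literal becomes ''.
    escaped = str(value).replace("'", "''")
    return f"{column} = '{escaped}'"


def all_of(*predicates: str) -> str:
    """Combine predicates with AND, parenthesizing each one."""
    return " AND ".join(f"({p})" for p in predicates)


dog_filter = eq_filter("is_dog", True)
golden_filter = eq_filter("label_name", "golden_retriever")
combined = all_of(dog_filter, golden_filter)
```

The resulting strings can be passed wherever the examples above use a literal, for instance `ds.scanner(filter=combined, ...)` or `tbl.search().where(combined)`.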
Visual similarity search
import lance, pyarrow as pa
ds = lance.dataset("hf://datasets/lance-format/oxford-pets-lance/data/train.lance")
emb_field = ds.schema.field("image_emb")
ref = ds.take([0], columns=["image_emb", "label_name"]).to_pylist()[0]
neighbors = ds.scanner(
    nearest={"column": "image_emb", "q": pa.array([ref["image_emb"]], type=emb_field.type)[0], "k": 5},
    columns=["id", "label_name"],
).to_table().to_pylist()
LanceDB visual similarity search
import lancedb
db = lancedb.connect("hf://datasets/lance-format/oxford-pets-lance/data")
tbl = db.open_table("train")
ref = tbl.search().limit(1).select(["image_emb", "label_name"]).to_list()[0]
query_embedding = ref["image_emb"]
results = (
    tbl.search(query_embedding)
    .metric("cosine")
    .select(["id", "label_name"])
    .limit(5)
    .to_list()
)
Source & license
Converted from pcuenq/oxford-pets. Released under CC BY-SA 4.0.
Citation
@inproceedings{parkhi2012cats,
  title={Cats and Dogs},
  author={Parkhi, Omkar M. and Vedaldi, Andrea and Zisserman, Andrew and Jawahar, C. V.},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2012}
}