Source dataset card and downloadable files for lance-format/oxford-pets-lance.
Lance-formatted version of the Oxford-IIIT Pet dataset — 7,390 cat & dog photos across 37 breeds — sourced from pcuenq/oxford-pets.

Schema

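| Column     | Type                           | Notes                                                                       |
|------------|--------------------------------|-----------------------------------------------------------------------------|
| id         | int64                          | Row index                                                                   |
| image      | large_binary                   | Inline JPEG bytes                                                           |
| label_name | string                         | One of 37 breeds, underscore-spaced (british_shorthair, golden_retriever, …) |
| is_dog     | bool                           | true for dog breeds, false for cat breeds                                   |
| path       | string?                        | Original filename in the source dataset                                     |
| image_emb  | fixed_size_list<float32, 512>  | OpenCLIP ViT-B-32 embedding (cosine-normalized)                             |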
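
Because the image column stores the encoded JPEG inline, the bytes of a row can be written straight to disk without re-encoding. The helper below is a minimal sketch using only the standard library; `save_jpeg` is a hypothetical name, not part of Lance or of the dataset.

```python
from pathlib import Path

JPEG_SOI = b"\xff\xd8"  # every JPEG stream starts with the Start Of Image marker


def save_jpeg(image_bytes: bytes, out_path: str) -> Path:
    """Write the raw bytes of one `image` cell to `out_path` (hypothetical helper)."""
    if not image_bytes.startswith(JPEG_SOI):
        raise ValueError("payload does not start with the JPEG SOI marker")
    path = Path(out_path)
    path.write_bytes(image_bytes)
    return path
```

With a row fetched as `row = ds.take([0], columns=["image", "label_name"]).to_pylist()[0]`, a call such as `save_jpeg(row["image"], row["label_name"] + ".jpg")` should produce a file that any image viewer can open.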

Pre-built indices

  • IVF_PQ on image_emb (metric=cosine)
  • BITMAP on label_name and is_dog
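
The cosine metric pairs naturally with the cosine-normalized embeddings: for unit-length vectors, cosine similarity reduces to a plain dot product. The sketch below illustrates that relationship with the standard library only. The function names and the 3-dimensional toy vectors are illustrative, and reporting distance as `1 - similarity` is an assumption about the convention in use.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_distance(a, b):
    """Cosine distance, 1 - cos(a, b), for vectors of any magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors standing in for the 512-dimensional embeddings.
a = l2_normalize([3.0, 4.0, 0.0])
b = l2_normalize([4.0, 3.0, 0.0])

# For unit vectors the dot product alone gives the cosine similarity.
dot_only = sum(x * y for x, y in zip(a, b))
```

Here `cosine_distance(a, b)` and `1 - dot_only` agree up to floating-point error, which is why pre-normalized embeddings make the cosine metric cheap to reason about.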

Quick start

import lance
ds = lance.dataset("hf://datasets/lance-format/oxford-pets-lance/data/train.lance")
print(ds.count_rows(), ds.schema.names, ds.list_indices())

Load with LanceDB

These tables can also be consumed by LanceDB, the multimodal lakehouse and embedded search library built on top of Lance, for simplified vector search and other queries.

import lancedb

db = lancedb.connect("hf://datasets/lance-format/oxford-pets-lance/data")
tbl = db.open_table("train")
print(f"LanceDB table opened with {len(tbl)} images")

Filter — only dogs, only golden retrievers, etc.

import lance
ds = lance.dataset("hf://datasets/lance-format/oxford-pets-lance/data/train.lance")
dogs = ds.scanner(filter="is_dog = true", columns=["label_name"], limit=5).to_table()
goldens = ds.scanner(filter="label_name = 'golden_retriever'", columns=["id"], limit=5).to_table()
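
The filter strings above are SQL expressions, so several breeds can be matched at once with an IN list. The helper below is a sketch for building such a string from Python values. `sql_quote` and `breed_filter` are hypothetical names, and the sketch assumes the filter parser accepts SQL IN lists and doubled single quotes as an escape.

```python
def sql_quote(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def breed_filter(breeds, dogs_only=None):
    """Build a filter expression over label_name and, optionally, is_dog."""
    if not breeds:
        raise ValueError("at least one breed is required")
    clause = "label_name IN (" + ", ".join(sql_quote(b) for b in breeds) + ")"
    if dogs_only is not None:
        clause += " AND is_dog = " + ("true" if dogs_only else "false")
    return clause


expr = breed_filter(["golden_retriever", "british_shorthair"])
```

The resulting string can be passed as `filter=expr` to `ds.scanner(...)`, or to `.where(expr)` when querying through LanceDB.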

Filter with LanceDB

import lancedb

db = lancedb.connect("hf://datasets/lance-format/oxford-pets-lance/data")
tbl = db.open_table("train")
dogs = tbl.search().where("is_dog = true").select(["label_name"]).limit(5).to_list()
goldens = tbl.search().where("label_name = 'golden_retriever'").select(["id"]).limit(5).to_list()

Vector search

import lance, pyarrow as pa
ds = lance.dataset("hf://datasets/lance-format/oxford-pets-lance/data/train.lance")
emb_field = ds.schema.field("image_emb")
ref = ds.take([0], columns=["image_emb", "label_name"]).to_pylist()[0]
neighbors = ds.scanner(
    nearest={"column": "image_emb", "q": pa.array([ref["image_emb"]], type=emb_field.type)[0], "k": 5},
    columns=["id", "label_name"],
).to_table().to_pylist()

Vector search with LanceDB

import lancedb

db = lancedb.connect("hf://datasets/lance-format/oxford-pets-lance/data")
tbl = db.open_table("train")

ref = tbl.search().limit(1).select(["image_emb", "label_name"]).to_list()[0]
query_embedding = ref["image_emb"]

results = (
    tbl.search(query_embedding)
    .metric("cosine")
    .select(["id", "label_name"])
    .limit(5)
    .to_list()
)
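
One common use of nearest-neighbor results is a quick k-NN label guess: take the most frequent label_name among the returned rows. The sketch below assumes `results` is a list of dicts, as returned by `.to_list()` above. `majority_label` is a hypothetical helper, and the sample rows are made up for illustration rather than real query output.

```python
from collections import Counter


def majority_label(rows, key="label_name"):
    """Return the most common label among neighbor rows and its vote share."""
    if not rows:
        raise ValueError("no neighbors to vote with")
    counts = Counter(row[key] for row in rows)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(rows)


# Made-up rows shaped like the search output (id, label_name).
sample = [
    {"id": 0, "label_name": "golden_retriever"},
    {"id": 17, "label_name": "golden_retriever"},
    {"id": 42, "label_name": "british_shorthair"},
]
label, share = majority_label(sample)
```

When the query vector comes from the table itself, as it does here, the closest row is usually the query image, so it may be worth dropping that row before voting.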

Source & license

Converted from pcuenq/oxford-pets. Released under CC BY-SA 4.0.

Citation

@inproceedings{parkhi2012cats,
  title={Cats and Dogs},
  author={Parkhi, Omkar M. and Vedaldi, Andrea and Zisserman, Andrew and Jawahar, C. V.},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2012}
}