Hugging Face dataset card and downloadable files: lance-format/food101-lance.
Lance-formatted version of Food-101 (101,000 food photographs across 101 classes), sourced from ethz/food101. Each row carries inline JPEG bytes and a CLIP image embedding, and the embedding column has a pre-built IVF_PQ index.

Splits

Split              Rows
train.lance        75,750
validation.lance   25,250

Schema

Column       Type                            Notes
id           int64                           Row index within split
image        large_binary                    Inline JPEG bytes
label        int32                           Class id (0-100)
label_name   string                          One of 101 dish names (apple_pie, baby_back_ribs, …)
image_emb    fixed_size_list<float32, 512>   OpenCLIP ViT-B-32 embedding (cosine-normalized)
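
Because the image column holds the encoded JPEG bytes inline, a row can be written to disk without any decoding step. The following is a minimal sketch, assuming rows arrive as dictionaries keyed by the column names above; the helper name `save_images` and the `<label_name>/<id>.jpg` layout are illustrative choices, not part of the dataset or of Lance.

```python
from pathlib import Path


def save_images(rows, out_dir):
    """Write each row's JPEG bytes to <out_dir>/<label_name>/<id>.jpg.

    `rows` is an iterable of dicts with the keys "id", "image" and
    "label_name", matching the schema above. The function name and the
    directory layout are hypothetical, chosen only for illustration.
    """
    out_dir = Path(out_dir)
    written = []
    for row in rows:
        class_dir = out_dir / row["label_name"]
        class_dir.mkdir(parents=True, exist_ok=True)
        path = class_dir / f"{row['id']}.jpg"
        path.write_bytes(row["image"])  # bytes are already JPEG-encoded
        written.append(path)
    return written
```

Rows of this shape could come from a call such as `ds.take([0, 1], columns=["id", "image", "label_name"]).to_pylist()` on an opened dataset.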

Pre-built indices

  • IVF_PQ on image_emb (metric=cosine)
  • BTREE on label
  • BITMAP on label_name
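
Since the schema describes `image_emb` as cosine-normalized, each vector should have unit length. In that case cosine similarity reduces to a plain dot product, and cosine distance (in its usual definition) is one minus that value. A small pure-Python sketch of the relationship, using made-up 3-dimensional vectors rather than real 512-dimensional embeddings:

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity for arbitrary non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors, invented for illustration only.
a = normalize([1.0, 2.0, 2.0])
b = normalize([2.0, 0.0, 1.0])

# For unit vectors the two quantities agree up to floating-point error.
assert math.isclose(cosine_similarity(a, b), dot(a, b))

cosine_distance = 1.0 - dot(a, b)
```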

Quick start

import lance
ds = lance.dataset("hf://datasets/lance-format/food101-lance/data/validation.lance")
print(ds.count_rows(), ds.schema.names, ds.list_indices())

Load with LanceDB

These tables can also be consumed by LanceDB, the multimodal lakehouse and embedded search library built on top of Lance, for simplified vector search and other queries.
import lancedb

db = lancedb.connect("hf://datasets/lance-format/food101-lance/data")
tbl = db.open_table("validation")
print(f"LanceDB table opened with {len(tbl)} images")

Filter by class

import lance
ds = lance.dataset("hf://datasets/lance-format/food101-lance/data/validation.lance")
sushi = ds.scanner(filter="label_name = 'sushi'", columns=["id"], limit=5).to_table()
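
The filter argument is a SQL-style predicate string. When the class name comes from a variable, it can help to build that string in one place. The helpers below are a hypothetical sketch (the names `sql_literal`, `label_filter` and `labels_filter` are not part of Lance or LanceDB), and the quote-doubling escape assumes standard SQL string-literal rules.

```python
def sql_literal(value):
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def label_filter(name):
    """Predicate matching a single class, e.g. label_name = 'sushi'."""
    return f"label_name = {sql_literal(name)}"


def labels_filter(names):
    """Predicate matching any of several classes via IN (...)."""
    quoted = ", ".join(sql_literal(n) for n in names)
    return f"label_name IN ({quoted})"
```

`label_filter("sushi")` produces the same predicate used in the snippet above, so it could be passed as `filter=label_filter("sushi")` to `ds.scanner(...)`.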

Filter by class with LanceDB

import lancedb

db = lancedb.connect("hf://datasets/lance-format/food101-lance/data")
tbl = db.open_table("validation")
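sushi = tbl.search().where("label_name = 'sushi'").select(["id"]).limit(5).to_list()

Vector search

import lance
import pyarrow as pa

ds = lance.dataset("hf://datasets/lance-format/food101-lance/data/validation.lance")
emb_field = ds.schema.field("image_emb")
# Use the first row's embedding as the query vector.
ref = ds.take([0], columns=["image_emb", "label_name"]).to_pylist()[0]
# Rebuild the vector with the column's exact Arrow type before querying.
query = pa.array([ref["image_emb"]], type=emb_field.type)
# nprobes sets how many IVF partitions are searched;
# refine_factor over-fetches candidates for re-ranking.
neighbors = ds.scanner(
    nearest={"column": "image_emb", "q": query[0], "k": 5, "nprobes": 16, "refine_factor": 30},
    columns=["id", "label_name"],
).to_table().to_pylist()

Vector search with LanceDB

import lancedb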

db = lancedb.connect("hf://datasets/lance-format/food101-lance/data")
tbl = db.open_table("validation")

ref = tbl.search().limit(1).select(["image_emb", "label_name"]).to_list()[0]
query_embedding = ref["image_emb"]

results = (
    tbl.search(query_embedding)
    .metric("cosine")
    .select(["id", "label_name"])
    .limit(5)
    .to_list()
)
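
Both vector-search snippets return rows carrying a `label_name`, so one simple use of the neighbors is a k-nearest-neighbor vote over their labels. The sketch below works on a list of dicts shaped like the search output; the function name `majority_label` and the sample rows are invented for illustration and are not real query results.

```python
from collections import Counter


def majority_label(neighbors):
    """Return (label, share) for the most common label_name among neighbors.

    `neighbors` is a list of dicts with a "label_name" key, e.g. the output
    of `.to_list()` or `.to_pylist()`. Ties go to the label seen first,
    which is the closest one if rows are ordered by distance.
    """
    if not neighbors:
        raise ValueError("neighbors must not be empty")
    counts = Counter(row["label_name"] for row in neighbors)
    label, count = counts.most_common(1)[0]
    return label, count / len(neighbors)


# Hypothetical rows, not actual search results.
sample = [
    {"id": 1, "label_name": "sushi"},
    {"id": 2, "label_name": "sushi"},
    {"id": 3, "label_name": "sashimi"},
]
label, share = majority_label(sample)
```

Because the reference embedding is taken from a row in the same table, that row will probably appear among its own neighbors; dropping it before voting may give a fairer picture.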

Source & license

Converted from ethz/food101. The Food-101 dataset is by Bossard et al. (ETH Zurich) — see the original dataset page for licensing details.

Citation

@inproceedings{bossard2014food,
  title={Food-101 -- Mining Discriminative Components with Random Forests},
  author={Bossard, Lukas and Guillaumin, Matthieu and Van Gool, Luc},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2014}
}