Source dataset card and downloadable files for lance-format/eurosat-lance.
Lance-formatted version of EuroSAT — Sentinel-2 satellite imagery (RGB) covering 27,000 64×64 tiles across 10 land-cover classes, sourced from blanchon/EuroSAT_RGB. This is the canonical “geo” tile-level classification benchmark, useful for remote sensing pre-training and small-tile retrieval research.

Splits

| Split | Rows |
|---|---|
| train.lance | 16,200 |
| validation.lance | 5,400 |
| test.lance | 5,400 |
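
As a quick sanity check on the table above, the following sketch sums the split sizes and builds a URI per split. The `split_uri` helper is hypothetical, and the `validation` and `test` URIs are an assumption inferred from the file names in the table; only the `train.lance` URI appears verbatim on this page.

```python
# Row counts copied from the Splits table.
SPLIT_ROWS = {"train": 16_200, "validation": 5_400, "test": 5_400}

# Base URI taken from the examples on this page.
BASE_URI = "hf://datasets/lance-format/eurosat-lance/data"


def split_uri(split: str) -> str:
    """Return the dataset URI for a split (hypothetical helper, not a Lance API)."""
    if split not in SPLIT_ROWS:
        raise ValueError(f"unknown split {split!r}; expected one of {sorted(SPLIT_ROWS)}")
    return f"{BASE_URI}/{split}.lance"


total = sum(SPLIT_ROWS.values())
fractions = {name: rows / total for name, rows in SPLIT_ROWS.items()}

print(total)               # 27000, matching the tile count quoted above
print(fractions)           # roughly a 60/20/20 split
print(split_uri("train"))  # the URI used throughout this page
```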

Schema

| Column | Type | Notes |
|---|---|---|
| id | int64 | Row index within split |
| image | large_binary | Inline JPEG bytes (64×64 RGB Sentinel-2) |
| label | int32 | Class id (0-9) |
| label_name | string | One of: Annual_Crop, Forest, Herbaceous_Vegetation, Highway, Industrial_Buildings, Pasture, Permanent_Crop, Residential_Buildings, River, SeaLake |
| image_emb | fixed_size_list<float32, 512> | OpenCLIP ViT-B-32 image embedding (cosine-normalized) |
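
The schema describes `image_emb` as cosine-normalized. Assuming this means each vector is scaled to unit L2 length, cosine similarity between two embeddings reduces to a plain dot product. The sketch below illustrates that with toy 4-dimensional vectors standing in for the 512-dimensional embeddings; the helper names are illustrative only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors; real image_emb values have 512 dimensions.
a = [3.0, 4.0, 0.0, 0.0]
b = [4.0, 3.0, 0.0, 0.0]

a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# After normalization the two quantities agree up to floating-point error.
print(cosine_similarity(a, b))
print(dot(a_unit, b_unit))
```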

Pre-built indices

  • IVF_PQ on image_emb, metric=cosine
  • BTREE on label
  • BITMAP on label_name
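
For intuition about why a bitmap index suits a low-cardinality column such as `label_name`, here is a toy, pure-Python sketch. It is a conceptual illustration only and does not reflect how Lance stores or queries its BITMAP index; the function names are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an integer whose set bits mark the matching rows."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)


def rows_from_bits(bits):
    """Expand a bitmap back into a sorted list of row ids."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


# Six toy rows; the real column holds one of ten class names per tile.
label_names = ["River", "Forest", "River", "Highway", "Forest", "River"]
index = build_bitmap_index(label_names)

print(rows_from_bits(index["River"]))                    # [0, 2, 5]
# An OR across two bitmaps answers "River or Forest" without scanning rows.
print(rows_from_bits(index["River"] | index["Forest"]))  # [0, 1, 2, 4, 5]
```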

Quick start

import lance

ds = lance.dataset("hf://datasets/lance-format/eurosat-lance/data/train.lance")
print(ds.count_rows(), ds.schema.names, ds.list_indices())

Load with LanceDB

These tables can also be consumed by LanceDB, the multimodal lakehouse and embedded search library built on top of Lance, for simplified vector search and other queries.

import lancedb

db = lancedb.connect("hf://datasets/lance-format/eurosat-lance/data")
tbl = db.open_table("train")
print(f"LanceDB table opened with {len(tbl)} satellite tiles")

Vector search

import lance
import pyarrow as pa

ds = lance.dataset("hf://datasets/lance-format/eurosat-lance/data/train.lance")
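emb_field = ds.schema.field("image_emb")
# Use row 0 of the split as the reference tile.
ref = ds.take([0], columns=["image_emb", "label_name"]).to_pylist()[0]
# Wrap the embedding in an Arrow array that matches the column type.
query = pa.array([ref["image_emb"]], type=emb_field.type)

# nprobes: number of IVF partitions to scan.
# refine_factor: fetch k * refine_factor candidates, then re-rank them with the full vectors.
hits = ds.scanner(
    nearest={"column": "image_emb", "q": query[0], "k": 5, "nprobes": 16, "refine_factor": 30},
    columns=["id", "label_name"],
).to_table().to_pylist()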
print(f"reference: {ref['label_name']}")
for h in hits:
    print(h)

Vector search with LanceDB

import lancedb

db = lancedb.connect("hf://datasets/lance-format/eurosat-lance/data")
tbl = db.open_table("train")

ref = tbl.search().limit(1).select(["image_emb", "label_name"]).to_list()[0]
query_embedding = ref["image_emb"]

results = (
    tbl.search(query_embedding)
    .metric("cosine")
    .select(["id", "label_name"])
    .limit(5)
    .to_list()
)
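
Searches served by an IVF_PQ index are approximate, so it can be useful to compare a handful of results against an exact brute-force ranking. The sketch below is one way to do that on a small in-memory sample. The function names, the toy 2-dimensional vectors, and the `approx_ids` list are all hypothetical stand-ins; in practice the ids would come from a query like the ones above.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(query, rows, k):
    """Rank (id, vector) pairs by exact cosine distance and return the k nearest ids."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [row_id for row_id, _ in ranked[:k]]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the approximate result recovered."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)


# Toy (id, vector) pairs standing in for (id, image_emb) rows.
rows = [
    (0, [1.0, 0.0]),
    (1, [0.9, 0.1]),
    (2, [0.0, 1.0]),
    (3, [-1.0, 0.0]),
    (4, [0.7, 0.7]),
]
query = [1.0, 0.0]

exact_ids = exact_top_k(query, rows, k=3)
approx_ids = [0, 4, 2]  # hypothetical output of an approximate search

print(exact_ids)  # [0, 1, 4]
print(recall_at_k(approx_ids, exact_ids))
```

If recall on such a sample is lower than desired, raising `nprobes` or `refine_factor` in the Lance query is the usual lever, at the cost of slower queries.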

Filter by class

import lance
ds = lance.dataset("hf://datasets/lance-format/eurosat-lance/data/train.lance")
rivers = ds.scanner(filter="label_name = 'River'", columns=["id"], limit=5).to_table()

Filter by class with LanceDB

import lancedb

db = lancedb.connect("hf://datasets/lance-format/eurosat-lance/data")
tbl = db.open_table("train")
rivers = tbl.search().where("label_name = 'River'").select(["id"]).limit(5).to_list()
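
Because filter strings are plain SQL-style expressions, a typo in a class name silently returns zero rows. A small helper that validates names against the ten classes listed in the schema can catch this early. The helpers below are hypothetical, and the `IN (...)` form is assumed to be accepted by the filter parser; verify it against your Lance or LanceDB version.

```python
# Class names copied from the label_name row of the schema.
CLASS_NAMES = (
    "Annual_Crop", "Forest", "Herbaceous_Vegetation", "Highway",
    "Industrial_Buildings", "Pasture", "Permanent_Crop",
    "Residential_Buildings", "River", "SeaLake",
)


def _check(name):
    if name not in CLASS_NAMES:
        raise ValueError(f"unknown class {name!r}; expected one of {CLASS_NAMES}")
    return name


def class_filter(name):
    """Build a filter string that matches a single class."""
    return f"label_name = '{_check(name)}'"


def any_class_filter(names):
    """Build a filter string that matches any of several classes."""
    quoted = ", ".join(f"'{_check(n)}'" for n in names)
    return f"label_name IN ({quoted})"


print(class_filter("River"))                     # label_name = 'River'
print(any_class_filter(["River", "SeaLake"]))    # label_name IN ('River', 'SeaLake')
```

The resulting string can be passed as `filter=` to `ds.scanner(...)` or to `.where(...)` on a LanceDB query, as in the examples above.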

Why Lance?

  • One dataset for tiles + embeddings + indices — no sidecar TIF folder per class.
  • On-disk vector and scalar indices live next to the data, so search works on local copies and on the Hub.
  • Schema evolution: add columns (multi-spectral channels, model predictions, fresh embeddings) without rewriting the data.

Source & license

Converted from blanchon/EuroSAT_RGB. EuroSAT is released under the MIT license by Helber et al. The underlying Sentinel-2 imagery is © European Space Agency, made available under the Copernicus open data policy.

Citation

@article{helber2019eurosat,
  title={EuroSAT: A novel dataset and deep learning benchmark for land use and land cover classification},
  author={Helber, Patrick and Bischke, Benjamin and Dengel, Andreas and Borth, Damian},
  journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
  year={2019}
}