lance-format/textvqa-lance
Lance-formatted version of TextVQA — VQA where the question requires reading text in the image — sourced from lmms-lab/textvqa. Each row carries the image bytes, the question, the 10 reference answers, the OCR tokens detected by the dataset’s pre-processing, and CLIP image + question embeddings.

Splits

Split             Rows
validation.lance  5,000
train.lance       34,602

Schema

Column         Type                           Notes
id             int64                          Row index within split
image          large_binary                   Inline JPEG bytes
image_id       string?                        TextVQA image id
question_id    string?                        TextVQA question id
question       string                         The question text
answers        list<string>                   10 annotator answers
answer         string                         First answer; canonical / FTS target
ocr_tokens     list<string>                   OCR tokens detected on the image
image_classes  list<string>                   OpenImages-style scene tags from the source
set_name       string?                        Source partition (train, val)
image_emb      fixed_size_list<float32, 512>  OpenCLIP image embedding (cosine-normalized)
question_emb   fixed_size_list<float32, 512>  OpenCLIP text embedding of the question
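
As a quick sanity check, the sketch below (assuming Pillow is installed) pulls one row and decodes the inline JPEG bytes:
import io
import lance
from PIL import Image

ds = lance.dataset("hf://datasets/lance-format/textvqa-lance/data/validation.lance")
# take() returns a pyarrow Table; grab the first row with a few columns
row = ds.take([0], columns=["image", "question", "answers", "ocr_tokens"]).to_pylist()[0]
img = Image.open(io.BytesIO(row["image"]))  # inline JPEG bytes -> PIL image
print(img.size, row["question"], row["answers"][:3], row["ocr_tokens"][:5])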

Pre-built indices

  • IVF_PQ on image_emb and question_emb (metric=cosine)
  • INVERTED (FTS) on question and answer
  • BTREE on image_id, question_id, set_name (see the filter sketch below)
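
Scalar filters on those columns can take advantage of the BTREE indices. A minimal sketch, assuming the validation split stores set_name = 'val':
import lance

ds = lance.dataset("hf://datasets/lance-format/textvqa-lance/data/validation.lance")
# filters on image_id / question_id / set_name can use the BTREE indices
subset = ds.to_table(
    filter="set_name = 'val'",
    columns=["image_id", "question", "answer"],
    limit=5,
)
print(subset.to_pylist())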

Quick start

import lance

# open the validation split directly from the Hugging Face Hub
ds = lance.dataset("hf://datasets/lance-format/textvqa-lance/data/validation.lance")
print(ds.count_rows(), ds.schema.names, ds.list_indices())

Load with LanceDB

These tables can also be consumed by LanceDB, the multimodal lakehouse and embedded search library built on top of Lance, for simplified vector search and other queries.
import lancedb

db = lancedb.connect("hf://datasets/lance-format/textvqa-lance/data")
tbl = db.open_table("validation")
print(f"LanceDB table opened with {len(tbl)} image-question pairs")
import lance, pyarrow as pa, open_clip, torch

model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model = model.eval().cuda().half()
with torch.no_grad():
    q = model.encode_text(tokenizer(["what brand is on this billboard?"]).cuda())
    q = (q / q.norm(dim=-1, keepdim=True)).float().cpu().numpy()[0]

ds = lance.dataset("hf://datasets/lance-format/textvqa-lance/data/validation.lance")
emb_field = ds.schema.field("image_emb")
# wrap the query vector as a fixed_size_list scalar matching the image_emb field type
hits = ds.scanner(
    nearest={"column": "image_emb", "q": pa.array([q.tolist()], type=emb_field.type)[0], "k": 10},
    columns=["question", "answer", "ocr_tokens"],
).to_table().to_pylist()
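
The same nearest-neighbor query, issued through LanceDB: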
import lancedb, open_clip, torch

model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model = model.eval().cuda().half()
with torch.no_grad():
    q = model.encode_text(tokenizer(["what brand is on this billboard?"]).cuda())
    q = (q / q.norm(dim=-1, keepdim=True)).float().cpu().numpy()[0]

db = lancedb.connect("hf://datasets/lance-format/textvqa-lance/data")
tbl = db.open_table("validation")

results = (
    tbl.search(q.tolist(), vector_column_name="image_emb")
    .metric("cosine")
    .select(["question", "answer", "ocr_tokens"])
    .limit(10)
    .to_list()
)
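
Full-text search against the pre-built inverted index on question and answer: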
import lancedb

db = lancedb.connect("hf://datasets/lance-format/textvqa-lance/data")
tbl = db.open_table("validation")

results = (
    tbl.search("brand name")
    .select(["question", "answer"])
    .limit(10)
    .to_list()
)
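
Vector search and scalar filters can also be combined in a single query. A minimal sketch, reusing the OpenCLIP query vector q from above; prefilter=True applies the filter before the ANN search:
import lancedb

db = lancedb.connect("hf://datasets/lance-format/textvqa-lance/data")
tbl = db.open_table("validation")

# restrict to rows whose set_name is 'val' before searching image_emb
results = (
    tbl.search(q.tolist(), vector_column_name="image_emb")
    .metric("cosine")
    .where("set_name = 'val'", prefilter=True)
    .select(["question", "answer", "ocr_tokens"])
    .limit(10)
    .to_list()
)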

Why Lance?

  • One dataset for images + questions + answers + OCR + dual embeddings + indices — no JSON/feature folders.
  • Cross-modal search and OCR-text filtering work on the same dataset on the Hub.
  • Schema evolution: add columns (alternate OCR systems, model predictions) without rewriting the data; see the sketch after this list.
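
A minimal sketch of that last point, assuming a local, writable copy of the split (the Hub copy is read-only) and a hypothetical model_answer column keyed on the existing id column:
import lance
import pyarrow as pa

ds = lance.dataset("./validation.lance")  # hypothetical local copy of the split
n = ds.count_rows()

# one value per row, aligned on the id column
preds = pa.table({
    "id": pa.array(range(n), type=pa.int64()),
    "model_answer": pa.array(["unanswered"] * n),  # placeholder predictions
})

# merge joins on id and appends model_answer without rewriting the image bytes
ds.merge(preds, left_on="id")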

Source & license

Converted from lmms-lab/textvqa. TextVQA is released under CC BY 4.0 by Singh et al. (Facebook AI Research).

Citation

@inproceedings{singh2019towards,
  title={Towards VQA models that can read},
  author={Singh, Amanpreet and Natarajan, Vivek and Shah, Meet and Jiang, Yu and Chen, Xinlei and Batra, Dhruv and Parikh, Devi and Rohrbach, Marcus},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}