Source dataset card and downloadable files for lance-format/chartqa-lance.
A Lance-formatted version of ChartQA, a visual question answering (VQA) benchmark over scientific and business charts that combines logical and visual reasoning, sourced from lmms-lab/ChartQA.

Splits

| Split | Rows |
| --- | --- |
| test.lance | 2,500 |

The lmms-lab/ChartQA redistribution exposes test only. Train and validation live in the original release (https://github.com/vis-nlp/ChartQA); add them via chartqa/dataprep.py --splits once a parquet mirror is identified.

Schema

| Column | Type | Notes |
| --- | --- | --- |
| id | int64 | Row index |
| image | large_binary | Inline chart image bytes |
| image_id / question_id | string? | Source does not assign explicit ids; null for now |
| question | string | Natural-language question |
| answers | list<string> | Reference answers (typically a single string) |
| answer | string | First answer, used as canonical |
| type | string? | Question type (human vs augmented) |
| image_emb | fixed_size_list<float32, 512> | CLIP image embedding (cosine-normalized) |
| question_emb | fixed_size_list<float32, 512> | CLIP text embedding of the question |
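
The schema describes the CLIP embeddings as cosine-normalized. Assuming that means each vector has unit L2 norm, cosine similarity between two embeddings reduces to a plain dot product. The sketch below illustrates that relationship with toy 3-dimensional vectors; the helper names are illustrative and are not part of the dataset or of any library.

```python
import math
from typing import List, Sequence


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    n = l2_norm(v)
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity for arbitrary (not necessarily normalized) vectors."""
    return dot(a, b) / (l2_norm(a) * l2_norm(b))


# Toy stand-ins for two 512-dimensional embeddings (assumption: unit norm).
raw_a = [3.0, 4.0, 0.0]
raw_b = [4.0, 3.0, 0.0]
unit_a = normalize(raw_a)
unit_b = normalize(raw_b)

# For unit vectors, the dot product equals the cosine similarity.
print(dot(unit_a, unit_b), cosine_similarity(raw_a, raw_b))
```

If the stored vectors turn out not to have unit norm, the full `cosine_similarity` form still applies.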

Pre-built indices

  • IVF_PQ on image_emb and question_emb (metric=cosine)
  • INVERTED (FTS) on question and answer
  • BITMAP on type
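
Since type carries a BITMAP index, equality filters on that column are a natural use of it. The following is a hedged sketch of building such a filter. `type_filter` and `rows_of_type` are hypothetical helpers, and this page does not list the literal values stored in type, so inspect the column before choosing one. The column name is wrapped in backticks in case it collides with a SQL keyword.

```python
def type_filter(value: str) -> str:
    """Build a SQL equality predicate on the `type` column.

    Single quotes in the value are doubled so the literal stays well-formed.
    """
    escaped = value.replace("'", "''")
    return f"`type` = '{escaped}'"


def rows_of_type(tbl, value: str, limit: int = 5):
    """Return up to `limit` rows whose type equals `value`.

    `tbl` is assumed to be an opened LanceDB table; nothing here runs
    until the function is called with one.
    """
    return (
        tbl.search()
        .where(type_filter(value))
        .select(["question", "answer", "type"])
        .limit(limit)
        .to_list()
    )


# "human" is a placeholder value, not a confirmed entry in the column.
print(type_filter("human"))
```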

Quick start

import lance
ds = lance.dataset("hf://datasets/lance-format/chartqa-lance/data/test.lance")
print(ds.count_rows(), ds.schema.names, ds.list_indices())
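
The image column stores encoded bytes inline rather than file paths, so a row can be written straight to disk once its format is known. A small sketch follows, assuming the bytes are PNG or JPEG (this page does not state the encoding). `sniff_image_format` and `save_first_image` are hypothetical helper names.

```python
from pathlib import Path

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"      # JPEG start-of-image marker


def sniff_image_format(data: bytes) -> str:
    """Guess a file extension from the leading magic bytes."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpg"
    return "bin"  # unknown format: keep the bytes, skip the guess


def save_first_image(ds, out_dir: str = ".") -> Path:
    """Write the first row's chart image to disk and return its path.

    `ds` is assumed to be a Lance dataset opened as in the quick start.
    """
    row = ds.take([0], columns=["image", "question"]).to_pylist()[0]
    data = row["image"]
    path = Path(out_dir) / f"chart_0.{sniff_image_format(data)}"
    path.write_bytes(data)
    return path
```

Calling `save_first_image(ds)` with the dataset from the quick start would fetch one row over the network and write a single file to the current directory.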

Load with LanceDB

These tables can also be consumed by LanceDB, the multimodal lakehouse and embedded search library built on top of Lance, for simplified vector search and other queries.
import lancedb

db = lancedb.connect("hf://datasets/lance-format/chartqa-lance/data")
tbl = db.open_table("test")
print(f"LanceDB table opened with {len(tbl)} chart-question pairs")
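# Vector search: reuse one row's stored question embedding as the query vector.
import lancedb

db = lancedb.connect("hf://datasets/lance-format/chartqa-lance/data")
tbl = db.open_table("test")

# Fetch a single row without a query to borrow its embedding.
ref = tbl.search().limit(1).select(["question_emb", "question"]).to_list()[0]
query_embedding = ref["question_emb"]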

results = (
    tbl.search(query_embedding, vector_column_name="question_emb")
    .metric("cosine")
    .select(["question", "answer"])
    .limit(5)
    .to_list()
)
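
Vector search results in LanceDB typically carry a `_distance` field next to the selected columns. With the cosine metric this is a cosine distance, so similarity is 1 minus that value. Because the query vector above comes from a row already in the table, that row should come back first with a distance near zero. The sketch below assumes each result dict has a `_distance` key. `with_similarity` is a hypothetical helper, and the sample rows are invented placeholders, not taken from the dataset.

```python
from typing import Any, Dict, List


def with_similarity(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Attach a cosine similarity to each result and sort best-first."""
    enriched = []
    for row in rows:
        item = dict(row)  # copy so the caller's rows stay untouched
        item["similarity"] = 1.0 - row["_distance"]
        enriched.append(item)
    return sorted(enriched, key=lambda r: r["similarity"], reverse=True)


# Placeholder rows shaped like the output of .to_list() (values invented).
sample = [
    {"question": "placeholder question A", "answer": "1", "_distance": 0.25},
    {"question": "placeholder question B", "answer": "2", "_distance": 0.0},
]
ranked = with_similarity(sample)
for r in ranked:
    print(r["question"], r["similarity"])
```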
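# Full-text search: keyword query served by the INVERTED index.
import lancedb

db = lancedb.connect("hf://datasets/lance-format/chartqa-lance/data")
tbl = db.open_table("test")

results = (
    # query_type="fts" makes the full-text intent explicit for a string query.
    tbl.search("percentage", query_type="fts")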
    .select(["question", "answer"])
    .limit(10)
    .to_list()
)

Source & license

Converted from lmms-lab/ChartQA. The original ChartQA dataset is released under the GNU GPL-3.0 license by Masry et al.

Citation

@inproceedings{masry2022chartqa,
  title={ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning},
  author={Masry, Ahmed and Long, Do Xuan and Tan, Jia Qing and Joty, Shafiq and Hoque, Enamul},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2022},
  year={2022}
}