The source dataset card and downloadable files for lance-format/docvqa-lance are available on Hugging Face. The dataset is a Lance-format conversion of lmms-lab/DocVQA (DocVQA config).
Splits
| Split | Rows |
|---|---|
| validation.lance | 5,349 |
| test.lance | 5,188 |
Schema
| Column | Type | Notes |
|---|---|---|
| id | int64 | Row index within split |
| image | large_binary | Inline JPEG bytes (page image) |
| image_id | string? | DocVQA docId (alias) |
| question_id | string? | DocVQA questionId |
| question | string | Natural-language question |
| answers | list<string> | Reference answer span(s) |
| answer | string | First reference answer (FTS target) |
| doc_id | string? | DocVQA document id |
| ucsf_document_id | string? | UCSF Industry Documents Library id |
| ucsf_document_page_no | string? | Page number within the source document |
| data_split | string? | Original split label from the source |
| question_types | list<string> | DocVQA question-type tags (form, figure, table, …) |
| image_emb | fixed_size_list<float32, 512> | CLIP image embedding (cosine-normalized) |
| question_emb | fixed_size_list<float32, 512> | CLIP text embedding of the question |
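Both embedding columns are 512-dimensional CLIP vectors, and `image_emb` is described as cosine-normalized. If that means each vector is scaled to unit L2 norm (an assumption), then cosine similarity reduces to a plain dot product. The stdlib sketch below checks this on toy 3-dimensional vectors. `normalize`, `dot`, and `cosine_similarity` are hypothetical helpers and are not part of Lance or LanceDB.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm (the assumed meaning of 'cosine-normalized')."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity computed from raw (not necessarily normalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy stand-ins for a 512-dim image embedding and a question embedding.
image_vec = [3.0, 4.0, 0.0]
question_vec = [1.0, 2.0, 2.0]

# On unit vectors the dot product equals cosine similarity (11/15 here).
sim_dot = dot(normalize(image_vec), normalize(question_vec))
sim_cos = cosine_similarity(image_vec, question_vec)
print(round(sim_dot, 6), round(sim_cos, 6))
```

Storing normalized vectors means a query can skip the per-row norm computation. Search engines usually report cosine *distance* (1 minus this similarity) rather than the similarity itself.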
Pre-built indices
- `IVF_PQ` on `image_emb` and `question_emb` (metric=cosine)
- `INVERTED` (FTS) on `question` and `answer`
- `BTREE` on `image_id`, `question_id`, `doc_id`
- `LABEL_LIST` on `question_types`
Quick start
Load with LanceDB
These tables can also be opened with LanceDB, the multimodal lakehouse and embedded search library built on top of Lance, which simplifies vector search and other queries.
LanceDB vector search
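The `IVF_PQ` index on `image_emb` and `question_emb` makes vector search approximate. The sketch below illustrates only the IVF (inverted file) half of it. Each vector is assigned to its nearest centroid, and a query scans only the `nprobes` closest partitions instead of every row. This is a simplified sketch under several assumptions:

- The product-quantization (PQ) compression step is omitted.
- The centroids are hand-picked rather than trained with k-means.
- `build_ivf` and `ivf_search` are hypothetical names, not the Lance or LanceDB API. LanceDB does expose a similarly named `nprobes` setting on vector queries.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def build_ivf(vectors: dict[int, list[float]], centroids: list[list[float]]) -> list[list[int]]:
    """Assign each row id to the partition of its nearest centroid."""
    partitions: list[list[int]] = [[] for _ in centroids]
    for row_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda c: cosine_distance(vec, centroids[c]))
        partitions[nearest].append(row_id)
    return partitions


def ivf_search(
    query: list[float],
    vectors: dict[int, list[float]],
    centroids: list[list[float]],
    partitions: list[list[int]],
    k: int = 2,
    nprobes: int = 1,
) -> list[int]:
    """Return the k closest row ids, scanning only the nprobes nearest partitions."""
    probe_order = sorted(range(len(centroids)), key=lambda c: cosine_distance(query, centroids[c]))
    candidates = [row_id for c in probe_order[:nprobes] for row_id in partitions[c]]
    candidates.sort(key=lambda r: cosine_distance(query, vectors[r]))
    return candidates[:k]


# Toy 2-dim "embeddings" keyed by row id (the real columns are 512-dim CLIP vectors).
vectors = {0: [1.0, 0.1], 1: [0.9, 0.2], 2: [0.1, 1.0], 3: [0.2, 0.9], 4: [-1.0, 0.1]}
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

partitions = build_ivf(vectors, centroids)
print(partitions)  # [[0, 1], [2, 3], [4]]
print(ivf_search([1.0, 0.15], vectors, centroids, partitions, k=2, nprobes=1))  # [0, 1]
```

With `nprobes=1`, a true neighbour sitting in an adjacent partition can be missed. Raising `nprobes` scans more partitions, trading query speed for recall.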
LanceDB full-text search
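Full-text search over this dataset is served by the `INVERTED` (FTS) index on `question` and `answer`. That index maps each token to the rows containing it, so a keyword query does not have to scan every string. The sketch below is a toy version of that data structure in plain Python. It assumes a naive lowercase alphanumeric tokenizer and AND semantics across query terms. The real index's tokenizer and relevance ranking are not modeled here. `build_inverted_index` and `search` are hypothetical names, and the sample questions are invented rather than taken from DocVQA.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a simplified, assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(rows: dict[int, str]) -> dict[str, set[int]]:
    """Map each token to the set of row ids whose text contains it."""
    index: dict[str, set[int]] = defaultdict(set)
    for row_id, text in rows.items():
        for token in tokenize(text):
            index[token].add(row_id)
    return index


def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Return the sorted row ids that contain every query token (AND semantics)."""
    postings = [index.get(token, set()) for token in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical question strings standing in for the `question` column.
questions = {
    0: "What is the date mentioned in the letter?",
    1: "What is the total amount on the invoice?",
    2: "Who signed the letter?",
}
index = build_inverted_index(questions)
print(search(index, "letter"))       # [0, 2]
print(search(index, "letter date"))  # [0]
```

Intersecting posting sets touches only rows that share a query token, which is why an inverted index scales better than substring matching over every row.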
Filter by question type
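`question_types` is a `list<string>` column with a `LABEL_LIST` index. As we understand the Lance documentation, that index accelerates filters written with `array_has_any` and `array_has_all`:

- `array_has_any` matches a row that carries at least one requested tag.
- `array_has_all` matches a row that carries every requested tag.

The plain-Python sketch below only reproduces those matching semantics on invented rows, using tag values from the schema table. Check the exact filter-string syntax against the Lance docs before relying on it.

```python
def array_has_any(tags: list[str], wanted: list[str]) -> bool:
    """True if the row's tag list shares at least one value with `wanted`."""
    return any(tag in tags for tag in wanted)


def array_has_all(tags: list[str], wanted: list[str]) -> bool:
    """True if every value in `wanted` appears in the row's tag list."""
    return all(tag in tags for tag in wanted)


# Hypothetical (id, question_types) rows shaped like the `question_types` column.
rows = [
    (0, ["table"]),
    (1, ["form"]),
    (2, ["table", "figure"]),
    (3, ["figure"]),
]

any_ids = [row_id for row_id, tags in rows if array_has_any(tags, ["table", "form"])]
all_ids = [row_id for row_id, tags in rows if array_has_all(tags, ["table", "figure"])]
print(any_ids)  # [0, 1, 2]
print(all_ids)  # [2]
```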
Filter with LanceDB
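When a metadata filter (for example on `doc_id`) is combined with a vector query, the order of the two steps matters. LanceDB supports running the filter before the vector search (pre-filtering) or after it (post-filtering). The stdlib sketch below uses invented rows with precomputed distances to show why post-filtering can return fewer than `k` results. The function names and data are hypothetical.

```python
def top_k(rows: list[dict], k: int) -> list[int]:
    """Return the ids of the k rows with the smallest distance."""
    return [r["id"] for r in sorted(rows, key=lambda r: r["distance"])[:k]]


def prefilter_search(rows: list[dict], predicate, k: int) -> list[int]:
    """Filter first, then keep the k nearest survivors."""
    return top_k([r for r in rows if predicate(r)], k)


def postfilter_search(rows: list[dict], predicate, k: int) -> list[int]:
    """Keep the k nearest rows first, then drop those failing the filter."""
    nearest = sorted(rows, key=lambda r: r["distance"])[:k]
    return [r["id"] for r in nearest if predicate(r)]


def in_doc_a(row: dict) -> bool:
    """Example predicate on the `doc_id` column."""
    return row["doc_id"] == "a"


# Hypothetical rows with a precomputed distance to some query vector.
rows = [
    {"id": 0, "doc_id": "a", "distance": 0.10},
    {"id": 1, "doc_id": "b", "distance": 0.20},
    {"id": 2, "doc_id": "b", "distance": 0.30},
    {"id": 3, "doc_id": "a", "distance": 0.40},
    {"id": 4, "doc_id": "a", "distance": 0.50},
]

print(prefilter_search(rows, in_doc_a, k=3))   # [0, 3, 4]
print(postfilter_search(rows, in_doc_a, k=3))  # [0]
```

Pre-filtering keeps the result count at `k` as long as enough rows match. Post-filtering inspects only the `k` nearest rows, so a selective filter can leave few results or none.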
Source & license
Converted from lmms-lab/DocVQA. DocVQA is released under the MIT license. The underlying documents come from the UCSF Industry Documents Library, so review their access conditions before redistribution.