Source dataset card and downloadable files: lance-format/trivia-qa-lance on Hugging Face.
A Lance-formatted version of TriviaQA (rc.nocontext config). It is a question-answering dataset of trivia questions paired with answer aliases, with MiniLM sentence embeddings stored inline.
Why rc.nocontext?
The full TriviaQA dataset bundles entire Wikipedia and web pages with each question (entity_pages, search_results), which makes it tens of GB. The rc.nocontext slice keeps only the question, answer, and answer aliases in compact form. That makes it well suited to closed-book QA, retrieval research, and use as a search target.
Splits
| Split | Rows |
|---|---|
| train.lance | 138,384 |
| validation.lance | 17,944 |
Schema
| Column | Type | Notes |
|---|---|---|
| question_id | string | TriviaQA question id (e.g. tc_1) |
| question | string | The trivia question |
| question_source | string | URL / source where the question came from |
| answer_value | string | Canonical answer |
| answer_aliases | list<string> | Other accepted phrasings (e.g. ["Sinclair Lewis", "Harry Sinclair Lewis"]) |
| normalized_answer | string | Lowercased / normalized form for exact-match scoring |
| answer_type | string | TriviaQA entity type (e.g. WikipediaEntity, FreebaseEntity) |
| question_emb | fixed_size_list<float32, 384> | MiniLM embedding of question, L2-normalized (unit length) for cosine similarity |
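The normalized_answer column exists for exact-match (EM) scoring. Below is a minimal stdlib sketch of such a scorer. It assumes a SQuAD/TriviaQA-style normalization: lowercase, strip punctuation, drop the articles a/an/the, and collapse whitespace. That normalization is an assumption and may not reproduce normalized_answer exactly. The helper names `normalize` and `exact_match` are hypothetical. The example row reuses the alias example from the schema table.

```python
import re
import string

# Assumed normalization (SQuAD/TriviaQA style), not necessarily how
# normalized_answer was produced: lowercase, drop punctuation,
# drop English articles, collapse whitespace.
_ARTICLES = re.compile(r"\b(a|an|the)\b")
_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    text = text.lower().translate(_PUNCT)
    text = _ARTICLES.sub(" ", text)
    return " ".join(text.split())


def exact_match(prediction: str, answer_value: str, answer_aliases: list) -> bool:
    """True if the normalized prediction equals the normalized canonical answer or any alias."""
    gold = {normalize(a) for a in [answer_value, *answer_aliases]}
    return normalize(prediction) in gold


# Example row built from the alias example in the schema table.
row = {
    "answer_value": "Sinclair Lewis",
    "answer_aliases": ["Sinclair Lewis", "Harry Sinclair Lewis"],
}
print(exact_match("harry sinclair lewis.", row["answer_value"], row["answer_aliases"]))
print(exact_match("Upton Sinclair", row["answer_value"], row["answer_aliases"]))
```

Checking against the aliases, not just answer_value, is what lets paraphrased but accepted answers like "Harry Sinclair Lewis" score as correct.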
Pre-built indices
- IVF_PQ on question_emb (metric=cosine)
- INVERTED (full-text) on question
- BTREE on question_id and answer_value
- BITMAP on answer_type
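This sketch shows what the IVF_PQ index approximates. Because question_emb is stored unit-normalized, cosine similarity reduces to a dot product, and an exact search is just a ranked scan over all rows. The code below does that brute-force scan in plain Python on made-up 3-dimensional vectors (the real ones are 384-dimensional). IVF_PQ trades a little recall for speed by probing only some partitions and comparing compressed vectors. `l2_normalize` and `cosine_top_k` are illustrative helpers, not part of Lance.

```python
import heapq
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_top_k(query, rows, k):
    """Exact (brute-force) top-k by cosine similarity.

    Assumes every row vector is already L2-normalized, so cosine similarity
    is simply the dot product with the normalized query.
    """
    q = l2_normalize(query)
    scored = ((sum(a * b for a, b in zip(q, vec)), rid) for rid, vec in rows)
    return heapq.nlargest(k, scored)  # list of (similarity, row_id), best first


# Made-up 3-d vectors standing in for 384-d MiniLM embeddings.
rows = [
    ("q1", l2_normalize([1.0, 0.0, 0.0])),
    ("q2", l2_normalize([0.8, 0.6, 0.0])),
    ("q3", l2_normalize([0.0, 0.0, 1.0])),
]
for sim, rid in cosine_top_k([2.0, 0.1, 0.0], rows, k=2):
    print(rid, round(sim, 3))
```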
Quick start
Load with LanceDB
These tables can also be consumed by LanceDB, the multimodal lakehouse and embedded search library built on top of Lance, for simplified vector search and other queries.
Semantic search over questions
LanceDB semantic search
LanceDB full-text search
Filter by answer type
Filter with LanceDB
Why Lance?
- One dataset carries questions + answers + aliases + embeddings + indices — no sidecar files.
- On-disk vector and full-text indices live next to the data, so search works on local copies and on the Hub.
- Schema evolution: add columns (alternate embeddings, generated answers, task labels) without rewriting the data.
Source & license
Converted from mandarjoshi/trivia_qa (rc.nocontext). TriviaQA is released under the Apache 2.0 license.