The mental model is clean:
- Hermes owns the agent loop.
- LanceDB manages the durable long-term memory and offers semantic recall.
Why LanceDB fits agent memory
Out of the box, Hermes remembers with a small curated notes file frozen into the system prompt, plus lexical (keyword) search over past sessions. Both are useful, but keyword search misses paraphrases of what you originally typed, and catching paraphrases is exactly what you need when recalling a fact you phrased differently months ago. LanceDB is an embedded retrieval library, which makes it a natural fit here:
- No server to stand up: it reads and writes a table on local disk, so the plugin ships as a dependency rather than a service to operate.
- One table holds everything: content, metadata, and embeddings live together. A memory becomes a structured row with a category, tags, timestamps, and provenance, not just a text blob.
- Query it any way you need: vector similarity for meaning, BM25 full-text for exact names and jargon, a hybrid of the two, or plain metadata filters to keep recall scoped to the right workspace.
- It scales up: the same table abstraction carries over to larger LanceDB deployments later, so the local setup is never a dead end.
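To make the "structured row" and "scoped recall" ideas concrete, here is a minimal pure-Python sketch. It is not the plugin's schema or LanceDB's API: the `MemoryRow` field names, the two-dimensional toy embeddings, and the `recall` helper are all hypothetical, chosen only to show a metadata filter applied before a vector-similarity ranking.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryRow:
    """Hypothetical memory row: content, metadata, and embedding side by side."""
    content: str
    category: str
    tags: list
    workspace: str
    source_session: str  # provenance: where the fact was distilled from
    embedding: list


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall(rows, query_embedding, workspace, limit=3):
    """Filter by workspace first, then rank the survivors by similarity."""
    candidates = [row for row in rows if row.workspace == workspace]
    candidates.sort(key=lambda row: cosine(row.embedding, query_embedding), reverse=True)
    return candidates[:limit]


rows = [
    MemoryRow("Use pnpm, not npm", "convention", ["tooling"], "webapp", "session-1", [0.1, 0.9]),
    MemoryRow("API responses use snake_case", "convention", ["api"], "webapp", "session-2", [0.9, 0.1]),
    MemoryRow("Deploys go out on Fridays", "process", ["release"], "infra", "session-3", [0.2, 0.8]),
]

# The "infra" row is close to the query vector but is filtered out by workspace.
hits = recall(rows, query_embedding=[0.0, 1.0], workspace="webapp")
print([hit.content for hit in hits])
```

In a real store the embeddings come from the embedding model and the filtering and ranking happen inside the table's query engine; the sketch only shows why keeping metadata and vectors in the same row makes scoped recall a single query.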
Install and activate
Install runtime dependencies into Hermes' environment
Hermes loads plugins inside its own Python interpreter, so the dependencies go there — not
into a separate virtualenv. (This interpreter is shared across profiles, so you only install
once.)
Set your embeddings API key
The plugin turns conversations into embeddings, so it needs an embeddings key. By default that
is OpenAI, so set OPENAI_API_KEY in your environment or in ~/.hermes/.env.
Prefer a local or non-OpenAI model? The plugin uses an OpenAI-compatible client, so you can
point it at any compatible endpoint (OpenRouter, Ollama, vLLM, …) in your config, with no code
change needed. See Configuration below.
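As a rough illustration of what "OpenAI-compatible" means in practice, the sketch below builds (but never sends) the same embeddings request against two base URLs using only the standard library. The helper name `build_embeddings_request`, the example model names, the placeholder keys, and the local Ollama URL on its default port are assumptions for illustration; the plugin's own client and config keys may differ.

```python
import json
import urllib.request


def build_embeddings_request(base_url, api_key, model, texts):
    """Build (but do not send) a POST to an OpenAI-style /embeddings route."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hosted case: OpenAI's public API base URL (key and model are placeholders).
openai_req = build_embeddings_request(
    "https://api.openai.com/v1", "sk-example", "text-embedding-3-small", ["prefer pnpm"]
)

# Local case: an OpenAI-compatible server, assumed here to be Ollama on port 11434.
local_req = build_embeddings_request(
    "http://localhost:11434/v1", "unused", "nomic-embed-text", ["prefer pnpm"]
)

print(openai_req.full_url)
print(local_req.full_url)
```

Only the base URL, key, and model name change between the two requests; the route and JSON body keep the same shape, which is why swapping providers can be a config change rather than a code change.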
Activate and verify
Switch memory on and pick this plugin:
Then confirm it’s actually active before you start chatting. This is the one step worth not
skipping, because Hermes quietly falls back to its built-in notes if the provider isn’t set:
You want to see Provider: lancedb with both installed ✓ and available ✓.
The memory tools
Once activated, the agent has four tools for working with long-term memory:
| Tool | What it does |
|---|---|
| lancedb_recall | Semantic (vector, the default) or hybrid search over your workspace memory. Returns matching facts with scores and provenance. |
| lancedb_remember | Stores a durable fact when you explicitly ask. Deduplicated by content hash, so remembering the same thing twice doesn’t pile up rows. |
| lancedb_read | Fetches a single memory by ID, optionally with the original conversation messages it was distilled from. |
| lancedb_forget | Deletes safely: previews candidates first, then deletes by exact ID, so nothing disappears by accident. |
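Two behaviors in the table, deduplication by content hash and preview-then-delete, can be sketched with a toy in-memory store. This is not the plugin's implementation: the `MemoryStore` class, the whitespace and case normalization before hashing, the shortened SHA-256 ID, and the keyword-based preview are all assumptions made to keep the example small.

```python
import hashlib


class MemoryStore:
    """Toy in-memory stand-in for the memory table (not the plugin's code)."""

    def __init__(self):
        self.rows = {}  # memory ID -> content

    @staticmethod
    def content_id(content):
        # Assumption: collapse whitespace and case before hashing.
        normalized = " ".join(content.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]

    def remember(self, content):
        """Store a fact; identical content maps to the same ID, so no new row."""
        memory_id = self.content_id(content)
        self.rows.setdefault(memory_id, content)
        return memory_id

    def preview_forget(self, keyword):
        """Step 1 of forgetting: list candidates without deleting anything."""
        needle = keyword.lower()
        return [(mid, text) for mid, text in self.rows.items() if needle in text.lower()]

    def forget(self, memory_id):
        """Step 2: delete by exact ID only; True if a row was removed."""
        return self.rows.pop(memory_id, None) is not None


store = MemoryStore()
first = store.remember("Use pnpm, not npm")
second = store.remember("use pnpm,  not npm")  # same fact, different spacing and case
rows_after_remember = len(store.rows)

candidates = store.preview_forget("pnpm")      # nothing is deleted yet
deleted = store.forget(candidates[0][0])       # delete by the exact ID from the preview
print(first == second, rows_after_remember, deleted)
```

Separating the preview from the delete means a loose match can never remove a row on its own; only an explicit ID can.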
Walkthrough
“Teach it your project preferences”
Let’s make this concrete with the pain we opened on: re-explaining your setup to the agent every session. We’ll save a convention once and then prove a brand-new session can recall it. This example will touch all four tools along the way.
Remember
Ask Hermes to commit a convention to long-term memory. Saying “remember in long-term memory” makes sure it lands in the LanceDB store, which shows up as the ⚡ lancedb_r (lancedb_remember)
line below:
Recall
First, take Hermes’ built-in notes out of the picture so recall can only come from LanceDB. Otherwise the two layers run side by side, and either could answer:
Then start a new session (/new) and ask for the convention back, worded completely
differently from how you saved it:
Read
You can also ask where a fact came from. Hermes attributes the answer to its stored memory rather than guessing from a file in the repo (under the hood, lancedb_read can also return
the exact source messages a fact was distilled from):
Forget
When a preference changes, ask Hermes to drop the old fact. The tool calls tell the whole story: the two ⚡ lancedb_f (lancedb_forget) lines are it previewing matches and then
deleting, and the trailing ⚡ lancedb_r is it saving the replacement in the same breath:
Retrieval modes
Recall ships in vector mode by default: pure semantic search, which is what survives the
paraphrasing you saw above. If you also need exact name or jargon matching, switch to hybrid
(vector + BM25) and choose how the two legs are fused: RRF, a vector-biased linear blend, or a
cross-encoder reranker. Mode is set per call; fusion is a config setting.
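For intuition on the RRF option, reciprocal rank fusion scores each result by summing 1 / (k + rank) across the vector and BM25 result lists, so items ranked well by both legs rise to the top. The sketch below is a generic version of that formula, not the plugin's code; the `rrf_fuse` name, the memory IDs, and the constant k = 60 (a commonly used default) are assumptions.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(doc) = sum of 1 / (k + rank) over every list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["m1", "m2", "m3"]  # semantic leg, best match first
bm25_hits = ["m3", "m1", "m4"]    # keyword leg, best match first

fused = rrf_fuse([vector_hits, bm25_hits])
print(fused)
```

Because RRF uses only ranks, it needs no score normalization between the two legs. A linear blend, by contrast, mixes the raw scores and so depends on how each leg's scores are scaled.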
Inspect the store
Everything lives in one table named memories at ~/.hermes/lancedb/memories.lance. Because
it’s a plain LanceDB table, you can open it directly and see exactly what the agent has stored
— a kind column separates extracted fact rows from the raw turn rows they were drawn
from:
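One possible way to peek at the store from Python is sketched here, run with the same interpreter that has `lancedb` installed. The directory and table name follow the path given above; the `summarize_store` helper is hypothetical, and the literal value `'fact'` in the `kind` column is an assumption, so check the actual schema before relying on the filter.

```python
from pathlib import Path


def summarize_store(db_dir="~/.hermes/lancedb", table_name="memories"):
    """Return row counts for the memory table, or None if the store is absent."""
    db_path = Path(db_dir).expanduser()
    if not (db_path / f"{table_name}.lance").exists():
        return None

    import lancedb  # imported lazily so the existence check works without it

    table = lancedb.connect(str(db_path)).open_table(table_name)
    return {
        "total": table.count_rows(),
        # Assumption: extracted facts are tagged with the literal value 'fact'.
        "facts": table.count_rows("kind = 'fact'"),
        "columns": table.schema.names,
    }


print(summarize_store())
```

Printing the column names first is a cheap way to confirm what metadata each row carries before writing more specific filters.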
Configuration
The plugin runs on sensible defaults once activated, so you don’t have to configure anything. ~/.hermes/config.yaml is purely for overrides. Two common ones:
Use a cheaper model for the auxiliary fact-extraction calls:
Changing the embedding model (or its dimension) against an existing store requires recreating
the table — the plugin fails loudly on a dimension mismatch rather than silently returning
nothing. Every option is documented in the plugin’s
default_config.yaml.
Benchmark
On LongMemEval-S, a long-conversation QA benchmark, LanceDB’s semantic recall clearly beat Hermes’ built-in lexical search (0.66 vs. 0.53 answer accuracy) by finding the right messages even when the question was worded differently from the original conversation. For the full methodology, the per-question-type breakdown, and a reproducible harness, see the blog post and the benchmark harness.
Why this works well
- It’s local-first and embedded. The LanceDB memory table lives on your disk with no server to run; the plugin installs as a dependency of Hermes’ own environment.
- Recall survives paraphrasing. Semantic search matches meaning, not spelling, which is the failure mode that sinks keyword-only session search.
- Memories are structured and traceable. Each fact is a row with metadata and a link back
to the messages it came from, and forget always previews before it deletes.
- Nothing about it is a dead end. As your needs grow, the same table abstraction carries over to LanceDB Enterprise for automatic compaction, reindexing, and scale.
hermes memory setup, and run the kind of
workflow we walked through above.