Architecture · Claude Code

Persistent Memory for AI Agents Without a Vector Database

Martin Andrt · June 7, 2026

You don't need a vector database to give an AI agent long-term memory. For a single agent working in long-running sessions, a disciplined set of plain markdown files (a bootloader, a session-lineage log, and curated notes) plus a shell hook that writes at session end gives reliable continuity across hundreds of sessions. This article covers the architecture I run daily in Claude Code (200+ named sessions), why it beats RAG for this use case, and when you should reach for a vector database instead.

The problem: agents forget between sessions

An AI coding agent starts every session blank. It has no idea what you decided yesterday, which approach already failed, or how you like things done. The standard fix is retrieval-augmented generation (RAG): embed your history into a vector database and pull back the "relevant" chunks at query time.

For large, fast-changing knowledge bases, that's the right tool. For a single operator running an agent on their own work, it's usually the wrong one. RAG adds infrastructure, introduces retrieval misses, and hides your memory inside an opaque store you can't read or edit by hand. There is a simpler architecture that, in practice, holds up better.

None of the individual pieces below are novel — files like CLAUDE.md, handoff notes, and session hooks are emerging community practice, not an invention. What's worth writing down is how they fit together, and the two or three places where the obvious version is subtly wrong.

The architecture

Session lineage instead of retrieval

The core idea is session lineage: every working session ends by writing a short handoff — what was done, the key decisions, what was learned, and a note to the next session. The next session reads the most recent entry at startup.

Context is therefore inherited rather than retrieved at query time. But it's worth being precise about what the chain does and doesn't carry, because this is where the naive version goes wrong. Lineage carries the recent working thread — not every fact forever. If each session only ever read the latest handoff, anything decided in session 12 would survive to session 200 only if every session in between faithfully copied it forward. That's a game of telephone, and it degrades.

So durable facts don't live in the lineage at all. The moment something is worth keeping — a decision, a rule, a hard-won lesson — it gets promoted out of the handoff and into a curated file (a knowledge or feedback note that's addressed directly, not relayed). Lineage is short-term continuity; the curated files are long-term memory. Keeping those two jobs separate is the discipline that makes the system hold up: the handoff stays small and current, and the things that must not be lost are written down where the chain can't drop them.

A single entry is deliberately short — it's a handoff note for the next session to read, not a transcript:

## SESSION 204 — Open-source
Date:     2026-06-06
Worked on: published the memory architecture as a public repo
Decisions: open-source only what is unambiguously original
Learned:  strip promotional flourish from public copy
Handoff:  one follow-up link still missing — tracked in the index

Only the most recent entry stays in the always-loaded file; older entries roll over into an archive that's searchable but not loaded by default. The log grows without the startup cost growing with it.
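
The rollover can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind this article: the file names passed in (for example `lineage.md` and `lineage-archive.md`) are hypothetical, the live file is assumed to contain nothing but entries, and every entry is assumed to start with a `## SESSION <n>` heading like the example above.

```python
import re
from pathlib import Path

# Assumption: every entry starts with a heading line such as "## SESSION 204 ...".
ENTRY_HEADING = re.compile(r"^## SESSION \d+", re.MULTILINE)


def split_entries(text: str) -> list[str]:
    """Split a lineage log into entries, one per '## SESSION' heading."""
    starts = [m.start() for m in ENTRY_HEADING.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def roll_over(live: Path, archive: Path) -> int:
    """Keep only the newest entry in `live`; append older ones to `archive`.

    Assumes entries are appended in chronological order, so the last one
    is the newest. Returns the number of entries moved.
    """
    entries = split_entries(live.read_text(encoding="utf-8"))
    if len(entries) <= 1:
        return 0  # nothing to archive
    older, newest = entries[:-1], entries[-1]
    with archive.open("a", encoding="utf-8") as fh:
        for entry in older:
            fh.write(entry + "\n\n")
    live.write_text(newest + "\n", encoding="utf-8")
    return len(older)
```

Running `roll_over` right after a new handoff is written keeps the always-loaded file at one entry, while the archive stays a plain file you can search with ordinary text tools.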

A bootloader, not a context dump

The obvious objection is tokens: won't loading memory every session blow the context window? Only if you load everything. The pattern is a single bootloader file — I call mine BRAIN.md — that loads a minimal always-on layer at startup: who the operator is, how we work, where things live, and the handful of rules that must always apply.

Everything else is read on demand. Most memory files are never opened in a given session; they are looked up only when the task at hand calls for them. The bootloader is a map, not the territory.
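
For a runtime where you assemble the startup context yourself, the always-on layer might be built roughly like this. It is a sketch under assumptions: `BRAIN.md` and `activity.log` are names used in this article, while `lineage.md`, the single `memory_dir` layout, and both function names are hypothetical.

```python
from pathlib import Path


def tail(path: Path, n: int) -> list[str]:
    """Return the last n lines of a text file, or [] if it is missing or n <= 0."""
    if n <= 0 or not path.exists():
        return []
    return path.read_text(encoding="utf-8").splitlines()[-n:]


def build_boot_context(memory_dir: Path, log_lines: int = 5) -> str:
    """Assemble the always-on layer: bootloader, latest handoff, recent activity."""
    parts = []
    # "lineage.md" is a hypothetical name for the file holding the latest handoff.
    for name in ("BRAIN.md", "lineage.md"):
        f = memory_dir / name
        if f.exists():
            parts.append(f.read_text(encoding="utf-8").strip())
    recent = tail(memory_dir / "activity.log", log_lines)
    if recent:
        parts.append("Recent activity:\n" + "\n".join(recent))
    return "\n\n".join(parts)
```

Nothing else is read here on purpose: every other memory file stays on disk until a task calls for it.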

Curated memory, organised by purpose

Beneath the bootloader, memory is a small set of markdown files grouped by what they're for, not by recency:

Each file holds one idea, with a one-line description used to decide relevance. The agent writes new memory itself when it learns something durable, and the operator can read or edit any of it in a text editor. Nothing is locked inside an embedding.
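
To make the one-line descriptions usable for lookup, they can be collected into a small index that the agent scans before deciding which file to open. The sketch below assumes a hypothetical convention in which each note carries a line starting with `description:`; the prefix, the directory layout, and the function names are illustrative, not part of the architecture described here.

```python
from pathlib import Path

PREFIX = "description:"  # hypothetical convention for the one-line description


def read_description(note: Path) -> str:
    """Return the note's one-line description, or '' if none is found."""
    with note.open(encoding="utf-8") as fh:
        for line in fh:
            if line.lower().startswith(PREFIX):
                return line[len(PREFIX):].strip()
    return ""


def build_index(notes_dir: Path) -> str:
    """Render one 'path: description' line per markdown note, sorted by path."""
    lines = []
    for note in sorted(notes_dir.rglob("*.md")):
        desc = read_description(note) or "(no description)"
        lines.append(f"{note.relative_to(notes_dir).as_posix()}: {desc}")
    return "\n".join(lines)
```

A note that shows up as `(no description)` is one the agent is unlikely to find later, so the index doubles as a cheap lint on the curated files.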

Auto-capture: session end as a first-class event

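The system only works if the handoff actually gets written. Relying on the agent to remember to do it is fragile, so capture is automated. Claude Code lets you register a Stop hook, a command that runs when the agent finishes responding, which this setup treats as the end of a session. (Claude Code also defines a separate SessionEnd hook event for when the session itself closes.) The hook is registered in settings: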

// .claude/settings.json
{
  "hooks": {
    "Stop": [
      { "hooks": [ { "type": "command", "command": "session-capture.sh" } ] }
    ]
  }
}

The hook receives the session metadata and the script appends one line to an activity.log — timestamp, project, topic:

# session-capture.sh (simplified)
echo "[$(date '+%F %T')] [$project] — $topic" >> memory/activity.log

The end of a session becomes an event the system reacts to, not a step someone hopes happened. The next boot reads the tail of that log to know what came just before.

With one caveat the "first-class event" framing can hide: a Stop hook only fires on a clean exit. A crash, a force-quit, or a killed process writes nothing — and that's exactly the moment a session's context is most painful to lose. Two cheap mitigations matter more than they look. First, durable takeaways are promoted into curated files during the session, so the end-of-session handoff is the last step, not the only one. Second, because the previous state is already committed in git, a missed capture costs you the latest delta — not the history. It's a real gap, not a solved problem; treat the hook as the happy path, not a guarantee.
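
For readers who prefer the capture step in Python, here is a hedged sketch of an equivalent hook command. It assumes the hook payload arrives as JSON on standard input and includes a `cwd` field, as Claude Code's hook input does. Deriving the project name from that directory, passing the topic as a command-line argument, and the function names are all assumptions made for illustration. The sketch also tolerates a missing or malformed payload, so the happy path is at least not lost to a parsing error.

```python
import json
import sys
from datetime import datetime
from pathlib import Path


def format_line(payload: dict, topic: str, now: datetime) -> str:
    """Build one activity.log line: timestamp, project, topic."""
    # Assumption: the project is named after the session's working directory.
    project = Path(payload.get("cwd") or ".").name or "unknown"
    return f"[{now:%Y-%m-%d %H:%M:%S}] [{project}] - {topic}"


def capture(raw: str, log: Path, topic: str = "untitled") -> str:
    """Parse the hook payload and append one line to the log."""
    try:
        payload = json.loads(raw) if raw.strip() else {}
    except json.JSONDecodeError:
        payload = {}  # a malformed payload should not prevent the capture
    if not isinstance(payload, dict):
        payload = {}
    line = format_line(payload, topic, datetime.now())
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line


def main() -> None:
    """Entry point for the hook: payload on stdin, optional topic as argv[1]."""
    capture(sys.stdin.read(), Path("memory/activity.log"), *sys.argv[1:2])
```

Calling `main()` at the bottom of the file turns it into a script that the `command` entry in the settings can point at. It does nothing for the crash case described above: if the process is killed, no hook runs in any language.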

Not specific to Claude Code. The mechanism here is a Stop hook, but the pattern is tool-agnostic: any agent that can read a few files at startup and run a command when it finishes can use it. Swap the hook for whatever your runtime offers; the markdown lineage, bootloader, and curated notes stay the same.

Why it works

Why this beats a vector database here

One honest qualification, because it's easy to overclaim: only the always-on layer is truly deterministic. The on-demand layer is itself a form of retrieval — the agent decides which file to open from its one-line description, and that can miss, just as a vector search can. The difference isn't "no retrieval" — it's retrieval over a small, curated, human-named set with legible descriptions. A miss is rarer, visible, and fixable by rewriting a description; with embeddings it's opaque. You're trading similarity-misses for naming-and-description misses, not eliminating misses.

The rest of the cost is real and worth stating: it takes discipline to maintain, and it does not scale to searching thousands of documents semantically. For a focused agent on a bounded body of work, that trade is strongly in your favour.

What if a link in the chain breaks?

This is the fair objection: if one handoff is missing, incomplete, or wrong, does the error propagate silently down the chain? It can — and that's the honest weak point of the approach. Three things keep it manageable, and one doesn't:

When you should use a vector database instead

This approach is not a universal replacement for RAG. Reach for a vector database when the corpus is large, constantly changing, or must be searched semantically.

The two also compose: lineage for the agent's working memory and behaviour, a vector store for the large reference corpus it occasionally consults. Memory and retrieval are different jobs.

FAQ
Do I need a vector database for AI agent memory?
No. For a single agent in long-running sessions, structured markdown plus a session-end capture hook gives reliable long-term memory without embeddings or similarity search. A vector database earns its place once the corpus is large, constantly changing, or must be searched semantically.
What is session lineage?
A written handoff log. Each session records what it did, the key decisions, and a note for the next session, which reads the latest entry at startup. Context is inherited deterministically rather than retrieved from an embedding store.
How does it compare to RAG?
RAG retrieves chunks at query time and can miss or surface the wrong context. Markdown lineage loads a small, curated, human-legible always-on layer that is guaranteed present, editable, and version-controlled, and reads the rest on demand, at the cost of manual discipline and limited scale.
Doesn't loading memory every session waste tokens?
Only if you load everything. A bootloader file loads a minimal always-on layer at startup and reads the rest on demand. Most files are never opened in a given session.
Does this only work in Claude Code?
No. Claude Code provides the Stop hook used for auto-capture, but the pattern is tool-agnostic. Any agent that can read a few files at startup and run a command when it finishes can use the same lineage, bootloader, and curated-notes structure.
What happens if a handoff entry is wrong or missing?
Because memory is plain text in git, a bad entry is visible in a diff and revertable in one step, and a structural check at session end can flag missing or malformed files. The limit is that a plausible-but-wrong entry won't be caught automatically — the next session trusts what's written. That's the real tax of a human-legible system, mitigated by discipline and cheap visibility rather than self-healing.
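
A structural check of this kind can be very small. The sketch below is an assumption-laden illustration, not the check shipped with the repository: it takes the field labels from the example entry earlier in the article and treats all of them as required, which is a guess. As the answer above says, it catches missing or malformed entries only; an entry that is well-formed but wrong passes.

```python
import re

# Field labels taken from the example entry; treating all as required is an assumption.
REQUIRED_FIELDS = ("Date", "Worked on", "Decisions", "Learned", "Handoff")
HEADING = re.compile(r"^## SESSION \d+\b")


def check_entry(entry: str) -> list[str]:
    """Return a list of structural problems; an empty list means the entry passes."""
    lines = entry.strip().splitlines()
    problems = []
    if not lines or not HEADING.match(lines[0]):
        problems.append("missing '## SESSION <n>' heading")
    for field in REQUIRED_FIELDS:
        values = [
            line.split(":", 1)[1].strip()
            for line in lines
            if line.startswith(field + ":")
        ]
        if not values:
            problems.append(f"missing field: {field}")
        elif not values[0]:
            problems.append(f"empty field: {field}")
    return problems
```

Run at session end, a non-empty result is a prompt to fix the handoff before the next session inherits it.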

References. The memory architecture described here is open-sourced (MIT) as a dependency-free skeleton you can read or fork. A longer field report on running it over many sessions — including the parts that break — is in the case study.