The problem: agents forget between sessions
An AI coding agent starts every session blank. It has no idea what you decided yesterday, which approach already failed, or how you like things done. The standard fix is retrieval-augmented generation (RAG): embed your history into a vector database and pull back the "relevant" chunks at query time.
For large, fast-changing knowledge bases, that's the right tool. For a single operator running an agent on their own work, it's usually the wrong one. RAG adds infrastructure, introduces retrieval misses, and hides your memory inside an opaque store you can't read or edit by hand. There is a simpler architecture that, in practice, holds up better.
None of the individual pieces below are novel — files like CLAUDE.md, handoff notes, and session hooks are emerging community practice, not an invention. What's worth writing down is how they fit together, and the two or three places where the obvious version is subtly wrong.
Session lineage instead of retrieval
The core idea is session lineage: every working session ends by writing a short handoff — what was done, the key decisions, what was learned, and a note to the next session. The next session reads the most recent entry at startup.
Context is therefore inherited rather than retrieved at query time. But it's worth being precise about what the chain does and doesn't carry, because this is where the naive version goes wrong. Lineage carries the recent working thread — not every fact forever. If each session only ever read the latest handoff, anything decided in session 12 would survive to session 200 only if every session in between faithfully copied it forward. That's a game of telephone, and it degrades.
So durable facts don't live in the lineage at all. The moment something is worth keeping — a decision, a rule, a hard-won lesson — it gets promoted out of the handoff and into a curated file (a knowledge or feedback note that's addressed directly, not relayed). Lineage is short-term continuity; the curated files are long-term memory. Keeping those two jobs separate is the discipline that makes the system hold up: the handoff stays small and current, and the things that must not be lost are written down where the chain can't drop them.
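Promotion can be as small as writing one file. The following is a minimal bash sketch, not the author's tooling: the `memory/<category>/<slug>.md` layout, the `description:` first line, and the `promote_note` name are all illustrative assumptions.

```shell
# promote_note: write one durable fact into its own curated file.
# Hypothetical layout: $MEMORY_DIR/<category>/<slug>.md, where the first line
# holds the one-line description and the body follows after a blank line.
promote_note() {
  local category="$1" slug="$2" description="$3" body="$4"
  local dir="${MEMORY_DIR:-memory}/$category"
  local file="$dir/$slug.md"

  mkdir -p "$dir" || return 1
  # Refuse to overwrite: an existing note should be edited by hand, not clobbered.
  if [ -e "$file" ]; then
    echo "promote_note: $file already exists" >&2
    return 1
  fi
  printf 'description: %s\n\n%s\n' "$description" "$body" > "$file"
  echo "$file"
}
```

Calling `promote_note feedback public-copy "Strip promotional flourish from public copy" "Reason: ..."` mid-session means the lesson no longer depends on any handoff carrying it forward.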
A single entry is deliberately short — it's a handoff note for the next session to read, not a transcript:
## SESSION 204 — Open-source
Date: 2026-06-06
Worked on: published the memory architecture as a public repo
Decisions: open-source only what is unambiguously original
Learned: strip promotional flourish from public copy
Handoff: one follow-up link still missing — tracked in the index
Only the most recent entry stays in the always-loaded file; older entries roll over into an archive that's searchable but not loaded by default. The log grows without the startup cost growing with it.
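The rollover can be sketched in a few lines of bash. This assumes every entry starts with a `## SESSION` heading, as in the example above; the file names and the `rotate_lineage` function are hypothetical.

```shell
# rotate_lineage LIVE ARCHIVE
# Moves every entry except the most recent from LIVE to the end of ARCHIVE.
# Assumes each entry begins with a line starting "## SESSION".
rotate_lineage() {
  local live="$1" archive="$2" total tmp
  # Count entries; grep -c exits non-zero when the count is 0, hence "|| true".
  total="$(grep -c '^## SESSION' "$live" 2>/dev/null)" || true
  [ "${total:-0}" -gt 1 ] || return 0   # zero or one entry: nothing to rotate

  tmp="$(mktemp)" || return 1
  # Pass 1: append entries 1..(last-1) to the archive.
  awk -v last="$total" '/^## SESSION/ { n++ } n >= 1 && n < last' "$live" >> "$archive" || return 1
  # Pass 2: keep any preamble (n == 0) plus the final entry.
  awk -v last="$total" '/^## SESSION/ { n++ } n == 0 || n == last' "$live" > "$tmp" || return 1
  # Write back through the original file so its permissions are preserved.
  cat "$tmp" > "$live" && rm -f "$tmp"
}
```

Running it a second time is a no-op, so it is safe to call at the end of every session.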
A bootloader, not a context dump
The obvious objection is tokens: won't loading memory every session blow the context window? Only if you load everything. The pattern is a single bootloader file — I call mine BRAIN.md — that loads a minimal always-on layer at startup: who the operator is, how we work, where things live, and the handful of rules that must always apply.
Everything else is read on demand. Most memory files are never opened in a given session; they are looked up only when the task at hand calls for them. The bootloader is a map, not the territory.
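One way to keep the always-on layer honest is to assemble it in one place and measure it. The sketch below is an assumption about how that might look, not how any particular agent loads its context: `memory/lineage.md`, the `BOOT_BUDGET_CHARS` variable, and the 8000-character default are all made up for illustration.

```shell
# boot_context [ROOT]
# Prints the always-on layer: the bootloader plus the latest handoff.
# Warns on stderr when the result exceeds a character budget (hypothetical knob).
boot_context() {
  local root="${1:-.}" budget="${BOOT_BUDGET_CHARS:-8000}" out
  out="$(
    cat "$root/BRAIN.md" 2>/dev/null
    printf '\n--- latest handoff ---\n'
    cat "$root/memory/lineage.md" 2>/dev/null
  )"
  printf '%s\n' "$out"
  # Everything not printed here stays on disk until a task calls for it.
  if [ "${#out}" -gt "$budget" ]; then
    echo "boot_context: ${#out} characters exceeds budget of $budget" >&2
  fi
}
```

The budget warning is the useful part: it makes growth in the always-on layer visible before it becomes a token problem.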
Curated memory, organised by purpose
Beneath the bootloader, memory is a small set of markdown files grouped by what they're for, not by recency:
- Identity — who the operator is and how the agent should behave
- Rules — hard constraints and a deterministic do / ask / never matrix
- Knowledge — how to do specific recurring tasks, with the gotchas attached
- Feedback — corrections the operator gave, each with the reason and how to apply it
- Projects — state that isn't derivable from the code or git history
Each file holds one idea, with a one-line description used to decide relevance. The agent writes new memory itself when it learns something durable, and the operator can read or edit any of it in a text editor. Nothing is locked inside an embedding.
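The one-line descriptions can be gathered into an index that is cheap to scan. This bash sketch assumes each file's first line has the form `description: ...`; that convention and the `memory_index` name are illustrative, and your files may carry the description differently.

```shell
# memory_index [ROOT]
# Prints "relative/path.md<TAB>description" for every markdown file under ROOT.
# Assumes the first line of each file is "description: <one line>".
memory_index() {
  local root="${1:-memory}" file first desc
  find "$root" -type f -name '*.md' | sort | while IFS= read -r file; do
    first=""
    IFS= read -r first < "$file" || true   # empty file: first stays ""
    case "$first" in
      description:*)
        desc="${first#description:}"
        desc="${desc# }"                    # drop one leading space, if present
        printf '%s\t%s\n' "${file#"$root"/}" "$desc"
        ;;
      *)
        # A file without a description cannot be selected by relevance.
        echo "memory_index: no description in $file" >&2
        ;;
    esac
  done
}
```

The stderr warnings double as a maintenance check: any file listed there is effectively invisible to on-demand lookup.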
Auto-capture: session end as a first-class event
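The system only works if the handoff actually gets written. Relying on the agent to remember to do it is fragile, so capture is automated. Claude Code lets you register hooks in its settings; this setup uses a Stop hook, a command that runs when the agent stops. (Claude Code's hook reference describes Stop as firing when the agent finishes responding and lists a separate SessionEnd event for when the session itself closes, so pick the event that matches what you mean by "end".)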
// .claude/settings.json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "session-capture.sh" }
        ]
      }
    ]
  }
}
The hook receives the session metadata and the script appends one line to an activity.log — timestamp, project, topic:
# session-capture.sh (simplified)
# $project and $topic are assumed to be set earlier from the hook's metadata
echo "[$(date '+%F %T')] [$project] — $topic" >> memory/activity.log
The end of a session becomes an event the system reacts to, not a step someone hopes happened. The next boot reads the tail of that log to know what came just before.
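A slightly fuller sketch of both halves, writing the line and reading it back at boot, might look like this in bash. It assumes the project and topic have already been extracted from whatever metadata the hook receives; `log_session` and `recent_sessions` are hypothetical names, and the separator is plain ASCII, so adjust it to match your existing log lines.

```shell
# log_session PROJECT TOPIC [LOGFILE]
# Appends one timestamped line per session to the activity log.
log_session() {
  local project="$1" topic="$2" log="${3:-memory/activity.log}"
  mkdir -p "$(dirname "$log")" || return 1
  # Collapse newlines so every session stays on exactly one line.
  topic="$(printf '%s' "$topic" | tr '\n' ' ')"
  printf '[%s] [%s] - %s\n' "$(date '+%F %T')" "$project" "$topic" >> "$log"
}

# recent_sessions [LOGFILE] [COUNT]
# What the next boot would read: the last COUNT lines (default 5).
recent_sessions() {
  tail -n "${2:-5}" "${1:-memory/activity.log}" 2>/dev/null
}
```

Keeping one line per session is what makes `tail` a sufficient reader; a multi-line topic would otherwise split an entry across the boundary.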
With one caveat the "first-class event" framing can hide: a Stop hook only fires on a clean exit. A crash, a force-quit, or a killed process writes nothing — and that's exactly the moment a session's context is most painful to lose. Two cheap mitigations matter more than they look. First, durable takeaways are promoted into curated files during the session, so the end-of-session handoff is the last step, not the only one. Second, because the previous state is already committed in git, a missed capture costs you the latest delta — not the history. It's a real gap, not a solved problem; treat the hook as the happy path, not a guarantee.
Why this beats a vector database here
- Deterministic core — the always-on bootloader loads verbatim, every session. No similarity search decides whether your core rules and identity show up.
- Legible — memory is plain text you can read, diff, and correct. When the agent behaves oddly, you can see why.
- Editable — fixing a wrong belief is a one-line text edit, not a re-embedding job.
- Versioned — it lives in git, so memory has history and is recoverable.
- Cheap — no database, no embedding pipeline, no extra service to run or pay for.
One honest qualification, because it's easy to overclaim: only the always-on layer is truly deterministic. The on-demand layer is itself a form of retrieval — the agent decides which file to open from its one-line description, and that can miss, just as a vector search can. The difference isn't "no retrieval" — it's retrieval over a small, curated, human-named set with legible descriptions. A miss is rarer, visible, and fixable by rewriting a description; with embeddings it's opaque. You're trading similarity-misses for naming-and-description misses, not eliminating misses.
The rest of the cost is real and worth stating: it takes discipline to maintain, and it does not scale to searching thousands of documents semantically. For a focused agent on a bounded body of work, that trade is strongly in your favour.
What if a link in the chain breaks?
This is the fair objection: if one handoff is missing, incomplete, or wrong, does the error propagate silently down the chain? It can — and that's the honest weak point of the approach. Three things keep it manageable, and one doesn't:
- It's visible. A bad entry is plain text in a diff, not a silent vector. You can read it, see it's wrong, and fix it in one edit — unlike a poisoned embedding you'd never notice.
- It's recoverable. Everything is in git, so a corrupted or deleted entry is one git revert away. The chain has history.
- It's checked. A structural check can run at session end (or start) to flag missing files, broken links between notes, or entries that don't parse — turning a silent drift into a loud warning.
- But it's not self-healing. Nothing detects a plausible-but-wrong handoff automatically. If you write "we decided X" when you decided not-X, the next session believes it. This is the real tax of a human-legible system: it trusts what's written. The mitigation is discipline and the cheap visibility above, not magic.
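The structural check mentioned above can start very small. This bash sketch covers only two of the cases, missing files and broken relative links between notes; it does not validate entry format. The required-file list and both function names are assumptions for illustration.

```shell
# memory_problems [ROOT]
# Prints one line per structural problem found under ROOT.
memory_problems() {
  local root="${1:-memory}" file target
  # 1. Files the system expects to exist (hypothetical list).
  for file in "$root/lineage.md" "$root/activity.log"; do
    [ -f "$file" ] || echo "missing: $file"
  done
  # 2. Relative markdown links to .md files must resolve from the linking file.
  find "$root" -type f -name '*.md' | sort | while IFS= read -r file; do
    grep -o ']([^)#]*\.md)' "$file" | grep -v '://' | sed 's/^](//; s/)$//' |
      while IFS= read -r target; do
        [ -f "$(dirname "$file")/$target" ] || echo "broken link: $file -> $target"
      done
  done
}

# check_memory [ROOT]
# Returns non-zero and reports on stderr when any problem is found.
check_memory() {
  local problems
  problems="$(memory_problems "$1")"
  [ -z "$problems" ] && return 0
  printf '%s\n' "$problems" >&2
  return 1
}
```

Because `check_memory` exits non-zero on any finding, it can be wired into a session hook or a git pre-commit hook without further glue.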
When you should use a vector database instead
This approach is not a universal replacement for RAG. Reach for a vector database when:
- Your corpus is large — thousands of documents you can't curate by hand
- It changes constantly and must stay fresh automatically
- You genuinely need semantic search across unstructured content
- Many users or agents query the same shared knowledge
The two also compose: lineage for the agent's working memory and behaviour, a vector store for the large reference corpus it occasionally consults. Memory and retrieval are different jobs.
References. The memory architecture described here is open-sourced (MIT) as a dependency-free skeleton you can read or fork. A longer field report on running it over many sessions — including the parts that break — is in the case study.