I started with AI-assisted coding in Cursor a year ago. Since then it has become a workflow I use every day: 173 named sessions, 18 integrated services, and a 150-file curated memory. It is a persistent setup where context carries across sessions instead of being rebuilt each time.
Practical findings from building and running the system. Last updated May 5, 2026 — session 174.
wins/ folder to track accomplishments over time — a reward system for the AI, complete with behavioral science references. Two independent analyses said no, but not because it was a bad idea. Because it was a duplicate of something the system already had. The answer wasn’t to add a new file; it was to edit the one-page primer (+5 lines) and delete 13 outdated feedback files. Memory density beats archive density. When a new feature feels necessary, first check whether the structure already covers it — most of the time it does.
BRAIN.md is a bootloader: who Martin is, who Claude is, the deterministic rule, intent patterns, integration map, feedback principles. The rest loads on-demand when needed. It's a cache strategy for the context window — load the minimum, but know where to find the rest. Session 133 cut the library from ~256 to 93 files after more files started producing worse results, not better.
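A minimal sketch of the bootloader idea, assuming the memory lives as plain text files under one directory. Only `BRAIN.md` comes from the description above; the `MemoryLoader` class and the `feedback/tone.md` path are hypothetical names used for illustration, not the actual setup.

```python
from pathlib import Path


class MemoryLoader:
    """Load one small primer eagerly; read everything else only on request."""

    def __init__(self, root: Path, bootloader: str = "BRAIN.md") -> None:
        self.root = root
        # The primer is the only file loaded up front.
        self.context: dict[str, str] = {
            bootloader: (root / bootloader).read_text(encoding="utf-8")
        }

    def load(self, relative_path: str) -> str:
        """Pull a file into context the first time it is needed, then reuse it."""
        if relative_path not in self.context:
            text = (self.root / relative_path).read_text(encoding="utf-8")
            self.context[relative_path] = text
        return self.context[relative_path]
```

A session would start with `MemoryLoader(Path("memory"))` and call `load("feedback/tone.md")` only when a task touches that topic, so the context window holds the primer plus whatever was actually requested.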
time_estimate, due_date, waiting_on). 19 lists, 65 tasks, dependencies and timelines wired, all in a single session. Rate limits held, and the backlog was usable the same day.
Every session starts with context loading, continues with real work, and ends with automated knowledge capture. Two real examples:
12 services connected through one setup.
Four layers keep context persistent across sessions.
Six principles that run through every public-facing or remote component. Measured, enforced, audited across 130+ sessions. Security posture here comes from what actually gets enforced — not from what looks thorough.
Remote entrypoints reach 1 integration out of 9. Everything else stays blocked until a human is at the keyboard. Whitelist, not blocklist — blocklists expand slower than the attack surface.
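The allowlist rule can be sketched in a few lines. This is an illustration under stated assumptions: the integration name `chat_bot` and the `human_present` flag are hypothetical, and the real check may live elsewhere in the stack.

```python
# Hypothetical name for the single integration that remote entrypoints may reach.
ALLOWED_REMOTE_INTEGRATIONS = frozenset({"chat_bot"})


def is_call_allowed(integration: str, human_present: bool) -> bool:
    """Default deny: a remote call passes only if it is explicitly listed."""
    if human_present:
        # With a human at the keyboard, the remote restriction does not apply.
        return True
    return integration in ALLOWED_REMOTE_INTEGRATIONS
```

The point of the design is that a newly added integration is blocked for remote use until someone deliberately adds it to the set. A blocklist would need the opposite: someone remembering to block it.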
Three tiers of different kinds, not one long chain: integrity (HMAC signing + replay nonce), runtime isolation (filesystem + network sandbox), and data access (row-level security at the DB plus model-layer allowedTools). Compromising one tier doesn’t give you the others.
Legacy guardrail tracked over 40 sessions: 0 real threats caught, 23 legitimate actions blocked per week. Removed only after data proved it was creating more attack surface than it covered.
A single encrypted env file. Never read via cat, never dropped into shell history, never copied into model context. Rotation uses interactive input — exposed tokens trigger a 2-phase replace.
Remote commands are HMAC-signed with a replay nonce. The nonce insert sits in the execute path, not the claim path — so replayed commands fail closed after verification. Prod-verified on a real attempted replay.
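A minimal sketch of that ordering using Python's standard `hmac` module. The `Executor` class, the `nonce:command` message layout, and the in-memory set are assumptions for illustration. A production version would more likely record nonces in a database table with a unique constraint.

```python
import hashlib
import hmac


class ReplayError(Exception):
    """Raised when a correctly signed command reuses a nonce."""


def sign(secret: bytes, nonce: str, command: str) -> str:
    # Assumes the nonce contains no ':' (e.g. a hex token), so the layout is unambiguous.
    message = f"{nonce}:{command}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


class Executor:
    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._used_nonces: set[str] = set()  # stand-in for a unique-keyed table

    def execute(self, nonce: str, command: str, signature: str) -> str:
        expected = sign(self._secret, nonce, command)
        # Step 1: verify integrity with a constant-time comparison.
        if not hmac.compare_digest(expected.encode(), signature.encode()):
            raise PermissionError("bad signature")
        # Step 2: record the nonce in the execute path, right before running.
        if nonce in self._used_nonces:
            raise ReplayError("nonce already used")
        self._used_nonces.add(nonce)
        return f"ran: {command}"
```

Because the nonce is recorded only after the signature check, unsigned garbage cannot burn nonces, and a second delivery of a valid command fails at step 2 instead of running twice.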
Every automation has its own role with the smallest permission set it needs. The bot can’t write where a human would; row-level security is enforced at the database, not only in application code.
Controlled experiment with real data.
Does defining a coaching persona via system prompt measurably improve LLM output quality compared to the same model without any role instruction? And does the effect depend on the type of persona, the model, and the task?
200 outputs were generated by 4 commercially available LLMs across 5 coaching personas under 2 conditions—with and without a system prompt. Each output was scored on a 5-criterion rubric (role consistency, structure, specificity, handling of uncertainty, usability) on a 1–3 scale. Maximum score: 15 per output. Identical user inputs in both conditions ensure observed differences are attributable to the system prompt, not input variability.
200 outputs, 4 models, 5 personas, 5 rubric criteria, 100 matched pairs
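The scoring arithmetic behind a matched pair can be sketched as follows. The criterion keys mirror the rubric described above, but the function names are hypothetical and this is not the thesis's analysis script. Any numbers passed in are illustrative, not study data.

```python
from statistics import mean

CRITERIA = (
    "role_consistency",
    "structure",
    "specificity",
    "uncertainty_handling",
    "usability",
)
MIN_SCORE, MAX_SCORE = 1, 3


def total_score(scores: dict[str, int]) -> int:
    """Sum the five criteria after checking each is on the 1-3 scale (max 15)."""
    missing = set(CRITERIA) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name in CRITERIA:
        if not MIN_SCORE <= scores[name] <= MAX_SCORE:
            raise ValueError(f"{name} out of range: {scores[name]}")
    return sum(scores[name] for name in CRITERIA)


def pair_delta(with_prompt: dict[str, int], without_prompt: dict[str, int]) -> int:
    """Positive means the system prompt helped; negative means it hurt."""
    return total_score(with_prompt) - total_score(without_prompt)


def summarize(deltas: list[int]) -> dict[str, float]:
    """Mean gain across pairs and the count of pairs where the prompt hurt."""
    return {
        "mean_delta": mean(deltas),
        "decreased": sum(1 for d in deltas if d < 0),
    }
```

Each matched pair yields one delta, so a negative delta is what counts as a case where the system prompt lowered quality.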
In 12 out of 100 matched pairs (12%), the system prompt decreased output quality. This "over-prompting" effect concentrated in two patterns:
By model: Gemini 3 Flash (6 cases) and Claude Sonnet 4.5 (5 cases)—models with naturally strong coaching behavior that gets disrupted by explicit instructions.
By persona: Life Coach (6 of 12 cases)—a persona defined by implicit style (tone, reflective questions) rather than explicit output structure. When the model already "knows" how to coach, adding a persona prompt can constrain rather than improve.
ChatGPT 5.2 showed no over-prompting cases in this sample. DeepSeek V3: 1 case excluded (malformed output prevented consistent scoring).
The practical rule: the less "natural" the desired behavior is for a model, the more explicit the persona must be. For roles already well-represented in training data (empathetic coaching), a light-touch prompt suffices. For roles with unusual output structure (analytical decomposition, philosophical distinction), explicit section-by-section prescriptions are necessary—and produce the largest quality gains (+4.50 for Philosophical Advisor, +3.95 for Pragmatic Coach).
This matters for anyone deploying LLMs in coaching, mentoring, or educational contexts: prompting is not a binary (on/off) but a calibration problem. More structure helps—until it doesn't. The evidence suggests that optimal persona design requires testing both conditions before deployment.
Full thesis (62 pages, dataset of 200 outputs, evaluation rubric, and analysis script) available upon request.
Things built through the setup described above.
Brief to deployed demo in 2–3 days. 40 shipped across industries.
Four sources → deduped, ranked summary delivered daily at 9:30. Runs forever, zero touch.
Tender scraper with daily cron + outbound pipeline with AI web research.
SDR app: client brief → AI web research → profile → email to AE.
Claude-powered Mattermost bot. NL commands, 6-layer security, queue.
Organized by type, session lineage, auto-capture via shell hooks.
ClickUp task → Fakturoid invoice → Gmail send. No manual typing.
Svelte MCP pulls current framework docs straight into context. No guessing API shapes from training memory.
Job board API → AI evaluation → ClickUp tasks, calendar invites, candidate emails. Six stages, no manual touch.
I work on AI in two modes: building internal tools at a digital agency, and teaching students at NEWTON University how to use AI for actual work. Based in Prague.