Stop Re-Litigating the Same Lessons: Specialist Memory for AI Agents (Part 1)

In our last post, From Chaotic Kittens to a Coordinated Squad, we closed with a teaser about persistent specialist memory: Sub-agents retain context so they don't re-litigate foundational decisions across features. We noted that this pattern does heavy lifting and probably deserves its own post.

This is that post. Part 1 of a two-part series—Part 1 is building it, and Part 2 covers the dividends.

The problem we were solving was simple, but the cost compounded. With every new feature, the same specialist sub-agents (the squad we wrote about last time) would re-discover the same foundational facts. They'd learn that our cloud alarm templates emitted a function the deploy API rejected. They'd discover that one of our database proxies refused a startup parameter the SDK sets by default. They'd find out, again, that our serverless runtime freezes the moment the response is written, silently killing any in-flight background task. Each rediscovery burned a PR cycle.

Mistake #1: We kept the lessons in the coordinator's head

Our first instinct was the obvious one: brief the coordinator agent on everything the team had learned, and let it inject the right context into each specialist's prompt at dispatch time.

It failed in two distinct ways.

First, the coordinator's context window bloated instantly. Every minor quirk of our infrastructure was crammed into the initial prompt of our most expensive, highest-latency routing agent.

Second, and more fatally, the coordinator had to know a lesson was relevant before surfacing it. If the implementer agent was writing a backend migration, the coordinator had to proactively know to inject a specific proxy-parameter gotcha into the payload. That meant the lesson had to live somewhere indexable anyway, and the coordinator was the wrong place for it. Every specialist needed access to these foundational truths directly while they were in the weeds, not filtered through an orchestrator that lacked the context of the code being written.

Mistake #2: We tried a shared runbook

Second instinct: we built a shared runbook. One big, living document, organized by topic, that every agent could read.

This was worse. Nobody loaded the right section at the right moment. The document grew faster than anyone could keep it current, and because it was monolithic, new lessons fought older ones in the same file. We were optimizing for the author of the document, not the agents who had to read it. Our marketing specialist did not need to wade through a swamp of infrastructure-as-code deployment gotchas to find a brand voice rule, and the backend implementer had no use for copy guardrails.

The breaking point was an infrastructure-as-code to cloud-API gap. An alarm function our template kept emitting was rejected by the cloud's deploy API. We watched two consecutive hotfix attempts repeat the same mistake because dev didn't exercise the failing stack, and the rule was buried in the wrong section of a master doc.

We needed a system that pays for itself the second time you hit a wall.

What worked: one specialist, one directory, one lesson per file

We threw away the monolithic runbook. Now, each specialist has its own memory directory.

At the top of that directory sits an index file: a simple table of contents with one line per memory, kept short enough to stay in the agent's context window. Each lesson lives in its own standalone file with light frontmatter (a slug, a one-line description, and a type: feedback, project, reference, or user-profile). The body leads with the hard rule, followed by a Why line linking the originating incident, and a How to apply line.

The specialist that hits the incident writes the file. The index loads at agent start; bodies load on demand when the description matches the task at hand.

That infrastructure-as-code incident is now a single file in the infra-reviewer's directory: every infra stack in prod must also deploy in dev, with explicit, written exceptions only.

The same shape repeats across the squad. The time an architect re-implemented a vendor SDK as raw HTTP and dropped a CDN-required header is now a file in the architect's directory. The serverless freeze-on-response trap that killed background cleanup tasks is a file in the implementer's directory.

This works where the alternatives didn't because each lesson is paid for once. The specialist who learns the rule writes it in their own dialect before their context window flushes. The localized index then ensures it surfaces at the right moment without relying on a central coordinator.

What we'd tell anyone building this

If you are building agentic memory stacks today, here is the playbook:

One lesson, one file. Resist consolidation. A file you can't link to in another file is a file you can't reuse.
Index up front, body on demand. The index is always in context; the bodies are not. Treat the index like a table of contents the agent reads every turn.
The specialist who hit it writes it. Not the coordinator, not a documentation pass at the end of the sprint. Capture it the same turn it was learned.
Date and link the originating incident. "We tried X, it broke, here's the rule" beats "the rule is X" every time. Future-you (and future-agents) needs the why to judge edge cases.

This ledger has been running in production for about a month, accumulating over a hundred distinct memory files across nine specialists. The implementation is settled; now it's time to look at the impact—we'll cover that in Part 2. Building specialist memory into your own stack? I'd love to compare notes.

Mistake #1: We kept the lessons in the coordinator's head

Mistake #2: We tried a shared runbook

What worked: one specialist, one directory, one lesson per file

What we'd tell anyone building this

More from Invents.US