From Chaotic Kittens to a Coordinated Squad: How We Cut Our AI Agent Squad in Half

At Invents.US, most non-trivial work ships through a single terminal slash command: /feature.

It triggers a workspace coordinator skill — think engineering manager — that decomposes the ask, dispatches specialist sub-agents, arbitrates disagreements, and hands the PR to an automated CI/CD review agent before a human sees it.

Original bench: Architect, UI/UX Implementer, Business Logic Coder, Systems Core Coder, Tester, Tech Writer.

The first cut burned a week's worth of agent compute budget in a single day. After the refactor below, larger features ship comfortably inside a weekly budget. Same shape, half the roles, better output.

Here's what we got wrong.

Mistake #1: We told the coordinator to "be aware of each agent's reasoning."

The coordinator was the arbiter, so it seemed obvious: feed it the full agent outputs so it could adjudicate disagreements. The result: the manager's context window absorbed every sub-agent's chain of thought, every speculative tangent, every "considered but rejected" branch.

By stage 4 of a 7-stage pipeline, the coordinator was running on fumes — making the worst decisions exactly when the calls mattered most (arbitrating reviewer conflict, deciding must-fix vs. nice-to-have).

The fix: agents return decisions and artifacts, not reasoning trails. The spec is the contract. The diff is the diff. The reviewer verdict is a verdict, not a transcript. If the manager needs to re-derive why, it asks. It doesn't preload.

Mistake #2: We kept a role that consistently failed at the same job.

We had a dedicated Tester persona. Its job was validating that the implementer's tests actually exercised the functional spec. It failed at that task, the same way, across feature after feature — losing the spec context, approving thin coverage, missing semantic gaps the spec made obvious.

We moved the responsibility to the Architect, who already owned the functional spec. Implementers still write the tests; the Architect validates them against the spec contract before they hit PR review. The role that kept failing went away. The role with the right context picked up the work.

This had a second-order effect: it forced us to look hard at every parallel agent. We were fanning out Architect + Tester + Tech Writer concurrently, each producing its own framing of the same feature. Three docs, three slightly different mental models, all flowing back into the coordinator's context — which had to reconcile them before downstream implementers could start.

We were paying multiples of the compute to manufacture multiples of the disagreement.

The new rule: parallel execution only when outputs are substantially independent. UI Implementer and Systems Coder in parallel? Yes — different boundaries, no overlap. Architect and Tester in parallel? No — variations on the same artifact. Tech Writer runs after implementation, against the real code diff, not in parallel with a spec that may still move.

What we kept

The skill-as-coordinator shape. Slash commands are superb UX for "do this whole flow."
Persistent specialist memory. Sub-agents retain context so they don't re-litigate foundational decisions across features. This one is doing heavy lifting and probably deserves its own post.
A trivial off-ramp. If the task is a typo or minor refactor, the coordinator skips the pipeline entirely.

What we'd tell anyone building this

Specialists are cheap to add and expensive to coordinate. Every new agent type is another voice in the manager's head. Start with fewer, broader roles. Split only when an agent is genuinely doing two unrelated jobs poorly — and be willing to retire a role that keeps failing at the one job it has.

Keep the coordinator's diet lean. Feed it artifacts, not transcripts.

Mistake #1: We told the coordinator to "be aware of each agent's reasoning."

Mistake #2: We kept a role that consistently failed at the same job.

What we kept

What we'd tell anyone building this

More from Invents.US