I Managed a Swarm of 20 AI Agents for a Week and Built a Product. Here Are the 8 Rules I Learned. | zach wills

FYI – part 2 with copy/pastable agents + commands is out!

A couple weeks ago I went heads-down and experimented with a new development model. The results were unexpected: a production-ready application, ~800 commits, and 100+ PRs in a single week.

The core idea was to stop coding linearly and instead manage a swarm of ~20 parallel AI agents. This required building a custom parallelization tool, a playbook for managing sub-agent context windows, and a self-improving CLAUDE.md file that the system updated on its own. It’s a new mental model for what a high-leverage engineer actually does, and it’s built on a few key rules.

And, to be clear, when I say production-ready I mean it had good test coverage and CI/CD, included auth and background jobs, and integrated LLMs and other third-party APIs. I’d consider it a respectable alpha version of the product.

The Old Way is Broken (And Your Boredom is a Signal)

It started with waiting. I’d hand a task to an AI agent, it would have a solid plan, and then… I’d just sit there, watching the terminal scroll.

My first reaction was frustration. My second was an epiphany: The AI wasn’t the bottleneck. I was. My workflow, built around a single thread of attention, was the thing holding back progress. I had a backlog of ideas, but I was stuck spectating.

So I built a parallelization tool. A system to spin up fully isolated dev environments on command. Suddenly, I had four terminals open, each a self-contained universe with its own server, database, and preview URL.
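
For the curious, the core of that tool is easy to sketch. Here’s a minimal Python version, assuming git worktrees for code isolation and one dev-server port per environment; the helper, the npm dev command, and the port scheme are all illustrative, and the tool I actually built also provisioned databases and preview URLs, which this skips.

```python
import subprocess
from pathlib import Path

# Illustrative sketch only: isolates each agent in its own git worktree
# and dev-server port. Databases and preview URLs are left out here.

BASE_PORT = 4000

def spin_up(task_name: str, index: int) -> None:
    """Create one isolated environment: a worktree, a branch, a port."""
    worktree = Path("../swarm") / task_name
    branch = f"agent/{task_name}"

    # A worktree gives each agent its own working copy and branch, so
    # parallel agents never trample each other's files or git state.
    subprocess.run(
        ["git", "worktree", "add", "-b", branch, str(worktree)],
        check=True,
    )

    # One dev server per environment, each on its own port (assumes
    # your dev server accepts a --port flag, as Vite and others do).
    port = BASE_PORT + index
    subprocess.Popen(
        ["npm", "run", "dev", "--", f"--port={port}"],
        cwd=worktree,
    )
    print(f"{task_name}: {worktree} on http://localhost:{port}")

for i, task in enumerate(["feature-a", "feature-b", "bugfix-c"]):
    spin_up(task, i)
```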

That’s when my role fundamentally changed.

The New Skill: The Multitasking Flow State

My role shifted from a hands-on coder to an orchestrator of multiple workstreams. My focus was still incredibly deep—Slack was off, the world was tuned out—but it was a fundamentally different kind of focus. It was a shift from a single laser to a powerful floodlight.

This wasn’t just passive monitoring; it was an active state of engagement across multiple fronts. I was simultaneously watching four terminals, remembering the specific context of each agent’s task, interjecting with high-level corrections, and planning the next steps for each parallel stream. Traditional “flow state” is about losing yourself in a single problem. This was the opposite: it was about maintaining a constant, high-level situational awareness of the entire system at once.

graph TD
    subgraph Traditional Deep Work
        A[Developer] --> B{Task 1};
        B --> C{Task 2};
        C --> D{Task 3};
    end

    subgraph Agentic Deep Work
        subgraph Parallel Stream 1
            Y1[Agent 1] --> Z1[Feature A];
        end
        subgraph Parallel Stream 2
            Y2[Agent 2] --> Z2[Feature B];
        end
        subgraph Parallel Stream 3
            Y3[Agent 3] --> Z3[Bugfix C];
        end
        X["Orchestrator<br>(Developer)"] --> Y1;
        X --> Y2;
        X --> Y3;
    end

The cognitive load of this new state was immense. After about three hours of intense orchestration, I would feel completely burnt out. It was a clear signal that this new mode of working, while incredibly productive, demanded a different kind of mental energy than traditional coding.


The New Rules of Engagement

Managing a swarm of agents isn’t about fixing syntax; it’s about architecting an intelligent system. Over the week, I developed a new playbook.

Rule #1: Align on the Plan, Not Just the Goal.

My most effective workflow wasn’t just telling the AI, “Go build this.” I spent far more time iterating with it on the plan. My process often started with a high-level command like /spike for a small task or /tech plan for a larger one. It was like co-authoring a high-quality ticket with the AI. We would align on the refined plan first, and only then would I hand it off to the agent team for execution. It’s always cheaper to fix a bad plan than a bad implementation.

Rule #2: A Long-Running Agent is a Bug, Not a Feature.

A long runtime is a red flag. It means the agent is likely hitting its context limit, compacting its memory, and slowly forgetting the original intent of your command. It’s not focused; it’s lost.

Rule #3: Actively Manage the AI’s Memory.

For very large tasks, you have to actively manage the AI’s memory. The key is to checkpoint its progress somewhere persistent, then restart it with a fresh context. Initially, I used a low-tech method: telling the agent to write its progress to a local markdown file. Over time, I evolved this to be more robust. I’d have the agent post its progress as a comment on a GitHub pull request or a Linear ticket. Then I could clear the context and tell it to pick up where it left off.
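
Here’s roughly what that checkpoint step looks like as code. A minimal sketch, assuming the GitHub CLI (gh) and an open pull request; the comment format and the PR number are illustrative, not a fixed protocol.

```python
import subprocess

def checkpoint(pr_number: int, progress_notes: str) -> None:
    """Persist an agent's progress on the PR itself, so a fresh
    context can pick up exactly where the old one left off."""
    body = (
        "## Agent checkpoint\n\n"
        f"{progress_notes}\n\n"
        "Next session: read this comment, then continue the plan."
    )
    # `gh pr comment` attaches the notes to the pull request, which
    # survives any number of context resets.
    subprocess.run(
        ["gh", "pr", "comment", str(pr_number), "--body", body],
        check=True,
    )

# Example: PR #42 is illustrative.
checkpoint(42, "Steps 1-3 done; step 4 (webhook retries) in progress.")
```

As a bonus, the checkpoint comments double as a paper trail of the agent’s trajectory, which is handy during review.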

Rule #4: Manage Context with Sub-Agents.

A more automated solution to the context problem is to architect your workflow into an assembly line of specialized sub-agents.

graph TD
    A[Main Agent] -- "Task: Build Feature X" --> B(Dispatch to Sub-Agent);
    B -- "1. Plan" --> C["Solution Architect Agent<br>(Fresh Context)"];
    C -- Plan Summary --> A;
    A -- "2. Implement" --> D["Senior Engineer Agent<br>(Fresh Context)"];
    D -- "Code & PR Link" --> A;
    A -- "3. Test" --> E["Dedicated Tester Agent<br>(Fresh Context)"];
    E -- Test Results --> A;
    A -- Final Output --> F[Done];
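
In code, the assembly line above is just a pipeline where each stage runs in a fresh process and only a compact summary flows between stages. A rough sketch, assuming Claude Code’s non-interactive print mode (claude -p); the role prompts are illustrative.

```python
import subprocess

def run_subagent(role: str, task: str) -> str:
    """Run one sub-agent in a fresh context; return only its text
    output, which is all the next stage will ever see."""
    prompt = f"You are acting as a {role}.\n\n{task}"
    # Each `claude -p` call is a single non-interactive turn with a
    # clean context window; that reset is the point of the pipeline.
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

feature = "Build Feature X"

# 1. Plan with a fresh context; only the plan summary moves forward.
plan = run_subagent("solution architect", f"Write a concise plan for: {feature}")

# 2. Implement against the plan, not the whole planning conversation.
pr_link = run_subagent("senior engineer", f"Implement this plan and open a PR:\n{plan}")

# 3. Test with a context that contains only what the tester needs.
report = run_subagent("dedicated tester", f"Test the changes described here:\n{pr_link}")
print(report)
```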

Rule #5: Trust the Autonomous Loop.

This, plus the parallel environments, was one of the biggest unlocks. Once the plan is aligned, the agent’s real power is its ability to iterate autonomously until the goal is met.

The loop can take many forms: sometimes it’s testing an API flow, or using Playwright to click through a browser. The specific tools don’t matter as much as the principle: define the components of your loop and automate it relentlessly until the tests pass. If a pre-built MCP isn’t available, you can have the AI write a temporary test script on the fly. It can use the application’s own database connection to query for changes or hit a third-party API to validate that a side effect occurred.

graph TD
    subgraph Autonomous Execution Loop
        direction LR
        A(Start: High-Level Plan) --> B[Agent Implements Code];
        B --> C{Agent Runs Tests};
        C -- Tests Fail --> D[Agent Analyzes Failure<br>and Refines Code];
        D --> B;
        C -- Tests Pass --> E(End: Task Complete);
    end
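
A stripped-down version of that loop in Python, assuming a pytest suite as the verifier and the claude CLI as the agent runner; both are stand-ins, and any agent runner plus any verifiable check slot into the same shape.

```python
import subprocess

MAX_ATTEMPTS = 5

def run_agent(prompt: str) -> None:
    # One non-interactive agent turn (assumes the `claude` CLI).
    subprocess.run(["claude", "-p", prompt], check=True)

def verify() -> tuple[bool, str]:
    # The verifier is pluggable: pytest here, but a Playwright script,
    # a DB query asserting a side effect, or a throwaway API test
    # script all play the same role in the loop.
    result = subprocess.run(["pytest", "-x", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

run_agent("Implement the plan in PLAN.md. Commit early and often.")

for _ in range(MAX_ATTEMPTS):
    ok, output = verify()
    if ok:
        print("Tests pass; task complete.")
        break
    # Feed the failure back; the agent analyzes, refines, and we re-verify.
    run_agent(f"The test run failed. Analyze the output and fix the code:\n\n{output}")
else:
    print("Loop exhausted; time for a human to look.")
```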

Rule #6: Automate the System, Not Just the Code.

My goal was to avoid doing anything by hand. I used Claude to build and maintain the development system itself. This is a place where people who already think in systems are going to be ahead early.

  • A Self-Updating CLAUDE.md: After completing a task, I would have the agent automatically update a core CLAUDE.md file with its learnings (see the sketch after this list).
  • Self-Refining Tools: I didn’t manually tweak my custom commands. I tasked Claude with refining them. The system was constantly improving itself.
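
The self-update itself can be as simple as one more agent invocation at the end of each task. A minimal sketch, again assuming the claude CLI; the prompt wording is illustrative, not my actual directive.

```python
import subprocess

def record_learnings(task_description: str) -> None:
    """After a task lands, fold the agent's learnings back into
    CLAUDE.md so every future agent starts a little smarter."""
    prompt = (
        f"You just finished: {task_description}.\n"
        "Review what was surprising or went wrong, then update CLAUDE.md "
        "with any conventions, gotchas, or commands a future agent should "
        "know. Keep additions short and concrete."
    )
    subprocess.run(["claude", "-p", prompt], check=True)

# Hypothetical task name, purely for illustration.
record_learnings("Add background job for webhook retries")
```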

Rule #7: Be Ruthless About Restarting.

Early on, I found myself hesitating when an agent started to go off track, thinking, “Let me just see what it does.” I learned this was a waste of time. Near the end of the week, I became ruthless. The moment an agent started going in the wrong direction, I’d kill the process, give it a better instruction, and restart. The 5-10 minutes you might waste waiting for it to finish a bad thought is never worth it.

Rule #8: Commit Early and Often.

This is the safety net that makes Rule #7 possible. I built a directive into my commands to force agents to commit their work frequently. If an agent went off the rails, its progress was saved on a Git branch.


My Core Toolkit

To put these rules into practice, I relied on a few key frameworks and MCPs. An example of how to wire them up follows the list.

  • Serena: An agentic toolkit that gives the LLM IDE-like capabilities, allowing it to understand and edit the codebase at a deep, semantic level. In practice I felt this was faster than Claude’s default editing, and supposedly more token efficient.
  • Playwright MCP: This gave the agent the ability to control a headless browser, enabling the self-correcting test loops that are essential for autonomous work.
  • Neon Databases MCP: This empowered the agent to manage its own isolated, branched databases, a cornerstone of the parallelization workflow.
  • Sequential Thinking: This simple but powerful MCP forces the AI to outline its plan before executing, dramatically reducing “intent drift” and keeping it on track.
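
Wiring MCPs like these into Claude Code is mostly project configuration. Below is a hedged example of a project-scoped .mcp.json covering two of the servers above; the package names and config shape reflect my understanding at the time of writing, so check each project’s docs (Serena and Neon are registered the same way).

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    },
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"]
    }
  }
}
```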

Forget Typing; Use Your Voice

One of the most practical changes I made was switching from typing prompts to using voice-to-text.

When we type, we instinctively optimize for the fewest characters possible. When we speak, we’re naturally more narrative. I found that by dictating my requests, I would automatically include more context, share more of my thought process, and explain the “why” behind the task.

This was especially powerful for bug fixes, where I could just talk through what the issue appeared to be. That little bit of extra, naturally-provided context made a significant difference in the quality and accuracy of the AI’s work throughout the entire process.


Shift From Writing Code to Architecting Systems

The leverage in AI isn’t about writing code faster. It’s about changing the nature of the work.

The value of an engineer is shifting from the act of implementation to the art of direction. Your worth will be measured by how many agents you can effectively manage and how well you can architect an intelligent, self-improving system.

This is why senior engineers and systems thinkers are about to become even more valuable. They have the architectural vision to direct these agentic systems.

My experiment was on a greenfield project. The next challenge is applying these techniques to a messy, legacy codebase. I’ll be sharing what I learn from that adventure very soon.

