Anatomy of a good ad-hoc Claude agent team


I spent my first few weeks with agent teams treating them like a faster version of a single agent. Describe the task, let multiple agents run at it, collect the output. The results were fine. Not bad, not great. Roughly what I’d get from one agent, just more of it.

The shift happened when I stopped thinking about agents and started thinking about teams. Not AI teams. Teams. The kind I’ve built and run in organizations for years. The dynamics that make a group of people produce better work than any individual (role clarity, productive tension, structured disagreement, a single decision-maker) apply when the team members are Claude instances running in your terminal.

That realization changed how I use Claude Code. Agent teams are the third mode, after the main agent and sub-agents (background processes that work in isolation). Teams are background processes that work collectively, with the main agent relaying messages between you and them. They’re off by default; you enable them with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 (the Claude Code docs cover the full setup). They’re the mode that made the difference.
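Enabling the mode is a one-line affair. A minimal sketch, assuming the standard `claude` launcher on your PATH:

```shell
# Opt in to the experimental agent-teams mode for this shell
# session; it is off by default.
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1

# Then launch Claude Code as usual and agent teams are available:
# claude
```

You can also prefix a single invocation (`CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 claude`) if you'd rather not export it for the whole session.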

Rather than walk through the theory first, I want to show what I’ve landed on and then take it apart.

The prompt

A recent prompt of mine looked something like this:

We're redesigning the notification preferences system. Users can't
figure out how to mute specific channels without muting everything,
and support tickets on this have tripled. I want you to work on this
with an agent team.

Outcomes:
1. Users can configure preferences per channel without affecting
   global settings
2. Fits within existing settings architecture (no new services)
3. Non-breaking migration from current schema

Agent team composition:
- Lead (you): The only person allowed to build; keep yourself
  available when you're not building so you can relay my messages
  to the team quickly
- Principal engineer: conservative on scope, leads with questions,
  reviews all proposals for hidden complexity
- Frontend specialist: UX and interaction patterns for preference
  management
- Platform engineer: data model, API surface, migration strategy
- Director of Customer Success: voice of the customer, knows the
  support patterns and real-world workflows this needs to serve

Workflow:
1. Lead does light research on the current notification system
2. Spin up the team and brief them
3. Teammates research independently, propose approaches from their
   domain
4. PE runs a roundtable: questions each proposal, surfaces
   trade-offs. If an expert raises a concern, investigate it
   before moving on. Drive toward consensus
5. Present findings and agreed approach to me for approval
6. Once I greenlight, lead implements. Only the lead writes code
7. Keep the team for review. Teammates evaluate the work against
   what was agreed. Back to step 4 if concerns arise
8. Present completed work to me before committing

The specifics change every time, but the shape stays the same: problem, outcomes, team, workflow. Four blocks. Each one is load-bearing, and it took me a while to understand why.

My first instinct was to design a universal team. One configuration I could point at any task. It didn’t work well. The roles that matter for a frontend redesign are different from the ones that matter for a backend refactor or a strategic analysis.

Now I build teams fresh for each task. The roles come from the work, not from a template. Reusable patterns exist for certain categories of work, but starting ad hoc forces you to think about what the task actually needs rather than defaulting to what worked last time.

Why problems, not features

The top of the prompt doesn’t say “add per-channel notification toggles.” It describes a situation: users can’t figure out how to mute specific channels, and support tickets have tripled. Then it lists outcomes, not specifications.

The framing matters. I describe problems to solve, not features to build. “Users are dropping off during onboarding because they don’t understand the value prop before being asked to configure” produces better work than “add a welcome tour with three steps.”

Problems give the team room to reason about the right solution. Features constrain them to executing yours.

Why one writer

In the team composition block, the lead is defined as “the only person allowed to build.” This is the single most important constraint.

In every team I set up, only the lead (the main Claude Code process) makes changes to files. Teammates research, analyze, critique, and propose, but the pen stays with one agent. Without this, teammates overwrite each other’s work, and you spend your time untangling conflicts instead of making progress.

Why a principal engineer

The PE role is modeled as the expert minority in an organization. Part devil’s advocate, part quality bar. They lead with questions, not directives. They can’t make decisions unilaterally, only surface concerns and build consensus through discussion.

Depending on the task, I tune this role differently. For refactors and bug fixes, they’re conservative on scope. For product work, they’re allowed to suggest increasing scope if it solves the underlying problem more elegantly. The constant is that they challenge, they don’t dictate.

Why domain experts

If I’m working on a portfolio site, I might add a brand strategist and a design director. For an API redesign, a frontend consumer of the API and a platform engineer. In the example prompt, the frontend specialist and platform engineer each bring a lens the others lack.

The value shows up in the roundtable. When the frontend specialist proposes a tabbed preference panel, the platform engineer can flag that the current schema doesn’t support per-channel overrides without a migration. That tension between what’s ideal for the user and what’s feasible in the system is where the best solutions come from. Without both perspectives in the room, the team either builds something users love on a foundation that won’t hold, or something architecturally sound that no one can figure out how to use.

The Director of Customer Success in that example prompt is there for a reason I didn’t expect to matter as much as it does. In a real organization, when a non-technical person sits in a room with engineers, hierarchy shapes whose input carries weight. A VP of Engineering’s half-formed opinion about user workflows can override what the support lead knows from handling hundreds of tickets. People defer to title, not to expertise. I’ve watched it happen in my own teams.

Agent teams don’t have that problem. The Director of Customer Success contributes at the same level as the platform engineer because there’s no org chart to defer to. The PE treats their input with the same rigor as anyone else’s: questions it, pressure-tests it, works it into the consensus. A customer-oriented voice that would get politely nodded at in a real meeting becomes a first-class constraint in the design. Agent teams inherit the useful dynamics of human teams without the politics that distort them.

The right experts depend on the task, which is another reason ad hoc beats permanent.

Why the workflow matters

The team composition is half of it. The other half is giving them a way to work together. Without a defined workflow, agents do what unsupervised teams do: they go off in different directions and come back with incompatible ideas. Or worse, they all converge on the same safe answer because no one was asked to push back.

The workflow in the prompt has a deliberate rhythm.

It starts with the lead doing light research. Not solving the problem, just gathering enough context to brief the team well: what’s the current state, what’s been tried, what are the constraints. The lead forms a preliminary view, but holds it loosely.

Then the lead spins up the team and briefs them: the problem, the context, the desired outcomes. Each teammate goes off and does their own research from their domain perspective. The frontend specialist looks at interaction patterns; the platform engineer, at the data model. They come back with independent proposals.

This is where the principal engineer earns their role. They convene what I think of as the roundtable: a structured discussion where proposals are questioned and trade-offs surfaced. One thing I’ve found matters here is that the PE doesn’t just collect concerns, they make sure each one is investigated. If a domain expert raises a flag, the group digs into it before moving on. No one picks a winner. The PE asks the questions that reveal which approach holds up under pressure, and the group works toward consensus through that process.

Before any implementation begins, the lead presents the team’s findings and the agreed approach to me. I stay in the loop at this checkpoint because the team can produce a well-reasoned plan that’s solving the wrong problem, or one that’s scoped beyond what I want to take on right now. Once I greenlight the direction, the lead implements (only the lead).

After implementation, the team stays on for review. The teammates evaluate the work against what was agreed, and if concerns come up, the cycle goes back to the roundtable. Once the team is satisfied, the lead presents the completed work to me before committing. Two human checkpoints: one before work begins, one before it ships.

I define this workflow explicitly in every prompt because agents, left to their own devices, skip the discussion phase. They jump from research to implementation. The roundtable and the checkpoints are what produce the quality difference, and they only happen if you ask for them.

Different work, different teams

Much like in product and engineering organizations, no single workflow fits all situations. A complex architectural decision benefits from a five-person team with extended discussion. A well-scoped bug fix might need only the lead and a reviewer. Tiger teams for ambitious sprints, parallel workstreams for longer efforts, a single engineer with a code reviewer for focused tasks.

I keep coming back to a question: what is it about certain team compositions that makes them consistently outperform others? The patterns I’ve borrowed from human teams have worked well so far, but I suspect agent teams have dynamics of their own that I haven’t learned to see yet.

I run with --dangerously-skip-permissions to avoid approving every tool call across every teammate; allow lists in .claude/settings.json are the more cautious path.
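For the cautious path, a sketch of what an allow list might look like; the specific tool patterns below are illustrative, not a recommendation, so check the Claude Code permissions docs for the exact matcher syntax your version supports:

```json
{
  "permissions": {
    "allow": [
      "Read",
      "Edit",
      "Bash(git diff:*)",
      "Bash(npm run test:*)"
    ]
  }
}
```

Anything not covered by the list still prompts for approval, which is the point: teammates get friction-free access to the safe operations while the risky ones stay gated.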