Agent Orchestration Is Not the Future


Matt Freeman

Folks smarter than me and with tons of experience keep saying that agent orchestration is the future of AI code development and AI application in general. I have been hearing this for months now.

I notice several patterns that make me doubt this:

1. Call it the Umami Lesson. The next incremental improvement in core model intelligence always blows away whatever capability gains you can squeeze out of an advanced agent orchestration system. Rather than spending 80 hours building an orchestration harness for 4.5 Sonnet, you would have been better off waiting for 4.5 Opus. Likewise, you are better off now just using 4.5 Opus to the edge of its ability and waiting for 5.0 Sonnet, or whatever is coming next. I assert this with confidence based on the pattern we have observed over the last couple of years: your agent harness is not going to make the results that much better, it will take forever to build, and the new model will make all your work irrelevant when it arrives anyway.

2. They never actually work. Folks build enormously complex agent orchestration frameworks. They write blog posts about them that are widely read, and they publish them in Git repos that get thousands of stars. But when you talk to anybody who has actually built something cool, they say they used Claude Code, perhaps with a sprinkling of Skills and an MCP server if they needed one. Getting pulled into the swamp of agent orchestration mostly leads to spending more and more time developing the framework itself. I understand the allure of this, but it is not productivity. All of my productivity happens in either Claude Code or the Claude desktop app. Agent orchestration frameworks are my guilty pleasure, but I see them as, if anything, a research project into management theory, more so than a practical solution to a problem. I am not aware of any impressive artifact built with an agent orchestration framework that could not have been built in a similar amount of time without one.

3. The METR plot keeps improving. And the METR plot will keep improving. All of the plots, in fact, will keep improving. The models will get better and better and better; their context windows will get longer, they will use their tokens more efficiently, they will make better decisions and write fewer bugs, they will ask better clarifying questions. This has been happening and will keep happening, and there is no ceiling to this process in sight. My gut-check estimate is that a highly refined orchestration system, one you fully understand and carefully set up, might improve results by 10% relative to Claude Code with you just sitting there typing “continue” (with a similar level of time invested into your design doc). It surprises me not at all that the flavor of the week is Ralph Wiggum Mode, the Claude slash-command that literally just nudges the agent to check that it did what it was supposed to do, the most boneheaded level of “agent orchestration” imaginable. This is, in fact, the Pareto optimum: something to just poke the agent to keep going (a minimal sketch of the idea follows below).
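To make that concrete, here is a rough sketch of what a poke-the-agent loop amounts to. This is not the actual Ralph Wiggum implementation; it assumes the Claude Code CLI’s non-interactive -p (print) flag, and the PROMPT.md file, the DONE sentinel, and the 20-pass cap are all illustrative choices of mine.

```bash
#!/usr/bin/env bash
# Illustrative poke-the-agent loop, not the real Ralph Wiggum slash-command.
# Assumes the Claude Code CLI's non-interactive -p (print) flag; PROMPT.md,
# the DONE sentinel, and the 20-pass cap are hypothetical choices.

for _ in $(seq 1 20); do
  # Each pass is a fresh session; the only memory between iterations is
  # whatever the previous pass left in the working tree.
  out=$(claude -p "$(cat PROMPT.md)

Check whether the work so far actually satisfies the task. If it does,
reply with the single word DONE; otherwise keep going.") || break
  echo "$out"
  # Stop once the agent judges the task complete.
  grep -qx 'DONE' <<<"$out" && break
done
```

That the whole trick fits in a dozen lines of shell is sort of the point: the loop contributes nothing but persistence, and the model does everything else.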

The main reason I feel somewhat qualified to comment on this is that I, like everybody, tried to build my own agent orchestration system. Twice, actually: once before the advent of Claude Code, and once after.

Before Claude Code, it failed for the same reason all these projects fail: you can’t organize a bunch of 80-IQ workers into a gestalt that operates at 140 IQ. It is, in fact, not possible. Or at least, nobody knows how to do it; if they did, they would be rich already.

After Claude Code, it didn’t so much fail as get simpler and simpler: I incrementally stripped it down until I converged on something functionally identical to Ralph Wiggum Mode, at which point I realized I should probably just be using Claude Code interactively like a normal person. And this cemented the point for me: you don’t want an orchestration framework for multiple agents. You want a harness that empowers one agent really effectively, and you want that agent to be really smart and good at being an agent.

Nevertheless, I predict that we will keep doing this. We will keep building these systems because we as humans have a powerful intuition that to do really hard things you need lots of minds working together, communicating, arguing, coordinating. This is true for us, and the beings in our terminals now seem so much like us. But I see no evidence that they are like us in this particular regard. They don’t benefit from collaboration: the characteristic LLM pattern has always been that the model gets the right answer almost immediately, or it never does. And then the next model release gets the answer right immediately.