Recursive Refinement


Recursive refinement is a protocol that has a team of agents revisit their work against the original ask, again and again, until it is done. It exists because of a pattern I kept seeing inside my own sessions. The team gets to 80 or 90 percent and the internal review loop starts answering the wrong question. Reviewers default to grading tolerance. They look at what was produced and ask whether it is good enough, which is a different question from whether the original ask has been satisfied. The first question gets a yes too early. Only the second question keeps closing the gap.

The protocol works through two mechanisms. The first is probe-before-scoring. Before any score counts, the reviewer has to answer what the original ask still requires. A score given without that answer is not valid. They can no longer assess the artifact in isolation. They have to hold the artifact next to the ask and name what is missing. Only then can they put a number on it. The facilitator holds the gate.
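The gate can be sketched in a few lines. Everything here is illustrative, not the plugin's actual API: `Review` and `facilitator_accepts` are hypothetical names standing in for the facilitator's check that a score only counts when it arrives with the gaps it probed for.

```python
from dataclasses import dataclass, field

@dataclass
class Review:
    """One reviewer verdict. All names are illustrative, not the plugin's API."""
    score: float                                    # e.g. 9.0 out of 10
    gaps: list[str] = field(default_factory=list)   # what the original ask still requires

def facilitator_accepts(review: Review) -> bool:
    """The facilitator holds the gate: a score given without naming the
    remaining gaps against the ask is not valid. Only a perfect score
    may arrive with nothing left to name."""
    if review.score >= 10:
        return True
    return len(review.gaps) > 0

# A bare score is rejected; the same score with a named gap passes the gate.
bare = facilitator_accepts(Review(score=9.0))
probed = facilitator_accepts(Review(score=9.0, gaps=["error paths untested"]))
```

The point of the shape is that the probe is structural, not a suggestion: there is no code path where a sub-10 score counts on its own.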

The second mechanism is the rung ladder. Once the team scores the work at 9 out of 10, I get a choice: ship, or go deeper. If I choose deeper, the sequence is fixed: 9.25, then 9.5, then 9.75, then 10. No exit once you step onto it. Each rung forces the reviewer to find the next gap and the team to close it. The reviewer is never the author of the production work, and the person who decides “done” does not produce it. The author wants to be finished. The reviewer wants the ask satisfied. Holding those two impulses in different agents is what gives the protocol its grip.
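The ladder itself reduces to a fixed list and a loop. This is a minimal sketch under assumptions: `score_and_close_gap` is a hypothetical stand-in for one review-and-fix cycle (reviewer names the next gap, team closes it, a new score comes back), and the function names are mine, not the plugin's.

```python
RUNGS = [9.25, 9.5, 9.75, 10.0]  # fixed sequence; no exit once you step on

def refine(score_and_close_gap, start_score: float, go_deeper: bool) -> float:
    """Sketch of the rung ladder. `score_and_close_gap(target)` runs one
    review-and-fix cycle aimed at `target` and returns the new score.
    Illustrative names; not the Swarms plugin's actual interface."""
    score = start_score
    if score < 9.0 or not go_deeper:
        return score              # ship at 9/10, or the ladder was never entered
    for rung in RUNGS:
        while score < rung:       # each rung must be cleared before the next
            score = score_and_close_gap(rung)
    return score
```

The structural choice worth noting is that `go_deeper` is consulted exactly once, at 9.0. After that the loop has no early return, which is the "no exit" property in code form.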

I built this into the Swarms plugin for Claude Code. Everyone on my team runs it now. Others using Swarms have picked it up.

The origin is mundane. Sometime around January, when Agent Teams landed in Claude Code, I started running them on my own work and the output sharpened, mostly in quality rather than quantity. Quality works that way: when you have it, you want more. So I kept doing the same thing at the end of every implementation: opening a fresh prompt and asking the team to revisit what they had just shipped. New prompts, because one-shotting makes me a better communicator and the results compound week over week. But Agent Teams are not free, and a fresh prompt for every revision pass is an expensive way to chase the last 10 percent. I needed the revisit to live inside the same run, with a structure that did not let the team off the hook. The pattern repeated enough across sessions that it could become a command. Recursive refinement was born.

What the protocol closes is the execution gap: the work gets to the outcome you asked for, side effects and all. What to build, what to ship, what counts as success: that judgment stays with you.