Same prompt, worse results

Back in March, I shared a command I was using to spawn agent teams. A lot of people tried it. On greenfield projects, it worked well. On existing codebases, their results weren't as good as mine.

When I ran the prompt from my own machine on the same codebases, my results held up; theirs didn't, even though the prompt looked identical. I went looking for the difference, and it wasn't where I assumed. What wasn't identical was everything around the prompt: a different CLAUDE.md, different memories, different local skills, different settings hooks. My environment had been rewriting what the prompt meant. Theirs was rewriting it differently.
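To make the failure mode concrete, here's a rough sketch of the idea, not how any particular tool actually assembles context: the model never sees the shared prompt alone, it sees the prompt plus whatever the local environment layers around it. The file names below (CLAUDE.md, a memory file) are stand-ins for whatever your setup injects.

```python
from pathlib import Path

def effective_prompt(shared_prompt: str, project_dir: Path) -> str:
    """Illustrative only: what the model receives is the shared prompt
    plus whatever local context the environment wraps around it."""
    layers = []
    # Stand-ins for local context files; your setup will differ.
    for name in ("CLAUDE.md", ".claude/memory.md"):
        path = project_dir / name
        if path.exists():
            layers.append(path.read_text())
    layers.append(shared_prompt)  # the only part that was actually shared
    return "\n\n".join(layers)

# Two machines, same shared_prompt, different layers: different effective prompts.
```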

I thought I was sharing the whole system. I was sharing the visible half.

The first attempt at a fix was a mega-prompt: every assumption I’d buried in memory, hoisted into one block of text people could copy. It helped at the edges. But it had an activation cost I hadn’t priced in. To wield it well, you needed the same tacit fluency the original prompt had assumed. I’d moved the problem, not solved it.

Once I stopped trying to write the perfect team for people and instead handed an LLM the rules a good team follows, it constructed teams I’d ship with. Teams where the review round caught real problems instead of producing cheap approval. Not the teams I’d build, but ones that did the job, with room left over for the kind of fine-tuning I enjoy doing anyway.
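The shape of that shift is simple, even if the rules themselves took time to write. Here's a minimal sketch, assuming the Anthropic Python SDK and a hypothetical team-rules.md; this is the idea, not Swarm's actual implementation. Instead of shipping a finished team prompt, ship the rules and let the model assemble a team for the codebase in front of it.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

rules = open("team-rules.md").read()                # hypothetical: the rules a good team follows
codebase_summary = open("repo-summary.md").read()   # hypothetical: context about this repo

message = client.messages.create(
    model="claude-sonnet-4-5",  # any capable model works here
    max_tokens=2000,
    messages=[{
        "role": "user",
        "content": (
            "Using these rules, design an agent team for this codebase: "
            "name each agent, its role, and what the review phase must catch.\n\n"
            f"## Rules\n{rules}\n\n## Codebase\n{codebase_summary}"
        ),
    }],
)
print(message.content[0].text)  # the constructed team, ready to adjust by hand
```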

That collapsed the activation problem in a way the mega-prompt couldn't. The plugin I ended up with is called Swarm, and it's the rules and phases of my agent-team workflow packaged so they travel with the work instead of living silently on my machine. You can find it at github.com/DheerG/swarms.