Claude: "Yes, I did it, it's perfect"

There was a month earlier this year where I caught myself just copy-pasting client bug reports into Claude Code.

The agent was running with --dangerously-skip-permissions (it doesn't ask me to confirm anything. It just runs). I'd paste in what the client sent over, walk away, come back ten minutes later, and check if the changes made sense. If it did, I shipped. If it didn't, I'd paste the error back in and let Claude figure it out.

What was I actually adding? I wasn't writing code. I wasn't making architectural calls. I was a copy-paste bridge between a client's Slack message and a capable-enough agent doing the work. And I was charging for it.

Something had shifted. I just hadn't named it yet.

Clients like you

The copy-paste realization wasn't the start of the existential crisis. Just one wave of it. The crisis had been simmering for a while, on and off. Some weeks my wife could read it on me across the room. Some weeks I'd be ranting at strangers at kids birthday parties. The question underneath was always the same: what am I actually adding here? And underneath that one, the heavier version: what work will my kids do?

What boiled it over was the inbound.

We started getting messages from clients who'd already built the thing.

Not "I have an idea." Not "Here is a broken Lovable app." Actual working applications. With real users. Stripe integrations. Production-ish. Internal tools, customer-facing SaaS, both. Usually running somewhere on Vercel or Replit. The founders hadn't written any of the code themselves. They'd prompted it into existence with Claude Code, Replit, Lovable, etc. Three years ago, people came to us because they couldn't build the thing. Now they came because they already had, and they couldn't tell what they were holding. One was a COO at a large grant-services company. Another was a financial planner building a practice-management platform for other advisory firms. Another was an operations director at a medical practice with dozens of providers.

On a call with one of them, the rant slipped out: "This is kind of an existential crisis, right? The AI is getting better. What's the role of developers?"

Then I heard myself say the quiet part out loud: "We're encountering a lot of clients like you. People who took the project 90% of the way themselves."

Three or four more calls in the same shape over the following weeks really ground it in. This was the new top of our funnel. If our value was "we can build it for you," our value was about to be zero.

Under the hood

What pulled me out of the spiral was actually looking at the apps.

They looked great on live demos. Some even had a handful of real beta users. Then we'd ask about the code architecture, or the deployment pipeline, or logging, backups, security. Blank stares. Or we'd go look ourselves and find horrors.

The clearest framing I've heard for this came from my account lead. He used it with a client last week, and I've been stealing it ever since:

❝

A team of shipbuilders starts by designing the hull, the electrical, the plumbing, and the security system as unified blueprints. Every switch connects to the right circuit, every pipe lines up with the next room, every door has its own key. What happens instead is a passenger doing the directing. They walk the AI room to room: "this room needs lights, this one needs a toilet, oh and we probably need locks." The AI builds each room to match the description, wiring lights into whatever circuit is closest, running pipes through whatever wall is nearby, copying the same key for every door.

Everything that was asked for got built. Every room is legitimately in the ship. But the ship is not a ship. It's a set of rooms floating next to each other. The moment the ship hits water, everything floods.

I keep coming back to this metaphor whenever the existential crisis creeps back (usually after a new round of AI hype). Yes, AI will keep getting better at building what you ask for. What doesn't improve on the AI's side is knowing what you should have asked for. For a passenger-prompted app to actually hold up, the AI would have to upskill the prompter into a shipbuilder. An explanation can only be simplified so far before it stops being accurate enough to decide from. The ceiling isn't the AI. It's the passenger.

The one who stopped, and the one who didn't

Both cases below draw from real engagements. Industries and identifying details have been changed to protect client confidentiality. The technical findings are real.

Two cases from this month make the pattern visible.

Client one. A COO at a medium size grant-services firm. Her CEO asked her to eliminate the team's busywork. She opened Claude Code and, over sixty days of evenings and weekends, built them an internal client-management and workflow-tracking app. Vercel, Supabase, Microsoft auth, document upload, email integration, the works. She showed it to her coworkers and, by her words, they "were picking their jaws off the floor."

Then she stopped. She'd set up a QC step "kind of loose on purpose" because she knew her coworkers wouldn't use anything rigid. But she'd hit the ceiling of what she could evaluate herself, and she knew it. "I don't know what I don't know," she told me in the first meeting. "So maybe there's something I just don't know to be worried about."

On a first-pass review her codebase actually looked pretty good. Claude Code's default output is usually fine at a surface level. The deeper pass was a different story. The build script ran prisma db push --accept-data-loss on every deploy, meaning any schema change could silently drop production data. She'd been developing locally against a Supabase database while production actually ran on a completely different Neon instance. She didn't realize the two weren't connected, and Claude Code hadn't flagged it either. Fourteen API routes were completely unauthenticated, including admin endpoints exposing contact PII.

If she'd pushed on without the audit, the most likely outcome was some combination of silent data loss on a deploy, a security breach via the open admin routes, or a codebase too entangled to recover.

Client two, the contrast. A financial planner building a practice-management platform (client onboarding, portfolio tracking, document management, billing, compliance reporting) as a commercial B2B product for other advisory firms. Bootstrapped from personal savings. Built solo on Replit using AI. By the time we got on the discovery call, she'd done several demos, had more firms in the pipeline, and had a target launch date a month out.

She had also, the week before, signed a contract with a SOC 2 compliance firm. SOC 2 Type 2 is the enterprise-grade security attestation. It requires demonstrating six to twelve months of operational controls.

My team found over a hundred issues in the first six hours of audit. The backend was a single seventeen-thousand-line file. There was no tenant isolation, which for a multi-tenant financial app means any advisor at any firm could hypothetically pull another firm's client records and portfolio data by URL. Payment tokens were generated with Math.random(). A production API key for a third-party document service was hardcoded into frontend code shipped to every user's browser. Client custodial account balances - the kind that trigger regulatory action when they're wrong - were read and written without a database lock. Her actual state was a product with zero working access controls.

On the call, she said something I suspect every vibe coder comes to eventually. The later they get there, the more expensive it is:

❝

I think I was overly optimistic. You can build so fast with AI - it's just, I'll do this feature next, I'll do this feature next. And [the coding agents] are so encouraging. They're just like, yes, I did it, it's perfect.

She isn't reckless. Reckless would require her to have known what she was missing. Shes a real financial professional with real domain expertise in an industry that genuinely needs better tooling. She saw a gap, used the tools available, built something impressive. The tools didn't tell her when to stop. The AI told her, every feature along the way, "yes, I did it, it's perfect." And she believed it, because she had no other reference point.

Two founders. Same tools. Different outcomes. The difference wasn't capability. It was knowing where the boundary was. The first one knew because she'd spent a career running a 22-person company and had developed the instinct that when something is load-bearing, you call someone who knows what load-bearing means.

What this means for the agency

I'm rewriting our landing page this month. The pitch used to be about building software. Now it's about owning whether the thing actually works in the real business it's deployed into. Whether it holds under load. Whether it survives a regulator. Whether it stays in sync with a workflow that changes every quarter.

I'm still figuring this part out. I don't have a neat playbook for what a sustainable AI consultancy looks like in 2028 - nobody does. But I know what we're investing in: getting deep enough into specific industries (legal, healthcare, professional services) that we accumulate the business context an LLM can't.

And I know what we're not doing anymore. The month I described at the top, where I was a copy-paste bridge and charging for it: that was brief, maybe six weeks. Most agencies are still inside that window and not saying so out loud. The honest version is that a lot of "developer hours" are AI hours with a human copy-pasting, and the invoice doesn't distinguish. Going forward, clients pay for human time and judgment. AI compute is overhead. If a problem can be solved by handing it to Claude unsupervised, we shouldnt be charging for it, and we're going to stop pretending otherwise.

The bet

So here's the bet I'm making with the agency.

Code will be effectively free. But the ceiling I described will hold for a long time. The limitation isn't the AI, it's the passenger. The work that matters stays stubbornly human: understanding a business deeply enough over time to be trusted with its outcomes, owning the consequences when something goes wrong, knowing where the load-bearing walls are. Not because humans are magical, but because knowing what to ask for is still the whole game, and AI hasn't changed that yet.

That's a narrower strip of ground than agencies used to stand on. "We write good code" is gone. What's left is knowing which questions to ask before the first line gets written.

The COO at the grant-services firm understood this, I think, in the second meeting. She said, "I'm willing to be your guinea pig, because I know that this is new." She was telling me she knew we were both figuring out the rules in real time, and she was okay with that, because the alternative, her and Claude alone, had a ceiling that scared her.

The financial planner building the advisory platform would probably have learned it, eventually. Or she'd keep racing the avalanche and one of her queued firms would discover the cross-firm data leak, and she'd learn it the expensive way.

AI writes the code. We own the outcome. I don't know if that's enough. But it's a better bet than billing for Claude time and hoping nobody notices.