A note before I start: I am not a ranter. I usually read these posts, close the tab, and go ship something. Three of them have hit my feed in the last forty-eight hours, and that strategy is no longer working. So here we are.
Twice a day on Hacker News. Like clockwork. “Claude is getting worse.” “I cancelled my Pro.” “They’re degrading the model on purpose to save GPU.” Then the comments open and three hundred people who have never read a system card race to the bottom.
This is a skill issue. PEBKAC. The model is fine. You are bad at using it.
Receipts first, in case you want to close the tab. By day, I ship a real product. Frontend, backend, infrastructure, and the marketing apps that sit in front of it. Seventeen services across three environments, the engineering side owned by one person. This past month I tore out a monolithic ninety-second pipeline and rebuilt it as a five-agent flow that fault-isolates each stage and fans three agents out in parallel. Latency dropped 60% for a 3% cost bump. I built a gateway that fronts three independent knowledge stores behind one interface with a debounced ingest queue so concurrent writes do not trample each other. The security posture is mine. Per-tenant KMS keys. JWT-scoped routes. All AWS service traffic over VPC endpoints. The OT buyers we sell into audit before they buy, so I built it the way they require. I shipped two customer-facing marketing apps in the same window. Critical-path test coverage runs in the thousands and stays green because I write the tests with the model, every time, before the code merges.
By night, I run a self-hosted personal chief-of-staff that fires fourteen scheduled runs a day, maintains a three-thousand-file knowledge graph that auto-cross-links new entities as they are mentioned, and serves a phone-friendly dashboard over Tailscale.
With Claude. Shipping.
Numbers. In 115 days of 2026, I ran 467 Claude Code sessions. 866 hours of model working time against 124 hours of mine. Seven times my own input. 87% of it autonomous. You can argue with the multiplier but you can’t argue with the ship list.
So when I tell you the model is fine, understand that I am not a hobbyist with a side project. I am running systems with paying users, and I do not have time to feel betrayed by point releases.
So what is actually going wrong for you?
What I can speculate is this: you treat new model releases like iOS updates. Tap the button, watch the progress bar, expect everything to feel the same but slightly faster. That is not what is happening. 4.5 to 4.6 to 4.7 is not a point release on the same product. The training mix changed. The reasoning shape changed. The way it responds to ambiguous instructions changed. If you don’t update your prompts, your harness, and your mental model when the model updates, you are wearing an old pair of glasses you found in a drawer and wondering why everything is blurry.
Reading a system card takes twenty minutes. Reading a Claude Code changelog takes five. People who built their entire workflow on these tools will not spend half an hour understanding what changed in the tools they depend on. They will spend an afternoon writing a thousand-word post about how the company is gaslighting them.
“Just don’t use AI to code.” Cool. Enjoy the moat of your typing speed and stay mad at the printing press.
“I cancelled too.” Nobody asked. The unsubscribe page is not a confession booth. The people getting value out of this are not racing you to the exit.
“They’re nerfing the model on purpose to save compute.” Flat-earth tier. Weights are static once they ship. The thing you talked to today is the thing you talked to last Tuesday. What changed is the harness (real changes, smaller than you think); what failed is mostly your ability to tell “the model is dumber” apart from “I asked it a worse question.” If you believe a major lab is gaslighting you to shave a few cents of inference cost, please do not ship code that talks to anything I rely on.
“Boris keeps shipping things that break my workflow.” Yes, the creator of a new coding paradigm wakes up every morning asking how to break your spirit personally. He lives for it. The changelog is right there. You did not read it.
“But you didn’t really build that, the model did.” I design the systems. I decide infrastructure runs as code. I decide nothing merges without tests. I sequence the build pipelines. I set the boundaries, write the failure cases, hook the harness, and review every diff before it merges. The model types. If your bar for authorship is keystrokes, you did not write your compiler either. Tools change what counts as authorship. They do not erase it.
How do I know the model is fine? I instrument it. I run a canary that fires a deterministic prompt at every model I depend on every thirty minutes and pages me when latency or token rate drifts. Two days ago it caught a latency degradation on Haiku through Bedrock. Not the model… the inference path. I knew within the hour, before anyone caught a vibe. That is the difference between feelings and a graph.
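If you want to roll your own canary, the shape is simple. This is a minimal sketch using the official anthropic Python SDK against the first-party API (mine probes Bedrock too); the model list, baseline, threshold, and pager stub are illustrative, not my production values:

```python
"""Canary sketch: fire a fixed prompt at each model you depend on,
page when latency drifts past a baseline. Run it from cron or a scheduler.
Requires `pip install anthropic` and ANTHROPIC_API_KEY in the environment."""
import time
import anthropic

CANARY_PROMPT = "Summarize in one sentence: the quick brown fox jumps over the lazy dog."
BASELINE_LATENCY_S = {"claude-haiku-4-5": 2.0}  # swap in the models you depend on
DRIFT_FACTOR = 1.5                              # page at 50% over baseline

client = anthropic.Anthropic()

def probe(model: str) -> tuple[float, float]:
    """One deterministic probe; returns (latency seconds, output tokens/sec)."""
    start = time.monotonic()
    resp = client.messages.create(
        model=model,
        max_tokens=128,
        messages=[{"role": "user", "content": CANARY_PROMPT}],
    )
    latency = time.monotonic() - start
    return latency, resp.usage.output_tokens / latency

def page(msg: str) -> None:
    print(f"PAGE: {msg}")  # stand-in for PagerDuty/ntfy/SNS, whatever you use

for model, baseline in BASELINE_LATENCY_S.items():
    latency, tok_rate = probe(model)
    if latency > baseline * DRIFT_FACTOR:
        page(f"{model}: {latency:.2f}s latency vs {baseline:.2f}s baseline "
             f"({tok_rate:.0f} tok/s)")
```

Schedule it every thirty minutes, keep the history, and “the model feels slow” becomes a line on a graph with a timestamp.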
The actual problem is that prompting/context is not the whole game. Neither is the model. I have written about Boundary Engineering before; today the focus is Harness Engineering.
The model is one component. Around the model you have a harness: the loop that decides what context to load, what tools to expose, when to compact, when to stop. You have boundaries: what the model owns, what you own, and what it can and cannot do. You have workflows: the patterns you’ve built so the same kind of problem gets solved the same way every time. If your harness was tuned for 4.5 and you swapped in 4.7 without changing anything else, you are not measuring 4.7. You are measuring 4.7 wearing a costume tailored for an older sibling. The fact that you cannot tell is the problem.
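If “harness” sounds abstract, here is a toy loop that makes the shape concrete. Nothing below is a real API; every class is a stand-in. The point is only that each decision (what context loads, which tools are exposed, when to compact, what gets denied, when to stop) lives outside the model:

```python
"""Toy harness loop. The FakeModel plays back scripted replies so the
control flow is visible; swap in a real model client and real tools."""
from dataclasses import dataclass

@dataclass
class Reply:
    text: str
    tool_call: str | None = None
    done: bool = False

@dataclass
class FakeModel:
    script: list[Reply]
    def step(self, context: list[str], tools: list[str]) -> Reply:
        return self.script.pop(0)

def compact(context: list[str]) -> list[str]:
    # A real harness summarizes; the toy just keeps the newest entries.
    return context[-4:]

def run(model: FakeModel, task: str, tools: list[str],
        denylist: set[str], max_turns: int = 10) -> str:
    context = [f"TASK: {task}"]               # what the model gets to see
    for _ in range(max_turns):                # the hard stop is harness policy
        reply = model.step(context, tools)    # tools are exposed, not assumed
        if len(context) > 8:
            context = compact(context)        # compaction is a harness decision
        if reply.tool_call is not None:
            if reply.tool_call in denylist:   # boundary: the hook layer says no
                context.append(f"DENIED: {reply.tool_call}")
                continue
            context.append(f"RAN: {reply.tool_call}")
        if reply.done:
            return reply.text
    return "stopped: turn budget exhausted"

print(run(
    FakeModel([
        Reply("trying cleanup", tool_call="rm -rf build"),
        Reply("running tests", tool_call="pytest"),
        Reply("all green", done=True),
    ]),
    task="fix the failing test",
    tools=["bash"],
    denylist={"rm -rf build"},
))
```

Every knob in that loop is something you tune. Swap the model underneath without retuning the knobs and you have no idea what you are measuring.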
I deleted my IDE. I work out of iTerm2 with three to eight tabs open at a time, each one a Claude session pointed at a different repo. I do not worry about whoopsies. Pre and post hooks exist. The harness literally cannot run certain commands because I told it not to. Blast radius is a configuration choice. The fact that yours is unbounded is not the model’s fault.
I have spent more time on harness engineering this year than on any single feature. I deleted entire categories of prompts because the new model does not need them. I rewrote my agent definitions when 4.6 dropped because the old ones were over-instructing it. I rewrote them again, more aggressively, when 4.7 dropped. The result is fewer tokens in, better output out. The other result is that I do not write blog posts about how the model is broken.
I can talk to a model for two hours before it writes a single line of code. Researching the codebase. Walking the constraints. Killing bad approaches before they cost me anything. When it finally starts writing, the code is right. That is not luck. That is a workflow.
Some people are framing this as a slot machine. That coding with a model is addictive the way pulling a lever for random rewards is addictive. Shut up. That is the dumbest thing I have ever heard. Slot machines pay out independent of input. Money in, slight chance more money out. That is the whole thing. The model’s output is downstream of your prompt, your context, your harness. If you feel like you are gambling, you have a workflow problem. The lever you keep pulling is your own.
Most of the people complaining are typing “fix this bug” into a textbox, getting bad output, and concluding the tool is broken. They skipped the part where the work actually happens.
A tool that adapts faster than you do exposes your willingness to adapt. The senior engineers I know who are getting the most out of Claude are not the ones with the fanciest prompts. They are the ones who treat every release like a small migration. New model, new defaults, new workflow tweaks, new things you can stop doing manually because the model finally does them right.
If you are not doing that work, the model will feel like it is getting worse, because the gap between what it can do and what you are letting it do widens with every release. That is not the model degrading. That is you, standing still on a treadmill that keeps speeding up.
Stop telling on yourself in the comments. Read the system card. Rewrite the prompts. Update the harness. Build a workflow you can actually trust. Or, sure, cancel and go back to writing tab completion macros. The rest of us are busy shipping.
Not a guide per se, but a bare minimum. If you are going to keep paying for this thing and want to stop being wrong on the internet:
Read the system card on release day. Read the Claude Code changelog too. Twenty minutes for the model card, five for the changelog. The card tells you what changed in training. The changelog tells you what changed in the tool you actually use. Not optional.
Diff your prompts. Most of your CLAUDE.md assumes the old model’s failure modes. Cut those instructions; the new model already knows things you used to have to spell out. (There is a sketch of the kind of trim I mean after this list.)
Audit your harness. Tool count, context size, when you compact, when you delegate to sub-agents. The right shape moved when the model moved. You have not checked.
Wire up pre and post hooks. The harness should not be able to run commands that scare you. Forbid them at the hook layer. Blast radius is a configuration choice.
Build a real workflow. A reproducible loop for the kind of work you actually do. Not a vibe. Same problem, same approach, every time, so you can measure regressions that are not just “felt off today.” If something worked really well, codify it in a skill or your CLAUDE.md.
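On the prompt diff: here is the flavor of trim I mean. Every entry below is invented for illustration; your CLAUDE.md will have its own fossils:

```markdown
<!-- Cut after an upgrade: compensations for failure modes the new model may not have -->
- Re-read the file before every edit; you lose track of line numbers.
- Never use bare `except:`; you tend to swallow errors silently.
- Print the full diff before applying so I can catch stray deletions.

<!-- Keep: boundaries and project facts, which do not age with the model -->
- Nothing merges without tests.
- Production credentials are never read, logged, or echoed.
```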
Do those five things and if the model still feels worse, congratulations, you finally have evidence. Post that.
Some folks have read this as harsh and uppity. Others have asked for the how. Both fair; a quick word on each.
On the harsh read. What I’m frustrated by is the default mode of the conversation when something feels off in someone’s workflow: blame big tech. The model is worse. The platform is throttling me. Anthropic is gaslighting us. None of those takes require critical thinking, and they all let the person making them off the hook for examining their own setup. Big tech deserves criticism in specific provable cases. Defaulting to “they nerfed the model” any time something feels off is the opposite of doing the engineer’s work.
Yes, there was a recent post about how a small number of users hit three separate bugs, in totally different time frames, across the last 90 days. Either I was lucky, or my workflow happened to “dodge” the bugs. The only thing I know for sure is that I did burn tokens very quickly on 4.6 for about two days, but I was also in the middle of a LARGE refactor with about 10 tabs open running multiple agents… so maybe just a coincidence. I can’t be sure.
On the how. Seven concrete patterns from my last 30 days of Claude Code work.
Sub-agents for isolated context. The Agent tool fired 39 times in 30 days. When I need to audit a large number of things across our codebase, I delegate to a sub-agent and consume the summary back, so my main thread keeps its context budget for the work that needs it. I also use clear and compact A LOT, with a pre-compact skill that I wrote so the handoff is more seamless and less chancy.
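For the unfamiliar: a sub-agent in Claude Code is just a markdown file with frontmatter under .claude/agents/. This one is a hypothetical shape, not my actual auditor; the name, tool list, and instructions are illustrative:

```markdown
---
name: repo-auditor
description: Read-only audit of many files at once. Use when asked to audit or survey a codebase; reply with a compact summary only.
tools: Read, Grep, Glob
---
You audit code. Given a directory and a concern, scan every relevant file and
report findings as: file, line, issue, severity. Do not modify anything.
Keep the whole reply under 50 lines so the parent thread stays lean.
```

Which leads to…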
Skill ecosystem. Codified workflows for the recurring stuff: database queries, pipeline status, Jira triage, org cleanup, onboarding new customers, security review. Each is a SKILL.md file the model loads on demand. “/pipeline-status what’s failing in dev” loads the skill and follows it; I don’t need to re-explain the workflow session over session.
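The file format is nothing exotic: YAML frontmatter the model uses to decide when to load the skill, then the workflow as plain markdown. A stripped-down hypothetical version of the pipeline one (steps illustrative, not my actual file):

```markdown
---
name: pipeline-status
description: Check build pipeline health and summarize failures. Use when asked what is failing in a pipeline or environment.
---
# Pipeline status

1. Query the pipeline dashboard/CLI for the requested environment.
2. For each failing stage, pull the most recent error output.
3. Report per failure: pipeline, stage, failure class (deps / tests / auth /
   infra), and a one-line suggested next step.
```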
Pre/post hooks. Real entries in settings.json that block specific commands at the harness layer. The model can’t run rm -rf, can’t push to main, can’t touch production credentials, can’t delete prod stores (that one was a lesson learned). Hard guardrails at the system level; Claude just reports “the harness denied me doing that.”
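The minimal version is two pieces. A settings.json entry routes every Bash tool call through a guard script, and the script exits 2 to block the call. Treat the exact schema as a sketch and check the hooks docs for your version; the denylist patterns are examples, not my production list:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "python3 ~/.claude/hooks/guard.py" }
        ]
      }
    ]
  }
}
```

```python
#!/usr/bin/env python3
"""PreToolUse guard: Claude Code pipes the pending tool call as JSON on stdin.
Exit code 2 blocks the call, and stderr is fed back to the model as the reason."""
import json
import re
import sys

BLOCKED = [
    r"\brm\s+-rf\b",            # no recursive force-deletes
    r"git\s+push\b.*\bmain\b",  # no direct pushes to main
    r"\bPROD_",                 # no touching production credentials
]

payload = json.load(sys.stdin)
command = payload.get("tool_input", {}).get("command", "")

for pattern in BLOCKED:
    if re.search(pattern, command):
        print(f"Denied by harness policy: matched {pattern!r}", file=sys.stderr)
        sys.exit(2)  # block; the model sees the denial message

sys.exit(0)  # everything else proceeds
```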
Iterative refinement is the dominant session shape. Half the sessions in 30 days were multi-turn improvement loops on a single target, and the harness is tuned for that loop. The /research → /plan → /implement workflow lives in this register. As I said above, I can talk through architecture design for days, across multiple iterations, before I ever say “GO”.
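My /plan is mine, but the shape is easy to copy. Custom commands in Claude Code are markdown files under .claude/commands/ with $ARGUMENTS substituted in; this is a hypothetical version, not my actual file:

```markdown
---
description: Turn completed research into an implementation plan. No code yet.
---
Using the research already in this session, produce a plan for: $ARGUMENTS

1. Files that will change, and why each one.
2. Ordered steps, each one independently verifiable.
3. Tests to add or update before anything merges.
4. Risks, and how we roll back.

Do not write implementation code. Stop and wait for an explicit "GO".
```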
Multi-repo orchestration. One recent session touched 5+ repos to fix CI failures (dependency versions, test assertions, auth headers, TypeScript errors). I run one Claude Code session that follows the feature through each repo: focused context for where that code lives at each step of the data flow, and I let the other agents know there is other work in flight. I tried a mutual scratchpad between agents, but it was always too messy.
Programmatic over manual. Anywhere I can build a tool that replaces asking, I do. A dashboard monitors all our build pipelines from one place, so I see what’s failing the moment I open it. Plus small custom utilities for anything I do more than twice. The compounding return on these is bigger than people think. This is also where the canary and, again, the pre/post hooks come in.
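The dashboard itself is private, but the core loop fits in a page. A minimal sketch, assuming AWS CodePipeline via boto3 since the stack described above is AWS-heavy (the post never names the CI system; swap in your own client):

```python
"""One-screen pipeline view: list every pipeline, flag failed stages.
Requires `pip install boto3` and AWS credentials in the environment."""
import boto3

cp = boto3.client("codepipeline")

def failing_stages() -> list[tuple[str, str]]:
    """Return (pipeline, stage) pairs whose latest execution failed."""
    bad = []
    for p in cp.list_pipelines()["pipelines"]:
        state = cp.get_pipeline_state(name=p["name"])
        for stage in state["stageStates"]:
            status = stage.get("latestExecution", {}).get("status")
            if status == "Failed":
                bad.append((p["name"], stage["stageName"]))
    return bad

if __name__ == "__main__":
    for pipeline, stage in failing_stages():
        print(f"FAILING  {pipeline} :: {stage}")
```

Twenty lines, and a question I used to type into a chat box is now a glance.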
Instrumentation. I built an activity-report tool on top of /insights. If you’re not using /insights, start; it’s the most underrated feature in Claude Code right now, and it is where the numbers at the top of this post come from. It also tells you exactly what the model did wrong and what to add to CLAUDE.md to avoid the issue, and it coaches you on how to use the tool better, which is where I learned a lot of what I am saying here.
I hope this answers the questions. And no, I am not a mean-spirited person; I just like to vent into the ether sometimes. Surprisingly, this is my most popular post, and by far my most “divisive.” So say what you want, but CLEARLY humans like drama and abrasiveness.
