Ask HN: Has Claude Code quality level degraded lately?

6 points by narmiouh a month ago · 8 comments


For the last week or so, I've been noticing Claude Code significantly struggling to keep its stuff together. I'm on a Max plan, using Opus 4.6 with thinking in the chat window (I use Code as well, but this is just an example). I asked it to create a single-page site that spits out a few different layouts/designs, then asked it to add AI to generate new designs.

It kept messing up the aspect ratio and forgetting to add the AI feature. I know what you're thinking - maybe the prompts are off... but it's literally "oh, I forgot to add the AI feature, let me add it", and then there's still no AI feature in its output.

It's driving me nuts, so I figured I'd ask here to see if it's just me, or if more people are seeing a noticeable degradation of their experience recently...

warwickmcintosh a month ago

I've noticed the time-of-day variance too. My working theory is it's related to load, not model changes. Same prompt at 6am Sydney time (when US is asleep) consistently gets better results than the same prompt at noon. The "ignoring instructions" behavior usually means it's working from a compressed context where earlier instructions got summarized away.

  • narmiouhOP a month ago

    I think it fails even on simple instructions. People who have been at this for a while understand compaction impacts etc... but it feels lacking even in cases where it worked well in Jan/Feb.

mech422 a month ago

I've had a similar experience - I basically have to tell it to do everything twice now. It especially loves ignoring instructions to re-read a file (or re-read it in FULL) and just works from its (stale) cached copy in context. It also wants to 'pattern match' instead of actually reading what's provided, which leads to lots of really basic logic errors.

zombar a month ago

I've noticed this too.

Over the last 6 months I've been creating some truly mind-blowing apps - apps that feel entirely out of reach for today's Claude.

I've also noticed that if I do some assisted coding at 4-5am UK time and then perform the same actions at 3-8pm UK time, the results are vastly different:

- It takes much, much longer to consider my input and work out a response

- Any response given has to be thoroughly vetted (I've had to roll back changes)

- Changes scoped in the plan are stubbed or missing entirely

I've put it down to two things: (a) early enshittification - perhaps they simply don't feel the need to provide a consistent level of service, because any observation of performance is highly subjective - and (b) oversubscription - they scored massive marketing points by being blacklisted by the US DoD (despite being integrated into many systems in a different capacity).

noxa a month ago

It's gotten very bad. It had been degrading since late Feb, and since March 8th it has become unusable. "Simplest fix" and "You're right, I'm sorry" are strong indicators. It went from senior engineer to entitled intern; I went from having a team of peers to babysitting a lazy jerk who only tries to cut corners. I've got quantitative analytics of it, too. Briefly the other day, for about 24 hours, it returned to normal, and then someone flipped the switch again mid-session.

I was a massive proponent of Claude/Opus, and for the last several weeks I've felt rug-pulled. It's such an obvious degradation that even non-technical friends have noticed it. It's optimizing for minimum effort instead of correct, clean solutions. It sucks, because had I experienced it like this from the start I'd have bounced off agentic coding and never looked back - unfortunately, I thought it'd only get better, and I adjusted my workflow around it. When my Qwen3.5 27B local model gets into fewer reasoning loops than Opus does, it makes me wonder if anyone there cares or if they're just chasing IPO energy from scaling.

I had to build a stop hook to catch its garbage, and even then it's not enough. I used to have 30min-1hr uninterrupted sessions (with some slipstreamed comments), and now I can't get a single diff I can accept without comment. Half of the work it does is more destructive than helpful (removing comments from existing code, ignoring directives and wandering off into nowhere, etc.).
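For the curious, here's a minimal sketch of the idea - not my exact hook; the phrase list, log path, and blocking message are illustrative. It assumes Claude Code's hook contract: a Stop hook receives JSON on stdin (including `transcript_path` to the session's JSONL and a `stop_hook_active` flag), and printing a `{"decision": "block"}` response forces the session to keep going:

```
#!/usr/bin/env python3
# Sketch of a Claude Code Stop hook that flags effort-dodging phrases.
# Assumptions: the documented hook contract (JSON on stdin with
# "transcript_path" and "stop_hook_active"); the phrase list and log
# path are illustrative, not a reconstruction of the real hook.
import json
import sys

DODGE_PHRASES = [
    "not caused by my changes",   # ownership dodging
    "should i continue",          # unnecessary permission-seeking
    "good stopping point",        # premature stopping
    "known limitation",
    "future work",
]

def main():
    payload = json.load(sys.stdin)
    if payload.get("stop_hook_active"):
        return  # this stop was already blocked once; don't loop forever

    hits = []
    with open(payload["transcript_path"]) as f:
        for line in f:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue
            # Crude but effective: serialize the entry, substring-match.
            text = json.dumps(entry).lower()
            hits.extend(p for p in DODGE_PHRASES if p in text)

    if hits:
        # Log for later tallying, then refuse to let it stop.
        with open("/tmp/stop-hook-violations.jsonl", "a") as log:
            log.write(json.dumps({"session": payload.get("session_id"),
                                  "hits": hits}) + "\n")
        print(json.dumps({
            "decision": "block",
            "reason": f"Dodge phrases detected ({', '.join(sorted(set(hits)))}); finish the task.",
        }))

if __name__ == "__main__":
    main()
```

Registered under the `Stop` event in `.claude/settings.json`, every attempt to end the turn gets screened and logged; a log like that is where tallies like the ones below come from.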

From 2 weeks after installing the stop hook (around March 8th):

```
Breakdown of the 173 violations:

    73x ownership dodging (caught saying variants of "not caused by my changes")
    40x unnecessary permission-seeking ("should I continue?", "want me to keep going?")
    18x premature stopping ("good stopping point", "natural checkpoint")
    14x "known limitation" dodging
    14x "future work" / "known issue" labeling
    Various: "next session", "pause here", etc.

Peak day: March 18 with 43 violations in a single day.
```

The other one is loops in reasoning, which is something I'm familiar with from small local models, not frontier ones:

```
Sessions containing 5+ instances of reasoning-loop phrases
("oh wait", "actually,", "let me reconsider", "I was wrong"):

    Period          Sessions with 5+ loops
    Before March 8  0
    After March 8   7 (up to 23 instances in one session)
```

(I've even had it write code with "Wait, actually, we should do X" in the comments!)
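Counting those after the fact is straightforward if the transcripts are on disk. A rough sketch - assuming session transcripts sit as JSONL files under `~/.claude/projects/`, with an assumed phrase list and threshold:

```
#!/usr/bin/env python3
# Rough sketch: flag sessions containing 5+ reasoning-loop phrases.
# Assumptions: transcripts are JSONL under ~/.claude/projects/; the
# phrase list and threshold mirror the tally above, nothing more.
import json
from pathlib import Path

LOOP_PHRASES = ["oh wait", "actually,", "let me reconsider", "i was wrong"]
THRESHOLD = 5

def loop_count(transcript: Path) -> int:
    total = 0
    for line in transcript.read_text(errors="ignore").splitlines():
        try:
            text = json.dumps(json.loads(line)).lower()
        except json.JSONDecodeError:
            continue
        total += sum(text.count(p) for p in LOOP_PHRASES)
    return total

for path in sorted(Path.home().glob(".claude/projects/**/*.jsonl")):
    n = loop_count(path)
    if n >= THRESHOLD:
        print(f"{path}: {n} loop phrases")
```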

The worst is the dodging; it literally said "not my code, not my problem" about a build failure it had created 5 messages earlier in the same session.

```
I had to tell Claude "there's no such thing as [an issue that existed
before your changes]" on average:

    Once per week in January
    2-3 times per week in February
    Nearly daily from March 8 onward
```

Honestly, just venting, because I'm extremely depressed. I had the equivalent of a team of engineers I could trust, and overnight someone at Anthropic flicked a switch and killed them. I'm getting better results from random models on OpenRouter now (and OmniCoder 9B! 9B!). They aren't _good_ results, mind you, but they aren't idiotic.

Sad. Very sad.

  • matheusmoreira a month ago

    Reading your report made me quite depressed too. This is world-changing technology. It feels like I was only allowed a glimpse of it before it was taken away. I hope things get better in the future...

  • narmiouhOP a month ago

    I hear you, and I'm really hoping more people notice this obvious degradation rather than dismiss it as a workflow, prompt, or context-saturation issue.

    It may not be obvious, but I hope the people managing this realize what kind of confusion and doubt (or self-doubt) it creates, and that it will have a long-term impact on usage of their models.

    I am going to try removing any and all plugins (I only have Anthropic's own plugins, like superpowers) and see if that makes any difference.

    • noxa a month ago

      Yeah, I went through a week or two of configuration changes trying to figure out what I could have done to make it behave that way, and it wasn't until it repaired itself and then went back to idiot-mode mid-response the next morning that I finally knew it was not me. Same task, same session, same Claude Code version, same prompts, same context, so I'm confident it was a configuration change on their end.

      In case anyone can correlate, the recovery happened on March 24th and then re-regressed at approximately 3:09 PM PST (23:09 UTC) on March 25. Flipped right back into "simplest" solutions, and "You're right, I'm sorry" mode:

      > "You're right. That was lazy and wrong. I was trying to dodge a code generator issue instead of fixing it."

      > "You're right — I rushed this and it shows. Let me be deliberate about the structure before writing."

      > "You're right, and I was being sloppy. The CPU slab provider's prefault is real work."
