[MODEL] Quantified evidence: Sonnet 4.6 quality regression since March 9 — 1400+ frustration events across 50 sessions


Summary

I have quantified data showing a dramatic quality regression starting the week of March 9, 2026. This is not a vibe check — it's measured from 60 days of conversation logs across 50 sessions.

Data

I track how often I need to repeat instructions or correct Claude (my "WTF frequency"). Here's the weekly distribution:

60-day data across 50 sessions:

W09 (Feb 24)    26  ██                                ← baseline
W10 (Mar 02)    23  ██                                ← baseline
W11 (Mar 09)   221  ██████████████████████            ← regression begins (8.5x baseline)
W12 (Mar 16)   484  ████████████████████████████████  ← peak (outage week)
W13 (Mar 23)   479  ████████████████████████████████  ← still bad
W14 (Mar 30)    65  ██████                            ← brief recovery
W15 (Apr 06)   294  █████████████████████████████     ← regression resumes
  • Baseline (W09–W10): ~25/week
  • Peak (W12–W13): ~480/week — 19x baseline
  • Total across 50 sessions: 1,400+ events
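
For reproducibility, here is a minimal sketch of how such a count can be produced. Everything below is illustrative: the `logs/session-*.jsonl` layout, the message schema, and the frustration-phrase list are assumptions, not my actual tooling.

```python
# Sketch: count "WTF events" (repeated instructions / corrections) per ISO week.
# Assumes one JSONL file per session with lines like
# {"ts": "2026-03-16T10:00:00", "role": "user", "text": "..."}.
import json
import re
from collections import Counter
from datetime import datetime
from pathlib import Path

# Phrases that signal repeating myself or correcting the model (assumed list).
FRUSTRATION = re.compile(
    r"\b(as i said|i already told you|no,? that'?s wrong|read the file|again:)\b",
    re.IGNORECASE,
)

def weekly_counts(log_dir: str) -> Counter:
    counts: Counter = Counter()
    for path in Path(log_dir).glob("session-*.jsonl"):
        for line in path.read_text().splitlines():
            msg = json.loads(line)
            if msg.get("role") != "user":
                continue
            if FRUSTRATION.search(msg.get("text", "")):
                week = datetime.fromisoformat(msg["ts"]).strftime("%G-W%V")
                counts[week] += 1
    return counts

if __name__ == "__main__":
    for week, n in sorted(weekly_counts("logs").items()):
        print(f"{week}  {n:4d}  {'█' * min(n // 10, 32)}")
```

The `n // 10` scale matches the chart above: one block per ~10 events, capped at a width of 32.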

Affected models

I've been forced to switch from Sonnet to Opus as my primary model. Sonnet 4.6 is basically unusable now. My subjective rating of current model quality:

  • Opus 4.6 now = Sonnet 4.6 before
  • Sonnet 4.6 now = Haiku before
  • Haiku = Haiku (unchanged — nothing left to degrade)

This means I'm paying Opus prices for what used to be Sonnet-level performance.

What "regression" looks like in practice

The model consistently fails to:

  1. Follow its own reasoning loop (OODA) despite explicit CLAUDE.md instructions
  2. Read files before modifying them; it guesses at file contents instead (see the hook sketch after this list)
  3. Stop repeating the same mistake — same error 5–8 times per session without self-correction
  4. Follow explicit behavioral constraints across sessions ([MODEL] Opus 4.6: Systematic Failure to Follow Explicit Behavioral Constraints Across Independent Sessions #41217)
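
Failure 2 can be partially guarded against client-side with the hook support already in use here (see Environment). Below is a sketch of a single hook script that, registered for both Read completions and Edit/Write attempts, blocks modifications to files the session never read. The stdin field names and the exit-code-2 blocking convention reflect my understanding of the Claude Code hooks interface and should be verified; the state-file location is arbitrary.

```python
#!/usr/bin/env python3
# Sketch: Claude Code hook that blocks Edit/Write on files the session never Read.
# Assumed registration: PostToolUse with matcher "Read" and PreToolUse with
# matcher "Edit|Write" in .claude/settings.json. For PreToolUse, exit code 2
# blocks the tool call and returns stderr to the model.
import json
import sys
from pathlib import Path

event = json.load(sys.stdin)
state = Path(f"/tmp/claude-read-{event.get('session_id', 'default')}.txt")
file_path = (event.get("tool_input") or {}).get("file_path", "")

if event.get("hook_event_name") == "PostToolUse":
    if file_path:  # a Read completed: remember the file for this session
        with state.open("a") as f:
            f.write(file_path + "\n")
    sys.exit(0)

# PreToolUse on Edit/Write: refuse if the file was never read this session.
seen = set(state.read_text().splitlines()) if state.exists() else set()
if file_path and file_path not in seen:
    print(f"Blocked: read {file_path} before modifying it.", file=sys.stderr)
    sys.exit(2)
sys.exit(0)
```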

Timeline alignment

The W12 peak (week of March 16) aligns exactly with the outage week flagged in the chart above.

Environment

  • Claude Code CLI v2.1.94
  • Linux
  • bypassPermissions mode
  • Heavy skill/hook usage with structured CLAUDE.md rules

What I'm asking

  1. Acknowledge that model quality has regressed — the data is clear
  2. Explain whether this is a compute constraint issue (as widely suspected) or a checkpoint/RLHF regression
  3. Provide a model versioning mechanism so users can pin to a known-good checkpoint (see the pinning sketch below)
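
On ask 3: at the raw API level a partial mechanism already exists, since dated snapshot IDs can be passed instead of a floating alias; Claude Code itself exposes no equivalent pin as far as I can tell. A minimal sketch, with a hypothetical snapshot ID:

```python
# Sketch: pinning a dated snapshot via the Anthropic SDK instead of an alias.
# The snapshot ID below is hypothetical; the point is the mechanism, not the name.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PINNED = "claude-sonnet-4-6-20260301"  # hypothetical known-good snapshot
response = client.messages.create(
    model=PINNED,  # dated ID, so later server-side swaps don't apply
    max_tokens=1024,
    messages=[{"role": "user", "content": "Read src/main.py before editing it."}],
)
print(response.content[0].text)
```

Even an opt-in pin of this kind in the CLI would let users bisect regressions like the one charted above.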
