[MODEL] Quantified evidence: Sonnet 4.6 quality regression since March 9 — 1400+ frustration events across 50 sessions


Summary

I have quantified data showing a dramatic quality regression starting the week of March 9, 2026. This is not a vibe check — it's measured from 60 days of conversation logs across 50 sessions.

Data

I track how often I need to repeat instructions or correct Claude (my "WTF frequency"). Here's the weekly distribution:

60-day data across 50 sessions:

W09 (Feb 24)    26  ██                                ← baseline
W10 (Mar 02)    23  ██                                ← baseline
W11 (Mar 09)   221  ██████████████████████            ← regression begins (8.5x baseline)
W12 (Mar 16)   484  ████████████████████████████████  ← peak (outage week)
W13 (Mar 23)   479  ████████████████████████████████  ← still bad
W14 (Mar 30)    65  ██████                            ← brief recovery
W15 (Apr 06)   294  █████████████████████████████     ← regression resumes
  • Baseline (W09–W10): ~25/week
  • Peak (W12–W13): ~480/week — 19x baseline
  • Total across 50 sessions: 1,400+ events
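
For reproducibility, here is a minimal sketch of how such a count can be produced. Everything below is illustrative: the `logs/session-*.jsonl` layout, the message schema, and the frustration-phrase list are assumptions, not my actual tooling.

```python
# Sketch: count "WTF events" (repeated instructions / corrections) per ISO week.
# Assumes one JSONL file per session with lines like
# {"ts": "2026-03-16T10:00:00", "role": "user", "text": "..."}.
import json
import re
from collections import Counter
from datetime import datetime
from pathlib import Path

# Phrases that signal repeating myself or correcting the model (assumed list).
FRUSTRATION = re.compile(
    r"\b(as i said|i already told you|no,? that'?s wrong|read the file|again:)\b",
    re.IGNORECASE,
)

def weekly_counts(log_dir: str) -> Counter:
    counts: Counter = Counter()
    for path in Path(log_dir).glob("session-*.jsonl"):
        for line in path.read_text().splitlines():
            msg = json.loads(line)
            if msg.get("role") != "user":
                continue
            if FRUSTRATION.search(msg.get("text", "")):
                week = datetime.fromisoformat(msg["ts"]).strftime("%G-W%V")
                counts[week] += 1
    return counts

if __name__ == "__main__":
    for week, n in sorted(weekly_counts("logs").items()):
        print(f"{week}  {n:4d}  {'█' * min(n // 10, 32)}")
```

The `n // 10` scale matches the chart above: one block per ~10 events, capped at a width of 32.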

Affected models

I've been forced to switch from Sonnet to Opus as my primary model. Sonnet 4.6 is basically unusable now. My subjective rating of current model quality:

  • Opus 4.6 now = Sonnet 4.6 before
  • Sonnet 4.6 now = Haiku before
  • Haiku = Haiku (unchanged — nothing left to degrade)

This means I'm paying Opus prices for what used to be Sonnet-level performance.

What "regression" looks like in practice

The model consistently fails to:

  1. Follow its own reasoning loop (OODA) despite explicit CLAUDE.md instructions
  2. Read files before modifying them; it guesses at file contents instead (see the hook sketch after this list)
  3. Stop repeating the same mistake — same error 5–8 times per session without self-correction
  4. Follow explicit behavioral constraints across sessions ([MODEL] Opus 4.6: Systematic Failure to Follow Explicit Behavioral Constraints Across Independent Sessions #41217)
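
Failure 2 can be partially guarded against client-side with the hook support already in use here (see Environment). Below is a sketch of a single hook script that, registered for both Read completions and Edit/Write attempts, blocks modifications to files the session never read. The stdin field names and the exit-code-2 blocking convention reflect my understanding of the Claude Code hooks interface and should be verified; the state-file location is arbitrary.

```python
#!/usr/bin/env python3
# Sketch: Claude Code hook that blocks Edit/Write on files the session never Read.
# Assumed registration: PostToolUse with matcher "Read" and PreToolUse with
# matcher "Edit|Write" in .claude/settings.json. For PreToolUse, exit code 2
# blocks the tool call and returns stderr to the model.
import json
import sys
from pathlib import Path

event = json.load(sys.stdin)
state = Path(f"/tmp/claude-read-{event.get('session_id', 'default')}.txt")
file_path = (event.get("tool_input") or {}).get("file_path", "")

if event.get("hook_event_name") == "PostToolUse":
    if file_path:  # a Read completed: remember the file for this session
        with state.open("a") as f:
            f.write(file_path + "\n")
    sys.exit(0)

# PreToolUse on Edit/Write: refuse if the file was never read this session.
seen = set(state.read_text().splitlines()) if state.exists() else set()
if file_path and file_path not in seen:
    print(f"Blocked: read {file_path} before modifying it.", file=sys.stderr)
    sys.exit(2)
sys.exit(0)
```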

Timeline alignment

The W12 peak (week of March 16) aligns exactly with the outage week flagged in the chart above.

Environment

  • Claude Code CLI v2.1.94
  • Linux
  • bypassPermissions mode
  • Heavy skill/hook usage with structured CLAUDE.md rules

What I'm asking

  1. Acknowledge that model quality has regressed — the data is clear
  2. Explain whether this is a compute constraint issue (as widely suspected) or a checkpoint/RLHF regression
  3. Provide a model versioning mechanism so users can pin to a known-good checkpoint (see the pinning sketch below)
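
On ask 3: at the raw API level a partial mechanism already exists, since dated snapshot IDs can be passed instead of a floating alias; Claude Code itself exposes no equivalent pin as far as I can tell. A minimal sketch, with a hypothetical snapshot ID:

```python
# Sketch: pinning a dated snapshot via the Anthropic SDK instead of an alias.
# The snapshot ID below is hypothetical; the point is the mechanism, not the name.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PINNED = "claude-sonnet-4-6-20260301"  # hypothetical known-good snapshot
response = client.messages.create(
    model=PINNED,  # dated ID, so later server-side swaps don't apply
    max_tokens=1024,
    messages=[{"role": "user", "content": "Read src/main.py before editing it."}],
)
print(response.content[0].text)
```

Even an opt-in pin of this kind in the CLI would let users bisect regressions like the one charted above.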
