Opus 4.7 is not generally a worse model than Opus 4.6, but there is a real downgrade: with Opus 4.7, the control over the thinking budget is now fully owned by Anthropic.
This change matters in a way that benchmarks do not measure.
With Opus 4.6, if a task was hard, you could tell Claude to spend many thinking tokens. You could decide that a refactor, architecture review, debugging session, or migration deserved maximum reasoning, and you could set the budget. If the result was still bad, you knew the model did the best it could, using the highest level of reasoning available.
With Opus 4.7, that control is gone. You can still choose an effort level, even max. But max is no longer a user-defined thinking budget. It is part of an adaptive system where Claude decides how much thinking to use.
That is the downgrade: no more deterministic access to the model's best reasoning. If Anthropic does not have the compute capacity to serve your request at the highest reasoning level, it is now free to serve that request with less reasoning, producing a lower-quality output, without telling you.
Side note: If you're interested in this topic, read my paper on arxiv.org.
Opus 4.6 let the user force the issue
In Opus 4.6, fixed thinking budgets still work, even though Anthropic has deprecated them. You can set manual thinking with a numeric token budget:
{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 32000
  }
}
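For reference, here is a minimal sketch of that request against the Messages API, using the real thinking parameter shape shown above. The model id is a placeholder for whatever Opus 4.6 id you actually use, and the prompt is illustrative:

import os
import requests

# Sketch of a fixed-budget thinking request (Opus 4.6 style).
# "claude-opus-4-6" is a placeholder model id, not a confirmed identifier.
resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-opus-4-6",  # placeholder id
        "max_tokens": 40000,  # must exceed the thinking budget
        "thinking": {"type": "enabled", "budget_tokens": 32000},
        "messages": [{"role": "user", "content": "Plan the database migration."}],
    },
)
resp.raise_for_status()
print(resp.json()["usage"])

The point of the snippet is the one line that Opus 4.7 removes: budget_tokens, a number the user sets and the model must honor.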
That matters because it gives the user direct control. If you want the model to think more, you allocate more thinking tokens. If you want to compare two runs, you can hold the thinking budget constant. If you want maximum quality on a hard task, you can spend the budget.
Opus 4.6 did not always give perfect results, and a high thinking budget did not solve every problem. But it gave the user direct control over a critical variable: how much reasoning the model could spend before answering.
For Claude Code, this matters. Coding-agent work is often limited by reasoning depth. Planning a risky database migration, untangling a bad abstraction, reviewing a large diff, or debugging a subtle production issue are tasks where you want to force the model to think harder, not just hope it decides the task deserves it.
Opus 4.7 turns maximum thinking into a request
In Opus 4.7, Anthropic removed fixed thinking budgets. The Opus 4.7 migration docs say that thinking: {"type": "enabled", "budget_tokens": N} now returns a 400 error. The only supported mode is adaptive thinking:
{
  "thinking": {
    "type": "adaptive"
  },
  "output_config": {
    "effort": "high"
  }
}
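As a sketch, the same request migrated to the adaptive shape the migration docs describe. The thinking and output_config fields are taken from the docs quoted above; the model id is again a placeholder:

import os
import requests

# Sketch of an adaptive-thinking request (Opus 4.7 style), using the request
# shape from the migration docs. "claude-opus-4-7" is a placeholder model id.
resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-opus-4-7",  # placeholder id
        "max_tokens": 64000,
        "thinking": {"type": "adaptive"},     # sending budget_tokens now returns a 400
        "output_config": {"effort": "high"},  # guides thinking, does not fix it
        "messages": [{"role": "user", "content": "Plan the database migration."}],
    },
)
resp.raise_for_status()
print(resp.json()["usage"])

Note what is missing: there is no field anywhere in this request that pins the amount of thinking to a number.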
Anthropic's adaptive thinking docs explain the change: instead of manually setting a thinking token budget, adaptive thinking lets Claude decide when and how much extended thinking to use based on request complexity. Opus 4.7 only supports adaptive thinking; manual budget_tokens are no longer accepted.
That is the key difference. Opus 4.7 does not give you the old "think this much" control. It gives you an effort setting that guides the adaptive system.
The Claude Code model docs confirm this from the CLI side. Effort levels control adaptive reasoning. Opus 4.7 supports low, medium, high, xhigh, and max, with xhigh as the default. But the docs also say adaptive reasoning lets the model decide whether and how much to think on each step, and fixed thinking-budget mode does not apply to Opus 4.7.
So even when you choose max, you are not setting a fixed number of thinking tokens. You are selecting the deepest available adaptive effort level. It may often think deeply, but it is not the same control Opus 4.6 exposed.
In Opus 4.6, the user could force the model into a fixed high-thinking mode. In Opus 4.7, the user can only request maximum effort within Anthropic's adaptive reasoning system.
Why this can make Opus 4.7 worse in practice
This is why the downgrade question is not straightforward. Opus 4.7 can be a better model and still produce worse results in a specific comparison.
Imagine giving the same hard coding task to Opus 4.6 and Opus 4.7. With Opus 4.6, you set a very high fixed thinking budget. With Opus 4.7, you set effort to max. Both seem to be asked to do their best, but in practice they are different controls.
Opus 4.6 gets a fixed reasoning budget. Opus 4.7 is asked to use the maximum adaptive effort level, but the model still decides how much thinking the task deserves.
Opus 4.7 can underperform in a real workflow even if it is more capable overall. If the adaptive system decides a step does not need as much reasoning as you would have forced with Opus 4.6, the result can be worse. If the model misjudges the task, the user has no way to override that with a numeric thinking budget. If the same prompt behaves differently across runs because adaptive reasoning allocates effort differently, quality becomes less predictable.
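One way to see the predictability gap is to replay the same prompt several times and measure how much thinking each run actually produced. A rough sketch, assuming adaptive thinking still returns thinking blocks in the response content the way extended thinking does today, and reusing the placeholder model id from above:

import os
import requests

API_URL = "https://api.anthropic.com/v1/messages"
HEADERS = {
    "x-api-key": os.environ["ANTHROPIC_API_KEY"],
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
}

def thinking_chars(body: dict) -> int:
    """Send one request and count characters across returned thinking blocks."""
    resp = requests.post(API_URL, headers=HEADERS, json=body)
    resp.raise_for_status()
    return sum(
        len(block.get("thinking", ""))
        for block in resp.json()["content"]
        if block["type"] == "thinking"
    )

body = {
    "model": "claude-opus-4-7",  # placeholder id
    "max_tokens": 64000,
    "thinking": {"type": "adaptive"},
    "output_config": {"effort": "max"},
    "messages": [{"role": "user", "content": "Review this diff for subtle bugs."}],
}

# Same prompt, same settings: under adaptive thinking these numbers can differ
# from run to run, which is exactly the repeatability problem described above.
print([thinking_chars(body) for _ in range(5)])

With Opus 4.6, holding budget_tokens constant made this variable a constant. With Opus 4.7, it is an output of Anthropic's allocator, not an input from you.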
This point gets lost when people reduce the debate to benchmarks. The issue is not just model capability. It is whether the user can reliably extract maximum capability on demand.
With Opus 4.6, the answer was yes: set a high budget and pay for it. With Opus 4.7, the answer is weaker: set a high effort level and hope the adaptive system spends enough reasoning.
That is a real regression for power users.
Capacity constraints
This change comes at a time when Anthropic faces capacity constraints. That matters because reasoning tokens are expensive. When a provider removes direct user control over expensive reasoning and replaces it with provider-mediated adaptive allocation, users will reasonably ask whose interests the allocator serves.
Anthropic has acknowledged the infrastructure pressure. In its April 2026 Amazon compute announcement, it said enterprise and developer demand accelerated in 2026, consumer usage rose sharply across Free, Pro, and Max, and that growth strained infrastructure. It also said consumer growth affected reliability and performance for Free, Pro, Max, and Team users during peak hours.
The Claude status page has shown repeated recent incidents across claude.ai, the API, and Claude Code. On May 8, 2026, 90-day uptime was below what many developers expect from a core production dependency: 98.7% for claude.ai, 99.03% for the API, and 99.2% for Claude Code.
Anthropic also changed usage limits. It ran a March 2026 off-peak usage promotion, then on May 6 announced higher Claude Code limits and a SpaceX compute deal, including removal of the peak-hours limit reduction on Claude Code for Pro and Max users.
None of this proves Anthropic is secretly throttling reasoning quality. Anthropic stated in a September 2025 postmortem that it does not reduce model quality because of demand, time of day, or server load.
But the incentive problem is clear. Anthropic is capacity constrained. Thinking is expensive. Adaptive thinking gives Anthropic more control over how much thinking happens. This does not require a conspiracy theory. It is what the product now does.
If a user sets max in Opus 4.7, they are not saying, "spend exactly this much reasoning." They are saying, "please use your maximum adaptive effort." The final amount of thinking remains within Anthropic's system.
That is a materially weaker guarantee.
Why max is not enough
Some will say Opus 4.7 still has max effort, so there is no problem. That misses the point.
max is not a fixed token budget. It is not the old MAX_THINKING_TOKENS behavior. It does not let the user say, "use 32,000 thinking tokens for this task" or "use 64,000 thinking tokens for this task." It selects the deepest adaptive mode the model has.
That can be useful and is often the right setting. But it still leaves the actual reasoning allocation to Claude and to Anthropic's adaptive implementation.
For many users, that distinction will not matter. If they ask normal questions and get good answers, adaptive thinking is probably better. It removes configuration, saves tokens, and lets the model decide how much work to do.
For serious Claude Code users, it matters greatly. They need more than a good average answer. They need repeatability. They need to know whether a bad result came from the model, the prompt, the context, or insufficient reasoning. They need a way to force maximum effort on tasks where failure is costly.
Opus 4.7 gives them a slider. Opus 4.6 gave them a budget.
Those are not the same.
So is Opus 4.7 a downgrade?
Yes, in this specific way: Opus 4.7 downgrades user control over maximum reasoning.
It may improve the model's average capability. It may improve long-horizon agentic tasks. It may reduce wasted thinking on easy prompts. It may be the better default for many users. But for workflows where the user wants to force the model to spend as much reasoning as possible, Opus 4.7 is worse than Opus 4.6.
The problem is not that Opus 4.7 cannot think deeply. The problem is that the user can no longer make thinking depth deterministic in the same way.
That changes the quality ceiling in practice, not because the model is necessarily weaker, but because access to its best reasoning is now mediated by an adaptive system the user cannot fully control.
Delegate tasks. Get software.
Give Vroni a GitHub issue, bug report, spec, or rough idea. It reads the repo, plans the change, writes code, runs checks, and works toward a review-ready pull request.
Take a look at vroni.com.