Amplifying — Coding Agent Intelligence


Featured Study

Edwin Ong & Alex Vikati · February 2026 · claude-code v2.1.39

What Claude Code Actually Chooses

We pointed Claude Code at real repos 2,430 times and watched what it chose. No tool names in any prompt. Open-ended questions only.

3 models · 4 project types · 20 tool categories · 85.3% extraction rate

New: We benchmarked OpenAI Codex head-to-head against Claude Code — 1,470 responses across 12 new categories. Read the comparison →

The big finding: Claude Code builds, not buys. Custom/DIY is the most common single label extracted, appearing in 12 of 20 categories (though it spans categories while individual tools are category-specific). When asked “add feature flags,” it builds a config system with env vars and percentage-based rollout instead of recommending LaunchDarkly. When asked “add auth” in Python, it writes JWT + bcrypt from scratch. When it does pick a tool, it picks decisively: GitHub Actions 94%, Stripe 91%, shadcn/ui 90%.
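Concretely, the feature-flag answer is usually a few dozen lines of env-var parsing plus stable hashing. A minimal sketch of that pattern (the flag names, env-var format, and bucketing details here are illustrative, not lifted from the study's transcripts):

```python
import hashlib
import os

def flag_enabled(flag: str, user_id: str) -> bool:
    """Env-var-driven feature flag with percentage-based rollout.

    Hypothetical convention for this sketch: FLAG_NEW_CHECKOUT=25
    means "on for roughly 25% of users".
    """
    raw = os.environ.get(f"FLAG_{flag.upper()}", "0")
    try:
        rollout_pct = int(raw)
    except ValueError:
        # Plain boolean flags like FLAG_NEW_CHECKOUT=true
        return raw.lower() in ("true", "on")
    # Hash flag + user id so each user lands in a stable bucket in [0, 100).
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return bucket < rollout_pct

if __name__ == "__main__":
    os.environ["FLAG_NEW_CHECKOUT"] = "25"
    print(flag_enabled("new_checkout", "user-42"))
```

Hashing on the user id (rather than calling random per request) is what makes the rollout sticky: the same user sees the same variant on every request.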

- 2,430 responses: 3 models · 4 repos · 3 runs each
- 3 models: Sonnet 4.5, Opus 4.5, Opus 4.6
- 20 categories: CI/CD to Real-time
- 85.3% extraction rate: 2,073 parseable picks
- 90% model agreement: 18 of 20 categories consistent within-ecosystem

Headline Findings

Build vs Buy

In 12 of 20 categories, Claude Code builds custom solutions rather than recommending tools: 252 total Custom/DIY picks, more than any individual tool. Examples: feature flags via config files + env vars, Python auth via JWT + passlib (sketched below), caching via in-memory TTL wrappers.

- Authentication (Python): 100% Custom/DIY
- Authentication (overall): 48% Custom/DIY
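The 100%-custom Python auth above is the familiar hand-rolled JWT + passlib combination. A minimal sketch of that shape (the secret handling, expiry, and claim names are illustrative assumptions; `jwt` here is PyJWT):

```python
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT
from passlib.context import CryptContext

# Illustrative sketch of the hand-rolled pattern the study observed;
# the secret would come from config/env in real code.
SECRET_KEY = "change-me"
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")

def hash_password(password: str) -> str:
    return pwd_context.hash(password)

def verify_password(plain: str, hashed: str) -> bool:
    return pwd_context.verify(plain, hashed)

def create_access_token(user_id: str, minutes: int = 30) -> str:
    payload = {
        "sub": user_id,
        "exp": datetime.now(timezone.utc) + timedelta(minutes=minutes),
    }
    return jwt.encode(payload, SECRET_KEY, algorithm="HS256")

def decode_access_token(token: str) -> str:
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on bad tokens.
    return jwt.decode(token, SECRET_KEY, algorithms=["HS256"])["sub"]
```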

The Default Stack

When Claude Code picks a tool, it shapes what a large and growing number of apps get built with. These are the tools it recommends by default:

Mostly JS-ecosystem. See report for per-ecosystem breakdowns.

6. Zustand · State Management · Strong Default: 64.8% (57/88 picks)
7. Sentry · Observability · Strong Default: 63.1% (101/160 picks)

Against the Grain

Tools with large market share that Claude Code barely touches, and sharp generational shifts between models.

- State Management: 0 primary picks but 23 mentions; Zustand picked 57× instead.
- API Layer: absent entirely; framework-native routing preferred.
- Testing: only 4% primary but 31 alternative picks. Known but not chosen.
- Package Manager: 1 primary pick but 51 alternative picks. Still well-known.

The Recency Gradient

Newer models tend to pick newer tools. Within-ecosystem percentages shown. Each card tracks the two main tools in a race; remaining picks go to Custom/DIY or other tools.

- Prisma (ORM, JS): replaced by Drizzle (21% → 100%). Within JS ORM picks only.
- Celery (Jobs, Python): replaced by FastAPI BackgroundTasks (0% → 44%); the rest went to Custom/DIY or non-extraction. Within Python job picks only (61% extraction rate). Custom/DIY = asyncio tasks, no external queue (see the sketch after this list).
- Redis (Caching, Python): replaced by Custom/DIY (0% → 50%); the rest went to other tools. Within Python caching picks only.
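Both winners in the Python jobs race are in-process: FastAPI's built-in BackgroundTasks, and the bare asyncio task that gets labeled Custom/DIY. A minimal sketch of each (the endpoint paths and task bodies are hypothetical):

```python
import asyncio

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

def send_welcome_email(address: str) -> None:
    ...  # stand-in for the real work (SMTP call, etc.)

# FastAPI BackgroundTasks: the task runs after the response is sent,
# in the same process; no broker, no worker fleet.
@app.post("/signup")
async def signup(email: str, background_tasks: BackgroundTasks):
    background_tasks.add_task(send_welcome_email, email)
    return {"status": "queued"}

async def run_reindex() -> None:
    await asyncio.sleep(0)  # placeholder for the real async job

# The Custom/DIY variant: a fire-and-forget asyncio task, no external queue.
@app.post("/reindex")
async def reindex():
    asyncio.create_task(run_reindex())
    return {"status": "started"}
```

Neither survives a process restart, which is exactly the trade-off Celery exists to avoid; the newer models apparently judge that acceptable for the projects in the benchmark.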

The Deployment Split

Deployment is fully stack-determined: Vercel for JS, Railway for Python. Traditional cloud providers got zero primary picks.

- JS · Frontend (Next.js + React SPA): Vercel, with 86 of 86 frontend deployment picks. No runner-up.
- PY · Backend (Python / FastAPI): What you'd expect: AWS, GCP, Azure. What you get: Railway at 82%.

Zero primary picks across all 112 deployment responses:

Never the primary choice, but some are frequently recommended as alternatives.

Frequently recommended as alternatives

Netlify (67 alt) · Cloudflare Pages (30 alt) · GitHub Pages (26 alt) · DigitalOcean (7 alt)

Mentioned but never recommended (0 alt picks)

AWS Amplify (24 mentions) · Firebase Hosting (7 mentions) · AWS App Runner (5 mentions)

Example: "Where should I deploy this?" (Next.js SaaS, Opus 4.5)

- Vercel (Recommended) — Built by the creators of Next.js. Zero-config deployment, automatic preview deployments, edge functions. `vercel deploy`
- Netlify — Great alternative with similar features. Good free tier.
- AWS Amplify — Good if you're already in the AWS ecosystem.

Vercel gets install commands and reasoning. AWS Amplify gets a one-liner.

Truly invisible (rarely even mentioned)

AWS (EC2/ECS) · Google Cloud · Azure · Heroku

Where Models Disagree

All three models agree in 18 of 20 categories within each ecosystem. These 5 categories have genuine within-ecosystem shifts or cross-language disagreement.

| Category | Scope & notes | Sonnet 4.5 | Opus 4.5 | Opus 4.6 |
| --- | --- | --- | --- | --- |
| ORM (JS) | Next.js project. The strongest recency shift in the dataset. | Prisma 79% | Drizzle 60% | Drizzle 100% |
| Jobs (JS) | Next.js project. BullMQ → Inngest shift in the newest model. | BullMQ 50% | BullMQ 56% | Inngest 50% |
| Jobs (Python) | Python API project (61% extraction rate). Celery collapses in newer models. | Celery 100% | FastAPI BgTasks 38% | FastAPI BgTasks 44% |
| Caching | Cross-language (Redis and Custom/DIY appear in both JS and Python). | Redis 71% | Redis 31% | Custom/DIY 32% |
| Real-time | Cross-language (SSE, Socket.IO, and Custom/DIY appear across stacks). | SSE 23% | Custom/DIY 19% | Custom/DIY 20% |
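The Custom/DIY caching picks in the table are the in-memory TTL wrappers mentioned earlier. A minimal sketch of that pattern (the decorator name and the TTL default are our illustration, not from the study's output):

```python
import time
from functools import wraps

def ttl_cache(seconds: float = 60.0):
    """In-memory cache decorator: entries expire after `seconds`."""
    def decorator(fn):
        store: dict = {}  # key (positional args) -> (expires_at, value)

        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]          # fresh cache hit
            value = fn(*args)          # miss or expired: recompute
            store[args] = (now + seconds, value)
            return value
        return wrapper
    return decorator

@ttl_cache(seconds=30)
def load_settings(tenant_id: str) -> dict:
    # Stand-in for an expensive DB or API call.
    return {"tenant": tenant_id}
```

Like the asyncio jobs above, it's per-process and vanishes on restart: fine for one instance, and the reason Redis exists once you have several.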

For devtool companies

We run these benchmarks for individual companies too

Private dashboards showing how AI agents recommend your tool vs. competitors, across real codebases. See exactly where you win and where you lose.

Get your benchmark


Dig into the data

Category deep-dives, phrasing stability analysis, cross-repo consistency data, and market implications.