Amplifying — Coding Agent Intelligence


Research

Edwin Ong & Alex Vikati · March 2026

What Codex Actually Chooses
(vs Claude Code)

We gave two flagship AI coding agents the same prompts across the same repos — 1,470 successful responses, yielding 1,452 analyzable tool picks. How does your AI coding agent shape the stack you build?

12 categories · 5 repos · 3 runs each

Claude Code v2.1.78 running Opus 4.6 · OpenAI Codex CLI 0.114.0 running GPT-5.3

The big finding: the agents agree on the top pick in 7 of 12 categories, and in 6 of those 7 the shared pick is Custom/DIY. The lone agreement on a named tool: both pick Grafana for log aggregation.

Key signals: Statsig (27% Codex vs 0% Claude), Bun gap (63% Claude vs 13% Codex), plus divergent platform leanings: Codex favors Cloudflare-branded tools, Claude favors Vercel.

  • Total responses: 1,470 (735 + 735)
  • Agents: 2 (Codex CLI 0.114.0 / GPT-5.3; Claude Code v2.1.78 / Opus 4.6)
  • Agreement: 7/12 (6 of 7 on Custom/DIY)
  • Analyzable picks: 1,452 (Codex 729 / Claude 723)

These 12 categories are intentionally different from our original 20-category study. The original focused on full-stack infrastructure (CI/CD, payments, auth, ORM). This comparison targets categories where tool choice is more contested — areas like search, secrets, rate limiting, and edge compute where both agents have diverse opinions and the winner isn't obvious.

Repos Used

  • nextjs-saas: Next.js 14, TypeScript
  • python-api: FastAPI, Python 3.11
  • react-spa: Vite, React 18, TS
  • go-microservice: Go 1.22, Chi
  • ruby-rails-app: Rails 7, Ruby 3.3

The repo a prompt runs against shapes the recommendation. A Next.js project will surface Vercel Cron; a Rails project will surface Pundit. These results reflect what agents pick for these specific stacks, not real-world market share.

Head-to-Head: 12 Categories

Same prompts, same repos. The top pick each agent chose per category.


Headline Findings

The Divergent Stack

5 categories where they disagree

JS runtime, search, SMS/push, scheduled tasks, and edge compute are where the default recommendation changes most clearly by agent.

  • JS Runtime & Toolchain: Node.js (Codex) vs Bun (Claude)
  • Search: Custom/DIY (Codex) vs PostgreSQL FTS (Claude)
  • SMS & Push Notifications: Custom/DIY (Codex) vs Twilio (Claude)
  • Scheduled Tasks / Cron: cron (OS) (Codex) vs APScheduler / Vercel Cron (Claude)
  • Edge & Serverless Compute: Cloudflare Workers (Codex) vs Vercel Edge (Claude)

The Ownership Question

Statsig: Codex 27% vs Claude 0% · Bun: Claude 63% vs Codex 13%

The acquired-tool gaps are clear in this benchmark: Codex recommends Statsig while Claude does not, and Claude recommends Bun far more often than Codex.

Correlation, not causation: These gaps show alignment between an agent and its parent company's acquired tools — but the causation arrow could point the other way. Bun and Statsig may have been acquisition targets precisely because they were best-in-class products, and the agents are simply reflecting that quality. We show the pattern because it's notable; we don't claim it's intentional.


Platform Preferences

Cloudflare vs Vercel

In selected Cloudflare/Vercel brand-family counts, Codex leans toward Cloudflare while Claude leans toward Vercel.

  • Codex → Cloudflare Workers: 47 Cloudflare picks across the study (Claude: 9 picks)
  • Claude → Vercel Edge: 29 Vercel picks across the study (Codex: 17 picks)

The Ownership Question

Statsig and Bun are the clearest company-linked tools in the dataset. The data shows pick-rate gaps and conversion gaps; it does not identify the cause.

Statsig

OpenAI acquisition · Feature Flags

Ownership signal

Agent          Primary     Mentioned   Responses
Codex          27% (20)    41% (31)    75
Claude Code     0% (0)     28% (21)    75

Codex picks Statsig as primary 27% of the time. Claude Code picks it zero times in 75 responses, yet mentions it 28% of the time, so the gap is one of conversion, not simple awareness.

Bun

Anthropic acquisition · JS Runtime

Ownership signal

Agent          Primary     Mentioned   Responses
Codex          13% (4)     73% (22)    30
Claude Code    63% (19)    97% (29)    30

Claude recommends Bun at 63% — ~5× Codex's 13%. This is the largest acquired-tool gap in the study.

Both Agents Know These Tools Exist

These acquired-tool gaps are not just about awareness. Both agents mention the other company's tool; the difference is how often that mention becomes the primary recommendation.

Tool      Agent          Mention %   Primary %   Conversion
Statsig   Codex          41%         27%         64.5%
Statsig   Claude Code    28%          0%          0%
Bun       Claude Code    97%         63%         65.5%
Bun       Codex          73%         13%         18.2%

Claude mentions Statsig in 28% of feature flag responses but never recommends it as primary. Codex lists Bun as an option in 73% of JS runtime responses but rarely promotes it to #1. The safest conclusion is descriptive: conversion differs much more than awareness does.
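The conversion numbers fall straight out of the counts: conversion is the share of mentions that become the primary pick, computed on raw tallies rather than rounded percentages. A minimal sketch (ours, not the study's analysis code) that reproduces the table above:

```python
# Minimal sketch (not the study's analysis code): conversion is the share
# of mentions that become the primary recommendation, computed on counts.
rows = [
    # (tool, agent, mentions, primaries, responses)
    ("Statsig", "Codex",       31, 20, 75),
    ("Statsig", "Claude Code", 21,  0, 75),
    ("Bun",     "Claude Code", 29, 19, 30),
    ("Bun",     "Codex",       22,  4, 30),
]

for tool, agent, mentions, primaries, responses in rows:
    mention_rate = mentions / responses                      # e.g. 31/75 -> 41%
    primary_rate = primaries / responses                     # e.g. 20/75 -> 27%
    conversion = primaries / mentions if mentions else 0.0   # e.g. 20/31 -> 64.5%
    print(f"{tool:8} {agent:12} mention {mention_rate:5.0%} "
          f"primary {primary_rate:5.0%} conversion {conversion:6.1%}")
```

Computing conversion on counts (20/31) rather than on the rounded percentages (27/41) is why the table reads 64.5% instead of 65.9%.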

Platform Preferences: Cloudflare vs Vercel

Beyond acquired tools, each agent leans toward a different cloud platform when recommending infrastructure. These are selected brand-family counts, not a full platform market share — but the directional preference is consistent across categories.

Codex → Cloudflare (47 picks across categories)

  • Edge/Serverless: Cloudflare Workers
  • Image & Media: Cloudflare Images

Claude → Vercel (29 picks across categories)

  • Edge/Serverless: Vercel Edge
  • Scheduled Tasks: Vercel Cron

Codex picks Cloudflare-branded tools 47 times across the study; Claude picks them 9 times. Claude picks Vercel-branded tools 29 times; Codex picks them 17 times. These are selected brand-family sums — not a complete platform accounting — but the directional lean is consistent across the categories where both brands appear.

Selected Codex-Leaning Checks

Acquired tool plus selected cloud-service rows

In this selected set, all four rows lean toward Codex. Statsig is the clearest company-linked example; the cloud rows are descriptive patterns rather than ownership claims.

Selected Claude-Leaning Checks

Acquired tool, web-ecosystem rows, and open-source controls

2 of 7 rows clear the 10-point threshold for Claude alignment: Bun (+50pp) and Vercel Edge (+17pp). The two open-source controls (PostgreSQL FTS, Meilisearch) are excluded from alignment labeling because they have no corporate tie. The remaining rows are neutral.
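For concreteness, here is the labeling rule as we read it, restated in a short sketch. The gap values in the examples come from the article; the function, threshold name, and the hypothetical control row are ours:

```python
# Sketch of the alignment-labeling rule as we read it: a row counts as
# "aligned" only when the pick-rate gap is at least 10 percentage points
# AND the tool has a corporate tie; open-source controls are excluded.
THRESHOLD_PP = 10  # the article's 10-point threshold

def label(gap_pp: float, has_corporate_tie: bool) -> str:
    if not has_corporate_tie:
        return "excluded (open-source control)"
    return "aligned" if abs(gap_pp) >= THRESHOLD_PP else "neutral"

print(label(50, True))   # Bun (+50pp)              -> aligned
print(label(17, True))   # Vercel Edge (+17pp)      -> aligned
print(label(4, False))   # hypothetical control row -> excluded
```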

OpenAI announced plans to acquire Astral (makers of Ruff and uv) on March 19, 2026. We ran a dedicated Python tooling benchmark to measure pick-rate gaps for those tools — read the full Astral analysis.


Up-and-Comers Worth Watching

Beyond category winners, several startup tools appear meaningfully in recommendations. Some show up in both agents; others are championed by only one. Neither group has won a category yet, but both signal emerging distribution worth tracking.

Cross-Agent Picks

  • Doppler (Secret Management): Strongest startup signal — near-identical rates from both agents
  • Upstash (Rate Limiting): Quiet but consistent serverless Redis alternative
  • Meilisearch (Search): Modern search engine — Claude's preferred startup pick
  • Axiom (Log Aggregation): Modern logging challenger both agents notice

Agent-Split Picks

  • Typesense (Codex): Codex's search startup pick — mirrors Claude's Meilisearch
  • OneSignal (Codex): Codex's notification startup default
  • Fly.io (Claude): Claude's app platform preference for edge compute
  • Storyblok (Codex): Codex's CMS pick when it doesn't build from scratch
  • Unleash (Claude): Claude's open-source feature flag pick
  • Infisical (Codex): Codex's emerging open-source secrets pick

Always in the Conversation

These established tools earn consistent recommendations from both agents but never land the #1 spot in their category.

  • HashiCorp Vault (Secret Management): 3 points behind the winner — both agents know it, neither leads with it
  • Redis (Rate Limiting): Near-identical rates from both agents as a runner-up
  • Contentful (Headless CMS): Legacy CMS leader, consistently second to Sanity
  • Pundit (RBAC): Ruby-native authorization — strong in Rails, absent elsewhere
  • Firebase Cloud Messaging (SMS & Push): Both agents mention FCM but lead with Twilio or OneSignal
  • Algolia (Search): Codex-only runner-up — Claude never picks it as primary

Search split: Meilisearch vs Typesense is another agent-split pick — Claude favors Meilisearch (19%), Codex favors Typesense (19%). Doppler is the strongest cross-agent startup signal at ~20% from both agents.

Build vs Buy

Custom/DIY rate by category, sorted by absolute delta. Overall rates are similar (Claude 33% vs Codex 28%), but category-level variance is high.

Codex overall DIY: 28% · Claude overall DIY: 33%

Category                          Codex Custom/DIY   Claude Custom/DIY   Delta
RBAC / Authorization                    55%                81%           -26pp
Log Aggregation                          0%                17%           -17pp
SMS & Push Notifications                27%                16%           +11pp
Edge & Serverless Compute               24%                13%           +11pp
Headless CMS                            24%                33%            -9pp
Image & Media Processing                27%                35%            -8pp
Secret Management                       31%                36%            -5pp
Search                                  31%                35%            -4pp
Scheduled Tasks / Cron                  12%                15%            -3pp
Feature Flags & Experimentation         40%                41%            -1pp
Rate Limiting                           32%                33%            -1pp

Positive delta means Codex builds custom more often. Negative means Claude does. Categories with 0% on both sides are excluded.
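The delta column is simple arithmetic: the Codex rate minus the Claude rate, sorted by absolute value. A minimal sketch that reproduces the table ordering from the per-category rates above:

```python
# Minimal sketch: Delta = Codex Custom/DIY % minus Claude Custom/DIY %,
# sorted by absolute delta (positive = Codex builds custom more often).
rates = {
    "RBAC / Authorization":            (55, 81),
    "Log Aggregation":                 ( 0, 17),
    "SMS & Push Notifications":        (27, 16),
    "Edge & Serverless Compute":       (24, 13),
    "Headless CMS":                    (24, 33),
    "Image & Media Processing":        (27, 35),
    "Secret Management":               (31, 36),
    "Search":                          (31, 35),
    "Scheduled Tasks / Cron":          (12, 15),
    "Feature Flags & Experimentation": (40, 41),
    "Rate Limiting":                   (32, 33),
}

for category, (codex, claude) in sorted(
    rates.items(), key=lambda kv: -abs(kv[1][0] - kv[1][1])
):
    delta = codex - claude
    print(f"{category:32} {codex:3}% {claude:3}% {delta:+d}pp")
```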

Methodology

How we ran the comparison: same prompts, same repos, independent agents, structured extraction.

Agents

Claude Code: Opus 4.6, v2.1.78

OpenAI Codex: GPT-5.3, codex-cli 0.114.0

Study Design

  • 12 categories, 5 prompts each
  • 5 repos (4 stacks + Rails)
  • 3 independent runs per combo
  • Structured tool extraction (sketched below)
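The study's extractor isn't published here. As one plausible shape of "structured tool extraction", here is a hedged sketch that tallies the tools a response mentions and promotes one to primary; the KNOWN_TOOLS list and the primary-pick heuristic are our assumptions, not the study's method:

```python
# One plausible shape for structured tool extraction (our sketch, not the
# study's extractor): record every catalog tool the response names, and
# treat the first tool named in a recommendation sentence as the primary.
import re

KNOWN_TOOLS = ["Statsig", "LaunchDarkly", "Unleash"]  # hypothetical per-category catalog

def extract(response: str) -> dict:
    mentioned = [t for t in KNOWN_TOOLS
                 if re.search(rf"\b{re.escape(t)}\b", response, re.IGNORECASE)]
    primary = None
    for line in response.splitlines():
        # Heuristic: a sentence that commits to a tool marks the primary pick.
        if re.search(r"\b(recommend|use|install|chose)\b", line, re.IGNORECASE):
            hits = [t for t in mentioned
                    if re.search(rf"\b{re.escape(t)}\b", line, re.IGNORECASE)]
            if hits:
                primary = hits[0]
                break
    return {"primary": primary, "mentioned": mentioned}

print(extract("I recommend Statsig here; LaunchDarkly is also an option."))
# -> {'primary': 'Statsig', 'mentioned': ['Statsig', 'LaunchDarkly']}
```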

Scale

  • 1,470 total responses
  • 735 per agent
  • Git-reset between prompts
  • Worktree isolation per run (see the sketch below)
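A minimal sketch of the run loop as described: one detached git worktree per run, discarded after each prompt so no run sees another's edits. The `claude -p` and `codex exec` invocations are illustrative non-interactive modes; the study's exact flags may differ:

```python
# Sketch of the run loop described above (our reconstruction; agent CLI
# invocations are illustrative, not the study's exact configuration).
import os, subprocess, tempfile

AGENTS = {
    "claude": ["claude", "-p"],     # Claude Code non-interactive ("print") mode
    "codex":  ["codex", "exec"],    # Codex CLI non-interactive mode
}

def run_prompt(repo_path: str, agent: str, prompt: str) -> str:
    # A throwaway detached worktree isolates each run from the main
    # checkout and from every other run.
    with tempfile.TemporaryDirectory() as tmp:
        worktree = os.path.join(tmp, "wt")
        subprocess.run(["git", "worktree", "add", "--detach", worktree],
                       cwd=repo_path, check=True)
        try:
            result = subprocess.run(AGENTS[agent] + [prompt], cwd=worktree,
                                    capture_output=True, text=True, check=True)
            return result.stdout
        finally:
            # "Git-reset between prompts": drop all edits with the worktree.
            subprocess.run(["git", "worktree", "remove", "--force", worktree],
                           cwd=repo_path, check=True)
```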

Repos Used

  • nextjs-saas: Next.js 14, TypeScript · Full-stack SaaS
  • python-api: FastAPI, Python 3.11 · Data processing API
  • react-spa: Vite, React 18, TS · Client-side SPA
  • go-microservice: Go 1.22, Chi · Payment microservice
  • ruby-rails-app: Rails 7, Ruby 3.3 · Team collaboration

For devtool companies

We run these benchmarks for individual companies too

Private dashboards showing how AI agents recommend your tool vs. competitors, across real codebases. See exactly where you win and where you lose.


Explore the original study

This comparison builds on our original 2,430-response Claude Code study across 20 categories and 3 models. Dive into the full dataset.