Head to head: Claude Code (Opus 4.6 / 1M) vs Cursor (Composer 1.5 / 200k)

The hype around AI coding agents is loud, but for a developer, only three things matter: Who solves the problem fastest? Who writes the cleanest code? And who panics the least when things go sideways?

I put Claude Code and Cursor Composer head-to-head on a non-trivial task: taking Headfull components in the Jay Framework (a new web framework built to solve developer / designer / AI designer cooperation) from client-only to a full-stack architecture. This wasn’t a boilerplate exercise; it was a test of how these tools handle tight constraints, existing technical debt, and the ‘unexpected stuff’ that usually breaks an LLM’s context.

To keep the comparison fair, I used the Design Log Methodology. Instead of just ‘chatting’ with the AI, I followed a strict process: I designed the solution first, had both agents review and approve the design, and then required them to log every deviation as they coded. This methodology acted as the ‘ground truth,’ allowing me to see exactly where an agent followed the blueprint, and where it had the ‘seniority’ to realize the blueprint was missing a piece.

The Competition Setup:

The task we use as a benchmark is a complex one. In the Jay Framework, we have a few types of components:

Jay Headfull components: similar to what you think of as components in other frameworks, and client-only.
Jay Stack (the full-stack framework) components: page-level headless components and nested headless components, which render data only on the server and hydrate on the client. For both, the UI (HTML and CSS) is provided by the page.

The problem we set out to solve is that if a page had Headfull nested components, it forced the page to be client rendered.

Working in the design log methodology, we gave the same prompt to both agents and had each create a design log. The two design docs were quite similar, so we took the best of both and merged them into one doc that we used with both agents. The full design log can be found at 102 — nested headfull full-stack components.md.

We started two long build sessions, one per agent, each with a clean context and the design log as input. We invested over a week in both to get the task done.

The full transcript is here

Both did well at following the design log, yet both failed to complete the task due to pre-existing problems.

Here is the full analysis and comparison, created by Notebook LM from the three files above (the two conversation logs and the design log) plus an explanation of the failure point.

The Notebook LM Report (unmodified)

1. Executive Context: The Challenge of Nested Headfull Full-Stack Components

Design Log #102 addresses a fundamental architectural evolution in the Jay framework: the transition of project-local nested headfull components from client-only execution to a full-stack lifecycle. Previously, these components were restricted to makeJayComponent, which lacked server-side rendering (SSR) and the prerequisite slow/fast execution phases.

The core challenge was bridging the gap between UI-heavy local components and the jay-stack pipeline. The implementation required these components to utilize makeJayStackComponent (Full-Stack), triggered by the presence of a contract attribute in the <script type="application/jay-headfull"> tag. Without this contract, the framework defaults to legacy client-only behavior. The technical requirement involved a complex “Slow-Phase Merge,” where headfull templates are injected into the page post-backend execution, followed by a hydration strategy that ensures interactivity is maintained on the client.
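
The contract-based switch described above can be pictured with a small sketch. It is illustrative only: the attribute map, the function name, and the contract file name are assumptions, not the Jay parser's real API, and the handling of an empty contract value is a guess.

```typescript
// Illustrative only: names here are assumptions, not the Jay parser's real API.
type ComponentFactory = "makeJayStackComponent" | "makeJayComponent";

function factoryForHeadfullScript(
  attrs: Record<string, string | undefined>,
): ComponentFactory {
  // Assumption: an empty contract attribute is treated like a missing one.
  const contract = attrs["contract"];
  return contract !== undefined && contract.trim() !== ""
    ? "makeJayStackComponent" // full-stack: slow/fast phases plus SSR
    : "makeJayComponent"; // legacy client-only behavior
}

// The contract file name below is made up for the example.
console.log(
  factoryForHeadfullScript({
    type: "application/jay-headfull",
    contract: "./shop-header.jay-contract",
  }),
); // "makeJayStackComponent"
console.log(factoryForHeadfullScript({ type: "application/jay-headfull" }));
// "makeJayComponent"
```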

2. Comparative Analysis of Planning and Initial Approach

The two agents approached this framework-level change with distinct operational philosophies.

Claude Code (The Procedural Specialist): Claude adopted a systematic, “top-down” linear execution of the three-phase plan (Import/Resolution, Slow-Phase Merge, Client Composition). It focused immediately on the mechanical modification of jay-html-parser.ts and dev-server.ts. While efficient, this approach was template-injection focused and initially overlooked the “why” of the hydration lifecycle, assuming that merely moving HTML strings would suffice for component functionality.
Cursor (The Integrative Explorer): Cursor prioritized codebase exploration to locate the specific “slow-phase merge” and “pre-render” flows before committing code. This “lifecycle-aware” approach allowed Cursor to identify architectural friction points early, questioning whether the proposed Design Log was sufficient to support the runtime requirements of interactive components.

Table 1: Planning and Philosophy Comparison

+--------------+------------------------------+-------------------------------+
|   Feature    | Claude: Linear Phase         |     Cursor: Context-Aware     |
|              | Execution                    |          Exploration          |
+--------------+------------------------------+-------------------------------+
| Philosophy   | Procedural; high fidelity to | Investigative; focuses on how |
|              | the literal steps of the     | the change integrates with    |
|              | Design Log.                  | the runtime lifecycle.        |
+--------------+------------------------------+-------------------------------+
| Initial Step | Immediate modification of    | Codebase analysis to locate   |
|              | parser and shared types.     | merge and pre-render anchors. |
+--------------+------------------------------+-------------------------------+
| Integration  | Template-injection focused;  | Lifecycle-aware; alignment    |
|              | append-only logic.           | with existing "headless       |
|              |                              | instance" patterns.           |
+--------------+------------------------------+-------------------------------+
| Risk Profile | Faster initial delivery;     | Slower start; identifies      |
|              | high risk of "dead"          | unstated architectural gaps   |
|              | (non-interactive) DOM.       | (e.g., hydration).            |
+--------------+------------------------------+-------------------------------+

3. Technical Execution: Implementation of Edge Cases & Path Resolution

Handling Import Names

Identifying the correct component name from a list of imports (which often includes types) is a frequent source of failure.

Claude’s Logic: Claude implemented a PascalCase matching strategy. It took the contract name (e.g., shop-header), PascalCased it to ShopHeader, and filtered the names list for a match. This was a rigid but effective response to the “first name in list” fallacy.
Cursor’s Logic: Cursor utilized the more sophisticated analyzeExportedTypes infrastructure to identify components based on the JayComponentType metadata, allowing it to distinguish between component exports and type-only exports regardless of naming conventions.
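
To make the PascalCase strategy concrete, here is a minimal sketch of the idea. The helper names are hypothetical and the code is not taken from either agent's output; it only shows how matching on the converted contract name avoids picking a type export that happens to come first in the import list.

```typescript
// Hypothetical helper names; not taken from the Jay codebase.

/** Convert a kebab-case contract name such as "shop-header" to "ShopHeader". */
function toPascalCase(contractName: string): string {
  return contractName
    .split("-")
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("");
}

/**
 * Pick the imported name that matches the contract instead of
 * blindly taking the first name in the import list.
 */
function pickComponentName(
  contractName: string,
  importedNames: string[],
): string | undefined {
  const expected = toPascalCase(contractName);
  return importedNames.find((name) => name === expected);
}

// A type export listed first no longer wins.
console.log(pickComponentName("shop-header", ["ShopHeaderProps", "ShopHeader"]));
// "ShopHeader"
```

The limitation is visible in the sketch: a component exported under any other name returns undefined, which is the gap a metadata-based lookup such as the analyzeExportedTypes approach is meant to close.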

Pathing and Extensions

Claude’s Logic: Used a regex-based replacement: .replace(/\.ts$/, '') + '.jay-html'. This was efficient for standard naming but brittle when encountering modules without explicit extensions.
Cursor’s Logic: Implemented a robust iterative file existence check. It sequentially verified .ts, .tsx, and .js extensions. Cursor also modified resolveLink to handle absolute paths generated during the pre-rendering phase, providing superior stability for complex project structures.
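
The difference between the two strategies can be sketched as follows. This is an assumption-laden illustration: the function names are invented, the existence check is injected so the example runs without touching the file system, and the real resolveLink changes are not reproduced here.

```typescript
// Hypothetical sketch; function names and the injected `exists` check are assumptions.
type Exists = (path: string) => boolean;

/** Regex approach: only strips a trailing ".ts" before appending ".jay-html". */
function templatePathByRegex(modulePath: string): string {
  return modulePath.replace(/\.ts$/, "") + ".jay-html";
}

/** Iterative approach: probe known extensions and return the first file that exists. */
function resolveModuleFile(modulePath: string, exists: Exists): string | undefined {
  const base = modulePath.replace(/\.(ts|tsx|js)$/, "");
  for (const ext of [".ts", ".tsx", ".js"]) {
    const candidate = base + ext;
    if (exists(candidate)) return candidate;
  }
  return undefined;
}

// In-memory stand-in for the file system.
const files = new Set(["/src/header.tsx"]);
const exists: Exists = (path) => files.has(path);

console.log(templatePathByRegex("/src/header.ts")); // "/src/header.jay-html"
console.log(templatePathByRegex("/src/header.tsx")); // "/src/header.tsx.jay-html"
console.log(resolveModuleFile("/src/header", exists)); // "/src/header.tsx"
```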

4. Debugging and Iterative Refinement

The implementation of full-stack components introduced specific failures in both SSR and the Vite build pipeline.

SSR Failures: The “vs1” Binding

A significant hurdle was the [SSR] Failed… vs1 is not defined error encountered during slowForEach execution.

Claude’s Response: Attempted to fix the issue by refining kind identification in the parser.
Cursor’s Response: Identified the root cause in the compiler. Cursor modified jay-html-compiler.ts to emit explicit variable bindings (e.g., const vs1 = vs.products[0]) specifically for slowForEach items, ensuring the server-side element had access to the necessary data context.
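
As a rough picture of the compiler-side fix, the sketch below emits the kind of binding quoted above. The interface and function are hypothetical; the actual code in jay-html-compiler.ts is not shown in the transcripts summarized here.

```typescript
// Hypothetical code-generation helper; not the real jay-html-compiler.ts logic.
interface SlowForEachItem {
  /** Property on the view state that holds the array, e.g. "products". */
  arrayName: string;
  /** Index of the item that was unrolled during the slow phase. */
  index: number;
  /** Local variable the generated item code refers to, e.g. "vs1". */
  itemVar: string;
}

/** Emit a declaration so the item variable is defined before the item code uses it. */
function emitItemBinding(item: SlowForEachItem, parentVar = "vs"): string {
  return `const ${item.itemVar} = ${parentVar}.${item.arrayName}[${item.index}];`;
}

console.log(emitItemBinding({ arrayName: "products", index: 0, itemVar: "vs1" }));
// const vs1 = vs.products[0];
```

Without a declaration like this in the generated server element, any reference to vs1 in the item's rendering code fails at runtime with the "vs1 is not defined" error.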

Vite/Rollup Integration: The “Expected ‘>’” Error

Both agents encountered the “Expected ‘>’ but found ‘type’” error, caused by the Rollup plugin passing raw HTML to esbuild.

Claude’s Solution: Correctly identified that the load hook must compile jay-html into TypeScript, ensuring the transform pipeline only receives valid TS.
Cursor’s Solution: Arrived at the same conclusion but integrated getJayFileStructure within the load hook to ensure the produced TypeScript was fully synchronized with the component’s internal structure.
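
The shared fix, compiling inside the load hook, can be sketched with a Rollup-style plugin object. The plugin name, the injected readFile and compile functions, and the local LoadPlugin type are all assumptions made so the example is self-contained; neither agent's actual plugin code nor getJayFileStructure is reproduced.

```typescript
// Minimal local type so the sketch does not depend on Rollup's typings.
interface LoadPlugin {
  name: string;
  load(id: string): string | null;
}

// Both functions are injected placeholders (assumptions), not Jay APIs.
type ReadFile = (path: string) => string;
type CompileJayHtml = (source: string, path: string) => string;

function jayHtmlLoadPlugin(readFile: ReadFile, compile: CompileJayHtml): LoadPlugin {
  return {
    name: "jay-html-load-sketch",
    load(id) {
      // Returning null defers to other plugins or the default loader.
      if (!id.endsWith(".jay-html")) return null;
      // Compile here so later pipeline steps receive TypeScript, never raw HTML.
      return compile(readFile(id), id);
    },
  };
}

// Usage with stand-in implementations.
const plugin = jayHtmlLoadPlugin(
  () => "<div>{title}</div>",
  (source) => `export const template = ${JSON.stringify(source)};`,
);
console.log(plugin.load("/src/page.jay-html"));
// export const template = "<div>{title}</div>";
console.log(plugin.load("/src/page.ts")); // null
```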

Table 2: Bug-Fix Log

+-------------------+------------------------+--------------------------------+
| Error Encountered |   Claude’s Solution    |       Cursor’s Solution        |
+-------------------+------------------------+--------------------------------+
| "vs1 is not       | Refined parser kind    | Emitted explicit variable      |
| defined"          | identification.        | bindings for slowForEach items |
|                   |                        | in the compiler.               |
+-------------------+------------------------+--------------------------------+
| "Expected '>' but | Compiled jay-html to   | Integrated getJayFileStructure |
| found 'type'"     | TS in the Vite load    | into the load hook to generate |
|                   | hook.                  | TS.                            |
+-------------------+------------------------+--------------------------------+
| Type Resolution   | Skipped type analysis  | Extended analyzeExportedTypes  |
| Failure           | for headfull-FS in the | to handle identifier           |
|                   | parser.                | initializers (e.g., export     |
|                   |                        | const Header = headerStack).   |
+-------------------+------------------------+--------------------------------+

5. The Architectural Pivot: Hydration & The “Senior Architect” Approach

The “Bulldozer” vs. The “Architect”

Claude’s “bulldozer” approach was to inject templates as children of <jay:Name> tags and assume the existing headless pipeline would magically handle hydration. As a Senior Architect, I must note that this would have resulted in “dead” DOM — the UI would appear on the page, but no component constructors would ever run, leaving event handlers and refs unattached.

Cursor’s Structural Pivot: The “Adopt” Path

Cursor identified that for headfull full-stack components, the component does not “own” the template in the traditional sense; it works like a headless component where the template is merged during the slow phase. Cursor proposed treating headfull instances as composite parts.

Architectural Spotlight: Nested References Management

Cursor’s critical insight was modifying the ReferencesManager to handle nested refs correctly. Instead of treating subtracter or adder as top-level page refs (which causes TSC errors), Cursor ensured nested refs resolve as refCounter1().subtracter. This move treats headfull instances as encapsulated units that “adopt” pre-rendered DOM. This is the difference between a static HTML string and a functional, interactive framework component.

6. Testing Discipline and Ground Truth Verification

The agents displayed a vast difference in testing maturity:

Claude’s Methodology: Claude initially made the amateur mistake of using toContain for code verification. After being corrected, it pivoted to full fixture comparisons in headfull-fs-compilation.test.ts. Its focus remained on compiler-level unit tests — ensuring the TypeScript output was valid.
Cursor’s Methodology: Cursor demonstrated superior discipline by using the fake-shop example as a “ground truth” smoke test. It created a comprehensive suite of fixtures:
- expected-merged.html (Slow phase verification)
- expected-server-element.ts (SSR code generation)
- expected-client-script.html (Hydration logic)
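
Why a substring check is weaker than a full fixture comparison can be shown with a small sketch. The normalize function is a simplified stand-in for a real formatting step, and the code strings are invented for the example.

```typescript
// Hypothetical sketch: normalize() is a simplified stand-in for a real formatter.
function normalize(code: string): string {
  return code
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .join("\n");
}

/** Full comparison: the whole generated output must equal the whole fixture. */
function matchesFixture(actual: string, expected: string): boolean {
  return normalize(actual) === normalize(expected);
}

const expectedOutput = `
  const vs1 = vs.products[0];
  render(vs1);
`;

// Generated output with an accidental duplicate line.
const buggyOutput = `
  const vs1 = vs.products[0];
  render(vs1);
  render(vs1);
`;

console.log(buggyOutput.includes("render(vs1);")); // true: the substring check passes
console.log(matchesFixture(buggyOutput, expectedOutput)); // false: the fixture check fails
```

A toContain-style assertion only proves that a fragment exists somewhere in the output; it says nothing about duplicated, missing, or misplaced code around it.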

7. Consolidated Evaluation Matrix

+-----------------------+--------+--------+--------------------------------+
|       Criterion       | Claude | Cursor |         Justification          |
|                       | Code   |        |                                |
+-----------------------+--------+--------+--------------------------------+
| Initial Planning      |      4 |      5 | Cursor’s exploration phase     |
|                       |        |        | identified lifecycle gaps      |
|                       |        |        | before code was written.       |
+-----------------------+--------+--------+--------------------------------+
| Edge Case Handling    |      4 |      5 | Cursor’s sequential extension  |
|                       |        |        | checking and absolute path     |
|                       |        |        | resolution are more robust.    |
+-----------------------+--------+--------+--------------------------------+
| Debugging & Iteration |      3 |      5 | Cursor proactively fixed the   |
|                       |        |        | vs1 binding issue; Claude      |
|                       |        |        | required more iterations.      |
+-----------------------+--------+--------+--------------------------------+
| Testing Discipline    |      3 |      5 | Claude’s use of toContain was  |
|                       |        |        | a significant oversight;       |
|                       |        |        | Cursor’s fixture suite was     |
|                       |        |        | production-grade.              |
+-----------------------+--------+--------+--------------------------------+
| Architectural         |      2 |      5 | Cursor questioned the Design   |
| Awareness             |        |        | Log's premise to ensure        |
|                       |        |        | interactivity actually worked. |
+-----------------------+--------+--------+--------------------------------+
| Out-of-Scope Gaps     |      2 |      5 | Claude ignored the vs1 binding |
|                       |        |        | until it failed; Cursor        |
|                       |        |        | addressed the root compiler    |
|                       |        |        | logic.                         |
+-----------------------+--------+--------+--------------------------------+

8. Final Verdict: “The Bulldozer” vs. “The Architect”

The implementation of Design Log #102 highlights a stark contrast in AI agent capabilities for framework-level engineering.

The Bulldozer (Claude): Claude is a high-speed procedural engine. It is excellent at following a well-defined plan and implementing local logic (regex, parsing). However, it lacks the architectural foresight to realize when a plan (like the Design Log) is fundamentally flawed regarding runtime behavior. It builds the structure exactly as asked, even if the building is functionally uninhabitable.

The Architect (Cursor): Cursor is a sophisticated technical partner. It is slower and more inquisitive, but it identifies when the “Design Log” instructions are insufficient for framework stability. By recognizing that “merging HTML” is not “hydrating a component,” Cursor shifted the implementation to a “composite part” model.

Technical Lead Recommendation: For routine feature additions, Claude’s speed is an asset. However, for core framework changes involving SSR and hydration, Cursor is the mandatory choice. Cursor’s output resulted in a production-ready feature where component constructors actually run, whereas Claude’s implementation would have required an expensive architectural rescue mission post-delivery.

The Bottom Line

After running both through the gauntlet, here is how they stack up on what actually matters:

Problem Solving: Claude Code took the lead in raw, autonomous execution. It acted as a relentless implementer, independently tracing and fixing a pre-existing forEach attribute binding bug while navigating complex variable scoping. Cursor, meanwhile, excelled in architectural reasoning, catching edge cases like correctly filtering out type-only imports from component resolution.
Code Quality: Both tools produced code that felt native to the repo, though both initially failed the project’s unstated testing standards by using toContain. Once corrected, both successfully pivoted to generating full expected fixtures and validating them using prettify. However, Claude Code’s code quality degraded into “hacky” territory when faced with an impossible task, layering complex compiler patches to force a solution.
Handling the Unexpected: This was the tie-breaker. Both agents eventually hit a hidden wall: the framework lacked the underlying foundational support to hydrate nested components. Claude Code got stuck trying to brute-force a fragile workaround, tangling itself in complex counter resets and hydrate script patches. Cursor analyzed the terminal errors and framework behavior, accurately diagnosed that “no component constructors run”, and explicitly flagged this as a “design/architecture gap”. Instead of forcing a bad fix, it proposed high-level architectural pivots and recommended halting the task to write a new design log.
The Methodology Verdict: The Design Log Methodology proved essential. It showed that while both agents can execute a design plan to the letter, the real winner is the one that knows when the plan itself is incomplete. When faced with a missing architectural foundation, Claude Code acted as a bulldozer, silently building a brittle tower of workarounds. Cursor acted as a true Senior Architect: it halted the implementation, identified the root risk, and recommended pivoting back to the design phase to write a brand new design log before merging any fundamentally broken code.

The Takeaway: If your architecture is perfectly sound and you need a relentless autonomous agent to grind through tedious logic bugs until the tests pass, Claude Code is a powerhouse. If you are working on the bleeding edge of a framework where design blind spots exist, go with Cursor. It acts like a Senior Architect and possesses the crucial systemic awareness to stop, push back, and tell you when a task is fundamentally broken.

The Hybrid Workflow: The ‘Senior-Junior’ Loop

Based on this experiment, the most efficient way to work isn’t picking one tool; it’s using them in a tag-team loop. The head-to-head test was a “battle of the agents,” but my takeaway for a future production workflow is to stop choosing and start layering them:

The Grind: Let Claude Code do the heavy lifting. If it says it’s done and you find a bug (crash, typo, or logic failure), feed the error directly back to Claude. Let it “grind” through the fix.
The Review (Cursor Composer): Once Claude completes a major phase, hand the results to Cursor for a code review. In my recent tests of this hybrid approach, Cursor immediately caught two subtle bugs on the first try that Claude had missed while “grinding.”
The Pivot: If Claude gets stuck in a loop or the core behavior is missing despite the code compiling, stop Claude. Take the error and the current state of the codebase to Cursor.
The Diagnosis: Ask Cursor: “Claude built this, but it’s failing because of [X]. Is our design flawed, or did Claude just implement it poorly?”
The Reset: Once Cursor diagnoses the architectural gap and proposes a new plan (e.g., adding a specific API or changing the hydration logic), append the new findings to your Design Log and hand the revised plan back to Claude to execute.

In short: Use Claude Code to execute a perfect plan; use Cursor to find out if your plan is actually perfect — and to double-check the work before you merge.