Your AI Writes Code in Seconds. Why Do Your Tests Take Minutes?

The bottleneck used to be writing code. You think, you type, you think some more. Whether your tests take 10 seconds or 10 minutes, it doesn't make that big a difference. You're the slow part.

AI agents flip this. The code appears in seconds. Now the agent is in a loop: write, test, read results, adjust, repeat. Your test speed becomes your development speed. If every test takes 5 seconds and you have a hundred of them running one after another, that's more than eight minutes per pass through the loop, which is too slow. Your tests need to be fast and they need to be reliable.

This is the problem with single-page applications. When your UI is rendered by JavaScript, the only way to test it is to run that JavaScript. That means a real browser. Puppeteer, Playwright, spinning up Chrome. Every test has to boot a browser, navigate to a page, wait for scripts to load, wait for rendering to finish. Each one takes seconds. Multiply that across your test suite and you're waiting minutes.

And browser tests aren't just slow. They're fragile. Timeouts, race conditions, elements that haven't rendered yet. These aren't bugs in your code; they're artifacts of testing through a complex runtime.

Server-rendered HTML sidesteps all of this. When the server does the rendering, the entire application lives on the backend. Which means you can test it there too. Not just individual requests, but complete user flows. Click this link, see this page, fill in this form, submit it, verify what comes back. The whole sequence runs in one process, in one language, with no browser involved.

This is different from the usual backend test where you set up some state, make a request, and check the response. Those tests are brittle because you're mocking the state the user would have been in. Maybe the request you're testing isn't even the request the real UI generates. Your test passes but your app is broken.

Instead, a fake browser can simulate the actual mechanics: clicking a link changes the page, submitting a form sends a POST, the response updates what the user sees. You flow through the app the same way a real user would. The test reflects reality because it follows the same path. And because all of this is just function calls and string matching, each test runs in milliseconds. The whole suite finishes in seconds.
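To make those mechanics concrete, here is a minimal sketch of what such a fake browser might look like. It is not Stumpy's implementation. The `Req` and `Handler` types, the toy `app` routes, and the `SketchBrowser` class with its method names are all assumptions invented for illustration, and the regex-based HTML matching only handles the exact markup the toy app emits. A real version would presumably use a proper HTML parser.

```typescript
// Hypothetical request and handler shapes, standing in for a real server framework.
type Req = { method: "GET" | "POST"; path: string; body: Record<string, string> };
type Handler = (req: Req) => string;

// Toy in-memory app state and routes, invented for this sketch.
const agents: string[] = [];
const app: Handler = (req) => {
  if (req.method === "GET" && req.path === "/") {
    return `<h1>Home</h1><a href="/agents/new">Create your first agent</a>`;
  }
  if (req.method === "GET" && req.path === "/agents/new") {
    return `<form method="post" action="/agents"><input name="name"><button>Create Agent</button></form>`;
  }
  if (req.method === "POST" && req.path === "/agents") {
    const name = req.body["name"] ?? "";
    agents.push(name);
    return `<h1>${name}</h1><p>Chat</p>`;
  }
  return `<h1>Not found</h1>`;
};

// Escape text so it can be embedded literally in a regular expression.
const escapeRe = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

class SketchBrowser {
  private handler: Handler;
  private html = "";
  private fields: Record<string, string> = {};

  constructor(handler: Handler) {
    this.handler = handler;
  }

  // A page load is just a function call into the app.
  visit(path: string): this {
    this.html = this.handler({ method: "GET", path, body: {} });
    this.fields = {};
    return this;
  }

  // An assertion is just string matching on the current page.
  see(text: string): this {
    if (!this.html.includes(text)) throw new Error(`Expected to see "${text}"`);
    return this;
  }

  // Clicking a link finds its href on the current page and issues a GET.
  clickLink(text: string): this {
    const href = this.html.match(new RegExp(`<a href="([^"]+)">${escapeRe(text)}</a>`))?.[1];
    if (!href) throw new Error(`No link with text "${text}"`);
    return this.visit(href);
  }

  // Typing only works if the field actually exists on the current page.
  type(name: string, value: string): this {
    if (!this.html.includes(`name="${name}"`)) throw new Error(`No field "${name}"`);
    this.fields[name] = value;
    return this;
  }

  // Submitting sends the typed fields as a POST to the form's own action.
  submit(): this {
    const action = this.html.match(/<form method="post" action="([^"]+)">/)?.[1];
    if (!action) throw new Error("No form on the current page");
    this.html = this.handler({ method: "POST", path: action, body: this.fields });
    this.fields = {};
    return this;
  }
}

// Walk the flow the way a user would; any broken step throws.
new SketchBrowser(app)
  .visit("/")
  .clickLink("Create your first agent")
  .type("name", "My Test Agent")
  .submit()
  .see("My Test Agent")
  .see("Chat");
```

The point of the design is that the test never constructs a request by hand. The link's `href` and the form's `action` come from the HTML the app actually rendered, so if the UI stops producing the request the backend expects, the test breaks too.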

Here is a test from Stumpy that verifies the entire agent creation flow:

it("creates an agent from the homepage", async () => {
  await FakeBrowser.loginAsTestUser()
    .visit("/")
    .see("Stop doing everything yourself")
    .click("Create your first agent")
    .see("Create an agent")
    .clickAndType("Agent name", "My Test Agent")
    .click("Create Agent")
    .see("My Test Agent")
    .see("Chat")
    .done();
});

No headless browser. No waiting for hydration. The test logs in, visits the homepage, clicks through to agent creation, fills out the form, and verifies you land in your new agent's chat. It reads like a description of what a user would do, because that's exactly what it is.
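If you're wondering how a chain like that can end in a single `await`, one plausible approach is to have each method queue an async step and let `done()` run the queue in order. The sketch below shows that pattern under stated assumptions: `ChainSketch`, `PageHandler`, `pending`, and the toy `pages` handler are hypothetical names, and this is not a description of how Stumpy's FakeBrowser is actually built.

```typescript
// Hypothetical async app: maps a path to the HTML the server would return.
type PageHandler = (path: string) => Promise<string>;

class ChainSketch {
  private handler: PageHandler;
  private html = "";
  private steps: Array<() => Promise<void>> = [];

  constructor(handler: PageHandler) {
    this.handler = handler;
  }

  // Queue a page load; nothing runs until done() is called.
  visit(path: string): this {
    this.steps.push(async () => {
      this.html = await this.handler(path);
    });
    return this;
  }

  // Queue an assertion against whatever page is current when the step runs.
  see(text: string): this {
    this.steps.push(async () => {
      if (!this.html.includes(text)) throw new Error(`Expected to see "${text}"`);
    });
    return this;
  }

  // Number of queued steps that have not run yet.
  get pending(): number {
    return this.steps.length;
  }

  // Run the queued steps in order; the first failing step rejects the promise.
  async done(): Promise<void> {
    for (const step of this.steps) await step();
    this.steps = [];
  }
}

// Toy handler, invented for this sketch.
const pages: PageHandler = async (path) =>
  path === "/" ? "<h1>Stop doing everything yourself</h1>" : "<h1>Not found</h1>";

// Inside a test, the whole flow is awaited once at the end.
async function exampleTest(): Promise<void> {
  await new ChainSketch(pages).visit("/").see("Stop doing everything yourself").done();
}
```

Because the steps are only queued until `done()` runs, the chain stays synchronous to write while each step can still await the app.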

There's something counterintuitive here. Server-rendered HTML feels like a step backward. It's what we built websites with before React and SPAs showed us a better way. But the "better" way optimized for developer experience at the cost of machine-readability. We made our UIs harder for computers to understand just as we started asking computers to build them.

And server-rendered doesn't mean clunky. Stumpy uses htmx for partial page updates, transitions, and interactive elements. The app feels like a modern SPA. But under the hood it's all HTML responses from the server. That just means FakeBrowser needs to handle the htmx interactions the same way it handles everything else. So we do that. You get the modern UX without giving up the testing advantage.
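As a rough illustration of what handling an htmx interaction could involve, the sketch below simulates one narrow case: a button with `hx-get` and an `hx-target` pointing at an element id, using htmx's default `innerHTML` swap. The `hx-get` and `hx-target` attributes and the `HX-Request` header are real htmx features. Everything else (`clickHx`, `FragmentHandler`, the toy page and server) is a hypothetical stand-in, not Stumpy's code, and the regex matching assumes the target is a `<div>` with no nested `<div>` inside it.

```typescript
// Hypothetical handler: given a path and request headers, returns an HTML fragment.
type FragmentHandler = (path: string, headers: Record<string, string>) => string;

// Escape text so it can be embedded literally in a regular expression.
const escapeRe = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Simulate clicking a button that carries hx-get and hx-target="#id".
// Returns the page as it would look after the innerHTML swap.
function clickHx(page: string, label: string, handler: FragmentHandler): string {
  const trigger = new RegExp(
    `<button hx-get="([^"]+)" hx-target="#([^"]+)">${escapeRe(label)}</button>`
  ).exec(page);
  const url = trigger?.[1];
  const targetId = trigger?.[2];
  if (!url || !targetId) throw new Error(`No hx-get button labelled "${label}"`);

  // htmx marks its requests with this header, so the server can return a fragment.
  const fragment = handler(url, { "HX-Request": "true" });

  // innerHTML swap: keep the target's own tags, replace only its children.
  const target = new RegExp(`(<div id="${escapeRe(targetId)}">)[\\s\\S]*?(</div>)`);
  if (!target.test(page)) throw new Error(`No target element #${targetId}`);
  return page.replace(target, (_m: string, open: string, close: string) => open + fragment + close);
}

// Toy page and server, invented for this sketch.
const page =
  `<button hx-get="/agents/list" hx-target="#agents">Refresh</button>` +
  `<div id="agents">Loading</div>`;
const server: FragmentHandler = (path, headers) =>
  path === "/agents/list" && headers["HX-Request"] === "true"
    ? "<ul><li>My Test Agent</li></ul>"
    : "<p>Full page</p>";

const updated = clickHx(page, "Refresh", server);
```

After the simulated click, only the contents of the target element change and the rest of the page stays as it was, which is the same thing a user would see in a real browser.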

The natural objection is that React is better represented in training data. Agents know React. They can spin up a React app fast. And that's true. If all you want is a prototype with no tests, React and an agent will get you there quickly.

But prototype velocity and sustained velocity are different things. Sustained velocity comes from tests. Without tests, you hit a wall where you're creating bugs as fast as you're fixing them. That's the ceiling everyone runs into with AI-generated code that has no test coverage.

So you realize you need tests. But you built it with React, and now your tests are slow. You're stuck.

If you start with server-rendered HTML, you don't lose much at the beginning. Modern models have no trouble building with htmx. You have velocity on day one. But you also have it on day one hundred, because the architecture is testable from the start. Fast tests mean sustained speed, and sustained speed is how you get past the prototype and build something real.

It all points in the same direction. If you're building a web app with AI agents, you probably shouldn't be using React.