TL;DR
Someone pastes a docs link into Claude Code, Cursor, or ChatGPT. What happens next?
The agent fetches the page. It gets HTML: divs, React hydration markers, navigation sidebars, cookie consent banners, and somewhere in there, the actual content. The agent parses it. Works fine. But you just burned ~14,500 tokens on a page that's ~525 tokens of useful information.
llms.txt doesn't fix this.
llms.txt is a discovery mechanism. It tells agents what pages exist. But most AI interactions don't start with "give me an index of all docs." They start with a URL. And when agents fetch that URL, they get HTML.
Here's the token cost for a typical docs page:
| Format | Size | Tokens |
|---|---|---|
| HTML | 58KB | ~14,500 |
| Markdown | 2.1KB | ~525 |
A 27x difference. At scale, that's the gap between an AI integration being viable and not.
Content Negotiation
Use standard HTTP headers, specifically Accept.
Browsers send Accept: text/html and get HTML. AI agents send Accept: text/markdown and get markdown. Same URL, different representation.
No special URLs. No .md suffix. Standard HTTP.
Implementation
Middleware checks the Accept header:
If the request wants markdown, serve markdown. If not, HTML.
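A minimal sketch of that check, framework-agnostic. The function names (prefersMarkdown, negotiate) and the renderer callbacks are illustrative, not from any library:

```typescript
// Returns true if any entry in the Accept header asks for markdown.
export function prefersMarkdown(accept: string | undefined): boolean {
  if (!accept) return false;
  return accept
    .split(",")
    .some((part) => part.trim().toLowerCase().startsWith("text/markdown"));
}

export interface Negotiated {
  contentType: string;
  body: string;
}

// Pick a representation. renderMarkdown / renderHtml stand in for
// whatever actually produces your page content.
export function negotiate(
  accept: string | undefined,
  renderMarkdown: () => string,
  renderHtml: () => string
): Negotiated {
  return prefersMarkdown(accept)
    ? { contentType: "text/markdown; charset=utf-8", body: renderMarkdown() }
    : { contentType: "text/html; charset=utf-8", body: renderHtml() };
}

// Wiring into a Node http handler is then two lines:
//   res.setHeader("Content-Type", negotiated.contentType);
//   res.setHeader("Vary", "Accept"); // caches must key on Accept
```

Note the Vary: Accept header in the wiring comment: once one URL has two representations, any cache between you and the agent has to key on the Accept header or it may hand HTML to a markdown request.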
Gotchas
HEAD requests. Agents send HEAD to check headers before downloading:
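A HEAD response should carry exactly the headers the matching GET would, with no body bytes. One way to sketch that as a pure helper (the wireResponse name is an invention for illustration):

```typescript
export interface WireResponse {
  headers: Record<string, string>;
  body: string;
}

// Build what actually goes on the wire for a given method. Per RFC 9110
// a server may advertise the same Content-Length on HEAD that the GET
// body would have, while sending no body.
export function wireResponse(
  method: string,
  contentType: string,
  body: string
): WireResponse {
  return {
    headers: {
      "Content-Type": contentType,
      "Content-Length": String(Buffer.byteLength(body)),
    },
    body: method.toUpperCase() === "HEAD" ? "" : body,
  };
}
```

This way an agent's HEAD probe sees the negotiated Content-Type and size without you rendering a byte of payload onto the network.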
Next.js middleware rewrites don't preserve query params. Pass data via headers:
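A sketch of the workaround using only web-standard APIs (URL, Headers), so the same shape drops into Next.js middleware or any edge runtime. The /markdown path prefix and the x-original-search header name are assumptions for illustration, not Next.js conventions:

```typescript
// Compute the rewrite target and the forwarded headers. The original
// query string rides along in a request header, since the rewrite
// itself won't preserve it.
export function markdownRewrite(
  requestUrl: string,
  requestHeaders: Headers
): { target: URL; headers: Headers } {
  const url = new URL(requestUrl);

  const target = new URL(url.toString());
  target.pathname = `/markdown${url.pathname}`;
  target.search = ""; // the rewrite loses this anyway

  const headers = new Headers(requestHeaders);
  // Smuggle the query string through a header the markdown route
  // handler can read back.
  headers.set("x-original-search", url.search);

  return { target, headers };
}

// In Next.js middleware this becomes roughly:
//   return NextResponse.rewrite(target, { request: { headers } });
```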
Cache headers matter. AI agents respect caching:
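A sketch of sensible cache headers for a negotiated response. The max-age values are illustrative defaults, not recommendations from any spec; the one non-negotiable piece is Vary: Accept:

```typescript
import { createHash } from "node:crypto";

// Cache headers for a negotiated docs page.
export function cacheHeaders(body: string): Record<string, string> {
  return {
    // Let agents and CDNs reuse the page briefly, then revalidate.
    "Cache-Control": "public, max-age=300, stale-while-revalidate=3600",
    // An ETag derived from the body makes revalidation a cheap 304.
    ETag: `"${createHash("sha256").update(body).digest("hex").slice(0, 16)}"`,
    // Without this, a shared cache could serve the HTML representation
    // to a markdown request (or vice versa) for the same URL.
    Vary: "Accept",
  };
}
```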
Discovery Headers
Content negotiation needs discovery. Add these to every response:
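One plausible shape for these, built on the standard Link header (RFC 8288). The rel values and the x-markdown-available name are assumptions here, not part of any published llms.txt spec:

```typescript
// Discovery headers to attach to every response, so an agent that only
// has a page URL can find the site-wide indexes.
export function discoveryHeaders(origin: string): Record<string, string> {
  return {
    Link: [
      `<${origin}/llms.txt>; rel="llms-txt"`,
      `<${origin}/llms-full.txt>; rel="llms-full-txt"`,
    ].join(", "),
    "x-markdown-available": "true",
  };
}
```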
Agents can HEAD any page and find your llms.txt without downloading content.
llms.txt Still Matters
llms.txt isn't useless—it's just not the whole picture. We wrote a full guide on making your docs AI-readable a few weeks ago. The short version:
llms.txt is good for agents that need to explore your docs from the top. llms-full.txt concatenates everything for agents that want the whole picture.
But neither helps when someone pastes a single URL into an AI chat. Content negotiation does.
Test Your Docs
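The original commands aren't reproduced here; a plausible trio of checks, matching the techniques above (pass/fail criteria are assumptions, and the origin is whatever site you point it at):

```typescript
// Pure pass/fail logic, separated out so it's easy to verify.
export function servesMarkdown(contentType: string | null): boolean {
  return contentType?.includes("text/markdown") ?? false;
}

export function hasLlmsTxt(status: number): boolean {
  return status >= 200 && status < 300;
}

export function hasDiscoveryLink(linkHeader: string | null): boolean {
  return linkHeader?.includes("llms.txt") ?? false;
}

// Run all three checks against a site.
export async function run(origin: string): Promise<void> {
  // 1. Does Accept: text/markdown return markdown?
  const page = await fetch(origin, { headers: { Accept: "text/markdown" } });
  console.log("markdown negotiation:", servesMarkdown(page.headers.get("content-type")));

  // 2. Does /llms.txt exist?
  const index = await fetch(`${origin}/llms.txt`, { method: "HEAD" });
  console.log("llms.txt present:", hasLlmsTxt(index.status));

  // 3. Do responses carry a discovery Link header?
  console.log("discovery header:", hasDiscoveryLink(page.headers.get("link")));
}
```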
Most sites fail all three.
The Point
The AI-readable web is being rebuilt. We obsess over SEO for Google. We optimize for crawlers that haven't changed in 20 years. But the new crawlers—the ones that answer questions and write code—are getting HTML soup.
llms.txt helps agents find pages. Content negotiation makes reading them 27x cheaper.
We rolled this out across all Docsalot sites this week—/llms.txt, /llms-full.txt, discovery headers, content negotiation. If you're curious, try curl -H "Accept: text/markdown" against any page on solid-docs.docsalot.dev.
The llms.txt spec is at llmstxt.org. It's short. Read it.