Cloudflare just introduced a new way to tell bots what they can do with your content after they fetch it. The Content Signals Policy adds a short, human-readable policy block to robots.txt, plus a simple, machine-readable line that states your preferences for search and AI training. For SEOs and publishers living with AI overviews and model scraping, this is a practical control you can ship today.
First, the scope: this policy does not change any crawl access rules. It sits alongside your existing User-agent, Allow, and Disallow directives. The comments define three use cases, and the new machine-readable signal makes your preference for each explicit, for example:
Content-Signal: search=yes, ai-train=no
Cloudflare defines three uses of fetched content:
- search for building indexes and listing results,
- ai-input for feeding live AI answers such as RAG or grounding, and
- ai-train for model training or fine-tuning.
These are deliberately narrow so bots can comply without guesswork.
What the new block looks like
Cloudflare recommends placing a commented section in robots.txt that explains the meaning of each signal. Here is the essence of that new block:
# As a condition of accessing this website, you agree to these content signals:
# (a) content-signal=yes → allowed for that use
# (b) content-signal=no → not allowed for that use
# Signals:
# search → build a search index and show results (not AI summaries)
# ai-input → use content as live input for AI answers
# ai-train → train or fine-tune AI models
This comment text is human-readable. Crawlers ignore it, but it documents your intent.
The machine-readable line that states your policy
Your preference lives in a single comma-delimited line. A typical publisher stance today is "yes to search, no to training," with no statement about live AI input:
User-agent: *
Content-Signal: search=yes, ai-train=no
Allow: /
If you omit a signal, you are neither granting nor restricting that use via this mechanism. You can add ai-input=yes or ai-input=no later, once you decide.
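For instance, a publisher that later decides to permit live AI answers while still blocking training could extend the line to something like this (illustrative values, not a recommendation):
Content-Signal: search=yes, ai-input=yes, ai-train=no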
What Cloudflare is doing by default
If you use Cloudflare's managed robots.txt, which serves 3.8 million+ domains, Cloudflare will insert the policy comments and set Content-Signal: search=yes, ai-train=no on your behalf. Free zones without an existing robots.txt will receive only the commented policy text, with no allow or deny directives and no signals. You can disable this in the dashboard.
Compliance, limits, and the bigger picture
This is a signal, not a scraper blocker. Some bots will ignore it. Cloudflare suggests pairing signals with WAF rules and Bot Management to enforce your stance. They released the policy under CC0 so any platform can adopt it, and they are pushing for broader standards support.
Why SEOs should care
The referral economy changed. Cloudflare's framing is blunt: scraped content can now compete with its source, and bot traffic is projected to outpace human traffic later this decade. Clear, machine-readable rights signals help you separate classic search from AI uses, which sets the stage for licensing, paid access, or stricter blocking policies if needed.
Recommended stance for most publishers today
Most sites will want to keep traditional search visibility while curbing model training. A practical, low-friction starting point:
# Keep search open, stop model training for now
User-agent: *
Content-Signal: search=yes, ai-train=no
Allow: /
If you operate a Q&A, docs, or news site sensitive to live answer cannibalization, consider declaring ai-input=no as well, then monitor revenue and traffic before loosening it.
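A stricter stance for such a site might look like this sketch (illustrative; the right mix depends on how much of your value comes from direct visits):
User-agent: *
Content-Signal: search=yes, ai-input=no, ai-train=no
Allow: /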
Pair signals with enforcement
Signals work best alongside bot controls. Examples you can implement in Cloudflare:
# Example WAF expression idea for known AI crawlers
(http.user_agent contains "GPTBot") or
(http.user_agent contains "Google-Extended") or
(http.user_agent contains "Claude-Web")
Use allow lists for search bots you rely on. Consider Cloudflare's "block AI crawlers by default" posture and Pay-per-Crawl style programs if you plan to monetize access. These features have been rolling out across 2025.
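To keep the crawlers you depend on, a matching allow or skip expression might look like the sketch below. It assumes Googlebot and bingbot are the bots you want to except, and uses Cloudflare's cf.client.bot field so spoofed user agents do not slip through; adapt the names to your own list:
# Example skip/allow expression for verified search crawlers
(cf.client.bot) and
((http.user_agent contains "Googlebot") or
(http.user_agent contains "bingbot"))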
How to deploy now
Generate the policy text at ContentSignals.org, paste the comments and your Content-Signal line into /robots.txt, then add WAF rules for known violators. Cloudflare customers can use the managed robots.txt option or deploy via the dashboard. Track impact in logs and analytics, and revisit ai-input once you have data.