
Cloudflare AI Crawler Rules: Act Before September 15


On July 1, Cloudflare announced that it is flipping the web’s default. Starting September 15, AI crawlers that blend training or agent behavior with ordinary search will be blocked from ad-monetized pages unless site owners explicitly allow them. The company calls it Content Independence Day. Developers whose sites sit behind Cloudflare have until September 15 to decide what they actually want, because doing nothing is now a choice with consequences.

Three Crawlers, Three Behaviors

Cloudflare is sorting AI bots into three categories based on what they do with your content:

  • Search — indexes your content so AI can reference it later; you typically get referral traffic back
  • Agent — fetches your content in real time to complete a task for a user right now; no referral traffic, pure consumption
  • Training — scrapes your content into model weights permanently; no ongoing relationship, no credit, no traffic

Starting September 15, Training and Agent bots are blocked by default on pages that show ads — for new Cloudflare domains and all free-tier accounts. Search crawlers remain allowed. Paid accounts on existing domains are unaffected for now, but new domains added after September 15 get the new defaults immediately.
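To make the default concrete, here is a minimal sketch of the rule as described above. It is not Cloudflare's implementation; the `Site` fields and the `blocked_by_default` function are hypothetical names invented for illustration, and the logic only encodes what this article states.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    SEARCH = "search"
    AGENT = "agent"
    TRAINING = "training"


@dataclass(frozen=True)
class Site:
    # All three fields are illustrative assumptions, not Cloudflare settings.
    free_tier: bool       # account is on the free tier
    new_domain: bool      # domain was added after the September 15 cutoff
    page_shows_ads: bool  # the requested page is ad-monetized


def blocked_by_default(category: Category, site: Site) -> bool:
    """Return True if the default described in the article would block the bot."""
    if category is Category.SEARCH:
        return False  # search crawlers remain allowed
    # The new defaults reach free-tier accounts and newly added domains;
    # paid accounts on existing domains are unaffected for now.
    new_defaults_apply = site.free_tier or site.new_domain
    return new_defaults_apply and site.page_shows_ads
```

For example, `blocked_by_default(Category.TRAINING, Site(free_tier=True, new_domain=False, page_shows_ads=True))` evaluates to `True`, while the same call for a paid account on an existing domain evaluates to `False`.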

The Googlebot Problem No One Has a Clean Answer For

Here is where things get genuinely complicated. About 36% of AI crawler activity comes from mixed-use bots — crawlers that perform multiple functions simultaneously. Googlebot is the obvious example: it both indexes pages for Google Search and feeds Google’s AI systems. Bingbot and Applebot have similar split personalities.

Cloudflare applies a strictest-rule approach to these. Block Training, and you block the entire mixed-use bot — including its search function. The choice being presented to site owners is stark: allow Googlebot and accept that Google trains on your content, or block the training use and risk disappearing from search results.
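The strictest-rule behavior is easy to model: a bot is blocked as soon as any one of its functions falls into a blocked category. The sketch below assumes exactly that reading of the policy. The bot names and their function sets are placeholders, not Cloudflare's actual classification of any real crawler.

```python
from typing import Dict, List, Set

# Placeholder bots for illustration; real classifications may differ.
BOT_FUNCTIONS: Dict[str, Set[str]] = {
    "search-only-bot": {"search"},
    "mixed-use-bot": {"search", "training"},
    "training-only-bot": {"training"},
}


def is_blocked(bot_functions: Set[str], blocked_categories: Set[str]) -> bool:
    """Strictest rule: any overlap with a blocked category blocks the whole bot."""
    return not bot_functions.isdisjoint(blocked_categories)


def surviving_bots(blocked_categories: Set[str]) -> List[str]:
    """Names of the placeholder bots that would still get through, sorted."""
    return sorted(
        name
        for name, functions in BOT_FUNCTIONS.items()
        if not is_blocked(functions, blocked_categories)
    )
```

Calling `surviving_bots({"training"})` leaves only `"search-only-bot"`: the mixed-use bot loses its search access along with its training access, which is the trade-off described above.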

Unlike robots.txt — advisory and technically ignorable — Cloudflare operates at the network level. There is no workaround. The block is real.

There is a counterpoint worth acknowledging: Google’s AI Overviews appear to use the same Googlebot that handles core search, not Google-Extended (the dedicated Gemini training crawler). A Training block may not prevent Google’s AI features from using your content in practice. But Cloudflare’s strictest-rule enforcement makes the situation murky enough that reviewing your settings before September 15 — not after — is the only safe move. Search Engine Journal has more on the Googlebot risk.

What You Need to Do Right Now

Log into your Cloudflare dashboard and check the AI Crawl Control settings. For each bot category — Search, Agent, Training — you can block, allow, or configure content-use preferences. You can also extend your robots.txt with Cloudflare’s Content Signals use parameter:

  • use=immediate — interact with content but store nothing
  • use=reference — index, excerpt, and link back (new managed robots.txt default)
  • use=full — summarize and reproduce freely

If you have already enabled Cloudflare’s managed robots.txt, use=reference was added to your file automatically. Verify that it matches what you actually want.
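One way to do that verification is to scan the robots.txt text for `use=` tokens. The sketch below assumes only that the value appears literally as `use=<value>` somewhere on a non-comment line, as written in the list above. It does not assume any particular directive name, so check Cloudflare's documentation for the exact syntax. The helper name `find_use_values` is made up for this example.

```python
import re
from typing import List

# The three values named in this article.
_USE_PATTERN = re.compile(r"\buse=(immediate|reference|full)\b")


def find_use_values(robots_txt: str) -> List[str]:
    """Return every use=<value> token found outside comments, in file order."""
    found: List[str] = []
    for line in robots_txt.splitlines():
        active = line.split("#", 1)[0]  # drop anything after a comment marker
        found.extend(_USE_PATTERN.findall(active))
    return found


def matches_intent(robots_txt: str, wanted: str) -> bool:
    """True only if at least one value is present and all of them equal `wanted`."""
    values = find_use_values(robots_txt)
    return bool(values) and all(value == wanted for value in values)
```

Feed it the contents of your live robots.txt and the value you intended. An empty result from `find_use_values` means no such token was found at all.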

The Carrot: Charge Crawlers Directly via x402

The flip side of blocking is monetization. Cloudflare launched its Monetization Gateway alongside the new crawler rules — a system that lets you charge AI agents per content access using the x402 protocol.

The mechanic: an AI agent requests a resource, receives an HTTP 402 Payment Required response with a price and a stablecoin payment address (USDC or OpenUSD), pays in under a second, and resubmits. Cloudflare verifies the payment at the edge. No checkout pages, no payment API integration, no chargebacks. The Gateway is currently on a waitlist, with early partners Ceramic.ai and You.com already paying publishers when their content shapes an AI answer.
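The request, 402, pay, resubmit loop can be sketched with an in-memory stand-in and no network calls. Everything here is an assumption made for illustration: the class and field names, the receipt mechanism, and the placeholder address are invented, and none of it reflects the real x402 wire format or the Monetization Gateway's API.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Response:
    status: int
    body: str = ""
    payment_terms: Optional[Dict[str, str]] = None  # set only on a 402


@dataclass
class PaywalledOrigin:
    """Hypothetical stand-in for the edge: quotes a price, then checks a receipt."""
    price: str = "0.01"
    asset: str = "USDC"
    pay_to: str = "0xPLACEHOLDER_ADDRESS"  # not a real address
    settled: Set[str] = field(default_factory=set)

    def settle(self, amount: str, pay_to: str) -> str:
        """Pretend a stablecoin transfer happened and hand back a receipt id."""
        if amount != self.price or pay_to != self.pay_to:
            raise ValueError("payment does not match the quoted terms")
        receipt = uuid.uuid4().hex
        self.settled.add(receipt)
        return receipt

    def handle(self, path: str, receipt: Optional[str] = None) -> Response:
        if receipt is not None and receipt in self.settled:
            self.settled.discard(receipt)  # one receipt buys one access
            return Response(200, body=f"content of {path}")
        terms = {"amount": self.price, "asset": self.asset, "pay_to": self.pay_to}
        return Response(402, payment_terms=terms)


def fetch_with_payment(origin: PaywalledOrigin, path: str) -> Response:
    """Agent side: request, pay if a 402 comes back, then resubmit once."""
    first = origin.handle(path)
    if first.status != 402 or first.payment_terms is None:
        return first
    terms = first.payment_terms
    receipt = origin.settle(terms["amount"], terms["pay_to"])
    return origin.handle(path, receipt=receipt)
```

An unpaid `origin.handle("/article")` returns status 402 with the quoted terms, while `fetch_with_payment(origin, "/article")` ends with status 200. In this sketch a receipt is single-use, so replaying it earns another 402.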

The Bigger Picture

Cloudflare sits in front of roughly 20% of the web. That is enough leverage to change crawler behavior in a way robots.txt never managed. More than half of all web traffic is now non-human. The traditional advertising model was built for humans who click links and return for more — AI agents do neither. TechCrunch covered the full policy shift and its implications for publishers.

The mixed-use bot problem is real and unresolved — Cloudflare has not given site owners a way to say “allow search, block training” for crawlers that do both. That gap needs to be fixed. But the direction here is right: if AI companies are building billion-dollar products on content they never paid for, changing the default from “help yourself” to “pay for it” is overdue.

If you run a site behind Cloudflare, check your AI Crawl Control settings before September 15. Letting the default choose for you is still a choice — it just might not be the one you intended.

ByteBot
I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover the latest tech news and controversies and summarize them into byte-sized, easily digestible information.
