
Cloudflare’s New Scorecard Reveals Most Websites Are Not Built for AI Agents

A new audit tool scores websites on their readiness for AI agents across discoverability, content format, access control, and protocol support. Data from 200,000 domains makes the picture clear: the web is structurally unprepared.

NervNow  ·  April 20, 2026

The web learned to serve browsers. Then it learned to serve search engine crawlers. Now AI agents are navigating the internet on behalf of users, and most websites are structurally unprepared for that interaction. Cloudflare quantified this gap on April 17 with the launch of isitagentready.com, a free tool that scores any website on how well it supports AI agents, and a companion dataset on Cloudflare Radar that tracks standards adoption across the internet’s most-visited domains.

The motivation is straightforward: as agents move from novelty to operational infrastructure across enterprise workflows, the websites those agents need to read, query, and transact with become the bottleneck. Cloudflare is betting that a public scoring system can do for agent readiness what Google Lighthouse did for web performance: accelerate adoption of the emerging standards that make agents work reliably at scale.

Cloudflare analyzed 200,000 of the most visited domains globally, filtering out redirects, ad servers, and tunneling services to focus on the businesses, publishers, and platforms that agents would realistically interact with. The data was unambiguous. While 78% of sites have a robots.txt file, the vast majority were written for traditional search crawlers, not AI agents. Only 4% of sites have declared their AI usage preferences using Content Signals, a standard that lets site owners specify whether their content can be used for AI training, inference, or search. Markdown content negotiation, which enables a server to return clean, token-efficient markdown instead of HTML when an agent requests it, passes on just 3.9% of sites. Newer standards such as MCP Server Cards and API Catalogs appear on fewer than 15 sites across the entire dataset. The Radar chart tracking these numbers will update weekly and is also accessible via the Radar API.
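Markdown content negotiation is the standard HTTP mechanism: the client states a preferred media type in the Accept header, and the server picks a representation. A minimal server-side sketch of that decision is below; the page content, helper name, and loose header parsing are illustrative, not Cloudflare's implementation (a production server would rank q-values properly).

```python
# Serve Markdown when an agent asks for it via "Accept: text/markdown",
# and HTML otherwise. Bodies here are toy examples.
HTML_BODY = "<html><body><h1>Pricing</h1><p>Plans start at $5.</p></body></html>"
MARKDOWN_BODY = "# Pricing\n\nPlans start at $5.\n"

def negotiate(accept_header: str) -> tuple[str, str]:
    """Return (content_type, body) based on the request's Accept header."""
    # Loose parsing: split media ranges, drop q-value parameters.
    accepted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    if "text/markdown" in accepted:
        return "text/markdown", MARKDOWN_BODY
    return "text/html", HTML_BODY

# A browser asking for HTML gets HTML; an agent asking for Markdown gets
# the smaller, token-efficient rendering of the same page.
print(negotiate("text/html")[0])           # text/html
print(negotiate("text/markdown, */*")[0])  # text/markdown
```

The point of the standard is that both representations live at the same URL, so agents need no special endpoint to find the cheaper one.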

“The transition from a human-read web to a machine-read web is the biggest architectural shift in decades.”

The scoring tool evaluates sites across four dimensions.

Discoverability covers whether the site has robots.txt, sitemap.xml, and HTTP Link headers (RFC 8288) that allow agents to surface resources directly from the HTTP response, without parsing HTML.

Content covers Markdown content negotiation, which can reduce token consumption by up to 80% compared to HTML, making agent responses faster, cheaper, and more complete given the context window limits most agent tools impose. The tool checks for proper Markdown handling by default; llms.txt support, a plain-text file providing agents a structured reading list of the site’s content, can be added to the scan optionally.

Bot access control covers Content Signals directives in robots.txt, AI-specific crawl rules, and Web Bot Auth, an IETF draft standard that lets well-behaved bots cryptographically sign their HTTP requests and be verified by receiving servers using public keys published at a /.well-known endpoint.

Protocol discovery, the most advanced dimension, covers API Catalog (RFC 9727) at /.well-known/api-catalog, MCP Server Cards at /.well-known/mcp/server-card.json, Agent Skills indexes at /.well-known/agent-skills/index.json, OAuth server discovery via RFC 9728 for sites requiring login, and WebMCP, a browser-native protocol enabling in-page agent tool exposure.
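The protocol-discovery checks reduce to probing a handful of well-known URLs. A sketch of that probe follows; the first three paths are the ones named in the article, the OAuth metadata path is the one RFC 9728 registers (an assumption here, since the article names only the RFC), and the function names and timeout are illustrative.

```python
# Probe a site's well-known protocol-discovery endpoints (HTTP 200 = present).
import urllib.request

WELL_KNOWN_PATHS = [
    "/.well-known/api-catalog",               # API Catalog (RFC 9727)
    "/.well-known/mcp/server-card.json",      # MCP Server Card
    "/.well-known/agent-skills/index.json",   # Agent Skills index
    "/.well-known/oauth-protected-resource",  # OAuth resource metadata (RFC 9728)
]

def discovery_urls(origin: str) -> list[str]:
    """Build the list of discovery URLs to probe for a given origin."""
    return [origin.rstrip("/") + path for path in WELL_KNOWN_PATHS]

def probe(origin: str, timeout: float = 5.0) -> dict[str, bool]:
    """Fetch each discovery URL; map it to True if it answers 200."""
    results = {}
    for url in discovery_urls(origin):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[url] = resp.status == 200
        except Exception:
            results[url] = False
    return results

print(discovery_urls("https://example.com")[0])
# https://example.com/.well-known/api-catalog
```

Because every path is fixed by convention, an agent can run this scan against any origin with no site-specific configuration, which is precisely what the well-known URI pattern is for.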

The MCP Server Card deserves particular attention for enterprise teams evaluating agentic tooling. It is a JSON file that describes a server’s tools, transport endpoint, and authentication requirements before an agent connects. An agent reads the card and has everything it needs to begin interacting with that server, without having to scrape a developer portal or parse documentation. Similarly, the OAuth discovery flow means that when an agent encounters a protected site, it can direct the user through a standard grant flow rather than requiring workarounds like handing agents full browser sessions with live cookies. Cloudflare Access now supports this OAuth flow as of Agents Week 2026.
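To make the "everything it needs before connecting" point concrete, here is what reading a server card might look like. The field names and values below are invented to match the article's description (tools, transport endpoint, authentication requirements); they are not the official schema.

```python
# Parse a hypothetical MCP Server Card and pull out what an agent needs
# before opening a connection: endpoint, auth method, and available tools.
import json

CARD_JSON = """
{
  "name": "example-docs",
  "endpoint": "https://example.com/mcp",
  "authentication": {"type": "oauth2"},
  "tools": [
    {"name": "search_docs", "description": "Full-text search over the docs"},
    {"name": "get_page", "description": "Fetch one page as Markdown"}
  ]
}
"""

card = json.loads(CARD_JSON)
print(card["endpoint"])                      # https://example.com/mcp
print(card["authentication"]["type"])        # oauth2
print([t["name"] for t in card["tools"]])    # ['search_docs', 'get_page']
```

One static file read replaces scraping a developer portal: the agent knows where to connect, how to authenticate, and which tools it will find once it does.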

The tool also checks for agentic commerce standards, specifically x402, Universal Commerce Protocol, and Agentic Commerce Protocol, though these do not currently count toward the score. x402 revives a long-unused HTTP 402 status code to enable machine-readable payment negotiation: an agent requests a resource, the server responds with 402 and payment terms, the agent pays and retries. The standard exists because web checkout flows were designed for humans, and agent-driven procurement tasks need a protocol-level equivalent. Cloudflare co-founded the x402 Foundation with Coinbase to drive adoption of the open standard.
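The request-402-pay-retry loop can be sketched as a toy exchange. Everything below — header name, proof format, payment terms — is invented for illustration; x402 defines the real wire format.

```python
# Toy model of the x402 flow: a request without payment gets HTTP 402 plus
# machine-readable terms; a retry carrying payment proof gets the resource.
def server(request_headers: dict):
    """Return (status, body): 402 with terms until payment proof is presented."""
    if request_headers.get("X-Payment") == "proof-of-payment":
        return 200, "the resource"
    return 402, {"amount": "0.01", "currency": "USD", "pay_to": "example"}

# First attempt: no payment attached, so the server quotes its terms.
status, terms = server({})
assert status == 402

# The agent settles the quoted terms out of band, then retries with proof.
status, body = server({"X-Payment": "proof-of-payment"})
print(status, body)  # 200 the resource
```

The negotiation happens entirely at the protocol level, which is what lets an agent complete a purchase without ever seeing a human checkout page.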

For each failing check, the tool generates a prompt that can be passed directly to a coding agent to implement the required fix. The isitagentready.com site is itself MCP-enabled, exposing a stateless MCP server at /.well-known/mcp.json so any compatible agent can run scans programmatically. Cloudflare also integrated the same checks into its existing URL Scanner under a new Agent Readiness tab, available via API by passing an agentReadiness flag in the scan request.

Alongside the tool launch, Cloudflare detailed how it rebuilt its own developer documentation to perform well on the criteria it had defined, and benchmarked the results. The documentation team addressed a core structural problem first: a single llms.txt covering 5,000-plus pages exceeds most agents’ immediate context windows, forcing iterative keyword searches that fragment context, accumulate thinking tokens, and slow response times. The fix was to generate a separate llms.txt for each top-level product directory, with the root file pointing to subdirectories, so an agent handling a specific product query can load the full relevant directory in one context window. The team also stripped roughly 450 directory-listing pages that provided no semantic content, and audited page titles, descriptions, and URL structures so agent metadata was accurate and actionable.
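The per-directory split described above is a simple grouping transform. A sketch under assumed inputs: the page paths, titles, and index format below are invented examples, not Cloudflare's actual docs structure.

```python
# Split one flat page list into per-directory llms.txt files, plus a root
# index ("/") that points agents at the subdirectory files.
from collections import defaultdict

PAGES = [
    ("/workers/get-started", "Get started with Workers"),
    ("/workers/bindings", "Workers bindings"),
    ("/r2/buckets", "R2 buckets"),
]

def build_llms_txt(pages):
    """Return {directory: llms.txt body}; key '/' holds the root index."""
    by_dir = defaultdict(list)
    for path, title in pages:
        top = "/" + path.strip("/").split("/")[0]   # top-level product dir
        by_dir[top].append(f"- [{title}]({path})")
    files = {d: "\n".join(lines) + "\n" for d, lines in by_dir.items()}
    files["/"] = "\n".join(f"- {d}/llms.txt" for d in sorted(by_dir)) + "\n"
    return files

files = build_llms_txt(PAGES)
print(sorted(files))  # ['/', '/r2', '/workers']
```

The payoff is that an agent answering a Workers question loads only the Workers index, which fits in one context window instead of forcing iterative searches over a 5,000-page list.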

The documentation team added several agent-specific layers on top of that structural work. Every HTML page in the docs now includes a hidden directive for LLMs, informing any agent reading the HTML version that a Markdown version exists and directing it to use the llms.txt or llms-full.txt endpoints. This directive is stripped from the Markdown version itself to prevent a recursive loop. For pages covering deprecated products such as Wrangler v1, Cloudflare implemented redirects for AI training crawlers, routing them away from outdated content to current documentation while keeping historical pages accessible to human readers. A dedicated LLM Resources entry in every product directory’s sidebar makes these resources discoverable to developers building agent-powered tools. The team also tested the documentation against afdocs, an open-source agent-friendly documentation specification, and built custom audit tooling on top of it to monitor quality over time.
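The hidden-directive pattern, with its anti-recursion strip, can be sketched in a few lines. The directive wording here is invented; Cloudflare's actual text and templating may differ.

```python
# HTML rendering carries an agent-facing comment pointing at the Markdown
# endpoints; the Markdown pipeline strips it so it never points at itself.
import re

DIRECTIVE = "<!-- LLM: a Markdown version of this page exists; prefer /llms.txt -->"

def render_html(body_html: str) -> str:
    """HTML rendering: prepend the agent-facing directive."""
    return DIRECTIVE + "\n" + body_html

def render_markdown(html_page: str) -> str:
    """Markdown rendering: remove the directive to avoid a recursive loop."""
    return re.sub(re.escape(DIRECTIVE) + r"\n?", "", html_page)

page = render_html("<p>Hello</p>")
print(DIRECTIVE in page)                   # True
print(DIRECTIVE in render_markdown(page))  # False
```

An HTML comment is invisible to human readers but fully visible to an agent parsing the raw page, which is what makes it a reasonable channel for this kind of redirect hint.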

The benchmark results from that rebuild were material. Testing with an agent pointed at Cloudflare’s documentation versus comparable technical documentation sites showed 31% fewer tokens consumed on average, and correct answers delivered 66% faster. For enterprise teams where agent queries run at volume, those margins translate directly into cost and latency at scale. The underlying mechanism is context window efficiency: when an agent can ingest a complete product directory and identify the exact page it needs in a single pass, it avoids the grep loop, the iterative narrowing, and the compounding inaccuracy that comes from never holding the full picture at once.

The broader implication is not only about Cloudflare’s infrastructure. Organizations deploying agents for internal knowledge retrieval, vendor research, procurement, or customer-facing products are downstream of these standards. The quality, speed, and cost of agent interactions depend substantially on how the websites those agents query are structured. How quickly that changes is now measurable, publicly, every week.

NN Desk
