State of AI Agent Policies - 2026

We crawled 1 million domains. 90% have no AI policy.

The first comprehensive study of how the web governs AI agents. We parsed 8 competing standards across the Tranco top 1M domains to map the permissions landscape that every AI agent navigates - mostly blindly.

999,316 domains analyzed · 8 standards parsed · Feb 12–22, 2026 · By Maango
90.1%
No machine-readable AI policy
4.8%
Explicitly block all AI agents
6.9%
Block GPTBot (most-blocked bot)
2.6%
Have comprehensive AI policies

01 - Key Findings

The web is not ready for AI agents.

Between February 12 and 22, 2026, we crawled 999,316 domains from the Tranco top 1M list, parsing every known AI policy standard: robots.txt AI directives, llms.txt, ai.txt, TDMRep, Cloudflare Content Signals, meta tags, and others. What we found is a governance vacuum.

9 in 10 domains have no machine-readable AI policy at all. Of the 10% that do express a stance, most rely solely on robots.txt - a 30-year-old protocol that was never designed for AI and has no concept of “training” vs. “search” vs. “summarization.”

Only 2.6% of all domains have what we'd consider a comprehensive policy - signals across multiple standards with clear rules for different AI use cases. The rest of the web is either silent or using blunt instruments.

Meanwhile, the stakes are rising. Anthropic has agreed to a $1.5B copyright settlement. The EU Copyright Directive makes TDM opt-out signals legally binding. OpenAI stopped honoring robots.txt for ChatGPT-User in December 2025. The gap between what AI agents are doing and what websites have said about it is enormous - and growing.

The core problem: There are 8 competing standards, near-zero interoperability, and almost no adoption of anything beyond robots.txt. AI agents are navigating a web that hasn't decided how to talk to them.

02 - Methodology

How we collected this data

Transparency matters. Every number in this report is derived from a direct crawl of the Tranco top 1M list - a research-grade domain ranking based on aggregated DNS usage data from Cloudflare Radar, Farsight DNSDB, and other sources. Unlike Alexa (discontinued 2022), Tranco is designed to resist manipulation and is used widely in academic security research.

Source List
Tranco Top 1M (Feb 2026)
Domains Crawled
999,316 of 1,000,000
Crawl Period
Feb 12–22, 2026
Failure Rate
0.07% (697 domains)
Standards Parsed
8 (robots.txt, llms.txt, ai.txt, TDMRep, Content Signals, meta tags, agents.json, UCP)

For each domain, we fetched and parsed every known AI policy signal, extracted per-bot directives from robots.txt, checked for llms.txt and ai.txt files, inspected HTTP headers and meta tags for Content Signals and TDM headers, and computed an openness score (0–100) and stance classification. All raw data is stored in Supabase and queryable through the Maango API.

What we did not do: We did not attempt to interpret Terms of Service documents (this is planned for a future crawl using LLM analysis). We did not crawl subdomains or subpages - each entry represents the root domain only. We did not contact domain owners or make editorial judgments about intent.

03 - Stance Distribution

What the web says about AI agents

We classified every domain into one of six categories based on the signals we found. The result is stark: the vast majority of the web says nothing at all.

AI Stance Distribution - All 999,316 Domains

No Policy (87.9%) + Wildcard Block (2.2%) = 90.1% of domains with no intentional AI-specific policy. Source: Maango crawl of Tranco Top 1M, Feb 2026

| Stance | Definition | Domains | Share |
| --- | --- | --- | --- |
| No Policy | No machine-readable AI-specific signal found | 878,376 | 87.9% |
| Blocks All AI | Explicitly targets known AI bots by user-agent name | 47,697 | 4.8% |
| Wildcard Block | Broad Disallow: * rules that block AI bots as a side effect, not by name | 22,360 | 2.2% |
| Selective | Allows some AI use cases, blocks others | 35,016 | 3.5% |
| Allows All | Explicitly permits all AI access | 8,643 | 0.9% |
| Blocks Training Only | Allows search/browsing but blocks training | 7,224 | 0.7% |

Defining “No Policy”: 58% of domains have a robots.txt file - so how can 88% have “no policy”? Because most robots.txt files contain only generic crawl rules (like Disallow: /admin/) that predate AI entirely. We classify a domain as having an AI policy only if it contains AI-specific signals: rules targeting known AI bots (GPTBot, ClaudeBot, etc.), AI-related meta tags, llms.txt, TDMRep headers, or Content Signals. A further 2.2% use broad wildcard rules that block all bots - including AI crawlers - as a side effect, bringing the combined total to 90%.
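The classification rule above can be sketched roughly as follows. The field names and bot list are illustrative, not the crawler's actual schema, and the "blocks training only" case (which needs a per-bot purpose mapping) is elided for brevity.

```python
# Known AI crawler user-agents (a small subset, for illustration).
AI_BOTS = {"gptbot", "claudebot", "google-extended", "ccbot", "applebot-extended"}

def classify_stance(signals: dict) -> str:
    """Map parsed per-domain signals to a stance category.

    `signals` is assumed to look like:
      {"ai_bot_rules": {"gptbot": "disallow", ...},  # robots.txt rules naming AI bots
       "wildcard_disallow_all": bool,                # Disallow under User-agent: *
       "has_llms_txt": bool, "has_tdmrep": bool, "has_content_signals": bool}
    """
    rules = signals.get("ai_bot_rules", {})
    has_ai_signal = bool(rules) or any(
        signals.get(k) for k in ("has_llms_txt", "has_tdmrep", "has_content_signals")
    )
    if not has_ai_signal:
        # Generic rules that happen to block everything are a side effect,
        # not an AI policy: Wildcard Block, not Blocks All AI.
        return "wildcard_block" if signals.get("wildcard_disallow_all") else "no_policy"
    verdicts = set(rules.values())
    if verdicts == {"disallow"} and rules.keys() >= AI_BOTS:
        return "blocks_all_ai"
    if verdicts == {"allow"}:
        return "allows_all"
    return "selective"
```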

The “No Policy” category deserves scrutiny. Having no AI-specific signals doesn't mean a domain wants to be crawled - it means they haven't expressed a preference in any machine-readable format an AI agent can interpret. In many cases, especially for smaller sites, the owner may not even be aware these standards exist.

The nuance gap: Only 4.4% of all domains have what you might call a “considered” AI policy - one that differentiates between use cases (selective) or explicitly opts in (allows all). The other 95.6% either say nothing or use a blanket block. The web is mostly binary: silent or locked.

04 - Signal Adoption

Which standards are actually being used?

There are 8 competing standards for expressing AI permissions. In practice, one dominates and the rest barely register.

Signal Adoption Across 999,316 Domains

Note: A domain may have multiple signals. "AI-specific robots.txt" means robots.txt with rules targeting known AI bots (GPTBot, ClaudeBot, etc.), distinct from generic robots.txt rules.

robots.txt remains the backbone. 58% of domains have a robots.txt file, but only 11.1% include rules specifically targeting AI bots. The rest have generic crawl directives that predate the AI era.
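For reference, an AI-specific robots.txt block looks like this. GPTBot and Google-Extended are user-agent tokens published by OpenAI and Google; the generic rule at the bottom is the kind that predates AI and does not count as an AI policy under our classification.

```text
# AI-specific: named AI crawlers are blocked outright.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Generic: a pre-AI crawl rule, not an AI policy.
User-agent: *
Disallow: /admin/
```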

llms.txt is the leading “new” standard at 3.24% adoption - roughly 32,400 domains. It's been adopted by major players including Adobe, Shopify, Stripe, Salesforce, Nvidia, and Dropbox. But llms.txt is an information file, not a permissions mechanism - it tells LLMs what content is available and how it's structured, not what they're allowed to do with it.

Cloudflare Content Signals reach 3.48% - but this warrants a caveat. Nearly all Content Signals domains use the identical pattern: search=yes, ai-train=no. This uniformity strongly suggests these are Cloudflare platform defaults applied automatically to customer sites, not deliberate policy choices by individual domain owners.
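Content Signals are expressed as an extra directive line inside robots.txt. The near-universal pattern we observed looks roughly like this (syntax per the Content Signals proposal; the exact surrounding rules vary by site):

```text
User-agent: *
Content-Signal: search=yes, ai-train=no
```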

TDMRep is virtually non-existent at 9 domains out of 1 million. This is significant because the EU Copyright Directive explicitly references TDM reservation as the mechanism for rights holders to opt out of AI training. The standard that EU law points to has near-zero adoption.
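For completeness, here is what a TDM reservation looks like in its simplest transport, HTTP response headers. This is a sketch based on the W3C TDM Reservation Protocol draft (which also allows a JSON file at /.well-known/tdmrep.json); the policy URL is a placeholder.

```text
HTTP/1.1 200 OK
tdm-reservation: 1
tdm-policy: https://example.com/tdm-policy.json
```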

Signal Adoption by Domain Popularity

| Signal | Top 1K | Top 10K | Top 100K | All 1M |
| --- | --- | --- | --- | --- |
| AI-specific robots.txt | 17.7% | 15.5% | 12.5% | 11.1% |
| llms.txt | 5.4% | 3.8% | 3.4% | 3.2% |
| Content Signals | 0.1% | 1.7% | 2.6% | 3.6% |
| ai.txt | 0.1% | 0.1% | 0.1% | 0.1% |
| TDMRep | 0.0% | 0.0% | 0.0% | 0.001% |

An interesting pattern emerges: the most popular sites are more likely to use robots.txt AI rules and llms.txt (they have the engineering resources), while Content Signals adoption increases further down the list (driven by Cloudflare's broad install base among smaller sites).

05 - The Blocking Landscape

GPTBot is the most blocked bot on the internet.

When domains do block AI agents, they rarely discriminate. But the order of blocking reveals market dynamics and publisher sentiment.

Top 10 Most-Blocked AI Bots

% of all 999,316 domains that explicitly disallow each bot in robots.txt

GPTBot leads at 6.9% - roughly 68,700 domains actively block OpenAI's crawler. ClaudeBot (Anthropic) follows at 6.1%, then Amazonbot (6.0%), Google-Extended (5.9%), and Applebot-Extended (~5.6%). Meta's bot appears under multiple user-agent strings across robots.txt files; counting Meta-ExternalAgent and its aliases together, 56,048 unique domains block Meta's crawler - placing it firmly in the top five.

The blocking is highly correlated. A precise 2×2 breakdown of GPTBot vs. ClaudeBot reveals: 58,791 domains block both, 9,888 block only GPTBot, 2,521 block only ClaudeBot, and 928,116 block neither. The asymmetry (more GPT-only than Claude-only) is partly explained by NULL handling in early robots.txt parsers that defaulted to disallowing unrecognised agents - but the dominant pattern is clear: blocking is overwhelmingly a blanket decision, not a targeted choice about specific companies.

ChatGPT-User vs. GPTBot: OpenAI uses two bots - GPTBot for training and ChatGPT-User for real-time browsing. GPTBot is blocked on 6.9% of domains; ChatGPT-User on only 2.9%. This gap of ~40,000 domains may represent sites that want to block training but not search - or, more likely, sites that haven't updated their robots.txt since ChatGPT-User was introduced.
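The protocol's bluntness is easy to demonstrate with Python's standard-library parser: each user-agent gets a bare yes/no per path, and the only way to say "browsing yes, training no" is to name the two bots separately, as in this sketch.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks OpenAI's training bot but allows its browsing bot.
robots = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots.splitlines())

print(parser.can_fetch("GPTBot", "https://example.com/article"))        # False
print(parser.can_fetch("ChatGPT-User", "https://example.com/article"))  # True
```

A site that wrote only the GPTBot rule before ChatGPT-User existed blocks training but silently permits browsing - which is one plausible reading of the ~40,000-domain gap above.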

Blocking Rates Among the Top 10K

The most popular sites block AI bots at roughly the same rates, with one exception: CCBot (Common Crawl) is blocked more heavily among top domains (7.4% vs 5.4% overall), likely because well-known publishers were early to block the training dataset that powered early LLMs.

06 - The Popularity Divide

Bigger sites are more restrictive.

We segmented domains by Tranco rank to see how AI policy adoption varies with site popularity.

AI Stance by Domain Popularity Tier

| Tier | No Policy | Blocks All | Selective | Allows All | Avg Openness |
| --- | --- | --- | --- | --- | --- |
| Top 1K | 84.0% | 8.7% | 4.7% | 0.6% | 34.0 |
| Top 10K | 86.4% | 6.4% | 4.7% | 0.9% | 34.8 |
| Top 100K | 89.7% | 5.1% | 3.2% | 0.8% | 35.5 |
| All 1M | 90.1% | 4.8% | 3.5% | 0.9% | 34.8 |

* “No Policy” in this table includes Wildcard Block domains (shown separately in the stance table above).

The top 1,000 domains block AI agents at 1.8x the overall rate (8.7% vs. 4.8%). They're also more likely to have selective policies (4.7% vs. 3.5%). This makes intuitive sense - high-value content sites have more to protect and more resources to configure policies.

Conversely, the long tail (ranks 100K–1M) is overwhelmingly silent. These are the sites where AI agents have the least guidance and the most latitude.

07 - Notable Domains

Who's blocking, who's open, and who's silent.

Top 100 Domains That Block All AI

facebook.com (#5), instagram.com (#12), twitter.com (#16), amazon.com (#24), netflix.com (#37), pinterest.com (#47), x.com (#58)

Social platforms and major consumer brands dominate the “blocks all” list. These companies have the most user-generated content and the most to lose from unauthorized AI training.

Top 100 Domains With Selective Policies

linkedin.com (#19), github.com (#31), tiktok.com (#64), yandex.ru (#82)

These sites differentiate between AI use cases - allowing some bots while blocking others, or permitting search access while blocking training crawlers.

Top 100 Domains With No Policy

google.com (#1), microsoft.com (#3), apple.com (#8), youtube.com (#9), wikipedia.org (#29)

Perhaps the most striking finding: the world's biggest websites - Google, YouTube, Microsoft, Apple, Wikipedia - have no AI-specific machine-readable policy. This doesn't mean they don't have AI-related terms in their ToS, but they haven't expressed those terms in any format that an AI agent can programmatically read and respect.

Wildcard Block Domains

reddit.com (#103)

A distinct category from “Blocks All AI”: these 22,360 domains use broad Disallow: * or blanket user-agent rules in robots.txt that happen to catch AI bots as a side effect - not because the owner explicitly targeted AI. Reddit is the highest-profile example: its robots.txt blocks most automated access broadly, but this predates AI crawler policy and cannot be read as a deliberate AI stance.

Most Restrictive Major Domains

The lowest openness scores (5-10 out of 100) among domains ranked in the top 10K are dominated by news publishers:

bbc.co.uk (#212), espn.com (#387), cnbc.com (#442), latimes.com (#777), nbcnews.com (#792), newsweek.com (#1,122)

Most Open Major Domains

The highest openness scores (95 out of 100) include cybersecurity vendors and platforms that benefit from broad AI indexing:

kaspersky.com (#150), sophos.com (#517), expedia.com (#2,273), lg.com (#2,621)

08 - Infrastructure Patterns

Your CDN and CMS shape your AI policy.

One of the more unexpected findings: the infrastructure a site runs on is a strong predictor of its AI stance.

CDN and AI Blocking

AI Blocking Rate by CDN Provider

% of domains on each CDN that block all AI agents

Cloudflare-hosted sites block AI agents at 11.3% - more than double the overall rate of 4.8%. This is almost certainly a product feature effect: Cloudflare launched a one-click “Block AI Bots” toggle in July 2025, making it trivially easy for site owners to opt out. When blocking is one click away, more people click.

At the other end, Vercel (1.3%), Akamai (1.2%), and Netlify (0.7%) sites are far more permissive. These platforms are developer-oriented and don't offer default AI blocking features. We verified that the Vercel and Cloudflare detection signals have only 40 domains of overlap, confirming that the CDN categories are effectively independent - the Cloudflare blocking spike is not an artefact of cross-CDN detection.

A note on CDN detection: We identify CDNs via HTTP response headers (Server, x-served-by, via, etc.) and CNAME records. This method has known limitations - Akamai in particular often does not expose identifying headers, which likely explains its low count (2,122) relative to its true market share.

CMS and AI Blocking

AI Blocking Rate and Avg Openness by CMS

| CMS | Domains | % Block All | Avg Openness |
| --- | --- | --- | --- |
| Shopify | 22,378 | 1.2% | 63.1 |
| Gatsby | 1,719 | 1.2% | 52.4 |
| Webflow | 10,251 | 2.6% | 57.5 |
| Wix | 2,053 | 3.2% | 21.6 |
| Squarespace | 1,758 | 3.4% | 21.9 |
| Drupal | 9,967 | 4.5% | 33.4 |
| Ghost | 3,201 | 4.8% | 40.1 |
| Hugo | 2,533 | 4.9% | 33.9 |
| Next.js | 29,618 | 5.4% | 42.9 |
| WordPress | 150,626 | 6.3% | 46.9 |

Shopify sites are the most open to AI (1.2% blocking, 63.1 average openness). This makes strategic sense - e-commerce sites want maximum discoverability, and AI-powered product search is a distribution channel, not a threat.

WordPress is the most restrictive CMS at 6.3% blocking, with Ghost (4.8%) close behind. Both power independent publishers and bloggers - the exact audience most concerned about AI content extraction.

Wix and Squarespace are anomalies with average openness scores of just 21.6 and 21.9 respectively - far below any other CMS. This likely reflects platform-level default settings rather than individual site owner decisions.

09 - Geographic Patterns

Where you are shapes how you feel about AI.

TLD analysis reveals significant geographic variation in AI policy adoption.

AI Blocking Rate by Country-Code TLD

Only TLDs with 100+ domains shown. Percentage = share of domains that block all AI agents.

The UK is among the most restrictive major markets - 4.4% of .uk domains block all AI agents, among the highest rates for major country-code TLDs. This aligns with the UK's active regulatory posture and its large publishing industry.

Japan and Iran are the most permissive at 1.1% and 0.9% respectively. Japanese domains are notably open, possibly reflecting a more AI-friendly regulatory environment and technology culture.

Russia is more open than Europe - .ru domains block at 1.2% vs. 3.2% for .de (Germany) and 4.1% for .fr (France). European domains are broadly more restrictive, likely influenced by the GDPR compliance culture and the EU Copyright Directive.

10 - Use Case Analysis

Training, search, and inference are blocked at similar rates.

We analyzed how domains treat three distinct AI use cases: training (building models on the content), search (indexing for retrieval), and inference (real-time reasoning over the content).

Policy Stance by AI Use Case

| Use Case | Blocked | Allowed | No Policy | Selective |
| --- | --- | --- | --- | --- |
| Training | 6.5% | 0.4% | 91.8% | 1.3% |
| Search | 6.1% | 0.8% | 91.6% | 1.5% |
| Inference | 6.6% | 0.4% | 93.0% | 0.0% |

The differences are smaller than you might expect. Training and inference are blocked at almost identical rates (6.5% vs 6.6%). Search is slightly less blocked (6.1%) and more often selectively allowed (1.5% vs 1.3%).

The nuance gap is real: Only 0.7% of domains (those classified as “blocks training only”) differentiate between training and other use cases. The remaining 99.3% either block everything, allow everything, or say nothing. The sophisticated, purpose-aware AI policy that regulators envision is essentially absent from the web.

11 - Conflicts Between Standards

6,317 domains contradict themselves.

When multiple standards exist, they don't always agree. We detected 6,317 domains (0.63% of all, but 5.2% of domains that have any signals) where different policy sources express conflicting positions.

This is a structural problem, not an edge case. A site might block GPTBot in robots.txt but set ai-train=no, search=yes in Content Signals - one says “no access,” the other says “search is fine.” Which does the agent follow?

Today, each agent resolves this differently - or ignores it. There is no standard for conflict resolution. This is exactly the kind of ambiguity that creates legal liability for agent developers and frustration for website owners.

Conflicts will grow. As newer standards gain adoption, the probability of contradictions increases. An agent that only checks robots.txt will make different decisions than one that also reads Content Signals. A neutral aggregation layer that detects and resolves these conflicts becomes more valuable with every new standard.

12 - Implications

What this means.

For AI Agent Developers

You are almost certainly not checking permissions comprehensively. If your agent only reads robots.txt, you're seeing 1 of 8 signals and missing conflicts, purpose-specific rules, and newer standards that may carry legal weight. The 90% of domains with no policy are a gray zone - absence of a signal is not the same as permission. As regulatory enforcement tightens, “we didn't check” becomes increasingly untenable.

For Website Owners

If you haven't configured AI-specific signals, AI agents are making their own decisions about your content. The tools exist - robots.txt AI directives, llms.txt, Content Signals - but adoption is minimal. The gap between the legal right to opt out (especially under the EU Copyright Directive) and the practical expression of that right is enormous. Most opt-out mechanisms that regulators point to have near-zero adoption.

For Regulators

The EU Copyright Directive's TDM reservation mechanism is adopted by 9 domains out of 1 million. The standard that EU law references as the opt-out mechanism for AI training is functionally non-existent. Any regulatory framework that relies on machine-readable signals needs to grapple with the fact that the web has not adopted them - and may not without significant intervention or simplification.

For the Industry

Eight competing standards is not a governance framework - it's a fragmentation problem. The web needs convergence, not more proposals. Until that happens, a neutral aggregation layer that reads everything and presents a unified view is not optional infrastructure - it's the only way to make the current patchwork work.

Look up any domain's AI policy.

Maango crawls 8 standards, detects conflicts, and serves unified AI permissions through a single API call.

Get API access to this data

Launching Q1 2026 - join the waitlist for early access and free credits.