API Reference
Base URL: https://api.maango.io
Beta: This API is in active development. Endpoints and response shapes may change. If something looks off, let us know at contact@maango.io.
All endpoints except /v1/auth/signup require an Authorization: Bearer YOUR_KEY header.
Quick Start
Get up and running in under a minute.
Get your API key
Request a key with curl:
curl -X POST https://api.maango.io/v1/auth/signup \
-H "Content-Type: application/json" \
-d '{"email": "you@example.com"}'
Make your first call
curl -H "Authorization: Bearer YOUR_KEY" \
https://api.maango.io/v1/domain/nytimes.com
Response:
{
"domain": "nytimes.com",
"found": true,
"stance": "blocks_all_ai",
"use_cases": {
"training": "blocked",
"search": "blocked",
"inference": "blocked"
},
"bots": { "blocked": ["GPTBot", "ClaudeBot", "...29 total"], "allowed": [] },
"signals": { "robots_txt": true, "llms_txt": false, "ai_txt": false, "tdmrep": false, "content_signals": false, "agents_json": false }
}
Integrate into your agent
import httpx

MAANGO_KEY = "maango_sk_xxxxx"

async def check_before_visit(url: str) -> bool:
    # Strip the scheme, the path, and a leading "www." to get the bare domain
    domain = url.split("//")[-1].split("/")[0].removeprefix("www.")
    # httpx.get is synchronous, so the awaited call goes through an AsyncClient
    async with httpx.AsyncClient() as client:
        r = await client.get(
            f"https://api.maango.io/v1/domain/{domain}",
            headers={"Authorization": f"Bearer {MAANGO_KEY}"}
        )
    policy = r.json()
    if not policy.get("found"):
        return True  # Unknown domain, proceed with caution
    if policy["stance"] == "blocks_all_ai":
        return False  # Blocked, don't visit
    return True
Endpoints
POST /v1/auth/signup
Get an API key. No authentication required.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| email | string | Yes | Your email address |
Example Request
curl -X POST https://api.maango.io/v1/auth/signup \
-H "Content-Type: application/json" \
-d '{"email": "developer@example.com"}'
Example Response
{
"api_key": "maango_sk_a1b2c3d4e5f6...",
"email": "developer@example.com",
"limits": {
"per_minute": 200,
"per_day": 10000,
"per_month": 100000
},
"message": "Store this key securely. It will not be shown again."
}
GET /v1/domain/{domain}
Check any domain's AI policy. This is the primary endpoint.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| domain | string | Yes | The domain to look up (e.g. nytimes.com) |
Example Request
curl https://api.maango.io/v1/domain/nytimes.com \
-H "Authorization: Bearer maango_sk_xxxxx"
Example Response
{
"domain": "nytimes.com",
"found": true,
"stance": "blocks_all_ai",
"detail": "blocks_all_ai",
"use_cases": {
"training": "blocked",
"search": "blocked",
"inference": "blocked"
},
"bots": {
"blocked": [
"CCBot", "GPTBot", "Scrapy", "YouBot", "BLEXBot",
"Diffbot", "Timpibot", "ClaudeBot", "Omgilibot",
"cohere-ai", "omgilibot", "Bytespider", "Claude-Web",
"Claude-User", "FacebookBot", "TurnitinBot", "ChatGPT-User",
"ImagesiftBot", "anthropic-ai", "DataForSeoBot",
"DuckAssistBot", "OAI-SearchBot", "PerplexityBot",
"magpie-crawler", "Google-Extended", "Claude-SearchBot",
"Applebot-Extended", "meta-externalagent",
"meta-externalfetcher"
],
"allowed": []
},
"signals": {
"robots_txt": true,
"llms_txt": false,
"ai_txt": false,
"tdmrep": false,
"content_signals": false,
"agents_json": false
},
"site": {
"tranco_rank": 155,
"cdn": "fastly",
"cms": "wordpress",
"behind_cloudflare": false,
"has_paywall": false,
"has_tos": true,
"tos_url": "https://nytimes.com/terms"
},
"last_updated": "2026-02-27T03:19:29Z"
}
Domain not found? If the domain isn't in the registry, you'll get "found": false, "queued": true. The domain is queued for analysis and will be available shortly. In your code, treat "found": false as "unknown" — either allow with caution or block depending on your use case.
Data freshness: The last_updated field shows when the domain was last crawled. Maango re-crawls the full dataset regularly. If a domain's policy is older than 30 days, consider rechecking — policies can change as publishers update their robots.txt or adopt new standards.
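The two notes above can be folded into one small check. This is a minimal sketch, not part of the API: the function name `assess_policy`, the labels it returns, and the choice to treat a missing timestamp as stale are all assumptions made for illustration. The 30-day threshold comes from the freshness note.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(days=30)  # threshold taken from the data freshness note


def assess_policy(policy: dict, now: Optional[datetime] = None) -> str:
    """Return "unknown", "stale", or "fresh" for a /v1/domain response body.

    The three labels are illustrative; the API does not define them.
    """
    if not policy.get("found"):
        # Covers the queued case: "found": false, "queued": true
        return "unknown"
    raw = policy.get("last_updated")
    if raw is None:
        # Assumption: a response without a timestamp is treated as stale
        return "stale"
    # Swap a trailing "Z" for an explicit offset so fromisoformat
    # also accepts the value on Python versions before 3.11
    updated = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return "stale" if now - updated > MAX_AGE else "fresh"
```

A caller might allow or block on "unknown" depending on its use case, and schedule another lookup when the result is "stale".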
Stance → Detail mapping
Each stance can have multiple detail values. Use stance for broad decisions and detail for fine-grained logic:
| Stance | Possible detail values |
|---|---|
| blocks_all_ai | blocks_all_ai, wildcard_block |
| blocks_training | blocks_training |
| selective | selective |
| allows_all | allows_all, has_llms_txt |
| no_policy | no_ai_rules, no_robots, has_signals, robots_blocked, unreachable |
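The mapping table can be carried into client code as a plain dictionary, which makes it easy to notice a stance/detail pair the client has not seen before. The sketch below copies the table as documented; the names `STANCE_DETAILS`, `is_known_pair`, and `stance_for_detail` are hypothetical helpers, not part of the API.

```python
from typing import Optional

# Copied from the stance-to-detail table; update if the API adds values
STANCE_DETAILS = {
    "blocks_all_ai": {"blocks_all_ai", "wildcard_block"},
    "blocks_training": {"blocks_training"},
    "selective": {"selective"},
    "allows_all": {"allows_all", "has_llms_txt"},
    "no_policy": {"no_ai_rules", "no_robots", "has_signals",
                  "robots_blocked", "unreachable"},
}


def is_known_pair(stance: str, detail: str) -> bool:
    """True if the detail value is listed under the given stance."""
    return detail in STANCE_DETAILS.get(stance, set())


def stance_for_detail(detail: str) -> Optional[str]:
    """Return the stance that lists this detail value, or None if unknown."""
    for stance, details in STANCE_DETAILS.items():
        if detail in details:
            return stance
    return None
```

Because the API is in beta, an unknown pair is better logged than treated as a hard failure.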
The detail field
A more granular classification than stance. Possible values:
| Value | Description |
|---|---|
| blocks_all_ai | Explicitly blocks all AI bots |
| blocks_training | Blocks training use specifically |
| selective | Allows some bots/use-cases, blocks others |
| allows_all | Explicitly allows AI access |
| no_ai_rules | Has robots.txt but no AI-specific directives |
| has_llms_txt | No AI bot rules but provides an llms.txt file |
| has_signals | No AI bot rules but has other signals (content signals, meta tags) |
| no_robots | No robots.txt file found at all |
| robots_blocked | robots.txt fetch was blocked (e.g. Cloudflare 403) |
| wildcard_block | Blocks all bots with a wildcard rule, not AI-specific |
| unreachable | Domain couldn't be reached during crawl |
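One way to use detail for fine-grained logic is to separate "the crawl did not get an answer" from "the site has no AI rules", since both fall under the no_policy stance. The grouping and the action labels below are assumptions made for illustration; the API does not prescribe them.

```python
# Hypothetical groupings of detail values; the API itself does not define them
CRAWL_FAILED = frozenset({"robots_blocked", "unreachable"})
NO_EXPLICIT_RULES = frozenset({"no_ai_rules", "no_robots", "has_signals"})


def interpret_detail(detail: str) -> str:
    """Map a detail value to a coarse action label (labels are illustrative)."""
    if detail in CRAWL_FAILED:
        return "recheck_later"   # the crawl could not read the policy
    if detail in NO_EXPLICIT_RULES:
        return "apply_default"   # nothing AI-specific to enforce
    if detail in ("blocks_all_ai", "wildcard_block"):
        return "do_not_visit"
    if detail in ("allows_all", "has_llms_txt"):
        return "visit"
    # blocks_training, selective, and any unrecognized value:
    # the decision depends on the use_cases field
    return "check_use_cases"
```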
GET /v1/domain/{domain}/full
Full raw crawl data for a domain. Returns all parsed JSONB fields, including raw robots.txt content, meta tags, headers, and ToS snippets.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| domain | string | Yes | The domain to look up |
Example Request
curl https://api.maango.io/v1/domain/nytimes.com/full \
-H "Authorization: Bearer maango_sk_xxxxx"
Example Response
{
"id": 3561,
"domain": "nytimes.com",
"robots_txt_exists": true,
"llms_txt_exists": false,
"ai_txt_exists": false,
"tdmrep_exists": false,
"agents_json_exists": false,
"ai_stance": "blocks_all_ai",
"policy_detail": "blocks_all_ai",
"ai_openness_score": 10.0,
"policy_completeness": "moderate",
"confidence": 0.7,
"ai_bots_blocked_count": 29,
"ai_bots_allowed_count": 0,
"ai_bots": {
"GPTBot": "blocked",
"ClaudeBot": "blocked",
"CCBot": "blocked"
// ... all 29 bots
},
"all_bots": {
"GPTBot": {
"status": "blocked",
"is_ai_bot": true,
"allowed_paths": [],
"disallowed_paths": ["/"]
}
// ... all bots including non-AI
},
"crawl_rules": {
"crawl_delays": {},
"blocked_paths": [],
"sitemaps_count": 10,
"directives_count": 325,
"has_wildcard_block": false
},
"use_case_policies": {
"training": "blocked",
"search": "blocked",
"inference": "blocked"
},
"conflicts": ["robots_blocks_ai_but_tos_allows_or_silent"],
"tos_url": "https://nytimes.com/terms",
"tos_ai_stance": "silent",
"cdn_provider": "fastly",
"cms": "wordpress",
"has_paywall": false,
"behind_cloudflare": false,
"markdown_for_agents": false,
"parsed_at": "2026-02-12T23:33:52.922662+00:00"
// ... additional fields
}
The confidence field (0.0–1.0) indicates how reliably Maango could determine the domain's AI policy. Low values (below 0.5) typically mean the site was partially unreachable, had ambiguous signals, or conflicting directives. Use it to decide whether to trust the stance or fall back to a default policy.
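A fallback along those lines might look like the sketch below. It reads the `ai_stance` and `confidence` fields shown in the /full example response; the function name `effective_stance`, the 0.5 default threshold, and the `no_policy` fallback are assumptions to adjust for your own use case.

```python
def effective_stance(record: dict,
                     min_confidence: float = 0.5,
                     default: str = "no_policy") -> str:
    """Return the crawled stance only when confidence meets the threshold.

    `record` is a parsed /v1/domain/{domain}/full response body.
    """
    confidence = record.get("confidence")
    if confidence is None or confidence < min_confidence:
        # Missing or low confidence: fall back to the caller's default policy
        return default
    return record.get("ai_stance", default)
```

A stricter agent could pass `default="blocks_all_ai"` so that uncertain results are treated as blocked.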
GET /v1/search
Search domains by name with an optional stance filter.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| q | string | Yes | Search query (min 2 characters) |
| stance | string | No | Filter by stance: blocks_all_ai, selective, allows_all, no_policy, blocks_training |
| limit | integer | No | Results per page, 1-100 (default: 20) |
| offset | integer | No | Pagination offset (default: 0) |
Example Request
curl "https://api.maango.io/v1/search?q=news&stance=blocks_all_ai&limit=5" \
-H "Authorization: Bearer maango_sk_xxxxx"
Example Response
{
"results": [
{
"domain": "news.com.au",
"stance": "blocks_all_ai",
"detail": "blocks_all_ai",
"tranco_rank": 842
}
],
"limit": 5,
"offset": 0,
"has_more": true
}
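The limit, offset, and has_more fields are enough to walk through every page of a search. The generator below is a sketch under two assumptions: that offset counts results rather than pages, and that the caller supplies a `fetch_page(limit, offset)` function (a hypothetical name) that performs the HTTP request and returns the parsed JSON body.

```python
from typing import Callable, Iterator


def iter_search_results(fetch_page: Callable[[int, int], dict],
                        limit: int = 20,
                        max_pages: int = 50) -> Iterator[dict]:
    """Yield every result by following has_more, up to max_pages requests."""
    offset = 0
    for _ in range(max_pages):
        page = fetch_page(limit, offset)
        results = page.get("results", [])
        yield from results
        # Stop when the API reports no further pages, or a page comes back empty
        if not page.get("has_more") or not results:
            return
        # Assumption: offset is measured in results, not pages
        offset += limit
```

In practice `fetch_page` would wrap a GET to /v1/search with the q, stance, limit, and offset parameters plus the Authorization header. The `max_pages` cap keeps a long result set from exhausting the rate limit.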
POST /v1/batch
Look up policies for up to 25 domains in a single request.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| domains | string[] | Yes | Array of 2-25 domains to look up |
Example Request
curl -X POST https://api.maango.io/v1/batch \
-H "Authorization: Bearer maango_sk_xxxxx" \
-H "Content-Type: application/json" \
-d '{"domains": ["nytimes.com", "github.com", "wikipedia.org"]}'
Example Response
{
"domains": [
{
"domain": "nytimes.com",
"found": true,
"stance": "blocks_all_ai",
"use_cases": {
"training": "blocked",
"search": "blocked",
"inference": "blocked"
},
"bots": {
"blocked": ["GPTBot", "ClaudeBot", "..."],
"allowed": []
}
},
{
"domain": "github.com",
"found": true,
"stance": "selective",
"use_cases": {
"training": "blocked",
"search": "allowed",
"inference": "allowed"
},
"bots": {
"blocked": ["CCBot", "GPTBot"],
"allowed": ["Googlebot"]
}
},
{
"domain": "wikipedia.org",
"found": true,
"stance": "allows_all",
"use_cases": {
"training": "allowed",
"search": "allowed",
"inference": "allowed"
},
"bots": {
"blocked": [],
"allowed": []
}
}
],
"not_found": []
}
Note: /v1/compare is a backward-compatible alias for this endpoint and works identically.
Batching more than 25 domains? Split into chunks and run concurrently:
import asyncio
import httpx

MAANGO_KEY = "maango_sk_xxxxx"

async def batch_lookup(domains: list[str], chunk_size: int = 25):
    results = []
    async with httpx.AsyncClient() as client:
        # Split the input into chunks of at most chunk_size domains.
        # The parameter table lists a minimum of 2 domains per request,
        # so a trailing single-domain chunk may need separate handling.
        chunks = [domains[i:i+chunk_size] for i in range(0, len(domains), chunk_size)]
        tasks = [
            client.post(
                "https://api.maango.io/v1/batch",
                headers={"Authorization": f"Bearer {MAANGO_KEY}", "Content-Type": "application/json"},
                json={"domains": chunk}
            )
            for chunk in chunks
        ]
        # Send all chunk requests concurrently
        responses = await asyncio.gather(*tasks)
        for r in responses:
            results.extend(r.json().get("domains", []))
    return results
Integration Examples
Python: Pre-flight check for a LangChain agent
Add Maango as a policy checker before every web fetch in a LangChain tool. The agent won't visit blocked domains.
import httpx
from langchain.tools import tool

MAANGO_KEY = "maango_sk_xxxxx"

async def is_domain_allowed(url: str) -> bool:
    """Check if the domain allows AI access before visiting."""
    domain = url.split("//")[-1].split("/")[0].removeprefix("www.")
    # httpx.get is synchronous, so the awaited call goes through an AsyncClient
    async with httpx.AsyncClient() as client:
        r = await client.get(
            f"https://api.maango.io/v1/domain/{domain}",
            headers={"Authorization": f"Bearer {MAANGO_KEY}"}
        )
    policy = r.json()
    if not policy.get("found"):
        return True  # Not in registry, proceed with caution
    # Block if the domain blocks all AI
    if policy["stance"] == "blocks_all_ai":
        return False
    # Check specific use case
    use_cases = policy.get("use_cases", {})
    if use_cases.get("inference") == "blocked":
        return False
    return True

@tool
async def fetch_webpage(url: str) -> str:
    """Fetch a webpage, respecting AI policies."""
    if not await is_domain_allowed(url):
        return f"Cannot access {url}: domain blocks AI agents."
    async with httpx.AsyncClient() as client:
        r = await client.get(url, follow_redirects=True)
        return r.text[:5000]
JavaScript: Check policy in a Node.js agent
A simple wrapper for any Node.js agent that needs to check domain policies.
const MAANGO_KEY = "maango_sk_xxxxx";

async function checkPolicy(url) {
  // Strip only a leading "www." from the hostname
  const domain = new URL(url).hostname.replace(/^www\./, "");
  const res = await fetch(
    `https://api.maango.io/v1/domain/${domain}`,
    { headers: { Authorization: `Bearer ${MAANGO_KEY}` } }
  );
  if (!res.ok) {
    console.warn(`Maango API error: ${res.status}`);
    return { allowed: true, reason: "api_error" };
  }
  const policy = await res.json();
  if (!policy.found) {
    return { allowed: true, reason: "not_in_registry" };
  }
  if (policy.stance === "blocks_all_ai") {
    return { allowed: false, reason: policy.stance, domain };
  }
  if (policy.use_cases?.inference === "blocked") {
    return { allowed: false, reason: "inference_blocked", domain };
  }
  return { allowed: true, reason: policy.stance, domain };
}

// Usage (top-level await requires an ES module)
const result = await checkPolicy("https://nytimes.com/article/...");
if (!result.allowed) {
  console.log(`Skipping ${result.domain}: ${result.reason}`);
} else {
  // Proceed with fetch
}
Python: Batch check domains from a list
Loop through a list of URLs, check each domain, and filter out blocked ones.
import httpx
import asyncio

MAANGO_KEY = "maango_sk_xxxxx"

async def batch_check(urls: list[str]) -> dict:
    """Check a list of URLs and categorize by policy."""
    allowed = []
    blocked = []
    unknown = []
    async with httpx.AsyncClient() as client:
        for url in urls:
            domain = url.split("//")[-1].split("/")[0].removeprefix("www.")
            try:
                r = await client.get(
                    f"https://api.maango.io/v1/domain/{domain}",
                    headers={"Authorization": f"Bearer {MAANGO_KEY}"}
                )
                policy = r.json()
                if not policy.get("found"):
                    unknown.append({"url": url, "domain": domain})
                elif policy["stance"] == "blocks_all_ai":
                    blocked.append({"url": url, "domain": domain, "stance": policy["stance"]})
                else:
                    allowed.append({"url": url, "domain": domain, "stance": policy["stance"]})
            except Exception as e:
                # Network and parsing failures are recorded as unknown
                unknown.append({"url": url, "domain": domain, "error": str(e)})
    return {"allowed": allowed, "blocked": blocked, "unknown": unknown}

# Usage
urls = [
    "https://nytimes.com/article/example",
    "https://github.com/some/repo",
    "https://wikipedia.org/wiki/AI",
    "https://reddit.com/r/tech",
]
results = asyncio.run(batch_check(urls))
print(f"Allowed: {len(results['allowed'])}")
print(f"Blocked: {len(results['blocked'])}")
print(f"Unknown: {len(results['unknown'])}")
curl: Quick domain lookup
For people who just want to check a domain from the terminal.
# Check a single domain
curl -s https://api.maango.io/v1/domain/nytimes.com \
-H "Authorization: Bearer maango_sk_xxxxx" | python3 -m json.tool
# Just get the stance
curl -s https://api.maango.io/v1/domain/nytimes.com \
-H "Authorization: Bearer maango_sk_xxxxx" | python3 -c '
# The script is wrapped in single quotes so the shell leaves the
# double quotes inside the Python code untouched
import json, sys
d = json.load(sys.stdin)
uc = d["use_cases"]
print(d["domain"] + ":", d["stance"])
print("  Training: ", uc["training"])
print("  Search:   ", uc["search"])
print("  Inference:", uc["inference"])
'
# Search for news sites that block AI
curl -s "https://api.maango.io/v1/search?q=news&stance=blocks_all_ai&limit=10" \
-H "Authorization: Bearer maango_sk_xxxxx" | python3 -m json.tool
Batch compare policies across competitors
Use the /v1/batch endpoint to see how different sites in the same industry handle AI (up to 25 domains per request).
import httpx
MAANGO_KEY = "maango_sk_xxxxx"
def compare_industry(domains: list[str]):
    """Compare AI policies across a set of competing domains."""
    r = httpx.post(
        "https://api.maango.io/v1/batch",
        headers={
            "Authorization": f"Bearer {MAANGO_KEY}",
            "Content-Type": "application/json"
        },
        json={"domains": domains}
    )
    data = r.json()
    print(f"{'Domain':<25} {'Stance':<20} {'Training':<12} {'Search':<12} {'Inference'}")
    print("-" * 80)
    for d in data["domains"]:
        uc = d.get("use_cases", {})
        print(f"{d['domain']:<25} {d['stance']:<20} {uc.get('training', '?'):<12} {uc.get('search', '?'):<12} {uc.get('inference', '?')}")
    if data.get("not_found"):
        print(f"\nNot in registry: {', '.join(data['not_found'])}")
# Compare major news sites
compare_industry([
"nytimes.com",
"washingtonpost.com",
"theguardian.com",
"bbc.com",
"reuters.com"
])
# Compare social media platforms
compare_industry([
"twitter.com",
"facebook.com",
"reddit.com",
"linkedin.com",
"tiktok.com"
])
Error Format
All errors return a consistent JSON format:
{
"error": "rate_limit_exceeded",
"message": "Rate limit exceeded. Try again in 23 seconds.",
"retry_after": 23,
"limit_type": "minute"
}
| Code | Error | Description |
|---|---|---|
| 400 | invalid_domain | The domain format is invalid |
| 400 | invalid_params | Request parameters are invalid |
| 401 | unauthorized | Missing or invalid API key |
| 429 | rate_limit_exceeded | Rate limit hit (includes retry_after) |
| 500 | internal_error | Server error |
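Since a 429 response carries retry_after, a client can wait that long and try again. The helper below is a sketch, not an official client: `call_with_retry` and its `send` callback are hypothetical names, and `send` is assumed to perform one request and return the HTTP status code together with the parsed JSON body.

```python
import time
from typing import Callable, Tuple


def call_with_retry(send: Callable[[], Tuple[int, dict]],
                    max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, dict]:
    """Call send() and retry on 429, waiting retry_after seconds each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        # Return on any non-429 status, or once the attempts are used up
        if status != 429 or attempt == max_attempts:
            return status, body
        # Assumption: wait 1 second if the body lacks retry_after
        sleep(float(body.get("retry_after", 1)))
    raise RuntimeError("unreachable")  # the loop always returns
```

The `sleep` parameter exists so tests can pass a stub instead of actually waiting. Other error codes are returned to the caller unchanged.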
Questions? Reach out to contact@maango.io