API Reference

Base URL: https://api.maango.io

Beta: This API is in active development. Endpoints and response shapes may change. If something looks off, let us know at contact@maango.io.

All endpoints except /v1/auth/signup require an Authorization: Bearer YOUR_KEY header, where YOUR_KEY is the API key returned at signup.

Quick Start

Get up and running in under a minute.

1. Get your API key

Request a key with curl:

curl -X POST https://api.maango.io/v1/auth/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "you@example.com"}'

2. Make your first call

curl -H "Authorization: Bearer YOUR_KEY" \
  https://api.maango.io/v1/domain/nytimes.com

Response:

{
  "domain": "nytimes.com",
  "found": true,
  "stance": "blocks_all_ai",
  "use_cases": {
    "training": "blocked",
    "search": "blocked",
    "inference": "blocked"
  },
  "bots": { "blocked": ["GPTBot", "ClaudeBot", "...29 total"], "allowed": [] },
  "signals": { "robots_txt": true, "llms_txt": false, "ai_txt": false, "tdmrep": false, "content_signals": false, "agents_json": false }
}

3. Integrate into your agent

import httpx

MAANGO_KEY = "maango_sk_xxxxx"

async def check_before_visit(url: str) -> bool:
    # Crude host extraction: drop the scheme and path, then a leading "www."
    domain = url.split("//")[-1].split("/")[0].removeprefix("www.")
    # httpx.get() is synchronous; awaiting a request needs an AsyncClient
    async with httpx.AsyncClient() as client:
        r = await client.get(
            f"https://api.maango.io/v1/domain/{domain}",
            headers={"Authorization": f"Bearer {MAANGO_KEY}"},
        )
    policy = r.json()
    if not policy.get("found"):
        return True  # Unknown domain, proceed with caution
    if policy["stance"] == "blocks_all_ai":
        return False  # Blocked, don't visit
    return True
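The string splitting used to derive the domain works for simple URLs but keeps ports and credentials in the result. A minimal sketch of a sturdier alternative using the standard library follows; `extract_domain` is a hypothetical helper name, not part of the Maango API.

```python
from urllib.parse import urlsplit


def extract_domain(url: str) -> str:
    """Return the lowercased host of a URL without a leading "www."."""
    # urlsplit only recognises the host when the URL contains "//".
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    return host.removeprefix("www.")


print(extract_domain("https://www.nytimes.com/section/world"))            # nytimes.com
print(extract_domain("https://user:secret@api.example.com:8443/v1?x=1"))  # api.example.com
print(extract_domain("example.org/path"))                                 # example.org
```

The result can be passed straight into the /v1/domain/{domain} path. `str.removeprefix` requires Python 3.9 or later.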

Endpoints

POST /v1/auth/signup

Get an API key. No authentication required.

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| email | string | Yes | Your email address |

Example Request

curl -X POST https://api.maango.io/v1/auth/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "developer@example.com"}'

Example Response

{
  "api_key": "maango_sk_a1b2c3d4e5f6...",
  "email": "developer@example.com",
  "limits": {
    "per_minute": 200,
    "per_day": 10000,
    "per_month": 100000
  },
  "message": "Store this key securely. It will not be shown again."
}

Errors

400: Invalid email format
429: Too many signups from this IP (limit: 3 per day)

GET /v1/domain/{domain}

Look up a domain's AI policy. This is the primary endpoint.

Requires authentication

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| domain | string | Yes | The domain to look up (e.g. nytimes.com) |

Example Request

curl https://api.maango.io/v1/domain/nytimes.com \
  -H "Authorization: Bearer maango_sk_xxxxx"

Example Response

{
  "domain": "nytimes.com",
  "found": true,
  "stance": "blocks_all_ai",
  "detail": "blocks_all_ai",
  "use_cases": {
    "training": "blocked",
    "search": "blocked",
    "inference": "blocked"
  },
  "bots": {
    "blocked": [
      "CCBot", "GPTBot", "Scrapy", "YouBot", "BLEXBot",
      "Diffbot", "Timpibot", "ClaudeBot", "Omgilibot",
      "cohere-ai", "omgilibot", "Bytespider", "Claude-Web",
      "Claude-User", "FacebookBot", "TurnitinBot", "ChatGPT-User",
      "ImagesiftBot", "anthropic-ai", "DataForSeoBot",
      "DuckAssistBot", "OAI-SearchBot", "PerplexityBot",
      "magpie-crawler", "Google-Extended", "Claude-SearchBot",
      "Applebot-Extended", "meta-externalagent",
      "meta-externalfetcher"
    ],
    "allowed": []
  },
  "signals": {
    "robots_txt": true,
    "llms_txt": false,
    "ai_txt": false,
    "tdmrep": false,
    "content_signals": false,
    "agents_json": false
  },
  "site": {
    "tranco_rank": 155,
    "cdn": "fastly",
    "cms": "wordpress",
    "behind_cloudflare": false,
    "has_paywall": false,
    "has_tos": true,
    "tos_url": "https://nytimes.com/terms"
  },
  "last_updated": "2026-02-27T03:19:29Z"
}

Errors

400: Invalid domain format
401: Missing or invalid API key
429: Rate limit exceeded

Domain not found? If the domain isn't in the registry, you'll get "found": false, "queued": true. The domain is queued for analysis and will be available shortly. In your code, treat "found": false as "unknown" — either allow with caution or block depending on your use case.
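One way to make the "unknown" handling explicit is a small helper that takes the parsed response and a caller-chosen default. This is a minimal sketch that assumes the response shape shown above; `decide_access` and its `allow_unknown` parameter are hypothetical names, not part of the API.

```python
def decide_access(policy: dict, allow_unknown: bool = True) -> bool:
    """Return True if the agent may visit, given a /v1/domain response."""
    if not policy.get("found"):
        # The domain is not in the registry yet (it may be queued for analysis),
        # so the caller decides whether "unknown" means allow or block.
        return allow_unknown
    return policy.get("stance") != "blocks_all_ai"


# Hypothetical responses, shaped like the examples in this reference.
unknown = {"domain": "example.org", "found": False, "queued": True}
blocked = {"domain": "nytimes.com", "found": True, "stance": "blocks_all_ai"}

print(decide_access(unknown))                       # True  (permissive default)
print(decide_access(unknown, allow_unknown=False))  # False (strict default)
print(decide_access(blocked))                       # False
```

A crawler that must never touch an unvetted site would pass `allow_unknown=False`; an interactive agent might keep the permissive default and recheck later, once the queued analysis has finished.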

Data freshness: The last_updated field shows when the domain was last crawled. Maango re-crawls the full dataset regularly. If a domain's policy is older than 30 days, consider rechecking — policies can change as publishers update their robots.txt or adopt new standards.
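The 30-day guideline can be checked directly against last_updated. The sketch below assumes the timestamp is ISO 8601 in UTC, as in the example response; `is_stale` and `max_age_days` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(last_updated: str, now: Optional[datetime] = None,
             max_age_days: int = 30) -> bool:
    """Return True if the policy was crawled more than max_age_days ago."""
    # Replace a trailing "Z" so fromisoformat also accepts it before Python 3.11.
    crawled = datetime.fromisoformat(last_updated.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return now - crawled > timedelta(days=max_age_days)


# Fixed reference times keep the example deterministic.
stamp = "2026-02-27T03:19:29Z"
print(is_stale(stamp, now=datetime(2026, 3, 10, tzinfo=timezone.utc)))  # False
print(is_stale(stamp, now=datetime(2026, 4, 15, tzinfo=timezone.utc)))  # True
```

When the check returns True, a caller might look the domain up again before relying on a cached copy of the policy.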

Stance → Detail mapping

Each stance can have multiple detail values. Use stance for broad decisions and detail for fine-grained logic:

| Stance | Possible detail values |
| --- | --- |
| blocks_all_ai | blocks_all_ai, wildcard_block |
| blocks_training | blocks_training |
| selective | selective |
| allows_all | allows_all, has_llms_txt |
| no_policy | no_ai_rules, no_robots, has_signals, robots_blocked, unreachable |
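The mapping can be kept in code as a lookup table, so that detail values can be validated or traced back to their stance. The sketch below copies the values from the mapping above; the helper names `stance_for_detail` and `is_ai_specific_block` are hypothetical.

```python
from typing import Optional

# Stance -> possible detail values, as listed in this reference.
DETAILS_BY_STANCE = {
    "blocks_all_ai": {"blocks_all_ai", "wildcard_block"},
    "blocks_training": {"blocks_training"},
    "selective": {"selective"},
    "allows_all": {"allows_all", "has_llms_txt"},
    "no_policy": {"no_ai_rules", "no_robots", "has_signals",
                  "robots_blocked", "unreachable"},
}


def stance_for_detail(detail: str) -> Optional[str]:
    """Return the stance a detail value belongs to, or None if unrecognised."""
    for stance, details in DETAILS_BY_STANCE.items():
        if detail in details:
            return stance
    return None


def is_ai_specific_block(policy: dict) -> bool:
    """True only when the block targets AI bots rather than all bots."""
    return (policy.get("stance") == "blocks_all_ai"
            and policy.get("detail") == "blocks_all_ai")


print(stance_for_detail("wildcard_block"))  # blocks_all_ai
print(is_ai_specific_block({"stance": "blocks_all_ai", "detail": "wildcard_block"}))  # False
```

Because the API is in beta, new detail values may appear; returning None for unrecognised values lets callers fall back to the stance alone.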

The detail field

A more granular classification than stance. Possible values:

| Value | Description |
| --- | --- |
| blocks_all_ai | Explicitly blocks all AI bots |
| blocks_training | Blocks training use specifically |
| selective | Allows some bots/use-cases, blocks others |
| allows_all | Explicitly allows AI access |
| no_ai_rules | Has robots.txt but no AI-specific directives |
| has_llms_txt | No AI bot rules but provides an llms.txt file |
| has_signals | No AI bot rules but has other signals (content signals, meta tags) |
| no_robots | No robots.txt file found at all |
| robots_blocked | robots.txt fetch was blocked (e.g. Cloudflare 403) |
| wildcard_block | Blocks all bots with a wildcard rule, not AI-specific |
| unreachable | Domain couldn't be reached during crawl |

GET /v1/domain/{domain}/full

Full raw crawl data for a domain. Returns every stored field with its JSONB content parsed, including the raw robots.txt content, meta tags, headers, and ToS snippets.

Requires authentication

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| domain | string | Yes | The domain to look up |

Example Request

curl https://api.maango.io/v1/domain/nytimes.com/full \
  -H "Authorization: Bearer maango_sk_xxxxx"

Example Response

{
  "id": 3561,
  "domain": "nytimes.com",
  "robots_txt_exists": true,
  "llms_txt_exists": false,
  "ai_txt_exists": false,
  "tdmrep_exists": false,
  "agents_json_exists": false,
  "ai_stance": "blocks_all_ai",
  "policy_detail": "blocks_all_ai",
  "ai_openness_score": 10.0,
  "policy_completeness": "moderate",
  "confidence": 0.7,
  "ai_bots_blocked_count": 29,
  "ai_bots_allowed_count": 0,
  "ai_bots": {
    "GPTBot": "blocked",
    "ClaudeBot": "blocked",
    "CCBot": "blocked"
    // ... all 29 bots
  },
  "all_bots": {
    "GPTBot": {
      "status": "blocked",
      "is_ai_bot": true,
      "allowed_paths": [],
      "disallowed_paths": ["/"]
    }
    // ... all bots including non-AI
  },
  "crawl_rules": {
    "crawl_delays": {},
    "blocked_paths": [],
    "sitemaps_count": 10,
    "directives_count": 325,
    "has_wildcard_block": false
  },
  "use_case_policies": {
    "training": "blocked",
    "search": "blocked",
    "inference": "blocked"
  },
  "conflicts": ["robots_blocks_ai_but_tos_allows_or_silent"],
  "tos_url": "https://nytimes.com/terms",
  "tos_ai_stance": "silent",
  "cdn_provider": "fastly",
  "cms": "wordpress",
  "has_paywall": false,
  "behind_cloudflare": false,
  "markdown_for_agents": false,
  "parsed_at": "2026-02-12T23:33:52.922662+00:00"
  // ... additional fields
}

Errors

400: Invalid domain format
401: Missing or invalid API key
429: Rate limit exceeded

The confidence field (0.0–1.0) indicates how reliably Maango could determine the domain's AI policy. Low values (below 0.5) typically mean the site was partially unreachable, had ambiguous signals, or conflicting directives. Use it to decide whether to trust the stance or fall back to a default policy.
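A sketch of that fallback, assuming the /full response fields shown above (ai_stance and confidence); `effective_stance`, `min_confidence`, and the choice of "no_policy" as the default are hypothetical, so adjust them to your use case.

```python
def effective_stance(record: dict, min_confidence: float = 0.5,
                     default: str = "no_policy") -> str:
    """Return ai_stance when confidence is high enough, else the default."""
    confidence = record.get("confidence")
    if confidence is None or confidence < min_confidence:
        # Too uncertain to trust the crawl result; use the caller's default.
        return default
    return record.get("ai_stance", default)


print(effective_stance({"ai_stance": "blocks_all_ai", "confidence": 0.7}))  # blocks_all_ai
print(effective_stance({"ai_stance": "allows_all", "confidence": 0.3}))     # no_policy
```

A stricter caller could pass `default="blocks_all_ai"` so that low-confidence results are treated as blocked.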

POST /v1/batch

Look up policies for up to 25 domains in a single request.

Requires authentication

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| domains | string[] | Yes | Array of 2-25 domains to look up |

Example Request

curl -X POST https://api.maango.io/v1/batch \
  -H "Authorization: Bearer maango_sk_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{"domains": ["nytimes.com", "github.com", "wikipedia.org"]}'

Example Response

{
  "domains": [
    {
      "domain": "nytimes.com",
      "found": true,
      "stance": "blocks_all_ai",
      "use_cases": {
        "training": "blocked",
        "search": "blocked",
        "inference": "blocked"
      },
      "bots": {
        "blocked": ["GPTBot", "ClaudeBot", "..."],
        "allowed": []
      }
    },
    {
      "domain": "github.com",
      "found": true,
      "stance": "selective",
      "use_cases": {
        "training": "blocked",
        "search": "allowed",
        "inference": "allowed"
      },
      "bots": {
        "blocked": ["CCBot", "GPTBot"],
        "allowed": ["Googlebot"]
      }
    },
    {
      "domain": "wikipedia.org",
      "found": true,
      "stance": "allows_all",
      "use_cases": {
        "training": "allowed",
        "search": "allowed",
        "inference": "allowed"
      },
      "bots": {
        "blocked": [],
        "allowed": []
      }
    }
  ],
  "not_found": []
}

Errors

400: Fewer than 2 or more than 25 domains provided
401: Missing or invalid API key
429: Rate limit exceeded

Note: /v1/compare is a backward-compatible alias for this endpoint and works identically.
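Once a batch response is parsed, grouping the domains by stance is often the next step. The sketch below assumes the response shape shown above and that not_found is a list of domain strings (an inference from the examples, not a documented guarantee); `group_by_stance` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_stance(batch_response: dict) -> dict:
    """Group domain names from a /v1/batch response by their stance."""
    groups = defaultdict(list)
    for entry in batch_response.get("domains", []):
        key = entry["stance"] if entry.get("found") else "unknown"
        groups[key].append(entry["domain"])
    # Assumption: not_found holds plain domain strings.
    for domain in batch_response.get("not_found", []):
        groups["unknown"].append(domain)
    return dict(groups)


# Hypothetical response, trimmed to the fields this helper reads.
sample = {
    "domains": [
        {"domain": "nytimes.com", "found": True, "stance": "blocks_all_ai"},
        {"domain": "github.com", "found": True, "stance": "selective"},
    ],
    "not_found": ["unlisted.example"],
}
print(group_by_stance(sample))
```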

Batching more than 25 domains? Split into chunks and run concurrently:

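import asyncio
import httpx

MAANGO_KEY = "maango_sk_xxxxx"

async def batch_lookup(domains: list[str], chunk_size: int = 25):
    results = []
    chunks = [domains[i:i+chunk_size] for i in range(0, len(domains), chunk_size)]
    # The endpoint requires 2-25 domains per request, so a trailing chunk of one
    # borrows a domain from the chunk before it (assumes chunk_size >= 3).
    if len(chunks) > 1 and len(chunks[-1]) == 1:
        chunks[-1].insert(0, chunks[-2].pop())
    async with httpx.AsyncClient() as client:
        tasks = [
            client.post(
                "https://api.maango.io/v1/batch",
                headers={"Authorization": f"Bearer {MAANGO_KEY}", "Content-Type": "application/json"},
                json={"domains": chunk}
            )
            for chunk in chunks
        ]
        # Every concurrent request counts against the rate limit
        responses = await asyncio.gather(*tasks)
        for r in responses:
            results.extend(r.json().get("domains", []))
    return results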

Integration Examples

Python: Pre-flight check for a LangChain agent

Add Maango as a policy checker before every web fetch in a LangChain tool. The agent won't visit blocked domains.

import httpx
from langchain.tools import tool

MAANGO_KEY = "maango_sk_xxxxx"

async def is_domain_allowed(url: str) -> bool:
    """Check if the domain allows AI access before visiting."""
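    domain = url.split("//")[-1].split("/")[0].removeprefix("www.")
    # httpx.get() is synchronous; use an AsyncClient so the request can be awaited
    async with httpx.AsyncClient() as client:
        r = await client.get(
            f"https://api.maango.io/v1/domain/{domain}",
            headers={"Authorization": f"Bearer {MAANGO_KEY}"},
        )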
    policy = r.json()

    if not policy.get("found"):
        return True  # Not in registry, proceed with caution

    # Block if the domain blocks all AI
    if policy["stance"] == "blocks_all_ai":
        return False

    # Check specific use case
    use_cases = policy.get("use_cases", {})
    if use_cases.get("inference") == "blocked":
        return False

    return True

@tool
async def fetch_webpage(url: str) -> str:
    """Fetch a webpage, respecting AI policies."""
    if not await is_domain_allowed(url):
        return f"Cannot access {url}: domain blocks AI agents."

    async with httpx.AsyncClient() as client:
        r = await client.get(url, follow_redirects=True)
        return r.text[:5000]

JavaScript: Check policy in a Node.js agent

A simple wrapper for any Node.js agent that needs to check domain policies.

const MAANGO_KEY = "maango_sk_xxxxx";

async function checkPolicy(url) {
  // Strip only a leading "www." rather than the first match anywhere in the host
  const domain = new URL(url).hostname.replace(/^www\./, "");

  const res = await fetch(
    `https://api.maango.io/v1/domain/${domain}`,
    { headers: { Authorization: `Bearer ${MAANGO_KEY}` } }
  );

  if (!res.ok) {
    console.warn(`Maango API error: ${res.status}`);
    return { allowed: true, reason: "api_error" };
  }

  const policy = await res.json();

  if (!policy.found) {
    return { allowed: true, reason: "not_in_registry" };
  }

  if (policy.stance === "blocks_all_ai") {
    return { allowed: false, reason: policy.stance, domain };
  }

  if (policy.use_cases?.inference === "blocked") {
    return { allowed: false, reason: "inference_blocked", domain };
  }

  return { allowed: true, reason: policy.stance, domain };
}

// Usage
const result = await checkPolicy("https://nytimes.com/article/...");
if (!result.allowed) {
  console.log(`Skipping ${result.domain}: ${result.reason}`);
} else {
  // Proceed with fetch
}

Python: Batch check domains from a list

Loop through a list of URLs, check each domain, and filter out blocked ones.

import httpx
import asyncio

MAANGO_KEY = "maango_sk_xxxxx"

async def batch_check(urls: list[str]) -> dict:
    """Check a list of URLs and categorize by policy."""
    allowed = []
    blocked = []
    unknown = []

    async with httpx.AsyncClient() as client:
        for url in urls:
            domain = url.split("//")[-1].split("/")[0].removeprefix("www.")
            try:
                r = await client.get(
                    f"https://api.maango.io/v1/domain/{domain}",
                    headers={"Authorization": f"Bearer {MAANGO_KEY}"}
                )
                policy = r.json()

                if not policy.get("found"):
                    unknown.append({"url": url, "domain": domain})
                elif policy["stance"] == "blocks_all_ai":
                    blocked.append({"url": url, "domain": domain, "stance": policy["stance"]})
                else:
                    allowed.append({"url": url, "domain": domain, "stance": policy["stance"]})
            except Exception as e:
                unknown.append({"url": url, "domain": domain, "error": str(e)})

    return {"allowed": allowed, "blocked": blocked, "unknown": unknown}

# Usage
urls = [
    "https://nytimes.com/article/example",
    "https://github.com/some/repo",
    "https://wikipedia.org/wiki/AI",
    "https://reddit.com/r/tech",
]

results = asyncio.run(batch_check(urls))
print(f"Allowed: {len(results['allowed'])}")
print(f"Blocked: {len(results['blocked'])}")
print(f"Unknown: {len(results['unknown'])}")

curl: Quick domain lookup

For people who just want to check a domain from the terminal.

# Check a single domain
curl -s https://api.maango.io/v1/domain/nytimes.com \
  -H "Authorization: Bearer maango_sk_xxxxx" | python3 -m json.tool

# Just get the stance
curl -s https://api.maango.io/v1/domain/nytimes.com \
  -H "Authorization: Bearer maango_sk_xxxxx" | python3 -c '
import json, sys
d = json.load(sys.stdin)
uc = d["use_cases"]
print(d["domain"] + ": " + d["stance"])
print("  Training:  " + uc["training"])
print("  Search:    " + uc["search"])
print("  Inference: " + uc["inference"])
'

# Search for news sites that block AI
curl -s "https://api.maango.io/v1/search?q=news&stance=blocks_all_ai&limit=10" \
  -H "Authorization: Bearer maango_sk_xxxxx" | python3 -m json.tool

Batch compare policies across competitors

Use the /v1/batch endpoint to see how different sites in the same industry handle AI (up to 25 domains per request).

import httpx

MAANGO_KEY = "maango_sk_xxxxx"

def compare_industry(domains: list[str]):
    """Compare AI policies across a set of competing domains."""
    r = httpx.post(
        "https://api.maango.io/v1/batch",
        headers={
            "Authorization": f"Bearer {MAANGO_KEY}",
            "Content-Type": "application/json"
        },
        json={"domains": domains}
    )
    data = r.json()

    print(f"{'Domain':<25} {'Stance':<20} {'Training':<12} {'Search':<12} {'Inference'}")
    print("-" * 80)

    for d in data["domains"]:
        uc = d.get("use_cases", {})
        print(f"{d['domain']:<25} {d['stance']:<20} {uc.get('training', '?'):<12} {uc.get('search', '?'):<12} {uc.get('inference', '?')}")

    if data.get("not_found"):
        print(f"\nNot in registry: {', '.join(data['not_found'])}")

# Compare major news sites
compare_industry([
    "nytimes.com",
    "washingtonpost.com",
    "theguardian.com",
    "bbc.com",
    "reuters.com"
])

# Compare social media platforms
compare_industry([
    "twitter.com",
    "facebook.com",
    "reddit.com",
    "linkedin.com",
    "tiktok.com"
])

Error Format

All errors return a consistent JSON format:

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Try again in 23 seconds.",
  "retry_after": 23,
  "limit_type": "minute"
}

| Code | Error | Description |
| --- | --- | --- |
| 400 | invalid_domain | The domain format is invalid |
| 400 | invalid_params | Request parameters are invalid |
| 401 | unauthorized | Missing or invalid API key |
| 429 | rate_limit_exceeded | Rate limit hit (includes retry_after) |
| 500 | internal_error | Server error |
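Since a 429 response carries retry_after, a client can wait and retry rather than fail. The sketch below is transport-agnostic: `call_with_retry` is a hypothetical helper that takes any zero-argument `send` callable returning `(status_code, parsed_body)`, and the one-second fallback when retry_after is absent is an assumption.

```python
import time
from typing import Callable


def call_with_retry(send: Callable[[], tuple[int, dict]],
                    max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> tuple[int, dict]:
    """Call send(), waiting retry_after seconds between rate-limited attempts."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        # Assumption: fall back to 1 second if retry_after is missing.
        sleep(body.get("retry_after", 1))
    raise ValueError("max_attempts must be at least 1")


# Demo with canned responses and a recorded sleep, so nothing hits the network.
canned = iter([
    (429, {"error": "rate_limit_exceeded", "retry_after": 23, "limit_type": "minute"}),
    (200, {"domain": "nytimes.com", "found": True}),
])
waits = []
status, body = call_with_retry(lambda: next(canned), sleep=waits.append)
print(status, waits)  # 200 [23]
```

In real use, `send` could wrap an httpx request and return `(response.status_code, response.json())`.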

Questions? Reach out to contact@maango.io