AI assistants now act like your most important site visitors. When someone asks a question in ChatGPT, Perplexity, Gemini, or Meta AI, those platforms dispatch bots to retrieve, parse, and cite content in real time. If they can’t access or understand your pages, you won’t be cited—and you’ll miss high‑intent demand.
This guide explains which AI user agents to allow, which to consider blocking, and the practical steps to make your site machine‑readable and citable in AI answers. It also includes a checklist you can use with your dev team and features to look for in an AI visibility tool.
| Platform | robots.txt identifier | Example User-Agent header |
|---|---|---|
| ChatGPT | ChatGPT-User | Mozilla/5.0 ...; compatible; ChatGPT-User/1.0; +https://openai.com/bot |
| Meta AI | meta-externalagent, meta-externalfetcher, facebookexternalhit | facebookexternalhit/1.1; meta-externalagent/1.1; meta-externalfetcher/1.1 |
| Perplexity | PerplexityBot | Mozilla/5.0 ...; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot |
| Google AI Overviews | Googlebot | Mozilla/5.0 ...; compatible; Googlebot/2.1; +http://www.google.com/bot.html |
| Google Gemini | Google-Extended | No separate User-Agent header (crawled with the Googlebot UA); controlled via robots.txt only |
Also allow standard search indexers (Googlebot, Bingbot) so your content is discoverable.
AI traffic isn’t one thing. Different bots do different jobs:
Optimizing for retrieval should come first: it’s where real‑time citations and conversions originate.
Use this as a practical checklist with your web, SEO, and security teams.
Verify that these agents receive HTTP 200 (not 403/429) and aren’t blocked by geo/IP rules, login walls, or cookie consent gates.
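One way to automate this check is a small script that requests a page with each bot's User-Agent string and classifies the response. The following is a minimal sketch, not a definitive audit: the `AGENTS` list, the function names, and the status groupings are illustrative assumptions. Sending a spoofed User-Agent from your own machine only exercises User-Agent-based rules; it cannot reveal blocks that key off the real bots' IP addresses.

```python
import urllib.error
import urllib.request

# Hypothetical list of retrieval agents to test; adjust to your own policy.
AGENTS = ["ChatGPT-User", "PerplexityBot", "meta-externalagent"]


def classify(status: int) -> str:
    """Map an HTTP status code to a rough diagnosis for bot access."""
    if 200 <= status < 300:
        return "ok"
    if status in (401, 403):
        return "blocked"
    if status == 429:
        return "rate-limited"
    return "check"


def head_status(url: str, agent: str, timeout: float = 10.0) -> int:
    """Send a HEAD request with the given User-Agent and return the status code.

    Redirects are followed, so a redirect to a login wall can still end in 200;
    inspect the final URL separately if that matters for your site.
    """
    req = urllib.request.Request(url, method="HEAD", headers={"User-Agent": agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is still what we want.
        return err.code


def audit(url: str) -> dict[str, str]:
    """Return a diagnosis per agent for a single URL (performs network calls)."""
    return {agent: classify(head_status(url, agent)) for agent in AGENTS}
```

Calling `audit("https://yourdomain.com/pricing")` would return one label per agent; anything other than `"ok"` is worth raising with whoever manages your firewall or CDN rules.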
Rendering and structure
Use clear headings (H1–H3), short paragraphs, and descriptive anchor text.
Content coverage that answers prompts
Include comparison content that neutrally explains trade‑offs in your category.
Metadata and schemas
Maintain clean Open Graph and Twitter Card tags (Meta AI often fetches these).
Sitemaps and freshness
Timestamp pages and show last‑updated in HTML to help retrieval agents trust freshness.
Performance and reliability
Avoid blocking interstitials and lazy‑loaded critical text.
Media and documents
Use alt text and captions to expose meaning where possible.
Internationalization and canonicals
Use rel=canonical and hreflang correctly to avoid duplicate/fragmented signals.
Testing and validation
Ask live questions in ChatGPT (with browsing), Perplexity, and Gemini to see if you’re cited and what text is extracted.
Monitoring and iteration
If you want automated checks and ongoing visibility, Scrunch’s Monitoring & Insights helps you track citations and AI bot activity across ChatGPT, Perplexity, Gemini, and more. To deliver AI‑optimized versions of key pages without redesigning your site, learn about AXP.
If content monetization isn’t your primary revenue stream, consider allowing at least some training access to evergreen brand and product information to improve long‑term representation. If you prefer to limit training while preserving citations:
- GPTBot (OpenAI training)
- ClaudeBot (Anthropic training)
- CCBot (Common Crawl; widely used for training by third parties)
Use caution:
There are two main control points.
User-agent: ChatGPT-User
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: meta-externalagent
Allow: /
User-agent: meta-externalfetcher
Allow: /
User-agent: facebookexternalhit
Allow: /
User-agent: Googlebot
Allow: /
User-agent: Google-Extended
Allow: / # Consider risks before disallowing
User-agent: GPTBot
Disallow: / # Optional: block training
User-agent: ClaudeBot
Disallow: / # Optional: block training
User-agent: CCBot
Disallow: / # Optional: block training
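Before deploying rules like these, it can help to confirm that they parse the way you expect. The sketch below uses Python's standard `urllib.robotparser` on a shortened copy of the rules; the `ROBOTS_TXT` sample, the `access_map` helper, and the test URL are illustrative assumptions. Each crawler implements its own matching logic, so treat the result as a sanity check rather than a guarantee of real-world behavior.

```python
from urllib.robotparser import RobotFileParser

# Shortened sample of the rules above; replace with your real robots.txt body.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: GPTBot
Disallow: /  # Optional: block training
User-agent: CCBot
Disallow: /  # Optional: block training
"""


def access_map(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return whether each agent may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["ChatGPT-User", "PerplexityBot", "GPTBot", "CCBot"]
    for agent, allowed in access_map(
        ROBOTS_TXT, agents, "https://yourdomain.com/pricing"
    ).items():
        print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

With this sample, the retrieval agents come back allowed and the training crawlers come back disallowed, which matches the intent of limiting training while preserving citations.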
- ChatGPT-User, OAI-SearchBot (verify with OpenAI’s published IP ranges)
- PerplexityBot, Perplexity-User (verify with Perplexity’s IP ranges)
- meta-externalagent, meta-externalfetcher, facebookexternalhit
- Googlebot, and consider Google-Extended

Note: Some platforms may fetch a specific URL provided by a user and bypass robots.txt in those user‑initiated contexts.
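Because a User-Agent header is trivial to spoof, verifying a request against a vendor's published IP ranges usually means a CIDR membership test. Here is a minimal sketch using Python's standard `ipaddress` module. The ranges in `PUBLISHED_RANGES` are placeholders taken from reserved documentation address blocks, not real vendor ranges, and `is_verified` is a hypothetical helper; in practice you would load the current lists from each vendor's published source.

```python
import ipaddress

# PLACEHOLDER ranges (reserved documentation blocks), NOT real vendor ranges.
# Load the actual lists from each vendor's published IP range files.
PUBLISHED_RANGES = {
    "ChatGPT-User": ["192.0.2.0/24"],
    "PerplexityBot": ["198.51.100.0/24", "2001:db8::/32"],
}


def is_verified(agent: str, remote_ip: str) -> bool:
    """Return True if remote_ip falls inside a range listed for the agent."""
    ip = ipaddress.ip_address(remote_ip)
    for cidr in PUBLISHED_RANGES.get(agent, []):
        network = ipaddress.ip_network(cidr)
        # Compare only like with like (IPv4 vs IPv6) before the membership test.
        if ip.version == network.version and ip in network:
            return True
    return False
```

A request that claims to be `ChatGPT-User` but arrives from an address outside the listed ranges would return `False`, which is a reasonable signal to treat it as ordinary, unverified traffic.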
Unlike Googlebot, most AI bots do not execute JavaScript. If your key information (pricing, packaging, feature lists, FAQs) only renders client‑side, it likely won’t be seen or cited. Prefer server‑rendered HTML or provide static HTML fallbacks.
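A quick way to approximate what a non-JavaScript bot sees is to extract the text from the raw HTML response and check whether your key phrases are present. The sketch below does this with Python's standard `html.parser`; the `static_text` and `missing_phrases` helpers and the sample phrases are illustrative assumptions, and real bots may extract text differently.

```python
from html.parser import HTMLParser

# Elements whose contents a non-rendering client would not read as page text.
_SKIPPED_TAGS = {"script", "style", "template"}


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring script, style, and template contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in _SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in _SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def static_text(html: str) -> str:
    """Return the whitespace-normalized text available without running JS."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def missing_phrases(html: str, phrases: list[str]) -> list[str]:
    """List the phrases that do not appear in the static text of the page."""
    text = static_text(html).lower()
    return [phrase for phrase in phrases if phrase.lower() not in text]
```

Feed it the body returned by a plain HTTP fetch of a pricing or FAQ page along with the facts you expect to be cited. Any phrase it reports as missing is likely being injected client‑side and is a candidate for server rendering or a static fallback.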
OpenAI (ChatGPT)
- ChatGPT-User (highest priority)
- OAI-SearchBot (future‑proof ChatGPT Search)
- GPTBot (can be blocked without affecting retrieval)

OpenAI states it does not train on ChatGPT-User or OAI-SearchBot.
Meta AI
- facebookexternalhit/1.1, meta-externalagent/1.1, meta-externalfetcher/1.1

Behaviors are evolving; some user‑provided URL fetches may bypass robots.txt.
Perplexity
- PerplexityBot (indexing and retrieval), Perplexity-User

Perplexity states that bot‑collected data isn’t used for model training.
Google: Gemini and AI Overviews
- Google-Extended

AI Overviews follows standard Googlebot access.
Anthropic (Claude)
- ClaudeBot

Claude currently doesn’t fetch web content for live queries.
Common Crawl
- CCBot

When evaluating tools for SEO/content in an AI‑first world, prioritize capabilities that map to the new goal of citability in live AI answers:
If you need these capabilities out of the box, explore Scrunch’s Monitoring & Insights and the AI‑friendly delivery option via AXP.
curl -A "ChatGPT-User" -I https://yourdomain.com/pricing
curl -A "PerplexityBot" -I https://yourdomain.com/faq

By making your content accessible, structured, and fast for AI retrievers, and by measuring citability instead of just clicks, you’ll meet customers where decisions now begin.