Guide to AI User Agents

AI assistants now act like your most important site visitors. When someone asks a question in ChatGPT, Perplexity, Gemini, or Meta AI, those platforms dispatch bots to retrieve, parse, and cite content in real time. If they can’t access or understand your pages, you won’t be cited—and you’ll miss high‑intent demand.

This guide explains which AI user agents to allow, which to consider blocking, and the practical steps to make your site machine‑readable and citable in AI answers. It also includes a checklist you can use with your dev team and features to look for in an AI visibility tool.

tl;dr: Allowlist these user agents to get cited by, and receive traffic from, AI platforms

| Platform | robots.txt identifier | Example User-Agent header |
|---|---|---|
| ChatGPT | ChatGPT-User | Mozilla/5.0 ...; compatible; ChatGPT-User/1.0; +https://openai.com/bot |
| Meta AI | meta-externalagent, meta-externalfetcher, facebookexternalhit | facebookexternalhit/1.1, meta-externalagent/1.1, or meta-externalfetcher/1.1 (one per bot) |
| Perplexity | PerplexityBot | Mozilla/5.0 ...; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot |
| Google AI Overviews | Googlebot | Mozilla/5.0 ...; compatible; Googlebot/2.1; +http://www.google.com/bot.html |
| Google Gemini | Google-Extended | No separate User-Agent: Google fetches with Googlebot, and this robots.txt token controls Gemini use |

Also allow standard search indexers (Googlebot, Bingbot) so your content is discoverable.
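You can check which of these agents actually reach your site by counting them in your server access logs. The sketch below is a minimal example. It assumes the common "combined" log format, where the User-Agent is the last quoted field. The names `AGENT_POLICY`, `classify_user_agent`, and `count_ai_agents` are hypothetical, and the policy labels simply mirror this guide's suggestions: allow retrieval and search bots, and optionally block training bots. User-Agent headers can be spoofed, so treat a match as a claim rather than as verified bot traffic.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# robots.txt tokens from this guide, labeled with the policy it suggests:
# "allow" for retrieval/search agents, "training" for optional blocks.
AGENT_POLICY = {
    "ChatGPT-User": "allow",
    "OAI-SearchBot": "allow",
    "PerplexityBot": "allow",
    "meta-externalagent": "allow",
    "meta-externalfetcher": "allow",
    "facebookexternalhit": "allow",
    "Googlebot": "allow",
    "bingbot": "allow",
    "GPTBot": "training",
    "ClaudeBot": "training",
    "CCBot": "training",
}

# Assumes the "combined" log format: the User-Agent is the last quoted field.
_LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the first known token found in a User-Agent header, or None."""
    ua = user_agent.lower()
    for token in AGENT_POLICY:
        if token.lower() in ua:
            return token
    return None


def count_ai_agents(log_lines: Iterable[str]) -> Counter:
    """Count requests per known agent token across access-log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        match = _LAST_QUOTED_FIELD.search(line)
        if match is None:
            continue  # not a combined-format line
        token = classify_user_agent(match.group(1))
        if token is not None:
            counts[token] += 1
    return counts


# Hypothetical log lines for illustration.
sample_log = [
    '203.0.113.5 - - [01/Jan/2025:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)"',
    '198.51.100.7 - - [01/Jan/2025:10:01:00 +0000] "GET /blog HTTP/1.1" 200 8000 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '192.0.2.9 - - [01/Jan/2025:10:02:00 +0000] "GET / HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

for token, hits in count_ai_agents(sample_log).items():
    print(token, AGENT_POLICY[token], hits)
```

Matching is a case-insensitive substring check in dictionary order. That works here because none of these tokens contains another.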

The goal has changed: Be citable, not just findable

AI traffic isn’t one thing. Different bots do different jobs:

Optimizing for retrieval should come first: it’s where real‑time citations and conversions originate.

How to make your website show up in AI platforms (checklist)

Use this as a practical, non-nested checklist with your web, SEO, and security teams.

If you want automated checks and ongoing visibility, see how Scrunch’s Monitoring & Insights tracks citations and AI bot activity across ChatGPT, Perplexity, Gemini, and more. To deliver AI‑optimized versions of key pages without redesigning your site, learn about AXP.

Which user agents are safe to block? How to stop AI training without losing citations

If content monetization isn’t your primary revenue stream, consider allowing at least some training access to evergreen brand and product information to improve long‑term representation. If you prefer to limit training while preserving citations:

How to allow or block AI user agents

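There are two main control points. The first is robots.txt, which well-behaved crawlers read before fetching your pages. For example, this configuration allows retrieval and search bots while optionally blocking training crawlers: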

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: meta-externalagent
Allow: /

User-agent: meta-externalfetcher
Allow: /

User-agent: facebookexternalhit
Allow: /

User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Allow: /              # Controls Gemini use; consider risks before disallowing

User-agent: GPTBot
Disallow: /           # Optional: block training

User-agent: ClaudeBot
Disallow: /           # Optional: block training

User-agent: CCBot
Disallow: /           # Optional: block training

Note: Some platforms may fetch a specific URL provided by a user and bypass robots.txt in those user‑initiated contexts.
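Before you deploy a robots.txt change, you can check how a parser interprets it. This sketch runs Python's standard `urllib.robotparser` on an excerpt of the sample above. The helper name `check_agents` and the `example.com` URL are placeholders. This parser applies the first matching rule and matches agent names by substring, which is simpler than the longest-match rule in RFC 9309. Treat it as a sanity check for all-or-nothing rules like these, not as a model of any specific crawler.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the sample robots.txt in this guide.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Allow: /              # Controls Gemini use; consider risks before disallowing

User-agent: GPTBot
Disallow: /           # Optional: block training

User-agent: ClaudeBot
Disallow: /           # Optional: block training

User-agent: CCBot
Disallow: /           # Optional: block training
"""


def check_agents(robots_txt: str, agents: list[str],
                 url: str = "https://example.com/pricing") -> dict[str, bool]:
    """Map each agent token to whether robots.txt lets it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {agent: parser.can_fetch(agent, url) for agent in agents}


agents = ["ChatGPT-User", "PerplexityBot", "Googlebot", "Google-Extended",
          "GPTBot", "ClaudeBot", "CCBot"]
for agent, allowed in check_agents(ROBOTS_TXT, agents).items():
    print(f"{agent:16} {'allowed' if allowed else 'blocked'}")
```

An agent that matches no group is allowed by default when there is no `User-agent: *` group, which is also what RFC 9309 specifies.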

User agents and JavaScript or dynamic content

Unlike Googlebot, most AI bots do not execute JavaScript. If your key information (pricing, packaging, feature lists, FAQs) only renders client‑side, it likely won’t be seen or cited. Prefer server‑rendered HTML or provide static HTML fallbacks.
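One quick way to approximate what a non-JavaScript bot sees is to fetch the raw HTML and check whether key facts appear in the text outside `<script>` and `<style>` elements. The sketch below uses only Python's standard library. The helper names and sample pages are hypothetical, and the check is a rough proxy for how AI crawlers parse pages, not a reproduction of any particular one.

```python
import urllib.request
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping content that only scripts or CSS use."""

    SKIP_TAGS = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def missing_phrases(html: str, phrases: list[str]) -> list[str]:
    """Return the phrases that do not appear in the page's static text."""
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse whitespace and lowercase so matching ignores layout and case.
    text = " ".join(" ".join(extractor.chunks).split()).lower()
    return [p for p in phrases if " ".join(p.split()).lower() not in text]


def fetch_raw_html(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Download HTML as served, without running any JavaScript."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical pages: the same price, server-rendered vs. injected by a script.
SSR_PAGE = "<html><body><h1>Pricing</h1><p>Pro plan: $49/month</p></body></html>"
CSR_PAGE = ('<html><body><div id="root"></div>'
            '<script>render({"plan": "Pro plan: $49/month"})</script></body></html>')

print(missing_phrases(SSR_PAGE, ["Pro plan: $49/month"]))
print(missing_phrases(CSR_PAGE, ["Pro plan: $49/month"]))
```

To check a live page, pass the result of `fetch_raw_html(url, user_agent)` to `missing_phrases`. Any phrase it returns is probably rendered client-side.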

User agent reference

What to look for in an AI search visibility tool

When evaluating tools for SEO/content in an AI‑first world, prioritize capabilities that map to the new goal—citability in live AI answers:

If you need these capabilities out of the box, explore Scrunch’s Monitoring & Insights and the AI‑friendly delivery option via AXP.

How to allow or block: quick testing steps
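Firewalls and CDN bot rules can also block AI agents by User-Agent, independently of robots.txt. The sketch below shows one check: it reports which status code your server returns for each bot's User-Agent string. The abbreviated headers are modeled on the table at the top of this guide, and the helper names are hypothetical. Some providers verify bots by IP address or reverse DNS, so a spoofed header sent from your own machine only tests User-Agent-based rules.

```python
import urllib.error
import urllib.request

# Abbreviated User-Agent strings modeled on the table in this guide.
# Real headers are longer; these only need to contain each bot's token.
TEST_AGENTS = {
    "ChatGPT-User": "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)",
    "meta-externalfetcher": "meta-externalfetcher/1.1",
    "facebookexternalhit": "facebookexternalhit/1.1",
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}


def status_for_agent(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for one User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status  # final status after any redirects
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError, but the code is what we want.
        return err.code


def report(url: str) -> dict[str, int]:
    """Fetch `url` once per test agent and collect the status codes."""
    return {name: status_for_agent(url, ua) for name, ua in TEST_AGENTS.items()}
```

Calling `report("https://example.com/pricing")` returns a mapping from agent to status code. Replace the placeholder URL with one of your own key pages. Any status other than 200 for an agent you intend to allow is worth investigating. Connection failures raise `urllib.error.URLError` instead of returning a code.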

By making your content accessible, structured, and fast for AI retrievers—and by measuring citability instead of just clicks—you’ll meet customers where decisions now begin.