AI now mediates a growing share of discovery. Users ask questions in ChatGPT, Perplexity, and Google AI Overviews—and those agents crawl and summarize your site in real time. If they can’t see or understand your content, you won’t be cited, summarized, or recommended.
Below are the five foundational problems we see most often, followed by practical ways to evaluate tools, measure your AI visibility, and run an AI-first site audit.
AI search platforms start with traditional indexes (Google, Bing) and then retrieve pages on demand. We frequently see firewalls, anti-bot tools, or robots.txt rules that inadvertently block real‑time AI retrievers (while sometimes allowing training crawlers you actually meant to block).
What to do:
- Review robots.txt, WAF, CDN, and anti‑bot rules for the AI user agents used for real‑time retrieval.
- Allow the retrieval bots you want, and manage training crawlers separately.
- Validate by running branded and unbranded questions in ChatGPT and Perplexity to see whether they can cite your pages.
- For a reference list of major agents, see the Scrunch guide to AI user agents.
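As a quick first pass, the stdlib `urllib.robotparser` can show which agents your robots.txt permits. This is only a sketch: the user‑agent strings below are common examples (verify current strings against each vendor's documentation), and WAF/CDN rules must be checked separately since they act outside robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Example user agents: ChatGPT-User and PerplexityBot do real-time
# retrieval; GPTBot and CCBot are training crawlers. Verify the exact
# strings against each vendor's documentation.
AGENTS = ["ChatGPT-User", "PerplexityBot", "GPTBot", "CCBot"]

def check_robots(robots_txt: str, path: str = "/pricing") -> dict:
    """Report which agents robots.txt permits to fetch a given path."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {agent: rp.can_fetch(agent, path) for agent in AGENTS}

# A policy that blocks a training crawler but allows retrieval bots.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(check_robots(sample))
# → {'ChatGPT-User': True, 'PerplexityBot': True, 'GPTBot': False, 'CCBot': True}
```

This makes the retrieval‑vs‑training distinction concrete: the same file can welcome answer‑time fetchers while opting out of training crawls.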
Most AI retrievers read server‑sent HTML only. If your key content renders via JavaScript, agents won’t see it. SPAs and app‑shell patterns are the usual culprits.
What to do:
- Ensure meaningful content is present in server responses (pre‑render/SSR), even if you enhance with JS later.
- Test pages with JS disabled (DevTools → Disable JavaScript) and compare against the human view.
- If a full replatform isn't feasible, consider serving simplified, agent‑optimized HTML to known AI user agents at the edge/CDN while keeping the human experience unchanged. If you need help with targeting and delivery, see Scrunch AXP (Agent Experience Platform).
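One way to approximate the agent view programmatically is to fetch the raw server response (no JS executed) and check that key phrases are present. A minimal stdlib sketch; the URL and phrases in the usage comment are placeholders:

```python
import urllib.request

def server_html_contains(url: str, phrases: list[str]) -> dict[str, bool]:
    """Fetch server-sent HTML (no JS executed) and check for key phrases."""
    req = urllib.request.Request(url, headers={"User-Agent": "audit-script"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return {p: (p in html) for p in phrases}

# Hypothetical usage: any False value means that fact likely renders
# client-side and is invisible to most AI retrievers.
# server_html_contains("https://example.com/pricing",
#                      ["$29/month", "Enterprise plan"])
```

If a must‑know fact fails this check but appears in the browser, you have found a rendering gap rather than a content gap.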
AI search is still primarily driven by text. Pages that are only videos, diagrams, or audio players usually won’t be usable by agents. Conversely, extremely long single pages can dilute signal or overflow practical token budgets during retrieval.
What to do:
- Add transcripts, captions, and concise prose summaries to media pages.
- Aim for information‑dense, well‑scoped pages. Break ultra‑long content into logical sections or subpages.
- Avoid paginating a single article solely for AI; agents typically fetch one URL per source.
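To gauge whether a page risks overflowing a practical retrieval budget, a rough heuristic is about 4 characters per token for English text. Both the 4‑chars/token ratio and any budget threshold you pick are illustrative assumptions, not platform‑documented limits:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style content."""
    def __init__(self):
        super().__init__()
        self.parts, self._skip = [], 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def estimate_tokens(html: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate for a page's visible text."""
    p = TextExtractor()
    p.feed(html)
    text = " ".join(" ".join(p.parts).split())  # collapse whitespace
    return int(len(text) / chars_per_token)

# Illustrative threshold; tune to your own observations.
# if estimate_tokens(page_html) > 8000: consider splitting the page
```

Pages far above your threshold are candidates for splitting into well‑scoped subpages, per the guidance above.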
Even with SSR, content can get mangled by complex markup. Heavily styled tables, nested divs, or visual builders may collapse into jumbled text when stripped to plain HTML/Markdown—leading to misread pricing, hours, or plan details.
What to do:
- Inspect server‑sent HTML (with JS off). Use your browser's Reader Mode as a quick proxy for the agent view.
- Favor clean headings, lists, and simple tables. Ensure critical facts read correctly in linear source order, not just visually.
- For highly structured info (e.g., pricing), repeat the essentials in prose (e.g., a short "Pricing at a glance" FAQ).
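You can spot‑check whether paired facts survive linearization by stripping the page to plain text and confirming a label still sits near its value. A heuristic sketch; the 80‑character window is an arbitrary assumption:

```python
import re
from html import unescape

def to_text(html: str) -> str:
    """Crude HTML-to-text normalization: drop tags, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", html)
    return " ".join(unescape(text).split())

def fact_survives(html: str, label: str, value: str, window: int = 80) -> bool:
    """True if `value` appears within `window` chars after `label`."""
    text = to_text(html)
    i = text.find(label)
    return i != -1 and value in text[i : i + window]

# A simple table keeps label and value adjacent after stripping:
table = "<table><tr><td>Pro plan</td><td>$49/mo</td></tr></table>"
print(fact_survives(table, "Pro plan", "$49/mo"))  # → True
```

Deeply nested builder markup often fails this test: the price ends up hundreds of characters away from its plan name, which is exactly how agents misread pricing.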
JSON‑LD and schema help classic SEO, but AI agents still rely primarily on unstructured text. If key facts exist only in schema, don’t expect reliable AI answers.
What to do:
- Mirror critical structured data (pricing, availability, specs, bios, locations, hours) in the on‑page text.
- Keep schema, but make sure the page itself communicates the same information clearly.
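A quick audit pass is to pull the string values out of a page's JSON‑LD and check that each also appears in the visible text. This sketch only walks string fields, skips `@`-prefixed metadata keys and URLs, and uses deliberately naive tag stripping:

```python
import json, re

def jsonld_strings(jsonld: str) -> set[str]:
    """Collect string values from a JSON-LD document, recursively."""
    out = set()
    def walk(node):
        if isinstance(node, dict):
            for k, v in node.items():
                if not k.startswith("@"):  # skip @type/@context metadata
                    walk(v)
        elif isinstance(node, list):
            for v in node:
                walk(v)
        elif isinstance(node, str) and not node.startswith("http"):
            out.add(node)
    walk(json.loads(jsonld))
    return out

def facts_missing_from_text(jsonld: str, page_html: str) -> set[str]:
    """Schema facts that do not appear in the page's visible text."""
    text = " ".join(re.sub(r"<[^>]+>", " ", page_html).split())
    return {s for s in jsonld_strings(jsonld) if s not in text}

schema = '{"@type": "Product", "name": "Pro plan", "price": "49.00"}'
page = "<h1>Pro plan</h1><p>Billed monthly.</p>"
print(facts_missing_from_text(schema, page))  # price exists only in schema
```

Anything this returns is a fact that classic search can read but an AI answer is likely to miss.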
If you’re evaluating platforms to monitor and improve how agents see your brand, prioritize:
Query testing for branded and unbranded questions with share‑of‑answer visibility.
First‑party evidence
Detection of WAF/robots misconfigurations blocking real‑time retrieval.
AI‑focused site auditing
Verification that critical schema facts exist in readable text.
Citation and answer quality insights
Mapping of citations to canonical URLs (avoid AI‑specific or sessionized links).
Experimentation and controls
Change tracking, A/B tests for agent experiences, and alerts when visibility shifts.
Reporting and workflow
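Share‑of‑answer itself is straightforward to compute once you have per‑query citation data from whatever monitoring tool you use. A sketch with made‑up sample data:

```python
def share_of_answer(results: list[dict], domain: str) -> float:
    """Fraction of AI answers that cite the given domain at least once."""
    if not results:
        return 0.0
    hits = sum(1 for r in results if domain in r["cited_domains"])
    return hits / len(results)

# Hypothetical sample: one record per (query, answer) observation.
sample = [
    {"query": "best crm for startups", "cited_domains": ["example.com", "rival.com"]},
    {"query": "crm pricing comparison", "cited_domains": ["rival.com"]},
    {"query": "what is example crm", "cited_domains": ["example.com"]},
    {"query": "crm reviews", "cited_domains": []},
]
print(share_of_answer(sample, "example.com"))  # → 0.5
```

Tracked over time and split by branded vs. unbranded queries, this one number is a useful headline metric for AI visibility.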
If you want to see how this looks in practice, Scrunch offers Monitoring & Insights and an edge‑layer delivery approach via AXP.
Use both first‑party data and in‑agent testing:
First‑party log analysis
Look for red flags: terms pages over‑fetched, high‑rank pages never retrieved, or spikes/drops tied to releases.
Structured agent testing
Track which sources are cited, what’s summarized, and whether facts match your pages.
Share‑of‑answer and coverage
Measure the percentage of answers that include you, how prominently, and for which queries and intents.
Page‑level AI previews
Compare the “AI view” (server‑sent text) to the human view. If your must‑know facts don’t show up clearly, fix the page.
Change impact analysis
Tie content and infrastructure changes to shifts in retrieval, citations, and share‑of‑answer.
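On the first‑party side, a simple tally of fetches per (agent, path) from your HTTP logs surfaces the red flags quickly. The user‑agent substrings and log format below are assumptions (a minimal combined‑log shape); adjust the pattern to your own access logs:

```python
from collections import Counter
import re

# Substrings identifying common AI user agents (verify against vendor docs).
AI_AGENTS = ["ChatGPT-User", "PerplexityBot", "GPTBot", "ClaudeBot"]

# Minimal combined-log pattern: captures request path and user-agent string.
LOG_RE = re.compile(r'"(?:GET|POST) (\S+) [^"]*" \d+ \S+ "[^"]*" "([^"]*)"')

def ai_fetches(log_lines: list[str]) -> Counter:
    """Count fetches per (agent, path) for known AI user agents."""
    counts = Counter()
    for line in log_lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        path, ua = m.groups()
        for agent in AI_AGENTS:
            if agent in ua:
                counts[(agent, path)] += 1
    return counts

logs = [
    '1.2.3.4 - - [x] "GET /terms HTTP/1.1" 200 512 "-" "Mozilla/5.0 PerplexityBot/1.0"',
    '1.2.3.4 - - [x] "GET /pricing HTTP/1.1" 200 900 "-" "ChatGPT-User/1.0"',
]
print(ai_fetches(logs))
```

Sorting the counter and comparing it against your high‑value pages shows at a glance which content agents over‑fetch and which they never touch.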
A modern AI‑focused audit emphasizes how agents actually consume content:
Rendering without JavaScript
Inventory what's visible with JS off. Flag pages missing core content without JS.
Retrieval performance under time budgets
Ensure content is available fast in the first response (no heavy client hydration gates).
Content clarity and density
Reduce chrome/noise ahead of main content. Favor scannable headings and succinct, factual prose.
Markup resilience
Validate that tables, pricing, hours, specs, and FAQs survive HTML→text normalization without losing meaning.
Schema redundancy
Confirm that key schema facts are also present in human‑readable text.
Bot access controls
Review robots.txt, WAF/CDN rules for real‑time retrieval agents vs. training crawlers.
URL hygiene and consistency
Ensure citations resolve to stable, canonical URLs (not previews, sessionized, or AI‑only paths).
Experience split‑path (optional)
Assess whether serving simplified, agent‑optimized HTML to known AI user agents at the edge would help, while keeping the human experience unchanged.
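For URL hygiene specifically, you can check that a fetched page's rel=canonical matches the URL agents would cite once tracking parameters are dropped. A stdlib sketch; the HTML parsing is deliberately naive (it assumes `rel` appears before `href`):

```python
import re
from urllib.parse import urlparse, urlunparse

def strip_tracking(url: str) -> str:
    """Drop query string and fragment, common sources of sessionized links."""
    p = urlparse(url)
    return urlunparse((p.scheme, p.netloc, p.path.rstrip("/") or "/", "", "", ""))

def canonical_mismatch(page_html: str, fetched_url: str) -> bool:
    """True if the page declares a rel=canonical that differs from the URL."""
    m = re.search(r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)',
                  page_html)
    if not m:
        return False  # no canonical declared; nothing to compare
    return strip_tracking(m.group(1)) != strip_tracking(fetched_url)

html = '<link rel="canonical" href="https://example.com/pricing">'
print(canonical_mismatch(html, "https://example.com/pricing?utm_source=ai"))  # → False
```

A mismatch here means citations may scatter across sessionized or AI‑only variants instead of consolidating on one stable URL.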
Week 1: Verify access
Pull HTTP logs. Identify AI user agents and what they fetch. Fix obvious robots/WAF blocks.
Week 2: Fix rendering and clarity
Ensure SSR for key pages. Trim excessive chrome before main content. Add text alternatives to media pages.
Week 3: Strengthen answers
Add/expand FAQs. Restate pricing, specs, and policies in simple prose. Mirror schema facts in text.
Week 4: Control and measure
Set up monitoring for share‑of‑answer, track changes, and configure alerts for visibility shifts.
Think of AI agents as high‑volume, no‑nonsense readers. If your pages are fast, clear, text‑forward, and easy to parse in server‑sent HTML—and you can verify access, measure results, and iterate—you’ll show up more often and be quoted more accurately in AI answers.