
By Collins • December 1, 2025
For the last decade, SEO was primarily a linguistic game. You wrote words to match the keywords users typed. But in the age of answer engines and Answer Engine Optimization (AEO), the game has shifted from matching keywords to feeding data pipelines.
Platforms like Perplexity, ChatGPT (Search), and Google Gemini don’t just "read" pages; they ingest them into RAG (Retrieval-Augmented Generation) systems. These systems chop your content into "chunks" to generate answers.
Here is the problem: Text is easy for AI to digest. Everything else is hard.
If your most valuable data—your pricing comparisons, technical specs, and proprietary research—is locked inside a PNG image or a flat PDF, you are effectively invisible to the AI models of 2025. This post outlines the technical framework for optimizing non-text assets to ensure they aren’t just seen, but cited.
For years, designers loved using screenshots of Excel tables because they looked consistent across devices. For AEO, this is a disaster.
Recent research into "HtmlRAG" suggests that preserving HTML structure is critical for LLM comprehension. When an AI scrapes a page, it often converts content into plain text. If your data is in a standard semantic HTML <table>, the relationships between rows (e.g., "Pro Plan") and columns (e.g., "$99/mo") are preserved. If it’s a <div> soup or an image, that relationship breaks.
The Data-Backed Reality:
LLMs struggle with "spatial" reasoning in plain text. If you present a pricing matrix as a screenshot, a vision model might read it, but it is computationally expensive and prone to hallucinations (error rates in unstructured data extraction can hover around 30-40% without schema).
The Fix:
- Use semantic <table>, <tr>, and <th> tags. Avoid using CSS grid or Flexbox to visually mimic tables without the underlying semantic structure (see the sketch after this list).
- Add a <caption> tag immediately inside the opening <table> tag. This acts as a "title" for the data chunk, helping the RAG system retrieve the table when a user asks a relevant question.
- Avoid merged cells. rowspan or colspan attributes confuse parsers. Keep data flat and simple where possible.
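Here is a minimal sketch of what that looks like in practice (the plan names and prices are illustrative, borrowing the "Pro Plan / $99/mo" example above):

```html
<!-- Semantic pricing table: the <caption> gives the chunk a retrievable
     title, and <th> cells keep the row/column relationships explicit. -->
<table>
  <caption>Acme Platform Pricing (2025)</caption>
  <thead>
    <tr>
      <th scope="col">Plan</th>
      <th scope="col">Price</th>
      <th scope="col">Seats</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">Starter</th>
      <td>$29/mo</td>
      <td>5</td>
    </tr>
    <tr>
      <th scope="row">Pro Plan</th>
      <td>$99/mo</td>
      <td>25</td>
    </tr>
  </tbody>
</table>
```

Even after the page is converted to plain text, this structure keeps "Pro Plan" attached to "$99/mo", which a pixel-perfect div layout or a screenshot cannot guarantee.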
PDFs are the "black holes" of the internet. While Google has been indexing them for years, LLMs find them notoriously difficult to process reliably.
In a typical RAG pipeline, a PDF is converted into text before being analyzed. This conversion often strips away headers, footers, and layout logic, mashing distinct sections together. This leads to what engineers call "Context Window Contamination"—where the AI mixes up data from Page 1 (Executive Summary) with data from Page 50 (Appendix), leading to inaccurate citations.
The "Recursive" Strategy:
To get your whitepapers cited by Perplexity or ChatGPT, you need to adopt a Recursive Retrieval strategy:
- Use FAQPage schema on the PDF's landing page to explicitly state the core questions your PDF answers (a sketch follows below).
- Pro Tip: If your PDF contains a killer chart, extract the data points and list them as bullet points on the download page. Give the AI the "answer key" so it doesn't have to guess.
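As an illustration, the markup on a whitepaper's landing page might look like this (the questions, answers, and URL are hypothetical placeholders, not required wording):

```html
<!-- JSON-LD on the HTML landing page that hosts the PDF download.
     Each Question/Answer pair surfaces a key finding from inside the PDF. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How error-prone is unstructured data extraction without a schema?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Error rates can hover around 30-40% when tabular data is extracted from screenshots or flat PDFs without an underlying schema."
      }
    },
    {
      "@type": "Question",
      "name": "Which formats are hardest for RAG pipelines to parse?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Flat PDFs and images of tables, because headers and layout logic are stripped during text conversion."
      }
    }
  ]
}
</script>
```

This hands the retrieval layer clean, pre-chunked question-and-answer pairs to match against, so it never has to parse the PDF itself to find the headline findings.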
With the rise of GPT and Gemini, search is becoming multimodal. These models can "see" images, but they still rely on text signals to find them first.
If you have a proprietary infographic (e.g., "The 2026 Marketing Funnel"), you want the AI to cite your brand when a user asks, "Show me a diagram of a modern marketing funnel."
How to Win Visual Citations:
- Use descriptive file names. marketing-funnel-2026-lantern.jpg tells the AI exactly what the image is before it even processes the pixels. IMG_592.jpg tells it nothing.
- Mark the visual up with ImageObject schema (sketched below). Crucially, include the license and creator fields. This is the primary signal AI engines use to attribute the source of a visual.
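A minimal sketch of that markup, assuming the infographic above and a hypothetical domain (lantern.example.com):

```html
<!-- JSON-LD describing the infographic. The license and creator fields
     carry the attribution signals discussed above. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "contentUrl": "https://lantern.example.com/images/marketing-funnel-2026-lantern.jpg",
  "name": "The 2026 Marketing Funnel",
  "description": "Infographic showing the stages of a modern, AI-assisted marketing funnel.",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "acquireLicensePage": "https://lantern.example.com/image-licensing",
  "creator": {
    "@type": "Organization",
    "name": "Lantern"
  },
  "creditText": "Lantern",
  "copyrightNotice": "© 2026 Lantern"
}
</script>
```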
The content that wins in 2026 will be content that answers three simultaneous demands. For too long, these demands were in tension: optimizing for one meant compromising the others. In the multimodal, RAG-driven era, they align perfectly.
By optimizing your data tables, tagging your PDFs, and enriching your images with proper schema, you're not "gaming" the system. You're making it easier for AI to understand what you know, and easier for your audience to find you when they need that knowledge.
The brands that master multimodal, machine-readable content will define the next era of search.