ChatGPT retrieves roughly 37 pages for every search prompt. It cites about 5 or 6 of them. The other 31 get evaluated and discarded. An AirOps analysis of 548,534 pages across 15,000 prompts found that only 15% of retrieved content makes it into the final answer. The gap between "found" and "cited" is where most brands lose.
The industry calls the practice of closing that gap GEO (Generative Engine Optimization), AEO (Answer Engine Optimization), LLM SEO, or AI SEO. Different names, same work. This post walks through the actual pipeline so you can see where your content is likely getting filtered out.
The four stages of a ChatGPT search
When someone types "best sunscreen for oily skin" into ChatGPT, the system does not just Google it and summarize the top results. It runs a retrieval-augmented generation (RAG) pipeline with four distinct stages. Your content can fail at any of them.
Stage 1: Query decomposition
ChatGPT doesn't search for your question verbatim. It breaks it into sub-queries. "Best sunscreen for oily skin" might become "sunscreen ingredients for acne-prone skin," "SPF 50 vs SPF 30 for Indian climate," and "dermatologist-recommended sunscreens oily skin reviews."
The same AirOps study found that 89.6% of prompts triggered two or more follow-up searches. The original 15,000 prompts expanded to 43,233 queries. That is nearly a 3x multiplier. And 95% of these fan-out queries had zero traditional search volume, meaning they don't show up in any keyword research tool.
This matters because a third of all cited pages were discovered through these fan-out queries, not the original prompt. If your content only targets the obvious keyword ("best sunscreen"), you miss the sub-queries where ChatGPT is actually finding its sources.
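If you want to see what this expansion looks like for your own prompts, you can approximate it with a plain API call. Here is a minimal Python sketch using the OpenAI client; the model name and prompt wording are our assumptions for illustration, since ChatGPT's internal decomposition logic is not exposed:

```python
# Sketch: approximate query fan-out by asking a model to decompose a prompt.
# The model name and prompt wording are assumptions for illustration;
# ChatGPT's internal decomposition is not public.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def fan_out(prompt: str, n: int = 3) -> list[str]:
    """Ask a model for the sub-queries a search pipeline might derive."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                f"Decompose this search prompt into {n} distinct sub-queries "
                f"a search engine would run. One per line, no numbering.\n\n{prompt}"
            ),
        }],
    )
    lines = response.choices[0].message.content.splitlines()
    return [q.strip() for q in lines if q.strip()]

print(fan_out("best sunscreen for oily skin"))
```

Run this against your target prompts and compare the output to your keyword list. The sub-queries you have no content for are the gaps.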
Stage 2: Retrieval
For each sub-query, ChatGPT pulls pages from web indexes. The system converts content into vector embeddings (numerical representations of meaning) and matches pages by semantic similarity, not just keyword matching. A page about "lightweight moisturizers with sun protection for humid climates" can match a query about "best sunscreen for oily skin" even without those exact words.
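Here is what semantic matching means in practice, as a minimal Python sketch. We use OpenAI's public embeddings endpoint as a stand-in; whatever embedding model ChatGPT's retrieval layer actually runs is not documented:

```python
# Sketch: semantic matching via cosine similarity over embeddings.
# The embedding model here is a public stand-in; ChatGPT's actual
# retrieval embeddings are not documented.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])

query = "best sunscreen for oily skin"
pages = [
    "lightweight moisturizers with sun protection for humid climates",
    "history of sunscreen regulation in the EU",
]

vecs = embed([query] + pages)
q, docs = vecs[0], vecs[1:]

# Cosine similarity: the first page scores high with zero keyword overlap.
sims = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
for page, score in zip(pages, sims):
    print(f"{score:.3f}  {page}")
```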
This stage is where most brands assume the process stops. It doesn't. Getting retrieved is table stakes. Being in the pool of 37 pages means nothing if you get cut in the next stage.
Stage 3: Ranking
This is the stage that determines citation. ChatGPT evaluates retrieved pages across several signals:
Relevance. Does the page actually answer the decomposed sub-query? A page about CRM features in general won't score as well as one comparing CRM options for the specific company size the user asked about.
Authority. An SE Ranking study of 216,524 pages across 129,000 domains found that referring domains are the single strongest predictor of ChatGPT citation. Sites with over 350,000 referring domains averaged 8.4 citations per study period. Sites with under 2,500 averaged 1.6 to 1.8.
Content structure. The same study found that fact-dense pages with statistical data points, expert quotes, and well-organized sections consistently outperformed pages without them by significant margins.
Freshness. ChatGPT consistently favors newer content over the pages Google ranks for the same queries. Stale content gets retrieved but not cited. The freshness dynamics vary significantly by engine.
This is the stage where content format is decisive. Your page might have the right information, but if it's buried in marketing copy without statistics, without clear headings, without verifiable claims, the ranking model scores it below a competitor's page that has all of those.
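No one outside OpenAI knows the actual ranking model, but the shape of the problem is a weighted re-rank over signals like the four above. A toy sketch in Python, where every signal definition and weight is entirely our assumption:

```python
# Sketch: a weighted re-ranking score over the four signals described above.
# Every weight and normalization here is an illustrative assumption;
# ChatGPT's real ranking model is not public.
from dataclasses import dataclass

@dataclass
class Page:
    relevance: float        # semantic similarity to the sub-query, 0-1
    referring_domains: int  # authority proxy, per the SE Ranking study
    fact_density: float     # stats, quotes, structured sections, 0-1
    age_days: int           # days since last substantive update

def rank_score(p: Page) -> float:
    authority = min(p.referring_domains / 350_000, 1.0)  # saturate at the study's top tier
    freshness = max(0.0, 1.0 - p.age_days / 365)         # assumed linear decay over a year
    return (0.4 * p.relevance + 0.25 * authority
            + 0.2 * p.fact_density + 0.15 * freshness)

brand = Page(relevance=0.8, referring_domains=40_000, fact_density=0.2, age_days=400)
aggregator = Page(relevance=0.8, referring_domains=8_000, fact_density=0.7, age_days=30)
print(f"brand: {rank_score(brand):.2f}, aggregator: {rank_score(aggregator):.2f}")
# brand: 0.39, aggregator: 0.60 -- structure and freshness beat raw authority
```

The point of the toy model is the trade-off it makes visible: under any plausible weighting, a fact-dense, fresh page can outscore a higher-authority page that buries its facts.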
Stage 4: Citation selection
After ranking, ChatGPT synthesizes an answer from the top-scoring pages and attaches citations. But even here, not every high-ranking page gets cited. The model cites pages that contributed a specific, extractable fact to the response. If your page was used for general context but a competitor's page provided the specific data point, the competitor gets the footnote.
Research on ChatGPT's citation patterns found that citations pull disproportionately from the opening sections of a page. If your answer is buried below marketing copy, it probably will not get cited even if it is the best answer on the page.
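You can roughly audit your own pages for this. A crude Python heuristic, treating sentences that contain digits as a proxy for extractable facts; the proxy and the 1,500-character cutoff are our simplifications, not anything ChatGPT does:

```python
# Sketch: check whether a page's extractable facts sit in its opening
# section, where citations disproportionately come from. Treating
# digit-bearing sentences as "facts" is a crude assumed proxy.
import re

def answer_first_ratio(page_text: str, head_chars: int = 1500) -> float:
    """Share of fact-bearing sentences that appear in the opening section."""
    sentences = re.split(r"(?<=[.!?])\s+", page_text)
    facts = [s for s in sentences if re.search(r"\d", s)]
    if not facts:
        return 0.0
    return sum(page_text.find(s) < head_chars for s in facts) / len(facts)

buried = "We believe every child deserves joy. " * 50 + "Tuition is $950 a month."
leading = "Tuition is $950 a month at a 1:8 ratio. " + "More detail follows. " * 50
print(answer_first_ratio(buried))   # 0.0 -- the only fact is below the fold
print(answer_first_ratio(leading))  # 1.0 -- the fact leads the page
```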
What this looks like in practice
We audited a mid-size preschool chain that had comprehensive, accurate content about their programs. Their domain authority was solid. They ranked on page one of Google for most of their target keywords.
ChatGPT retrieved their pages on 8 out of 10 target queries. It cited them on 2. The other 6 times, the citation went to an education aggregator or a parenting blog that had fewer words but better structure for answering the questions parents were asking AI.
The school's pages opened with brand messaging. The aggregator's pages opened with direct answers to the query. One format gets cited. The other gets retrieved and discarded. The pattern is consistent across industries: the brand has the expertise but the wrong content format for how AI retrieval works.
Why Google rank doesn't predict ChatGPT citation
The RAG pipeline explains why pages at Google position #1 have only a 43.2% ChatGPT citation rate. Google ranking gets you past Stage 2 (retrieval). It does not get you past Stage 3 (ranking) or Stage 4 (citation selection), which use different signals entirely: fact density, answer-first formatting, structured sections, and freshness instead of backlinks and keyword optimization.
This divergence compounds across engines. The RAG architecture is similar everywhere, but the retrieval indexes and ranking signals differ, so a page that passes Stage 3 on one engine can fail on another. That is why CiteGap audits test ChatGPT, Google AI, and Claude independently.
The fan-out problem most brands miss
Fan-out query expansion is the least understood part of the pipeline. A third of cited pages in the AirOps dataset were discovered only through fan-out queries, not the original prompt. And 95% of those fan-out queries had zero search volume in traditional keyword tools. Your keyword research doesn't cover them. Your SEO strategy doesn't target them.
The pages that win fan-out citations tend to be comprehensive answer pages that cover pricing tiers, feature comparisons for specific segments, integration details, and implementation timelines on a single URL. This is why aggregators consistently beat brand sites in AI citation: their content naturally answers the sub-queries that brands never think to target.
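One way to measure your exposure: generate the likely sub-queries for a prompt (as in the Stage 1 sketch), then check how many of them any single page of yours actually covers. A sketch reusing the same embeddings approach; the sample sub-queries, page sections, and 0.5 similarity threshold are all illustrative assumptions:

```python
# Sketch: score how many fan-out sub-queries one page covers.
# Sample data and the similarity threshold are illustrative assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vecs = np.array([item.embedding for item in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

sub_queries = [
    "CRM pricing tiers for small teams",
    "CRM integration with accounting software",
    "CRM implementation timeline",
]
page_sections = [
    "Plans start at $12 per user per month, with a free tier for 3 seats.",
    "Native integrations with QuickBooks and Xero sync invoices both ways.",
    "Our founding story and company values.",
]

q_vecs, s_vecs = embed(sub_queries), embed(page_sections)
# A sub-query counts as covered if any section clears the threshold.
covered = (q_vecs @ s_vecs.T).max(axis=1) > 0.5
print(f"{covered.sum()}/{len(sub_queries)} fan-out queries covered")
```

Sub-queries that no section covers are the ones an aggregator's comprehensive answer page will win instead.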
Three things to take away
Retrieval is not citation. Your page being found by ChatGPT means nothing if it gets filtered out at the ranking stage. The 85% discard rate means most content that enters the pipeline doesn't survive it.
Content format is the bottleneck. The ranking stage weighs structure, fact density, freshness, and answer-first formatting. If your pages lack these, they will consistently lose to competitors who have them, regardless of domain authority.
Fan-out queries are invisible but decisive. A third of citations come from sub-queries that no keyword tool tracks. The only way to know if you're winning or losing these is to test your actual queries against the engines and see who gets cited instead of you.
FAQ
How many pages does ChatGPT retrieve per search query? An AirOps analysis of 15,000 prompts found ChatGPT retrieved 548,534 pages total, roughly 37 per prompt. Only 15% of those retrieved pages made it into the final cited response. The gap between retrieval and citation is where content structure and fact density determine the outcome.
What is a fan-out query in ChatGPT search? Fan-out queries are the sub-queries ChatGPT generates internally when processing your prompt. 89.6% of prompts trigger two or more follow-up searches, expanding a single question into 2-3 sub-queries. A third of all cited pages are discovered through these fan-out queries, and 95% of them have zero traditional search volume.
Does ranking #1 on Google guarantee ChatGPT will cite my page? No. Pages at Google position #1 have a 43.2% ChatGPT citation rate. Google ranking helps with retrieval (Stage 2) but not with ChatGPT's internal ranking (Stage 3), which weighs different signals than Google's algorithm.
What content signals make ChatGPT more likely to cite a page? Fact-dense pages with verifiable data points, expert quotes, and well-structured sections consistently outperform pages without them. Citations pull disproportionately from the opening sections of a page, so answer-first content is cited more than content that buries the answer below marketing copy.
Is the RAG pipeline the same across all AI search engines? The architecture is similar (retrieve, rank, synthesize, cite) but the retrieval indexes and ranking signals differ across engines. Each engine has its own source preferences, which is why visibility on one engine tells you nothing about the others. Optimizing for one does not guarantee visibility on another.
CiteGap tests your content against the full RAG pipeline across ChatGPT, Google AI, and Claude, showing you exactly where in the process your pages get filtered out. Request a consultation.