
How to Get Cited by ChatGPT and Perplexity

Arpit Tripathi · June 20, 2026 · 12 min read

A concrete GEO checklist to get cited by ChatGPT and Perplexity: crawler access, extractable formats, and the citation-rate numbers.

To get cited by ChatGPT and Perplexity, do three things: let their crawlers in via robots.txt, put a 2 to 4 sentence direct answer at the top of every page, and format the body so machines can lift it out (at least one HTML table, one numbered list, a named author, a visible date, and inline statistics with linked sources). Answer engines do not quote vibes. They quote the extractable, attributable chunks of a page, so getting cited is mostly the work of making those chunks obvious. Most GEO checklists skip the two steps that decide everything: whether the crawlers can reach you at all, and whether Bing has your page, because ChatGPT search reads Bing, not Google.

This is the discipline now called Generative Engine Optimization (GEO). The term comes from a 2023 paper by researchers at Princeton, Georgia Tech, the Allen Institute for AI, and IIT Delhi, presented at ACM SIGKDD 2024. They built a large benchmark of queries and tested which page edits actually moved a source up in AI answers. The finding that matters: a handful of structural and evidentiary changes raised visibility by roughly 30 to 40 percent, while keyword stuffing and fluffier phrasing did little or nothing.

The rest of this guide is the checklist, in priority order. Access and indexing come first because they quietly decide whether your page is even a candidate. Format and evidence come second because they decide whether a candidate actually gets quoted. Work top to bottom and you fix the silent failures before the cosmetic ones.

Step 1: Let the crawlers in (the part most sites get wrong)

If the search indexers (OAI-SearchBot, Claude-SearchBot, and PerplexityBot) cannot fetch your page, nothing else on this list matters. The most common reason a good page never gets cited is a robots.txt that blocks the AI user agents. A CDN or a privacy plugin often adds that block, and nobody notices. Allow the indexing and answer bots explicitly rather than relying on a blanket wildcard that a security rule might override.

OpenAI runs three bots. GPTBot crawls for model training, OAI-SearchBot surfaces pages in ChatGPT search, and ChatGPT-User fetches a page when a user action triggers it. For ChatGPT search citations, OAI-SearchBot is the one to allow.

Anthropic also splits its crawlers three ways: ClaudeBot for training, Claude-User for user-triggered fetches, and Claude-SearchBot for search indexing. Perplexity uses PerplexityBot to index content for its cited answers and Perplexity-User when its assistant browses live for a user.

One distinction is worth getting right, because it trips up teams that try to block AI entirely. The training crawlers and the search crawlers are separate agents. You can allow OAI-SearchBot and Claude-SearchBot so your pages stay eligible for citation while still disallowing GPTBot and ClaudeBot if you do not want your content used for model training. Blocking everything in one sweep is the quiet mistake that removes a site from answer results without removing it from anything a reader would notice.
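As a rough sketch of that split, the snippet below parses a sample robots.txt with Python's standard `urllib.robotparser` and reports which agents may fetch a page. The policy text, the `crawl_permissions` helper, and the `example.com` URL are illustrative assumptions, and Python's parser only approximates how each vendor's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: search indexers allowed, training crawlers disallowed.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

AGENTS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot", "GPTBot", "ClaudeBot"]


def crawl_permissions(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each agent, whether the given robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# example.com stands in for your own domain.
permissions = crawl_permissions(ROBOTS_TXT, "https://example.com/guide", AGENTS)
for agent, allowed in permissions.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running the same check against your live robots.txt is a quick way to catch a blanket block before it costs you citations.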

The exact user agents to allow

  • OpenAI: GPTBot (training), OAI-SearchBot (ChatGPT search indexing), ChatGPT-User (user-triggered fetch).
  • Anthropic: ClaudeBot (training), Claude-SearchBot (search indexing), Claude-User (user-triggered fetch).
  • Perplexity: PerplexityBot (search indexing), Perplexity-User (live browsing for a user).
  • Note: ChatGPT-User, Claude-User, and Perplexity-User are user-initiated, so robots.txt rules may not govern them the same way as the automated indexers.
Insight

A minimal robots.txt that admits the answer engines has a User-agent line for each of GPTBot, OAI-SearchBot, ClaudeBot, Claude-SearchBot, and PerplexityBot, each followed by an Allow: / rule. Then confirm that no upstream firewall or bot-management rule silently drops them.
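One way to produce that minimal file is to generate it from the list of agents, so nothing gets mistyped. This is a sketch only: `build_robots_txt` is a hypothetical helper, and the final check uses Python's own parser, which says nothing about what a firewall or CDN does upstream.

```python
from urllib.robotparser import RobotFileParser

ANSWER_ENGINE_BOTS = [
    "GPTBot",
    "OAI-SearchBot",
    "ClaudeBot",
    "Claude-SearchBot",
    "PerplexityBot",
]


def build_robots_txt(agents: list[str]) -> str:
    """Render one 'User-agent' block with 'Allow: /' per agent."""
    blocks = [f"User-agent: {agent}\nAllow: /" for agent in agents]
    return "\n\n".join(blocks) + "\n"


robots_txt = build_robots_txt(ANSWER_ENGINE_BOTS)
print(robots_txt)

# Sanity check: parse the generated text and confirm every bot may fetch the root.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
all_allowed = all(
    parser.can_fetch(agent, "https://example.com/") for agent in ANSWER_ENGINE_BOTS
)
print("all bots allowed:", all_allowed)
```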

Step 2: ChatGPT visibility starts in Bing, not Google

Here is the part most guides bury. ChatGPT search retrieves web results through Bing's index. If Bing has not crawled and indexed a page, ChatGPT generally cannot surface it, no matter how well it ranks on Google. That makes Bing Webmaster Tools a direct lever for ChatGPT visibility: verify your domain there and submit your XML sitemap so Bing discovers and indexes your pages.

This is the single most decisive step that GEO guides routinely skip. Submit the sitemap, watch the index coverage report, and fix any pages Bing reports as excluded before you optimize anything else. A perfectly written page that is missing from Bing's index gives ChatGPT nothing to cite. Confirm indexing first, then move on.
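If your site does not already generate a sitemap, a small script can. The sketch below builds a sitemap in the standard sitemaps.org format with Python's `xml.etree.ElementTree`. The `build_sitemap` helper, the `example.com` URLs, and the dates are placeholders, and submitting the file in Bing Webmaster Tools remains a manual step.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> str:
    """Build sitemap XML from (absolute URL, last-modified YYYY-MM-DD) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    ET.indent(urlset)  # pretty-print; requires Python 3.9+
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder pages; replace with your own URLs and real update dates.
sitemap_xml = build_sitemap([
    ("https://example.com/guide", "2026-06-20"),
    ("https://example.com/pricing", "2026-06-18"),
])
print(sitemap_xml)
```

Keeping `lastmod` honest matters: it should change only when the page content actually changes.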

Pro Tip

Perplexity weights recency. It tends to prefer a recently updated article over an otherwise identical stale one. Keep a visible last-updated date in the page and genuinely refresh the content, statistics, and links on a schedule rather than letting a post calcify after publish.

Step 3: Lead with the answer, then make it extractable

Answer engines compose a response by lifting short, self-contained spans from a few sources. Two structural moves make your spans easy to lift. First, open the page with a 2 to 4 sentence direct answer to the query, before any preamble. Second, render your evidence in formats a parser can isolate: a table for any comparison, a numbered list for any procedure, and short declarative sentences for any standalone fact.
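A quick audit can tell you whether a page has these structural pieces at all. The sketch below uses Python's built-in `html.parser` to count tables and ordered lists and to estimate how many sentences the first paragraph holds. The `ExtractabilityAudit` class, the naive sentence splitter, and the sample page are assumptions for illustration, not a model of how any answer engine parses HTML.

```python
import re
from html.parser import HTMLParser


class ExtractabilityAudit(HTMLParser):
    """Counts <table> and <ol> tags and captures the text of the first <p>."""

    def __init__(self) -> None:
        super().__init__()
        self.tables = 0
        self.ordered_lists = 0
        self.first_paragraph = ""
        self._seen_p = False
        self._in_first_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1
        elif tag == "ol":
            self.ordered_lists += 1
        elif tag == "p" and not self._seen_p:
            self._seen_p = True
            self._in_first_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_first_p = False

    def handle_data(self, data):
        if self._in_first_p:
            self.first_paragraph += data


def audit(html: str) -> dict:
    parser = ExtractabilityAudit()
    parser.feed(html)
    parser.close()
    # Naive split: sentence-ending punctuation followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", parser.first_paragraph.strip())
    sentences = [part for part in parts if part]
    return {
        "has_table": parser.tables > 0,
        "has_numbered_list": parser.ordered_lists > 0,
        "opening_sentences": len(sentences),
        "opening_in_range": 2 <= len(sentences) <= 4,
    }


SAMPLE_PAGE = """
<p>Allow the AI crawlers in robots.txt. Then lead with a direct answer.
Add a table and a numbered list.</p>
<ol><li>Allow crawlers</li><li>Get indexed in Bing</li></ol>
<table><tr><th>Element</th></tr><tr><td>Opening</td></tr></table>
"""

report = audit(SAMPLE_PAGE)
print(report)
```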

One 2026 study of content structure put numbered lists at roughly 1.8 times the citation rate of plain prose and comparison tables at about 2.5 times the rate of the same data buried in a narrative paragraph. The pattern is consistent across studies even where the exact multipliers differ: pages that combine extractable formats tend to be cited meaningfully more often than walls of prose. Treat these as reported study figures and directional guidance, not guarantees.

Picture how an engine reads a page and the mechanism is obvious. A numbered list hands it discrete, ordered steps it can quote without rewriting. A table gives it labeled cells it can map straight onto the comparison a user asked for. A single declarative sentence with a number in it is a citable fact on its own. Bury that same fact in a paragraph and the engine has to guess where the useful part starts, so it pulls from a competitor who already chunked it cleanly. Say the query is which plan includes priority support: a sentence stranded mid-paragraph forces the engine to extract a fragment and hope it caught the condition, while a pricing table whose rows are plans and whose columns include a priority-support cell hands it an exact, labeled answer. Same information, two very different odds of being the source that gets named.
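To make the priority-support example concrete, here is a minimal sketch that renders plan data as an HTML table with labeled header cells. The plan names and values are invented, and `render_table` is a hypothetical helper rather than part of any framework.

```python
from html import escape


def render_table(headers: list[str], rows: list[list[str]]) -> str:
    """Render rows as an HTML table with one <th> per column header."""
    head = "".join(f"<th>{escape(header)}</th>" for header in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


# Invented plans for illustration only.
PLANS = [
    ["Starter", "No"],
    ["Team", "Yes"],
]

html_table = render_table(["Plan", "Priority support"], PLANS)
print(html_table)
```

Each answer now lives in a cell that pairs a plan name with a labeled column, which is the shape the paragraph above describes.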

What the Princeton GEO study found actually works

The original GEO paper tested nine optimization methods on its benchmark. The winners were not stylistic. Adding relevant statistics, adding credible quotations, and citing sources produced the biggest gains in how prominently a source appeared in generated answers, on the order of a 30 to 40 percent relative improvement on the paper's visibility metric. Keyword stuffing and fluffier phrasing did little or nothing.

  • Add statistics: replace soft claims with specific, sourced numbers.
  • Cite sources inline: link each factual claim to a primary source the engine can corroborate.
  • Add quotations from named, credible voices instead of anonymous assertions.
  • Improve fluency: clear, well-structured sentences extract more cleanly than dense jargon.
  • Skip the promotional tone: a salesy register tends to suppress citation rather than help it.

The reason these edits beat the cosmetic ones is mechanical. A model choosing what to cite is weighing whether a span will hold up when it repeats that span to a user. A sourced statistic, a named quotation, and an inline link to a primary source each give the model something it can stand behind. Keyword density and an authoritative tone change none of that, which is why the paper found they moved the needle far less than evidence did.
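A "30 to 40 percent relative improvement" is a change measured against the starting score, not a gain in percentage points. The arithmetic is sketched below under the assumption that visibility is a single number per source. The two scores are made up for illustration and are not taken from the paper.

```python
def relative_improvement(before: float, after: float) -> float:
    """Percent change of `after` relative to the baseline `before`."""
    if before <= 0:
        raise ValueError("baseline score must be positive")
    return (after - before) * 100 / before


# Made-up scores: a source's visibility before and after adding sourced statistics.
baseline_score = 20.0
edited_score = 27.0

gain = relative_improvement(baseline_score, edited_score)
print(f"{gain:.0f}% relative improvement")
```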

Step 4: Make the page trustworthy to a machine

Answer engines bias toward content that signals accountability. Three signals are cheap to add, and each one tracks with higher citation. A page that states who wrote it, when, and on what evidence reads as something the model can stand behind when it quotes you.

  • A named author with a real bio, so the page has an accountable voice rather than an anonymous one.
  • A visible publish or update date, so the engine can judge freshness and prefer current sources.
  • Inline citations to primary sources, so each factual claim can be corroborated before it is repeated.
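Alongside the visible byline and date, some sites also expose the same facts as schema.org Article markup. This guide does not claim answer engines read that markup, so treat the sketch below as an optional complement and an assumption on our part. `article_jsonld` is a hypothetical helper, and the modified date is a placeholder.

```python
import json


def article_jsonld(headline: str, author_name: str,
                   date_published: str, date_modified: str) -> str:
    """Build a schema.org Article description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
        "dateModified": date_modified,
    }
    return json.dumps(data, indent=2)


jsonld = article_jsonld(
    headline="How to Get Cited by ChatGPT and Perplexity",
    author_name="Arpit Tripathi",
    date_published="2026-06-20",
    date_modified="2026-06-20",  # placeholder: update when the content changes
)
print(jsonld)
```

The output would typically sit inside a script tag of type application/ld+json in the page head, next to the human-readable byline rather than in place of it.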

There is a self-referential test here. The page you are reading opens with a direct answer, carries a named author and a date, uses a table and numbered lists, and links every statistic to a primary source. Those are exactly the features that make a page quotable. The checklist is not abstract advice; it is the shape of a page an answer engine is more likely to cite.

The GEO checklist, side by side

| Element | Low-citability page | High-citability page |
|---|---|---|
| Crawler access | GPTBot / PerplexityBot blocked or unverified | AI indexers explicitly allowed in robots.txt |
| Bing indexing | Not in Bing's index, invisible to ChatGPT | Sitemap submitted and indexed in Bing |
| Opening | Slow intro before the answer | 2 to 4 sentence direct answer up top |
| Format | Wall of prose | At least one HTML table and one numbered list |
| Evidence | Vague claims, no sources | Inline statistics linked to primary sources |
| Trust signals | No author, no date | Named author, visible date, clear update cadence |
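The six rows of this table can double as a pre-publish check. The sketch below models them as a dataclass and lists whichever items a page still fails. The field names and the `failing_items` helper are our own labels for the rows, and the values are filled in by hand rather than detected automatically.

```python
from dataclasses import dataclass, fields


@dataclass
class PageChecklist:
    """One boolean per row of the checklist table, in the same order."""
    crawlers_allowed: bool
    indexed_in_bing: bool
    direct_answer_up_top: bool
    has_table_and_numbered_list: bool
    statistics_linked_to_sources: bool
    named_author_and_date: bool


def failing_items(page: PageChecklist) -> list[str]:
    """Return the names of checklist rows the page does not yet satisfy."""
    return [f.name for f in fields(page) if not getattr(page, f.name)]


# Hypothetical page: reachable and well formatted, but not indexed or sourced.
draft = PageChecklist(
    crawlers_allowed=True,
    indexed_in_bing=False,
    direct_answer_up_top=True,
    has_table_and_numbered_list=True,
    statistics_linked_to_sources=False,
    named_author_and_date=True,
)

print(failing_items(draft))
```

Because the fields follow the table's priority order, the first failing item is the one to fix first.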

Step 5: Measure citations, not just rankings

Classic SEO metrics undercount your AI visibility. A page can rank modestly on Google and still be cited often by an answer engine, or rank well and never get quoted. Track the queries you care about by actually asking ChatGPT, Perplexity, and Claude those questions on a schedule and recording whether your domain appears as a cited source. Server log analysis adds the other half: watch for GPTBot, OAI-SearchBot, ClaudeBot, Claude-SearchBot, and PerplexityBot hits to confirm the crawlers are reaching your new pages.
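The log half of that check can be as simple as counting user-agent matches. The sketch below scans access-log lines for the five crawler names and reports which ones never appeared. The sample lines are simplified, made-up entries, and matching on the user-agent string alone does not prove a request really came from that vendor.

```python
from collections import Counter

AI_BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "Claude-SearchBot", "PerplexityBot"]


def count_bot_hits(log_lines: list[str]) -> Counter:
    """Count log lines whose text mentions each AI crawler name (case-insensitive)."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for bot in AI_BOTS:
            if bot.lower() in lowered:
                hits[bot] += 1
                break  # attribute each line to at most one bot
    return hits


# Made-up, simplified log lines; real user-agent strings are longer.
SAMPLE_LOG = [
    '203.0.113.5 - - [20/Jun/2026:10:00:00 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [20/Jun/2026:10:02:11 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '203.0.113.9 - - [20/Jun/2026:10:05:40 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '198.51.100.7 - - [20/Jun/2026:10:06:02 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

hits = count_bot_hits(SAMPLE_LOG)
missing = [bot for bot in AI_BOTS if hits[bot] == 0]

print("hits:", dict(hits))
print("never seen:", missing)
```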

Turn that into a simple decision rule. If the crawlers are not in your logs, you have an access problem, so return to Step 1 and Step 2. If the crawlers are hitting the page and it is indexed in Bing but still loses to a competitor, you have a format or evidence problem, so compare the extractable parts. Usually the cited competitor has a cleaner direct answer, a real table, or stronger sourced statistics. Close that gap and re-test the query a week later, because re-crawl and re-evaluation are not instant.
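Written as code, the decision rule looks roughly like this. The `diagnose` function and its three inputs are a hypothetical framing of the paragraph above. Treating "in the logs but not in Bing" as its own case is our reading of the text.

```python
def diagnose(crawler_hits: int, indexed_in_bing: bool, cited: bool) -> str:
    """Map what you observed for one query and page to the next thing to fix."""
    if cited:
        return "cited: keep re-testing on your schedule"
    if crawler_hits == 0:
        return "access problem: return to Step 1 and Step 2"
    if not indexed_in_bing:
        return "indexing problem: return to Step 2"
    return "format or evidence problem: compare extractable parts with the cited competitor"


# Hypothetical observations for three pages.
print(diagnose(crawler_hits=0, indexed_in_bing=False, cited=False))
print(diagnose(crawler_hits=12, indexed_in_bing=False, cited=False))
print(diagnose(crawler_hits=12, indexed_in_bing=True, cited=False))
```

The checks run in the same order as the checklist, so an access failure is never mistaken for a formatting one.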

How the same principle applies to your own AI memory

GEO is about making your public pages quotable to strangers' AI assistants. The mirror image is making your own documents quotable to your AI. MemX is a consumer AI memory app, an external memory layer over your own documents, photos, and notes across Android, iOS, and WhatsApp. The same principle holds on both sides: structured, well-labeled, retrievable content gets surfaced, and an undifferentiated pile does not. MemX is private by architecture, with per-user keys, encryption at rest, and an on-device first pass, so your personal corpus stays yours while still being instantly recallable.

Frequently Asked Questions

01. How do I get my website cited by ChatGPT?

Allow OAI-SearchBot in robots.txt, get the page indexed in Bing via Bing Webmaster Tools (ChatGPT search uses Bing's index), then lead with a 2 to 4 sentence direct answer and add a table, a numbered list, and sourced statistics so ChatGPT can extract and attribute your content.

02. What is GEO and how is it different from SEO?

GEO, or Generative Engine Optimization, optimizes pages to be cited inside AI answers from tools like ChatGPT and Perplexity. SEO optimizes for ranked links on a results page. GEO leans harder on extractable structure, inline statistics, and trust signals than on traditional keyword ranking.

03. Which crawlers do I need to allow in robots.txt for AI search?

Allow GPTBot and OAI-SearchBot (OpenAI), ClaudeBot and Claude-SearchBot (Anthropic), and PerplexityBot (Perplexity). The user-initiated fetchers ChatGPT-User, Claude-User, and Perplexity-User act on user requests, so robots.txt may not apply to them the same way.

04. Does adding statistics really increase AI citations?

The Princeton GEO research found that adding relevant statistics, credible quotations, and source citations were among the most effective edits, raising visibility in generated answers by roughly 30 to 40 percent on its metric. Treat that as a reported study figure, not a guarantee for every page.

05. How do I get cited by Perplexity specifically?

Allow PerplexityBot in robots.txt, then favor freshness. Perplexity weights recency and tends to prefer recently updated articles over stale ones. Keep a visible update date, refresh facts and links on a schedule, and back claims with sourced statistics it can verify.


Written by Arpit Tripathi

Founder of MemX. Ex-Google Staff Tech Lead Manager, ex-AWS Senior SDE (Elastic Block Store). Writes about practical AI on the MemX blog.
