Decide which AI bots can read your site. Toggle each crawler to allow or block, then copy a
clean robots.txt. Defaults are set for GEO (generative engine optimization): allow the bots that cite you.
For most sites, allow everything. If your goal is to be cited by ChatGPT, Perplexity,
and Gemini, blocking their bots makes you invisible in AI answers. Block selectively only if you
have a real reason to opt out of training.
Crawlers
robots.txt
Add to https://yoursite.com/robots.txt — merge with your existing rules.
Training vs retrieval — the distinction that matters
Blocking a training bot keeps your content out of the next model version. Blocking a
retrieval bot keeps you out of today's AI answers — which is the opposite of what you want
if you're optimizing for citations. The "Block training, allow retrieval" preset above is the middle
path many publishers reach for. More context in
how to get cited in ChatGPT answers.
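As a rough sketch of what a "Block training, allow retrieval" policy could look like, the Python snippet below embeds one possible robots.txt and checks it with the standard library's urllib.robotparser. The grouping of bots follows the examples named in the FAQ on this page. The exact file the tool emits, the policy helper, and the sample URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# One possible robots.txt for "block training, allow retrieval".
# The bot grouping is an assumption based on the examples on this page.
ROBOTS_TXT = """\
# Training crawlers and tokens: opt out
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

# Live-retrieval and search bots: allow
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: PerplexityBot
Allow: /
"""


def policy(text, agents, url="https://yoursite.com/any-page"):
    """Return {agent: allowed?} for a robots.txt body (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    bots = ["GPTBot", "ClaudeBot", "Google-Extended",
            "OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "Googlebot"]
    for agent, allowed in policy(ROBOTS_TXT, bots).items():
        print(f"{agent}: {'allow' if allowed else 'block'}")
```

Googlebot is not named in this file, so it falls through to the default and stays allowed. Python's parser is only an approximation of how each real crawler interprets the file, so treat the check as a sanity test rather than a guarantee.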
AI robots.txt questions.
Should I block AI crawlers?
Usually no, if you want AI engines to cite you. AI bots serve two distinct purposes. Training crawlers and tokens (e.g. GPTBot, ClaudeBot, Google-Extended) collect content for model training. Live-retrieval and search bots (e.g. OAI-SearchBot, ChatGPT-User, PerplexityBot) fetch pages to answer a user right now. Blocking the retrieval bots removes you from AI answers, which is almost always the wrong move for SEO/GEO. Some publishers block training bots while allowing retrieval bots.
Where does robots.txt go?
At the root of your domain: https://yoursite.com/robots.txt, served as text/plain. Add these directives to your existing file rather than replacing it — keep your sitemap line and any rules for Googlebot/Bingbot.
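One way to picture the merge is the sketch below, which appends AI groups to an existing file and then confirms the sitemap line and the general rule still apply. The existing file contents, the AI directives, and the merge_robots and parse_robots helpers are all hypothetical. The check uses Python's urllib.robotparser, which may differ in details from how individual crawlers parse the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical existing file: a general rule plus the sitemap line.
EXISTING = """\
User-agent: *
Disallow: /admin/

Sitemap: https://yoursite.com/sitemap.xml
"""

# Hypothetical AI directives to add.
AI_RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /admin/
Allow: /
"""


def merge_robots(existing: str, ai_rules: str) -> str:
    """Append the AI groups after the existing rules, separated by a blank line."""
    return existing.rstrip("\n") + "\n\n" + ai_rules.strip("\n") + "\n"


def parse_robots(text: str) -> RobotFileParser:
    """Parse a robots.txt body without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


merged = merge_robots(EXISTING, AI_RULES)
parser = parse_robots(merged)

if __name__ == "__main__":
    print(merged)
    print("Sitemaps:", parser.site_maps())
    print("Googlebot may fetch /admin/:",
          parser.can_fetch("Googlebot", "https://yoursite.com/admin/"))
```

A bot that has its own group follows that group only and does not inherit the rules under User-agent: *. That is why the sketch repeats Disallow: /admin/ inside the OAI-SearchBot group instead of relying on the general rule.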
Do AI crawlers actually obey robots.txt?
The major, named bots from OpenAI, Anthropic, Google, and Perplexity document their tokens and state that they honor robots.txt. Some scrapers ignore it entirely: robots.txt is a request, not an enforcement mechanism. For hard blocking you need server-side or WAF rules.
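To make "server-side rules" concrete, here is a minimal sketch of a WSGI middleware in Python that returns 403 when the User-Agent header contains a blocked substring. The blocklist, the is_blocked and block_bots names, and the response body are hypothetical. A scraper that ignores robots.txt can also spoof its user agent, so a real deployment would likely combine this with other signals such as published IP ranges or WAF rules.

```python
# Hypothetical blocklist: lowercase substrings matched against the User-Agent header.
BLOCKED_UA_SUBSTRINGS = ("gptbot", "claudebot")


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent contains any blocked substring."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_SUBSTRINGS)


def block_bots(app):
    """Wrap a WSGI app so that blocked user agents receive 403 Forbidden."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else reaches the wrapped application unchanged.
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Stand-in application used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    # Serves on localhost:8000 until interrupted.
    make_server("127.0.0.1", 8000, block_bots(demo_app)).serve_forever()
```

Unlike robots.txt, this refuses the request outright regardless of whether the client chooses to cooperate, but it only works for bots that send an identifiable User-Agent header.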
What is Google-Extended?
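A separate robots.txt token Google uses to control Gemini training and grounding. It is a control token rather than a crawler with its own user-agent string; Google's existing crawlers do the fetching. Blocking it does not affect normal Google Search indexing (that is Googlebot), so you can keep ranking in Search while opting out of Gemini training.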
Not sure what to allow?
The audit sets your crawler policy in line with your GEO goals — and fixes everything else
stopping AI engines from citing you.