Question 1

Should I block AI crawlers?

Accepted Answer

Usually no, if you want AI engines to cite you. AI bots serve two distinct purposes: training crawlers (e.g. GPTBot, ClaudeBot, and the Google-Extended token) ingest your content for model training, while live-retrieval and search bots (e.g. OAI-SearchBot, ChatGPT-User, PerplexityBot) fetch pages to answer a user right now. Blocking the retrieval bots removes you from AI answers, which is almost always the wrong move for SEO/GEO. Some publishers block training bots while allowing retrieval bots.
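The split policy described above (block training, allow retrieval) can be checked before you deploy it. This is a minimal sketch using Python's standard-library robots.txt parser. The `POLICY` text and the `allowed` helper are illustrative, not an official recommendation from any vendor. Python's parser matches user-agent tokens by substring, so a real crawler's matching may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: training tokens blocked, retrieval bots allowed.
# Keep each group contiguous; a blank line ends a group.
POLICY = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: PerplexityBot
Allow: /
"""


def allowed(policy: str, agent: str, url: str = "https://yoursite.com/any-page") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(policy.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for bot in ("GPTBot", "ClaudeBot", "OAI-SearchBot", "PerplexityBot", "Googlebot"):
        print(bot, allowed(POLICY, bot))
```

Agents with no matching group (Googlebot here) fall through to "allowed", so this policy leaves ordinary search crawling untouched.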

Question 2

Where does robots.txt go?

Accepted Answer

At the root of your domain: https://yoursite.com/robots.txt, served as text/plain. Add the AI crawler directives to your existing file rather than replacing it, and keep your Sitemap line and any rules for Googlebot/Bingbot.
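One way to add rules without clobbering the existing file is to append them as a new group and leave everything else untouched. The sketch below assumes you already have the current robots.txt contents as a string. `append_rules` and `AI_RULES` are hypothetical names used only for illustration.

```python
# Hypothetical example group to add; substitute your own directives.
AI_RULES = "User-agent: GPTBot\nDisallow: /\n"


def append_rules(existing: str, new_rules: str) -> str:
    """Return the existing robots.txt text with new_rules appended once.

    Existing lines (Sitemap, Googlebot/Bingbot groups) are preserved as-is.
    """
    block = new_rules.strip()
    if block in existing:
        return existing  # already present; avoid duplicating the group
    # A blank line separates the new group from whatever came before it.
    return existing.rstrip("\n") + "\n\n" + block + "\n"


if __name__ == "__main__":
    current = (
        "User-agent: Googlebot\n"
        "Allow: /\n"
        "\n"
        "Sitemap: https://yoursite.com/sitemap.xml\n"
    )
    print(append_rules(current, AI_RULES))
```

Sitemap lines are not tied to any user-agent group, so appending a new group after one is fine.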

Question 3

Do AI crawlers actually obey robots.txt?

Accepted Answer

The major, named bots from OpenAI, Anthropic, Google, and Perplexity document their tokens and state that they honor robots.txt. Some scrapers ignore it entirely: robots.txt is a request, not an enforcement mechanism. For hard blocking you need server-side or WAF rules.
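To illustrate what a server-side rule looks like, here is a minimal sketch of a WSGI middleware that refuses requests whose User-Agent contains a blocked substring. The name `block_agents` and the `examplescraper` token are hypothetical. User-Agent headers are trivially spoofed, so treat this as a starting point rather than real enforcement.

```python
# Hypothetical, lowercase substrings to refuse; replace with your own list.
BLOCKED_AGENT_SUBSTRINGS = ("examplescraper",)


def block_agents(app, blocked=BLOCKED_AGENT_SUBSTRINGS):
    """Wrap a WSGI app so requests from blocked user agents get a 403."""

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Not blocked: hand the request to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Unlike robots.txt, this rejects the request whether or not the client chooses to cooperate.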

Question 4

What is Google-Extended?

Accepted Answer

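A separate robots.txt token that Google uses to control Gemini training and grounding. It is not a distinct crawler; Google documents that the fetching itself is done by its existing user agents. Blocking it does not affect normal Google Search indexing (that is governed by Googlebot), so you can keep ranking in Search while opting out of Gemini training.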
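The independence of the two tokens can be seen by parsing a policy that blocks only Google-Extended. This is a minimal sketch with Python's standard-library parser. `GEMINI_OPT_OUT` and `can_fetch` are illustrative names, and the check only shows how the rules parse, not how Google applies them internally.

```python
from urllib.robotparser import RobotFileParser

# Illustrative opt-out: only the Google-Extended token is disallowed.
GEMINI_OPT_OUT = """\
User-agent: Google-Extended
Disallow: /
"""


def can_fetch(policy: str, agent: str, url: str = "https://yoursite.com/") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(policy.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(can_fetch(GEMINI_OPT_OUT, "Google-Extended"))  # False
    print(can_fetch(GEMINI_OPT_OUT, "Googlebot"))        # True
```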
