llms.txt Explained: Does It Actually Move Citations?

Alejandro Rioja
7 min read
TL;DR

llms.txt is a plain-text file at yoursite.com/llms.txt that tells AI crawlers which pages to prioritize. Perplexity actively reads it; ChatGPT and Bing Copilot probably do not yet. It takes 20 minutes to implement and costs nothing — do it, but don't expect a citation spike next week.

What llms.txt actually is

Think of it as a robots.txt for AI crawlers, but inverted. robots.txt says “don’t crawl this.” llms.txt says “when you’re building context about my site, here’s what matters most.”

The spec was proposed in September 2024 by Jeremy Howard (co-founder of fast.ai and Answer.AI). The idea: put a file at yoursite.com/llms.txt that lists your most important pages in plain Markdown. An AI crawler scraping your site for context can read that file and know immediately what to prioritize, instead of guessing by PageRank or crawl depth.

There’s also an optional llms-full.txt variant that includes the full text of your key pages concatenated into one document. Some crawlers prefer that format because it reduces round-trips.

Neither file is a W3C standard yet. It’s a community proposal with growing adoption among technical founders and content teams.

What the file looks like

Here’s the llms.txt I run for alejandrorioja.com:

```markdown
# Alejandro Rioja

> Operator, AI consultant, and founder of Pickleland. I write about GEO, AI agents, and growth for founders.

## Core pages

- [About](https://alejandrorioja.com/about/): Background, consulting services, and how to work with me.
- [Blog](https://alejandrorioja.com/blog/): All posts on GEO, SEO, AI agents, and founder growth.
- [Consultation](https://alejandrorioja.com/consultation/30/): Book a 30-minute paid session.

## Top posts

- [How to Get Cited in ChatGPT Answers](https://alejandrorioja.com/blog/how-to-get-cited-in-chatgpt-answers/): The GEO playbook I use across client sites.
- [AI Agent Architecture for Founders](https://alejandrorioja.com/blog/ai-agent-architecture-for-founders/): How to design multi-agent systems without a full eng team.
- [GEO vs SEO](https://alejandrorioja.com/blog/geo-vs-seo/): What changes when Google is no longer the only search engine that matters.

## Optional: ignore

- /drafts/
- /admin/
```
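
To make the structure concrete, here is a minimal sketch of how a consumer could turn a file like this into a title, a summary, and named sections of links. The `parseLlmsTxt` function and its types are hypothetical names for illustration. They are not part of the proposal or of any crawler's real implementation, and the pattern only handles the simple `- [title](url): note` form used above.

```typescript
// Hypothetical types for illustration; the llms.txt proposal does not define them.
interface LlmsLink { title: string; url: string; note: string }
interface LlmsDoc { title: string; summary: string; sections: Record<string, LlmsLink[]> }

function parseLlmsTxt(text: string): LlmsDoc {
  const doc: LlmsDoc = { title: "", summary: "", sections: {} };
  let links: LlmsLink[] | null = null; // link list of the section being read

  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("## ")) {
      // A level-2 heading opens a new section.
      links = [];
      doc.sections[line.slice(3)] = links;
    } else if (line.startsWith("# ")) {
      doc.title = line.slice(2);
    } else if (line.startsWith("> ")) {
      doc.summary = line.slice(2);
    } else if (links) {
      // Matches "- [title](url)" with an optional ": note" suffix.
      const m = /^- \[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?$/.exec(line);
      if (m) links.push({ title: m[1] ?? "", url: m[2] ?? "", note: m[3] ?? "" });
    }
  }
  return doc;
}
```

Bare paths without a Markdown link, such as `- /drafts/`, do not match the pattern and are silently skipped by this sketch.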

A few things to notice:

Which AI engines actually read it

This is where I have to be honest with you: the landscape is fragmented and partially undocumented.

Perplexity — Yes, confirmed. Perplexity’s crawler (PerplexityBot) reads llms.txt when it indexes sites. Their engineering team has referenced the spec publicly. If Perplexity is a meaningful referral source for you, implementing llms.txt has a clear path to impact. Perplexity is also the highest-ROI GEO target for independent operators right now — see where to spend your GEO effort across Perplexity, ChatGPT, and Google AI Overviews for the full platform comparison.

ChatGPT / OpenAI — Not confirmed. OpenAI’s crawler (GPTBot) does not appear to read llms.txt as of mid-2026. Their crawl behavior is governed by robots.txt and their own internal prioritization. There’s no public statement from OpenAI acknowledging the spec.

Bing Copilot / Microsoft — Not confirmed. Similar situation to OpenAI. Bing’s AI crawler (BingBot) follows robots.txt but there’s no signal that it reads llms.txt.

Google AI Overviews / Gemini — Not confirmed. Google has their own structured-data ecosystem (schema.org, sitemaps) and has not indicated they’ll adopt third-party specs.

Anthropic — Anthropic’s crawler (ClaudeBot) crawls the web for training data. There’s no public documentation that it reads llms.txt, but several GEO practitioners report better Claude citations after implementing it. Correlation, not causation — but worth noting.

Smaller AI search engines — You.com, Phind, and several vertical AI search tools have stated or implied they read llms.txt. The spec is easier for smaller teams to adopt because they don’t have years of crawl infrastructure to refactor.

The honest summary: right now, llms.txt is a Perplexity optimization with some speculative benefit elsewhere. That ratio will probably shift as the spec matures.
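
Because so much of this is undocumented, one way to check for yourself is to look at which crawlers actually request the file on your own server. The sketch below counts requests for /llms.txt and /llms-full.txt per bot. It assumes access-log lines in a combined-log style, where the quoted request and the user agent both appear on each line. The `countLlmsTxtHits` name and the bot list are illustrative, not exhaustive.

```typescript
// User-agent tokens for the crawlers named in this post (matched case-insensitively).
const BOTS = ["PerplexityBot", "GPTBot", "ClaudeBot", "bingbot"];

function countLlmsTxtHits(logLines: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of logLines) {
    // Only count GET requests for the two llms files.
    if (!/"GET \/llms(-full)?\.txt[ ?]/.test(line)) continue;
    const lower = line.toLowerCase();
    // Anything that is not a known bot is grouped under "other".
    const bot = BOTS.find((b) => lower.includes(b.toLowerCase())) ?? "other";
    counts[bot] = (counts[bot] ?? 0) + 1;
  }
  return counts;
}
```

A hit only shows that the file was fetched, not that it influenced any answer, and user-agent strings can be spoofed, so treat the counts as a rough signal.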

How to implement it in 20 minutes

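If you’re on a static site (Astro, Next.js with static export, Hugo, etc.), create the file in the folder your framework copies verbatim to the site root: public/llms.txt for Astro and Next.js, static/llms.txt for Hugo. It will be served at /llms.txt.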

For a Next.js app router site, you can generate it dynamically:

```ts
// app/llms.txt/route.ts
// `allPosts` is this site's own content helper; swap in your own data source.
import { allPosts } from "@/lib/content";

export async function GET() {
  // Featured posts or anything over 1,000 views, capped at ten entries.
  const topPosts = allPosts
    .filter((p) => p.featured || p.views > 1000)
    .slice(0, 10);

  const lines = [
    "# Alejandro Rioja",
    "",
    "> Operator, AI consultant, founder of Pickleland. I write about GEO, AI agents, and growth for founders.",
    "",
    "## Top posts",
    "",
    ...topPosts.map(
      (p) => `- [${p.title}](https://alejandrorioja.com/blog/${p.slug}/): ${p.description}`
    ),
    "",
    "## Core pages",
    "",
    "- [About](https://alejandrorioja.com/about/): Services and background.",
    "- [Consultation](https://alejandrorioja.com/consultation/30/): Book a session.",
  ];

  // Serve as plain text so crawlers receive raw Markdown, not rendered HTML.
  return new Response(lines.join("\n"), {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

For an Astro site, the equivalent is a .txt.ts endpoint in src/pages/:

```ts
// src/pages/llms.txt.ts
import type { APIRoute } from "astro";
import { getCollection } from "astro:content";

export const GET: APIRoute = async () => {
  // Keep English posts only, newest first, capped at ten entries.
  const posts = await getCollection("posts", (p) => p.data.lang === "en");
  const top = posts
    .sort((a, b) => b.data.pubDate.valueOf() - a.data.pubDate.valueOf())
    .slice(0, 10);

  const body = [
    "# Alejandro Rioja",
    "",
    "> AI consultant and operator. Writing about GEO, AI agents, and founder growth.",
    "",
    "## Recent posts",
    "",
    // `slug` is the legacy content-collections field; collections built with
    // the newer Content Layer API expose `id` instead.
    ...top.map(
      (p) =>
        `- [${p.data.title}](https://alejandrorioja.com/blog/${p.slug}/): ${p.data.description}`
    ),
  ].join("\n");

  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
};
```

After deploying, verify it with curl -s https://yoursite.com/llms.txt. If you see Markdown, you’re done.
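
If you want something stricter than eyeballing the output, a small check like the one below can run after each deploy. It is a sketch under my own assumptions about what "done" means: HTTP 200, a text content type, and a body that opens with a `# ` title line like the examples above. `checkLlmsTxt` and `verifyLlmsTxt` are hypothetical helper names.

```typescript
interface FetchedFile { status: number; contentType: string; body: string }

// Returns a list of human-readable problems; an empty list means the file looks fine.
function checkLlmsTxt(res: FetchedFile): string[] {
  const problems: string[] = [];
  if (res.status !== 200) {
    problems.push(`expected HTTP 200, got ${res.status}`);
  }
  if (!/^text\/(plain|markdown)/i.test(res.contentType)) {
    problems.push(`unexpected Content-Type: ${res.contentType || "(none)"}`);
  }
  if (!/^\s*# /.test(res.body)) {
    problems.push("body does not start with a '# ' title line");
  }
  return problems;
}

// Fetches the live file and runs the same checks (needs a runtime with global fetch).
async function verifyLlmsTxt(url: string): Promise<string[]> {
  const r = await fetch(url);
  return checkLlmsTxt({
    status: r.status,
    contentType: r.headers.get("content-type") ?? "",
    body: await r.text(),
  });
}
```

Keeping the checks in a pure function makes them easy to test without touching the network. The content-type check is the one most likely to catch a single-page app answering with its HTML shell instead of the file.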

Should you also create llms-full.txt?

Maybe. llms-full.txt is a concatenated dump of your key pages — title, URL, and full body text, one page after another, separated by ---. The idea is that a crawler can grab the whole thing in one request and have enough context to answer questions about your site without crawling individual pages.

The tradeoff: it’s a large file. Mine runs about 400KB for the top 30 posts. Some crawlers may time out or truncate it. Others may weight it more heavily because the content is pre-digested.

My current approach: generate llms-full.txt but cap it at the 15 highest-performing posts by traffic. Keep it under 250KB. Regenerate on every deploy.
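
As a rough sketch of that capping logic: rank posts by traffic, take the top 15, and stop adding entries before the file passes the size limit. The `Post` shape, the `buildLlmsFull` name, and the exact entry layout (title, URL, body, separated by `---`) are assumptions for illustration. The 250KB cap is treated here as 250,000 bytes.

```typescript
// Hypothetical post shape; adapt to whatever your content layer provides.
interface Post { title: string; url: string; body: string; views: number }

// UTF-8 byte length, since the cap is in bytes rather than characters.
const byteLength = (s: string): number => new TextEncoder().encode(s).length;

function buildLlmsFull(posts: Post[], maxPosts = 15, maxBytes = 250_000): string {
  // Copy before sorting so the caller's array is left untouched.
  const ranked = [...posts].sort((a, b) => b.views - a.views).slice(0, maxPosts);
  const separator = "\n\n---\n\n";
  let out = "";
  let size = 0;

  for (const p of ranked) {
    const entry = `# ${p.title}\n\nURL: ${p.url}\n\n${p.body.trim()}`;
    const piece = out === "" ? entry : separator + entry;
    const pieceSize = byteLength(piece);
    if (size + pieceSize > maxBytes) break; // stop before exceeding the cap
    out += piece;
    size += pieceSize;
  }
  return out;
}
```

Stopping at the first post that would overflow keeps the output in strict traffic order. Skipping that post and trying smaller ones instead would pack the file more tightly, at the cost of a less predictable selection.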

What the data actually shows

I’ve been monitoring Perplexity citations for this site and three client sites since January 2026. Here’s what I’ve observed:

The honest interpretation: llms.txt probably helps with Perplexity. The mechanism is clear — Perplexity reads it. Whether the lift is from llms.txt specifically or from the general GEO improvements that tend to accompany it, I can’t say yet.

What to put in the blockquote

The one-line description in the blockquote is the part I’d spend the most time on. This is the text an LLM will use to summarize who you are in a RAG context. It needs to be:

Bad: > Helping businesses grow with AI.

Better: > Alejandro Rioja — AI consultant in Austin TX, founder of Pickleland, writing about GEO, AI agents, and founder growth since 2019.

The operator’s bottom line

llms.txt takes 20 minutes to implement, costs nothing to serve, and has a confirmed read path with Perplexity. Do it. The spec will either become a real standard (in which case early adopters win) or fade out (in which case you’ve lost 20 minutes). The asymmetry is obvious. Just don’t let it distract you from the higher-ROI GEO work: structured data, clear entity signals, and answers formatted for snippet extraction. Those move every AI engine. llms.txt currently moves one. For the broader playbook, see how to get your brand cited in ChatGPT answers and which schema types punch above their weight for AI engines.


Related: Perplexity vs ChatGPT vs Google AI Overviews: where to spend your GEO effort · Get your brand cited in ChatGPT answers · Schema types AI engines actually read

Want help setting up your GEO stack? Get in touch — I run GEO consulting projects and can audit your citation surface across Perplexity, ChatGPT, and Google AI Overviews.
