Alejandro Rioja.
GEO SEO

Schema Markup for AI Engines: Types That Punch Above Their Weight

Alejandro Rioja
8 min read
TL;DR

FAQPage and HowTo schema give the highest GEO citation lift per hour of work because AI engines parse them as pre-answered questions and step-by-step procedures. Article/BlogPosting signals authorship credibility. Person and Organization anchor your entity graph so models stop confusing you with someone else. Skip the long tail of obscure types — they don't move the needle in 2026.

Why AI engines read schema differently than Google does

Traditional Google crawlers use schema mainly for rich results — those star ratings and FAQ dropdowns in the SERP. That’s a rendering concern. The schema either qualifies for a feature or it doesn’t.

AI engines — ChatGPT, Perplexity, Gemini, Claude — use schema differently. They’re not rendering a SERP. They’re parsing your page to extract discrete, citable facts. Schema markup is a shortcut. Instead of inferring what a block of text means, the model can read the @type field and know: “this is a question-answer pair,” or “this is a structured procedure,” or “this is the author.”

That changes which types matter. Types that serialize your content into clean, extractable units win. Types that mainly help Google display a rich result are less valuable in the GEO context.

The crawlers that feed AI training data and real-time retrieval (Common Crawl, Bing’s index, Google’s crawl) all process JSON-LD. If the markup is valid and semantically accurate, it gets ingested. If it’s stuffed with fake FAQs or mismatched types, models learn to distrust it — or ignore it.
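
To make the "read the @type field" idea concrete, here is a minimal sketch of how a crawler-like script might pull JSON-LD out of a page. It is an illustration using Python's standard library, not a description of how any particular AI engine works; `JsonLdExtractor`, `extract_types`, and `SAMPLE_HTML` are hypothetical names invented for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Assumption: broken markup is simply skipped.


def extract_types(html):
    """Return the @type of each valid top-level JSON-LD object in the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [block.get("@type") for block in parser.blocks if isinstance(block, dict)]


# Hypothetical page: one valid block and one truncated (invalid) block.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": []}</script>
<script type="application/ld+json">{"@type": "BlogPosting", </script>
</head><body><p>Hello</p></body></html>
"""

print(extract_types(SAMPLE_HTML))  # ['FAQPage']
```

The truncated second block never reaches the output, which is the practical meaning of "valid markup gets ingested": a syntax error makes the whole block invisible to any parser that behaves like this one.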

Article and BlogPosting: the authorship anchor

Every post you publish should have Article or BlogPosting schema. This isn’t glamorous but it’s foundational.

The two fields that matter most for GEO are author and dateModified. AI engines weight freshness and named authorship when deciding whether to surface a citation. A page with no declared author and a two-year-old publish date competes poorly against a page with a named expert and a recent update.

json
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Schema Markup for AI Engines: Types That Punch Above Their Weight",
  "author": {
    "@type": "Person",
    "name": "Alejandro Rioja",
    "url": "https://alejandrorioja.com/about/"
  },
  "datePublished": "2026-05-31",
  "dateModified": "2026-05-31",
  "publisher": {
    "@type": "Organization",
    "name": "Alejandro Rioja",
    "url": "https://alejandrorioja.com"
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://alejandrorioja.com/blog/schema-markup-for-ai-engines-the-types-that-punch-above-their-weight/"
  }
}

Keep dateModified accurate. I’ve seen sites park a fake “updated today” date on every page — models catch the pattern and discount it. Update the date when you actually update the content.
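
One way to keep dateModified honest is to tie it to the content itself. The sketch below bumps the date only when a hash of the post body changes. It assumes your build step can store a fingerprint per post; `content_fingerprint` and `refresh_date_modified` are hypothetical helpers, not part of any CMS.

```python
import hashlib
from datetime import date


def content_fingerprint(body):
    """Stable hash of the post body, used to detect real edits."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def refresh_date_modified(schema, body, stored_fingerprint, today=None):
    """Return (schema, fingerprint); dateModified changes only if the body changed."""
    today = today or date.today()
    new_fingerprint = content_fingerprint(body)
    if new_fingerprint == stored_fingerprint:
        return schema, stored_fingerprint
    updated = dict(schema)  # copy so the caller's dict is not mutated
    updated["dateModified"] = today.isoformat()
    return updated, new_fingerprint


post = {"@type": "BlogPosting", "datePublished": "2026-05-31", "dateModified": "2026-05-31"}
fingerprint = content_fingerprint("original body")

unchanged, _ = refresh_date_modified(post, "original body", fingerprint, today=date(2026, 7, 1))
changed, _ = refresh_date_modified(post, "revised body", fingerprint, today=date(2026, 7, 1))

print(unchanged["dateModified"])  # 2026-05-31
print(changed["dateModified"])    # 2026-07-01
```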

FAQPage: the highest GEO lift per hour

If I had to pick one schema type to add to every informational page right now, it’s FAQPage. The reason is structural: AI engines already want to answer questions. FAQPage hands them a labeled question and a labeled answer in a single node. There’s no inference required.

The lift shows up in featured snippets too, but the GEO effect is more reliable. When a user asks Perplexity a question that matches one of your FAQ entries, the model can cite your answer almost verbatim because you’ve already formatted it as a citation. (Perplexity is currently the most citation-generous AI search platform for independent operators — see where to focus your GEO effort across platforms.)

Rules I follow for FAQ schema that actually works:

  1. Each question must reflect how a real user phrases it — not how you’d phrase it as a marketer.
  2. Each answer must be self-contained. If the answer only makes sense after reading the article, it won’t get cited.
  3. Three to six questions per page is the sweet spot. Padding with ten weak questions hurts more than it helps.
json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Which schema types do AI engines prioritize?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "AI engines prioritize FAQPage, HowTo, Article/BlogPosting, Person, and Organization. These types serialize content into clean, extractable units that models can cite directly without needing to parse prose."
      }
    },
    {
      "@type": "Question",
      "name": "Does schema markup still help with SEO in 2026?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes. Schema markup helps both traditional crawlers (for rich results) and AI crawlers (for citation extraction). FAQPage and HowTo provide the highest return per hour of implementation work."
      }
    },
    {
      "@type": "Question",
      "name": "How many FAQ items should I include per page?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Three to six self-contained question-answer pairs is the sweet spot. More than six dilutes quality; fewer than three reduces the surface area for citation."
      }
    }
  ]
}
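
The rules above can be turned into a rough pre-publish check. This is a sketch under stated assumptions: the three-to-six range comes from the rules, while the list of cross-reference phrases in `CROSS_REFERENCES` is a heuristic invented for this example and will not catch every answer that depends on surrounding text. `lint_faq` is a hypothetical helper.

```python
import re

# Assumption: a few phrases that signal an answer is not self-contained.
CROSS_REFERENCES = re.compile(
    r"\b(as mentioned above|as noted earlier|see section|see above)\b",
    re.IGNORECASE,
)


def lint_faq(schema, min_items=3, max_items=6):
    """Return a list of human-readable problems found in a FAQPage dict."""
    problems = []
    items = schema.get("mainEntity", [])
    if not min_items <= len(items) <= max_items:
        problems.append(f"expected {min_items}-{max_items} questions, found {len(items)}")
    for index, item in enumerate(items, start=1):
        answer = item.get("acceptedAnswer", {}).get("text", "").strip()
        if not item.get("name", "").strip():
            problems.append(f"question {index} has no text")
        if not answer:
            problems.append(f"question {index} has no answer text")
        elif CROSS_REFERENCES.search(answer):
            problems.append(f"answer {index} refers to other parts of the page")
    return problems


draft = {
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "How many FAQ items should I include per page?",
            "acceptedAnswer": {"@type": "Answer", "text": "Three to six self-contained pairs."},
        },
        {
            "@type": "Question",
            "name": "Does schema markup still help?",
            "acceptedAnswer": {"@type": "Answer", "text": "As mentioned above, yes."},
        },
    ],
}

for problem in lint_faq(draft):
    print(problem)
```

A check like this cannot judge whether a question matches how real users phrase it; that part still needs a human read.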

HowTo: procedures AI engines love to quote

HowTo schema is underused. Most people implement it on recipe-style content and stop there. But any procedural content — setup guides, audits, frameworks — is a candidate.

The reason it punches above its weight for GEO: AI engines regularly respond to “how do I…” queries by listing steps. When your page has HowTo schema with named steps, the model can reproduce your structure almost exactly. It’s not summarizing you — it’s quoting your procedure.

json
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to Add FAQPage Schema to a Blog Post",
  "step": [
    {
      "@type": "HowToStep",
      "position": 1,
      "name": "Identify three to six real user questions",
      "text": "Pull questions from Google Search Console queries, Reddit threads, and your own customer emails. Each question should reflect natural language, not marketer language."
    },
    {
      "@type": "HowToStep",
      "position": 2,
      "name": "Write self-contained answers",
      "text": "Each answer must make sense in isolation — no references to 'as mentioned above' or 'see section 3'. Aim for 40–120 words per answer."
    },
    {
      "@type": "HowToStep",
      "position": 3,
      "name": "Add the JSON-LD block to your page head or body",
      "text": "Paste the FAQPage JSON-LD into a <script type='application/ld+json'> tag. Validate with Google's Rich Results Test and Schema.org Validator before publishing."
    }
  ]
}

One practical note: keep HowToStep text short and scannable. AI engines excerpt step text at roughly sentence-level granularity. A 400-word essay in a step field mostly gets ignored.
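
If your procedures already live as ordered lists, the HowTo block can be generated rather than hand-written. The sketch below numbers the steps automatically and rejects overly long step text. The 60-word ceiling is an assumption chosen for illustration, since the only guidance here is "short and scannable", and `build_howto` is a hypothetical helper.

```python
import json


def build_howto(name, steps, max_words=60):
    """Build a HowTo dict from an ordered list of (step_name, step_text) tuples."""
    how_to_steps = []
    for position, (step_name, step_text) in enumerate(steps, start=1):
        word_count = len(step_text.split())
        if word_count > max_words:
            raise ValueError(f"step {position} has {word_count} words; limit is {max_words}")
        how_to_steps.append({
            "@type": "HowToStep",
            "position": position,
            "name": step_name,
            "text": step_text,
        })
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "step": how_to_steps,
    }


howto = build_howto(
    "How to Add FAQPage Schema to a Blog Post",
    [
        ("Identify three to six real user questions",
         "Pull questions from Google Search Console queries, Reddit threads, and your own customer emails."),
        ("Write self-contained answers",
         "Each answer must make sense in isolation."),
    ],
)
print(json.dumps(howto, indent=2))
```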

Person and Organization: entity disambiguation

This is the unsexy one that prevents real problems. AI engines maintain entity graphs — internal maps of who people and organizations are. If your Person schema is absent or inconsistent, models may conflate you with someone else who shares your name, or simply label you as an unknown entity and deprioritize your content.

Person schema on your about page and author pages does three things:

  1. Declares your canonical name and URL
  2. Links to your social profiles via sameAs (these function as entity anchors)
  3. Associates your expertise via knowsAbout
json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Alejandro Rioja",
  "url": "https://alejandrorioja.com/about/",
  "sameAs": [
    "https://www.linkedin.com/in/alejandrioja/",
    "https://twitter.com/alejandrorioja",
    "https://github.com/alejandrorioja"
  ],
  "knowsAbout": [
    "AI agents",
    "Generative Engine Optimization",
    "SEO",
    "growth marketing"
  ],
  "jobTitle": "Founder",
  "worksFor": {
    "@type": "Organization",
    "name": "Alejandro Rioja",
    "url": "https://alejandrorioja.com"
  }
}

Organization schema belongs on your homepage. The sameAs links here are especially important — they let models verify that your website and your LinkedIn page and your Crunchbase profile are all the same entity.
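
For reference, an Organization block can be assembled the same way as the Person block. The sketch below is illustrative only: `build_organization` is a hypothetical helper, and the company name and profile URLs are placeholders, not real profiles. The https-only check is an assumption about what makes a clean sameAs entry.

```python
import json
from urllib.parse import urlparse


def build_organization(name, url, profile_urls):
    """Build an Organization dict with de-duplicated, absolute sameAs links."""
    same_as = []
    for profile in profile_urls:
        parsed = urlparse(profile)
        if parsed.scheme != "https" or not parsed.netloc:
            raise ValueError(f"sameAs entries should be absolute https URLs: {profile!r}")
        if profile not in same_as:
            same_as.append(profile)
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": same_as,
    }


# Placeholder values for illustration only.
org = build_organization(
    "Example Co",
    "https://example.com",
    [
        "https://www.linkedin.com/company/example-co/",
        "https://www.crunchbase.com/organization/example-co",
        "https://www.linkedin.com/company/example-co/",  # duplicate, dropped
    ],
)
print(json.dumps(org, indent=2))
```

Use exactly the same profile URLs in your Person and Organization blocks wherever they overlap, so the two nodes point at one consistent set of anchors.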

BreadcrumbList: where your content sits

BreadcrumbList is frequently skipped because it looks like a UX nicety rather than a content signal. That’s a mistake.

AI engines use breadcrumbs to understand where a piece of content sits in your site’s taxonomy. A post on “schema markup” that lives inside /blog/seo/ is contextualized differently than a standalone page. That hierarchy helps models classify your content accurately, which affects which queries it gets surfaced for.

json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "https://alejandrorioja.com/"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Blog",
      "item": "https://alejandrorioja.com/blog/"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "Schema Markup for AI Engines",
      "item": "https://alejandrorioja.com/blog/schema-markup-for-ai-engines-the-types-that-punch-above-their-weight/"
    }
  ]
}

This is a 10-minute implementation that most sites can add in a single template edit. Add it to your CMS layout and every page gets it automatically.
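
A template-level implementation might derive the trail from the page URL. The sketch below assumes every path segment corresponds to a real, linkable page with a trailing slash, which will not hold for every site. `build_breadcrumbs` is a hypothetical helper, and the labels it derives from slugs are naive (for example, "seo" becomes "Seo"), so you may prefer to pass real titles from your CMS.

```python
import json
from urllib.parse import urlparse


def build_breadcrumbs(page_url, page_title, home_label="Home"):
    """Build a BreadcrumbList dict from the path segments of page_url."""
    parsed = urlparse(page_url)
    root = f"{parsed.scheme}://{parsed.netloc}/"
    segments = [segment for segment in parsed.path.split("/") if segment]

    crumbs = [(home_label, root)]
    path = root
    for segment in segments[:-1]:  # intermediate sections, e.g. /blog/
        path = f"{path}{segment}/"
        crumbs.append((segment.replace("-", " ").title(), path))
    if segments:  # the page itself uses its real title
        crumbs.append((page_title, page_url))

    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(crumbs, start=1)
        ],
    }


trail = build_breadcrumbs("https://example.com/blog/seo/schema-markup/", "Schema Markup")
print(json.dumps(trail, indent=2))
```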

Types I’m skipping in 2026

There is a long tail of obscure schema types that I don’t spend time on for GEO.

The pattern: if a type doesn’t serialize your actual content into discrete, citable units, it’s not going to move your GEO metrics. Stick to the types that reformat what you’re already saying into structured data a model can extract cleanly.

Validating your schema before you publish

Two tools I always run:

  1. Google’s Rich Results Test (search.google.com/test/rich-results). Checks eligibility for Google’s rich result features and surfaces syntax errors.
  2. Schema.org Validator (validator.schema.org). More permissive than Google’s tool; catches structural issues that Google ignores.

One workflow I use: paste the JSON-LD into the validator before adding it to the page. Fix errors. Then add it to the page and run the Rich Results Test on the live URL. This prevents publishing broken markup that sits undetected for months.

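A common mistake: nesting unrelated types inside one another, as if one were a property of the other, inside a single block. That tends to cause validation failures. Putting each @type in its own <script> tag works fine, and so does listing the types as sibling nodes in a top-level @graph array. If in doubt, keep each type in its own script tag.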
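
A small helper can enforce the one-type-per-tag habit at the template level. The sketch below is a minimal illustration: `render_jsonld_tags` is a hypothetical function, and the check for @context and @type is a basic sanity test, not a substitute for the two validators above.

```python
import json


def render_jsonld_tags(schemas):
    """Render each schema dict as its own application/ld+json script tag."""
    tags = []
    for schema in schemas:
        if "@context" not in schema or "@type" not in schema:
            raise ValueError("each block needs @context and @type")
        payload = json.dumps(schema, indent=2, ensure_ascii=False)
        # Escape "</" so a literal "</script>" inside a string cannot close the tag.
        # "\/" is a valid JSON escape for "/", so the data is unchanged after parsing.
        payload = payload.replace("</", "<\\/")
        tags.append(f'<script type="application/ld+json">\n{payload}\n</script>')
    return "\n".join(tags)


html = render_jsonld_tags([
    {"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": []},
    {"@context": "https://schema.org", "@type": "HowTo", "name": "Close a </script> tag safely"},
])
print(html)
```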

The operator’s bottom line

FAQPage and HowTo are the two types I add to every informational page I publish. They take 20–40 minutes to write well and they create structured citation surface area that AI engines can use directly. Article/BlogPosting, Person, Organization, and BreadcrumbList are table stakes: get them into your templates once and forget about them. Everything else is noise until you’ve nailed these six. For the complementary GEO signals beyond schema (citation surfaces, llms.txt, entity anchoring), see how to get your brand cited in ChatGPT answers and llms.txt explained.


Related: Get your brand cited in ChatGPT answers · Perplexity vs ChatGPT vs Google AI Overviews: where to spend your GEO effort · llms.txt: does it actually move citations?

Want a schema audit for your site? Get in touch — I run SEO + GEO audits that include structured data validation across all pages.
