
SEO and AIO: Optimizing for Search Engines and LLMs

Feb 7, 2026

6 min read


You optimize for Google. ChatGPT scrapes your site. Perplexity indexes your content. Claude reads your documentation.

The game changed. SEO still matters, but AIO (AI Optimization) is the new frontier. Your content needs to work for both traditional search engines and large language models.

I spent time understanding what actually helps machines parse your site quickly. Here's what matters.


The Shift from SEO to SEO + AIO

Traditional SEO optimized for one crawler: Googlebot. You cared about PageRank, backlinks, keyword density, and Core Web Vitals.

Now your content gets consumed by dozens of LLM crawlers. GPTBot from OpenAI. ClaudeBot from Anthropic. Meta's crawler for Llama.

They don't care about backlinks. They care about structure and clarity.

Your optimization strategy splits into two tracks:

Search engines want fast pages, clean HTML, proper metadata, and authority signals.

LLMs want structured content, semantic markup, clear hierarchy, and machine-readable context.

The overlap is larger than you think. Good structure helps both.


JSON-LD Schema: Structured Data for Machines

JSON-LD is how you tell machines what your content means. Not just what it says, but what it IS.

Google uses it for rich snippets. LLMs use it to understand context without parsing prose.

Core schema types that matter:

  • Person for author information
  • Article or BlogPosting for content
  • BreadcrumbList for navigation context
  • WebSite with siteNavigationElement for site structure
  • Organization for company information

Implementation in Next.js is simple. Add a script tag with type="application/ld+json":

<script
  type="application/ld+json"
  dangerouslySetInnerHTML={{
    __html: JSON.stringify({
      "@context": "https://schema.org",
      "@type": "BlogPosting",
      headline: post.title,
      datePublished: post.publishedAt,
      author: {
        "@type": "Person",
        name: "Your Name",
      },
    }),
  }}
/>

This gives machines structured context without changing your visible HTML.

Why it matters for LLMs:

LLMs scraping your site can extract facts faster from JSON-LD than from parsing prose. They know the publish date, author, and article type without inference.
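The same approach works for navigation context. Here's a sketch of building BreadcrumbList JSON-LD programmatically (the Crumb type, function name, and example URLs are mine, not from any library):

```typescript
// Build BreadcrumbList JSON-LD so machines see where a page sits in the site.
// The Crumb type and the example URLs are illustrative.
type Crumb = { name: string; url: string };

function breadcrumbJsonLd(crumbs: Crumb[]): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map((c, i) => ({
      "@type": "ListItem",
      position: i + 1, // schema.org positions are 1-based
      name: c.name,
      item: c.url,
    })),
  });
}

const json = breadcrumbJsonLd([
  { name: "Home", url: "https://example.com/" },
  { name: "Blog", url: "https://example.com/blog" },
]);
```

Drop the result into the same script tag pattern as the BlogPosting example above.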


MDX and Markdown: LLM-Friendly Content

Markdown is the most machine-readable content format. Clean hierarchy. Clear structure. No presentation noise.

MDX takes this further by allowing components while keeping the content structure parseable.

Why LLMs prefer Markdown:

  • Headings define clear hierarchy with #, ##, ###
  • Lists are unambiguous: - item or 1. item
  • Code blocks are explicit with triple backticks
  • No CSS or JavaScript to parse
  • Semantic meaning is in the syntax

When an LLM scrapes your MDX blog, it gets clean structure. No div soup. No CSS class guessing. Just content with clear hierarchy.

Optimization tips:

  • Use descriptive headings that work standalone
  • Keep paragraph structure clean
  • Use lists for enumerated information
  • Add code blocks with language tags
  • Include alt text for images even in MDX
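To see how little work a machine needs to do with Markdown, here's a sketch of pulling the heading outline out of a document with a single regex (the function is mine, not a standard API, and it skips edge cases like headings inside code fences):

```typescript
// Extract the heading outline from a Markdown string.
// ATX headings (#, ##, ...) carry the hierarchy directly in the syntax.
// For brevity this ignores headings inside fenced code blocks.
function outline(md: string): { level: number; text: string }[] {
  const headings: { level: number; text: string }[] = [];
  for (const line of md.split("\n")) {
    const m = /^(#{1,6})\s+(.+)$/.exec(line);
    if (m) headings.push({ level: m[1].length, text: m[2] });
  }
  return headings;
}

const doc = "# SEO and AIO\n\nIntro.\n\n## JSON-LD Schema\n\nDetails.";
const tree = outline(doc);
// tree: [{ level: 1, text: "SEO and AIO" }, { level: 2, text: "JSON-LD Schema" }]
```

Getting the same outline from rendered HTML means walking a DOM and guessing which tags are decoration. With Markdown, the structure is the syntax.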

llms.txt: The New robots.txt for AI

llms.txt is an emerging standard for telling LLMs how to understand your site. It sits at the root like robots.txt but serves a different purpose.

While robots.txt controls what crawlers CAN access, llms.txt tells them what they SHOULD prioritize and how to understand it.

Example llms.txt structure (the llmstxt.org proposal uses plain Markdown: an H1 title, a blockquote summary, optional prose for context, then H2 sections of links, with an "Optional" section for content crawlers can skip):

# Your Site Name

> What your site does in one line

This site focuses on web development education.
All code examples use TypeScript and Next.js 15.

## Docs

- [API Reference](/api-reference): Endpoint documentation
- [Blog](/blog): Web development, SEO, and Next.js articles

## Machine-Readable

- [Sitemap](/sitemap.xml)
- [RSS Feed](/rss.xml)

## Optional

- [Changelog](/changelog)
- [Legal](/legal)
This gives LLMs context before they start scraping. They know what matters and how to interpret your content.

Implementation:

Create a static file at /public/llms.txt in Next.js; anything in /public is served from the root of your domain, so it lands at /llms.txt automatically.
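If you'd rather generate it from your content than hand-maintain a file, an App Router route handler works too. A minimal sketch, assuming the standard app/llms.txt/route.ts route-handler convention (the body and buildLlmsTxt helper are illustrative):

```typescript
// app/llms.txt/route.ts — serve llms.txt from a route handler instead of /public.
// The content below is illustrative; in practice, build it from your real post data.
function buildLlmsTxt(): string {
  return [
    "# Your Site Name",
    "",
    "> What your site does in one line",
    "",
    "## Blog",
    "",
    "- [SEO and AIO](/blog/seo-and-aio): Optimizing for search engines and LLMs",
  ].join("\n");
}

export function GET(): Response {
  return new Response(buildLlmsTxt(), {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

The route-handler version shines once you want the file to stay in sync with your posts automatically.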


Structured Content and Semantic HTML

HTML5 semantic tags aren't just for accessibility. They help machines understand content structure.

Use semantic tags:

  • <article> for blog posts and standalone content
  • <section> for logical content divisions
  • <nav> for navigation
  • <aside> for tangential content
  • <header> and <footer> for what they say
  • <time> with datetime attribute for dates
  • <figure> and <figcaption> for images

Avoid:

  • Generic <div> for everything
  • Class-based semantics that machines ignore
  • Headings out of order (h1, h3, h2)
  • Empty headings or headings that are just CSS targets

Content hierarchy:

One h1 per page. Nest headings properly. Each section should have a descriptive heading that works out of context.

LLMs extract information by heading structure. If your h2 says "Benefits" with no context, the LLM doesn't know benefits of WHAT.

Better: "Benefits of JSON-LD Schema" as a complete thought.
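The "no skipped levels" rule is mechanical enough to lint for. A quick sketch (the function is hypothetical, not a standard tool):

```typescript
// Check that heading levels never skip downward: h1 -> h3 is invalid,
// while moving back up (h3 -> h1) is fine. Levels come in document order.
function headingOrderOk(levels: number[]): boolean {
  let prev = 0; // nothing seen yet, so the first heading must be an h1
  for (const level of levels) {
    if (level > prev + 1) return false; // jumped a level, e.g. h1 straight to h3
    prev = level;
  }
  return true;
}

const good = headingOrderOk([1, 2, 3, 2, 2]); // true
const bad = headingOrderOk([1, 3, 2]);        // false: h1 -> h3 skips h2
```

Run something like this over your rendered pages in CI and heading regressions never ship.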


Metadata, Sitemaps, and Machine-Readable Formats

Traditional SEO formats still matter for LLMs.

XML Sitemaps:

LLMs use sitemaps to discover all your content efficiently. Include lastmod dates so they know what changed.

<url>
  <loc>https://example.com/blog/post</loc>
  <lastmod>2026-02-07</lastmod>
  <priority>0.8</priority>
</url>
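In Next.js you can generate these entries from your post data instead of hand-writing XML. A sketch assuming the App Router's app/sitemap.ts convention (the posts array is made up):

```typescript
// app/sitemap.ts — Next.js turns this array into sitemap.xml at /sitemap.xml.
// The posts array is illustrative; pull it from your CMS or filesystem.
const posts = [
  { slug: "seo-and-aio", updatedAt: "2026-02-07" },
  { slug: "json-ld-basics", updatedAt: "2026-01-20" },
];

export default function sitemap() {
  return posts.map((post) => ({
    url: `https://example.com/blog/${post.slug}`,
    lastModified: post.updatedAt, // becomes <lastmod>
    priority: 0.8,
  }));
}
```

New posts appear in the sitemap on the next build, so lastmod never goes stale.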

RSS/Atom Feeds:

RSS feeds are machine-readable by design. Many LLM crawlers check for feeds first.

Provide full content in feeds when possible, not just summaries. This lets LLMs index everything without hitting every page.
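Full content conventionally travels in the content:encoded element from the RSS content module. A sketch of building one item (the rssItem helper is mine; a real feed needs XML escaping for titles):

```typescript
// Build one RSS <item> carrying the full post body in content:encoded.
// The channel must declare xmlns:content="http://purl.org/rss/1.0/modules/content/".
// Titles containing & or < need XML escaping, omitted here for brevity.
function rssItem(title: string, link: string, html: string): string {
  return [
    "<item>",
    `  <title>${title}</title>`,
    `  <link>${link}</link>`,
    `  <content:encoded><![CDATA[${html}]]></content:encoded>`,
    "</item>",
  ].join("\n");
}

const item = rssItem(
  "SEO and AIO",
  "https://example.com/blog/seo-and-aio",
  "<p>Full article body here.</p>"
);
```

The CDATA wrapper lets the HTML body pass through without entity-encoding every tag.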

OpenGraph and Twitter Cards:

These meta tags help LLMs understand content when they encounter social shares:

<meta property="og:type" content="article" />
<meta property="og:title" content="Your Title" />
<meta property="article:published_time" content="2026-02-07" />
<meta property="article:author" content="Your Name" />

Core Web Vitals Still Matter

LLMs don't care about page speed. Google does.

You still need to optimize:

  • LCP (Largest Contentful Paint): Under 2.5 seconds
  • INP (Interaction to Next Paint): Under 200ms (INP replaced FID as a Core Web Vital in March 2024)
  • CLS (Cumulative Layout Shift): Under 0.1

Use Next.js Image optimization. Lazy load below the fold. Minimize JavaScript. Implement proper caching.

LLMs bypass all of this by scraping HTML directly. But users and search rankings still depend on performance.


The Content Strategy

For search engines:

  • Fast pages with Core Web Vitals optimization
  • Proper heading hierarchy
  • Internal linking for crawlability
  • Backlinks for authority
  • Regular updates for freshness signals

For LLMs:

  • Clear, structured content with semantic HTML
  • JSON-LD schema for context
  • Descriptive headings that work standalone
  • MDX or Markdown for clean parsing
  • llms.txt for crawling guidance

The overlap:

Most optimizations help both. Clean structure, semantic HTML, and proper metadata benefit all consumers.


When to Invest Where

Immediate priorities:

  • JSON-LD schema for articles and site structure
  • Semantic HTML throughout your site
  • Proper heading hierarchy
  • RSS feed with full content
  • Basic llms.txt file

As you scale:

  • Comprehensive schema for all content types
  • Advanced structured data for products, events, etc.
  • Programmatic sitemap generation with priorities
  • Content optimization for featured snippets
  • A/B testing for LLM citation rates

Don't bother with:

  • Over-optimization for keyword density (works for neither)
  • Hiding content from crawlers (harms both)
  • Duplicate content with minor variations
  • Thin content pages for SEO gaming

What I Learned

The web has two audiences now and they want different things.

Search engines still care about links, speed, and social signals. LLMs care about structure and clarity.

But the fundamentals align. Clean semantic HTML helps both. Good structure helps both. Clear hierarchy helps both.

JSON-LD feels like extra work until you realize it makes your content citeable. When ChatGPT references your article with full context, that JSON-LD is why.

llms.txt is early but worth implementing. You're giving crawlers a map instead of making them explore blindly.

The shift to optimize for AI feels like SEO all over again. New rules. New crawlers. New ways to think about content.

But unlike SEO's black-box algorithms, LLMs want what humans want: clarity, structure, and context.

Make your content clear for humans. Add structure for machines. Both audiences win.



Gopal Khadka