The Citation Core 11

Eleven factors. Most of the work.

Within the 93-factor AEO taxonomy, eleven specific factors carry disproportionate impact on whether AI systems select your page for citation. Closing gaps in some subset of these eleven is where most AEO score improvements come from. This page provides per-factor depth on each one.

Status legend: In place · Partial · Missing. The statuses shown on this page are samples — your scan will surface real per-factor pass/fail.
The Methodology Position

Why these eleven, and not eleven others.

The Citation Core 11 isn't the eleven highest-weighted factors in the AIVZ scoring model. It's the eleven factors most directly correlated with citation outcomes in observed AI generation — the factors where presence-or-absence makes the largest difference to whether a page ends up cited in a generated answer.

These eleven span four of the nine taxonomy categories: Structured Data & Machine Readability, Content Structure & Extractability, Entity & Knowledge Graph Signals, and E-E-A-T & Trust Signals. What unifies them is empirical: in citation-outcome studies against real AI platforms, the gap between pages that pass these factors and pages that don't is larger than for any other comparable subset.

Implication 01 · Start here

If you're prioritizing AEO work and don't yet know where to start, the Core 11 is the answer. Most score improvements come from closing gaps in some subset of these eleven.

Implication 02 · Not a substitute

The other 82 factors aren't filler. They matter — that's why they're in the taxonomy. The Core 11 is where the leverage concentrates; it's not where leverage ends.

Implication 03 · Confidence varies

Some Core 11 factors are Established. Others are Strongly Inferred. The confidence label for each factor is surfaced below.

Per-Factor Depth

Each factor: what, why, how to implement, common mistakes.

FACTOR 01

JSON-LD Structured Data

Established · Effort: Medium · Layer 2 · Understanding
What it is

JSON-LD (JavaScript Object Notation for Linked Data) is the structured-data format that AI systems and search engines parse to understand what your page is about, who wrote it, what entities it discusses, and how the page relates to other content. It lives in a <script type="application/ld+json"> block, typically in the page <head>. The format is JSON; the schema is Schema.org.

Why it matters for citation

JSON-LD is the highest-leverage signal in Layer 2. AI systems parse it before they parse the rest of the page — it's the most reliable way to communicate page identity, author, entity grounding, and content type to a machine. Pages with comprehensive, accurate JSON-LD are dramatically more likely to be cited than pages without it, even when the underlying content is comparable.

What compliance looks like

A typical implementation includes Organization schema for the publisher, Person schema for the author, type-appropriate schema (Article/FAQPage/HowTo/etc.) for the page itself, and sameAs linking to authoritative profiles.

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Page title here",
  "datePublished": "2026-05-01T00:00:00Z",
  "dateModified": "2026-05-01T00:00:00Z",
  "author": {
    "@type": "Person",
    "name": "Author Name",
    "sameAs": ["https://www.linkedin.com/in/author-handle"]
  },
  "publisher": {
    "@type": "Organization",
    "name": "Publisher Name",
    "sameAs": ["https://www.wikidata.org/wiki/Q1234567"]
  }
}
Common mistakes
  • No JSON-LD at all (most common at the failing end)
  • JSON-LD with errors that pass display tests but fail semantic parsing
  • Hardcoded JSON-LD that drifts from page content over time
  • Missing sameAs links (Organization or Person without external grounding)
  • Using @type strings that don't match Schema.org's actual type vocabulary
How to fix
  1. Audit current JSON-LD with Google's Rich Results Test and Schema.org's validator
  2. Implement Organization and Person schemas sitewide if missing
  3. Add type-appropriate schema for each page (Article for blog posts, FAQPage for FAQs)
  4. Add sameAs linking for authoritative profile grounding
  5. Set up automated schema-content drift detection (a minimal sketch follows this list)
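A minimal sketch of what step 5's drift check could look like, assuming static HTML pages and only the Python standard library. The field checked (headline against the visible <title>) is illustrative, not a complete drift policy.

# Hypothetical drift check: flag pages whose JSON-LD headline no longer
# appears in the visible <title>. Assumes static HTML; extend the checks
# (dateModified vs. byline, author name, etc.) to fit your own stack.
import json
from html.parser import HTMLParser

class PageParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.in_title = False
        self.jsonld_blocks = []
        self.title = ""
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self.in_jsonld = True
        elif tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.jsonld_blocks.append("".join(self._buffer))
            self._buffer = []
            self.in_jsonld = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_jsonld:
            self._buffer.append(data)
        elif self.in_title:
            self.title += data

def drift_warnings(html_text):
    parser = PageParser()
    parser.feed(html_text)
    warnings = []
    for block in parser.jsonld_blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            warnings.append("JSON-LD block is not valid JSON")
            continue
        headline = data.get("headline", "")
        if headline and headline not in parser.title:
            warnings.append(f"headline {headline!r} not found in <title>")
    return warnings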
FACTOR 02

Front-Loaded Direct Answers

Strongly Inferred · Effort: Low (per page) · Layer 3 · Extractability
What it is

The first 40–60 words of any answer-bearing page should be the answer to the question the page targets. Not the introduction. Not the context-setting. The actual answer.

Why it matters for citation

AI systems prefer to extract citation passages from the front of pages — front-loaded answers are statistically more likely to be the answer they're looking for. When the answer is buried in paragraph 4 or 5, the system has to traverse more content to find it; in many cases, it gives up and cites a different source.

What compliance looks like

A page targeting the question "What is AEO?" should answer that question in the first paragraph:

Answer Engine Optimization is the practice of structuring web content so AI answer engines — ChatGPT, Google AI Overviews, Perplexity, Gemini, Copilot, voice assistants — cite you as a source when generating answers. It's not SEO. It overlaps with SEO, but the signals are weighted differently. (That's the answer. Subsequent sections elaborate. The first 40–60 words are the answer.)
Common mistakes
  • Opening with context-setting prose ("In an increasingly crowded digital landscape...")
  • Burying the answer in paragraph 4 after the introduction
  • Front-loading commentary instead of definition
  • Front-loading the question instead of the answer ("What is AEO? AEO is...")
How to fix
  1. Identify pages that target a specific answerable question
  2. Move the actual answer to the first 40–60 words
  3. Move context-setting, history, and elaboration to subsequent sections
  4. Test: read only the first paragraph. Could a reader extract a useful, complete answer? If no, refactor. (A word-count spot check sketch follows this list.)
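The qualitative part of the test can't be automated, but the word-count side can be spot-checked. A minimal sketch, assuming static HTML, the Python standard library, and that the first <p> on the page is the opening answer; the 40–60 word target mirrors this factor's guideline, not a hard rule.

# Hypothetical spot check: word count of the first paragraph of a page.
import re
from html.parser import HTMLParser

class FirstParagraph(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.done = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.done:
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            self.done = True

    def handle_data(self, data):
        if self.in_p:
            self.text += data

def first_paragraph_word_count(html_text):
    parser = FirstParagraph()
    parser.feed(html_text)
    return len(re.findall(r"\S+", parser.text))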
FACTOR 03

Concise Answer Blocks

Strongly Inferred · Effort: Medium · Layer 3 · Extractability
What it is

Beyond the front-loaded opening, individual answer blocks within a page should target the 40–60 word range, the citation sweet spot. Shorter blocks (under 30 words) often lack enough context to stand alone in a generated answer. Longer blocks (over 80 words) resist clean extraction.

Why it matters for citation

AI systems extract clean, citable passages by identifying candidate blocks in the source content. A block of 40–60 words is the right size: long enough to carry context, short enough to extract whole. Pages with multiple well-sized blocks expose multiple citation candidates per query.

What compliance looks like
What's the difference between AEO and SEO?

AEO targets citation outcomes in AI-generated answers; SEO targets ranked positions in search results. The signals overlap (both depend on quality content and crawlable infrastructure) but are weighted differently — structured data and entity grounding matter more for AEO, while keyword optimization and click-through rate matter more for SEO. (≈50 words)
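Where the page is genuinely FAQ-shaped, the same question-and-answer pair can also be exposed as FAQPage markup (Factor 1), giving parsers a machine-readable copy of the block above. A minimal sketch; the answer text simply mirrors the block above:

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What's the difference between AEO and SEO?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "AEO targets citation outcomes in AI-generated answers; SEO targets ranked positions in search results. The signals overlap but are weighted differently: structured data and entity grounding matter more for AEO, while keyword optimization and click-through rate matter more for SEO."
    }
  }]
}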
Common mistakes
  • Answer blocks that run 200+ words — too long for clean extraction
  • Answer blocks that run 15 words — too short for context
  • Inconsistent block sizing across the page
  • Long-form prose with no internal block structure at all
How to fix
  1. For each answer block, count the words
  2. Refactor 200+ word blocks into multiple 40–60 word blocks
  3. Refactor 15–30 word blocks by adding context and one example
  4. Test: each block should be a complete, citable passage that makes sense out of context
FACTOR 04

Proper Heading Hierarchy

Established · Effort: Low to Medium · Layer 2 / Layer 3
What it is

Heading levels (H1 → H2 → H3 → H4) should reflect the actual semantic structure of the document. Skipped levels (H1 → H3 with no H2) break parsing. Decorative headings that don't reflect the underlying structure mislead AI systems. Multiple H1s on one page confuse top-level identification.

Why it matters for citation

AI systems use heading hierarchy as a primary structural cue. They identify the page's main topic from the H1, sub-topics from H2s, and sub-sub-topics from H3s. When the hierarchy is broken or misleading, the system's understanding of the page degrades.

What compliance looks like
<h1>The page's main topic</h1>
  <h2>First major section</h2>
    <h3>Sub-section under the first major section</h3>
    <h3>Another sub-section</h3>
  <h2>Second major section</h2>
    <h3>Sub-section under the second major section</h3>

Single H1. H2s for major sections. H3s only under H2s. No skipped levels.

Common mistakes
  • Multiple H1s on one page
  • Skipped levels (H1 → H3, with no H2)
  • Decorative headings used for visual emphasis rather than semantic structure
  • Heading text that doesn't describe the section it heads
  • Inconsistent heading style across a content portfolio
How to fix
  1. Audit each page's heading hierarchy (a minimal audit sketch follows this list)
  2. Confirm one H1 per page, matching the page's main topic
  3. Confirm H2s reflect actual major sections
  4. Eliminate skipped levels by inserting an H2 where a mid-level heading is missing
  5. Convert decorative-only headings to non-heading elements
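A minimal audit sketch for step 1, assuming static HTML and the Python standard library. It flags the two most common structural breaks: multiple H1s and skipped levels.

# Hypothetical heading audit: report multiple H1s and skipped levels
# (for example, an H3 directly under an H1 with no H2 in between).
from html.parser import HTMLParser

class HeadingAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.h1_count = 0
        self.last_level = 0
        self.issues = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            level = int(tag[1])
            if level == 1:
                self.h1_count += 1
                if self.h1_count > 1:
                    self.issues.append("multiple H1 elements")
            if self.last_level and level > self.last_level + 1:
                self.issues.append(f"skipped level: h{self.last_level} -> h{level}")
            self.last_level = level

def audit(html_text):
    parser = HeadingAudit()
    parser.feed(html_text)
    return parser.issues or ["heading hierarchy looks clean"]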
FACTOR 05

Definition & Summary Density

Strongly Inferred · Effort: Medium · Layer 3 · Extractability
What it is

Pages with explicit definitions of key terms — at the top of the page or in dedicated definition blocks — outperform pages without them in citation outcomes for explanatory queries. Same for summary blocks: an explicit "What this article covers" or "Key takeaways" section dramatically improves extractability for skim-reading AI passes.

Why it matters for citation

When a user asks AI a definitional question ("What is X?"), the AI extracts from sources that have clean, identifiable definitions. Pages without definitions force the AI to infer the definition from elaboration — and many sources offer cleaner definitional candidates.

What compliance looks like
Answer Engine Optimization (AEO): The practice of structuring web content so AI answer engines cite the page as a source in generated responses.
Key takeaways:
  • First main point
  • Second main point
  • Third main point
Common mistakes
  • Long-form articles with no definition of the key term anywhere
  • Definitions buried in paragraph 5 instead of front-loaded
  • Summary blocks that are marketing copy rather than actual summaries
  • Definitions that elaborate before defining
How to fix
  1. Audit whether an explicit definition appears in the first 100 words
  2. Add definition blocks where missing, using a recognizable formatting pattern (bold term + colon + definition)
  3. For long-form pages, add a "What this article covers" or "Key takeaways" block
  4. Use DefinedTerm JSON-LD schema for proprietary or technical terms (a minimal sketch follows this list)
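A minimal DefinedTerm sketch for step 4. The glossary URL is a placeholder for wherever your term set lives:

{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Answer Engine Optimization",
  "alternateName": "AEO",
  "description": "The practice of structuring web content so AI answer engines cite the page as a source in generated responses.",
  "inDefinedTermSet": "https://example.com/glossary"
}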
FACTOR 06

Statistics with Sources

Strongly Inferred · Effort: Medium · Layer 3 / Layer 2 · Trust
What it is

Numerical claims — percentages, counts, ranges, rates — that are properly attributed to a source significantly outperform unsourced claims in citation outcomes. A claim with a number and a citation is treated by AI systems as substantially more credible than a claim without one.

Why it matters for citation

AI systems treat citation-formatted statistics as high-confidence claim candidates. When generating an answer, the AI prefers to ground its response in specific numbers from credible sources rather than vague qualitative descriptions.

What compliance looks like
AI-generated search results now appear on a substantial portion of Google query results pages [Source: Google AI Overviews launch report, 2025].
73% of B2B buyers report using AI tools in their research process before contacting a vendor (Gartner B2B Buyer Survey, 2025).
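In page markup, the same pattern is a number, its context, and a dated, linked source. A sketch; the URL is a placeholder, not the report's real location:

<p>73% of B2B buyers report using AI tools in their research process before
contacting a vendor (<a href="https://example.com/gartner-b2b-buyer-survey-2025">Gartner
B2B Buyer Survey, 2025</a>).</p>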
Common mistakes
  • Vague numbers without sources ("studies show that most people...")
  • Sources that don't verify the claim
  • Outdated stats with current-language framing
  • Numbers without context (claiming "73%" without saying 73% of what)
How to fix
  1. Audit numerical claims across the content portfolio
  2. For each claim, identify the original source and link to it
  3. Date-stamp the source ("as of 2025" or include the year in citation)
  4. Format citations consistently — pick a format and use it everywhere
  5. For frequently-changing statistics, set a re-validation reminder
FACTOR 07

Bullet and Numbered Lists

Strongly Inferred · Effort: Low to Medium · Layer 3 · Extractability
What it is

Content that appears as a list (<ul> or <ol>) is dramatically easier for AI systems to extract than the same content in prose form. AI prefers structures it can preserve — and lists preserve cleanly across extraction.

Why it matters for citation

When generating an answer, AI systems frequently want to produce list-formatted output (numbered steps, bullet-point recommendations, enumerated criteria). Source content already in list format is a direct match: extract, format, cite.

What compliance looks like
<h3>Three steps to implement AEO</h3>
<ol>
  <li>Audit your current AI Visibility Score</li>
  <li>Fix Layer 1 (Access) issues first</li>
  <li>Move to Layer 2 and Layer 3 work in sequence</li>
</ol>
Common mistakes
  • Procedural content in prose ("First do X, then do Y, then do Z") instead of <ol>
  • Bullet-point content in prose with sentence breaks instead of <ul>
  • Lists without proper HTML markup — using line breaks and bullet characters in plain text
  • Mixing list types — using <ul> for ordered/sequential content where <ol> is correct
How to fix
  1. Audit pages for procedural or enumerated content currently in prose
  2. Convert procedural content (steps, sequences) to <ol> markup
  3. Convert enumerated content (criteria, examples, options) to <ul> markup
  4. Verify HTML lists render with proper tags, not div-based custom styles
  5. Don't over-correct: lists that flow better as prose should stay as prose
FACTOR 08

HTML Tables for Comparisons

Strongly Inferred · Effort: Low to Medium · Layer 3 · Extractability
What it is

Comparison content in HTML <table> markup outperforms the same content in prose form. AI systems can preserve the table structure across extraction; prose comparisons get flattened and often misrepresented.

Why it matters for citation

Comparison queries — "X vs Y," "best X for Y," "differences between X and Y" — are extremely common in AI answer generation. AI systems frequently produce comparison-table outputs in their responses. Source content already in table format is a direct extraction candidate.

What compliance looks like
<table>
  <thead>
    <tr><th>Feature</th><th>AEO</th><th>SEO</th></tr>
  </thead>
  <tbody>
    <tr><td>Optimization target</td><td>Citation outcomes</td><td>Ranked positions</td></tr>
    <tr><td>Primary signals</td><td>Schema, entity grounding</td><td>Keywords, backlinks</td></tr>
  </tbody>
</table>
Common mistakes
  • "X vs Y" comparison content written as alternating prose paragraphs instead of a table
  • Tables built with <div> styling instead of semantic <table> markup
  • Tables without proper <thead> and <tbody> sectioning
  • Comparison content spread across multiple sections instead of consolidated
How to fix
  1. Identify comparison-format pages
  2. Convert comparison sections from prose to <table> markup
  3. Use proper semantic table structure
  4. Don't over-correct: pages that aren't comparison-shaped don't benefit from artificial table structures
FACTOR 09

Citation Formatting Quality

Strongly Inferred · Effort: Medium · Layer 2 / Layer 3
What it is

How you cite your own sources is a trust signal. Properly formatted citations — with consistent format, identifiable source attribution, accessible links, and dates — signal a quality content producer.

Why it matters for citation

AI systems infer source quality partly from how the source treats its own sources. A page that cites its references properly is treated as more credible — and is more likely to be cited itself. The signal compounds.

What compliance looks like
Inline attribution: Recent research from Stanford's HAI Institute (2025) shows...

Footnote attribution: Recent research shows AI citation behavior varies significantly by platform.¹

¹ Stanford HAI Institute. AI Citation Patterns Report. 2025.
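The footnote pattern rendered in semantic HTML. A sketch; the anchor IDs and URL are placeholders:

<p>Recent research shows AI citation behavior varies significantly by
platform (<a href="#ref-1">Stanford HAI Institute, 2025</a>).</p>

<ol class="references">
  <li id="ref-1">Stanford HAI Institute. <cite>AI Citation Patterns Report</cite>. 2025.
    <a href="https://example.edu/ai-citation-patterns-report">example.edu/ai-citation-patterns-report</a></li>
</ol>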
Common mistakes
  • Inconsistent citation format across the page
  • Vague attributions ("studies show," "research suggests," "according to experts")
  • Broken links — citations to URLs that no longer resolve
  • Undated citations — references without a year or publication date
  • Self-citation only (the page cites only the publishing site's other content)
How to fix
  1. Pick a citation format and document it as an editorial standard
  2. Audit existing content for citation quality; fix vague or broken citations
  3. Build citation-validation into the editorial pipeline (a link-check sketch follows this list)
  4. Mix self-citation and external citation appropriately
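A minimal link-check sketch for step 3, using only the Python standard library. Network behavior varies (rate limits, bot blocking, servers that reject HEAD requests), so treat failures as prompts to review, not verdicts:

# Hypothetical citation link check: flag reference URLs that no longer resolve.
import urllib.request

def broken_links(urls, timeout=10):
    broken = []
    for url in urls:
        # Some servers reject HEAD; fall back to GET if this proves too strict.
        req = urllib.request.Request(url, method="HEAD",
                                     headers={"User-Agent": "citation-audit/0.1"})
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                if resp.status >= 400:
                    broken.append((url, resp.status))
        except OSError as exc:  # URLError, HTTPError, and timeouts all subclass OSError
            broken.append((url, str(exc)))
    return broken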
FACTOR 10

Entity Density

Strongly Inferred · Effort: Medium · Layer 2 · Understanding
What it is

Entity density is the rate at which a page names recognizable entities — people, organizations, places, products, concepts — properly grounded in identifiable references (Schema.org sameAs, consistent naming, or contextual disambiguation). Entity-rich pages are more visible to AI systems than entity-thin pages of comparable length.

Why it matters for citation

AI systems use entity recognition as a primary topic-modeling signal. A page that names specific entities is easier to topic-classify, easier to entity-link, and easier to retrieve in entity-keyed queries.

What compliance looks like

Specific names instead of generic descriptors:

  • "Anthropic" instead of "the AI company"
  • "ChatGPT" instead of "the chatbot"
  • "Stanford's HAI Institute" instead of "a research institute"
  • "the 93-Factor AEO Taxonomy" instead of "our framework"
Common mistakes
  • Generic descriptors used where specific names are available
  • Same entity referenced inconsistently — pick one canonical form and use it consistently
  • Entity references without grounding (no sameAs, no Wikidata link)
  • Over-correction: artificially stuffing entity names where they don't naturally fit
How to fix
  1. Audit existing content for entity-naming patterns
  2. Replace generic descriptors with specific entity names where appropriate
  3. Establish canonical naming for repeating entities and use consistently
  4. Add sameAs JSON-LD linking for entities with authoritative external grounding (see the sketch after this list)
  5. Don't over-correct — natural-language flow is better than artificial entity-stuffing
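A minimal sketch of step 4: grounding named entities through about and mentions with sameAs links. The Wikidata IDs are placeholders, not the entities' real identifiers:

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Page title here",
  "about": {
    "@type": "Organization",
    "name": "Anthropic",
    "sameAs": ["https://www.wikidata.org/wiki/Q1234567"]
  },
  "mentions": [{
    "@type": "SoftwareApplication",
    "name": "ChatGPT",
    "sameAs": ["https://www.wikidata.org/wiki/Q7654321"]
  }]
}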
FACTOR 11

Named Author Presence

Strongly Inferred (Established for Google AI surfaces) · Effort: Low (per page) · Layer 2 · Understanding
What it is

Pages with a named, identifiable author — with proper Person schema, a bio, credentials, and sameAs linking — outperform anonymous content in citation outcomes. The effect is especially pronounced in YMYL (Your Money or Your Life) categories: medical, financial, legal, real estate.

Why it matters for citation

AI systems weight author identity as a trust signal. An anonymous page is treated as lower-trust than the same content under a named, credentialed author. In YMYL categories, the elevation is more pronounced. Medical content from a named clinician outperforms anonymous medical content by a wide margin.

What compliance looks like
{
  "@type": "Person",
  "name": "Author Name",
  "jobTitle": "Editor in Chief",
  "url": "https://example.com/authors/author-name",
  "sameAs": [
    "https://www.linkedin.com/in/author-handle",
    "https://www.wikidata.org/wiki/Q1234567"
  ]
}
Common mistakes
  • Anonymous content (no author attribution at all)
  • Generic editorial team attribution ("by Editorial Team")
  • Named author but no bio page or credentials
  • Bio page exists but no sameAs linking
  • Inconsistent author identity across the site (the same person credited under three different name spellings)
How to fix
  1. Audit content for unsigned or generically-attributed pages
  2. Implement named author attribution for every meaningful page
  3. Build author bio pages with credentials, publication history, and external profile links
  4. Add Person schema to author bio pages and to article pages via author references (see the sketch after this list)
  5. Standardize canonical name spelling per author across the site
  6. In YMYL categories, ensure author credentials are visible and verifiable (license numbers, professional affiliations)
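One way to implement steps 3 and 4: give the author bio page a stable @id and reference it from each article. A sketch; the domain and handle are placeholders:

<!-- On the author bio page -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://example.com/authors/author-name#person",
  "name": "Author Name",
  "jobTitle": "Editor in Chief",
  "sameAs": ["https://www.linkedin.com/in/author-handle"]
}
</script>

<!-- On each article page, reference the same identity -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Page title here",
  "author": { "@id": "https://example.com/authors/author-name#person" }
}
</script>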
Order of Operations

You don't have to do all eleven at once.

The eleven factors aren't a checklist that has to be completed in one pass. Most teams should sequence the work — partly to keep the scope manageable, partly because some factors enable others.

Phase 01 · Foundation (weeks 1–2)

Start with the structural enablers.

  1. JSON-LD Structured Data (Factor 1) — set up Organization and Person schema sitewide; add type-appropriate schema per page
  2. Named Author Presence (Factor 11) — add named authors with bios and Person schema
  3. Proper Heading Hierarchy (Factor 4) — audit and fix heading-hierarchy issues across high-traffic pages

These factors enable everything else. Without JSON-LD, the entity and author work doesn't compound. Without proper heading hierarchy, front-loading and concise-blocks work doesn't extract cleanly.

Phase 02 · Structure (weeks 3–6)

Refactor structure on the highest-traffic answer-bearing pages.

  1. Front-Loaded Direct Answers (Factor 2) — move actual answers to the first 40–60 words
  2. Concise Answer Blocks (Factor 3) — refactor 200+ word blocks into 40–60 word units
  3. Definition & Summary Density (Factor 5) — add explicit definitions and summary blocks

This is where score gains start showing up. Factors 2, 3, and 5 are the dominant Layer 3 levers and pay off relatively fast — typically within a few weeks of AI re-crawl.

Phase 03 · Trust & Context (week 7+)

Layer the trust and richness factors.

  1. Statistics with Sources (Factor 6) — audit and source numerical claims
  2. Bullet and Numbered Lists (Factor 7) — convert prose-formatted enumerations to list markup
  3. HTML Tables for Comparisons (Factor 8) — convert prose comparisons to tables
  4. Citation Formatting Quality (Factor 9) — standardize citation format
  5. Entity Density (Factor 10) — replace generic descriptors with specific entity names

These compound the gains from Phase 2 and continue paying off across subsequent quarters as the editorial discipline takes hold.

This sequence isn't sacred. Some teams will benefit from re-ordering — for example, if your existing JSON-LD is already strong but author attribution is weak, start with Factor 11 instead of Factor 1. The point is to sequence deliberately, not to attempt all eleven simultaneously.

Your Next Step

See where you stand.

See exactly which of the Citation Core 11 factors are passing and failing on your real content. The free scan surfaces factor-by-factor results across the highest-impact subset.


Or — for the dependency model that explains why these eleven matter so much: Read the AI Visibility Stack →

Methodology · Citation Core 11