The Citation Core 11

Eleven factors. Most of the work.

Within the 93-factor AEO taxonomy, eleven specific factors carry disproportionate impact on whether AI systems select your page for citation. Closing gaps in some subset of these eleven is where most AEO score improvements come from. This page provides per-factor depth on each one.

Status legend: In place · Partial · Missing. The statuses shown on this page are samples — your scan will surface real per-factor pass/fail.
The Methodology Position

Why these eleven, and not eleven others.

The Citation Core 11 isn't the eleven highest-weighted factors in the AIVZ scoring model. It's the eleven factors most directly correlated with citation outcomes in observed AI generation — the factors where presence-or-absence makes the largest difference to whether a page ends up cited in a generated answer.

These eleven span four of the nine taxonomy categories: Structured Data & Machine Readability, Content Structure & Extractability, Entity & Knowledge Graph Signals, and E-E-A-T & Trust Signals. What unifies them is empirical: in citation-outcome studies against real AI platforms, the gap between pages that pass these factors and pages that don't is larger than for any other comparable subset.

Implication 01 · Start here

If you're prioritizing AEO work and don't yet know where to start, the Core 11 is the answer. Most score improvements come from closing gaps in some subset of these eleven.

Implication 02 · Not a substitute

The other 82 factors aren't filler. They matter — that's why they're in the taxonomy. The Core 11 is where the leverage concentrates; it's not where leverage ends.

Implication 03 · Confidence varies

Some Core 11 factors are Established. Others are Strongly Inferred. The confidence label for each factor is surfaced below.

Per-Factor Depth

Each factor: what, why, how to implement, common mistakes.

FACTOR 01

JSON-LD Structured Data

Established · Effort: Medium · Layer 2 · Understanding
What it is

JSON-LD (JavaScript Object Notation for Linked Data) is the structured-data format that AI systems and search engines parse to understand what your page is about, who wrote it, what entities it discusses, and how the page relates to other content. It lives in a <script type="application/ld+json"> block, typically in the page <head>. The format is JSON; the schema is Schema.org.

Why it matters for citation

JSON-LD is the highest-leverage signal in Layer 2. AI systems parse it before they parse the rest of the page — it's the most reliable way to communicate page identity, author, entity grounding, and content type to a machine. Pages with comprehensive, accurate JSON-LD are dramatically more likely to be cited than pages without it, even when the underlying content is comparable.

What compliance looks like

A typical implementation includes Organization schema for the publisher, Person schema for the author, type-appropriate schema (Article/FAQPage/HowTo/etc.) for the page itself, and sameAs linking to authoritative profiles.

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Page title here",
  "datePublished": "2026-05-01T00:00:00Z",
  "dateModified": "2026-05-01T00:00:00Z",
  "author": {
    "@type": "Person",
    "name": "Author Name",
    "sameAs": ["https://www.linkedin.com/in/author-handle"]
  },
  "publisher": {
    "@type": "Organization",
    "name": "Publisher Name",
    "sameAs": ["https://www.wikidata.org/wiki/Q1234567"]
  }
}
Common mistakes
  • No JSON-LD at all (most common at the failing end)
  • JSON-LD with errors that pass display tests but fail semantic parsing
  • Hardcoded JSON-LD that drifts from page content over time
  • Missing sameAs links (Organization or Person without external grounding)
  • Using @type strings that don't match Schema.org's actual type vocabulary
How to fix
  1. Audit current JSON-LD with Google's Rich Results Test and Schema.org's validator
  2. Implement Organization and Person schemas sitewide if missing
  3. Add type-appropriate schema for each page (Article for blog posts, FAQPage for FAQs)
  4. Add sameAs linking for authoritative profile grounding
  5. Set up automated schema-content drift detection (a minimal sketch follows this list)
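A minimal sketch of what step 5's drift check could look like, assuming static HTML pages and only the Python standard library. The field checked (headline against the visible <title>) is illustrative, not a complete drift policy.

# Hypothetical drift check: flag pages whose JSON-LD headline no longer
# appears in the visible <title>. Assumes static HTML; extend the checks
# (dateModified vs. byline, author name, etc.) to fit your own stack.
import json
from html.parser import HTMLParser

class PageParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.in_title = False
        self.jsonld_blocks = []
        self.title = ""
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self.in_jsonld = True
        elif tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.jsonld_blocks.append("".join(self._buffer))
            self._buffer = []
            self.in_jsonld = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_jsonld:
            self._buffer.append(data)
        elif self.in_title:
            self.title += data

def drift_warnings(html_text):
    parser = PageParser()
    parser.feed(html_text)
    warnings = []
    for block in parser.jsonld_blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            warnings.append("JSON-LD block is not valid JSON")
            continue
        headline = data.get("headline", "")
        if headline and headline not in parser.title:
            warnings.append(f"headline {headline!r} not found in <title>")
    return warnings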
FACTOR 02

Front-Loaded Direct Answers

Strongly Inferred · Effort: Low (per page) · Layer 3 · Extractability
What it is

The first 40–60 words of any answer-bearing page should be the answer to the question the page targets. Not the introduction. Not the context-setting. The actual answer.

Why it matters for citation

AI systems prefer to extract citation passages from the front of pages — front-loaded answers are statistically more likely to be the answer they're looking for. When the answer is buried in paragraph 4 or 5, the system has to traverse more content to find it; in many cases, it gives up and cites a different source.

What compliance looks like

A page targeting the question "What is AEO?" should answer that question in the first paragraph:

Answer Engine Optimization is the practice of structuring web content so AI answer engines — ChatGPT, Google AI Overviews, Perplexity, Gemini, Copilot, voice assistants — cite you as a source when generating answers. It's not SEO. It overlaps with SEO, but the signals are weighted differently. (That's the answer. Subsequent sections elaborate. The first 40–60 words are the answer.)
Common mistakes
  • Opening with context-setting prose ("In an increasingly crowded digital landscape...")
  • Burying the answer in paragraph 4 after the introduction
  • Front-loading commentary instead of definition
  • Front-loading the question instead of the answer ("What is AEO? AEO is...")
How to fix
  1. Identify pages that target a specific answerable question
  2. Move the actual answer to the first 40–60 words
  3. Move context-setting, history, and elaboration to subsequent sections
  4. Test: read only the first paragraph. Could a reader extract a useful, complete answer? If no, refactor. (A word-count spot check sketch follows this list.)
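The qualitative part of the test can't be automated, but the word-count side can be spot-checked. A minimal sketch, assuming static HTML, the Python standard library, and that the first <p> on the page is the opening answer; the 40–60 word target mirrors this factor's guideline, not a hard rule.

# Hypothetical spot check: word count of the first paragraph of a page.
import re
from html.parser import HTMLParser

class FirstParagraph(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.done = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.done:
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            self.done = True

    def handle_data(self, data):
        if self.in_p:
            self.text += data

def first_paragraph_word_count(html_text):
    parser = FirstParagraph()
    parser.feed(html_text)
    return len(re.findall(r"\S+", parser.text))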
FACTOR 03

Concise Answer Blocks

Strongly Inferred · Effort: Medium · Layer 3 · Extractability
What it is

Beyond the front-loaded opening, individual answer blocks within a page should target the 40–60 word range, the citation sweet spot. Shorter blocks (under 30 words) often lack enough context to stand alone in a generated answer. Longer blocks (over 80 words) resist clean extraction.

Why it matters for citation

AI systems extract clean, citable passages by identifying candidate blocks in the source content. A block of 40–60 words is the right size: long enough to carry context, short enough to extract whole. Pages with multiple well-sized blocks expose multiple citation candidates per query.

What compliance looks like
What's the difference between AEO and SEO?

AEO targets citation outcomes in AI-generated answers; SEO targets ranked positions in search results. The signals overlap (both depend on quality content and crawlable infrastructure) but are weighted differently — structured data and entity grounding matter more for AEO, while keyword optimization and click-through rate matter more for SEO. (≈50 words)
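Where the page is genuinely FAQ-shaped, the same question-and-answer pair can also be exposed as FAQPage markup (Factor 1), giving parsers a machine-readable copy of the block above. A minimal sketch; the answer text simply mirrors the block above:

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What's the difference between AEO and SEO?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "AEO targets citation outcomes in AI-generated answers; SEO targets ranked positions in search results. The signals overlap but are weighted differently: structured data and entity grounding matter more for AEO, while keyword optimization and click-through rate matter more for SEO."
    }
  }]
}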
Common mistakes
  • Answer blocks that run 200+ words — too long for clean extraction
  • Answer blocks that run 15 words — too short for context
  • Inconsistent block sizing across the page
  • Long-form prose with no internal block structure at all
How to fix
  1. For each answer block, count the words
  2. Refactor 200+ word blocks into multiple 40–60 word blocks
  3. Refactor 15–30 word blocks by adding context and one example
  4. Test: each block should be a complete, citable passage that makes sense out of context
FACTOR 04

Proper Heading Hierarchy

Established · Effort: Low to Medium · Layer 2 / Layer 3
What it is

Heading levels (H1 → H2 → H3 → H4) should reflect the actual semantic structure of the document. Skipped levels (H1 → H3 with no H2) break parsing. Decorative headings that don't reflect the underlying structure mislead AI systems. Multiple H1s on one page confuse top-level identification.

Why it matters for citation

AI systems use heading hierarchy as a primary structural cue. They identify the page's main topic from the H1, sub-topics from H2s, and sub-sub-topics from H3s. When the hierarchy is broken or misleading, the system's understanding of the page degrades.

What compliance looks like
<h1>The page's main topic</h1>
  <h2>First major section</h2>
    <h3>Sub-section under the first major section</h3>
    <h3>Another sub-section</h3>
  <h2>Second major section</h2>
    <h3>Sub-section under the second major section</h3>

Single H1. H2s for major sections. H3s only under H2s. No skipped levels.

Common mistakes
  • Multiple H1s on one page
  • Skipped levels (H1 → H3, with no H2)
  • Decorative headings used for visual emphasis rather than semantic structure
  • Heading text that doesn't describe the section it heads
  • Inconsistent heading style across a content portfolio
How to fix
  1. Audit each page's heading hierarchy (a minimal audit sketch follows this list)
  2. Confirm one H1 per page, matching the page's main topic
  3. Confirm H2s reflect actual major sections
  4. Eliminate skipped levels by inserting an H2 where a mid-level heading is missing
  5. Convert decorative-only headings to non-heading elements
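A minimal audit sketch for step 1, assuming static HTML and the Python standard library. It flags the two most common structural breaks: multiple H1s and skipped levels.

# Hypothetical heading audit: report multiple H1s and skipped levels
# (for example, an H3 directly under an H1 with no H2 in between).
from html.parser import HTMLParser

class HeadingAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.h1_count = 0
        self.last_level = 0
        self.issues = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            level = int(tag[1])
            if level == 1:
                self.h1_count += 1
                if self.h1_count > 1:
                    self.issues.append("multiple H1 elements")
            if self.last_level and level > self.last_level + 1:
                self.issues.append(f"skipped level: h{self.last_level} -> h{level}")
            self.last_level = level

def audit(html_text):
    parser = HeadingAudit()
    parser.feed(html_text)
    return parser.issues or ["heading hierarchy looks clean"]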
FACTOR 05

Definition & Summary Density

Strongly Inferred · Effort: Medium · Layer 3 · Extractability
What it is

Pages with explicit definitions of key terms — at the top of the page or in dedicated definition blocks — outperform pages without them in citation outcomes for explanatory queries. Same for summary blocks: an explicit "What this article covers" or "Key takeaways" section dramatically improves extractability for skim-reading AI passes.

Why it matters for citation

When a user asks AI a definitional question ("What is X?"), the AI extracts from sources that have clean, identifiable definitions. Pages without definitions force the AI to infer the definition from elaboration — and many sources offer cleaner definitional candidates.

What compliance looks like
Answer Engine Optimization (AEO): The practice of structuring web content so AI answer engines cite the page as a source in generated responses.
Key takeaways:
  • First main point
  • Second main point
  • Third main point
Common mistakes
  • Long-form articles with no definition of the key term anywhere
  • Definitions buried in paragraph 5 instead of front-loaded
  • Summary blocks that are marketing copy rather than actual summaries
  • Definitions that elaborate before defining
How to fix
  1. Audit whether an explicit definition appears in the first 100 words
  2. Add definition blocks where missing, using a recognizable formatting pattern (bold term + colon + definition)
  3. For long-form pages, add a "What this article covers" or "Key takeaways" block
  4. Use DefinedTerm JSON-LD schema for proprietary or technical terms (a minimal sketch follows this list)
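A minimal DefinedTerm sketch for step 4. The glossary URL is a placeholder for wherever your term set lives:

{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Answer Engine Optimization",
  "alternateName": "AEO",
  "description": "The practice of structuring web content so AI answer engines cite the page as a source in generated responses.",
  "inDefinedTermSet": "https://example.com/glossary"
}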
FACTOR 06

Statistics with Sources

Strongly Inferred · Effort: Medium · Layer 3 / Layer 2 · Trust
What it is

Numerical claims — percentages, counts, ranges, rates — that are properly attributed to a source significantly outperform unsourced claims in citation outcomes. A claim with a number and a citation is treated by AI systems as substantially more credible than a claim without one.

Why it matters for citation

AI systems treat citation-formatted statistics as high-confidence claim candidates. When generating an answer, the AI prefers to ground its response in specific numbers from credible sources rather than vague qualitative descriptions.

What compliance looks like
AI-generated search results now appear on a substantial portion of Google query results pages [Source: Google AI Overviews launch report, 2025].
73% of B2B buyers report using AI tools in their research process before contacting a vendor (Gartner B2B Buyer Survey, 2025).
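In page markup, the same pattern is a number, its context, and a dated, linked source. A sketch; the URL is a placeholder, not the report's real location:

<p>73% of B2B buyers report using AI tools in their research process before
contacting a vendor (<a href="https://example.com/gartner-b2b-buyer-survey-2025">Gartner
B2B Buyer Survey, 2025</a>).</p>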
Common mistakes
  • Vague numbers without sources ("studies show that most people...")
  • Sources that don't verify the claim
  • Outdated stats with current-language framing
  • Numbers without context (claiming "73%" without saying 73% of what)
How to fix
  1. Audit numerical claims across the content portfolio
  2. For each claim, identify the original source and link to it
  3. Date-stamp the source ("as of 2025" or include the year in citation)
  4. Format citations consistently — pick a format and use it everywhere
  5. For frequently-changing statistics, set a re-validation reminder
FACTOR 07

Bullet and Numbered Lists

Strongly Inferred · Effort: Low to Medium · Layer 3 · Extractability
What it is

Content that appears as a list (<ul> or <ol>) is dramatically easier for AI systems to extract than the same content in prose form. AI prefers structures it can preserve — and lists preserve cleanly across extraction.

Why it matters for citation

When generating an answer, AI systems frequently want to produce list-formatted output (numbered steps, bullet-point recommendations, enumerated criteria). Source content already in list format is a direct match: extract, format, cite.

What compliance looks like
<h3>Three steps to implement AEO</h3>
<ol>
  <li>Audit your current AI Visibility Score</li>
  <li>Fix Layer 1 (Access) issues first</li>
  <li>Move to Layer 2 and Layer 3 work in sequence</li>
</ol>
Common mistakes
  • Procedural content in prose ("First do X, then do Y, then do Z") instead of <ol>
  • Bullet-point content in prose with sentence breaks instead of <ul>
  • Lists without proper HTML markup — using line breaks and bullet characters in plain text
  • Mixing list types — using <ul> for ordered/sequential content where <ol> is correct
How to fix
  1. Audit pages for procedural or enumerated content currently in prose
  2. Convert procedural content (steps, sequences) to <ol> markup
  3. Convert enumerated content (criteria, examples, options) to <ul> markup
  4. Verify HTML lists render with proper tags, not div-based custom styles
  5. Don't over-correct: lists that flow better as prose should stay as prose
FACTOR 08

HTML Tables for Comparisons

Strongly Inferred · Effort: Low to Medium · Layer 3 · Extractability
What it is

Comparison content in HTML <table> markup outperforms the same content in prose form. AI systems can preserve the table structure across extraction; prose comparisons get flattened and often misrepresented.

Why it matters for citation

Comparison queries — "X vs Y," "best X for Y," "differences between X and Y" — are extremely common in AI answer generation. AI systems frequently produce comparison-table outputs in their responses. Source content already in table format is a direct extraction candidate.

What compliance looks like
<table>
  <thead>
    <tr><th>Feature</th><th>AEO</th><th>SEO</th></tr>
  </thead>
  <tbody>
    <tr><td>Optimization target</td><td>Citation outcomes</td><td>Ranked positions</td></tr>
    <tr><td>Primary signals</td><td>Schema, entity grounding</td><td>Keywords, backlinks</td></tr>
  </tbody>
</table>
Common mistakes
  • "X vs Y" comparison content written as alternating prose paragraphs instead of a table
  • Tables built with <div> styling instead of semantic <table> markup
  • Tables without proper <thead> and <tbody> sectioning
  • Comparison content spread across multiple sections instead of consolidated
How to fix
  1. Identify comparison-format pages
  2. Convert comparison sections from prose to <table> markup
  3. Use proper semantic table structure
  4. Don't over-correct: pages that aren't comparison-shaped don't benefit from artificial table structures
FACTOR 09

Citation Formatting Quality

Strongly Inferred · Effort: Medium · Layer 2 / Layer 3
What it is

How you cite your own sources is a trust signal. Properly formatted citations — with consistent format, identifiable source attribution, accessible links, and dates — signal a quality content producer.

Why it matters for citation

AI systems infer source quality partly from how the source treats its own sources. A page that cites its references properly is treated as more credible — and is more likely to be cited itself. The signal compounds.

What compliance looks like
Inline attribution: Recent research from Stanford's HAI Institute (2025) shows...

Footnote attribution: Recent research shows AI citation behavior varies significantly by platform.¹

¹ Stanford HAI Institute. AI Citation Patterns Report. 2025.
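The footnote pattern rendered in semantic HTML. A sketch; the anchor IDs and URL are placeholders:

<p>Recent research shows AI citation behavior varies significantly by
platform (<a href="#ref-1">Stanford HAI Institute, 2025</a>).</p>

<ol class="references">
  <li id="ref-1">Stanford HAI Institute. <cite>AI Citation Patterns Report</cite>. 2025.
    <a href="https://example.edu/ai-citation-patterns-report">example.edu/ai-citation-patterns-report</a></li>
</ol>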
Common mistakes
  • Inconsistent citation format across the page
  • Vague attributions ("studies show," "research suggests," "according to experts")
  • Broken links — citations to URLs that no longer resolve
  • Undated citations — references without a year or publication date
  • Self-citation only (the page cites only the publishing site's other content)
How to fix
  1. Pick a citation format and document it as an editorial standard
  2. Audit existing content for citation quality; fix vague or broken citations
  3. Build citation-validation into the editorial pipeline (a link-check sketch follows this list)
  4. Mix self-citation and external citation appropriately
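A minimal link-check sketch for step 3, using only the Python standard library. Network behavior varies (rate limits, bot blocking, servers that reject HEAD requests), so treat failures as prompts to review, not verdicts:

# Hypothetical citation link check: flag reference URLs that no longer resolve.
import urllib.request

def broken_links(urls, timeout=10):
    broken = []
    for url in urls:
        # Some servers reject HEAD; fall back to GET if this proves too strict.
        req = urllib.request.Request(url, method="HEAD",
                                     headers={"User-Agent": "citation-audit/0.1"})
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                if resp.status >= 400:
                    broken.append((url, resp.status))
        except OSError as exc:  # URLError, HTTPError, and timeouts all subclass OSError
            broken.append((url, str(exc)))
    return broken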
FACTOR 10

Entity Density

Strongly Inferred · Effort: Medium · Layer 2 · Understanding
What it is

Entity density is the rate at which a page names recognizable entities — people, organizations, places, products, concepts — properly grounded in identifiable references (Schema.org sameAs, consistent naming, or contextual disambiguation). Entity-rich pages are more visible to AI systems than entity-thin pages of comparable length.

Why it matters for citation

AI systems use entity recognition as a primary topic-modeling signal. A page that names specific entities is easier to topic-classify, easier to entity-link, and easier to retrieve in entity-keyed queries.

What compliance looks like

Specific names instead of generic descriptors:

  • "Anthropic" instead of "the AI company"
  • "ChatGPT" instead of "the chatbot"
  • "Stanford's HAI Institute" instead of "a research institute"
  • "the 93-Factor AEO Taxonomy" instead of "our framework"
Common mistakes
  • Generic descriptors used where specific names are available
  • Same entity referenced inconsistently — pick one canonical form and use it consistently
  • Entity references without grounding (no sameAs, no Wikidata link)
  • Over-correction: artificially stuffing entity names where they don't naturally fit
How to fix
  1. Audit existing content for entity-naming patterns
  2. Replace generic descriptors with specific entity names where appropriate
  3. Establish canonical naming for repeating entities and use consistently
  4. Add sameAs JSON-LD linking for entities with authoritative external grounding (see the sketch after this list)
  5. Don't over-correct — natural-language flow is better than artificial entity-stuffing
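A minimal sketch of step 4: grounding named entities through about and mentions with sameAs links. The Wikidata IDs are placeholders, not the entities' real identifiers:

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Page title here",
  "about": {
    "@type": "Organization",
    "name": "Anthropic",
    "sameAs": ["https://www.wikidata.org/wiki/Q1234567"]
  },
  "mentions": [{
    "@type": "SoftwareApplication",
    "name": "ChatGPT",
    "sameAs": ["https://www.wikidata.org/wiki/Q7654321"]
  }]
}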
FACTOR 11

Named Author Presence

Strongly Inferred (Established for Google AI surfaces) · Effort: Low (per page) · Layer 2 · Understanding
What it is

Pages with a named, identifiable author — with proper Person schema, a bio, credentials, and sameAs linking — outperform anonymous content in citation outcomes. The effect is especially pronounced in YMYL (Your Money or Your Life) categories: medical, financial, legal, real estate.

Why it matters for citation

AI systems weight author identity as a trust signal. An anonymous page is treated as lower-trust than the same content under a named, credentialed author. In YMYL categories, the elevation is more pronounced. Medical content from a named clinician outperforms anonymous medical content by a wide margin.

What compliance looks like
{
  "@type": "Person",
  "name": "Author Name",
  "jobTitle": "Editor in Chief",
  "url": "https://example.com/authors/author-name",
  "sameAs": [
    "https://www.linkedin.com/in/author-handle",
    "https://www.wikidata.org/wiki/Q1234567"
  ]
}
Common mistakes
  • Anonymous content (no author attribution at all)
  • Generic editorial team attribution ("by Editorial Team")
  • Named author but no bio page or credentials
  • Bio page exists but no sameAs linking
  • Inconsistent author identity across the site (the same person credited under three different name spellings)
How to fix
  1. Audit content for unsigned or generically-attributed pages
  2. Implement named author attribution for every meaningful page
  3. Build author bio pages with credentials, publication history, and external profile links
  4. Add Person schema to author bio pages and to article pages via author references (see the sketch after this list)
  5. Standardize canonical name spelling per author across the site
  6. In YMYL categories, ensure author credentials are visible and verifiable (license numbers, professional affiliations)
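One way to implement steps 3 and 4: give the author bio page a stable @id and reference it from each article. A sketch; the domain and handle are placeholders:

<!-- On the author bio page -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://example.com/authors/author-name#person",
  "name": "Author Name",
  "jobTitle": "Editor in Chief",
  "sameAs": ["https://www.linkedin.com/in/author-handle"]
}
</script>

<!-- On each article page, reference the same identity -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Page title here",
  "author": { "@id": "https://example.com/authors/author-name#person" }
}
</script>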
Order of Operations

You don't have to do all eleven at once.

The eleven factors aren't a checklist that has to be completed in one pass. Most teams should sequence the work — partly to keep the scope manageable, partly because some factors enable others.

Phase 01 · Foundation (weeks 1–2)

Start with the structural enablers.

  1. JSON-LD Structured Data (Factor 1) — set up Organization and Person schema sitewide; add type-appropriate schema per page
  2. Named Author Presence (Factor 11) — add named authors with bios and Person schema
  3. Proper Heading Hierarchy (Factor 4) — audit and fix heading-hierarchy issues across high-traffic pages

These factors enable everything else. Without JSON-LD, the entity and author work doesn't compound. Without proper heading hierarchy, front-loading and concise-blocks work doesn't extract cleanly.

Phase 02 · Structure (weeks 3–6)

Refactor structure on the highest-traffic answer-bearing pages.

  1. Front-Loaded Direct Answers (Factor 2) — move actual answers to the first 40–60 words
  2. Concise Answer Blocks (Factor 3) — refactor 200+ word blocks into 40–60 word units
  3. Definition & Summary Density (Factor 5) — add explicit definitions and summary blocks

This is where score gains start showing up. Factors 2, 3, and 5 are the dominant Layer 3 levers and pay off relatively fast — typically within a few weeks of AI re-crawl.

Phase 03 · Trust & Context (week 7+)

Layer the trust and richness factors.

  1. Statistics with Sources (Factor 6) — audit and source numerical claims
  2. Bullet and Numbered Lists (Factor 7) — convert prose-formatted enumerations to list markup
  3. HTML Tables for Comparisons (Factor 8) — convert prose comparisons to tables
  4. Citation Formatting Quality (Factor 9) — standardize citation format
  5. Entity Density (Factor 10) — replace generic descriptors with specific entity names

These compound the gains from Phase 2 and continue paying off across subsequent quarters as the editorial discipline takes hold.

This sequence isn't sacred. Some teams will benefit from re-ordering — for example, if your existing JSON-LD is already strong but author attribution is weak, start with Factor 11 instead of Factor 1. The point is to sequence deliberately, not to attempt all eleven simultaneously.

Your Next Step

See where you stand.

See exactly which of the Citation Core 11 factors are passing and failing on your real content. The free scan surfaces factor-by-factor results across the highest-impact subset.


Or — for the dependency model that explains why these eleven matter so much: Read the AI Visibility Stack →

Methodology · Citation Core 11