
LLMs Find You Through Entities, Not Keywords
When a user asks ChatGPT "what agency specializes in AI product development," the model does not run a keyword search. It performs entity resolution: matching structured data from its index against the query to find authoritative sources.
Most businesses optimize for keywords. We optimized for entities. We built a knowledge graph that defines 18 entities across two domains, connects them through JSON-LD @graph enrichment, and gives LLMs something concrete to cite. Here is exactly what we built, why we built it this way, and what happened in the first week.
TL;DR
- LLMs use entity resolution (not keyword matching) to decide which sources to cite in generated responses
- We built a knowledge graph with 18 entities across 2 domains (pixelmojo.io + lloydpilapil.com) connected via sameAs URIs
- The LLM Visibility Stack has 5 layers: Entity Foundation, Knowledge Graph, Content Signals, Machine-Readable Context, Multi-Source Authority
- Real analytics: +18.1% active users, +21.8% new users, referral traffic above GA4 forecasts, key events more than 3x the upper bound of GA4's predicted range
- Correlation is not causation. We made other improvements simultaneously. But the referral and key event spikes align specifically with the knowledge graph deployment
- You can build this with a TypeScript entity registry, a schema engine, and auto-mapping from blog post tags. No external tools required
The Problem: Why Traditional SEO Alone Falls Short for LLMs
We have covered the tactical side of generative engine optimization across our 5-part GEO series. That series walks through how AI search is shifting traffic patterns, the technical playbook for getting cited, what actually changed our own AI search results, dynamic llms.txt implementation, and building brands that AI search engines recommend.
But all of those posts focus on content and delivery mechanisms. None of them address the foundational layer: how LLMs actually resolve entities and decide what is authoritative.
Here is the gap we identified:
- robots.txt and llms.txt tell crawlers what to index and summarize. They are access control and context delivery, not identity.
- FAQ schema and Article markup describe individual pages. They do not define the entities those pages are about.
- Topical authority through content clusters signals expertise, but it is implicit. LLMs have to infer your authority from content patterns rather than reading it directly from structured data.
What was missing was the infrastructure layer: a knowledge graph that explicitly defines who we are, what we do, and how everything connects. Not for Google (though it helps there too), but specifically for the retrieval and entity resolution phases of LLM pipelines.
How LLMs Actually Discover Sources
To understand why knowledge graphs matter for LLM citations, you need to understand how retrieval-augmented generation (RAG) actually works. Most explanations oversimplify this, so let us walk through the pipeline step by step.
HOW LLMS DISCOVER SOURCES
Simplified RAG pipeline showing where structured data matters most
1. User Query: "What agency does AI product development?"
2. Retrieval: LLM searches its index for relevant documents
3. Entity Resolution (your data matters here): Matches structured data to resolve who/what entities are
4. Context Assembly (your data matters here): Ranks and combines sources by authority signals
5. Response Generation: Synthesizes answer with citations from top-ranked sources
Steps 3 and 4 are where knowledge graphs win.
Without structured entity data, the LLM treats your site like every other page. With it, you give the model something concrete to resolve against: named entities, defined relationships, and explicit authority signals. This is the difference between "possibly relevant" and "authoritative source."
The critical insight is that steps 3 and 4 are where structured data creates separation. During entity resolution, the LLM is trying to match query concepts against its index. If your site has explicit DefinedTerm schemas for "Thread-Based Engineering" or "Generative Engine Optimization," the model can resolve those entities directly instead of inferring them from unstructured content.
During context assembly, the LLM ranks sources by authority signals. A site with a connected @graph of Organization, Person, Service, Product, and DefinedTerm entities provides stronger signals than a site with just Article schema on each page.
This is not speculation. The Princeton GEO study found that authoritative citations and structured claims improved visibility in generative engines by 30-40%. Knowledge graphs are how you make those signals machine-readable at scale.
The LLM Visibility Stack
After building our knowledge graph and analyzing what moved the needle across our GEO work, we identified five layers that determine whether AI search engines cite your content. We call this the LLM Visibility Stack.
THE LLM VISIBILITY STACK
Five layers that determine whether AI search engines cite your content
- Layer 5, Multi-Source Authority: Cross-site entity linking, sameAs connections, consistent signals across domains
- Layer 4, Machine-Readable Context: llms.txt, robots.txt AI directives, FAQ schema, structured data signals
- Layer 3, Content Signals: Authoritative citations, statistical claims, quotable passages, topical depth
- Layer 2, Knowledge Graph: Entity definitions, @graph enrichment, topic clusters, relationship mapping
- Layer 1, Entity Foundation: DefinedTerm schemas, Organization identity, Person profiles, sameAs URIs
Each layer amplifies the ones below it.
Most teams jump to Layer 4 (llms.txt) without Layers 1-2. That is like building a house starting from the roof. The entity foundation and knowledge graph are what make everything else meaningful to LLMs.
Layer 1: Entity Foundation
This is where most teams need to start, yet most skip straight to Layer 4 instead. The entity foundation defines the core things your site is about using schema.org types:
- Organization with complete identity (name, description, url, logo, sameAs to social profiles)
- Person entities for key authors with knowsAbout, jobTitle, and worksFor connections
- DefinedTerm for proprietary methodologies or frameworks you have created
- SoftwareApplication for products
- Service for service offerings
Each entity gets a stable @id URI (like https://www.pixelmojo.io/#thread-based-engineering) that can be referenced from anywhere in your schema.
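A minimal sketch of what Layer 1 nodes could look like, assuming a hypothetical `entityId` helper and a hypothetical `lloyd-pilapil` slug. Only the @id pattern, the `#organization` identifier, and the schema.org type names come from this post; everything else is illustrative.

```typescript
// Illustrative sketch only: helper names and the person slug are assumptions.
const SITE_URL = 'https://www.pixelmojo.io'

// Build a stable @id from a slug, following the pattern described above.
function entityId(slug: string): string {
  return `${SITE_URL}/#${slug}`
}

interface JsonLdNode {
  '@type': string
  '@id': string
  name: string
  [property: string]: unknown
}

// A DefinedTerm node for a proprietary methodology.
const threadBasedEngineering: JsonLdNode = {
  '@type': 'DefinedTerm',
  '@id': entityId('thread-based-engineering'),
  name: 'Thread-Based Engineering',
}

// A Person node that points at the organization by @id instead of repeating it.
const author: JsonLdNode = {
  '@type': 'Person',
  '@id': entityId('lloyd-pilapil'), // hypothetical slug
  name: 'Lloyd Pilapil',
  worksFor: { '@id': entityId('organization') },
}
```

Because every node is addressed by its @id, any other schema on the site can reference it with a one-property object rather than a full copy.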
Layer 2: Knowledge Graph
This layer connects the entities from Layer 1 into a web of meaning. It is not enough to define entities in isolation. You need to express:
- Which blog posts are about which entities
- Which entities mention other entities
- Which entities are related to each other
- How entities across different domains connect via sameAs
This is what transforms isolated schema markup into a knowledge graph. The @graph array on each page includes all entity definitions, and each Article schema gets enriched with about and mentions references to relevant entities.
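As a rough illustration of that enrichment step, the sketch below attaches `about` and `mentions` references to an Article node. The function name, the node shape, and the sample post are assumptions made for the example, not the production code.

```typescript
// Illustrative sketch: function and parameter names are assumptions.
interface EntityRef {
  '@id': string
}

interface ArticleNode {
  '@type': 'Article'
  headline: string
  about?: EntityRef[]
  mentions?: EntityRef[]
}

// Turn an entity slug into a reference to its @id.
const toRef = (slug: string): EntityRef => ({
  '@id': `https://www.pixelmojo.io/#${slug}`,
})

// Return a copy of the article with about/mentions references attached.
// Empty lists are omitted so the JSON-LD stays compact.
function enrichArticle(
  article: ArticleNode,
  aboutSlugs: string[],
  mentionSlugs: string[],
): ArticleNode {
  const enriched: ArticleNode = { ...article }
  if (aboutSlugs.length > 0) enriched.about = aboutSlugs.map(toRef)
  if (mentionSlugs.length > 0) enriched.mentions = mentionSlugs.map(toRef)
  return enriched
}

const enrichedArticle = enrichArticle(
  { '@type': 'Article', headline: 'Example post' }, // hypothetical post
  ['thread-based-engineering'],
  ['ai-technical-debt'],
)
```

The references carry only an @id, so the full entity definition lives in one place in the @graph and each article simply points at it.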
Layers 3-5: Content, Machine-Readable Context, Multi-Source Authority
These layers build on the entity foundation. We covered them in detail across our GEO series:
- Layer 3 (Content Signals): Authoritative citations, statistical claims, and topical depth. See our GEO playbook.
- Layer 4 (Machine-Readable Context): llms.txt, AI crawler directives, and FAQ schema. See our llms.txt implementation guide.
- Layer 5 (Multi-Source Authority): Cross-site entity linking, which we will cover in the architecture section below.
What We Built: Architecture Walkthrough
Our knowledge graph connects two domains: pixelmojo.io (the agency) and lloydpilapil.com (the founder's personal site). Here is how the pieces fit together.
CROSS-SITE ENTITY LINKING
Two domains, one knowledge graph, connected via sameAs and shared @id URIs
- Thread-Based Engineering (@type: DefinedTerm)
- AX Design (@type: DefinedTerm)
- GEO (@type: DefinedTerm)
- Lakbay AI (@type: SoftwareApplication)
The sameAs bridge is bidirectional. When an LLM encounters "Lloyd Pilapil" on either site, JSON-LD connects it to the same Person entity. The organization, methodologies, and products all resolve to the same @id URIs regardless of which domain the LLM indexed first.
The Entity Registry
The core of the system is a TypeScript file (knowledge-graph.ts) that defines every entity as a structured object:
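```typescript
// Entity shape inferred from the fields used below (not shown in the original excerpt).
interface Entity {
  id: string
  name: string
  type: string
  schemaType: string
  description: string
  relatedEntities: string[]
  primaryPosts: string[]
  mentionedInPosts: string[]
  keywords: string[]
}

export const entities: Record<string, Entity> = {
  'thread-based-engineering': {
    id: 'thread-based-engineering',
    name: 'Thread-Based Engineering',
    type: 'Methodology',
    schemaType: 'DefinedTerm',
    description: 'Productivity and governance framework...',
    relatedEntities: ['ai-technical-debt', 'claude-code-development'],
    primaryPosts: ['thread-based-engineering-scaling-ai-development'],
    mentionedInPosts: ['vibe-coding-technical-debt-crisis-2026-2027'],
    keywords: ['thread-based-engineering', 'ai-governance'],
  },
  // ... 17 more entities
}
```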
Each entity has:
- A stable id that becomes the fragment of its @id URI in JSON-LD
- primaryPosts and mentionedInPosts for manual relationship overrides
- keywords that enable automatic tag-based matching (more on this below)
- relatedEntities for the relationship graph
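Because relatedEntities holds plain string ids, a typo can silently break the relationship graph. The sketch below is one way to catch that at build time. `validateRegistry` and the two-entity sample registry are hypothetical, and only the fields the check needs are declared.

```typescript
// Illustrative sketch: only the fields needed for the check are declared here.
interface Entity {
  id: string
  relatedEntities: string[]
}

// Report registry keys that disagree with their entity id, and relatedEntities
// that point at ids missing from the registry.
function validateRegistry(registry: Record<string, Entity>): string[] {
  const problems: string[] = []
  for (const [key, entity] of Object.entries(registry)) {
    if (entity.id !== key) {
      problems.push(`${key}: id "${entity.id}" does not match its registry key`)
    }
    for (const related of entity.relatedEntities) {
      if (!Object.prototype.hasOwnProperty.call(registry, related)) {
        problems.push(`${key}: unknown related entity "${related}"`)
      }
    }
  }
  return problems
}

// Hypothetical sample; 'claude-code-development' is left undefined on purpose.
const sampleRegistry: Record<string, Entity> = {
  'thread-based-engineering': {
    id: 'thread-based-engineering',
    relatedEntities: ['ai-technical-debt', 'claude-code-development'],
  },
  'ai-technical-debt': {
    id: 'ai-technical-debt',
    relatedEntities: ['thread-based-engineering'],
  },
}

// Contains one entry, for the missing 'claude-code-development' entity.
const registryProblems = validateRegistry(sampleRegistry)
```

Running a check like this before generating JSON-LD keeps dangling @id references out of the published markup.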
The Schema Engine
A resolver (schema-engine.ts) takes the entity registry and converts it into JSON-LD output. It does three things:
- Generates DefinedTerm fragments for the global @graph (included on every page)
- Auto-matches blog posts to entities based on tag overlap (2+ keyword matches = about, 1 match = mentions)
- Enriches Article schemas with about and mentions references
The auto-matching is the key feature that keeps the system maintainable. When we write a new blog post, we just include relevant tags. The schema engine automatically connects the post to the right entities. No manual editing of the knowledge graph file required for routine posts.
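The matching rule can be sketched roughly as follows. This is not the actual schema-engine.ts: the function name, the case-insensitive comparison, and the way the manual primaryPosts and mentionedInPosts overrides take precedence are all assumptions for illustration. The second entity's keywords are also hypothetical.

```typescript
// Illustrative sketch: names, override precedence, and case handling are assumptions.
interface Entity {
  id: string
  keywords: string[]
  primaryPosts: string[]
  mentionedInPosts: string[]
}

interface BlogPost {
  slug: string
  tags: string[]
}

interface PostMatches {
  about: string[]
  mentions: string[]
}

function matchPostToEntities(post: BlogPost, registry: Record<string, Entity>): PostMatches {
  const tagSet = new Set(post.tags.map((tag) => tag.toLowerCase()))
  const matches: PostMatches = { about: [], mentions: [] }

  for (const entity of Object.values(registry)) {
    // Count how many of the entity's keywords appear among the post's tags.
    const overlap = entity.keywords.filter((keyword) => tagSet.has(keyword.toLowerCase())).length

    if (entity.primaryPosts.includes(post.slug) || overlap >= 2) {
      matches.about.push(entity.id) // manual override or 2+ keyword matches
    } else if (entity.mentionedInPosts.includes(post.slug) || overlap === 1) {
      matches.mentions.push(entity.id) // manual override or exactly 1 match
    }
  }
  return matches
}

const sampleEntities: Record<string, Entity> = {
  'thread-based-engineering': {
    id: 'thread-based-engineering',
    keywords: ['thread-based-engineering', 'ai-governance'],
    primaryPosts: [],
    mentionedInPosts: [],
  },
  'ai-technical-debt': {
    id: 'ai-technical-debt',
    keywords: ['ai-technical-debt', 'ai-governance'], // hypothetical keywords
    primaryPosts: [],
    mentionedInPosts: [],
  },
}

// about: ['thread-based-engineering'], mentions: ['ai-technical-debt']
const sampleMatches = matchPostToEntities(
  { slug: 'example-post', tags: ['thread-based-engineering', 'ai-governance'] },
  sampleEntities,
)
```

The sample post shares two keywords with the first entity and one with the second, so it lands in `about` for one and `mentions` for the other.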
Cross-Site Linking
The personal site (lloydpilapil.com) has its own structured data with a Person entity that includes:
```json
{
  "@type": "Person",
  "sameAs": [
    "https://www.linkedin.com/in/lloydpilapil",
    "https://www.pixelmojo.io/author/lloyd-pilapil"
  ],
  "worksFor": {
    "@type": "Organization",
    "@id": "https://www.pixelmojo.io/#organization"
  },
  "knowsAbout": [
    "Thread-Based Engineering",
    "Generative Engine Optimization",
    "AX Design"
  ]
}
```
The sameAs and worksFor properties create the bridge. When an LLM encounters "Lloyd Pilapil" on either domain, it can resolve both references to the same entity. The knowsAbout array connects the person to the exact DefinedTerm entities defined in the pixelmojo.io knowledge graph.
This is bidirectional: the pixelmojo.io Article schemas include an author reference that links back to the same Person @id. Two domains, one entity graph.
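A bridge like this is easy to break from one side without noticing. The sketch below is a simple consistency check for the sameAs direction of the link, assuming a simplified `PersonNode` shape. The personal-site URL and the agency-side sameAs value are hypothetical placeholders, not taken from the live markup.

```typescript
// Illustrative sketch: the node shape and the agency-side values are assumptions.
interface PersonNode {
  url: string
  sameAs: string[]
}

// One node links to another when it lists the other's URL in sameAs.
const linksTo = (from: PersonNode, to: PersonNode): boolean => from.sameAs.includes(to.url)

function isBidirectional(a: PersonNode, b: PersonNode): boolean {
  return linksTo(a, b) && linksTo(b, a)
}

const personalSitePerson: PersonNode = {
  url: 'https://lloydpilapil.com', // hypothetical canonical URL
  sameAs: [
    'https://www.linkedin.com/in/lloydpilapil',
    'https://www.pixelmojo.io/author/lloyd-pilapil',
  ],
}

const agencyAuthorPerson: PersonNode = {
  url: 'https://www.pixelmojo.io/author/lloyd-pilapil',
  sameAs: ['https://lloydpilapil.com'], // hypothetical back-link
}

// true for the sample data above
const bridgeIsBidirectional = isBidirectional(personalSitePerson, agencyAuthorPerson)
```

The comparison is an exact string match, so a trailing slash or a www mismatch on either side would count as a broken link.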
Dynamic llms.txt Integration
Our dynamic llms.txt consumes the knowledge graph at build time. The "Entity Context" section of llms.txt is generated directly from the entity registry, giving AI crawlers a plain-text summary of every entity, its relationships, and its primary content. When we add a new entity to the knowledge graph, llms.txt updates automatically on the next build.
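One plausible shape for that build step is sketched below. The section layout, the `renderEntityContext` name, and the `/blog/` URL prefix are assumptions for illustration. Only the idea of rendering the section from the entity registry comes from our setup.

```typescript
// Illustrative sketch: output format and URL structure are assumptions.
interface Entity {
  id: string
  name: string
  description: string
  relatedEntities: string[]
  primaryPosts: string[]
}

function renderEntityContext(registry: Record<string, Entity>, siteUrl: string): string {
  const lines: string[] = ['## Entity Context', '']
  for (const entity of Object.values(registry)) {
    lines.push(`### ${entity.name}`)
    lines.push(entity.description)

    const related = entity.relatedEntities.map((id) => {
      const target: Entity | undefined = registry[id]
      return target ? target.name : id // fall back to the raw id if it is not in the registry
    })
    if (related.length > 0) lines.push(`Related: ${related.join(', ')}`)

    for (const slug of entity.primaryPosts) {
      lines.push(`- ${siteUrl}/blog/${slug}`) // hypothetical blog URL pattern
    }
    lines.push('')
  }
  return lines.join('\n')
}

const llmsRegistry: Record<string, Entity> = {
  'thread-based-engineering': {
    id: 'thread-based-engineering',
    name: 'Thread-Based Engineering',
    description: 'Productivity and governance framework...',
    relatedEntities: ['ai-technical-debt'],
    primaryPosts: ['thread-based-engineering-scaling-ai-development'],
  },
}

const entityContextSection = renderEntityContext(llmsRegistry, 'https://www.pixelmojo.io')
```

Since the section is derived from the same registry that feeds the JSON-LD, the plain-text and structured views cannot drift apart.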
The Results: Real Analytics
We deployed the cross-site knowledge graph on February 12, 2026. Here is what GA4 showed in the first seven days.
WHAT THE DATA SHOWS
GA4 analytics for pixelmojo.io, 7 days after deploying the knowledge graph
- Active users: +18.1% (last 7 days vs previous period)
- New users: +21.8% (last 7 days vs previous period)
- Referral users: GA4 predicted 1-5 users; actual exceeded range
- Key events: GA4 predicted 0-6 events; actual was 19
Honest caveat: These numbers correlate with deploying the knowledge graph, but correlation is not causation. We also published new content and made technical SEO improvements during the same period. The referral and key event anomalies specifically align with the knowledge graph deployment timeline, which is why we highlight them separately.
Let us be specific about what these numbers mean:
What went up:
- Active users increased 18.1% over the previous 7-day period (248 vs 210)
- New users increased 21.8% (229 vs 188)
- Referral traffic spiked above GA4's forecasted range: GA4 predicted 1-5 referral users, we got 6
- Key events through the Direct channel hit 19, where GA4 predicted 0-6 (over 3x the upper bound of the forecast)
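The percentages above follow directly from the raw counts. A quick check, using a small helper written for this post:

```typescript
// Percent change of the current period relative to the previous one.
function percentChange(current: number, previous: number): number {
  return ((current - previous) / previous) * 100
}

const activeUsersChange = percentChange(248, 210) // about 18.1
const newUsersChange = percentChange(229, 188) // about 21.8

// Key events relative to the upper bound of GA4's forecast (0-6).
const keyEventRatio = 19 / 6 // about 3.17
```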
What we cannot claim:
- We cannot isolate the knowledge graph's impact from other changes we made during the same period
- We published new content and made technical SEO updates simultaneously
- Referral and key event anomalies align with the deployment timeline, but that is correlation, not proof
What we think is happening: The referral traffic anomaly is the most interesting signal. Referral traffic means users coming from other sites that link to us. We did not build new backlinks during this period. The increase could indicate AI-assisted tools or platforms starting to surface our content, which gets counted as referral traffic in GA4. The key event spike through Direct traffic could indicate users arriving via AI chat interfaces (which often show as Direct in analytics).
We will continue monitoring and will update this post as the data matures.
To measure whether your own knowledge graph and entity definitions are making a difference, see our complete guide to free AI visibility tools. The AI Citation Tracker and llms.txt Validator are especially useful for tracking the impact of entity-level optimizations.
What Did Not Work
Three things did not work as expected:
- Over-connecting entities. Our first version had every entity related to every other entity. This diluted the signal. We trimmed relationships to only meaningful connections (5-6 per entity, not 12).
- Manual post mapping. We started by manually adding every blog slug to entity primaryPosts arrays. This became stale within a week. The auto-matching system based on tag keywords was the fix.
- Expecting immediate indexing. LLMs do not re-index on your schedule. Some of our entities may not be in any LLM's index yet. This is infrastructure that compounds, not a launch-day win.
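To keep the over-connection problem from creeping back, a small lint over the registry can flag entities with too many relationships. The sketch below is hypothetical: the `findOverConnected` name and the default cap of 6 are assumptions based on the range mentioned above.

```typescript
// Illustrative sketch: the function name and default cap are assumptions.
interface Entity {
  id: string
  relatedEntities: string[]
}

// Return the ids of entities whose relationship count exceeds the cap.
function findOverConnected(registry: Record<string, Entity>, maxRelations = 6): string[] {
  return Object.values(registry)
    .filter((entity) => entity.relatedEntities.length > maxRelations)
    .map((entity) => entity.id)
}

// Hypothetical registry: one entity with 12 relations, one with 2.
const lintRegistry: Record<string, Entity> = {
  'over-connected-example': {
    id: 'over-connected-example',
    relatedEntities: Array.from({ length: 12 }, (_, index) => `entity-${index}`),
  },
  'focused-example': {
    id: 'focused-example',
    relatedEntities: ['entity-0', 'entity-1'],
  },
}

// ['over-connected-example']
const overConnected = findOverConnected(lintRegistry)
```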
The Playbook: How to Build Your Own Knowledge Graph
If you want to replicate this approach, here is the sequence that worked for us:
- Audit your entities. List every methodology, product, service, and project that your business is known for. If it has a name and could be a Wikipedia article, it is probably an entity.
- Choose your schema types. Organization and Person are mandatory. Add DefinedTerm for frameworks, SoftwareApplication for products, Service for offerings. Avoid types that require data you do not have (aggregateRating without real reviews, for example).
- Create stable @id URIs. Each entity needs a permanent identifier. We use the pattern https://www.pixelmojo.io/#entity-id. These URIs do not need to resolve to a page. They are identifiers, not URLs.
- Define relationships. Map which entities relate to which. Keep it honest: 3-6 related entities per item is more useful than connecting everything to everything.
- Build the auto-matcher. Define keywords for each entity. When a blog post's tags match 2+ keywords, the post automatically gets connected as about that entity. One match = mentions. This keeps the system alive without manual maintenance.
- Inject into every page. The @graph array with all entity DefinedTerm schemas should appear on every page, not just the homepage. Article pages additionally get about and mentions properties linking to relevant entities.
- Connect your personal brand. If a founder or key person has their own domain, add sameAs and worksFor links that bridge the two sites. This creates the multi-source authority signal from Layer 5 of the stack.
- Wire into llms.txt. If you have a dynamic llms.txt (and you should), generate the entity section from the same source of truth. One data source, multiple outputs.
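For the "inject into every page" step, the sketch below shows one way to serialize a @graph into a script tag. The `renderJsonLdScript` name and the sample node are assumptions. How the resulting string reaches the page depends on your framework.

```typescript
// Illustrative sketch: the function name and sample data are assumptions.
function renderJsonLdScript(graph: object[]): string {
  const payload = { '@context': 'https://schema.org', '@graph': graph }
  // Escape "<" so a string value can never close the script element early.
  const json = JSON.stringify(payload).replace(/</g, '\\u003c')
  return `<script type="application/ld+json">${json}</script>`
}

const jsonLdTag = renderJsonLdScript([
  {
    '@type': 'DefinedTerm',
    '@id': 'https://www.pixelmojo.io/#thread-based-engineering',
    name: 'Thread-Based Engineering',
  },
])
```

The escaped sequence is still valid JSON, so parsers read the original text back unchanged.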
Where This Goes Next
The knowledge graph is infrastructure, not a finished product. Here is what we are building on top of it:
- Analytics correlation tracking. We are building dashboards to track which entities appear in AI search citations over time, and correlating that with the GA4 anomaly data.
- Automated entity expansion. When we launch new products or define new methodologies, the knowledge graph should grow automatically from the same TypeScript source of truth.
- Cross-platform verification. Testing whether the same entities get cited differently across ChatGPT, Perplexity, Claude, and Gemini, and adjusting the schema based on what each model responds to.
If you are serious about AI search visibility, start with the entities. Build the foundation. The content and delivery layers (llms.txt, FAQ schema, topical depth) all work better when they have something real to stand on.
Want to see the knowledge graph in action? Check out our Vector lead qualification engine and Hive multi-agent platform, both of which are defined as entities in the graph. Or explore the full GEO series for the tactical playbook that sits on top of this infrastructure layer.
