Published: February 17, 2026 • 17 min read

We Optimized for AI Search. Here's What Changed.

Q: What is the most impactful GEO change you made?

Adding FAQPage JSON-LD schema to 37 of 39 blog posts was the highest-ROI change. It took approximately 15 minutes per post and research shows pages with FAQ schema are 3.2x more likely to appear in Google AI Overviews. The second most impactful change was restructuring content into multi-part series, which creates topical authority clusters that AI platforms treat as comprehensive resources.

We implemented every GEO tactic from our own playbook. Here is exactly what we built, what we changed, and what we learned about optimizing a real site for AI search citations.

by Lloyd Pilapil


We Took Our Own Advice. Here Is Everything That Happened.

Throughout this series, we presented the data (Part 1), the framework (Part 2), and the tactical playbook (Part 3). Now we are showing our work.

This is not a theoretical exercise. We implemented every GEO tactic we recommended on our own site, pixelmojo.io. 39 blog posts. 15 service and product pages. A custom AI referrer tracking system. A 262-line llms.txt file. FAQPage schema on 37 posts. The deepest of those changes was retrofitting 21 posts for AI citation in one session.

This post documents exactly what we built, what we changed, and the lessons we learned doing it. No cherry-picked metrics. No fabricated case studies. Just the honest implementation story of a small AI product studio trying to get cited by the same platforms we build products on.

TL;DR

We implemented GEO across 39 blog posts and 15 service pages over 4 weeks
Our robots.txt now explicitly manages 18 AI crawler bots (10 allowed: 6 browsing + 4 curated training; 8 blocked: data brokers and adversarial scrapers)
FAQPage JSON-LD schema was added to 37 of 39 posts, the highest-ROI change at ~15 min per post
We created a 262-line llms.txt covering products, services, portfolio, and use policy
A custom AI referrer tracker detects ChatGPT, Perplexity, Claude, Gemini, and Copilot traffic
Multi-part series (4 series, 13 posts) outperform standalone posts for AI citation potential

GEO implementation is not a weekend project and it is not a one-time project. It is a systematic discipline that touches your robots.txt, schema markup, content structure, and measurement stack. The good news: every individual change is small. The compound effect is significant.

The Audit: Where We Started vs Where We Ended

Before implementing GEO, we ran a full audit of our site against the checklist from Part 3. The results were humbling. We had been publishing content for over a year without any AI-specific optimization.

GEO Implementation Audit: Before vs After

Area	Before	After
Crawl Access (robots.txt)	Basic robots.txt, no AI-specific rules	18 AI bots explicitly managed (browsing and curated training bots allowed, data brokers and scrapers blocked), key pages specified
llms.txt	Did not exist	262-line Markdown file with products, services, portfolio, URLs, use policy
AI Capabilities Factsheet	Did not exist	Explicit corrections for common AI misinterpretations of our services
FAQ Schema (JSON-LD)	0 posts with FAQPage schema	37 of 39 posts with BlogFAQ components and FAQPage JSON-LD
Article Schema	Basic metadata only	Full Article JSON-LD on every post (author, publisher, dates, images)
AI Referrer Tracking	No AI traffic tracking	Custom tracker detecting ChatGPT, Perplexity, Claude, Gemini, Copilot referrals
Content Structure	Mixed quality, no consistent format	Answer-first, TL;DR boxes, comparison tables, 3,000-5,000 words per post
Series-Based Content	Standalone posts	4 multi-part series (4+4+3+2 = 13 interlinked posts)

The audit revealed eight categories where we had zero or partial coverage. None of this is unusual for a site that grew organically, but it meant that AI crawlers were either blocked or found our content without the structural signals they need to select it as a citation source.

Change 1: Rebuilding robots.txt for the AI Crawler Ecosystem

Our original robots.txt was basic. It allowed Googlebot, disallowed admin pages, and that was about it. We had no AI-specific rules.

The rebuild took less than a day but required careful decisions. Our initial rule was simple: allow search bots, block training bots. In May 2026 that position evolved: we still allow search bots, but we now selectively allow training bots that feed AI engines we want citing us and block data brokers.

Here is the current logic.

Browsing bots, always allowed (these power AI search results):

GPTBot and ChatGPT-User (OpenAI)
ClaudeBot and Claude-Web (Anthropic)
PerplexityBot (Perplexity search)
GoogleOther (Google AI features)

Training bots we now allow (these feed AI engines we want citing us):

CCBot (Common Crawl, used by most LLMs as training input)
Google-Extended (Gemini training)
anthropic-ai (Anthropic training)
Applebot-Extended (Apple Intelligence training)

Bots we still block (data brokers and adversarial scrapers):

cohere-ai, Meta-ExternalAgent, FacebookBot, Bytespider, Diffbot, Omgili

Why the shift: when we first published this post, we blocked all training bots to protect IP. As Pixelmojo's citation strategy matured, we recognized that, for a new brand still earning ground in AI answers, brand recognition in future model generations matters more than blanket IP protection. The four allowed training bots feed the AI engines we want to be cited by; the blocked bots either resell scraped data (Diffbot, Omgili) or train models we have no strategic alignment with.

We also explicitly surfaced our most important pages. Instead of just Allow: /, we listed the specific directories that contain our highest-value content: /blogs/, /services/, /projects/, /about/, /pricing/, and our product pages (/vector, /hive).
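The policy above can be kept as data and rendered into robots.txt, so a monthly change becomes a one-line edit. What follows is a minimal sketch under stated assumptions, not our production file. The bot lists are copied from this section and the function names are hypothetical. The sketch relies on RFC 9309 allowing several User-agent lines to share one group of rules. Without a Disallow rule, the per-directory Allow lines restrict nothing; they only make the high-value paths explicit.

```typescript
// Bot lists copied from the policy described above.
const BROWSING_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web", "PerplexityBot", "GoogleOther"];
const ALLOWED_TRAINING_BOTS = ["CCBot", "Google-Extended", "anthropic-ai", "Applebot-Extended"];
const BLOCKED_BOTS = ["cohere-ai", "Meta-ExternalAgent", "FacebookBot", "Bytespider", "Diffbot", "Omgili"];

// High-value directories surfaced explicitly.
const KEY_PATHS = ["/blogs/", "/services/", "/projects/", "/about/", "/pricing/", "/vector", "/hive"];

// One robots.txt group: several User-agent lines followed by the rules they share.
function renderGroup(agents: string[], rules: string[]): string {
  return [...agents.map((agent) => `User-agent: ${agent}`), ...rules].join("\n");
}

// Hypothetical helper: builds the AI-specific part of robots.txt from the lists above.
function buildAiRobotsTxt(): string {
  const allowRules = [...KEY_PATHS.map((path) => `Allow: ${path}`), "Allow: /"];
  const allowed = renderGroup([...BROWSING_BOTS, ...ALLOWED_TRAINING_BOTS], allowRules);
  const blocked = renderGroup(BLOCKED_BOTS, ["Disallow: /"]);
  return `${allowed}\n\n${blocked}\n`;
}

console.log(buildAiRobotsTxt());
```

One design caution: under RFC 9309 a crawler follows only the group that most specifically names it. These bots therefore ignore whatever sits under User-agent: *. If admin paths should stay off-limits to them, repeat those Disallow lines inside these groups.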

Lesson learned: We initially missed OAI-SearchBot and Claude-SearchBot (the dedicated search bots), only listing the general-purpose bots. These search-specific bots are arguably the most important ones to allow, since they are the ones that power citation results. Always check OpenAI's crawler documentation and Perplexity's bot guide for the latest bot names.
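A check along these lines would have caught our omission early. It reports which expected bots robots.txt never mentions, whether allowed or blocked. This is a minimal sketch, assuming you keep your own list of expected user-agent tokens. The list below only covers bots named in this post, and the function names are hypothetical.

```typescript
// User-agent tokens we expect robots.txt to name explicitly (only bots named in this post).
const EXPECTED_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "Claude-SearchBot", "PerplexityBot"];

// Collect every User-agent value declared in a robots.txt body (compared case-insensitively).
function declaredAgents(robotsTxt: string): Set<string> {
  const agents = new Set<string>();
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, "").trim(); // drop comments
    const match = /^user-agent\s*:\s*(.+)$/i.exec(line);
    if (match) agents.add(match[1].trim().toLowerCase());
  }
  return agents;
}

// Hypothetical audit helper: expected bots that robots.txt never names.
function missingBots(robotsTxt: string, expected: string[] = EXPECTED_BOTS): string[] {
  const declared = declaredAgents(robotsTxt);
  return expected.filter((bot) => !declared.has(bot.toLowerCase()));
}

// Logs the four tokens this sample never names: OAI-SearchBot, ChatGPT-User, Claude-SearchBot, PerplexityBot.
console.log(missingBots("User-agent: GPTBot\nAllow: /\n\nUser-agent: ClaudeBot\nAllow: /\n"));
```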

Change 2: Creating llms.txt from Scratch

Before we started, our llms.txt file did not exist. Now it is a 262-line Markdown document that serves as a curated guide for AI systems trying to understand what Pixelmojo does.

The file includes:

Company overview: What we build, how we work, our architecture model
Products: Vector (lead qualification) and Hive (AI co-workers) with pricing, features, and URLs
Services: Sprint packages, retainers, and custom development
Technology stack: Exact frameworks, databases, AI tools we use
Portfolio: Six projects with descriptions and links
Featured content: Our blog series organized by topic
Use policy: Explicitly stating what is allowed (citing with attribution) and what is not (model training)

The use policy section is worth highlighting. We explicitly told AI systems:

Allowed: Citing content with attribution. Including in AI-assisted answers with source links. Indexing for search with attribution.

Not allowed: Model training or fine-tuning on our content. Verbatim republishing. Commercial redistribution.

This policy is neither legally binding nor a technical control in the way robots.txt directives are honored by compliant crawlers. But it sets a clear expectation for how we want our content used.
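The llms.txt proposal (llmstxt.org) describes a plain Markdown layout: an H1 with the site name, a blockquote summary, then H2 sections containing link lists. Below is a minimal sketch that renders that layout plus a use-policy section. The function name, summary, and link notes are hypothetical placeholders, not an excerpt from our 262-line file.

```typescript
interface LinkEntry {
  title: string;
  url: string;
  note?: string;
}

interface Section {
  heading: string;
  links: LinkEntry[];
}

// Render an llms.txt body: H1 name, blockquote summary, H2 link sections, then a use-policy section.
function renderLlmsTxt(name: string, summary: string, sections: Section[], policy: string[]): string {
  const lines: string[] = [`# ${name}`, "", `> ${summary}`, ""];
  for (const section of sections) {
    lines.push(`## ${section.heading}`, "");
    for (const link of section.links) {
      const note = link.note ? `: ${link.note}` : "";
      lines.push(`- [${link.title}](${link.url})${note}`);
    }
    lines.push("");
  }
  lines.push("## Use policy", "", ...policy.map((rule) => `- ${rule}`));
  return lines.join("\n") + "\n";
}

// Placeholder content for illustration only.
const llmsTxt = renderLlmsTxt(
  "Pixelmojo",
  "AI product studio building production AI systems for B2B companies.",
  [
    {
      heading: "Products",
      links: [
        { title: "Vector", url: "https://pixelmojo.io/vector", note: "AI lead qualification" },
        { title: "Hive", url: "https://pixelmojo.io/hive", note: "AI co-workers" },
      ],
    },
  ],
  ["Allowed: citing content with attribution.", "Not allowed: model training or fine-tuning on our content."],
);
console.log(llmsTxt);
```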

We also created an ai-capabilities-factsheet.txt that directly addresses a problem we noticed: AI systems sometimes describe Pixelmojo as "primarily a marketing/branding agency," which is incomplete. The factsheet explicitly corrects this misinterpretation with structured data about our full software development, AI, and infrastructure capabilities.

Time investment: About 2 hours for llms.txt, 1 hour for the capabilities factsheet.

“The honest caveat on llms.txt: no major AI platform has officially confirmed reading these files. Over 844,000 websites have adopted it. The implementation cost is near zero. We think of it as cheap insurance with asymmetric upside.”

Pixelmojo Implementation Notes, 2026

Change 3: FAQ Schema on (Almost) Every Post

This was the highest-ROI change we made. Adding FAQPage JSON-LD schema to our blog posts took approximately 15 minutes per post, and the research from Part 3 shows pages with FAQ schema are 3.2x more likely to appear in Google AI Overviews.

We added FAQ sections to 37 of 39 blog posts (the two exceptions are very short posts that did not have natural FAQ material). Each FAQ section includes 6 to 10 questions with comprehensive answers, wrapped in both a visual BlogFAQ component for readers and FAQPage JSON-LD in the frontmatter for machines.
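The machine-readable half follows the schema.org FAQPage shape: mainEntity is a list of Question items, and each one has an acceptedAnswer of type Answer. This is a minimal sketch, assuming the FAQ entries are already available as question/answer pairs from frontmatter. The type and function names are hypothetical and do not represent the BlogFAQ component's API.

```typescript
interface FaqItem {
  question: string;
  answer: string;
}

// Build schema.org FAQPage JSON-LD from question/answer pairs.
function faqPageJsonLd(items: FaqItem[]): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: items.map((item) => ({
      "@type": "Question",
      name: item.question,
      acceptedAnswer: { "@type": "Answer", text: item.answer },
    })),
  };
  return JSON.stringify(data, null, 2);
}

// Wrap for the page head; escaping "<" stops answer text such as "</script>" from closing the tag early.
function toScriptTag(json: string): string {
  return `<script type="application/ld+json">${json.replace(/</g, "\\u003c")}</script>`;
}

const faqJson = faqPageJsonLd([
  {
    question: "What is the most impactful GEO change you made?",
    answer: "Adding FAQPage JSON-LD schema to 37 of 39 blog posts.",
  },
]);
console.log(toScriptTag(faqJson));
```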

Every post also has Article JSON-LD with author, publisher, dates, and image information. This reinforces the E-E-A-T signals that AI platforms use for trust evaluation.
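Because Article JSON-LD is built into the template, it can be generated from metadata every post already has. This sketch assumes frontmatter fields like the hypothetical PostMeta below. It uses the schema.org Article properties headline, author, publisher, datePublished, dateModified, and image. The image URL is a placeholder.

```typescript
interface PostMeta {
  title: string;
  author: string;
  publisher: string;
  datePublished: string; // ISO 8601 date
  dateModified?: string;
  image: string;
}

// Build schema.org Article JSON-LD from a post's metadata.
function articleJsonLd(meta: PostMeta): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: meta.title,
    author: { "@type": "Person", name: meta.author },
    publisher: { "@type": "Organization", name: meta.publisher },
    datePublished: meta.datePublished,
    // Fall back to the publish date so dateModified is never missing.
    dateModified: meta.dateModified ?? meta.datePublished,
    image: [meta.image],
  };
}

const article = articleJsonLd({
  title: "We Optimized for AI Search. Here's What Changed.",
  author: "Lloyd Pilapil",
  publisher: "Pixelmojo",
  datePublished: "2026-02-17",
  image: "https://pixelmojo.io/images/placeholder.png", // hypothetical path
});
console.log(JSON.stringify(article, null, 2));
```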

Schema Type	Coverage	Implementation Time	Impact
FAQPage JSON-LD	37/39 posts (95%)	~15 min per post	3.2x more likely in AI Overviews
Article JSON-LD	39/39 posts (100%)	Built into template	E-E-A-T signal for all AI platforms
Organization schema	Service pages	~30 min total	Brand entity recognition

The investment math is simple: 37 posts at 15 minutes each = approximately 9 hours of work. If even one additional AI citation per month leads to a qualified lead, the ROI is significant for a B2B service business.

Change 4: Series-Based Content Architecture

This was the most strategic change and the one we believe has the strongest compound effect.

Instead of publishing standalone blog posts on random topics, we reorganized our content into multi-part series. Each series covers a topic comprehensively across 2 to 4 interlinked posts.

Content Cluster Strategy: 13 Posts Across 4 Series

Series	Parts	Posts
AI Search Playbook	4	1. Traffic shift data; 2. SEO/GEO/AEO framework; 3. GEO tactics; 4. Our results
AI Technical Debt	4	1. Vibe coding crisis; 2. Thread-based engineering; 3. Claude Code guide; 4. Case study
AI Native Agency	3	1. Agency comparison; 2. Why B2B brands move; 3. Budget guide
Growth Marketing	2	1. Traditional vs growth; 2. AI growth definitive guide

Why series work for GEO: AI platforms treat interlinked content clusters as topical authority signals. When ChatGPT or Perplexity encounters a 4-part series on AI search optimization (this series), it infers that the publisher has deep expertise on the topic. A single standalone post on the same topic does not carry the same authority weight.

Each series follows a deliberate structure:

Part 1: Present the problem with data (hooks the reader and the AI)
Part 2: Framework or comparison (establishes authority through analysis)
Part 3: Tactical playbook (provides actionable, citable content)
Part 4: Case study or results (demonstrates real-world application)

This mirrors how AI research workflows function. When a user asks ChatGPT "how do I optimize for AI search?", the AI breaks that into sub-queries: "what is the problem?", "what are the options?", "how do I do it?", "who has done it?". A 4-part series that answers each of these sub-queries has a structural advantage over a single post trying to cover everything.

We currently have 4 series totaling 13 posts, plus 26 standalone posts covering other topics. The series posts consistently perform better for topic-relevant queries.
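Interlinking is the mechanical half of a series. Below is a minimal sketch of how each part can link to its siblings, assuming every post records its series name and part number. The slugs are hypothetical.

```typescript
interface SeriesPost {
  series: string;
  part: number;
  title: string;
  slug: string;
}

interface SeriesNav {
  previous?: SeriesPost;
  next?: SeriesPost;
  allParts: SeriesPost[];
}

// Build previous/next/all-parts links for one post from every post in the same series.
function seriesNav(post: SeriesPost, allPosts: SeriesPost[]): SeriesNav {
  const parts = allPosts
    .filter((p) => p.series === post.series)
    .sort((a, b) => a.part - b.part);
  const index = parts.findIndex((p) => p.slug === post.slug);
  return {
    previous: index > 0 ? parts[index - 1] : undefined,
    next: index >= 0 && index < parts.length - 1 ? parts[index + 1] : undefined,
    allParts: parts,
  };
}

// Hypothetical slugs for the AI Search Playbook series.
const playbook: SeriesPost[] = [
  { series: "AI Search Playbook", part: 1, title: "Traffic shift data", slug: "/blogs/ai-search-part-1" },
  { series: "AI Search Playbook", part: 2, title: "SEO/GEO/AEO framework", slug: "/blogs/ai-search-part-2" },
  { series: "AI Search Playbook", part: 3, title: "GEO tactics", slug: "/blogs/ai-search-part-3" },
  { series: "AI Search Playbook", part: 4, title: "Our results", slug: "/blogs/ai-search-part-4" },
];

// Logs the previous part's title and undefined, since part 4 has no next part.
const nav = seriesNav(playbook[3], playbook);
console.log(nav.previous?.title, nav.next?.title);
```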

Change 5: Building Custom AI Referrer Tracking

You cannot optimize what you cannot measure. We built a custom AI referrer tracking system that detects visits from specific AI platforms and categorizes them separately from generic referral traffic.

The tracker matches document.referrer against known AI domains:

Domain	Source Category	Notes
chatgpt.com	ChatGPT	Also matches chat.openai.com
perplexity.ai	Perplexity	Answer engine traffic
claude.ai	Claude	Anthropic assistant traffic
gemini.google.com	Gemini	Formerly bard.google.com
bing.com/chat	Copilot	Microsoft Copilot chat

This data feeds into our analytics alongside standard traffic sources. Without it, AI-referred visits would be categorized as generic "Referral" or (worse) "Direct" traffic in GA4, invisible in aggregate reports.
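Keeping the matching logic in a pure function makes it easy to test outside the browser. This is a minimal sketch, not our production tracker. The function names are hypothetical, the rules mirror the table above, and it also checks the utm_source=chatgpt.com parameter ChatGPT appends to links. One caveat: browsers default to a strict-origin-when-cross-origin referrer policy. A cross-origin referrer therefore usually carries only the origin, so a path rule like bing.com/chat may rarely match in practice.

```typescript
type AiSource = "ChatGPT" | "Perplexity" | "Claude" | "Gemini" | "Copilot";

interface ReferrerRule {
  host: string;
  pathPrefix?: string;
  source: AiSource;
}

// Rules mirror the table above; bard.google.com is kept for older links.
const AI_REFERRER_RULES: ReferrerRule[] = [
  { host: "chatgpt.com", source: "ChatGPT" },
  { host: "chat.openai.com", source: "ChatGPT" },
  { host: "perplexity.ai", source: "Perplexity" },
  { host: "claude.ai", source: "Claude" },
  { host: "gemini.google.com", source: "Gemini" },
  { host: "bard.google.com", source: "Gemini" },
  { host: "bing.com", pathPrefix: "/chat", source: "Copilot" },
];

function parseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch {
    return null; // malformed or empty URL
  }
}

// True for the exact host or any subdomain of it (www.perplexity.ai matches perplexity.ai).
function hostMatches(hostname: string, host: string): boolean {
  return hostname === host || hostname.endsWith(`.${host}`);
}

// Hypothetical helper: classify a visit from its referrer and landing URL; null means not AI-referred.
function classifyAiVisit(referrer: string, landingUrl: string): AiSource | null {
  // The UTM tag survives even when the referrer is stripped.
  const landing = parseUrl(landingUrl);
  if (landing?.searchParams.get("utm_source") === "chatgpt.com") return "ChatGPT";
  const ref = parseUrl(referrer);
  if (!ref) return null; // direct visit, or no usable referrer (common for app traffic)
  for (const rule of AI_REFERRER_RULES) {
    if (!hostMatches(ref.hostname, rule.host)) continue;
    if (rule.pathPrefix && !ref.pathname.startsWith(rule.pathPrefix)) continue;
    return rule.source;
  }
  return null;
}
```

In the browser you would call classifyAiVisit(document.referrer, window.location.href) once per session and send the result to analytics as a custom dimension. A null result leaves the visit to the normal channel rules.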

Key insight from measurement: True AI influence on your traffic is likely 2-3x what analytics reports, according to Seer Interactive. Mobile app visits from ChatGPT, zero-click AI interactions where the AI summarizes your content without linking, and AI Overviews that do not pass referrer data all create blind spots. What you see in analytics is the floor, not the ceiling.

The Full Technical Stack

Here is the complete picture of what our GEO infrastructure looks like after implementation.

Our GEO Technical Stack

Crawl Layer

robots.txt: 10 AI bots allowed (6 browsing + 4 curated training), data brokers and scrapers blocked
XML sitemap with accurate lastmod dates
Key pages explicitly surfaced (/blogs, /services, /projects)

Discovery Layer

llms.txt: 262 lines covering products, services, portfolio
ai-capabilities-factsheet.txt: Corrects AI misinterpretations
Structured URLs with descriptive slugs

Schema Layer

FAQPage JSON-LD on 37/39 blog posts
Article JSON-LD with author, publisher, dates
Organization schema on service pages

Measurement Layer

Custom AI referrer tracker (ChatGPT, Perplexity, Claude, Gemini, Copilot)
UTM parameter detection for chatgpt.com sources
Session-level AI attribution in analytics

Every layer serves a distinct purpose. The crawl layer controls who can access our content. The discovery layer tells AI systems where our best content lives and how to interpret our business. The schema layer provides structured data that AI platforms can extract and present. The measurement layer tells us whether it is working.

What We Did Not Do (And Why)

Transparency matters. Here is what we deliberately chose not to implement:

We did not buy AI citation monitoring tools. Tools like Otterly.AI ($29/month) and Profound (enterprise, contact sales) exist, but for a small studio, manual testing across ChatGPT, Perplexity, and Claude once per week gives us sufficient signal. We will invest in tooling when our AI referral volume justifies it.

We did not create content specifically for AI training. Some GEO guides recommend creating "training-optimized content" designed to influence how AI models represent your brand. We think this is premature. We focused on making our existing content excellent, well-structured, and well-sourced. If the content is genuinely useful for humans, it will be useful for AI citations.

We did not chase every AI platform. We focused on ChatGPT, Perplexity, and Google AI Overviews because they account for 90%+ of AI search referral traffic. Claude, Gemini, and Copilot are growing, but the citation mechanics are similar enough that optimizing for the top three covers the rest.

“The best GEO strategy is also the best content strategy: create genuinely useful, well-structured, well-sourced content that answers real questions. AI platforms cite the same content that experts would recommend. There is no shortcut and no trick.”

Pixelmojo, 2026

6 Lessons We Learned

After implementing GEO across our entire site, here are the lessons that mattered most.

6 Lessons From Our GEO Implementation

Series beat standalone posts

AI platforms treat interlinked content clusters as topical authority. 4-part series generate more citations than 4 disconnected posts.

FAQ schema is the highest-ROI change

15 minutes per post to add FAQPage JSON-LD. 3.2x more likely to appear in AI Overviews. We added it to 37 posts.

llms.txt is cheap insurance

Took 2 hours to create. No major AI platform has confirmed reading it yet. But 844,000 sites have adopted it, and the downside is zero.

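Separate search bots from training bots

Our robots.txt always allows browsing bots (GPTBot, PerplexityBot), allows only the training bots that feed AI engines we want citing us (CCBot, Google-Extended), and blocks data brokers (Diffbot, Omgili). We stay discoverable while choosing who trains on our content.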

Track AI traffic separately from day one

Our custom AI referrer tracker differentiates ChatGPT, Perplexity, Claude, and Gemini traffic. Without this, AI visits disappear into "Direct" or "Referral."

Answer-first writing is a discipline

Putting the conclusion in paragraph 1 feels unnatural. But data-backed content with statistics in the opening gets cited 67% more by AI. We restructured every post.

Lesson 1: Series Beat Standalone Posts

We cannot overstate this. Our 13 series posts consistently outperform our 26 standalone posts for topic-relevant AI queries. The inter-linking between parts creates a topical authority cluster that AI platforms recognize and reward.

If you take one structural change from this post, it should be: reorganize your content into series.

Lesson 2: FAQ Schema Is the Highest-ROI Change

At 15 minutes per post, adding FAQPage JSON-LD to your existing content is the most time-efficient GEO optimization available. The 3.2x increase in AI Overview likelihood is a documented, measured effect. Every blog post you publish without FAQ schema is leaving citations on the table.

Lesson 3: Separate Search Bots from Training Bots

Your robots.txt should distinguish between three kinds of AI bots: those that power search results, training bots that feed the AI engines you want citing you, and bots that scrape content for resale. You can keep full visibility in AI search, allow training only where it builds brand recognition, and block data brokers outright.

Lesson 4: Measure AI Traffic from Day One

We made the mistake of not implementing AI referrer tracking immediately. By the time we did, we had months of AI-referred visits categorized as generic referral or direct traffic. The data was not lost, but it was much harder to reconstruct. Set up tracking before you start optimizing.

Lesson 5: Answer-First Writing Is a Discipline

Restructuring every blog post to put the conclusion in the opening paragraph is uncomfortable. As writers, we are trained to build arguments gradually. But the data is clear: answer-first content gets cited 67% more often. Every post in this series starts with the key insight, then supports it with evidence.

Lesson 6: GEO Is Not a One-Time Project

This was our biggest misconception. We initially treated GEO as a project with a start and end date. In reality, it is an ongoing discipline. Content freshness matters for AI citations. New AI crawlers appear regularly. Platform citation mechanics evolve. Your robots.txt, llms.txt, and content need periodic review.

We now review our GEO implementation monthly. The checklist takes 30 minutes.
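Part of that monthly review can be scripted. This is a minimal sketch, assuming you can export each URL with its sitemap lastmod date. The 90-day threshold, the function name, and the example URLs are illustrative assumptions, not a documented platform rule.

```typescript
interface PageEntry {
  url: string;
  lastmod: string; // ISO 8601 date, as in the XML sitemap
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Return pages whose lastmod is more than maxAgeDays before `today`, stalest first.
function stalePages(pages: PageEntry[], today: Date, maxAgeDays = 90): PageEntry[] {
  return pages
    .filter((page) => {
      const ageDays = (today.getTime() - new Date(page.lastmod).getTime()) / MS_PER_DAY;
      return ageDays > maxAgeDays;
    })
    .sort((a, b) => new Date(a.lastmod).getTime() - new Date(b.lastmod).getTime());
}

// Hypothetical URLs; only example-a is older than 90 days on 2026-05-16.
const review = stalePages(
  [
    { url: "https://pixelmojo.io/blogs/example-a", lastmod: "2025-10-01" },
    { url: "https://pixelmojo.io/blogs/example-b", lastmod: "2026-04-20" },
  ],
  new Date("2026-05-16"),
);
console.log(review.map((page) => page.url));
```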

Free Tool

Can AI Bots Find Your Content?

Test how GPTBot, Claude, Perplexity, and 11 other bots see your website. Checks robots.txt, structured data, llms.txt, and content accessibility.

Try the AI Crawl Checker

If you want to replicate this audit process without building custom tracking, our free AI visibility tools cover bot access testing, citation tracking, Reddit monitoring, and llms.txt validation in a single workflow.

The Honest Assessment

We are not going to claim that GEO transformed our business overnight. That would be dishonest. Here is what we can say with confidence:

What we know worked: Our content is now structurally optimized for AI extraction. Every post has FAQ schema, answer-first structure, source citations, and verifiable statistics. Our robots.txt correctly manages AI crawler access. Our llms.txt gives AI systems a curated guide to our best content.

What we are still measuring: The long-term impact on AI referral volume and quality. Attribution is genuinely difficult, and as we noted throughout this series, true AI influence is 2-3x what analytics shows. We are tracking trends rather than absolute numbers.

What we would do differently: Start with measurement. We optimized content before we had proper AI traffic tracking in place, which meant we could not cleanly measure the before-and-after impact. If we were starting today, week 1 would be AI referrer tracking, week 2 would be technical infrastructure, and weeks 3-4 would be content optimization.

The case studies from Part 3 show what is possible: Go Fish Digital saw 3x lead growth in 3 months. The Rank Masters saw 8,337% ChatGPT referral growth. Seer Interactive documented 15.9% conversion rates from ChatGPT traffic versus 1.76% from Google organic.

We expect similar directional results as our implementation matures. The fundamentals are the same: make your content genuinely excellent, structurally accessible to AI, and measurable.

GEO Implementation: Questions Readers Ask

Common questions about this topic, answered.

How long does GEO implementation take?

Our implementation took approximately 4 weeks for a site with 39 blog posts and 15 service/product pages. Week 1 was technical infrastructure (robots.txt, llms.txt, AI referrer tracking). Weeks 2 and 3 were content optimization (FAQ schema on 37 posts, answer-first rewrites, series-based content). Week 4 was measurement setup and iteration. The robots.txt and llms.txt changes took less than a day each. FAQ schema was the biggest time investment at roughly 15 minutes per post.

What is the most impactful GEO change you made?

Adding FAQPage JSON-LD schema to your blog posts. It takes approximately 15 minutes per post and research shows pages with FAQ schema are 3.2x more likely to appear in Google AI Overviews. The second most impactful change is restructuring content into multi-part series, which creates topical authority clusters that AI platforms treat as comprehensive resources on a topic.

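Should you block AI training bots but allow search bots?

Our approach evolved. Initially we blocked all training bots to protect IP. As of 2026-05-16, we selectively allow training bots that feed AI engines we want citing us (CCBot, Google-Extended, anthropic-ai, Applebot-Extended) while still blocking data brokers and adversarial scrapers (cohere-ai, Meta-ExternalAgent, Bytespider, Diffbot, Omgili). Browsing bots (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot) are always allowed. For a brand still earning ground in AI answers, brand recognition in future model generations matters more than blanket IP protection. Each AI platform offers independently controllable bots, so you can make this distinction cleanly.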

Is llms.txt worth implementing?

Our honest assessment: the implementation cost is near zero (we spent about 2 hours), and over 844,000 websites have adopted it. No major AI platform has officially confirmed reading these files. But the downside risk is literally zero, and if platforms start using it, early adopters will have an advantage. We think of it as cheap insurance with asymmetric upside.

How do you track AI referral traffic?

We built a custom tracker that detects visits from ChatGPT (chatgpt.com, chat.openai.com), Perplexity (perplexity.ai), Claude (claude.ai), Gemini (gemini.google.com), and Copilot (bing.com/chat) by matching document.referrer. We also set up a GA4 custom channel group with regex matching. Since June 2025, ChatGPT appends utm_source=chatgpt.com to links. Key caveat: true AI influence is 2-3x what analytics shows due to mobile app visits and zero-click interactions.

What content structure works best for AI citations?

Multi-part series with answer-first paragraphs, verifiable statistics, and FAQ sections. Our 4-part series perform better than standalone posts because AI platforms treat interlinked content clusters as topical authority. Each post should be 3,000 to 5,000 words with proper H2/H3 hierarchy, comparison tables, source citations, and expert quotations. The Princeton GEO study showed these elements each improve visibility by 30-40%.

What mistakes did you make during GEO implementation?

Three main mistakes. First, we initially did not include OAI-SearchBot and Claude-SearchBot in our robots.txt, only adding the general-purpose bots. The search-specific bots matter most for citations. Second, we did not implement AI traffic tracking before starting optimization, so we could not cleanly measure the before-and-after impact. Third, we initially treated GEO as a one-time project rather than an ongoing discipline.

How many blog posts do you need for GEO to work?

Quality matters more than quantity. The Rank Masters saw 8,337% ChatGPT referral growth with 42 pages over 3 months. Our site has 39 posts, but the 13 organized into series perform disproportionately better. Start with one well-structured series of 3-4 posts covering a topic comprehensively. Add FAQ schema, answer-first structure, and source citations. Then expand from there.

The Complete AI Search Playbook

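This concludes our 4-part series. Here is the complete roadmap:

Part 1: The data behind the traffic shift to AI search
Part 2: The SEO, GEO, and AEO framework
Part 3: The GEO tactical playbook
Part 4: Our implementation results (this post)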

The shift from traditional SEO to AI search is not coming. It is here. The businesses that implement GEO now will build the topical authority and brand signals that compound over time. The fundamentals are simple: make your content genuinely useful, structurally accessible to AI, and measurable.

Ready to implement GEO for your business?

AI Lead Qualification

See how AI traffic converts with the right system

Get a GEO audit for your business

About the Author

Lloyd Pilapil

Founder & AI Product Architect at Pixelmojo

Lloyd Pilapil is the founder of Pixelmojo and a former Salesforce engineer who builds production AI systems for B2B companies. He writes about agentic AI, multi-agent orchestration, AX (Agentic Experience) design, GEO, and Thread-Based Engineering. His work focuses on shipping AI products that generate revenue, not prototypes.

Expertise

Agentic AI Systems, Multi-Agent Orchestration, AX Design, GEO & AI Search, Thread-Based Engineering, AI Product Development, Growth Marketing, UI/UX Design

We Took Our Own Advice. Here Is Everything That Happened.

Throughout this series, we presented the data (Part 1), the framework (Part 2), and the tactical playbook (Part 3). Now we are showing our work.

TL;DR

We implemented GEO across 39 blog posts and 15 service pages over 4 weeks
Our robots.txt now explicitly manages 18 AI crawler bots (10 allowed: 6 browsing + 4 curated training, 8 blocked: data brokers + adversarial scrapers)
FAQPage JSON-LD schema was added to 37 of 39 posts, the highest-ROI change at ~15 min per post
We created a 262-line llms.txt covering products, services, portfolio, and use policy
A custom AI referrer tracker detects ChatGPT, Perplexity, Claude, and Gemini traffic
Multi-part series (4 series, 13 posts) outperform standalone posts for AI citation potential

The Audit: Where We Started vs Where We Ended

GEO Implementation Audit: Before vs After

Crawl Access (robots.txt)

Before

Basic robots.txt, no AI-specific rules

After

6 AI bots explicitly allowed, training bots blocked, key pages specified

llms.txt

Before

Did not exist

After

262-line Markdown file with products, services, portfolio, URLs, use policy

AI Capabilities Factsheet

Before

Did not exist

After

Explicit corrections for common AI misinterpretations of our services

FAQ Schema (JSON-LD)

Before

0 posts with FAQPage schema

After

37 of 39 posts with BlogFAQ components and FAQPage JSON-LD

Article Schema

Before

Basic metadata only

After

Full Article JSON-LD on every post (author, publisher, dates, images)

AI Referrer Tracking

Before

No AI traffic tracking

After

Custom tracker detecting ChatGPT, Perplexity, Claude, Gemini, Copilot referrals

Content Structure

Before

Mixed quality, no consistent format

After

Answer-first, TLDR boxes, comparison tables, 3,000-5,000 words per post

Series-Based Content

Before

Standalone posts

After

4 multi-part series (4+4+3+2 = 13 interlinked posts)

Change 1: Rebuilding robots.txt for the AI Crawler Ecosystem

Our original robots.txt was basic. It allowed Googlebot, disallowed admin pages, and that was about it. We had no AI-specific rules.

Here is the current logic.

Browsing bots — always allowed (these power AI search results):

GPTBot and ChatGPT-User (OpenAI search)
ClaudeBot and Claude-Web (Anthropic)
PerplexityBot (Perplexity search)
GoogleOther (Google AI features)

Training bots we now allow (these feed AI engines we want citing us):

CCBot (Common Crawl, used by most LLMs as training input)
Google-Extended (Gemini training)
anthropic-ai (Anthropic training)
Applebot-Extended (Apple Intelligence training)

Bots we still block (data brokers and adversarial scrapers):

cohere-ai, Meta-ExternalAgent, FacebookBot, Bytespider, Diffbot, Omgili

Change 2: Creating llms.txt from Scratch

Before we started, our llms.txt file did not exist. Now it is a 262-line Markdown document that serves as a curated guide for AI systems trying to understand what Pixelmojo does.

The file includes:

Company overview: What we build, how we work, our architecture model
Products: Vector (lead qualification) and Hive (AI co-workers) with pricing, features, and URLs
Services: Sprint packages, retainers, and custom development
Technology stack: Exact frameworks, databases, AI tools we use
Portfolio: Six projects with descriptions and links
Featured content: Our blog series organized by topic
Use policy: Explicitly stating what is allowed (citing with attribution) and what is not (model training)

The use policy section is worth highlighting. We explicitly told AI systems:

Allowed: Citing content with attribution. Including in AI-assisted answers with source links. Indexing for search with attribution.

Not allowed: Model training or fine-tuning on our content. Verbatim republishing. Commercial redistribution.

This is not legally binding in the way robots.txt is technically respected. But it sets a clear expectation for how we want our content used.

Time investment: About 2 hours for llms.txt, 1 hour for the capabilities factsheet.

Pixelmojo Implementation Notes, 2026

Change 3: FAQ Schema on (Almost) Every Post

Every post also has Article JSON-LD with author, publisher, dates, and image information. This reinforces the E-E-A-T signals that AI platforms use for trust evaluation.

Schema Type	Coverage	Implementation Time	Impact
FAQPage JSON-LD	37/39 posts (95%)	~15 min per post	3.2x more likely in AI Overviews
Article JSON-LD	39/39 posts (100%)	Built into template	E-E-A-T signal for all AI platforms
Organization schema	Service pages	~30 min total	Brand entity recognition

Change 4: Series-Based Content Architecture

This was the most strategic change and the one we believe has the strongest compound effect.

Instead of publishing standalone blog posts on random topics, we reorganized our content into multi-part series. Each series covers a topic comprehensively across 2 to 4 interlinked posts.

Content Cluster Strategy: 13 Posts Across 4 Series

AI Search Playbook (4 parts): 1. Traffic shift data, 2. SEO/GEO/AEO framework, 3. GEO tactics, 4. Our results
AI Technical Debt (4 parts): 1. Vibe coding crisis, 2. Thread-based engineering, 3. Claude Code guide, 4. Case study
AI Native Agency (3 parts): 1. Agency comparison, 2. Why B2B brands move, 3. Budget guide
Growth Marketing (2 parts): 1. Traditional vs growth, 2. AI growth definitive guide

Why series work for GEO: we believe AI platforms treat interlinked series as topical authority clusters. Each post reinforces the others, creating compound citation potential.

Each series follows a deliberate structure:

Part 1: Present the problem with data (hooks the reader and the AI)
Part 2: Framework or comparison (establishes authority through analysis)
Part 3: Tactical playbook (provides actionable, citable content)
Part 4: Case study or results (demonstrates real-world application)

We currently have 4 series totaling 13 posts, plus 26 standalone posts covering other topics. The series posts consistently perform better for topic-relevant queries.
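The interlinking can be generated rather than maintained by hand. Here is a minimal sketch, assuming each post records its series part number and slug. The seriesNav helper and the slugs are hypothetical. The part titles are the AI Search Playbook entries listed above.

```typescript
// Minimal sketch: compute previous/next and sibling links for a post in a series.
// seriesNav and the slugs below are hypothetical.

interface SeriesPart {
  part: number;
  title: string;
  slug: string;
}

interface SeriesNav {
  previous: SeriesPart | null;
  next: SeriesPart | null;
  siblings: SeriesPart[]; // every other part, in reading order
}

function seriesNav(parts: SeriesPart[], currentSlug: string): SeriesNav {
  const ordered = [...parts].sort((a, b) => a.part - b.part);
  const index = ordered.findIndex((p) => p.slug === currentSlug);
  if (index === -1) throw new Error(`Slug not in series: ${currentSlug}`);
  return {
    previous: index > 0 ? ordered[index - 1] : null,
    next: index < ordered.length - 1 ? ordered[index + 1] : null,
    siblings: ordered.filter((p) => p.slug !== currentSlug),
  };
}

const playbook: SeriesPart[] = [
  { part: 1, title: "Traffic shift data", slug: "ai-search-part-1" },
  { part: 2, title: "SEO/GEO/AEO framework", slug: "ai-search-part-2" },
  { part: 3, title: "GEO tactics", slug: "ai-search-part-3" },
  { part: 4, title: "Our results", slug: "ai-search-part-4" },
];

const nav = seriesNav(playbook, "ai-search-part-4");
console.log(nav.previous?.title, nav.siblings.length);
```

Linking every part to all of its siblings, not only to its neighbors, is what turns a sequence of posts into a cluster.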

Change 5: Building Custom AI Referrer Tracking

The tracker matches document.referrer against known AI domains:

Domain	Source Category	Notes
chatgpt.com	ChatGPT	Also matches chat.openai.com
perplexity.ai	Perplexity	Answer engine traffic
claude.ai	Claude	Anthropic assistant traffic
gemini.google.com	Gemini	Formerly bard.google.com
bing.com/chat	Copilot	Microsoft Copilot chat
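A minimal sketch of that matching logic, not a copy of our tracker. It rests on four assumptions: subdomains count (www.perplexity.ai matches perplexity.ai), bard.google.com is still mapped to Gemini, Copilot is recognized only by the /chat path on bing.com, and a landing URL carrying utm_source=chatgpt.com counts as ChatGPT even when the referrer is empty. The classifyAiVisit name is hypothetical.

```typescript
// Minimal sketch of an AI referrer classifier (hypothetical; not our production tracker).
// Assumptions: subdomains match, bard.google.com maps to Gemini,
// Copilot = bing.com with a /chat path, utm_source=chatgpt.com as a fallback.

type AiSource = "ChatGPT" | "Perplexity" | "Claude" | "Gemini" | "Copilot";

interface HostRule {
  host: string;
  source: AiSource;
  pathPrefix?: string; // only match when the referrer path starts with this
}

const HOST_RULES: HostRule[] = [
  { host: "chatgpt.com", source: "ChatGPT" },
  { host: "chat.openai.com", source: "ChatGPT" },
  { host: "perplexity.ai", source: "Perplexity" },
  { host: "claude.ai", source: "Claude" },
  { host: "gemini.google.com", source: "Gemini" },
  { host: "bard.google.com", source: "Gemini" },
  { host: "bing.com", source: "Copilot", pathPrefix: "/chat" },
];

// Extract a lowercase host and a path from an absolute URL without DOM or Node URL types.
function splitUrl(url: string): { host: string; path: string } | null {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^/?#:]+)(?::\d+)?([^?#]*)/i.exec(url);
  return match ? { host: match[1].toLowerCase(), path: match[2] || "/" } : null;
}

// Exact host or any subdomain of it, so "notchatgpt.com" does not match "chatgpt.com".
function hostMatches(host: string, ruleHost: string): boolean {
  return host === ruleHost || host.endsWith("." + ruleHost);
}

function classifyAiVisit(referrer: string, pageUrl = ""): AiSource | null {
  const ref = splitUrl(referrer);
  if (ref) {
    for (const rule of HOST_RULES) {
      const pathOk = !rule.pathPrefix || ref.path.startsWith(rule.pathPrefix);
      if (hostMatches(ref.host, rule.host) && pathOk) return rule.source;
    }
  }
  // Fallback for visits where the referrer was stripped but ChatGPT's UTM tag survived.
  if (/[?&]utm_source=chatgpt\.com(?:[&#]|$)/i.test(pageUrl)) return "ChatGPT";
  return null;
}

console.log(classifyAiVisit("https://www.perplexity.ai/search?q=geo"));
```

In a browser the call would be classifyAiVisit(document.referrer, window.location.href). Because the function takes plain strings, it can be unit-tested outside the browser.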

The Full Technical Stack

Here is the complete picture of what our GEO infrastructure looks like after implementation.

Our GEO Technical Stack

Crawl Layer

robots.txt: 6 browsing bots and 4 curated training bots allowed; data brokers and adversarial scrapers blocked
XML sitemap with accurate lastmod dates
Key pages explicitly surfaced (/blogs, /services, /projects)
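"Accurate lastmod" means the date reflects when a page's content last changed, not when the site was last built. Here is a minimal sketch under that assumption. The Page shape, the buildSitemap helper, and the example.com URLs are hypothetical. The urlset namespace is the standard sitemaps.org one.

```typescript
// Minimal sketch of sitemap generation with per-page lastmod (hypothetical helper and URLs).
// lastmod should come from when the content last changed, not from the build time.

interface Page {
  loc: string;
  modified: Date;
}

// Escape the five XML special characters inside <loc>.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemap(pages: Page[]): string {
  const entries = pages.map(
    (page) =>
      `  <url>\n    <loc>${escapeXml(page.loc)}</loc>\n` +
      // W3C date format (YYYY-MM-DD), derived in UTC.
      `    <lastmod>${page.modified.toISOString().slice(0, 10)}</lastmod>\n  </url>`,
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}

const sitemap = buildSitemap([
  { loc: "https://example.com/blogs", modified: new Date(Date.UTC(2026, 0, 15)) },
  { loc: "https://example.com/services", modified: new Date(Date.UTC(2026, 1, 2)) },
  { loc: "https://example.com/projects", modified: new Date(Date.UTC(2026, 1, 20)) },
]);
console.log(sitemap);
```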

Discovery Layer

llms.txt: 262 lines covering products, services, portfolio
ai-capabilities-factsheet.txt: Corrects AI misinterpretations
Structured URLs with descriptive slugs

Schema Layer

FAQPage JSON-LD on 37/39 blog posts
Article JSON-LD with author, publisher, dates
Organization schema on service pages

Measurement Layer

Custom AI referrer tracker (ChatGPT, Perplexity, Claude, Gemini, Copilot)
UTM parameter detection for chatgpt.com sources
Session-level AI attribution in analytics
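A referrer only identifies the landing page view. Later pages in the same visit have internal referrers. Here is a minimal sketch of session-level attribution, assuming a sessionStorage-like store and that `detected` is whatever the referrer check returned for the current page view. The ai_source key, attributeSession, and MemoryStore are hypothetical names.

```typescript
// Minimal sketch of session-level AI attribution (all names hypothetical).
// The first AI source seen in a session sticks, so later page views with
// internal referrers are still credited to it.

interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const SESSION_KEY = "ai_source"; // hypothetical storage key

function attributeSession(store: KeyValueStore, detected: string | null): string | null {
  const existing = store.getItem(SESSION_KEY);
  if (existing !== null) return existing; // first touch wins for the rest of the session
  if (detected !== null) store.setItem(SESSION_KEY, detected);
  return detected;
}

// In-memory stand-in for window.sessionStorage so the logic runs outside a browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

const store = new MemoryStore();
const landing = attributeSession(store, "Perplexity"); // arrived from perplexity.ai
const nextPage = attributeSession(store, null); // internal navigation, no AI referrer
console.log(landing, nextPage);
```

In a browser, window.sessionStorage satisfies the KeyValueStore interface, and the returned value can be attached to analytics events as a custom dimension. First-touch is a design choice: a visitor who arrives from Perplexity and later returns from ChatGPT in the same session stays credited to Perplexity.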

What We Did Not Do (And Why)

Transparency matters. Here is what we deliberately chose not to implement:

Pixelmojo, 2026

6 Lessons We Learned

After implementing GEO across our entire site, here are the lessons that mattered most.

6 Lessons From Our GEO Implementation

Series beat standalone posts

AI platforms treat interlinked content clusters as topical authority. 4-part series generate more citations than 4 disconnected posts.

FAQ schema is the highest-ROI change

15 minutes per post to add FAQPage JSON-LD. 3.2x more likely to appear in AI Overviews. We added it to 37 posts.

llms.txt is cheap insurance

Took 2 hours to create. No AI platform has confirmed reading it yet. But 844,000 sites have adopted it, and the downside is zero.

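Separate search bots from training bots

Our robots.txt always allows search and browsing bots (OAI-SearchBot, ChatGPT-User, PerplexityBot). We first blocked every training bot, but now allow the ones feeding AI engines we want citing us (CCBot, Google-Extended) while blocking data brokers and adversarial scrapers. We stay discoverable and still decide who trains on our content.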

Track AI traffic separately from day one

Our custom AI referrer tracker differentiates ChatGPT, Perplexity, Claude, Gemini, and Copilot traffic. Without it, AI visits disappear into "Direct" or "Referral."

Answer-first writing is a discipline

Putting the conclusion in paragraph 1 feels unnatural. But data-backed content with statistics in the opening gets cited 67% more by AI. We restructured every post.

Lesson 1: Series Beat Standalone Posts

If you take one structural change from this post, it should be: reorganize your content into series.

Lesson 2: FAQ Schema Is the Highest-ROI Change

Lesson 3: Separate Search Bots from Training Bots

Lesson 4: Measure AI Traffic from Day One

Lesson 5: Answer-First Writing Is a Discipline

Lesson 6: GEO Is Not a One-Time Project

We now review our GEO implementation monthly. The checklist takes 30 minutes.

Free Tool

Can AI Bots Find Your Content?

Test how GPTBot, Claude, Perplexity, and 11 other bots see your website. Checks robots.txt, structured data, llms.txt, and content accessibility.

Try the AI Crawl Checker

The Honest Assessment

We are not going to claim that GEO transformed our business overnight. That would be dishonest. Here is what we can say with confidence:

We expect similar directional results as our implementation matures. The fundamentals are the same: make your content genuinely excellent, structurally accessible to AI, and measurable.

GEO Implementation: Questions Readers Ask

Common questions about this topic, answered.

The Complete AI Search Playbook

This concludes our 4-part series. Here is the complete roadmap:

Ready to implement GEO for your business?

AI Lead Qualification

See how AI traffic converts with the right system

Get a GEO audit for your business

About the Author

Lloyd Pilapil

Founder & AI Product Architect at Pixelmojo

Expertise

Agentic AI Systems, Multi-Agent Orchestration, AX Design, GEO & AI Search, Thread-Based Engineering, AI Product Development, Growth Marketing, UI/UX Design

