
What does it actually mean to measure AI visibility?
AI visibility is what ChatGPT, Perplexity, Claude, and Gemini actually say about your brand when a real user asks them a real question. Not what your site looks like to a crawler. Not how many backlinks you have. What the models output. That is the only thing buyers see. That is the only thing worth measuring.
Most tools sold as "AI SEO" or "GEO platforms" do not measure this. They crawl your site, score your schema markup, check whether your llms.txt file exists, and produce a number that looks like a measurement. It is not a measurement. It is a hypothesis about what AI models might do based on inputs they might consider. Whether AI models actually know who you are remains untested.
This piece argues for a hard line. If your AI visibility tool has not run a live query against an LLM in the last 24 hours, your visibility score is fiction. The score might be a useful fiction. It might correlate with real visibility over time. But it is not data. It is inference from proxies. And the proxies have stopped being reliable as AI search has matured.
Why traditional SEO tools fail at this question
Traditional SEO tools were built for a static index. Google crawled the web, ranked pages, and served results. The tools that helped you rank in that system measured things you could control on your site: keyword density, internal linking, backlink quality, page speed, schema markup. The tools were good at this because Google was a deterministic system. The same query produced the same ranked list, and the inputs that influenced ranking were observable.
AI search is not deterministic and the inputs are not observable. ChatGPT does not publish its citation algorithm. Perplexity does not let you inspect its source weighting. Claude trains on a different corpus than Gemini. The output of each model is the result of a billion-parameter inference process running on a context window that includes retrieval-augmented sources, training data, and conversational history. None of that is visible from your domain.
The tools that have rebranded as "AI SEO" took their existing crawl-and-score architecture and added a few new fields: schema completeness, llms.txt detection, structured data validation. These are useful inputs to AI visibility. They are not measurements of it. The category leader that launched its AI SEO product in 2024 still cannot tell you whether ChatGPT cites you for the prompt your customers are typing. Its tool was never designed to ask that question.
This is the core mismatch. The tools sold as solutions to AI search were built to solve a different problem. They are repackaged crawlers: a marketing layer that says "GEO," "AEO," and "answer engine optimization" sits on top of an architecture with no API connection to any LLM. The architecture cannot do what the marketing promises.
Static versus live: the two types of AI visibility data
Every dimension of AI visibility falls into one of two categories. Static analysis runs against your site. Live analysis runs against the actual AI models. Static answers "how AI-ready is this domain?" Live answers "what does AI actually say about this brand?" These are different questions with different answers.
| Dimension | Static analysis | Live LLM queries |
|---|---|---|
| What it measures | Inputs on your domain | Outputs from AI models |
| Where it runs | Your site, robots.txt, llms.txt, schema | ChatGPT, Perplexity, Claude, Gemini APIs |
| Cost per audit | Cents in compute | Dollars in LLM tokens |
| What you control | All of it (it is your site) | None of it (it is the model output) |
| Update frequency needed | On site changes | Monthly minimum, weekly for competitive niches |
| Detects hallucinations | No | Yes |
| Detects competitor citations | No | Yes |
| Detects share of voice | No | Yes |
| Output type | Hypothesis | Measurement |
A complete AI visibility platform needs both. Static analysis tells you whether you have done the work to be ingestible. Live querying tells you whether the work paid off. A tool that only does one half is selling you half the picture and pricing it like the whole thing.
What live LLM queries reveal that static analysis misses
There are five questions that no amount of crawling your site can answer. Each requires running prompts against the actual models, recording what they say, and analyzing the output. These are the five live dimensions in the Radar methodology. They map directly to business questions that paid AI SEO tools have been asked for two years and have not answered.
The first is citation tracking. When a user asks ChatGPT "what are the best B2B email tools," is your brand in the response? You cannot find this out from your site. You have to ask the model. Across categories and across models, with multiple prompt variations to control for noise. Then you record the citation rate. That is data.
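To make that concrete, here is a minimal sketch of a citation-rate check against a single model, assuming the OpenAI Python SDK and an API key in the environment. The brand name, prompts, and model name are illustrative placeholders, not part of any audit methodology.

```python
# Minimal sketch: how often does one model mention a brand across prompt
# variations? Assumes the OpenAI Python SDK (pip install openai) and an
# OPENAI_API_KEY in the environment. Brand, prompts, and model are placeholders.
from openai import OpenAI

client = OpenAI()

BRAND = "Acme Outreach"  # hypothetical brand
PROMPTS = [
    "What are the best B2B email tools?",
    "Which B2B email platforms do you recommend for a small sales team?",
    "List the top cold-email tools for B2B outreach.",
]

def citation_rate(brand: str, prompts: list[str], model: str = "gpt-4o") -> float:
    """Fraction of prompts whose response mentions the brand at all."""
    hits = 0
    for prompt in prompts:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        answer = response.choices[0].message.content or ""
        if brand.lower() in answer.lower():
            hits += 1
    return hits / len(prompts)

print(f"Citation rate for {BRAND}: {citation_rate(BRAND, PROMPTS):.0%}")
```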
The second is citation testing on specific pages. You wrote a definitive guide to your topic. Does any AI model cite that specific page when asked the target question? Most of the time the answer is no, and the model cites a competitor. Without live testing, you would never know which page the model picked instead of yours.
The third is source influence. Which domains shape AI narratives in your category? When the model talks about "the leading vendors in X," which sources is it pulling from? This is not random. There are usually three or four domains that dominate AI citations for any given category. If those domains are competitor-aligned content farms, that is a problem you cannot diagnose from your own site.
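The aggregation itself is simple once the citations are collected. A small sketch, assuming you have already pulled citation URLs out of model responses (how you extract them differs per provider); the URLs below are placeholders.

```python
# Sketch of the aggregation step for source influence: given citation URLs
# already collected from model responses, count which domains dominate.
from collections import Counter
from urllib.parse import urlparse

collected_citations = [
    "https://example-review-site.com/best-b2b-email-tools",
    "https://example-review-site.com/email-outreach-roundup",
    "https://competitor-blog.example/why-we-are-best",
    "https://news.example.org/martech-trends-2026",
]

domain_counts = Counter(urlparse(url).netloc for url in collected_citations)

for domain, count in domain_counts.most_common():
    print(f"{domain}: {count} citations")
```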
The fourth is prompt SOV (share of voice). Across a representative set of prompts your customers actually type, what percentage mention your brand? This is the AI search equivalent of share of voice in traditional media. It is the cleanest single metric for tracking AI visibility over time, and it is computable only by running the prompts and counting the mentions.
The fifth is hallucination detection. AI models confidently state things that are wrong. About your brand, your product, your pricing, your features. If you do not query the models periodically and check what they are claiming, you will only learn about hallucinations when a customer flags one in a meeting. By then the damage is done. Live monitoring is the only defense.
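As a rough sketch of the checking step, assume you already have model answers collected and a fact sheet of things that must be true about your brand; anything that fails a simple string check gets flagged for human review. All names and values below are illustrative, and string matching only catches the obvious cases.

```python
# Rough sketch of the checking step in hallucination monitoring: compare
# model answers (collected however you run your queries) against a
# known-good fact sheet and flag anything that omits or contradicts it.
fact_sheet = {
    "pricing": ["$49/mo", "$99/mo"],   # strings a correct answer should contain
    "free_tier": ["14-day trial"],
}

model_answers = {
    "pricing": "Acme Outreach starts at $29 per month.",      # placeholder response
    "free_tier": "Acme Outreach offers a 14-day trial.",
}

flagged = [
    topic
    for topic, expected in fact_sheet.items()
    if not any(s.lower() in model_answers.get(topic, "").lower() for s in expected)
]

print("Topics to review manually:", flagged)  # -> ['pricing']
```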
What 77 live audits actually showed
Pixelmojo has run 77 live audits across six core industries since January 2026, with new audits added every week. Every audit includes the full 12-dimension methodology, with five of those dimensions executed as live queries against ChatGPT, Perplexity, Claude, and Gemini. The first 63 audits were published as the State of AI Visibility 2026 benchmark report, and the live dashboard shows current numbers as more audits land.
The headline finding is that the average score is 45/100. Half the domains audited would receive a failing grade in any traditional academic system. This is not because these are bad sites. The audited set included recognizable category leaders and well-funded brands. The score reflects how immature AI search optimization is as a discipline. The infrastructure is new, the playbooks are still being written, and most teams have not adapted.
Only 2 domains have scored an A grade. Across 77 audits, just 2.6% of businesses have crossed the 90/100 threshold, and the highest scorer hit 92/100. That scarcity is unusual in benchmarking studies: most categories produce a long tail with a handful of standouts, but AI visibility produces a flat distribution clustered in the C and D range, with A grades remaining the exception. The implication is that any business that gets to a B grade today is in the top decile of its industry.
The industry-to-industry gap is 16 points. The leading industry in our dataset averaged 53/100. The trailing industry averaged 37/100. This means category-level competitive advantage is achievable through deliberate work because no industry has saturated. Even the leaders are leaving 47 points on the table. The competitive frontier is not "be best in class." It is "do the work that nobody is doing yet."
Why this matters for your business in 2026
The shift from search engine to answer engine is not theoretical. It is happening in your customers' workflows right now. A B2B buyer evaluating vendors no longer types a query into Google and clicks through ten results. They ask ChatGPT, Perplexity, or Claude. They paste the AI response into Slack. The decision to add a vendor to the shortlist often happens before any human visits any website.
If you are not in the AI response, you are not on the shortlist. The funnel collapsed. The old funnel had four discrete stages: visibility, click, evaluation, decision. The AI-mediated funnel has three: prompt, response, decision. Visibility and click happen at the same instant inside the model output. Evaluation has been preempted by whatever the model said.
The teams that are catching up to this reality are running monthly visibility audits. They are tracking citation rates across the four major models. They are watching for hallucinations and correcting them through schema, llms.txt, and direct content updates. They are measuring prompt SOV as a leading indicator and adjusting content strategy on a six-week cycle.
The teams that are falling behind are still running their existing SEO tools, looking at their existing dashboards, and assuming the proxies still work. They are also running paid ads against keywords that fewer buyers type every quarter. The data they need to course-correct is sitting one live LLM query away. Most of them have not run that query yet.
What "good" looks like: the 12 dimensions of real AI visibility
A real AI visibility audit covers twelve dimensions, weighted by how much each contributes to whether AI models will cite your brand. The full breakdown lives on the Radar methodology page. The structure is summarized below.
The seven static dimensions cover what your site offers to AI ingestion: AI Crawl Check (10% weight), Robots.txt Analysis (8%), llms.txt Validation (8%), AI Readiness Score (10%), AEO Page Auditor (8%), Schema Audit (8%), and Reddit Monitor (5%). These tell you whether your domain is technically ready to be cited. They are necessary, not sufficient.
The five live dimensions cover what AI models actually do with that ingestion: Citation Tracker (10%), Citation Tester (8%), Source Influence (8%), Prompt SOV (8%), and Hallucination Check (9%). These tell you whether the readiness work translated into real visibility in real model outputs. They are sufficient on their own as a measurement of current AI presence, but expensive to run because every dimension requires real LLM API calls.
A score above 70/100 across this 12-dimension methodology puts a brand in the top 10% of the 77 audits we have run. To get there, a business needs to do well on both the static and live halves. Static-only optimization caps at 57/100 because the live dimensions carry the other 43% of the total weight. You can be perfectly schema-compliant and still score below average if no model is citing you.
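The arithmetic behind that ceiling is straightforward. A small sketch using the dimension weights listed above; the per-dimension scores you feed in are whatever your audit produces on a 0-100 scale.

```python
# Sketch of the unified score arithmetic using the dimension weights above.
# A perfect static half with zero live visibility caps at 57/100, which is
# why the live dimensions set the ceiling. Per-dimension scores are 0-100.
STATIC_WEIGHTS = {
    "ai_crawl_check": 0.10, "robots_txt": 0.08, "llms_txt": 0.08,
    "ai_readiness": 0.10, "aeo_page_auditor": 0.08, "schema_audit": 0.08,
    "reddit_monitor": 0.05,
}
LIVE_WEIGHTS = {
    "citation_tracker": 0.10, "citation_tester": 0.08, "source_influence": 0.08,
    "prompt_sov": 0.08, "hallucination_check": 0.09,
}

def unified_score(dimension_scores: dict[str, float]) -> float:
    weights = {**STATIC_WEIGHTS, **LIVE_WEIGHTS}
    return sum(weights[d] * dimension_scores.get(d, 0.0) for d in weights)

# Perfect static, zero live:
static_only = unified_score({d: 100.0 for d in STATIC_WEIGHTS})
print(round(static_only))  # 57
```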
The tools you need versus the tools being sold
The market is full of tools that look like AI visibility platforms and behave like SEO crawlers with new packaging. They will tell you your llms.txt is valid. They will score your schema markup. They will produce dashboards with green checkmarks and a number that looks like a measurement. None of them will tell you whether ChatGPT cites your brand for the prompt your customer typed yesterday.
The tools you actually need do two things together. They run static analysis against your domain to verify that AI ingestion is mechanically possible. Then they run live queries against the four major LLMs and record what those models actually say. Both halves are measured against transparent weighting math, and the methodology is published so you can verify the score is not inflated.
This is what Radar does. It is also what most other vendors in this category do not do, because doing it requires real LLM API budget per audit and an architecture that calls out to four model providers in parallel. Static analysis is cheap. Live querying is expensive. The economics of running a freemium tier on a real live-query platform are different from the economics of a freemium static-analysis tool. The price difference reflects the cost of producing real data instead of inferred data.
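As an architectural sketch, the fan-out is the easy part: one prompt goes to all four providers concurrently and the responses come back keyed by model. The provider function below is a stub you would replace with real SDK calls for each vendor.

```python
# Architectural sketch of fanning one prompt out to several model providers
# in parallel. Each provider has its own SDK and auth, so the query function
# is a placeholder kept as a stub so the fan-out logic is runnable.
from concurrent.futures import ThreadPoolExecutor

def placeholder_query(provider: str, prompt: str) -> str:
    # Replace with the real SDK call for each provider
    # (OpenAI, Perplexity, Anthropic, Google).
    return f"[{provider} response to: {prompt}]"

PROVIDERS = ["chatgpt", "perplexity", "claude", "gemini"]

def query_all(prompt: str) -> dict[str, str]:
    with ThreadPoolExecutor(max_workers=len(PROVIDERS)) as pool:
        futures = {p: pool.submit(placeholder_query, p, prompt) for p in PROVIDERS}
        return {p: f.result() for p, f in futures.items()}

print(query_all("What are the best B2B email tools?"))
```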
If you are evaluating an AI visibility tool, ask one question. "Does it run live queries against ChatGPT, Perplexity, Claude, and Gemini, or does it infer from my site?" Vendors that do the first thing will say so plainly. Vendors that do the second will pivot to talking about coverage, machine learning, or proprietary scoring algorithms. The pivot is the answer.
Run a real audit, see the real number
If your AI visibility tool has never queried an LLM, you have never seen your real score. The number on the dashboard is a hypothesis. The 77-audit benchmark shows what real numbers look like, and they are lower than the dashboards imply.
The methodology page documents how the 12 dimensions are weighted, which five require live queries, and how the unified score is calculated. The benchmark report shows the distribution across 77 real audits, with the live dashboard tracking current numbers. Both are open for inspection.
Ready to see your real AI visibility score?
- Read the Radar methodology - See the full 12-dimension breakdown, weights, and which dimensions require live queries.
- Read the benchmark report - 77 audits, six core industries, the distribution that vendors will not publish.
- Run your audit - One credit. All 12 dimensions. The score the proxies cannot give you.
