Why a 70%-Accurate AI UX Tool Is Actually a Terrible Idea

On paper, 70% accuracy sounds… fine.
Better than a coin toss.
Good enough to “get started,” right?
In UX, it’s not just not good enough — it’s actively dangerous.
This uncomfortable truth came up in a recent Nielsen Norman Group UX Podcast conversation featuring Baymard Institute co-founders Christian Holst and Jamie Appleseed, alongside Kate Moran from Nielsen Norman Group.
What they unpacked is something many teams are quietly struggling with right now:
AI tools are getting very good at sounding right — without being reliably right.
And in UX, that gap matters more than most people realize.
The 10-Suggestion Trap
Imagine this scenario:
You run your interface through an AI UX audit tool.
It gives you 10 recommendations.
- ✅ 7 are genuinely good
- ❌ 3 are subtly but seriously wrong
Here’s the real problem:
You can’t tell which is which.
And if you could reliably tell them apart…
you wouldn’t need the AI tool in the first place.
That’s not a hypothetical risk. That’s the core issue with AI-driven UX analysis today.
UX is full of small decisions with outsized impact:
- A thumbnail vs dots under a product image
- Button placement on a checkout page
- Error handling copy that appears once every 200 sessions
One “minor” UI detail can shift conversion by millions of dollars at scale.
So when an AI tool is wrong 30% of the time — even politely wrong — it can quietly cancel out the gains from everything it got right.
You move fast.
You ship confidently.
And you end up exactly where you started — or worse.
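The arithmetic behind that trap is worth making explicit. The sketch below assumes, purely for illustration, that a bad UX change hurts about twice as much as a good one helps (bad changes tend to compound across interconnected flows); every number here is an assumption, not a measurement.

```python
# Back-of-envelope expected value of shipping every AI suggestion blindly.
# All numbers are illustrative assumptions, not measured data.

def expected_net_lift(n_suggestions, accuracy, avg_gain, avg_harm):
    """Expected net impact if all suggestions are implemented unvetted."""
    good = n_suggestions * accuracy          # suggestions that genuinely help
    bad = n_suggestions * (1 - accuracy)     # suggestions that quietly hurt
    return good * avg_gain - bad * avg_harm

# 10 suggestions at 70% accuracy; assume each bad change costs twice
# what each good change gains.
print(round(expected_net_lift(10, 0.70, avg_gain=0.5, avg_harm=1.0), 2))  # → 0.5
```

Seven good calls worth 3.5 units of lift, three bad calls costing 3.0: the net is roughly zero. You moved fast and ended up where you started.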
Why This Is a UX-Specific Problem
In many fields, a 70% success rate might be acceptable.
UX is different.
Because UX decisions are:
- Interconnected
- Context-dependent
- Often irreversible once scaled
A bad recommendation doesn’t just fail.
It can actively degrade performance.
Baymard shared real examples:
- Replacing tiny dot indicators with image thumbnails on a product page → +1% conversion for a Fortune 500 retailer
- Duplicating the “Place Order” button at both top and bottom of checkout → $10M annual revenue lift
Now imagine an AI confidently suggesting the opposite — because “minimalism” sounds cleaner.
It sounds smart.
It sounds reasonable.
It’s catastrophically wrong.
The Accuracy Question Nobody Is Asking
One of the most important points from the discussion:
Most AI UX tools don’t publish their accuracy rates at all.
And when accuracy is measured, recent independent studies show:
- Many tools land between 50% and 70% accuracy
- Pushing accuracy higher dramatically narrows what a tool can safely evaluate
Baymard made a deliberate decision with their tool, UX Ray:
- Cover fewer heuristics
- But deliver ~95% accuracy
That choice wasn’t easy — or cheap.
Documenting accuracy alone costs six figures.
But it protects something far more valuable than features or hype:
Trust.
Why “Looks Right” Is the Most Dangerous Phrase in UX
Generative AI excels at producing outputs that are:
- Well-phrased
- Confident
- Structurally convincing
They’re shaped like good insights.
But UX isn’t about sounding correct.
It’s about being correct in context.
When teams implement AI-generated UX suggestions without deep validation, they often say:
“It looked right, so we shipped it.”
That’s how organizations end up:
- Iterating endlessly
- Chasing trends
- Burning credibility with leadership
- Concluding that “UX doesn’t really work.”
Not because UX failed —
But because bad tools were trusted too early.
Acting Like a Professional in an AI World
One of the strongest ideas from the episode was simple:
A professional owns outcomes — not tools.
Using AI isn’t unprofessional.
Blindly trusting it is.
Professionals:
- Reduce uncertainty where it matters
- Understand acceptable risk
- Know when speed is worth it — and when it isn’t
That’s why Baymard intentionally limits AI’s role in UX Ray:
- AI handles classification (what pattern is present)
- Humans define judgment (is this good, harmful, or risky)
Probabilistic systems are used where they’re strong.
Deterministic logic is used where correctness matters.
This also makes mistakes visible — so humans can catch them before damage is done.
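That division of labor can be sketched in a few lines. This is a hypothetical illustration of the pattern, not UX Ray's actual code: the guideline table, labels, and confidence threshold are all invented for the example.

```python
# Hypothetical sketch: a probabilistic model only CLASSIFIES which pattern
# is present; the verdict comes from deterministic, human-authored rules.

# Human-defined judgment: pattern -> (verdict, rationale). Fully deterministic.
GUIDELINES = {
    "dots_carousel": ("harmful", "Dot indicators hide image content; prefer thumbnails."),
    "thumbnail_carousel": ("good", "Thumbnails let users preview and pick images."),
}

def classify(screenshot):
    """Stand-in for the ML classifier: returns (pattern_label, confidence)."""
    ...  # the probabilistic part lives here, and only here

def evaluate(pattern, confidence, threshold=0.9):
    """Deterministic judgment; uncertain classifications are surfaced, not guessed."""
    if confidence < threshold:
        return ("needs_human_review", "Classifier unsure — escalate to a UX reviewer.")
    if pattern not in GUIDELINES:
        return ("not_covered", "Pattern outside the tool's documented scope.")
    return GUIDELINES[pattern]
```

Because the judgment layer is a lookup, a wrong verdict can always be traced to either a misclassification or a disputed guideline — the mistake is visible, not buried in a model's weights.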

The Quiet Risk for Junior Designers
There’s a deeper concern here.
AI UX tools are most often used by:
- Junior designers
- Non-UX specialists
- Teams without research support
The very people least equipped to spot bad advice.
Heuristic evaluations, guideline application, and expert reviews are how UX instincts are built.
Outsourcing that thinking too early risks creating a generation that:
- Executes without understanding
- Ships without confidence
- Can’t explain why a decision was made
That’s not a tooling problem.
That’s a professional development problem.
So… Can AI Replace UX Research?
Not yet.
Not even close.
AI can:
- Accelerate observation
- Support pattern detection
- Reduce grunt work
But judgment, context, and accountability are still human responsibilities.
And until AI tools can clearly say:
“Here’s what I know, here’s what I don’t, and here’s how often I’m wrong”
They should be treated as assistants, not authorities.
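What would that disclosure look like in practice? One possible shape — entirely hypothetical, not any vendor's actual schema — is a recommendation record that carries its own scope and measured error rate, with "unmeasured" treated as untrustworthy rather than as perfect:

```python
# Hypothetical shape for an "assistant, not authority" recommendation:
# every suggestion carries its validation scope and measured error rate.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Recommendation:
    suggestion: str
    covered_by_validation: bool          # "here's what I know / what I don't"
    measured_accuracy: Optional[float]   # "here's how often I'm wrong"; None = unmeasured

    def trustworthy(self, floor=0.9):
        # Unmeasured accuracy counts as untrustworthy, never as 100%.
        return self.covered_by_validation and (self.measured_accuracy or 0.0) >= floor
```

A tool that cannot populate `measured_accuracy` fails its own trust check by default — which is exactly the posture a professional should take toward it.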
The One Question Every Team Should Ask
Before adopting any AI UX tool, ask this:
“What is your documented accuracy rate — and how was it measured?”
If there’s no answer?
That’s your answer.
Final Thought
AI isn’t killing UX.
Bad UX decisions are.
Use AI like a professional.
Demand evidence.
Own the outcome.
That’s how UX survives — and improves — in the age of automation.