When You Can't Tell Good AI Advice From The Bad

AI UX tools sound smart—but are they reliable? Learn why 70% accuracy is dangerous in UX, and how bad recommendations can quietly hurt conversions.

Why a 70%-Accurate AI UX Tool Is Actually a Terrible Idea

On paper, 70% accuracy sounds… fine.
Better than a coin toss.
Good enough to “get started,” right?

In UX, 70% isn't just not good enough. It's actively dangerous.

This uncomfortable truth came up in a recent Nielsen Norman Group UX Podcast conversation featuring Baymard Institute co-founders Christian Holst and Jamie Appleseed, alongside Kate Moran from Nielsen Norman Group.

What they unpacked is something many teams are quietly struggling with right now:

AI tools are getting very good at sounding right — without being reliably right.

And in UX, that gap matters more than most people realize.


The 10-Suggestion Trap

Imagine this scenario:

You run your interface through an AI UX audit tool.
It gives you 10 recommendations.

  • ✅ 7 are genuinely good
  • ❌ 3 are subtly but seriously wrong

Here’s the real problem:

You can’t tell which is which.

And if you could reliably tell them apart…
you wouldn’t need the AI tool in the first place.

That’s not a hypothetical risk. That’s the core issue with AI-driven UX analysis today.

UX is full of small decisions with outsized impact:

  • A thumbnail vs dots under a product image
  • Button placement on a checkout page
  • Error handling copy that appears once every 200 sessions

One “minor” UI detail can shift conversion by millions of dollars at scale.

So when an AI tool is wrong 30% of the time — even politely wrong — it can quietly cancel out the gains from everything it got right.

You move fast.
You ship confidently.
And you end up exactly where you started — or worse.
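The arithmetic behind that trap is easy to sketch. The conversion-impact numbers below are purely hypothetical assumptions for illustration; only the 7-good / 3-bad split comes from the scenario above.

```python
# Illustrative only: all conversion-impact numbers are hypothetical
# assumptions, not figures from the podcast or from Baymard's research.

def net_impact(recommendations):
    """Sum the conversion impact of a batch of shipped recommendations."""
    return sum(impact for _, impact in recommendations)

# An AI audit returns 10 suggestions: 7 genuinely good, 3 subtly harmful.
# Suppose each good one lifts conversion ~0.5% and each bad one costs
# ~1.2% (harmful UX changes often cost more than good ones gain).
suggestions = [("good", +0.5)] * 7 + [("bad", -1.2)] * 3

print(f"Net conversion impact: {net_impact(suggestions):+.1f}%")
# With these assumed numbers, the batch nets roughly -0.1%:
# three misses cancel out all seven wins.
```

The exact figures don't matter; the asymmetry does. Because a bad UX change can cost more than a good one gains, even a 70%-right batch can net out to zero or worse.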


Why This Is a UX-Specific Problem

In many fields, a 70% success rate might be acceptable.

UX is different.

Because UX decisions are:

  • Interconnected
  • Context-dependent
  • Often irreversible once scaled

A bad recommendation doesn’t just fail.
It can actively degrade performance.

Baymard shared real examples:

  • Replacing tiny dot indicators with image thumbnails on a product page → +1% conversion for a Fortune 500 retailer
  • Duplicating the “Place Order” button at both top and bottom of checkout → $10M annual revenue lift

Now imagine an AI confidently suggesting the opposite — because “minimalism” sounds cleaner.

It sounds smart.
It sounds reasonable.
It’s catastrophically wrong.


The Accuracy Question Nobody Is Asking

One of the most important points from the discussion:

Most AI UX tools don’t publish their accuracy rates at all.

And when accuracy is measured, recent independent studies show:

  • Many tools land between 50–70% accuracy
  • Higher accuracy dramatically reduces what the tool can safely evaluate

Baymard made a deliberate decision with their tool, UX Ray:

  • Cover fewer heuristics
  • But deliver ~95% accuracy
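On illustrative numbers, that trade-off looks like this. The coverage counts below are made up; only the accuracy bands (50–70% for typical tools, ~95% for UX Ray) come from the article.

```python
# Hypothetical coverage figures for illustration; only the accuracy
# bands (50-70% typical, ~95% for UX Ray) come from the article.

def expected_false_findings(heuristics_covered, accuracy):
    """Expected number of wrong recommendations per full audit,
    assuming roughly one finding per heuristic covered."""
    return heuristics_covered * (1 - accuracy)

broad_tool  = expected_false_findings(heuristics_covered=100, accuracy=0.70)
narrow_tool = expected_false_findings(heuristics_covered=20,  accuracy=0.95)

print(f"Broad tool:  ~{broad_tool:.0f} wrong findings to silently sift out")
print(f"Narrow tool: ~{narrow_tool:.0f} wrong finding")
```

Under these assumptions, the broad tool hands you about 30 confident-sounding wrong findings per audit; the narrow one, about 1. Fewer heuristics, far fewer landmines.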

That choice wasn’t easy — or cheap.
Documenting accuracy alone costs six figures.

But it protects something far more valuable than features or hype:

Trust.


Why “Looks Right” Is the Most Dangerous Phrase in UX

Generative AI excels at producing outputs that are:

  • Well-phrased
  • Confident
  • Structurally convincing

They’re shaped like good insights.

But UX isn’t about sounding correct.
It’s about being correct in context.

When teams implement AI-generated UX suggestions without deep validation, they often say:

“It looked right, so we shipped it.”

That’s how organizations end up:

  • Iterating endlessly
  • Chasing trends
  • Burning credibility with leadership
  • Concluding that “UX doesn’t really work.”

Not because UX failed —
But because bad tools were trusted too early.


Acting Like a Professional in an AI World

One of the strongest ideas from the episode was simple:

A professional owns outcomes — not tools.

Using AI isn’t unprofessional.
Blindly trusting it is.

Professionals:

  • Reduce uncertainty where it matters
  • Understand acceptable risk
  • Know when speed is worth it — and when it isn’t

That’s why Baymard intentionally limits AI’s role in UX Ray:

  • AI handles classification (what pattern is present)
  • Humans define judgment (is this good, harmful, or risky)

Probabilistic systems are used where they’re strong.
Deterministic logic is used where correctness matters.

This also makes mistakes visible — so humans can catch them before damage is done.
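That division of labour can be sketched in a few lines. The pattern names, the classifier, and the confidence threshold below are all stand-ins; UX Ray's actual internals are not public.

```python
# Sketch of the classification/judgment split described above, with
# made-up pattern names and a stand-in classifier. Not UX Ray's code.

# Probabilistic step (where AI is strong): classify what pattern is present.
def classify_gallery_nav(screenshot_features):
    # Stand-in for an ML model; returns a label plus its own confidence.
    return "dot_indicators", 0.92

# Deterministic step (where correctness matters): human-authored verdicts.
HUMAN_VERDICTS = {
    "dot_indicators":   "harmful: replace with image thumbnails",
    "image_thumbnails": "good: keep",
}
CONFIDENCE_FLOOR = 0.85  # below this, escalate rather than guess

def audit(screenshot_features):
    pattern, confidence = classify_gallery_nav(screenshot_features)
    if confidence < CONFIDENCE_FLOOR:
        return "needs human review"  # low confidence stays visible
    return HUMAN_VERDICTS.get(pattern, "unknown pattern: needs human review")

print(audit({}))  # -> "harmful: replace with image thumbnails"
```

The key design choice: the model only ever names the pattern; whether that pattern is good, harmful, or risky is looked up in rules humans wrote, so a misclassification produces a visible wrong label rather than invisible wrong judgment.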


The Quiet Risk for Junior Designers

There’s a deeper concern here.

AI UX tools are most often used by:

  • Junior designers
  • Non-UX specialists
  • Teams without research support

The very people who are least equipped to spot bad advice.

Heuristic evaluations, guideline application, and expert reviews are how UX instincts are built.
Outsourcing that thinking too early risks creating a generation that:

  • Executes without understanding
  • Ships without confidence
  • Can’t explain why a decision was made

That’s not a tooling problem.
That’s a professional development problem.


So… Can AI Replace UX Research?

Not yet.
Not even close.

AI can:

  • Accelerate observation
  • Support pattern detection
  • Reduce grunt work

But judgment, context, and accountability are still human responsibilities.

And until AI tools can clearly say:

“Here’s what I know, here’s what I don’t, and here’s how often I’m wrong”

They should be treated as assistants, not authorities.


The One Question Every Team Should Ask

Before adopting any AI UX tool, ask this:

“What is your documented accuracy rate — and how was it measured?”

If there’s no answer?
That’s your answer.


Final Thought

AI isn’t killing UX.
Bad UX decisions are.

Use AI like a professional.
Demand evidence.
Own the outcome.

That’s how UX survives — and improves — in the age of automation.