CSAT is dead. Long live CSAT.

Steve Hind | Jun 4, 2025

Intercom just launched their "Customer Experience Score" (CX Score) with a blog post arguing that CSAT should be replaced by AI-driven evaluation. They make some valid points about CSAT's problems, but they're essentially asking you to replace a flawed industry standard with their proprietary, Intercom-specific score.

Here's why we think that's a bad idea.

CSAT does have problems

Let's be honest about CSAT's limitations first. Some of Intercom's criticisms aren't wrong – two in particular.

  1. Response bias is real. People with extreme experiences are more likely to respond (particularly if you only send out CSAT surveys after customers contact support). Your "average" CSAT score definitely isn't representative of your actual average customer experience.

  2. CSAT is shallow. It captures sentiment at one moment but doesn't explain why or predict future behavior. It also conflates “policy CSAT” (is the customer happy with the policy outcome?) with “experience CSAT” (is the customer happy with how the support agent behaved?).
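
The response bias in point 1 can be made concrete with a toy simulation. The response rates below are made up for illustration; the point is only that when unhappy customers answer surveys more often than everyone else, the surveyed average drifts well below the true average.

```python
import random

random.seed(0)

# Toy model: true satisfaction is uniform 1-5 across 100k customers,
# but response rates are skewed (hypothetical numbers): very unhappy
# customers respond 50% of the time, very happy ones 20%, everyone
# else only 5%.
population = [random.randint(1, 5) for _ in range(100_000)]

def responds(score):
    rate = {1: 0.50, 5: 0.20}.get(score, 0.05)
    return random.random() < rate

responses = [s for s in population if responds(s)]

print(f"true average:     {sum(population) / len(population):.2f}")
print(f"surveyed average: {sum(responses) / len(responses):.2f}")
```

With these assumed rates the surveyed average lands noticeably below the true one, even though every individual response is honest.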

These are legitimate problems. But Intercom's solution has the potential to create bigger ones.

The problem with vendor-invented metrics

When a vendor creates their own evaluation metric, they're grading their own homework. Intercom's CX Score conveniently shows their AI agent performing better than human agents. 

I’m sure that is genuinely what they see in their data. The challenge is that it takes an ongoing, heroic level of self-awareness and self-discipline to avoid slipping into the conclusions the data makes convenient.

Intercom claims its AI-scoring model achieved a 0.8 F-score when validated against reviews by experienced agents, but customers can't independently verify or audit that scoring.
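For readers unfamiliar with the metric, an F-score (here F1) is the harmonic mean of precision and recall. A minimal sketch, with hypothetical review counts chosen only to show what a 0.8 would look like:

```python
# F1 = harmonic mean of precision and recall. The counts below are
# made-up illustration, not Intercom's actual validation data.

def f1_score(true_positives, false_positives, false_negatives):
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# e.g. 100 tickets the model flagged, 80 confirmed by human reviewers,
# and 20 real issues the model missed:
score = f1_score(true_positives=80, false_positives=20, false_negatives=20)
print(round(score, 2))  # 0.8
```

An 0.8 F1 is respectable, but note what it doesn't tell you: nothing about which kinds of conversations the model systematically mis-scores, which is exactly the part customers can't audit.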

We've seen this movie before – businesses creating internal metrics that unintentionally overstated their success, causing them to overlook real product or service issues. Uber initially used "completed trips" as their primary success metric, which encouraged rapid growth while overlooking driver satisfaction, safety, and service quality. Netflix counted a "view" as watching just two minutes of content, inflating their popularity metrics and distorting content decisions.


Once you adopt a vendor's custom metric, you're also locked into their evaluation framework. You can’t meaningfully compare the new custom metric to industry standards like CSAT, which makes it harder to benchmark Intercom’s performance against other vendors.

CSAT reflects reality

Despite its flaws, CSAT has one critical advantage: it reflects what customers actually think, not what an AI model thinks they should think.


Yes, CSAT coverage is low and biased. But the customers who do respond are giving you their genuine reaction. That's more valuable than a black box algorithm's assessment of what their reaction should be.

CSAT is also transparent and auditable. Everyone understands what it measures, even if imperfectly. You can benchmark across vendors and time periods. You can independently verify results.

Don't replace CSAT, complement it

The right approach isn't to throw out CSAT for a vendor's proprietary metric. It's to use multiple evaluation methods.

At Lorikeet, we let customers define their own quality metrics and test them rigorously. You can use CSAT alongside other measures like resolution time, escalation rates, and customer-defined quality criteria.
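One way to picture that multi-metric approach is a team-owned scorecard. This is a hedged sketch, not Lorikeet's actual API; the field names and thresholds are illustrative and would be set by your own team.

```python
from dataclasses import dataclass

# Illustrative scorecard combining CSAT with other signals the team
# controls. Thresholds here are arbitrary examples, not recommendations.
@dataclass
class SupportScorecard:
    csat: float                # average survey score, 1-5
    resolution_minutes: float  # median time to resolution
    escalation_rate: float     # fraction escalated to a human

    def healthy(self) -> bool:
        return (self.csat >= 4.0
                and self.resolution_minutes <= 30
                and self.escalation_rate <= 0.15)

week = SupportScorecard(csat=4.3, resolution_minutes=22, escalation_rate=0.10)
print(week.healthy())  # True
```

The design point is portability: because you define the fields and thresholds, the scorecard means the same thing no matter which vendor sits behind it.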

Test everything against human judgment. Build evaluation frameworks you control and understand. Use metrics you can port between vendors. There is no substitute for “tasting the soup”: reviewing tickets by hand and asking “is this how we want to show up for our customers?”

Most importantly, don't outsource your quality standards to a vendor. They have different incentives than you do.

The bottom line

CSAT is flawed, but it's flawed in ways we understand. Vendor-invented metrics are flawed in ways we can't see or audit.

Better to have an imperfect metric you control than a "perfect" one you can't verify. Keep using CSAT, but don't rely on it alone. Build your own comprehensive evaluation framework.

Your customers' actual opinions matter more than any algorithm's assessment of what their opinions should be.


Ready to deploy human-quality CX?

Businesses with the highest CX standards choose Lorikeet's AI agents to solve the most complicated support cases in the most complex industries.