CSAT vs NPS vs CES

Three customer survey metrics, three jobs. Pick the right one for the moment.

CSAT, NPS, and CES are the three customer survey metrics that show up in almost every feedback program. They are not interchangeable. Each measures a different thing, fires at a different moment, and answers a different question — picking the wrong one is one of the most common reasons feedback data ends up unused.

What each one measures

Strip away the slideware and the three metrics answer three different questions:

  • CSAT (Customer Satisfaction) — "Were you happy with this specific thing?" Asked right after a defined event: a support ticket, a delivery, a feature use. Short scale, usually 1–5 or 1–7. Reported as the percentage of respondents in the top one or two boxes.
  • NPS (Net Promoter Score) — "How loyal are you to us, overall?" Asked on a cadence about the relationship, not the event. 0–10 scale. Reported as percent promoters minus percent detractors.
  • CES (Customer Effort Score) — "How hard was it to get this done?" Asked after a friction-prone task: contacting support, completing onboarding, returning a product. Usually a 1–7 agreement scale on "It was easy to…". Reported as average score or top-box rate.

If you can name the moment, the metric, and the calculation in one sentence, you are using it correctly. If you cannot, the metric is decoration.

When to use which

The simplest mapping:

  1. Use CSAT after a transaction you want to evaluate — a support resolution, a delivered order, a completed onboarding step. It catches problems where they happen.
  2. Use NPS twice a year as a relationship pulse — a forward-looking signal about loyalty and word-of-mouth. The NPS complete guide covers cadence and sampling in depth.
  3. Use CES after high-effort moments where a small bump in friction would lose the customer — first login, first ticket, first renewal. CES correlates with retention more reliably than satisfaction does in many B2B categories.

Most mature feedback programs run all three: CSAT in the support flow, CES at friction points, and NPS twice a year on the relationship. They reinforce each other, but only if each one is wired to a clear action.
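
One way to keep that wiring explicit is a small event-to-survey mapping in code or config. The event names below are hypothetical, so adapt them to your own taxonomy; the point is that CSAT and CES are keyed to triggers, while NPS runs on the calendar and deliberately has no entry in the map.

```python
# Hypothetical event names; adapt to your own event taxonomy.
# CSAT and CES fire on triggers. NPS is scheduled on a cadence,
# so it intentionally does not appear here.
SURVEY_FOR_EVENT = {
    "support_ticket_resolved": "csat",
    "order_delivered": "csat",
    "first_login": "ces",
    "onboarding_completed": "ces",
    "product_returned": "ces",
}


def survey_for(event_name: str) -> str | None:
    """Return the survey to fire for an event, or None if the event is unsurveyed."""
    return SURVEY_FOR_EVENT.get(event_name)
```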

How to calculate each

The math is simple; inconsistency is what causes problems:

  • CSAT — most teams report the percentage of respondents who pick the top one or two boxes (4–5 on a 5-point scale). A few report it as a mean. Pick one and never switch.
  • NPS — promoters (9–10) minus detractors (0–6), passives (7–8) excluded. Result is -100 to +100.
  • CES — most teams report the mean score on a 1–7 scale, or the percentage of "agree/strongly agree" responses. Again, pick one and stick with it.

The thing that breaks most dashboards is changing the calculation mid-year. If you switch from a mean to a top-box CSAT, your trend line is now nonsense. Document the formula in the survey itself so the next analyst inherits the choice.
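
To make the three formulas concrete, here is a minimal sketch in Python, assuming responses arrive as plain lists of integers. The function names are illustrative, not from any survey tool.

```python
def csat_top_box(scores: list[int], top_boxes: tuple = (4, 5)) -> float:
    """CSAT as the percentage of responses in the top two boxes of a 1-5 scale."""
    return 100 * sum(s in top_boxes for s in scores) / len(scores)


def nps(scores: list[int]) -> float:
    """NPS: % promoters (9-10) minus % detractors (0-6), range -100 to +100.

    Passives (7-8) count in the denominator but in neither group.
    """
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return 100 * (promoters - detractors) / len(scores)


def ces_mean(scores: list[int]) -> float:
    """CES as the mean of a 1-7 agreement scale."""
    return sum(scores) / len(scores)


print(nps([10, 9, 9, 7, 6, 3]))  # 100 * (3 - 2) / 6, roughly 16.7
```

Encoding each formula as a named, documented function is one way to make the calculation choice survive analyst turnover.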

Combining them in a feedback program

The three metrics fit together if you treat them as different layers:

  • Event-level — CSAT and CES, fired by triggers, owned by the team responsible for that event.
  • Relationship-level — NPS, on a cadence, owned at a higher level (CX leadership or the executive team).
  • Diagnostic-level — open-text follow-ups on every metric, themed monthly to inform the roadmap.

The strongest pattern: when an NPS detractor explains "support was slow," check the CSAT and CES from that customer's last support interaction. When the metrics agree, you have a signal worth acting on. When they disagree, you have a question worth investigating. Customer feedback survey templates include the standard variants you can drop into your tool.
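
As a sketch of that cross-check, assuming both response streams can be exported with a shared customer ID. The record shapes and field names below are invented; a real survey tool's export will differ.

```python
from datetime import datetime

# Invented exports for illustration.
nps_responses = [
    {"customer_id": "c42", "score": 3, "comment": "support was slow"},
]
event_responses = [
    {"customer_id": "c42", "metric": "csat", "score": 2, "at": datetime(2026, 3, 28)},
    {"customer_id": "c42", "metric": "ces", "score": 3, "at": datetime(2026, 3, 28)},
]


def latest_event_scores(customer_id: str) -> dict[str, int]:
    """Most recent CSAT and CES for one customer; later events overwrite earlier ones."""
    latest: dict[str, int] = {}
    rows = [e for e in event_responses if e["customer_id"] == customer_id]
    for e in sorted(rows, key=lambda e: e["at"]):
        latest[e["metric"]] = e["score"]
    return latest


# Pull event-level context for every detractor verbatim.
for r in nps_responses:
    if r["score"] <= 6:  # detractor
        print(r["comment"], "->", latest_event_scores(r["customer_id"]))
```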

Benchmarks and traps

Industry benchmarks are useful directional anchors and dangerous targets. The companies that publish their scores have selected for being above average — that is part of why they publish. Compare to your own past trend first, your sector benchmarks second, and never to a competitor's marketing slide.

The biggest trap with all three metrics is making them compensation targets without protecting the data. Once a team's bonus depends on a number, the surveys get curated, the sample skews, and the score drifts loose from reality. Tie incentives to closed-loop actions and verbatim themes — see how to write survey questions for the wording rules that keep the data clean.

Pick by job to be done: CSAT for "did this thing go well?", CES for "was this hard?", NPS for "how do you feel about us overall?". Run all three in a mature program, never average them together, and always pair the score with an open follow-up.

Frequently asked

Which metric is most important?
The one that maps to the decision you are trying to make. CSAT diagnoses event quality, CES diagnoses friction, NPS forecasts loyalty and word-of-mouth. Crowning a single most-important metric usually means optimizing a number that does not match the question actually being asked.
Can I just use NPS for everything?
You can, but you will lose resolution. NPS at the relationship level cannot tell you that your support team is great and your billing flow is broken. Event-level CSAT and CES find the specific causes that show up later in the relationship score.
What sample size do I need for these to be reliable?
For a single-period comparison, aim for at least a hundred responses per cohort before you read movement as signal. With fewer than that, swings of five or ten points are usually noise. If your customer base is small, report a moving average over a longer window, as sketched below.
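
A trailing moving average is the usual smoothing for small bases. A minimal sketch; the 3-period window is a judgment call, not a standard.

```python
def trailing_average(period_scores: list[float], window: int = 3) -> list[float]:
    """Average each period with up to window - 1 preceding periods.

    Early periods use whatever history exists, so the series starts
    noisy and settles once the window fills.
    """
    smoothed = []
    for i in range(len(period_scores)):
        chunk = period_scores[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Monthly NPS from a small base: raw swings of 15+ points flatten out.
print(trailing_average([12.0, -5.0, 20.0, 8.0]))  # [12.0, 3.5, 9.0, ~7.67]
```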
How do I avoid survey fatigue when running all three?
Cap total surveys at one per customer per month, sequence them so the same person rarely gets more than one in a window, and never re-survey someone who responded recently. Use silent suppression rules in your survey tool to enforce the cap automatically.
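
A suppression rule can be a single predicate. The field names here are hypothetical; the logic is the cap described above.

```python
from datetime import datetime, timedelta, timezone


def eligible_for_survey(customer: dict, cooldown_days: int = 30,
                        now: datetime | None = None) -> bool:
    """True if the customer was neither surveyed nor responded within the cooldown."""
    now = now or datetime.now(timezone.utc)
    touches = [customer.get("last_surveyed_at"), customer.get("last_responded_at")]
    last_touch = max((t for t in touches if t is not None), default=None)
    return last_touch is None or now - last_touch >= timedelta(days=cooldown_days)


# Surveyed 10 days ago, so suppressed under a 30-day cap.
customer = {"last_surveyed_at": datetime.now(timezone.utc) - timedelta(days=10)}
print(eligible_for_survey(customer))  # False
```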
Should the wording exactly match the standard?
For NPS, yes — changing the wording breaks comparability with industry benchmarks. For CSAT and CES, the standard wording is a starting point you can localize for your product, as long as the scale and the calculation stay consistent inside your own tracking.