Landing page A/B testing guide
What to test first, how long to run, and how to read results without fooling yourself.
Most landing page A/B tests are a waste of time. Not because testing is wrong, but because most teams test the wrong things, declare winners too early, and read noise as signal. Here's how to test what actually moves the number, run it long enough to know, and read the results without fooling yourself.
What to test first
The order matters. Big levers first, small ones later. Most teams burn weeks testing button colors before they've fixed a headline that doesn't match the ad — and then wonder why their tests "never produce winners."
The order that pays back:
- Hero copy. Headline and subhead. Highest variance, biggest leverage on every visitor.
- Offer structure. What you're asking for, what they get, the price/trial/free-or-paid choice.
- Form length. Number of fields, progressive disclosure, required vs optional.
- CTA copy. Verb, specificity, first-person framing. CTA design covered here.
- Page structure. Section order, length, presence or absence of pricing, FAQ position.
- Visual elements. Hero image, button color, illustrations. Run last because the lift is usually small.
You don't have to follow this order rigidly, but if your test queue starts with button color, your queue is upside down.
Sample size and significance
Most teams run tests for two weeks, eyeball the result, and ship the variant with the higher number. That's not testing — that's confirmation bias with a control group. A real test needs enough traffic to detect the lift you're hoping to see.
Two principles:
- Small lifts need huge samples. Detecting a 5% relative lift on a 3% baseline conversion rate takes roughly 200,000 visitors per variant at 95% confidence and 80% power. Detecting a 50% lift takes a few thousand, because the required sample shrinks with the square of the effect you're chasing.
- Run for full weeks, not partial ones. Tuesday traffic doesn't behave like Saturday traffic. Stop a test on Friday and the weekend tells a different story.
The deeper math is in sample size and significance. The short version: pick a minimum detectable effect, calculate the sample size you need, and don't peek at the result every day.
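If you want a back-of-the-envelope number instead of a calculator tab, here's a minimal sketch of that calculation, using the standard two-proportion normal-approximation formula. The 95% confidence and 80% power defaults are assumptions you can change, and `visitors_per_variant` is just a name for this sketch:

```python
from math import ceil
from statistics import NormalDist

def visitors_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect `relative_lift`
    over `baseline` with a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for 95% confidence
    z_power = NormalDist().inv_cdf(power)           # e.g. 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

print(visitors_per_variant(0.03, 0.05))  # ~208,000: a 5% lift on a 3% baseline
print(visitors_per_variant(0.03, 0.50))  # ~2,500: a 50% lift needs far fewer
```

Plug in your own baseline and the smallest lift you would actually act on. If the answer is larger than a month or two of traffic, test a bigger swing instead.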
The four mistakes that kill most tests
If your tests "never produce winners," it's almost always one of these:
- Testing too small a change. A button shade variation rarely produces a measurable lift even when the design intuition is right. Test bigger swings — different copy, different offers, different page structures.
- Testing too many variants at once. A four-arm test needs at least double the total traffic of a two-arm test, and more once you correct for the extra comparisons. Two arms is the workhorse; reserve multivariate for high-traffic pages with budgets to match.
- Calling winners on partial weeks. The biggest source of false positives. Day-of-week effects move conversion rates more than most variants do.
- Stopping at "p < 0.05" the first time it crosses. Significance values bounce around as a test runs. The right move is to set a sample-size goal up front and only check results when you hit it. Otherwise you're running a peek-and-call test, which inflates false-positive rates dramatically; the simulation below shows by how much.
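A rough simulation sketch of the peeking problem: both arms convert at an identical 3%, so every declared winner is a false positive. The 500 visitors per arm per day and the 28-day window are arbitrary assumptions.

```python
import random
from math import erf, sqrt

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = abs(conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

random.seed(7)
RATE, DAILY_VISITORS, DAYS, TRIALS = 0.03, 500, 28, 500
peeking_wins = single_look_wins = 0
for _ in range(TRIALS):
    conv_a = conv_b = n = 0
    called_early = False
    for _ in range(DAYS):
        conv_a += sum(random.random() < RATE for _ in range(DAILY_VISITORS))
        conv_b += sum(random.random() < RATE for _ in range(DAILY_VISITORS))
        n += DAILY_VISITORS
        if p_value(conv_a, n, conv_b, n) < 0.05:
            called_early = True          # a daily peek would have declared a winner
    peeking_wins += called_early
    single_look_wins += p_value(conv_a, n, conv_b, n) < 0.05  # one look, at the end
print(f"False positives with daily peeking:  {peeking_wins / TRIALS:.0%}")
print(f"False positives with one final look: {single_look_wins / TRIALS:.0%}")
```

With this many looks, the daily-peeking figure typically lands several times above the nominal 5%, while the single final look stays close to it.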
Reading the result honestly
A test produces three pieces of information: a point estimate (variant B converted at X% vs A's Y%), a confidence interval, and a sample size. Most teams report the first and ignore the other two. That's a problem because the point estimate by itself can be misleading.
A working interpretation discipline:
- Did you hit the pre-committed sample size? If no, the result isn't reliable, regardless of what the lift looks like.
- Is the confidence interval narrow enough to act on? An interval of "+5% to +50%" is technically a win but tells you almost nothing about the real magnitude (a sketch of the calculation follows this list).
- Did the result hold across segments? If the variant won overall but lost on mobile, the rollout decision is more complicated than the headline number suggests.
- Does the result match a believable mechanism? A 30% lift from changing one word should make you suspicious, not excited. Re-run before shipping.
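For the confidence-interval check, the arithmetic is short. Here's a minimal sketch using a normal-approximation interval on the difference in conversion rates; the visitor and conversion counts are hypothetical:

```python
from statistics import NormalDist

def lift_with_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Absolute lift of B over A with a normal-approximation confidence interval."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    se = (rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = rate_b - rate_a
    return diff, (diff - z * se, diff + z * se)

# Hypothetical counts: 4,800 visitors per arm, 144 vs 196 conversions.
diff, (low, high) = lift_with_interval(144, 4800, 196, 4800)
print(f"Lift: {diff:+.2%}  (95% CI {low:+.2%} to {high:+.2%})")
```

With these made-up numbers the interval runs from a modest win to a very large one, which is exactly the "technically a win, magnitude unknown" situation the checklist warns about.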
The deeper framework lives in conversion rate optimization framework.
When to skip testing entirely
Not everything needs a test. If your traffic is too small to reach significance in a reasonable window, testing isn't generating knowledge — it's generating noise. Two cases where ship-and-watch beats test-first:
- Low-traffic pages. Below a few hundred conversions per month, most A/B tests will never reach significance. Make the change, watch the trailing average, and roll back if conversion drops; a monitoring sketch follows this list.
- Obviously broken things. A misspelled headline, a CTA that doesn't work on mobile, a form that won't submit. Don't test fixes to bugs — fix the bug.
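For the ship-and-watch route, the monitoring can be as simple as comparing trailing conversion rates around the change date. A minimal sketch, assuming daily (visitors, conversions) tuples and a rollback threshold you pick up front; the function names and the 10% threshold are illustrative, not a standard:

```python
def trailing_rate(daily, window=28):
    """Conversion rate over the most recent `window` days of (visitors, conversions)."""
    recent = daily[-window:]
    visitors = sum(v for v, _ in recent)
    conversions = sum(c for _, c in recent)
    return conversions / visitors if visitors else 0.0

def change_report(before, after, window=28, rollback_drop=0.10):
    """Compare trailing rates around a change; flag a rollback if the relative
    drop exceeds `rollback_drop`."""
    old, new = trailing_rate(before, window), trailing_rate(after, window)
    relative_change = (new - old) / old if old else 0.0
    return {
        "before": old,
        "after": new,
        "relative_change": relative_change,
        "rollback": relative_change < -rollback_drop,
    }

# Hypothetical daily (visitors, conversions) tuples around a change:
before = [(900, 28)] * 28
after = [(900, 24)] * 28
print(change_report(before, after))  # relative drop ~14%, rollback flagged
```

Decide the rollback threshold before you ship, for the same reason you pre-commit a sample size before you test.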
For everything else, test thoughtfully. Landing page best practices covers the patterns most worth testing first.