Landing page A/B testing guide
What to test first, how long to run, and how to read results without fooling yourself.
Most landing page A/B tests are a waste of time. Not because testing is wrong, but because most teams test the wrong things, declare winners too early, and read noise as signal. Here's how to test what actually moves the number, run it long enough to know, and read the results without fooling yourself.
What to test first
The order matters. Big levers first, small ones later. Most teams burn weeks testing button colors before they've fixed a headline that doesn't match the ad — and then wonder why their tests "never produce winners."
The order that pays back:
- Hero copy. Headline and subhead. Highest variance, biggest leverage on every visitor.
- Offer structure. What you're asking for, what they get, the price/trial/free-or-paid choice.
- Form length. Number of fields, progressive disclosure, required vs optional.
- CTA copy. Verb, specificity, first-person framing. CTA design covered here.
- Page structure. Section order, length, presence or absence of pricing, FAQ position.
- Visual elements. Hero image, button color, illustrations. Run last because the lift is usually small.
You don't have to follow this order rigidly, but if your test queue starts with button color, your queue is upside down.
Sample size and significance
Most teams run tests for two weeks, eyeball the result, and ship the variant with the higher number. That's not testing — that's confirmation bias with a control group. A real test needs enough traffic to detect the lift you're hoping to see.
Two principles:
- Small lifts need huge samples. Detecting a 5% relative lift on a 3% baseline conversion rate takes roughly 200,000 visitors per variant at 95% confidence and 80% power. Detecting a 50% lift takes a few thousand, because the required sample shrinks with the square of the effect you're chasing.
- Run for full weeks, not partial ones. Tuesday traffic doesn't behave like Saturday traffic. Stop a test on Friday and the weekend tells a different story.
The deeper math is in sample size and significance. The short version: pick a minimum detectable effect, calculate the sample size you need, and don't peek at the result every day.
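If you want a back-of-the-envelope number instead of a calculator tab, here's a minimal sketch of that calculation, using the standard two-proportion normal-approximation formula. The 95% confidence and 80% power defaults are assumptions you can change, and `visitors_per_variant` is just a name for this sketch:

```python
from math import ceil
from statistics import NormalDist

def visitors_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect `relative_lift`
    over `baseline` with a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for 95% confidence
    z_power = NormalDist().inv_cdf(power)           # e.g. 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

print(visitors_per_variant(0.03, 0.05))  # ~208,000: a 5% lift on a 3% baseline
print(visitors_per_variant(0.03, 0.50))  # ~2,500: a 50% lift needs far fewer
```

Plug in your own baseline and the smallest lift you would actually act on. If the answer is larger than a month or two of traffic, test a bigger swing instead.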
The four mistakes that kill most tests
If your tests "never produce winners," it's almost always one of these:
- Testing too small a change. A button shade variation rarely produces a measurable lift even when the design intuition is right. Test bigger swings — different copy, different offers, different page structures.
- Testing too many variants at once. A four-arm test needs at least double the total traffic of a two-arm test, and more once you correct for the extra comparisons. Two arms is the workhorse; reserve multivariate for high-traffic pages with budgets to match.
- Calling winners on partial weeks. The biggest source of false positives. Day-of-week effects move conversion rates more than most variants do.
- Stopping at "p < 0.05" the first time it crosses. Significance values bounce around as a test runs. The right move is to set a sample-size goal up front and only check results when you hit it. Otherwise you're running a peek-and-call test, which inflates false-positive rates dramatically; the simulation below shows by how much.
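A rough simulation sketch of the peeking problem: both arms convert at an identical 3%, so every declared winner is a false positive. The 500 visitors per arm per day and the 28-day window are arbitrary assumptions.

```python
import random
from math import erf, sqrt

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = abs(conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

random.seed(7)
RATE, DAILY_VISITORS, DAYS, TRIALS = 0.03, 500, 28, 500
peeking_wins = single_look_wins = 0
for _ in range(TRIALS):
    conv_a = conv_b = n = 0
    called_early = False
    for _ in range(DAYS):
        conv_a += sum(random.random() < RATE for _ in range(DAILY_VISITORS))
        conv_b += sum(random.random() < RATE for _ in range(DAILY_VISITORS))
        n += DAILY_VISITORS
        if p_value(conv_a, n, conv_b, n) < 0.05:
            called_early = True          # a daily peek would have declared a winner
    peeking_wins += called_early
    single_look_wins += p_value(conv_a, n, conv_b, n) < 0.05  # one look, at the end
print(f"False positives with daily peeking:  {peeking_wins / TRIALS:.0%}")
print(f"False positives with one final look: {single_look_wins / TRIALS:.0%}")
```

With this many looks, the daily-peeking figure typically lands several times above the nominal 5%, while the single final look stays close to it.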
Reading the result honestly
A test produces three pieces of information: a point estimate (variant B converted at X% vs A's Y%), a confidence interval, and a sample size. Most teams report the first and ignore the other two. That's a problem because the point estimate by itself can be misleading.
A working interpretation discipline:
- Did you hit the pre-committed sample size? If no, the result isn't reliable, regardless of what the lift looks like.
- Is the confidence interval narrow enough to act on? An interval of "+5% to +50%" is technically a win but tells you almost nothing about the real magnitude (a sketch of the calculation follows this list).
- Did the result hold across segments? If the variant won overall but lost on mobile, the rollout decision is more complicated than the headline number suggests.
- Does the result match a believable mechanism? A 30% lift from changing one word should make you suspicious, not excited. Re-run before shipping.
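For the confidence-interval check, the arithmetic is short. Here's a minimal sketch using a normal-approximation interval on the difference in conversion rates; the visitor and conversion counts are hypothetical:

```python
from statistics import NormalDist

def lift_with_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Absolute lift of B over A with a normal-approximation confidence interval."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    se = (rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = rate_b - rate_a
    return diff, (diff - z * se, diff + z * se)

# Hypothetical counts: 4,800 visitors per arm, 144 vs 196 conversions.
diff, (low, high) = lift_with_interval(144, 4800, 196, 4800)
print(f"Lift: {diff:+.2%}  (95% CI {low:+.2%} to {high:+.2%})")
```

With these made-up numbers the interval runs from a modest win to a very large one, which is exactly the "technically a win, magnitude unknown" situation the checklist warns about.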
The deeper framework lives in conversion rate optimization framework.
When to skip testing entirely
Not everything needs a test. If your traffic is too small to reach significance in a reasonable window, testing isn't generating knowledge — it's generating noise. Two cases where ship-and-watch beats test-first:
- Low-traffic pages. Below a few hundred conversions per month, most A/B tests will never reach significance. Make the change, watch the trailing average, and roll back if conversion drops; a monitoring sketch follows this list.
- Obviously broken things. A misspelled headline, a CTA that doesn't work on mobile, a form that won't submit. Don't test fixes to bugs — fix the bug.
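For the ship-and-watch route, the monitoring can be as simple as comparing trailing conversion rates around the change date. A minimal sketch, assuming daily (visitors, conversions) tuples and a rollback threshold you pick up front; the function names and the 10% threshold are illustrative, not a standard:

```python
def trailing_rate(daily, window=28):
    """Conversion rate over the most recent `window` days of (visitors, conversions)."""
    recent = daily[-window:]
    visitors = sum(v for v, _ in recent)
    conversions = sum(c for _, c in recent)
    return conversions / visitors if visitors else 0.0

def change_report(before, after, window=28, rollback_drop=0.10):
    """Compare trailing rates around a change; flag a rollback if the relative
    drop exceeds `rollback_drop`."""
    old, new = trailing_rate(before, window), trailing_rate(after, window)
    relative_change = (new - old) / old if old else 0.0
    return {
        "before": old,
        "after": new,
        "relative_change": relative_change,
        "rollback": relative_change < -rollback_drop,
    }

# Hypothetical daily (visitors, conversions) tuples around a change:
before = [(900, 28)] * 28
after = [(900, 24)] * 28
print(change_report(before, after))  # relative drop ~14%, rollback flagged
```

Decide the rollback threshold before you ship, for the same reason you pre-commit a sample size before you test.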
For everything else, test thoughtfully. Landing page best practices covers the patterns most worth testing first.