
Statistical significance for online marketers doing A/B testing

We're marketers, not statisticians — but if you run A/B tests, statistical significance is what separates a real winner from random noise. A plain-English guide to what it means, the confidence level to use, why you can't stop a test early, and the mistakes that quietly ruin results.

We're online marketers, not statisticians, so "statistical significance" tends to cause a little panic when it enters the room. It shouldn't. The core idea is simple, and getting it right is the difference between acting on a real, repeatable win and chasing a result that was just random noise.

Whether you like it or not, the moment you run an A/B test you're part scientist — you have to prove your result is worth acting on before you roll it out. Here's everything a marketer actually needs to know, in plain English.

What statistical significance actually means

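Statistical significance measures how confident you can be that the difference you're seeing didn't just happen by chance. That's it. It's usually expressed as a confidence level: at 95%, a gap this large would show up only about 1 time in 20 if the two variants truly performed the same.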

The plainest way to think about it is the conversation you'll eventually have:

Your CEO: "How confident are you that this test winner isn't just a coincidence?"
You: "95% confident, statistically speaking, that it's no coincidence."

Now you're empowered to use the phrase correctly — which already puts you ahead of most people slinging it around.
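If you're curious what sits behind that "95%", here's a minimal sketch in Python. It assumes a pooled two-proportion z-test, which is one common way to compare two conversion rates and not necessarily what any particular tool runs. The function name `ab_significance` and the traffic numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def ab_significance(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-sided, pooled two-proportion z-test (an illustrative choice)."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b

    # Under "no real difference", both variants share one conversion rate.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))

    z = (rate_b - rate_a) / std_err
    # Chance of a gap at least this large if the variants were identical.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    return {
        "rate_a": rate_a,
        "rate_b": rate_b,
        "relative_lift": (rate_b - rate_a) / rate_a,
        "p_value": p_value,
        # Marketing tools often report 1 - p as the "confidence level".
        "confidence": 1 - p_value,
    }


# Hypothetical test: 10,000 visitors per variant, 500 vs. 570 conversions.
result = ab_significance(10_000, 500, 10_000, 570)
print(result)  # the p-value here comes out below 0.05
```

A p-value under 0.05 is what people mean by "significant at 95%". The sketch does no input validation, so it will fail if either variant has zero visitors or nobody converts at all.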

The mistake that ruins more tests than any other: peeking

Here's the one that quietly invalidates a huge share of marketing A/B tests. You start a test, check it on day two, see it's hit 95%, and call a winner. Don't.

Significance fluctuates wildly early in a test when you have little data — it'll cross 95% and fall back repeatedly just by random chance. If you keep peeking and stop the instant it looks significant, you're effectively cherry-picking a lucky moment, and your "95% confidence" is a fiction. This is called the peeking problem, and it's why disciplined teams decide the sample size and duration before the test starts and don't touch the stop button until they get there.
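You can see the peeking problem for yourself with a quick simulation. The sketch below runs a batch of A/A tests, where both variants have the same true conversion rate, so every "winner" is a false alarm. It compares stopping at the first significant reading with checking once at the end. All the numbers (500 experiments, 10 looks, 200 visitors per look, a 10% conversion rate) and the function names are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(n_a, conv_a, n_b, conv_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_peeking(experiments=500, looks=10, visitors_per_look=200,
                     true_rate=0.10, alpha=0.05, seed=42):
    """Return (false-winner rate when peeking, false-winner rate at the end)."""
    rng = random.Random(seed)
    winners_when_peeking = 0
    winners_at_end = 0
    for experiment in range(experiments):
        n = conv_a = conv_b = 0
        crossed_at_any_look = False
        p = 1.0
        for look in range(looks):
            # Both variants convert at the same true rate: there is no real winner.
            n += visitors_per_look
            conv_a += sum(rng.random() < true_rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < true_rate for _ in range(visitors_per_look))
            p = p_value(n, conv_a, n, conv_b)
            if p < alpha:
                crossed_at_any_look = True  # a peeker would have stopped here
        winners_when_peeking += crossed_at_any_look
        winners_at_end += p < alpha  # p from the final, planned look only
    return winners_when_peeking / experiments, winners_at_end / experiments


peek_rate, final_rate = simulate_peeking()
print(f"False winners when stopping at first significant peek: {peek_rate:.1%}")
print(f"False winners when checking only at the planned end:   {final_rate:.1%}")
```

The peeking rate can never come out lower than the end-of-test rate, because the final look is one of the peeks. In runs like this it typically lands well above the 5% you thought you were signing up for.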

Decide your sample size and duration up front

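Significance and sample size are two halves of the same decision. Before you launch, calculate the sample size you need from your baseline conversion rate, the minimum lift worth detecting, and your confidence level. Then plan to run in whole weeks, at least one to two, so weekday/weekend and pay-cycle patterns don't bias the result.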
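As a rough sketch of that up-front calculation, the Python below uses the standard normal-approximation formula for comparing two conversion rates. Two things here are assumptions rather than anything this article prescribes: the 80% power default (the chance of detecting the lift if it really exists) and treating the lift as relative to the baseline. The function names and example numbers are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_lift,
                         confidence=0.95, power=0.80):
    """Approximate visitors needed in EACH variant (two-sided test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # rate you hope the variant hits
    p_bar = (p1 + p2) / 2

    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)

    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def weeks_to_run(per_variant, daily_visitors):
    """Round the run length up to whole weeks (two variants share the traffic)."""
    return max(1, ceil(2 * per_variant / (daily_visitors * 7)))


# Hypothetical plan: 5% baseline, hoping to detect a 10% relative lift.
needed = visitors_per_variant(0.05, 0.10)
print(needed, "visitors per variant")
print(weeks_to_run(needed, daily_visitors=5_000), "week(s) at 5,000 visitors/day")
```

Try halving the lift you want to detect and watch the required traffic roughly quadruple. That is why small effects are so expensive to prove.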

Statistical significance is not the whole story

A result can be statistically significant and still not worth shipping. Significance tells you the difference is probably real; it says nothing about whether it's big enough to care about. With enough traffic, a 0.3% lift can hit 99% confidence: technically a winner, practically a rounding error. Always ask two questions: is it significant (real), and is it a meaningful enough lift to justify the change? That second one is practical significance, and it's the one marketers forget.
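Here's a small sketch of asking both questions at once. It reuses a pooled two-proportion z-test for the "is it real?" half and compares the lift against a minimum you decide is worth shipping. The 2% minimum lift, the function name `evaluate_result`, and the traffic figures are all assumptions for illustration; the right threshold depends on your own economics.

```python
from math import sqrt
from statistics import NormalDist


def evaluate_result(visitors_a, conversions_a, visitors_b, conversions_b,
                    min_relative_lift=0.02, alpha=0.05):
    """Check statistical AND practical significance for one finished test."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    relative_lift = (rate_b - rate_a) / rate_a
    is_real = p_value < alpha
    return {
        "relative_lift": relative_lift,
        "p_value": p_value,
        "statistically_significant": is_real,
        # Worth shipping only if it's real AND clears your own minimum lift.
        "practically_significant": is_real and relative_lift >= min_relative_lift,
    }


# Hypothetical high-traffic test: 40 million visitors per variant,
# 5.000% vs. 5.015% conversion (a 0.3% relative lift), judged at 99%.
result = evaluate_result(40_000_000, 2_000_000, 40_000_000, 2_006_000,
                         alpha=0.01)
print(result)
```

With that much traffic the tiny lift clears the 99% bar, yet it falls far short of the 2% minimum, so the sketch flags it as real but not worth acting on.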

A few more traps

Get the number in seconds

You don't need to do any of this math by hand. We built a free A/B test significance calculator — plug in visitors and conversions for each variant and it returns the conversion rates, the lift, the p-value, and whether the result is actually significant. Use it to check a finished test, and use it before you start to figure out how much traffic you'll need.

And if you'd rather have senior marketers design and run the testing program for you — properly powered, properly read — that's the heart of how we approach CRO and landing pages.

Questions we get

What does statistical significance mean in A/B testing?

It measures how confident you can be that the difference between your variants is real and didn't just happen by random chance. It's expressed as a confidence level: at 95%, a gap this large would show up only about 1 time in 20 if the variants truly performed the same. It does not tell you how big or valuable the difference is, only that it's unlikely to be noise.

What confidence level should I use — is 95% enough?

95% is the standard minimum for marketing A/B tests, and for most decisions it's enough. Pushing to 97–99% gives you more certainty and is worth it for high-stakes or hard-to-reverse changes. Below about 90%, you don't really have a result yet. The right bar depends on how costly it would be to act on a false winner.

Can I stop an A/B test as soon as it hits significance?

No — this is the most common way marketers ruin their own tests. Significance bounces above and below 95% early on purely by chance, so stopping the moment it looks significant means cherry-picking a lucky reading. Decide your sample size and run length before you start, and don't call the test until it gets there. This is known as the peeking problem.

How long should I run an A/B test, and how big a sample do I need?

Long enough to reach the sample size your target effect requires, and always across full business cycles — at least one to two weeks, in whole weeks, so weekday/weekend and pay-cycle patterns don't bias the result. Calculate the needed sample size up front from your baseline conversion rate, the minimum lift worth detecting, and your confidence level. Smaller effects need far more traffic to detect.

What's the difference between statistical and practical significance?

Statistical significance says the difference is real; practical significance asks whether it's big enough to matter. With enough traffic, a tiny 0.3% lift can be statistically significant yet practically meaningless. Always check both — is the result real, and is the size of the win worth making the change?

Igor Belogolovsky
CEO

Igor co-founded Clever Zebo in 2011 and has run paid acquisition for everyone from seed-stage SaaS to DTC brands. He’s allergic to vanity metrics and very fond of clean attribution.
