A/B Test Statistical Significance Calculator

What is statistical significance?

When you run an A/B test, one variant almost always "wins" by some margin. Statistical significance answers the question that matters: is that difference real, or could it plausibly have come from random chance? A result is conventionally called significant at the 95% level when the p-value is below 0.05, meaning a difference this large would show up less than 5% of the time if the two variants actually performed the same.

This calculator runs a two-tailed two-proportion z-test — the standard method for comparing two conversion rates — and translates the result into plain language: your confidence level, the p-value, and a clear verdict.
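If you'd like to see the math behind the verdict, here is a minimal sketch of a two-tailed two-proportion z-test using only Python's standard library. The function name, argument names, and the pooled standard error are our assumptions for illustration; the calculator's own implementation may differ in its details.

```python
import math


def two_proportion_z_test(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-tailed two-proportion z-test with a pooled standard error.

    Illustrative sketch: names and return shape are hypothetical.
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b

    # Pooled conversion rate under the null hypothesis that A and B are identical.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        raise ValueError("Cannot run the test when the pooled rate is 0 or 1.")

    z = (rate_b - rate_a) / std_err

    # Two-tailed p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))

    return {
        "rate_a": rate_a,
        "rate_b": rate_b,
        "z": z,
        "p_value": p_value,
        "confidence": 1 - p_value,  # assumed convention: confidence = 1 - p
        "significant": p_value < 0.05,
    }
```

For example, two_proportion_z_test(1000, 100, 1000, 130) compares a 10% control against a 13% variation and gives a p-value of about 0.035, which clears the 95% bar.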

How to use it

Enter the total visitors each variant received.
Enter the number of conversions (signups, sales, leads — whatever you're measuring).
Read the verdict: conversion rates for each variant, the relative uplift, and whether you've reached significance.

What the numbers mean

Confidence

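A shorthand for how strongly the data argues against the two variants being identical. 95%+ is the common bar; 99%+ is very strong. Strictly speaking, it is not the probability that the winner is genuinely better; it reflects how unlikely the observed difference would be if there were no real difference.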

P-value

The probability of seeing a difference at least this large if the two variants were actually identical. Lower means stronger evidence: under 0.05 means significant at 95%.

Relative uplift

How much better the variation converts versus the control, in percentage terms. A jump from 10% to 13% is a 30% relative uplift.
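The arithmetic is simple enough to check by hand. As a small sketch (the function name is ours, chosen for illustration):

```python
def relative_uplift(control_rate, variation_rate):
    """Relative change of the variation versus the control, as a fraction."""
    if control_rate <= 0:
        raise ValueError("Control rate must be greater than zero.")
    return (variation_rate - control_rate) / control_rate
```

With the example above, relative_uplift(0.10, 0.13) comes out at roughly 0.30, or 30%. The absolute difference is only 3 percentage points, so it's worth being clear about which of the two you're quoting.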

A word of caution

Significance isn't the same as "enough data." Small samples can show big swings that vanish with more traffic, and calling a test too early is one of the most common (and expensive) mistakes in marketing. Let tests run to a planned sample size, and beware of peeking: checking results repeatedly and stopping the moment they cross the threshold inflates your false-positive rate. If you'd like a second pair of (senior) eyes on your testing program, that's literally our favorite thing to do.
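To make "planned sample size" concrete, here is a rough sketch using a standard normal-approximation formula for comparing two proportions. The 80% power default, the function name, and the parameter names are assumptions for illustration, not settings taken from this calculator.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, expected_rate, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-tailed test.

    Illustrative sketch: alpha=0.05 and power=0.80 are assumed defaults.
    """
    if baseline_rate == expected_rate:
        raise ValueError("Rates must differ to plan a sample size.")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power

    p_bar = (baseline_rate + expected_rate) / 2
    # Spread under the null (both arms at the average rate) and under the alternative.
    null_term = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * math.sqrt(
        baseline_rate * (1 - baseline_rate) + expected_rate * (1 - expected_rate)
    )

    n = (null_term + alt_term) ** 2 / (expected_rate - baseline_rate) ** 2
    return math.ceil(n)
```

Under those assumptions, detecting a lift from 10% to 13% takes somewhere around 1,800 visitors per variant, and smaller lifts need far more. Deciding that number before launch is what protects you from stopping on a lucky streak.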

Frequently asked questions

What confidence level should I aim for?

95% is the standard in marketing and much of science. For high-stakes or hard-to-reverse decisions, set the bar at 99% before you start. For quick, low-risk iterations some teams accept 90%; just know you're taking on more risk of a false positive.

My result isn't significant. What now?

Usually it means you need more data, or the true difference is small. Keep the test running toward a pre-planned sample size, or test a bolder change that's more likely to move the needle.

Does this work for more than two variants?

This tool compares two variants (A vs B). Multivariate and multi-arm tests need corrections for multiple comparisons — happy to help you set those up properly.
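As one example of what such a correction looks like, here is a minimal sketch of the Bonferroni method, which divides the significance threshold by the number of comparisons. It's a simple and conservative option, not necessarily the right one for every program, and the function name is ours.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which comparisons stay significant after a Bonferroni correction."""
    if not p_values:
        raise ValueError("Provide at least one p-value.")
    # Each comparison must beat alpha divided by the number of comparisons.
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]
```

With three variations tested against one control, the per-comparison threshold drops from 0.05 to about 0.0167, so bonferroni_significant([0.04, 0.01, 0.20]) returns [False, True, False]. A p-value of 0.04 that would pass in a plain A/B test no longer does.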

Is this the same math the big testing tools use?

It's the classic frequentist two-proportion z-test that underpins most A/B testing platforms. Some tools layer on Bayesian methods or sequential testing — useful, but the core idea is the same.
