If the current design beats the new design in an A/B test (a.k.a. “split test”), the experiment is called a failure. But it’s the best kind of failure there is.
Let’s say you have an idea that you think will work. You try that idea out… and find out that no, it doesn’t work. In regular life, we tend to call that failure. In a scientific context, however, it’s not failure at all.
A/B tests are little science experiments we run to improve our products and increase revenue. But we tend to use regular-life terms when discussing the outcomes: if the new design doesn’t beat the current one, we call it a “failure”.
That’s okay, as long as we remember that in A/B testing, failures can still be successes.
Success, and two kinds of failure
In A/B tests, you try out a prospective design (B) concurrently against your existing design (A), resulting in one of these four basic outcomes:
1. Positive result (Success) – B performed better than A (by enough to meet your threshold for success).
Going into a test, your philosophy may simply be “better is better.” Or you might have predetermined a performance delta that design B must exceed to be worth adopting (a rough sketch of such a decision rule follows this list).
2. Neutral result (“Failure”) – B performed effectively the same as A (or the difference was too small to matter for your purposes).
3. Negative result (“Failure”) – B performed worse than A.
4. Invalid test (Actual failure) – The test was compromised in some fashion such that the data is useless.
Perhaps design B had software bugs that design A did not. Or mechanisms were not put in place at the start of the test to collect the data needed for evaluation.
Whatever the reason, if you determine the test was compromised, toss the results, fix the test-infrastructure problem, and run another test to determine the real outcome of B vs. A.
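To make these outcome categories concrete, here’s a minimal sketch in Python of how a finished test might be classified. The visitor counts, the 5% significance level, the `min_relative_lift` threshold, and the `classify_outcome` function itself are all hypothetical; real experiments usually lean on an established stats library or testing platform rather than a hand-rolled z-test, and spotting an invalid test (outcome 4) remains a judgment call the code can’t make for you.

```python
from math import sqrt, erfc

def classify_outcome(conv_a, n_a, conv_b, n_b,
                     alpha=0.05, min_relative_lift=0.02):
    """Label a finished A/B test as positive, neutral, or negative.

    conv_a, n_a -- conversions and visitors for the current design (A)
    conv_b, n_b -- conversions and visitors for the new design (B)
    alpha       -- significance level for a two-sided z-test
    min_relative_lift -- the predetermined delta B must beat A by
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b

    # Two-proportion z-test using the pooled conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail probability

    if p_value >= alpha:
        return "neutral"       # no detectable difference between A and B
    relative_lift = (p_b - p_a) / p_a
    if relative_lift >= min_relative_lift:
        return "positive"      # B beat A by enough to matter
    if relative_lift > 0:
        return "neutral"       # statistically real, but too small to act on
    return "negative"          # B performed worse than A

# Hypothetical counts: 10,000 visitors per arm.
print(classify_outcome(conv_a=800, n_a=10_000, conv_b=905, n_b=10_000))  # -> positive
```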
When B performs way better than A, it’s an obvious win. But actually, all three outcomes of a valid A/B test—positive, neutral, and negative results—should be considered successes.
Here’s why.
How A/B test “failures” can be successes
You learn something from each A/B test
At the very least, you learned that your design didn’t work as well as you thought it would.
It sounds silly, but if you hadn’t tested that, you wouldn’t know. Seriously. You might’ve been arguing about the theoretical benefits of that design change in meetings for years to come. Now you can move on and argue about something else.
If you go further than that simple lesson, there is…
An opportunity to learn even more
Okay, so your design didn’t perform as expected. Think about the results, make connections, capture conclusions, and figure out what to do next.
Is the underlying idea still sound?
Perhaps you should try a variation on the design and see if that works better. If that doesn’t work, maybe the idea could be expressed in a completely different way.
Did your new design fail because you don’t understand your users as well as you should?
Only you can figure this out. If you think you have a knowledge gap, you can plan some research to fill it.
But if you honestly think your user research game is solid, you might be right; an A/B test failure doesn’t prove you didn’t do enough user research.
Did you just learn something about your users that could only come from A/B testing?
All the interviews, surveys, ethnography, and usability sessions in the world are not going to tell you whether your users are 12% more likely to click a button labeled “Free to Try!” than one labeled “Try for Free!”. That’s the kind of thing you can only learn about your users from A/B testing.
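As a rough illustration of why, consider how much traffic it takes to detect a difference of that size. This is a minimal sketch with entirely made-up numbers: the 5% baseline click rate is an assumption, and the formula is a standard normal-approximation shortcut, not a substitute for a proper power calculation.

```python
def visitors_per_arm(p_baseline, relative_lift, z_alpha=1.96, z_beta=0.84):
    """Rough visitors needed per variant to detect a relative lift
    at ~5% significance and ~80% power (normal approximation)."""
    p_b = p_baseline * (1 + relative_lift)
    variance = p_baseline * (1 - p_baseline) + p_b * (1 - p_b)
    return (z_alpha + z_beta) ** 2 * variance / (p_b - p_baseline) ** 2

# Hypothetical 5% baseline click rate, looking for that 12% relative lift.
print(round(visitors_per_arm(p_baseline=0.05, relative_lift=0.12)))  # roughly 21,900
```

Under those assumptions you’d need on the order of 20,000 visitors per variant, a volume of evidence no interview schedule or usability lab will ever produce.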
A springboard for better ideas
A/B test failures answer a question, pose new questions, and inspire new thinking. The more you learn about what works and what doesn’t work, the less scattershot your future design changes will be.
It was probably cheaper than blindly rolling out the design
If an A/B test has a negative outcome, the impact on your user base is minimal: only the “B” participants saw the change, over a limited period of time.
If you had instead pushed that design directly out to all your users, it could’ve meant significant lost revenue. And because you couldn’t compare results to a concurrent control group, it would have taken longer to notice and would have been harder to determine the cause.
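A back-of-envelope comparison makes the point. All numbers here are hypothetical: assume a design that quietly cuts revenue per visitor by 3%, a two-week test that exposes half of your traffic, and a blind rollout that takes eight weeks for someone to notice and diagnose.

```python
WEEKLY_VISITORS = 100_000      # hypothetical traffic
REVENUE_PER_VISITOR = 2.00     # hypothetical average revenue per visitor, in dollars
REVENUE_DROP = 0.03            # the bad design quietly cuts revenue per visitor by 3%

def lost_revenue(traffic_share, weeks):
    """Revenue lost while the bad design is live for a share of traffic."""
    return WEEKLY_VISITORS * weeks * traffic_share * REVENUE_PER_VISITOR * REVENUE_DROP

ab_test_cost = lost_revenue(traffic_share=0.5, weeks=2)        # only the B arm is exposed
blind_rollout_cost = lost_revenue(traffic_share=1.0, weeks=8)  # everyone, until someone notices

print(f"A/B test exposure: ${ab_test_cost:,.0f}")        # $6,000
print(f"Blind rollout:     ${blind_rollout_cost:,.0f}")  # $48,000
```

Under those assumptions the contained test costs about $6,000 in lost revenue versus $48,000 for the blind rollout, and the test also hands you a control group that explains why the number dropped.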
For a neutral-outcome design, a direct roll-out might be a bit less expensive than testing it first—but you’d never really know if it was or not.
Conclusion
A/B tests allow you to:
- measure the relative success of a new design;
- keep failures from annoying your users and negatively impacting revenue; and
- gain valuable understanding about your users.
You’ll have more failures than big wins. That’s fine. Resist the temptation to bury the failures. By examining them, you can turn each one into a success worth sharing.
Want to get more insight into how other companies conduct A/B testing? Check out our article: How Netflix Does A/B Testing (And You Can, Too)
Learn about how BetaTesting can help your company launch better products with our beta testing platform and huge community of global testers.