Your Meta ads A/B tests are misleading
Tests on Instagram, Facebook, and similar platforms are not real A/B tests because each version is shown to a different audience.
New to Science Says? This is a 3min practical summary of a scientific study 🎓 Join 31,439 marketers who use science, not flawed opinions 📈 Subscribe here.
This insight is brought to you by… KeepCart
Increase your margins 4-15% with discount coupon protection.
Someone is probably abusing your coupons right now. And the consequences on your margins are bigger than you think.
That’s why you need a partner like KeepCart. Monitor leaks on sites like Honey, CapitalOne, RetailMeNot - and stop them from eating into your profits. Try it for free for 14 days.
Want to sponsor Science Says? Here’s all you need to know.
📝 Intro
You’re launching a video ad campaign on both Meta and TikTok. You’ve prepped two creatives to test, but to save resources, you A/B test only on Meta. After a few weeks, version A shows a 30% lower CPA, so you confidently put your entire budget behind it on both platforms.
You’re sure the creative you chose will get you the best results - data doesn’t lie… right?
Science says you might have to be more careful, as A/B testing data from Meta might not be as reliable as we thought.
P.S.: Last week, I was invited to an AI Horizons at Wharton talk with Prof. Stefano Puntoni, where we went through our favourite insights on how to design smarter chatbots and where we are in our understanding of AI-human interactions and psychology.
🤖 You can check out the 29-minute recording for free.
Want hundreds more insights like these? Explore all Science Says insights here.
Digital ad A/B tests tell you much less than you’d think
Topics: Ads | Social Media
For: Both B2C and B2B
Research date: August 2024
Universities: Southern Methodist University, University of Michigan
📈 Recommendation
When A/B testing ads on Meta or other digital ad platforms, be careful how you interpret results. They can be misleading because each version is shown to a different audience (e.g. one version is shown to more women than men).
Your A/B testing is:
Reliable if you're optimizing a campaign and you stick to the same platform you tested on (e.g. don’t generalize Facebook results to YouTube).
Misleading if you're testing different products, messages, prices, landing pages, or similar variables to validate a new product or strategy.

🎓 Findings
A/B tests on Meta are unreliable for testing broader marketing strategies and tactics, beyond optimizing campaigns on that same platform.
In a field experiment with Meta ads, combined with models and simulations comparing what was expected with what actually happened, researchers found many inconsistencies. For example:
Different ads were shown to different gender mixes, with some audiences being up to 57% women.
Ads that featured a domestic violence officer were more likely to be shown to women (52.2% female), while ads with a male patrol officer were more likely to be shown to men (52.6% male).
Emotional appeals were targeted more at women than at men.
The distortion is stronger when genders respond differently to the creatives (e.g. the more men convert on a male patrol officer ad, the more Meta shows that ad to men rather than women).
🧠 Why it works
Even during an A/B test, Meta’s delivery algorithm aims to maximize performance by matching each creative with the audience most likely to respond to it. This is known as divergent delivery (or skewed delivery).
To do this, the algorithm targets creatives towards the most responsive audiences.
This leads to the different versions being shown to different audiences.
As a result, A/B tests don’t reveal which ad is objectively better, but which creative performs best with the audience it was shown to.
This makes A/B test results useful only for optimizing campaigns within that specific platform, with that same audience.
However, if used to test broader marketing decisions (e.g. messaging, product validation), or to choose which creative to use on other platforms (e.g. running Meta’s top performer on TikTok or YouTube), A/B test results are unreliable.
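To make the mechanism concrete, here is a minimal Python sketch of divergent delivery. The conversion rates and audience shares below are hypothetical (not from the study); the only point is to show how skewed delivery can flip which creative looks better.

```python
# Minimal sketch of divergent delivery (hypothetical numbers, not from the study).
# Two creatives (A, B), two audience segments, and a true conversion rate per cell.
true_rates = {
    ("A", "men"): 0.020, ("A", "women"): 0.060,
    ("B", "men"): 0.055, ("B", "women"): 0.050,
}

def observed_rate(creative, share_women):
    """Blended conversion rate when `share_women` of impressions go to women."""
    return (share_women * true_rates[(creative, "women")]
            + (1 - share_women) * true_rates[(creative, "men")])

# Truly random delivery: both creatives see the same 50/50 audience.
random_a = observed_rate("A", 0.50)   # 4.00%
random_b = observed_rate("B", 0.50)   # 5.25% -> B is genuinely better overall

# Divergent delivery: the platform routes each creative to the segment
# predicted to respond best (A mostly to women, B mostly to men).
skewed_a = observed_rate("A", 0.90)   # 5.60% -> A now *looks* like the winner
skewed_b = observed_rate("B", 0.30)   # 5.35%

print(f"Random delivery:    A={random_a:.2%}  B={random_b:.2%}")
print(f"Divergent delivery: A={skewed_a:.2%}  B={skewed_b:.2%}")
```

Under balanced delivery B wins, but once each creative is routed to its most responsive segment, A appears to win, which is exactly the kind of result you could misread as “A is the better creative everywhere.”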
High-ranking sites don’t just have great content—they have the right backlinks. Google’s algorithm favors authority, and the fastest way to build it is through high-DR links from trusted sources.
B2B SaaS brands like Pitch, Surfshark, and LegalZoom work with dofollow.com to earn links that move the needle. More trust, better rankings, higher conversions.
This announcement was sponsored. Want your brand here? Click here.
✋ Limitations
The study only looked at Meta (ads on Facebook and Instagram), although the effect is likely present on other digital advertising platforms too (e.g. Google Ads).
The study only considered audiences in Detroit aged 18-40; broader or narrower audience parameters might affect how strong the distortion is.
The study only looked at differences in gender. The audiences likely differed in other ways that weren’t measured, since they depend on proprietary factors Meta uses to target audiences.
The study only looked at specific ads and audiences (e.g. recruitment ads in Detroit). Although not tested, the effect is likely similar for A/B tests in other industries and with other audiences.
🏢 Companies using this
Marketers widely consider A/B tests, including those on Meta ads, to be reliable.
Authoritative sources like Forbes advise marketers to use A/B testing to gather insights into consumers’ preferences and use them for overall strategy. However, insights from this type of A/B test do not provide a reliable picture of consumers and could lead to suboptimal decisions.
Aligned with Forbes’ recommendation, the social analytics platform Social Insider suggests using social media A/B tests to validate new ideas and messaging. This flawed approach is widely endorsed and used.
As a workaround to check whether digital A/B test results are valid for other purposes (e.g. pricing or product messaging), HubSpot suggests running A/A tests: test the winning creative against an identical copy of itself; if results are the same for both, the test can be considered reliable. A simple sketch of this check follows after this section.
Meta’s official website says their A/B test audiences are “random” to assure marketers about the reliability of the data. This study shows that, in practice, that’s not the case.
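If you want to try HubSpot’s A/A workaround, a simple two-proportion z-test can tell you whether the gap between two identical cells is bigger than chance. This is a minimal sketch with hypothetical numbers and a hand-rolled test (not HubSpot’s or the study’s method):

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p_value

# Hypothetical A/A test: the exact same creative in both cells.
z, p = two_proportion_z_test(conv_a=310, n_a=25_000, conv_b=268, n_b=24_500)
print(f"z = {z:.2f}, p = {p:.3f}")
# A small p here would suggest delivery (not the creative) is driving the
# gap, so don't read broader consumer insights into the original A/B test.
```

If the difference between two identical cells looks “significant,” delivery differences, not the creative, are the likely cause, and the original A/B result should be treated as platform-specific optimization data only.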
⚡ Steps to implement
If your goal is to know which ads will perform best on a specific advertising platform (e.g. you’re A/B testing on Meta to find the best creative for a Meta-only campaign), then carry on using A/B testing.
Repeat A/B tests for creatives on each platform you want to run them on.
Don’t generalize A/B test findings from digital platforms to other marketing decisions (e.g. using the winning ad message for website copy) as the data cannot be generalized to wider consumer insights.
If you want to validate a new idea (e.g. testing a new product or messaging), use alternative ways to gather insights. For example, consider surveys, interviews, or A/B tests directly on websites and landing pages.
If you aren’t already, pre-optimize your ads using the Science-based Playbook of Ad Creatives.
🔍 Study type
Field experiment (14 ads, 533,161 impressions over 3 weeks, reaching 96,150 unique users).
📖 Research
Where A/B Testing Goes Wrong: How Divergent Delivery Affects What Online Experiments Cannot (and Can) Tell You About How Customers Respond to Advertising. Journal of Marketing (August 2024).
🏫 Researchers
Michael Braun. Southern Methodist University
Eric Schwartz. University of Michigan
Remember: This is a new scientific discovery. In the future it will probably be better understood and could even be proven wrong (that’s how science works). It may also not be generalizable to your situation. If it’s a risky change, always test it on a small scale before rolling it out widely.
What did you think of today's insight? Help me make the next insights 🎓 even more useful 📈
Here is how else Science Says can help your marketing:
📈 Join the Science Says Platform to unlock all 250+ insights, real-world case studies, and exclusive playbooks
📘 Boost your sales and profits with topic-specific Science-based Playbooks (e.g. Pricing, Ecommerce, SaaS, AI Best Practices)
🔬 Get insights and playbooks tailored to your team’s needs. My team of PhDs and I regularly help leading brands in FMCG, retail, and tech. Find out more.
🎓 It took 3 of us 13.5 hours to accurately turn this 25-page research paper into this 3min insight.
If you enjoyed it please share it with a friend, or share it on LinkedIn and tag me (Thomas McKinlay), I’d love to engage and amplify!
If this was forwarded by a friend you can subscribe below for $0 👇