Case Study: retailer improves ROI with incrementality testing (2025)

Why do retailers overestimate marketing ROI?

Retail leaders often overestimate marketing ROI because they rely on observational attribution rather than causal evidence. Observational models struggle to separate correlation from causation when audiences self-select into exposures and when measurement breaks across devices and identities. Randomized experiments, often called lift or incrementality tests, remain the cleanest way to measure causal impact because they hold unobserved confounders constant by design.¹ ² Retailers that shift from last-click models to experiments usually discover that spend that looks efficient is not incremental, and that some mid-funnel or geographic tactics quietly drive profitable growth.³

What is incrementality testing in plain terms?

Incrementality testing measures the lift caused by a campaign by comparing outcomes between a test group and a clean control group that did not receive the ads. Platform-native implementations such as Meta Conversion Lift and Google's Conversion Lift use randomized controlled trials and report the difference in conversions, revenue, or visits that would not have happened without the ads.⁴ ⁵ Geo experiments extend the same logic to matched markets, where entire regions are randomized to treatment or control through geo-targeted delivery.⁶ Incrementality focuses on causal lift, not attributed conversions, which makes it more reliable for budget allocation decisions.¹
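To make the arithmetic concrete, the sketch below shows how a lift test readout turns test and control conversion rates into incremental conversions, relative lift, and cost per incremental conversion. All group sizes, counts, and the media cost are hypothetical figures for illustration, not results from the case study.

```python
# Minimal sketch of the arithmetic behind a lift test readout.
# The group sizes, conversion counts, and media cost are hypothetical.

test_users, test_conversions = 500_000, 6_200         # exposed (test) cell
control_users, control_conversions = 500_000, 5_000   # holdout (control) cell
media_cost = 150_000.0                                 # spend on the test cell

test_rate = test_conversions / test_users
control_rate = control_conversions / control_users

# Incremental conversions: conversions that would not have happened without ads.
incremental = (test_rate - control_rate) * test_users
relative_lift = (test_rate - control_rate) / control_rate
cost_per_incremental = media_cost / incremental

print(f"Incremental conversions: {incremental:,.0f}")
print(f"Relative lift: {relative_lift:.1%}")
print(f"Cost per incremental conversion: ${cost_per_incremental:,.2f}")
```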

How did the retailer define the problem?

The retailer, a national specialty chain with 250 stores and a fast-growing ecommerce channel, faced rising acquisition costs and flat revenue. Last-click reports suggested paid social and branded search were efficient. Executive leadership asked a sharper question: do these channels create net new demand, or do they mostly harvest existing demand? The team needed an evidence-based answer that would withstand scrutiny from finance, operations, and the board.

What methodology delivered credible evidence?

The team designed a two-tier program that combined platform lift tests and geo experiments. Meta Conversion Lift measured person-level incrementality for prospecting and retargeting.⁴ Google Ads Conversion Lift evaluated branded and non-branded search.⁵ In parallel, a 12-week matched-markets geo experiment randomized 28 statistically similar regions into treatment and control, with treatment regions receiving a pre-defined media boost across social, search, and display.⁶ The post-analysis used Bayesian structural time series to estimate the counterfactual and to compute cumulative incremental revenue with credible intervals.⁷ ¹² This structure created convergent evidence across identity-based and geography-based designs.
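A minimal sketch of that Bayesian structural time-series post-analysis is shown below, assuming the Python port of CausalImpact (pycausalimpact) is installed and that daily revenue is available for treatment regions with a control-region series as a predictor. The file name, column names, and dates are illustrative assumptions, not the retailer's actual data.

```python
# Minimal sketch of a CausalImpact-style post-analysis, assuming the
# pycausalimpact port is installed (pip install pycausalimpact).
# The file name, columns, and dates below are illustrative only.
import pandas as pd
from causalimpact import CausalImpact

# Daily series: treatment-region revenue ("y") and control-region revenue
# as a predictor ("x1"), indexed by date.
df = pd.read_csv("geo_daily_revenue.csv", index_col="date", parse_dates=True)
df = df[["y", "x1"]]

pre_period = ["2025-01-06", "2025-03-30"]   # before the media boost
post_period = ["2025-03-31", "2025-06-22"]  # 12-week treatment window

ci = CausalImpact(df, pre_period, post_period)
print(ci.summary())                  # average and cumulative lift with credible intervals
print(ci.summary(output="report"))   # narrative readout for the decision memo
```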

How were identity and data foundations handled?

The team stabilized identity resolution by aligning platform pixel events with server-side conversions and by auditing deduplication rules. They enforced consistent event naming and ensured each platform's test-cell assignment persisted for the full window. This governance reduced contamination risk between treatment and control and ensured consistent revenue mapping. Lift test eligibility rules followed platform guidance to avoid overlap and to maintain randomization integrity.⁴ ⁵

What controls protected validity and power?

The team designed for power, not convenience. Sample size and expected lift determined the minimum budget and duration required to detect effects with acceptable uncertainty.¹ Treatment adherence was monitored daily with pacing checks. The geo experiment used matched markets to reduce variance and to respect real operational constraints across stores.⁶ Seasonality and promotions were pre-planned and mirrored across cells where possible. The Bayesian model included components for seasonal spikes and local trends, which kept inference stable through retail volatility.⁷
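The sketch below illustrates the kind of power calculation this implies for a two-cell lift test: how many users each cell needs to detect a given relative lift in conversion rate. The baseline rate, minimum detectable lift, and error rates are assumptions for illustration, and the formula is the standard two-proportion approximation rather than any platform's internal method.

```python
# Minimal power-analysis sketch for a two-cell lift test: how many users
# per cell are needed to detect a given relative lift in conversion rate.
# Baseline rate, lift, and error rates below are assumptions for illustration.
from math import ceil
from scipy.stats import norm

baseline_rate = 0.010        # control conversion rate (assumed)
relative_lift = 0.10         # minimum detectable effect: +10% relative
alpha, power = 0.05, 0.80    # two-sided test, 80% power

p1 = baseline_rate
p2 = baseline_rate * (1 + relative_lift)
z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)

# Standard two-proportion sample-size approximation per cell.
pooled = (p1 + p2) / 2
n_per_cell = ((z_alpha * (2 * pooled * (1 - pooled)) ** 0.5
               + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
              / (p2 - p1) ** 2)

print(f"Users needed per cell: {ceil(n_per_cell):,}")
```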

What did the experiments reveal?

The experiments revealed three decisive insights. Prospecting on paid social produced strong incremental revenue with a positive return on ad spend after all costs. Retargeting produced modest lift that did not cover incremental costs in most segments. Branded search delivered near-zero incremental lift during the test windows, which indicated demand harvesting rather than demand creation. These results aligned between person-level platform lift studies and the independent geo experiment, which increased confidence in the findings.

How did the team translate lift into ROI?

The team computed iROAS, defined as incremental revenue divided by incremental spend, for each tactic and region. They used cumulative lift and its credible interval to express a range for financial planning.¹ The analysis included contribution margin, not just revenue, and accounted for media and measurement costs. Geo-level estimates were then combined with partial pooling to stabilize noisy regional estimates. The result was a portfolio view that ranked channels by causal effectiveness rather than by attributed conversions.
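A small sketch of that translation is shown below: cumulative lift and its credible interval become a margin-adjusted iROAS range. All figures are hypothetical, and the interval would come from the lift model's output rather than from this script.

```python
# Minimal sketch of translating a lift estimate into margin-based iROAS.
# All figures are hypothetical; the interval comes from the lift model's
# cumulative credible interval, not from this script.

incremental_revenue = 1_200_000.0              # cumulative lift, point estimate
revenue_interval = (850_000.0, 1_550_000.0)    # 95% credible interval on lift
contribution_margin = 0.42                     # margin rate on incremental revenue
incremental_spend = 400_000.0                  # media plus measurement cost

def iroas(revenue: float) -> float:
    """Margin-adjusted incremental return on ad spend."""
    return (revenue * contribution_margin) / incremental_spend

point = iroas(incremental_revenue)
low, high = iroas(revenue_interval[0]), iroas(revenue_interval[1])
print(f"iROAS (margin basis): {point:.2f} [{low:.2f}, {high:.2f}]")
```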

What financial impact did budget reallocation create?

The retailer reallocated 28 percent of paid media from low-lift retargeting and branded search into lookalike prospecting and creative diversification. Portfolio iROAS improved materially in the following quarter, and same-store sales increased in treatment-like regions with matched promotional calendars. The evidence base persuaded finance to expand the high-lift cells and to institutionalize quarterly experimentation as an operating rhythm.

How does incrementality compare to MMM and attribution?

Incrementality experiments answer the causal question for a defined time and audience. Marketing mix modeling estimates long-run elasticities from historical data and can be calibrated to experiment outcomes for better realism. Robyn, an open-source MMM from Meta, lets teams incorporate automated adstocking, saturation, and ground-truth calibration using lift results.⁹ ¹⁴ Last-click and multi-touch attribution explain exposure paths but do not prove causality when selection bias is present.¹ The most resilient measurement stacks use experiments as anchors, MMM for portfolio planning, and pragmatic attribution for execution hygiene.
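For readers unfamiliar with adstock and saturation, the sketch below shows a simplified version of the transformations Robyn-style MMMs apply to media spend. This is not Robyn's API; the geometric decay and Hill shape parameters are illustrative assumptions only.

```python
# Simplified sketch of the adstock and saturation transformations that
# Robyn-style MMMs apply to media spend. This is not Robyn's API; the
# decay and shape parameters below are illustrative only.
import numpy as np

def geometric_adstock(spend: np.ndarray, decay: float = 0.6) -> np.ndarray:
    """Carry over a share of past spend into each period (memory effect)."""
    adstocked = np.zeros_like(spend, dtype=float)
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry
        adstocked[t] = carry
    return adstocked

def hill_saturation(adstocked: np.ndarray, half_sat: float, shape: float = 2.0) -> np.ndarray:
    """Diminishing returns: response flattens as adstocked spend grows."""
    return adstocked ** shape / (adstocked ** shape + half_sat ** shape)

weekly_spend = np.array([0, 50, 80, 120, 90, 0, 0, 40], dtype=float)  # $k, hypothetical
response = hill_saturation(geometric_adstock(weekly_spend), half_sat=150.0)
print(np.round(response, 3))
```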

What risks or limitations should executives consider?

Executives should consider test contamination, underpowered designs, and operational drift. Lift tests require disciplined randomization and adherence.⁴ ⁵ Geo experiments require careful market matching and safeguards against media spillover across borders.⁶ Experiments can be costly when expected lifts are small, which is why power analysis and matched designs matter.¹ Synthetic control and Bayesian time series help when clean experimentation is impractical, but these approaches still rely on assumptions that must be tested and transparently reported.⁷ ¹²

Which measurement operating model works in practice?

Successful teams treat experimentation as a product, not a project. They maintain a backlog of decisions that require causal answers. They standardize designs, pre analysis plans, and reporting templates. They automate counterfactual estimation using maintained code and peer review. They align cadence to business rhythms such as promotional cycles and finance planning. They socialize learnings with narrative memos that link lift to margin, inventory, and service operations. This model keeps measurement credible and reusable across quarters.

What happened inside the retailer’s execution playbook?

The retailer embedded three changes. First, the media team adopted test-to-scale, which means no new line item received sustained investment without a passing lift test. Second, the analytics team added an always-on geo framework so that national bursts could be converted into evidence with minimal new work.⁶ ¹¹ Third, the merchandising and CX teams integrated learnings into creative briefs and onsite experience, which increased relevance and reduced bounce rates. The flywheel improved because every quarter added fresh causal evidence rather than more attribution noise.

How should leaders get started in 30 days?

Leaders should start small and rigorous. Select one high-spend tactic, such as retargeting, and one promising prospecting tactic. Design a single-cell, in-platform lift test with enough power.⁴ ⁵ Stand up a parallel geo test in ten matched regions using a proven design toolkit.⁶ ¹¹ Implement Bayesian post-analysis to estimate cumulative lift and uncertainty.⁷ Close the loop with a finance-reviewed iROAS calculation and a decision memo. This first evidence brick builds momentum for a durable measurement program.
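As a starting point for the geo test, the sketch below illustrates one simple way to pair candidate markets on pre-period similarity before randomizing one market per pair into treatment. Google's matched_markets library provides a more rigorous toolkit; the file name, column layout, and greedy pairing here are hypothetical simplifications.

```python
# Simple illustration of pairing candidate markets on pre-period similarity
# before randomizing one market per pair into treatment. Google's
# matched_markets library offers a more rigorous toolkit; the file name,
# columns, and greedy pairing here are hypothetical simplifications.
import random
import pandas as pd

# Weekly pre-period revenue, one column per market, indexed by week.
sales = pd.read_csv("pre_period_weekly_sales.csv", index_col="week")

corr = sales.corr()            # pairwise similarity of market trends
pairs, used = [], set()

for market in sales.columns:
    if market in used:
        continue
    # Best remaining match by pre-period correlation.
    candidates = corr[market].drop(labels=list(used) + [market]).sort_values(ascending=False)
    if candidates.empty:
        break
    best = candidates.index[0]
    used.update({market, best})
    treated = random.choice([market, best])   # randomize within the matched pair
    pairs.append({"pair": (market, best), "treatment": treated})

for p in pairs:
    print(p)
```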

What is the lasting takeaway for CX and service transformation?

Customer experience improves when measurement rewards true value creation. Incrementality testing refocuses the organization on customers who change behavior because of the experience and the message. Experiments supply the evidence required to prioritize investments that raise lifetime value, reduce friction, and grow profitable demand. Leaders who make room for experiments in the operating system create a culture where decisions compound and where ROI stories are not just plausible but proven.¹ ²


FAQ

How does Meta Conversion Lift measure incremental conversions for retail campaigns?
Meta Conversion Lift uses randomized controlled trials to compare outcomes between people eligible to see ads and a holdout group, reporting incremental conversions and lift that would not have occurred without the ads.⁴

What is a geo experiment and when should a retailer use it?
A geo experiment randomizes entire geographic regions into treatment and control and delivers media through geo targeting to estimate incremental impact at market level. It is ideal for omnichannel retailers that need to measure store and ecommerce effects together.⁶

Which tools support counterfactual modeling for incrementality analysis?
Bayesian structural time series models, including the CausalImpact implementation, estimate the counterfactual trajectory and provide pointwise and cumulative lift with credible intervals when randomized experiments are not feasible or to complement them.⁷ ¹²

Why can last click or MTA overstate ROI compared to experiments?
Attribution methods are vulnerable to selection bias because users who see or click ads often differ from those who do not. Randomized experiments control for these differences and provide unbiased causal estimates of lift.¹

Which MMM solution can calibrate to experiment results for portfolio planning?
Robyn, an open source Marketing Mix Modeling package from Meta Marketing Science, supports automation, adstocking, saturation, and calibration with ground truth from experiments to improve planning decisions.⁹ ¹⁴

Which platforms offer built in incrementality testing for retail use cases?
Meta provides Conversion Lift studies within Meta Business Suite, and Google Ads provides Conversion Lift for causal impact measurement of ad exposures on conversions.⁴ ¹⁵

Which open resources help teams design matched market tests quickly?
Google’s matched markets library and public papers on geo experiments provide templates and notebooks for designing and analyzing regional experiments with statistical rigor.⁶ ¹¹


Sources

  1. The Unfavorable Economics of Measuring the Returns to Advertising — Randall A. Lewis, Justin M. Rao — 2015 — The Quarterly Journal of Economics. https://academic.oup.com/qje/article-abstract/130/4/1941/1914592 ⁸

  2. Measuring Ad Effectiveness Using Geo Experiments — Vaver, Jon; Koehler, Jim — 2011 — Google Research. https://services.google.com/fh/files/blogs/geo_experiments_final_version.pdf ⁶

  3. Measuring Ad Effectiveness Using Geo Experiments — Google Research overview page — 2011 — Google Research. https://research.google/pubs/measuring-ad-effectiveness-using-geo-experiments/ ¹

  4. About Conversion Lift — Meta Business Help Center — 2023 — Meta. https://www.facebook.com/business/help/221353413010930/ ⁵

  5. About Conversion Lift — Google Ads Help — 2025 — Google. https://support.google.com/google-ads/answer/12003020 ¹⁵

  6. Estimating Ad Effectiveness using Geo Experiments in a Time-Based Regression Framework — Vaver, Jon; Koehler, Jim — 2012 — Google Research. https://research.google.com/pubs/archive/45950.pdf ¹⁶

  7. Inferring causal impact using Bayesian structural time-series models — Brodersen, Gallusser, Koehler, Remy, Scott — 2015 — Annals of Applied Statistics. https://arxiv.org/abs/1506.00356 ²

  8. CausalImpact R package documentation — Google — 2025 — CRAN mirror. https://mirror.las.iastate.edu/CRAN/web/packages/CausalImpact/CausalImpact.pdf ¹²

  9. Robyn: Open Source MMM from Meta Marketing Science — Meta — 2024 — GitHub README. https://github.com/facebookexperimental/Robyn/blob/main/README.md ⁹

  10. Robyn Releases and Version History — Meta — 2024 — GitHub Releases. https://github.com/facebookexperimental/Robyn/releases ¹⁹

  11. Matched Markets library for geo experiments — Google — 2025 — GitHub. https://github.com/google/matched_markets ¹¹
