Why should leaders test ad incrementality now?
Leaders confront rising acquisition costs and opaque attribution. A controlled experiment gives you a clean answer to the only question that matters: what changed because of the ads? A holdout test separates similar audiences into exposed and control groups, then measures the difference in downstream outcomes such as sales or sign-ups. Platforms such as Google Ads and Meta provide native Conversion Lift workflows that randomize assignment and compute uplift.¹ ² Ghost ads extend this idea with a platform-side “shadow” that records when an impression would have been served to a control user, which reduces cost and bias while preserving auction dynamics.³
What is a holdout test in plain language?
A holdout test is a randomized controlled trial in ad delivery. The platform puts a share of eligible users into a control cell and withholds ads for that cell. It assigns the rest to treatment and lets campaigns run as usual. At the end, the analyst compares conversions between cells to estimate incremental lift, not just observed volume. Google and Meta both define Conversion Lift as the increase in conversions directly caused by ad exposure relative to the holdout.¹ ² This definition avoids double counting with organic demand and isolates causal impact in a way that last click or media mix models cannot.¹
What are ghost ads and why do they matter?
Ghost ads are an experimentation method where the platform or DSP flags a “ghost impression” for a control user at the moment a real auction would have served the test ad. The system logs the would-have-served event and prevents delivery in control, so spend is not wasted on PSAs or artificial inventory. The method maintains auction symmetry and works with real-time bidding and optimization.³ It can reduce the budget cost of testing and improve precision compared with PSA control designs, especially when the platform optimizes delivery mid-flight.³ ⁵ The Trade Desk and other buy-side platforms describe conversion lift using ghost bidding to avoid control spend.⁴
When should teams prefer holdout vs ghost ads?
Teams should select holdouts when the ad platform offers first-party lift testing, straightforward setup, and enough eligible conversions. Google Ads and Meta provide end-to-end flows that suit in-platform campaigns and privacy constraints.¹ ² Teams should prefer ghost ads when they buy across exchanges through a DSP, need lower test cost, or face algorithmic optimization that would contaminate classic PSA control groups.³ ⁵ Cross-exchange buying and retargeting often benefit from predicted ghost ads, a variant that simulates auctions to tag would-have-served impressions where direct integration is not possible.⁵
How does holdout testing work on Google Ads and Meta?
A practical holdout test follows a simple sequence. First, define one outcome that the platform can attribute with confidence, such as purchases or app installs. Second, segment a clearly eligible audience and randomize assignment to exposed and control. Third, run the campaign until the platform’s power calculator indicates sufficient sample or for a fixed window, then read the lift report. Google Ads details user-based Conversion Lift where test and control are split and downstream conversions are compared over a defined window.¹ ⁸ Meta documents Conversion Lift with random assignment, a holdout that is not served the campaign, and a study report that presents incremental results.² Many teams use a 10 to 20 percent holdout for 10 to 14 days to balance power and opportunity cost.⁹
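The power step above can be sketched with a standard two-proportion sample-size formula. This is a minimal illustration, not a platform calculator: the function name `required_sample_per_cell`, the 1 percent baseline conversion rate, and the 10 percent relative lift are all assumptions chosen for the example.

```python
import math

def required_sample_per_cell(p_control, rel_lift, alpha=0.05, power=0.8):
    """Normal-approximation sample size per cell for detecting a relative
    lift in conversion rate between two equal cells (two-sided test)."""
    p_treat = p_control * (1 + rel_lift)
    z_alpha = 1.959964  # ~ Phi^-1(0.975) for alpha = 0.05, two-sided
    z_beta = 0.841621   # ~ Phi^-1(0.80) for 80% power
    p_bar = (p_control + p_treat) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p_control * (1 - p_control)
                                + p_treat * (1 - p_treat))) ** 2
    return math.ceil(num / (p_treat - p_control) ** 2)

# Example: 1% baseline conversion, hoping to detect a 10% relative lift.
n = required_sample_per_cell(0.01, 0.10)  # on the order of 160,000 users per cell
```

Larger baseline rates or larger expected lifts shrink the required sample, which is why high-volume outcomes such as site visits power faster than purchases.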
How do ghost ads run inside a DSP?
A DSP that supports ghost bidding assigns each eligible user to treatment or control at bid time. If the user is treatment, the DSP proceeds to bid. If the user is control, the DSP records that it would have bid and suppresses delivery, creating a logged ghost impression for fair comparison.⁴ This design avoids PSA waste and keeps the pacing and optimization logic stable because the model sees the same stream of opportunities.³ ⁴ In predicted ghost ads, the platform runs a simulated auction to mark would-have-served events when direct auction participation is not available.⁵
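The bid-time branch described above can be sketched as a deterministic assignment plus a ghost log. Every name here (`assign_cell`, `handle_bid_opportunity`, the salt, the 10 percent holdout) is an illustrative assumption, not any DSP's actual API.

```python
import hashlib

def assign_cell(user_id: str, salt: str = "lift-test-1", holdout: float = 0.10) -> str:
    """Deterministically assign a user to control or treatment by hashing
    the user id; the same user always lands in the same cell."""
    h = int(hashlib.sha256((salt + user_id).encode()).hexdigest(), 16)
    return "control" if (h % 10_000) / 10_000 < holdout else "treatment"

def handle_bid_opportunity(user_id, bid, log):
    """At bid time: treatment users get a real bid; control users get a
    logged ghost impression marking that we would have bid."""
    if assign_cell(user_id) == "treatment":
        log.append({"user": user_id, "event": "served", "bid": bid})
        return bid   # proceed to the auction
    log.append({"user": user_id, "event": "ghost", "bid": bid})
    return None      # suppress delivery for control
```

Hashing the user id with a per-test salt keeps assignment stable without a lookup table, which also keeps web and app assignment consistent when the same identity is hashed.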
How do we design the audience and cells to avoid contamination?
Analysts should design disjoint cells, stable identity resolution, and clear exposure rules. One user should not appear in multiple cells. One identity should resolve consistently across web and app. A conversion window should start after randomization to avoid pre-exposure contamination. The scholarly literature on large-scale display experiments highlights the importance of clean eligibility rules and of accounting for carryover effects across time.⁶ Microsoft and other platforms recommend formal A/B infrastructure to split traffic and keep treatments isolated at the campaign level when platform lift testing is unavailable.⁷ Clear isolation prevents models from learning on control behavior and preserves internal validity.⁶ ⁷
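The rule that the conversion window starts after randomization can be enforced with a simple filter over the event log. A minimal sketch, assuming per-user randomization timestamps are available; the function and field names are hypothetical.

```python
def post_randomization(conversions, assignment_time):
    """Keep only conversions recorded after each user's randomization
    timestamp, so pre-exposure activity cannot leak into the lift read."""
    return [c for c in conversions
            if c["user"] in assignment_time and c["ts"] >= assignment_time[c["user"]]]

# Illustrative events: user "a" converted before assignment, "b" after.
assignment = {"a": 100, "b": 100}
events = [{"user": "a", "ts": 90}, {"user": "b", "ts": 150}]
clean = post_randomization(events, assignment)  # keeps only the "b" conversion
```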
How do we compute lift, ICE, and confidence?
Teams should compute three core metrics for decision making. First, incremental conversions equal treatment conversions minus control conversions, normalized by cell size when assignment is not 50:50.¹ Second, incremental cost per incremental conversion (ICE) equals media spend divided by incremental conversions, which supports budgeting and scenario planning. Third, lift percentage equals incremental conversions divided by control conversions. Platform reports compute these values and provide confidence intervals that reflect randomization and sample variance.¹ ² A well-powered test yields tight intervals; an underpowered test produces wide intervals that limit decisions. Keeping assignment random and exposure policies stable improves precision.³ ⁵
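The three metrics, plus a normal-approximation confidence interval, can be sketched as follows. The function name and the interval construction are illustrative assumptions; platform reports use their own estimators.

```python
import math

def lift_report(conv_t, n_t, conv_c, n_c, spend, z=1.96):
    """Core holdout metrics, normalizing for unequal cell sizes by
    comparing conversion rates rather than raw counts."""
    rate_t, rate_c = conv_t / n_t, conv_c / n_c
    incremental = (rate_t - rate_c) * n_t          # incremental conversions in treatment
    lift_pct = (rate_t - rate_c) / rate_c * 100    # lift relative to control
    ice = spend / incremental if incremental > 0 else float("inf")
    # Normal-approximation interval on the rate difference, scaled to counts.
    se = math.sqrt(rate_t * (1 - rate_t) / n_t + rate_c * (1 - rate_c) / n_c)
    ci = ((rate_t - rate_c - z * se) * n_t, (rate_t - rate_c + z * se) * n_t)
    return {"incremental": incremental, "lift_pct": lift_pct, "ice": ice, "ci": ci}

# Illustrative 90:10 split: 1.2% vs 1.0% conversion on $50,000 of spend.
report = lift_report(conv_t=10_800, n_t=900_000, conv_c=1_000, n_c=100_000, spend=50_000)
# incremental ≈ 1,800 conversions, lift ≈ 20%, ICE ≈ $27.8
```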
How do we implement a holdout test step by step?
Leaders can run a holdout test in seven steps. Step one defines the business question and the single north-star outcome, such as completed orders. Step two sets the test scope by campaign, placement, and geography. Step three configures platform lift testing. Google Ads supports user-based Conversion Lift for Video, Discovery, and Demand Gen campaigns with guided setup.⁸ Meta supports Conversion Lift across campaigns with automatic test and control creation.² Step four sets the holdout share and length. Many teams start with a 10 to 20 percent holdout for 10 to 14 days.⁹ Step five locks creative, bids, budgets, and audiences. Step six monitors eligibility and sample accrual without peeking at effects. Step seven reads the final report and moves to scaled rollouts.
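Steps one through five amount to a pre-registered, frozen test plan. A minimal sketch of that idea, with illustrative field names and guardrails that are assumptions for the example, not a platform requirement:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LiftTestPlan:
    """Pre-registered holdout test plan. Frozen so the settings locked in
    steps one through five cannot drift mid-flight (illustrative sketch)."""
    outcome: str                  # single north-star metric, e.g. completed orders
    scope: str                    # campaign / placement / geography description
    holdout_share: float = 0.10   # common starting point: 10 to 20 percent
    days: int = 14                # common window: 10 to 14 days

    def __post_init__(self):
        if not 0.05 <= self.holdout_share <= 0.5:
            raise ValueError("holdout share outside a sensible range")
        if self.days < 7:
            raise ValueError("window too short for a stable read")
```

Writing the plan down before launch, in whatever form, is what prevents outcome switching and mid-flight tinkering.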
How do we implement ghost ads step by step?
Teams that buy through a DSP can implement ghost ads in eight moves. Move one enables the platform’s conversion lift feature with test and control assignment at bid time.⁴ Move two defines the inclusion criteria for bidding so the universe is consistent. Move three ensures the logger writes ghost impressions for control when the system would have bid, and writes delivered impressions for treatment.³ Move four synchronizes identity and conversion logging across web and app so the lift analysis sees the same people. Move five freezes pacing, budgets, and optimization rules. Move six selects a test window long enough to collect stable outcomes. Move seven computes lift using ghost and served impressions as the exposure basis. Move eight publishes ICE and confidence with a clear decision rule for scale-up.
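Move seven, computing lift over ghost and served impressions, reduces to comparing conversion rates across the two exposure logs. A minimal sketch with hypothetical inputs (lists of user ids from the served log, the ghost log, and the conversion log):

```python
def ghost_lift(served_users, ghost_users, conversions):
    """Lift using ghost and served impressions as the exposure basis:
    conversion rate among users who saw the ad (served) versus users
    who would have seen it (ghost)."""
    conv = set(conversions)
    rate = lambda users: sum(u in conv for u in users) / len(users)
    r_t, r_c = rate(served_users), rate(ghost_users)
    return {"treated_rate": r_t, "control_rate": r_c,
            "lift_pct": (r_t - r_c) / r_c * 100 if r_c else float("nan")}
```

Because both cells are defined by the same would-have-bid event, the comparison stays apples to apples even when the bidder optimizes mid-flight.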
How do we control bias and improve precision in practice?
Teams should remove known sources of bias and increase precision with design choices. Randomized assignment removes selection bias from audience construction. Platform-side randomization avoids leakage from targeting updates and pacing.¹ ² Ghost ads reduce PSA waste and preserve auction symmetry, which can improve precision by an order of magnitude in some settings.⁵ Pre-registration of hypotheses prevents outcome switching. Consistent identity resolution reduces mis-attribution. Large experiments with billions of observations show that many campaigns deliver small but statistically significant effects, so power and clean execution matter.⁶ Using a standard A/B framework in search and shopping helps validate lift estimates against performance changes.⁷
How do we operationalize governance, privacy, and ethics?
Teams should respect privacy by relying on platform-side assignment and aggregated reporting where possible. Platform lift testing on Google and Meta keeps randomization and lift calculation within the platform boundary, which reduces data movement.¹ ² Ghost ads operate at the auction log level and can be implemented without user-level exports by using platform analytics.⁴ Leaders should document the test purpose, assignment rules, exposure policies, and retention schedule. Analysts should monitor for unequal opportunity to see ads across cells and adjust with weighting if necessary. Legal teams should confirm that experimentation and audience selection comply with regional regulations and first-party consent standards.
How do we turn results into confident decisions?
Executives should use a simple decision table that links ICE and lift to scale or stop. A campaign that shows positive incremental conversions and an incremental cost below target CPA should scale. A campaign that shows lift but an ICE above target should optimize creative or bids and retest. A campaign that shows no lift should stop. Platform reports provide the core values and intervals needed for this decision.¹ ² DSP logs from ghost ads provide the same values with lower test cost.³ ⁴ Teams should run lift tests as a quarterly operating rhythm and build a library of effect sizes by audience, creative, and placement to inform planning.
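The decision table above can be encoded as a small rule. The thresholds and the use of the lift interval's lower bound are illustrative choices, not a platform standard:

```python
def decide(lift_ci_low, ice, target_cpa):
    """Decision table: scale when lift is confidently positive and ICE
    beats target CPA; optimize when lift is real but costly; stop otherwise."""
    if lift_ci_low > 0 and ice <= target_cpa:
        return "scale"
    if lift_ci_low > 0:
        return "optimize and retest"
    return "stop"
```

Requiring the lower bound of the lift interval to clear zero, rather than the point estimate, keeps underpowered tests from triggering a scale decision.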
What changes when product and brand share budgets?
Executives can split attribution and lift testing by objective. A product team can test retargeting with purchase outcomes while a brand team tests upper-funnel video with site visits. Google and Meta support multiple conversion types and attribution windows in lift studies, which allows role-specific reads.¹ ² A DSP can implement ghost ads on prospecting and retargeting separately by tagging would-have-served events by tactic.⁴ The literature on the display ad effectiveness funnel suggests carryover and lag that vary by objective, so windows and reads should match the outcome.⁶ Clear objectives and windows produce actionable reads and better investment governance.
What does success look like within 90 days?
A mature program ships two to three well-powered lift tests, deprecates underperforming tactics, and scales winners. Search and shopping experiments in Microsoft Advertising or similar platforms help validate improvements in blended performance.⁷ Platform lift tests in Google Ads and Meta provide a causal baseline that improves budgeting, forecasting, and creative iteration.¹ ² DSP ghost ads extend programmatic reach while keeping test costs down and precision high.³ ⁴ Leaders who run this playbook reduce wasted spend, align teams on causal impact, and steer decisions with confidence.
FAQ
What is the difference between a holdout test and a ghost ads test in advertising?
A holdout test withholds ads from a randomized control group and compares outcomes with an exposed group.¹ ² A ghost ads test records would-have-served impressions for control at auction time, suppresses delivery, and compares outcomes with delivered impressions, which reduces control spend and improves precision.³ ⁴ ⁵
How do Google Ads and Meta run Conversion Lift experiments?
Google Ads splits eligible users into exposed and control groups and reports incremental conversions for supported campaign types.¹ ⁸ Meta creates similar test and holdout audiences and reports incremental results across selected campaigns.² Both workflows randomize assignment and compute lift inside the platform.
Which metrics should executives read to decide whether to scale a campaign?
Executives should read incremental conversions, lift percentage, and incremental cost per incremental conversion. Platform lift reports and DSP ghost ad logs provide these values with confidence intervals for decision making.¹ ² ⁴
Why do many teams choose a 10 to 20 percent holdout for about two weeks?
Teams balance statistical power and opportunity cost. A 10 to 20 percent holdout for 10 to 14 days is a common starting point that allows enough sample for a stable read without excessive budget impact on the campaign.⁹
Who benefits most from ghost ads through a DSP like The Trade Desk?
Programmatic buyers who run cross-exchange prospecting or retargeting benefit because ghost bidding tags control opportunities at auction time, avoids PSA waste, and maintains optimization symmetry.⁴ ⁵
What evidence shows that careful design matters in display ad experiments?
A large study of 432 field experiments across the Google Display Network details the importance of clean eligibility, carryover effects, and causal estimation at scale, which underscores the need for disciplined design.⁶
Which steps help Customer Experience leaders operationalize lift testing quickly?
Leaders can use Google Ads and Meta lift workflows for in-platform campaigns, add ghost ads in the DSP for programmatic, standardize ICE decision rules, and schedule quarterly tests to build an internal effect size library for Customer Science-style governance.¹ ² ⁴
Sources
About Conversion Lift. Google Ads Help. 2024. Google. https://support.google.com/google-ads/answer/12003020
About Conversion Lift. Meta Business Help Center. 2023. Meta. https://www.facebook.com/business/help/221353413010930
Ghost Ads: Improving the Economics of Measuring Ad Effectiveness. Johnson G., Lewis R., Nubbemeyer E. 2020. Marketing Science Institute Working Paper. https://thearf-org-unified-admin.s3.amazonaws.com/MSI/2020/06/MSI_Report_15-122.pdf
Best practices for better conversion lift. The Trade Desk. 2023. The Trade Desk. https://www.thetradedesk.com/resources/best-practices-for-better-conversion-lift
Ghost Ads methodology draft. Johnson G., Lewis R., Nubbemeyer E. 2016. NBER Conference Draft PDF. https://conference.nber.org/confer/2016/EoDs16/Johnson_Lewis_Nubbemeyer.pdf
The Online Display Ad Effectiveness Funnel & Carryover: Lessons from 432 Field Experiments. Johnson G., Lewis R., Nubbemeyer E. 2017. Working Paper. https://marketing.wharton.upenn.edu/wp-content/uploads/2017/08/Johnson-Garrett-PAPER-VERSION-2.pdf
Experiment Data Object. Microsoft Advertising Campaign Management Service. 2024. Microsoft. https://learn.microsoft.com/en-us/advertising/campaign-management-service/experiment
Set up Conversion Lift based on users. Google Ads Help. 2024. Google. https://support.google.com/google-ads/answer/12005564
METHOD: Run a lock-down incrementality test on Meta ads. Reforge. 2025. Reforge. https://www.reforge.com/guides/method-run-a-lock-down-incrementality-test-on-meta-ads