What is a propensity model and why do CX leaders rely on it?
Propensity models predict the probability that an individual will take a future action such as converting, churning, or upgrading. A propensity score is the model’s estimate of that probability for a given customer based on historical behaviors and attributes.¹ Propensity modeling is widely used across digital, subscription, and B2B funnels to prioritize leads, trigger journeys, and set offer depth because the output is stable, interpretable, and easy to operationalize in marketing and service platforms.² The mechanics are simple: the model learns a mapping from customer features to a binary or continuous outcome and then ranks customers by likelihood, as in the sketch below. This direct ranking makes resource allocation intuitive for executives who manage sales, service, and retention budgets.¹
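As a concrete illustration, here is a minimal propensity sketch in Python, assuming scikit-learn and a hypothetical customer extract; the file name and columns such as tenure_months and converted are placeholders for whatever your warehouse exposes.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical extract; swap in your own features and outcome flag.
df = pd.read_csv("customers.csv")
features = ["tenure_months", "support_tickets", "monthly_spend"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["converted"], test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Propensity score = predicted probability of the positive class.
scores = model.predict_proba(X_test)[:, 1]
ranked = X_test.assign(propensity=scores).sort_values("propensity", ascending=False)
```

The ranked output can feed directly into campaign lists, journey triggers, or routing rules.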
What is uplift modeling and how does it change the decision?
Uplift modeling, also called incremental or true lift modeling, predicts the individual treatment effect: the difference in a customer’s expected outcome if a specific intervention happens versus if it does not.³ In customer experience and service operations, the treatment might be a retention offer, a proactive service call, or a fee waiver. An uplift model ranks customers by expected incremental impact rather than raw likelihood. This distinction matters because a high-propensity customer may buy anyway, and a low-propensity customer may not buy even with an offer. Uplift modeling aims to target the persuadable middle, where the intervention changes the outcome.³
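To make the segments concrete, the sketch below labels the four classic uplift quadrants from a customer’s estimated outcome probability with and without treatment; the probabilities and the 0.5 threshold are illustrative assumptions, not calibrated values.

```python
def uplift_segment(p_treated: float, p_control: float, threshold: float = 0.5) -> str:
    """Label a customer from estimated outcome probabilities with/without treatment."""
    if p_treated >= threshold and p_control >= threshold:
        return "sure thing"      # converts anyway; the offer is wasted spend
    if p_treated < threshold and p_control < threshold:
        return "lost cause"      # unlikely to convert either way
    if p_treated >= threshold > p_control:
        return "persuadable"     # the offer flips the outcome; target these
    return "do not disturb"      # treatment backfires; suppress contact

print(uplift_segment(0.80, 0.75))  # sure thing
print(uplift_segment(0.60, 0.20))  # persuadable
print(uplift_segment(0.30, 0.60))  # do not disturb
```

Only the persuadables justify spend; sure things waste budget, lost causes return nothing, and do-not-disturb contacts destroy value.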
Where does each approach shine in real-world CX programs?
Propensity models fit decisions where the action does not meaningfully alter the underlying probability or where the action cost is negligible. Examples include forecasting inbound volume for staffing, flagging at-risk accounts to monitor, or ranking self-service content by predicted usefulness.¹ Propensity also works when compliance or policy requires universal treatment, since the primary need is prioritization rather than selection.²
Uplift models fit decisions where the action is costly, capacity is constrained, or the action can backfire. Classic examples include retention save offers, discounting strategies, and outbound service contacts. By modeling incremental effect, uplift helps exclude sure things and lost causes, and it helps avoid contacting do-not-disturb segments that churn when pushed.³
How do the mechanisms differ under the hood?
A propensity model directly estimates P(Y=1 | X), where Y is the behavior and X is the feature vector. The score ranks customers by likelihood.¹ In contrast, uplift modeling estimates the conditional average treatment effect, often denoted τ(X) = E[Y|T=1,X] − E[Y|T=0,X], where T indicates treatment. This is a causal quantity that requires either randomized control data or strong assumptions about confounding. Uplift learners and causal forests estimate this difference directly and produce an individualized effect estimate rather than a raw probability.³ ⁴
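As a minimal sketch, the two-model approach (discussed under maturity levels below) estimates τ(X) by fitting separate outcome models on treated and control customers and subtracting their predictions. The synthetic data, column names, and randomized assignment here are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic randomized-experiment data; all names and effects are made up.
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 60, n),
    "support_tickets": rng.poisson(2, n),
    "monthly_spend": rng.gamma(2.0, 30.0, n),
    "treated": rng.binomial(1, 0.5, n),   # randomized 0/1 assignment
})
df["outcome"] = rng.binomial(1, 0.2 + 0.1 * df["treated"] * (df["monthly_spend"] > 50))

features = ["tenure_months", "support_tickets", "monthly_spend"]
treated, control = df[df["treated"] == 1], df[df["treated"] == 0]

m1 = GradientBoostingClassifier().fit(treated[features], treated["outcome"])  # E[Y|T=1,X]
m0 = GradientBoostingClassifier().fit(control[features], control["outcome"])  # E[Y|T=0,X]

# τ̂(X): the difference of predicted probabilities is the uplift score.
tau_hat = m1.predict_proba(df[features])[:, 1] - m0.predict_proba(df[features])[:, 1]
df = df.assign(uplift=tau_hat).sort_values("uplift", ascending=False)
```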
Causal forest methods extend random forests to estimate heterogeneous treatment effects with valid confidence intervals. These models partition the feature space to find where treatment effects vary and provide principled inference for segment-level decisions.⁴ Double or debiased machine learning offers another route to estimating treatment effects reliably when flexible models are used for nuisance components such as propensity scores or outcome regressions.⁵
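A hedged sketch of a causal forest fit is shown below using the open-source econml package on synthetic data; treat the class name and signatures as assumptions to verify against the library’s current documentation.

```python
import numpy as np
from econml.dml import CausalForestDML
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4))               # customer features
T = rng.binomial(1, 0.5, size=5000)          # randomized 0/1 treatment flag
tau_true = 0.3 * (X[:, 0] > 0)               # effect only where feature 0 is high
Y = rng.binomial(1, 0.3 + tau_true * T)      # binary outcome

est = CausalForestDML(
    model_y=GradientBoostingRegressor(),     # nuisance model for the outcome
    model_t=GradientBoostingClassifier(),    # nuisance model for treatment propensity
    discrete_treatment=True,
    random_state=42,
)
est.fit(Y, T, X=X)
tau_hat = est.effect(X)                      # per-customer effect estimates
lo, hi = est.effect_interval(X, alpha=0.05)  # pointwise 95% confidence intervals
```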
How do you evaluate performance fairly?
Propensity models use familiar metrics like AUC and lift charts that measure ranking quality against observed outcomes.¹ Uplift models use uplift curves and the Qini coefficient, which quantify the incremental gains from targeting by predicted uplift versus random contact. The Qini coefficient generalizes the Gini concept to true incremental response and is computed as the area between the model’s incremental gains curve and the random baseline.⁶ ⁷ These metrics align evaluation to the actual objective, which is incremental impact per contact or per dollar, not raw response.⁶
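A from-scratch sketch of the Qini computation on a randomized holdout, assuming NumPy arrays: y holds 0/1 outcomes, t holds 0/1 treatment flags, and uplift holds model scores.

```python
import numpy as np

def qini_curve(y, t, uplift):
    """Cumulative incremental responders when targeting by descending uplift."""
    order = np.argsort(-uplift)                  # target highest scores first
    y, t = y[order], t[order]
    n_t, n_c = np.cumsum(t), np.cumsum(1 - t)    # treated / control counts so far
    r_t = np.cumsum(y * t)                       # treated responders so far
    r_c = np.cumsum(y * (1 - t))                 # control responders so far
    ratio = np.divide(n_t, n_c, out=np.zeros_like(n_t, dtype=float), where=n_c > 0)
    return r_t - r_c * ratio                     # control scaled to treated size

def qini_coefficient(y, t, uplift):
    """Average gap between the model's gains curve and random targeting."""
    gains = qini_curve(y, t, uplift)
    baseline = gains[-1] * np.arange(1, len(gains) + 1) / len(gains)
    return float(np.mean(gains - baseline))
```

Normalization conventions for the coefficient vary across tools, so compare values only within a single implementation.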
What data and design choices protect validity?
Executives should design experiments and data pipelines that support causal answers. Randomized control groups provide the cleanest identification for uplift, because they expose outcomes with and without treatment for comparable cohorts.³ When randomization is not feasible, methods such as causal forests and double machine learning can reduce bias if unconfoundedness holds and if the propensity to be treated is properly modeled.⁴ ⁵ Careful feature selection, leakage checks, and time-based validation remain essential for both approaches to prevent optimistic bias that will not replicate in production.¹
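As a small illustration of time-based validation, assuming a pandas DataFrame with a hypothetical snapshot_date column:

```python
import pandas as pd

# Hypothetical extract; train on earlier cohorts, validate on later ones,
# so features never see the future.
df = pd.read_csv("customers.csv", parse_dates=["snapshot_date"]).sort_values("snapshot_date")
cutoff = df["snapshot_date"].quantile(0.8)   # hold out the last 20% of the timeline
train_df = df[df["snapshot_date"] <= cutoff]
valid_df = df[df["snapshot_date"] > cutoff]
```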
Fairness testing matters when models change who gets contacted, who receives offers, or who is escalated for service. Recent work in marketing science shows how to evaluate and mitigate disparate impact in uplift models by testing for treatment effect differences across protected groups and by constraining optimization to fairness criteria.⁸ This protects brand equity and reduces regulatory risk while maintaining measurable incremental value.⁸
When should leaders prefer propensity or uplift in budgeting cycles?
Choose propensity when the decision is ranking within a mandatory action set or when you need a stable probability to feed downstream rules. Choose uplift when your budget, headcount, or offer inventory is limited and when the intervention can have negative effects on some customers. For example, a retention desk should prioritize uplift to reduce waste on customers who would renew anyway and to avoid contacting customers who would churn when nudged.³ By contrast, a forecasting analyst may prefer propensity to predict self-service success rates and plan staffing, since the intervention is unavailable or uniform.¹
How do uplift methods compare across maturity levels?
Early-stage teams can begin with two-model approaches that train separate outcome models for treated and control groups and then subtract their predictions to form uplift scores.³ As maturity grows, teams can adopt direct uplift learners and causal forests that optimize for treatment effect estimation using splitting rules built for heterogeneity discovery.⁴ In regulated or high-stakes contexts, teams can use double machine learning to obtain treatment effect estimates with valid inference while leveraging modern feature sets and regularization.⁵ Investing in measurement infrastructure for Qini and incremental ROI closes the loop and prevents local optimizations on the wrong metric.⁶
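For illustration, a minimal double machine learning sketch under the partially linear model: cross-fit nuisance models for outcome and treatment, then regress the outcome residual on the treatment residual. This toy version recovers one average effect; production libraries add heterogeneous effects and valid inference.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

def dml_ate(X, T, Y, n_splits=5, seed=42):
    """Cross-fitted partially linear DML: one average treatment effect."""
    y_res = np.zeros(len(Y))
    t_res = np.zeros(len(T))
    for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        m_y = RandomForestRegressor(random_state=seed).fit(X[train_idx], Y[train_idx])
        m_t = RandomForestClassifier(random_state=seed).fit(X[train_idx], T[train_idx])
        y_res[test_idx] = Y[test_idx] - m_y.predict(X[test_idx])              # outcome residual
        t_res[test_idx] = T[test_idx] - m_t.predict_proba(X[test_idx])[:, 1]  # treatment residual
    # Orthogonalized estimate: slope of outcome residual on treatment residual.
    return float(np.dot(t_res, y_res) / np.dot(t_res, t_res))

# Usage with the synthetic X, T, Y from the causal forest sketch above:
# theta_hat = dml_ate(X, T, Y)
```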
What risks matter, and how do you avoid them?
The first risk is optimizing for response rather than incrementality. Targeting high propensity can look successful while adding little net revenue once control outcomes are accounted for.³ The second risk is harmful contact. Uplift methods reveal do-not-disturb segments that should be suppressed to protect retention and satisfaction.³ The third risk is bias from nonrandom exposure, which can be reduced through experimental design, careful covariate control, and modern causal learners.⁴ ⁵ The fourth risk is fairness drift across demographics, which requires explicit tests and constraints in deployment.⁸
How do you prove impact and sequence adoption?
Start by isolating a single high-leverage decision such as retention saves or offer depth. Run an A/B test that compares business-as-usual targeting to uplift targeting with a holdout control. Use Qini, incremental conversion, and incremental margin per contact as primary outcomes.⁶ Pair the test with customer-level logs that track treatment, scores, and outcomes to support audit and learning. Scale by integrating uplift segments into journey orchestration, service routing, and contact center playbooks, while maintaining small control groups for continuous measurement.³ Mature programs add causal forest dashboards with confidence intervals by segment, and apply double machine learning pipelines to high-dimensional channels such as email and in-app messaging.⁴ ⁵
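A back-of-envelope readout for such a test might look like the sketch below; every count and dollar figure is hypothetical.

```python
# Every count and dollar figure below is made up for illustration.
n_treated, conv_treated = 10_000, 620    # uplift-targeted contacts and conversions
n_control, conv_control = 10_000, 500    # matched holdout, no contact
margin_per_conversion = 120.0            # contribution margin per saved customer
cost_per_contact = 0.50                  # offer and outreach cost

incremental_rate = conv_treated / n_treated - conv_control / n_control  # 0.012
incremental_conversions = incremental_rate * n_treated                  # 120.0
incremental_margin = incremental_conversions * margin_per_conversion    # 14,400
net_per_contact = (incremental_margin - cost_per_contact * n_treated) / n_treated
print(f"incremental margin per contact: ${net_per_contact:.2f}")        # $0.94
```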
What is the practical rule of thumb for executives?
Use propensity to rank likelihood when your action is uniform or unavoidable. Use uplift to rank changeability when your action is selective, costly, or risky. Anchor your evaluation on Qini and incremental ROI rather than AUC alone. Invest in experiments and causal tooling that match your decision stakes. This approach aligns customer science with financial rigor and delivers measurable, defensible gains in revenue, retention, and customer trust.³ ⁶
FAQ
What is the difference between a propensity score and an uplift score?
A propensity score estimates how likely a customer is to take an action, while an uplift score estimates how much an intervention changes that likelihood for that customer.¹ ³
Why should a retention team prefer uplift models over propensity models?
A retention team faces costly offers and limited capacity. Uplift models target persuadable customers and avoid sure things, lost causes, and do-not-disturb segments, which improves incremental renewals and margins.³
Which metrics should we use to evaluate uplift models?
Use uplift curves and the Qini coefficient to measure incremental gains versus random targeting, and report incremental conversion or margin per contact as business KPIs.⁶
How do causal forests support CX use cases?
Causal forests estimate heterogeneous treatment effects with valid inference, enabling segment-level decisions about who to contact, what to offer, and where treatment works best.⁴
How does double machine learning reduce bias in treatment effect estimation?
Double machine learning separates the causal parameter from high-dimensional nuisance components and uses orthogonalization to deliver robust estimates when flexible models are used.⁵
Which situations still call for propensity modeling?
Use propensity when actions are uniform or mandatory, such as forecasting or universal notifications, or when you need a stable probability to feed rules and staffing plans.¹
What fairness considerations apply to uplift targeting?
Teams should test treatment effect differences across protected groups and apply fairness constraints where needed to prevent disparate impact in offer allocation and contact decisions.⁸
Sources
“What Is Propensity Modeling? Using Data to Predict Behavior.” Sharan, CXL, 2023, blog. https://cxl.com/blog/propensity-modeling/
“Propensity Modelling: Definition, types and use cases.” Impression, 2024, blog. https://www.impressiondigital.com/blog/propensity-modelling/
“Uplift modelling.” Wikipedia, 2024, reference article. https://en.wikipedia.org/wiki/Uplift_modelling
“Estimation and Inference of Heterogeneous Treatment Effects using Random Forests.” Wager and Athey, 2015, arXiv preprint. https://arxiv.org/abs/1510.04342
“Double/Debiased Machine Learning for Treatment and Causal Parameters.” Chernozhukov et al., 2018, Econometrics Journal. https://academic.oup.com/ectj/article/21/1/C1/5056401
“qini: Computes the Qini Coefficient Q.” CRAN R package ‘uplift’ documentation, 2014, software documentation. https://rdrr.io/cran/uplift/man/qini.html
“Qini-based Uplift Regression.” Belbahri et al., 2019, arXiv preprint. https://arxiv.org/pdf/1911.12474
“Fairness testing for uplift models.” Lo et al., 2024, Journal of Marketing Analytics. https://link.springer.com/article/10.1057/s41270-024-00339-6