Printed from https://ideas.repec.org/a/eee/ejores/v316y2024i3p1058-1069.html

Propensity score oversampling and matching for uplift modeling

Authors

  • Vairetti, Carla
  • Gennaro, Franco
  • Maldonado, Sebastián

Abstract

In this paper, we propose a novel matching strategy to correct for confounding in uplift modeling. Our method, called propensity score oversampling and matching (ProSOM), extends the well-known propensity score matching (PSM) technique by addressing one of its main limitations: handling small datasets in which the distribution of the causal variable is imbalanced. We also address the additional complexity introduced by class labels. The proposed method draws a parallel between uplift modeling and class-imbalanced classification, extending existing oversampling techniques to create synthetic elements from the treatment group. We design an algorithm that performs class-aware data oversampling in the treatment group and then matches samples from this group with the control group. This can be seen as a novel hybrid undersampling-oversampling solution for causal learning. Experiments on five datasets show the virtues of ProSOM in terms of predictive performance: it achieves the best Qini coefficient on all five datasets compared with PSM and other resampling solutions.
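The abstract describes a pipeline with two parts: class-aware oversampling of the treatment group, followed by propensity score matching against the control group. The sketch below illustrates that general idea. It is not the authors' ProSOM algorithm, and its details are assumptions:

- Propensity scores come from a plain logistic regression fitted by gradient descent.
- Oversampling is SMOTE-style interpolation done separately within each outcome class of the treated units.
- Matching is greedy 1:1 nearest-neighbour matching on the score, without replacement.
- The function names `fit_propensity`, `smote` and `oversample_and_match`, and the `ratio` parameter, are hypothetical.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_propensity(X, t, lr=0.1, epochs=500):
    """Fit P(T=1 | x) with a plain logistic regression (batch gradient descent)."""
    w = [0.0] * (len(X[0]) + 1)  # last weight is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, ti in zip(X, t):
            err = _sigmoid(sum(a * b for a, b in zip(w, xi)) + w[-1]) - ti
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return lambda x: _sigmoid(sum(a * b for a, b in zip(w, x)) + w[-1])


def smote(points, n_new, k, rng):
    """SMOTE-style interpolation between a point and one of its k nearest neighbours."""
    if len(points) < 2:
        return []
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(points)
        others = sorted((p for p in points if p is not a), key=lambda p: math.dist(a, p))
        b = rng.choice(others[:k])
        gap = rng.random()
        synthetic.append(tuple(ai + gap * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


def oversample_and_match(rows, ratio=1.0, k=3, seed=0):
    """rows: list of (features, treated, label) with binary treated/label.

    Returns a list of (treated_row, control_row) pairs (hypothetical helper).
    """
    rng = random.Random(seed)
    treated = [r for r in rows if r[1] == 1]
    control = [r for r in rows if r[1] == 0]
    if not treated or not control:
        return []
    # 1) Class-aware oversampling: grow each outcome class of the treatment group
    #    separately, so each synthetic point is interpolated only between same-class
    #    treated units and inherits that class label.
    target = int(ratio * len(control))  # assumed target size of the treatment group
    augmented = list(treated)
    for label in (0, 1):
        cls = [r[0] for r in treated if r[2] == label]
        n_new = max(0, round(len(cls) / len(treated) * target) - len(cls))
        augmented += [(x, 1, label) for x in smote(cls, n_new, k, rng)]
    # 2) Propensity model fitted on the original (real) rows only.
    ps = fit_propensity([r[0] for r in rows], [r[1] for r in rows])
    # 3) Greedy 1:1 nearest-neighbour matching on the score, without replacement;
    #    controls left unmatched are discarded (the undersampling side).
    pool = list(control)
    pairs = []
    for tr in sorted(augmented, key=lambda r: ps(r[0])):
        if not pool:
            break
        best = min(pool, key=lambda c: abs(ps(c[0]) - ps(tr[0])))
        pool.remove(best)
        pairs.append((tr, best))
    return pairs
```

Take, for example, 20 control rows and 6 treated rows (4 with label 1, 2 with label 0) with `ratio=1.0`. The treatment group grows to 20 rows (13 with label 1, 7 with label 0), which roughly preserves its original class mix, and every control is matched. With `ratio` below 1, some controls stay unmatched and are dropped, the usual PSM behaviour.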

Suggested Citation

  • Vairetti, Carla & Gennaro, Franco & Maldonado, Sebastián, 2024. "Propensity score oversampling and matching for uplift modeling," European Journal of Operational Research, Elsevier, vol. 316(3), pages 1058-1069.
  • Handle: RePEc:eee:ejores:v:316:y:2024:i:3:p:1058-1069
    DOI: 10.1016/j.ejor.2024.03.024

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S037722172400225X
    Download Restriction: Full text for ScienceDirect subscribers only

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Akoh Fabien Yao & Maxime Sèbe & Laura Recuero Virto & Abdelhak Nassiri & Hervé Dumez, 2024. "The effect of LNG bunkering on port competitiveness using multilevel data analysis [L'effet du soutage par GNL sur la compétitivité des ports à l'aide de l'analyse de données à plusieurs niveaux]," Post-Print hal-04611804, HAL.
    2. Turati, Riccardo, 2024. "Network Abroad and Culture: Global Individual-Level Evidence," GLO Discussion Paper Series 1488, Global Labor Organization (GLO).
    3. Sirin, Selahattin Murat & Erten, Ibrahim, 2022. "Price spikes, temporary price caps, and welfare effects of regulatory interventions on wholesale electricity markets," Energy Policy, Elsevier, vol. 163(C).
    4. Francesco Manaresi & Carlo Menon & Pietro Santoleri, 2021. "Supporting innovative entrepreneurship: an evaluation of the Italian “Start-up Act” [The effects of entry on incumbent innovation and productivity]," Industrial and Corporate Change, Oxford University Press and the Associazione ICC, vol. 30(6), pages 1591-1614.
    5. Fukui Hideki, 2023. "Evaluating Different Covariate Balancing Methods: A Monte Carlo Simulation," Statistics, Politics and Policy, De Gruyter, vol. 14(2), pages 205-326, June.
    6. Uehleke, Reinhard & Petrick, Martin & Hüttel, Silke, 2022. "Evaluations of agri-environmental schemes based on observational farm data: The importance of covariate selection," Land Use Policy, Elsevier, vol. 114(C).
    7. Takahashi, Ryo, 2021. "How to stimulate environmentally friendly consumption: Evidence from a nationwide social experiment in Japan to promote eco-friendly coffee," Ecological Economics, Elsevier, vol. 186(C).
    8. Sung Jae Jun & Sokbae Lee, 2024. "Causal Inference Under Outcome-Based Sampling with Monotonicity Assumptions," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(3), pages 998-1009, July.
    9. Camboni, Riccardo & Corsini, Alberto & Miniaci, Raffaele & Valbonesi, Paola, 2021. "Mapping fuel poverty risk at the municipal level. A small-scale analysis of Italian Energy Performance Certificate, census and survey data," Energy Policy, Elsevier, vol. 155(C).
    10. Matilde Cappelletti & Leonardo M. Giuffrida, 2024. "Targeted Bidders in Government Tenders," CESifo Working Paper Series 11142, CESifo.
    11. Lahiri, Kajal & Yang, Liu, 2013. "Forecasting Binary Outcomes," Handbook of Economic Forecasting, in: G. Elliott & C. Granger & A. Timmermann (ed.), Handbook of Economic Forecasting, edition 1, volume 2, chapter 0, pages 1025-1106, Elsevier.
    12. Bokelmann, Björn & Lessmann, Stefan, 2024. "Improving uplift model evaluation on randomized controlled trial data," European Journal of Operational Research, Elsevier, vol. 313(2), pages 691-707.
    13. Florian Gunsilius & Yuliang Xu, 2021. "Matching for causal effects via multimarginal unbalanced optimal transport," Papers 2112.04398, arXiv.org, revised Jul 2022.
    14. Benítez-Peña, Sandra & Blanquero, Rafael & Carrizosa, Emilio & Ramírez-Cobo, Pepa, 2024. "Cost-sensitive probabilistic predictions for support vector machines," European Journal of Operational Research, Elsevier, vol. 314(1), pages 268-279.
    15. Goryunov, Alexander & Ageshina, Elena & Lavrentev, Igor & Peretyatko, Polina, 2023. "Estimating the effect of Russia’s development policy in the Far Eastern region: The synthetic control approach," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 72, pages 58-72.
    16. Mohamed Ali Marouani & Michelle Marshalian, 2020. "Winners and losers in industrial policy 2.0," WIDER Working Paper Series wp-2020-21, World Institute for Development Economic Research (UNU-WIDER).
    17. Finocchiaro Castro, Massimo & Guccio, Calogero & Rizzo, Ilde, 2023. "How "one-size-fits-all" public works contract does it better? An assessment of infrastructure provision in Italy," EconStor Preprints 270729, ZBW - Leibniz Information Centre for Economics.
    18. Ugur, Mehmet & Trushin, Eshref, 2018. "Asymmetric information and heterogeneous effects of R&D subsidies: evidence on R&D investment and employment of R&D personel," Greenwich Papers in Political Economy 21943, University of Greenwich, Greenwich Political Economy Research Centre.
    19. Gregory S. Crawford & Nicola Pavanini & Fabiano Schivardi, 2018. "Asymmetric Information and Imperfect Competition in Lending Markets," American Economic Review, American Economic Association, vol. 108(7), pages 1659-1701, July.
    20. Kengo Igei & Kana Takio & Keitaro Aoyagi & Yoshito Takasaki, 2021. "Vocational training for demobilized ex-combatants with disabilities in Rwanda," Journal of Development Effectiveness, Taylor & Francis Journals, vol. 13(4), pages 360-384, October.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:ejores:v:316:y:2024:i:3:p:1058-1069. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.elsevier.com/locate/eor.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.