
Machine Learning for Propensity Score Estimation: A Systematic Review and Reporting Guidelines

Author

Listed:
  • Leite, Walter
  • Zhang, Huibin
  • Collier, Zachary
  • Chawla, Kamal
  • , l.kong@ufl.edu
  • Lee, Yongseok

    (University of Florida)

  • Quan, Jia
  • Soyoye, Olushola

Abstract

Machine learning has become a common approach for estimating propensity scores in quasi-experimental research that uses matching, weighting, or stratification on the propensity score. This systematic review examined machine learning applications for propensity score estimation over 40 years and across fields such as health, education, social sciences, and business. The results show that the gradient boosting machine (GBM) is the most frequently used method, followed by random forest. Classification and regression trees (CART), neural networks, and the super learner were each also used in more than five percent of studies. The most frequently used packages for estimating propensity scores were twang, gbm, and randomForest in the R statistical software. The review identified many hyperparameter configurations used for machine learning methods. However, it also shows that hyperparameters are frequently under-reported, as are critical steps of the propensity score analysis, such as covariate balance evaluation. A set of guidelines for reporting the use of machine learning for propensity score estimation is provided.
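
To make the covariate balance evaluation mentioned in the abstract concrete, the following is a minimal sketch in Python (standard library only), not the procedure used in the review. It assumes propensity scores have already been estimated by some model, builds inverse-probability weights for the average treatment effect, and compares the standardized mean difference (SMD) of one covariate before and after weighting. The function names, the toy data, and the choice of the unweighted pooled standard deviation as the denominator are all assumptions made for illustration.

```python
import math
from statistics import variance


def ate_weights(treated, scores):
    """Inverse-probability weights for the ATE: 1/e if treated, 1/(1-e) otherwise."""
    return [1.0 / e if t == 1 else 1.0 / (1.0 - e) for t, e in zip(treated, scores)]


def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def standardized_mean_difference(x, treated, weights=None):
    """(Weighted) difference in group means divided by the unweighted pooled SD.

    Keeping the denominator unweighted is one common convention (an assumption
    here); it keeps the scale fixed when comparing before and after weighting.
    """
    if weights is None:
        weights = [1.0] * len(x)
    rows = list(zip(x, treated, weights))
    x1 = [v for v, t, _ in rows if t == 1]
    x0 = [v for v, t, _ in rows if t == 0]
    w1 = [w for _, t, w in rows if t == 1]
    w0 = [w for _, t, w in rows if t == 0]
    diff = weighted_mean(x1, w1) - weighted_mean(x0, w0)
    pooled_sd = math.sqrt((variance(x1) + variance(x0)) / 2.0)
    return diff / pooled_sd


# Hypothetical toy data: one binary covariate that drives treatment assignment.
x = [1, 1, 1, 0, 1, 0, 0, 0]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
# Stand-in for model output: here the scores equal the true assignment rates.
scores = [0.75 if v == 1 else 0.25 for v in x]

smd_before = standardized_mean_difference(x, treated)
smd_after = standardized_mean_difference(x, treated, ate_weights(treated, scores))
print(f"SMD before weighting: {smd_before:.3f}")
print(f"SMD after weighting:  {smd_after:.3f}")
```

In this toy case the covariate is strongly imbalanced before weighting (SMD of 1.0) and essentially balanced afterwards (SMD of approximately 0), because the scores match the true assignment rates. With scores from a fitted machine learning model, the same check would be repeated for every covariate and reported alongside the hyperparameters used.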

Suggested Citation

  • Leite, Walter & Zhang, Huibin & Collier, Zachary & Chawla, Kamal & , l.kong@ufl.edu & Lee, Yongseok & Quan, Jia & Soyoye, Olushola, 2024. "Machine Learning for Propensity Score Estimation: A Systematic Review and Reporting Guidelines," OSF Preprints gmrk7, Center for Open Science.
  • Handle: RePEc:osf:osfxxx:gmrk7
    DOI: 10.31219/osf.io/gmrk7

    Download full text from publisher

    File URL: https://osf.io/download/6704995657ca12bf85fcaf56/
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Meyer, Maximilian & Hulke, Carolin & Kamwi, Jonathan & Kolem, Hannah & Börner, Jan, 2022. "Spatially heterogeneous effects of collective action on environmental dependence in Namibia’s Zambezi region," World Development, Elsevier, vol. 159(C).
    2. Kim, Ja Young & Bartholomew, Keith & Ewing, Reid, 2020. "Another one rides the bus? The connections between bus stop amenities, bus ridership, and ADA paratransit demand," Transportation Research Part A: Policy and Practice, Elsevier, vol. 135(C), pages 280-288.
    3. Chen, Shanting & Mallory, Allen B., 2021. "The effect of racial discrimination on mental and physical health: A propensity score weighting approach," Social Science & Medicine, Elsevier, vol. 285(C).
    4. Tenglong Li & Jordan Lawson, 2021. "A generalized bootstrap procedure of the standard error and confidence interval estimation for inverse probability of treatment weighting," Papers 2109.00171, arXiv.org.
    5. Riccardo D'Alberto & Francesco Pagliacci & Matteo Zavalloni, 2023. "A socioeconomic impact assessment of three Italian national parks," Journal of Regional Science, Wiley Blackwell, vol. 63(1), pages 114-147, January.
    6. Brian G. Vegetabile & Daniel L. Gillen & Hal S. Stern, 2020. "Optimally balanced Gaussian process propensity scores for estimating treatment effects," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(1), pages 355-377, January.
    7. Walter L. Leite & Burak Aydin & Dee D. Cetin-Berber, 2021. "Imputation of Missing Covariate Data Prior to Propensity Score Analysis: A Tutorial and Evaluation of the Robustness of Practical Approaches," Evaluation Review, vol. 45(1-2), pages 34-69, February.
    8. Ruonan Xu, 2023. "Difference-in-Differences with Interference," Papers 2306.12003, arXiv.org, revised May 2024.
    9. Mawa, Christopher & Babweteera, Fred & Tumusiime, David Mwesigye, 2022. "Livelihood outcomes after two decades of co-managing a state forest in Uganda," Forest Policy and Economics, Elsevier, vol. 135(C).
    10. Meyer, Maximilian & Hulke, Carolin & Kamwi, Jonathan & Kolem, Hannah & Börner, Jan, 2021. "Spatially heterogeneous effects of collective action on environmental dependence in the Kavango-Zambezi Transfrontier Conservation Area," 2021 Conference, August 17-31, 2021, Virtual 315018, International Association of Agricultural Economists.
    11. Ruoqi Yu, 2021. "Evaluating and improving a matched comparison of antidepressants and bone density," Biometrics, The International Biometric Society, vol. 77(4), pages 1276-1288, December.
    12. Xinkun Nie & Guido Imbens & Stefan Wager, 2021. "Covariate Balancing Sensitivity Analysis for Extrapolating Randomized Trials across Locations," Papers 2112.04723, arXiv.org.
    13. Richard Aviles-Lopez & Juan de Dios Luna del Castillo & Miguel Ángel Montero-Alonso, 2023. "Exploratory Matching Model Search Algorithm (EMMSA) for Causal Analysis: Application to the Cardboard Industry," Mathematics, MDPI, vol. 11(21), pages 1-34, October.
    14. Lucija Muehlenbachs & Elisheba Spiller & Christopher Timmins, 2015. "The Housing Market Impacts of Shale Gas Development," American Economic Review, American Economic Association, vol. 105(12), pages 3633-3659, December.
    15. Noémi Kreif & Richard Grieve & Iván Díaz & David Harrison, 2015. "Evaluation of the Effect of a Continuous Treatment: A Machine Learning Approach with an Application to Treatment for Traumatic Brain Injury," Health Economics, John Wiley & Sons, Ltd., vol. 24(9), pages 1213-1228, September.
    16. Liao, Chuan & Jung, Suhyun & Brown, Daniel G. & Agrawal, Arun, 2024. "Does land tenure change accelerate deforestation? A matching-based four-country comparison," Ecological Economics, Elsevier, vol. 215(C).
    17. Turner, Alex J. & Fichera, Eleonora & Sutton, Matt, 2021. "The effects of in-utero exposure to influenza on mental health and mortality risk throughout the life-course," Economics & Human Biology, Elsevier, vol. 43(C).
    18. Dettmann, E. & Becker, C. & Schmeißer, C., 2011. "Distance functions for matching in small samples," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1942-1960, May.
    19. Harris, J. Michael & Williams, Robert P. & Mishra, Ashok K., 2015. "The Effect of Gender on Productivity Status in U.S. Agriculture," 2015 AAEA & WAEA Joint Annual Meeting, July 26-28, San Francisco, California 205780, Agricultural and Applied Economics Association.
    20. Aragón, Fernando M., 2015. "Do better property rights improve local income?: Evidence from First Nations' treaties," Journal of Development Economics, Elsevier, vol. 116(C), pages 43-56.

