Printed from https://ideas.repec.org/a/gam/jstats/v7y2024i4p82-1420d1528246.html

Evaluating Imputation Methods to Improve Prediction Accuracy for an HIV Study in Uganda

Author listed:
  • Nadia B. Mendoza

    (Department of Mathematics and Statistics, San Diego State University, San Diego, CA 92182, USA)

  • Chii-Dean Lin

    (Department of Mathematics and Statistics, San Diego State University, San Diego, CA 92182, USA)

  • Susan M. Kiene

    (Department of Disease Control and Environmental Health, Makerere University School of Public Health, Kampala P.O. Box 7072, Uganda
    Division of Epidemiology and Biostatistics, San Diego State University School of Public Health, San Diego, CA 92182, USA)

  • Nicolas A. Menzies

    (Department of Global Health and Population, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA)

  • Rhoda K. Wanyenze

    (Department of Disease Control and Environmental Health, Makerere University School of Public Health, Kampala P.O. Box 7072, Uganda
    Division of Epidemiology and Biostatistics, San Diego State University School of Public Health, San Diego, CA 92182, USA)

  • Katherine A. Schmarje

    (Division of Epidemiology and Biostatistics, San Diego State University School of Public Health, San Diego, CA 92182, USA)

  • Rose Naigino

    (Department of Disease Control and Environmental Health, Makerere University School of Public Health, Kampala P.O. Box 7072, Uganda
    Division of Epidemiology and Biostatistics, San Diego State University School of Public Health, San Diego, CA 92182, USA)

  • Michael Ediau

    (Division of Epidemiology and Biostatistics, San Diego State University School of Public Health, San Diego, CA 92182, USA
    Department of Health Policy, Planning and Management, Makerere University School of Public Health, Kampala P.O. Box 7072, Uganda)

  • Seth C. Kalichman

    (Institute for Collaboration on Health, Intervention and Policy, University of Connecticut, Storrs, CT 06269, USA)

  • Barbara A. Bailey

    (Department of Mathematics and Statistics, San Diego State University, San Diego, CA 92182, USA)

Abstract

Standard statistical analyses often exclude incomplete observations, which can be particularly problematic when predicting rare outcomes such as HIV positivity. In the linkage-to-HIV-care dataset, there were initially 553 complete HIV-positive cases, and imputation added a further 554 cases. Four imputation methods were evaluated: amelia, hmisc, mice and missForest. Simulations were conducted across various scenarios on the complete data to guide imputation for the full dataset. A random forest model was used to predict HIV status, and each method was assessed on imputation precision, overall prediction accuracy, and sensitivity. While missForest produced imputed values closer to the observed ones, this did not translate into better predictive models. Hmisc and mice imputations led to higher prediction accuracy and sensitivity, with median accuracy increasing from 64% to 76% and median sensitivity rising from 0.4 to 0.75. Hmisc and amelia were the fastest imputation methods. Additionally, combining oversampling of the minority class with undersampling of the majority class, using only the complete observations, did not improve predictions of new HIV-positive cases. However, increasing the minority-class information through imputation enhanced sensitivity for predicting cases in this class.
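The abstract compares methods on imputation precision, overall accuracy and sensitivity, and contrasts imputation with combined oversampling/undersampling. The following is a minimal, stdlib-only Python sketch of those quantities, not the authors' code: the study used R imputation packages and a random forest, which are not reproduced here. It assumes binary labels with 1 = HIV positive; all function names are hypothetical, measuring imputation precision as RMSE on deliberately masked values is an assumption, and the default resampling target (midpoint between the two class sizes) is also an assumption.

```python
import random
from math import sqrt
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Overall accuracy: share of all cases whose predicted label matches the truth."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Sensitivity (true-positive rate): share of actual positives predicted positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        raise ValueError("no positive cases in y_true")
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


def imputation_rmse(observed: Sequence[float], imputed: Sequence[float]) -> float:
    """RMSE between values masked out of complete data and their imputations.

    Assumption: one possible 'imputation precision' measure, not the paper's metric.
    """
    if len(observed) != len(imputed) or not observed:
        raise ValueError("need two non-empty value sequences of equal length")
    return sqrt(sum((o - i) ** 2 for o, i in zip(observed, imputed)) / len(observed))


def over_under_sample(
    rows: Sequence[T],
    labels: Sequence[int],
    minority: int = 1,
    target: Optional[int] = None,
    seed: int = 0,
) -> Tuple[List[T], List[int]]:
    """Randomly oversample the minority class (with replacement) and undersample
    the majority class (without replacement) so each class ends with `target` rows."""
    rng = random.Random(seed)
    min_idx = [i for i, y in enumerate(labels) if y == minority]
    maj_idx = [i for i, y in enumerate(labels) if y != minority]
    if target is None:
        target = (len(min_idx) + len(maj_idx)) // 2  # assumed default: midpoint
    if not min_idx or not (len(min_idx) <= target <= len(maj_idx)):
        raise ValueError("target must lie between the minority and majority class sizes")
    keep = (
        min_idx
        + rng.choices(min_idx, k=target - len(min_idx))  # extra minority draws
        + rng.sample(maj_idx, target)  # majority subset
    )
    return [rows[i] for i in keep], [labels[i] for i in keep]


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
    print(accuracy(y_true, y_pred))     # 0.75
    print(sensitivity(y_true, y_pred))  # 0.75
```

In the usage example, a sensitivity of 0.75 means three of every four actual positives are flagged, which is why sensitivity, not accuracy alone, is the quantity to watch when the positive class is rare.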

Suggested Citation

  • Nadia B. Mendoza & Chii-Dean Lin & Susan M. Kiene & Nicolas A. Menzies & Rhoda K. Wanyenze & Katherine A. Schmarje & Rose Naigino & Michael Ediau & Seth C. Kalichman & Barbara A. Bailey, 2024. "Evaluating Imputation Methods to Improve Prediction Accuracy for an HIV Study in Uganda," Stats, MDPI, vol. 7(4), pages 1-16, November.
  • Handle: RePEc:gam:jstats:v:7:y:2024:i:4:p:82-1420:d:1528246

    Download full text from publisher

    File URL: https://www.mdpi.com/2571-905X/7/4/82/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2571-905X/7/4/82/
    Download Restriction: no

    References listed on IDEAS

    1. van Buuren, Stef & Groothuis-Oudshoorn, Karin, 2011. "mice: Multivariate Imputation by Chained Equations in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i03).
    2. Honaker, James & King, Gary & Blackwell, Matthew, 2011. "Amelia II: A Program for Missing Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i07).
    3. King, Gary & Honaker, James & Joseph, Anne & Scheve, Kenneth, 2001. "Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation," American Political Science Review, Cambridge University Press, vol. 95(1), pages 49-69, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. World Bank & Organisation for Economic Co-operation and Development, 2017. "A Step Ahead," World Bank Publications - Books, The World Bank Group, number 27527.
    2. Cohen, Joseph N, 2010. "Neoliberalism’s relationship with economic growth in the developing world: Was it the power of the market or the resolution of financial crisis?," MPRA Paper 24527, University Library of Munich, Germany.
    3. Lara Lopez & Fernando L. Vázquez & Ángela J. Torres & Patricia Otero & Vanessa Blanco & Olga Díaz & Mario Páramo, 2020. "Long-Term Effects of a Cognitive Behavioral Conference Call Intervention on Depression in Non-Professional Caregivers," IJERPH, MDPI, vol. 17(22), pages 1-24, November.
    4. Seiler, Christian & Heumann, Christian, 2013. "Microdata imputations and macrodata implications: Evidence from the Ifo Business Survey," Economic Modelling, Elsevier, vol. 35(C), pages 722-733.
    5. Ihle, Rico & Rubin, Ofir D., 2012. "Price Transmission Subject to Security‐based Trade Barriers in the Context of the Israeli‐Palestinian Conflict," 2012 Conference, August 18-24, 2012, Foz do Iguacu, Brazil 125392, International Association of Agricultural Economists.
    6. Nicklas Pettersson, 2013. "Bias reduction of finite population imputation by kernel methods," Statistics in Transition new series, Główny Urząd Statystyczny (Polska), vol. 14(1), pages 139-160, March.
    7. Cohen, Joseph N, 2010. "Neoliberalism’s relationship with economic growth in the developing world: Was it the power of the market or the resolution of financial crisis?," MPRA Paper 24399, University Library of Munich, Germany.
    8. Roman Matkovskyy, 2016. "A comparison of pre- and post-crisis efficiency of OECD countries: evidence from a model with temporal heterogeneity in time and unobservable individual effect," European Journal of Comparative Economics, Cattaneo University (LIUC), vol. 13(2), pages 135-167, December.
    9. Catherine Norman, 2009. "Rule of Law and the Resource Curse: Abundance Versus Intensity," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 43(2), pages 183-207, June.
    10. Ann Bostrom & Adam L. Hayes & Katherine M. Crosman, 2019. "Efficacy, Action, and Support for Reducing Climate Change Risks," Risk Analysis, John Wiley & Sons, vol. 39(4), pages 805-828, April.
    11. Christian Seiler, 2013. "Nonresponse in Business Tendency Surveys: Theoretical Discourse and Empirical Evidence," ifo Beiträge zur Wirtschaftsforschung, ifo Institute - Leibniz Institute for Economic Research at the University of Munich, number 52.
    12. Cheng, Xiaoyue & Cook, Dianne & Hofmann, Heike, 2015. "Visually Exploring Missing Values in Multivariable Data Using a Graphical User Interface," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 68(i06).
    13. Mathur, Maya B & Shpitser, Ilya, 2024. "Pitfalls of imputing using incomplete auxiliary variables," OSF Preprints c3zrh, Center for Open Science.
    14. Talebian, Ahmadreza & Zou, Bo & Hansen, Mark, 2018. "Assessing the impacts of state-supported rail services on local population and employment: A California case study," Transport Policy, Elsevier, vol. 63(C), pages 108-121.
    15. Ahmad R. Alsaber & Jiazhu Pan & Adeeba Al-Hurban, 2021. "Handling Complex Missing Data Using Random Forest Approach for an Air Quality Monitoring Dataset: A Case Study of Kuwait Environmental Data (2012 to 2018)," IJERPH, MDPI, vol. 18(3), pages 1-25, February.
    16. Josse, Julie & Husson, François, 2016. "missMDA: A Package for Handling Missing Values in Multivariate Data Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 70(i01).
    17. Kowarik, Alexander & Templ, Matthias, 2016. "Imputation with the R Package VIM," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i07).
    18. Eriko Miyama & Shunsuke Managi, 2014. "Global environmental emissions estimate: application of multiple imputation," Environmental Economics and Policy Studies, Springer;Society for Environmental Economics and Policy Studies - SEEPS, vol. 16(2), pages 115-135, April.
    19. Iacus, Stefano & King, Gary & Porro, Giuseppe, 2009. "cem: Software for Coarsened Exact Matching," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 30(i09).
    20. Simon Grund & Oliver Lüdtke & Alexander Robitzsch, 2021. "On the Treatment of Missing Data in Background Questionnaires in Educational Large-Scale Assessments: An Evaluation of Different Procedures," Journal of Educational and Behavioral Statistics, , vol. 46(4), pages 430-465, August.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jstats:v:7:y:2024:i:4:p:82-1420:d:1528246. See general information about how to correct material in RePEc.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the MDPI Indexing Manager. General contact details of provider: https://www.mdpi.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.