
Evaluating Imputation Methods to Improve Prediction Accuracy for an HIV Study in Uganda

Authors
  • Nadia B. Mendoza

    (Department of Mathematics and Statistics, San Diego State University, San Diego, CA 92182, USA)

  • Chii-Dean Lin

    (Department of Mathematics and Statistics, San Diego State University, San Diego, CA 92182, USA)

  • Susan M. Kiene

    (Department of Disease Control and Environmental Health, Makerere University School of Public Health, Kampala P.O. Box 7072, Uganda
    Division of Epidemiology and Biostatistics, San Diego State University School of Public Health, San Diego, CA 92182, USA)

  • Nicolas A. Menzies

    (Department of Global Health and Population, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA)

  • Rhoda K. Wanyenze

    (Department of Disease Control and Environmental Health, Makerere University School of Public Health, Kampala P.O. Box 7072, Uganda
    Division of Epidemiology and Biostatistics, San Diego State University School of Public Health, San Diego, CA 92182, USA)

  • Katherine A. Schmarje

    (Division of Epidemiology and Biostatistics, San Diego State University School of Public Health, San Diego, CA 92182, USA)

  • Rose Naigino

    (Department of Disease Control and Environmental Health, Makerere University School of Public Health, Kampala P.O. Box 7072, Uganda
    Division of Epidemiology and Biostatistics, San Diego State University School of Public Health, San Diego, CA 92182, USA)

  • Michael Ediau

    (Division of Epidemiology and Biostatistics, San Diego State University School of Public Health, San Diego, CA 92182, USA
    Department of Health Policy, Planning and Management, Makerere University School of Public Health, Kampala P.O. Box 7072, Uganda)

  • Seth C. Kalichman

    (Institute for Collaboration on Health, Intervention and Policy, University of Connecticut, Storrs, CT 06269, USA)

  • Barbara A. Bailey

    (Department of Mathematics and Statistics, San Diego State University, San Diego, CA 92182, USA)

Abstract

Standard statistical analyses often exclude incomplete observations, which can be particularly problematic when predicting rare outcomes such as HIV positivity. In the linkage-to-HIV-care dataset, there were initially 553 complete HIV-positive cases, and imputation added a further 554 cases. Four imputation methods were evaluated: amelia, hmisc, mice, and missForest. Simulations across various scenarios were run on the complete data to guide imputation of the full dataset. A random forest model was used to predict HIV status, and each imputation method was assessed on imputation precision, overall prediction accuracy, and sensitivity. While missForest produced imputed values closer to the observed ones, this did not translate into better predictive models. Imputation with hmisc and mice led to higher prediction accuracy and sensitivity, with median accuracy increasing from 64% to 76% and median sensitivity rising from 0.4 to 0.75. Hmisc and amelia were the fastest imputation methods. Additionally, oversampling the minority class combined with undersampling the majority class did not improve prediction of new HIV-positive cases when only the complete observations were used. However, increasing the information available for the minority class through imputation improved sensitivity for that class.
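
The simulation step described in the abstract (scoring imputation precision on the complete data) can be sketched as a mask, impute, score loop. This is a minimal Python sketch, not the authors' code: the four methods compared are R packages, so a column-mean imputer stands in here as a hypothetical placeholder, and the missing-completely-at-random masking, the masking fraction `frac`, and RMSE as the precision measure are all assumptions.

```python
import math
import random
from statistics import mean, median


def mask_mcar(rows, frac, rng):
    """Copy `rows` and set each cell to None with probability `frac`
    (assumed missing-completely-at-random); also return masked (i, j) cells."""
    masked = [list(r) for r in rows]
    positions = []
    for i, row in enumerate(masked):
        for j in range(len(row)):
            if rng.random() < frac:
                row[j] = None
                positions.append((i, j))
    return masked, positions


def mean_impute(rows):
    """Hypothetical stand-in imputer: fill each None with its column's observed
    mean (0.0 if the whole column was masked). Not one of the evaluated methods."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def imputation_rmse(truth, imputed, positions):
    """Root-mean-square error between true and imputed values on masked cells only."""
    if not positions:
        return 0.0
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in positions]
    return math.sqrt(sum(squared) / len(squared))


def simulate(complete_rows, frac, imputer, n_reps=50, seed=0):
    """Repeat mask -> impute -> score and return one RMSE per replicate."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_reps):
        masked, positions = mask_mcar(complete_rows, frac, rng)
        imputed = imputer(masked)
        scores.append(imputation_rmse(complete_rows, imputed, positions))
    return scores


if __name__ == "__main__":
    # Toy numeric data, for illustration only (not from the study).
    data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
    scores = simulate(data, frac=0.25, imputer=mean_impute, n_reps=20)
    print(f"median RMSE over replicates: {median(scores):.3f}")
```

Any function with the same shape (rows containing `None` in, completed rows out) can replace `mean_impute`. Because the imputer never touches the seeded generator, calling `simulate` with the same `seed` reproduces the same masks, so competing imputers are scored on identical replicates.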

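The two prediction metrics reported in the abstract and the combined over/undersampling scheme can be illustrated as follows. Accuracy is the share of correct predictions, and sensitivity is TP / (TP + FN), the share of true positives the model labels positive. The resampling shown is plain random resampling of every class to a common size (`target_per_class`, a hypothetical parameter); the abstract does not state which resampling procedure the study used, so this is an assumption, not the authors' method.

```python
import random
from collections import Counter


def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity), where sensitivity = TP / (TP + FN)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return correct / len(pairs), sensitivity


def over_under_sample(X, y, target_per_class, seed=0):
    """Resample every class to `target_per_class` rows: smaller classes are drawn
    with replacement (oversampling), larger ones without (undersampling)."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) < target_per_class:
            picked = rng.choices(rows, k=target_per_class)
        else:
            picked = rng.sample(rows, target_per_class)
        X_out.extend(picked)
        y_out.extend([label] * target_per_class)
    return X_out, y_out


if __name__ == "__main__":
    # Toy labels (1 = positive), for illustration only.
    y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(accuracy_and_sensitivity(y_true, y_pred))  # -> (0.7, 0.6)
    X, y = list(range(10)), [1, 1] + [0] * 8
    _, y_bal = over_under_sample(X, y, target_per_class=5)
    print(Counter(y_bal))
```

The toy example shows why sensitivity matters for a rare class: a model can be right on most rows while still missing two of the five positives.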
Suggested Citation

  • Nadia B. Mendoza & Chii-Dean Lin & Susan M. Kiene & Nicolas A. Menzies & Rhoda K. Wanyenze & Katherine A. Schmarje & Rose Naigino & Michael Ediau & Seth C. Kalichman & Barbara A. Bailey, 2024. "Evaluating Imputation Methods to Improve Prediction Accuracy for an HIV Study in Uganda," Stats, MDPI, vol. 7(4), pages 1-16, November.
  • Handle: RePEc:gam:jstats:v:7:y:2024:i:4:p:82-1420:d:1528246

    Download full text from publisher

    File URL: https://www.mdpi.com/2571-905X/7/4/82/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2571-905X/7/4/82/
    Download Restriction: no

    References listed on IDEAS

    1. van Buuren, Stef & Groothuis-Oudshoorn, Karin, 2011. "mice: Multivariate Imputation by Chained Equations in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i03).
    2. Honaker, James & King, Gary & Blackwell, Matthew, 2011. "Amelia II: A Program for Missing Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i07).
    3. King, Gary & Honaker, James & Joseph, Anne & Scheve, Kenneth, 2001. "Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation," American Political Science Review, Cambridge University Press, vol. 95(1), pages 49-69, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. World Bank & Organisation for Economic Co-operation and Development, 2017. "A Step Ahead," World Bank Publications - Books, The World Bank Group, number 27527.
    2. Cohen, Joseph N, 2010. "Neoliberalism’s relationship with economic growth in the developing world: Was it the power of the market or the resolution of financial crisis?," MPRA Paper 24527, University Library of Munich, Germany.
    3. Wurriehausen, Nadine & Ihle, Rico & Lakner, Sebastian, 2011. "The Integration of the Conventional and Organic Wheat Market," 2011 International Congress, August 30-September 2, 2011, Zurich, Switzerland 115784, European Association of Agricultural Economists.
    4. Lara Lopez & Fernando L. Vázquez & Ángela J. Torres & Patricia Otero & Vanessa Blanco & Olga Díaz & Mario Páramo, 2020. "Long-Term Effects of a Cognitive Behavioral Conference Call Intervention on Depression in Non-Professional Caregivers," IJERPH, MDPI, vol. 17(22), pages 1-24, November.
    5. Seiler, Christian & Heumann, Christian, 2013. "Microdata imputations and macrodata implications: Evidence from the Ifo Business Survey," Economic Modelling, Elsevier, vol. 35(C), pages 722-733.
    6. Schoemaker, Nikita K. & Juffer, Femmie & Rippe, Ralph C.A. & Vermeer, Harriet J. & Stoltenborgh, Marije & Jagersma, Gabrine J. & Maras, Athanasios & Alink, Lenneke R.A., 2020. "Positive parenting in foster care: Testing the effectiveness of a video-feedback intervention program on foster parents’ behavior and attitudes," Children and Youth Services Review, Elsevier, vol. 110(C).
    7. Ihle, Rico & Rubin, Ofir D., 2012. "Price Transmission Subject to Security‐based Trade Barriers in the Context of the Israeli‐Palestinian Conflict," 2012 Conference, August 18-24, 2012, Foz do Iguacu, Brazil 125392, International Association of Agricultural Economists.
    8. Jue Yang & Shunsuke Managi & Masayuki Sato, 2015. "The effect of institutional quality on national wealth: an examination using multiple imputation method," Environmental Economics and Policy Studies, Springer;Society for Environmental Economics and Policy Studies - SEEPS, vol. 17(3), pages 431-453, July.
    9. Nicklas Pettersson, 2013. "Bias reduction of finite population imputation by kernel methods," Statistics in Transition new series, Główny Urząd Statystyczny (Polska), vol. 14(1), pages 139-160, March.
    10. Nicholas Tierney & Dianne Cook, 2018. "Expanding tidy data principles to facilitate missing data exploration, visualization and assessment of imputations," Monash Econometrics and Business Statistics Working Papers 14/18, Monash University, Department of Econometrics and Business Statistics.
    11. Thelma Dede Baddoo & Zhijia Li & Samuel Nii Odai & Kenneth Rodolphe Chabi Boni & Isaac Kwesi Nooni & Samuel Ato Andam-Akorful, 2021. "Comparison of Missing Data Infilling Mechanisms for Recovering a Real-World Single Station Streamflow Observation," IJERPH, MDPI, vol. 18(16), pages 1-26, August.
    12. Cohen, Joseph N, 2010. "Neoliberalism’s relationship with economic growth in the developing world: Was it the power of the market or the resolution of financial crisis?," MPRA Paper 24399, University Library of Munich, Germany.
    13. Roman Matkovskyy, 2016. "A comparison of pre- and post-crisis efficiency of OECD countries: evidence from a model with temporal heterogeneity in time and unobservable individual effect," European Journal of Comparative Economics, Cattaneo University (LIUC), vol. 13(2), pages 135-167, December.
    14. Catherine Norman, 2009. "Rule of Law and the Resource Curse: Abundance Versus Intensity," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 43(2), pages 183-207, June.
    15. Ann Bostrom & Adam L. Hayes & Katherine M. Crosman, 2019. "Efficacy, Action, and Support for Reducing Climate Change Risks," Risk Analysis, John Wiley & Sons, vol. 39(4), pages 805-828, April.
    16. Christian Seiler, 2013. "Nonresponse in Business Tendency Surveys: Theoretical Discourse and Empirical Evidence," ifo Beiträge zur Wirtschaftsforschung, ifo Institute - Leibniz Institute for Economic Research at the University of Munich, number 52.
    17. Cheng, Xiaoyue & Cook, Dianne & Hofmann, Heike, 2015. "Visually Exploring Missing Values in Multivariable Data Using a Graphical User Interface," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 68(i06).
    18. Mathur, Maya B & Shpitser, Ilya, 2024. "Pitfalls of imputing using incomplete auxiliary variables," OSF Preprints c3zrh, Center for Open Science.
    19. Eriko Miyama & Shunsuke Managi, 2014. "Global environmental emissions estimate: application of multiple imputation," Environmental Economics and Policy Studies, Springer;Society for Environmental Economics and Policy Studies - SEEPS, vol. 16(2), pages 115-135, April.
    20. Talebian, Ahmadreza & Zou, Bo & Hansen, Mark, 2018. "Assessing the impacts of state-supported rail services on local population and employment: A California case study," Transport Policy, Elsevier, vol. 63(C), pages 108-121.
