Printed from https://ideas.repec.org/a/gam/jijerp/v18y2021i5p2679-d512214.html

Predicting Survival in Veterans with Follicular Lymphoma Using Structured Electronic Health Record Information and Machine Learning

Author

Listed:
  • Chunyang Li

    (Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
    George E. Wahlen Veterans Health Administration, Salt Lake City, UT 84148, USA)

  • Vikas Patil

    (Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
    George E. Wahlen Veterans Health Administration, Salt Lake City, UT 84148, USA)

  • Kelli M. Rasmussen

    (Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
    George E. Wahlen Veterans Health Administration, Salt Lake City, UT 84148, USA)

  • Christina Yong

    (Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
    George E. Wahlen Veterans Health Administration, Salt Lake City, UT 84148, USA)

  • Hsu-Chih Chien

    (Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
    George E. Wahlen Veterans Health Administration, Salt Lake City, UT 84148, USA)

  • Debbie Morreall

    (Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
    George E. Wahlen Veterans Health Administration, Salt Lake City, UT 84148, USA)

  • Jeffrey Humpherys

    (Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
    George E. Wahlen Veterans Health Administration, Salt Lake City, UT 84148, USA)

  • Brian C. Sauer

    (Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
    George E. Wahlen Veterans Health Administration, Salt Lake City, UT 84148, USA)

  • Zachary Burningham

    (Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
    George E. Wahlen Veterans Health Administration, Salt Lake City, UT 84148, USA)

  • Ahmad S. Halwani

    (Veritas, Division of Epidemiology, School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
    George E. Wahlen Veterans Health Administration, Salt Lake City, UT 84148, USA
    Hematology & Hematologic Malignancies, Huntsman Cancer Institute, Salt Lake City, UT 84112, USA)

Abstract

The most accurate prognostic approach for follicular lymphoma (FL), progression of disease at 24 months (POD24), requires two years’ observation after initiating first-line therapy (L1) to predict outcomes. We applied machine learning to structured electronic health record (EHR) data to predict individual survival at L1 initiation. We grouped 523 observations and 1933 variables from a nationwide cohort of FL patients diagnosed 2006–2014 in the Veterans Health Administration into three sets: traditionally used prognostic variables (“curated”), commonly measured labs (“labs”), and International Classification of Diseases diagnostic codes (“ICD”). We compared the performance of random survival forests (RSF) vs. a traditional Cox model using four datasets: curated, curated + labs, curated + ICD, and curated + ICD + labs; we also fit a Cox model on curated + POD24. We evaluated performance with the area under the receiver operating characteristic curve (AUC) and examined variable importance and partial dependence plots. RSF with curated + labs performed best, with mean AUC 0.73 (95% CI: 0.71–0.75). It approximated, but did not surpass, Cox with POD24 (mean AUC 0.74 [95% CI: 0.71–0.77]). RSF using EHR data achieved better performance than traditional prognostic variables alone, setting the foundation for the incorporation of our algorithm into the EHR. It also points to possible future scenarios in which clinicians could be given an EHR-based tool that approximates the predictive ability of the most accurate known indicator, using information available 24 months earlier.
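
As a rough illustration of the evaluation metric named in the abstract, the sketch below computes an AUC from risk scores and binary outcomes, then summarizes repeated AUC estimates as a mean with a normal-approximation 95% CI. It is not the authors' pipeline: the function names `auc` and `mean_ci`, the toy scores, and the toy outcome labels are all assumptions made for illustration, and the sketch ignores censoring, which a real survival analysis would have to handle.

```python
import math
import statistics


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_ci(values, z=1.96):
    """Mean of the values and a normal-approximation CI for that mean."""
    m = statistics.mean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


# Toy data (hypothetical): predicted risk and whether the event occurred
# by a fixed time horizon. No censored observations are represented.
risk = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
event = [1, 1, 1, 0, 0, 0]
toy_auc = auc(risk, event)

# Hypothetical AUCs from repeated resampling runs, summarized as mean and CI.
toy_runs = [0.6, 0.7, 0.8]
toy_mean, toy_low, toy_high = mean_ci(toy_runs)
```

In the toy data, eight of the nine positive/negative pairs are ordered correctly, so `toy_auc` is 8/9. Reporting a mean AUC with a confidence interval, as the abstract does, requires repeating the evaluation over resamples or splits; the resampling scheme itself is not specified here and is left out of the sketch.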

Suggested Citation

  • Chunyang Li & Vikas Patil & Kelli M. Rasmussen & Christina Yong & Hsu-Chih Chien & Debbie Morreall & Jeffrey Humpherys & Brian C. Sauer & Zachary Burningham & Ahmad S. Halwani, 2021. "Predicting Survival in Veterans with Follicular Lymphoma Using Structured Electronic Health Record Information and Machine Learning," IJERPH, MDPI, vol. 18(5), pages 1-19, March.
  • Handle: RePEc:gam:jijerp:v:18:y:2021:i:5:p:2679-:d:512214

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/18/5/2679/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/18/5/2679/
    Download Restriction: no

    References listed on IDEAS

    1. Monica A Konerman & Lauren A Beste & Tony Van & Boang Liu & Xuefei Zhang & Ji Zhu & Sameer D Saini & Grace L Su & Brahmajee K Nallamothu & George N Ioannou & Akbar K Waljee, 2019. "Machine learning models to predict disease progression among veterans with hepatitis C virus," PLOS ONE, Public Library of Science, vol. 14(1), pages 1-14, January.
    2. Mogensen, Ulla B. & Ishwaran, Hemant & Gerds, Thomas A., 2012. "Evaluating Random Forests for Survival Analysis Using Prediction Error Curves," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 50(i11).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bayu Adhi Tama & Sunghoon Lim, 2020. "A Comparative Performance Evaluation of Classification Algorithms for Clinical Decision Support Systems," Mathematics, MDPI, vol. 8(10), pages 1-25, October.
    2. Aizawa, Toshiaki, 2021. "Inequality of opportunity in infant mortality in South Asia: A decomposition analysis of survival data," Economics & Human Biology, Elsevier, vol. 43(C).
    3. Kamaryn T. Tanner & Linda D. Sharples & Rhian M. Daniel & Ruth H. Keogh, 2021. "Dynamic survival prediction combining landmarking with a machine learning ensemble: Methodology and empirical comparison," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(1), pages 3-30, January.
    4. Arfan Raheen Afzal & Jing Yang & Xuewen Lu, 2021. "Variable selection in partially linear additive hazards model with grouped covariates and a diverging number of parameters," Computational Statistics, Springer, vol. 36(2), pages 829-855, June.
    5. Hoora Moradian & Denis Larocque & François Bellavance, 2017. "$L_1$ splitting rules in survival forests," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(4), pages 671-691, October.
    6. Mike Jones & George Collier & David J. Reinkensmeyer & Frank DeRuyter & John Dzivak & Daniel Zondervan & John Morris, 2020. "Big Data Analytics and Sensor-Enhanced Activity Management to Improve Effectiveness and Efficiency of Outpatient Medical Rehabilitation," IJERPH, MDPI, vol. 17(3), pages 1-13, January.
    7. Lore Zumeta-Olaskoaga & Maximilian Weigert & Jon Larruskain & Eder Bikandi & Igor Setuain & Josean Lekue & Helmut Küchenhoff & Dae-Jin Lee, 2023. "Prediction of sports injuries in football: a recurrent time-to-event approach using regularized Cox models," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 107(1), pages 101-126, March.
    8. Zhengnan Huang & Hongjiu Zhang & Jonathan Boss & Stephen A Goutman & Bhramar Mukherjee & Ivo D Dinov & Yuanfang Guan & for the Pooled Resource Open-Access ALS Clinical Trials Consortium, 2017. "Complete hazard ranking to analyze right-censored data: An ALS survival study," PLOS Computational Biology, Public Library of Science, vol. 13(12), pages 1-21, December.
    9. Wang, Shikun & Li, Zhao & Lan, Lan & Zhao, Jieyi & Zheng, W. Jim & Li, Liang, 2022. "GPU accelerated estimation of a shared random effect joint model for dynamic prediction," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
    10. Julia Gilhodes & Florence Dalenc & Jocelyn Gal & Christophe Zemmour & Eve Leconte & Jean Marie Boher & Thomas Filleron, 2020. "Comparison of Variable Selection Methods for Time-to-Event Data in High-Dimensional Settings," Post-Print hal-02934793, HAL.
    11. Mikulec Artur & Misztal Małgorzata, 2018. "Does the Type of Business Activity and the Enterprise Location Affect a Firm’s Survival? Results of an Analysis for Natural Persons Conducting Economic Activity in the Łódzkie Voivodship," Econometrics. Advances in Applied Data Analysis, Sciendo, vol. 22(3), pages 23-40, September.
    12. Gauss M. Cordeiro & Elisângela C. Biazatti & Luís H. de Santana, 2023. "A New Extended Weibull Distribution with Application to Influenza and Hepatitis Data," Stats, MDPI, vol. 6(2), pages 1-17, May.
    13. Sill, Martin & Hielscher, Thomas & Becker, Natalia & Zucknick, Manuela, 2014. "c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 62(i05).
    14. Heidi Seibold & Christoph Bernau & Anne-Laure Boulesteix & Riccardo De Bin, 2018. "On the choice and influence of the number of boosting steps for high-dimensional linear Cox-models," Computational Statistics, Springer, vol. 33(3), pages 1195-1215, September.
    15. Chu Dani & Swartz Tim B., 2020. "Foul accumulation in the NBA," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 16(4), pages 301-309, December.
