Printed from https://ideas.repec.org/a/plo/pone00/0179805.html

Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project

Author

Listed:
  • Manal Alghamdi
  • Mouaz Al-Mallah
  • Steven Keteyian
  • Clinton Brawner
  • Jonathan Ehrman
  • Sherif Sakr

Abstract

Machine learning is becoming a popular and important approach in medical research. In this study, we investigate the relative performance of several machine learning methods, namely Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree, and Random Forests, for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data from 32,555 patients who were free of any known coronary artery disease or heart failure, underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009, and had a complete 5-year follow-up. By the end of the fifth year, 5,099 of those patients had developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an ensemble-based predictive model using 13 attributes that were selected based on their clinical importance and on the Multiple Linear Regression and Information Gain Ranking methods. The negative effect of class imbalance on the constructed model was handled with the Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive classifier was improved by the ensemble machine learning approach, using the Vote method with three decision-tree-based classifiers (Naïve Bayes Tree, Random Forest, and Logistic Model Tree), and achieved high predictive performance (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.
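
As a rough illustration of the two techniques named in the abstract, the sketch below shows the core idea of SMOTE (interpolating between a minority-class sample and one of its nearest minority neighbours) and a simple probability-averaging vote. It is not the authors' implementation: the function names `smote` and `soft_vote`, the toy feature vectors, the choice of `k`, and the assumption that the Vote method averages class probabilities are all illustrative.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def soft_vote(probabilities):
    """Average the positive-class probabilities from several classifiers
    (assumed combination rule, for illustration only)."""
    return sum(probabilities) / len(probabilities)


# Toy, made-up feature vectors (not FIT project data).
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote(minority, n_new=6, k=3)
print(len(new_points))

# Hypothetical probabilities from three classifiers for one patient.
print(soft_vote([0.9, 0.6, 0.75]))
```

Because each synthetic point lies on a segment between two real minority samples, oversampling this way adds variety without simply duplicating records. In practice, oversampling is usually applied only to the training split so that evaluation data stay untouched.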

Suggested Citation

  • Manal Alghamdi & Mouaz Al-Mallah & Steven Keteyian & Clinton Brawner & Jonathan Ehrman & Sherif Sakr, 2017. "Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project," PLOS ONE, Public Library of Science, vol. 12(7), pages 1-15, July.
  • Handle: RePEc:plo:pone00:0179805
    DOI: 10.1371/journal.pone.0179805

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0179805
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0179805&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0179805?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Ram D. Joshi & Chandra K. Dhakal, 2021. "Predicting Type 2 Diabetes Using Logistic Regression and Machine Learning Approaches," IJERPH, MDPI, vol. 18(14), pages 1-17, July.
    2. David Harvey & Wessel Valkenburg & Amara Amara, 2021. "Predicting malaria epidemics in Burkina Faso with machine learning," PLOS ONE, Public Library of Science, vol. 16(6), pages 1-16, June.
    3. Wei-Ming Luo & Jing-Yang Su & Tong Xu & Zhong-Ze Fang, 2023. "Prevalence of Diabetic Retinopathy and Use of Common Oral Hypoglycemic Agents Increase the Risk of Diabetic Nephropathy—A Cross-Sectional Study in Patients with Type 2 Diabetes," IJERPH, MDPI, vol. 20(5), pages 1-13, March.
    4. Yi-Ching Lynn Ho & Vivian Shu Yi Lee & Moon-Ho Ringo Ho & Gladis Jing Lin & Julian Thumboo, 2021. "Towards a Parsimonious Pathway Model of Modifiable and Mediating Risk Factors Leading to Diabetes Risk," IJERPH, MDPI, vol. 18(20), pages 1-20, October.
    5. Pin-Wei Chen & Nathan A. Baune & Igor Zwir & Jiayu Wang & Victoria Swamidass & Alex W.K. Wong, 2021. "Measuring Activities of Daily Living in Stroke Patients with Motion Machine Learning Algorithms: A Pilot Study," IJERPH, MDPI, vol. 18(4), pages 1-16, February.
    6. Ying-Jen Chang & Kuo-Chuan Hung & Li-Kai Wang & Chia-Hung Yu & Chao-Kun Chen & Hung-Tze Tay & Jhi-Joung Wang & Chung-Feng Liu, 2021. "A Real-Time Artificial Intelligence-Assisted System to Predict Weaning from Ventilator Immediately after Lung Resection Surgery," IJERPH, MDPI, vol. 18(5), pages 1-14, March.
    7. Sharan Srinivas, 2020. "A Machine Learning-Based Approach for Predicting Patient Punctuality in Ambulatory Care Centers," IJERPH, MDPI, vol. 17(10), pages 1-15, May.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.