Printed from https://ideas.repec.org/a/gam/jijerp/v19y2022i22p15027-d973323.html

Machine Learning Models for Data-Driven Prediction of Diabetes by Lifestyle Type

Author

Listed:
  • Yifan Qin

    (College of Physical Education, Shenzhen University, Shenzhen 518000, China
    The authors contributed equally to this work.)

  • Jinlong Wu

    (College of Physical Education, Southwest University, Chongqing 400715, China
    The authors contributed equally to this work.)

  • Wen Xiao

    (College of Physical Education, Shenzhen University, Shenzhen 518000, China)

  • Kun Wang

    (Physical Education College, Yanching Institute of Technology, Langfang 065201, China)

  • Anbing Huang

    (College of Physical Education, Shenzhen University, Shenzhen 518000, China)

  • Bowen Liu

    (College of Physical Education, Shenzhen University, Shenzhen 518000, China)

  • Jingxuan Yu

    (College of Physical Education, Shenzhen University, Shenzhen 518000, China)

  • Chuhao Li

    (College of Physical Education, Shenzhen University, Shenzhen 518000, China)

  • Fengyu Yu

    (College of Physical Education, Shenzhen University, Shenzhen 518000, China)

  • Zhanbing Ren

    (College of Physical Education, Shenzhen University, Shenzhen 518000, China)

Abstract

The prevalence of diabetes has been increasing in recent years, and previous research has found that machine-learning models are good diabetes prediction tools. The purpose of this study was to compare the efficacy of five different machine-learning models for diabetes prediction using lifestyle data from the National Health and Nutrition Examination Survey (NHANES) database. The 1999–2020 NHANES database yielded data on 17,833 individuals, based on demographic characteristics and lifestyle-related variables. To screen training data for the machine-learning models, the Akaike Information Criterion (AIC) forward propagation algorithm was utilized. For predicting diabetes, five machine-learning models (CATBoost, XGBoost, Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM)) were developed. Model performance was evaluated using accuracy, sensitivity, specificity, precision, F1 score, and the receiver operating characteristic (ROC) curve. Across the five machine-learning models, the dietary intake levels of energy, carbohydrate, and fat contributed the most to the prediction of diabetes. In terms of model performance, CATBoost ranks higher than RF, LR, XGBoost, and SVM. The best-performing model, CATBoost, achieves an accuracy of 82.1% and an area under the ROC curve (AUC) of 0.83. Machine-learning models based on NHANES data can assist medical institutions in identifying diabetes patients.
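The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `classification_metrics` and `roc_auc` are hypothetical, the labels and scores are made-up toy values rather than NHANES data, and a 0.5 decision threshold is assumed. The sketch computes accuracy, sensitivity, specificity, precision, and F1 from the confusion-matrix counts, and AUC as the fraction of (positive, negative) pairs that the model ranks correctly.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics for binary labels (1 = diabetes, 0 = no diabetes)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall on the positive class
    specificity = ratio(tn, tn + fp)  # recall on the negative class
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2) and is meant
    for illustration on small inputs only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, not taken from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy alone can look favorable when non-diabetic cases dominate a sample, which is one reason to read sensitivity, specificity, and AUC alongside it.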

Suggested Citation

  • Yifan Qin & Jinlong Wu & Wen Xiao & Kun Wang & Anbing Huang & Bowen Liu & Jingxuan Yu & Chuhao Li & Fengyu Yu & Zhanbing Ren, 2022. "Machine Learning Models for Data-Driven Prediction of Diabetes by Lifestyle Type," IJERPH, MDPI, vol. 19(22), pages 1-16, November.
  • Handle: RePEc:gam:jijerp:v:19:y:2022:i:22:p:15027-:d:973323

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/19/22/15027/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/19/22/15027/
    Download Restriction: no

    References listed on IDEAS

    1. Rosy Oh & Hong Kyu Lee & Youngmi Kim Pak & Man-Suk Oh, 2022. "An Interactive Online App for Predicting Diabetes via Machine Learning from Environment-Polluting Chemical Exposure Data," IJERPH, MDPI, vol. 19(10), pages 1-17, May.
    2. Swati V. Narwane & Sudhir D. Sawarkar, 2021. "Effects of Class Imbalance Using Machine Learning Algorithms: Case Study Approach," International Journal of Applied Evolutionary Computation (IJAEC), IGI Global, vol. 12(1), pages 1-17, January.
    3. Govinda R. Poudel & Anthony Barnett & Muhammad Akram & Erika Martino & Luke D. Knibbs & Kaarin J. Anstey & Jonathan E. Shaw & Ester Cerin, 2022. "Machine Learning for Prediction of Cognitive Health in Adults Using Sociodemographic, Neighbourhood Environmental, and Lifestyle Factors," IJERPH, MDPI, vol. 19(17), pages 1-14, September.

    Citations

    Cited by:

    1. Sergio Celada-Bernal & Guillermo Pérez-Acosta & Carlos M. Travieso-González & José Blanco-López & Luciano Santana-Cabrera, 2023. "Applying Neural Networks to Recover Values of Monitoring Parameters for COVID-19 Patients in the ICU," Mathematics, MDPI, vol. 11(15), pages 1-19, July.
