Printed from https://ideas.repec.org/a/wsi/fracta/v31y2023i06ns0218348x23401060.html

A Diabetes Risk Predicting Method With Multi-Strategy Counterfactual-Based Data Augmentation

Authors
  • CHEN WANG

    (School of Computer Science and Engineering, Northeastern University, Shenyang 110004, P. R. China; Neusoft Institute of Intelligent Medical Research, Shenyang 110179, P. R. China)

  • YAN-YI LIU

    (School of Computer Science and Engineering, Northeastern University, Shenyang 110004, P. R. China)

  • ZHAO-SHUO DIAO

    (School of Computer Science and Engineering, Northeastern University, Shenyang 110004, P. R. China)

  • JIA-WEI TANG

    (School of Computer Science and Engineering, Northeastern University, Shenyang 110004, P. R. China)

  • YING-YOU WEN

    (School of Computer Science and Engineering, Northeastern University, Shenyang 110004, P. R. China; Neusoft Institute of Intelligent Medical Research, Shenyang 110179, P. R. China)

  • XIAO-TAO YANG

    (The First Affiliated Hospital of China Medical University, Shenyang 110001, P. R. China)

Abstract

Diabetes is a chronic disease that poses a serious threat to health, and its early risk prediction has been a hot research topic in medical artificial intelligence. Routine medical checkups are the most common way to monitor people's health status, and checkup data contain rich diagnostic information that is valuable for diabetes risk prediction. Currently, most studies on diabetes risk prediction are based on publicly available datasets, and their models and algorithms do not perform well on real clinical datasets. Real routine checkup data are characterized by complex information, diverse features, high redundancy and poor class balance, all of which pose great challenges for diabetes risk prediction. To address this problem, this paper proposes a diabetes risk prediction method based on multi-strategy data augmentation. After data pre-processing and feature selection, a counterfactual-based data balancing strategy is used to augment the minority class, and a density clustering-based supplemental counterfactual data augmentation strategy is proposed to address the problem that the instances generated by the counterfactual method are insufficiently representative. Moreover, an uncertainty-weighted method is used in the model training phase. On the real checkup dataset, five machine learning methods, namely Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree, Random Forest and Gradient Boosting, are used to build models, and 5-fold cross-validation is used to assess diabetes risk prediction performance. The experimental results showed that the sensitivity and precision of the models were significantly improved compared with existing methods, and the sensitivity of the LR model for diabetes risk prediction on the real routine checkup dataset exceeded 90%, which meets the requirements of clinical application.
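The counterfactual balancing step can be illustrated with a minimal sketch. It is a hedged illustration of counterfactual-pair augmentation in general, not the authors' algorithm: each majority-class instance is paired with its nearest minority-class instance, the pair is kept as a "native counterfactual" only if the two differ on at most `max_diff` features, and every unpaired majority instance is then turned into a synthetic minority instance by copying in the differing feature values of the closest pair. The function name `counterfactual_augment`, the parameters `max_diff` and `tol`, and the use of squared Euclidean distance are hypothetical choices for illustration. Features are assumed to be already pre-processed and scaled, and the paper's density clustering supplement and uncertainty weighting are not shown.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _diff_features(a: Sequence[float], b: Sequence[float], tol: float) -> List[int]:
    """Indices of the features on which a and b differ by more than tol."""
    return [k for k, (x, y) in enumerate(zip(a, b)) if abs(x - y) > tol]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; enough for nearest-neighbour ranking."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def counterfactual_augment(
    majority: List[Vector],
    minority: List[Vector],
    max_diff: int = 2,   # hypothetical: max features a counterfactual pair may differ on
    tol: float = 1e-9,   # hypothetical: values closer than tol count as equal
) -> List[Vector]:
    """Hypothetical sketch of counterfactual-pair augmentation (not the paper's method).

    Returns synthetic instances intended to be labelled with the minority class.
    """
    if not majority or not minority:
        return []

    # Step 1: pair each majority instance with its nearest minority instance and
    # keep the pair only if the two differ on 1..max_diff features.
    pairs: List[Tuple[int, Vector, List[int]]] = []
    for i, x in enumerate(majority):
        nearest = min(minority, key=lambda m: _sq_dist(x, m))
        diff = _diff_features(x, nearest, tol)
        if 0 < len(diff) <= max_diff:
            pairs.append((i, nearest, diff))
    if not pairs:
        return []

    # Step 2: for each unpaired majority instance, find the closest paired majority
    # instance and copy its minority partner's values on the differing features.
    paired = {i for i, _, _ in pairs}
    synthetic: List[Vector] = []
    for j, p in enumerate(majority):
        if j in paired:
            continue
        _, partner, diff = min(pairs, key=lambda t: _sq_dist(p, majority[t[0]]))
        new = list(p)
        for k in diff:
            new[k] = partner[k]
        synthetic.append(tuple(new))
    return synthetic


if __name__ == "__main__":
    majority = [(0.0, 0.0, 1.0), (4.0, 4.0, 4.0), (9.0, 9.0, 9.0)]
    minority = [(0.0, 0.0, 3.0)]
    print(counterfactual_augment(majority, minority))
    # → [(4.0, 4.0, 3.0), (9.0, 9.0, 3.0)]
```

In the toy run, only the first majority instance is close enough to the minority instance to form a pair (they differ on the third feature alone), so each unpaired majority instance receives the minority partner's third-feature value. Because the synthetic instances are derived from the data themselves, in a cross-validated evaluation they would normally be generated from the training folds only, so that no information leaks into the held-out fold.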

Suggested Citation

  • Chen Wang & Yan-Yi Liu & Zhao-Shuo Diao & Jia-Wei Tang & Ying-You Wen & Xiao-Tao Yang, 2023. "A Diabetes Risk Predicting Method With Multi-Strategy Counterfactual-Based Data Augmentation," FRACTALS (fractals), World Scientific Publishing Co. Pte. Ltd., vol. 31(06), pages 1-17.
  • Handle: RePEc:wsi:fracta:v:31:y:2023:i:06:n:s0218348x23401060
    DOI: 10.1142/S0218348X23401060

    Download full text from publisher

    File URL: http://www.worldscientific.com/doi/abs/10.1142/S0218348X23401060
    Download Restriction: Access to full text is restricted to subscribers