IDEAS home Printed from https://ideas.repec.org/a/plo/pntd00/0006262.html

Comparison of three data mining models for prediction of advanced schistosomiasis prognosis in the Hubei province

Authors:
  • Guo Li
  • Xiaorong Zhou
  • Jianbing Liu
  • Yuanqi Chen
  • Hengtao Zhang
  • Yanyan Chen
  • Jianhua Liu
  • Hongbo Jiang
  • Junjing Yang
  • Shaofa Nie

Abstract

Background: To better assist medical professionals, this study aimed to develop three models for predicting the prognosis of patients with advanced schistosomiasis residing in Hubei province, namely a multivariate logistic regression (LR) model, an artificial neural network (ANN) model, and a decision tree (DT) model, and to compare their performance.

Methodology/Principal findings: Schistosomiasis surveillance data were collected from a previous study based on a Hubei population sample of 4136 advanced schistosomiasis cases. The predictive models were built using LR, ANN, and DT methods. Seventy percent of the cases (2896 cases) were used as training data for the predictive models, and the remaining 30% (1240 cases) served as the validation group for comparing the performance of the three models. Prediction performance was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy. Univariate analysis indicated that 16 risk factors were significantly associated with patient prognosis. In the training group, the mean AUC was 0.8276 for LR, 0.9267 for ANN, and 0.8229 for DT. In the validation group, the mean AUC was 0.8349 for LR, 0.8318 for ANN, and 0.8148 for DT. The three models yielded similar accuracy, sensitivity, and specificity.

Conclusions/Significance: Predictive models for advanced schistosomiasis prognosis built with LR, ANN, and DT methods all proved to be effective on our dataset. The ANN model outperformed the LR and DT models in terms of AUC.

Author summary: Worldwide, approximately 240 million people are infected with schistosomiasis, a neglected tropical parasitic disease that remains a significant cause of morbidity and mortality, especially in China.
Effective tools that can accurately predict the prognosis of patients with advanced schistosomiasis would aid in the treatment and management of the disease. To this end, we constructed three predictive models, an artificial neural network (ANN) model, a logistic regression (LR) model, and a decision tree (DT) model, and compared their ability to predict the prognosis of patients with advanced schistosomiasis. We found that while all three models proved effective, the ANN model outperformed the LR and DT models in terms of AUC and sensitivity. However, to achieve the highest prediction accuracy and to better assist medical professionals, we recommend comparing the performance of the three predictive models and selecting the best one, rather than choosing a model arbitrarily. These findings not only provide valuable information on constructing effective predictive models for the prognosis of advanced schistosomiasis, but also offer a new methodology for clinically determining patient diagnosis and prognosis.
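The abstract compares the three models on the validation group using AUC, sensitivity, specificity, and accuracy. To illustrate how these four measures are derived from a model's outputs, here is a minimal Python sketch that computes them from predicted scores. It is not the authors' code. The 0/1 outcome coding (1 = poor prognosis, treated as the positive class), the 0.5 decision threshold, the function names, and the toy data are all assumptions made for this example.

```python
"""Hedged sketch of the evaluation metrics named in the abstract.

Assumptions (not from the paper): outcomes are coded 1 = poor prognosis
(the positive class) and 0 = otherwise; each model outputs a score in
[0, 1]; a case is predicted positive when its score >= 0.5.
"""
from typing import Dict, List, Sequence, Tuple


def threshold_predictions(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn continuous model scores into 0/1 predictions at a fixed cut-off."""
    return [1 if s >= threshold else 0 for s in scores]


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp) for binary labels coded 0/1."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:  # positive case predicted negative
            fn += 1
        elif p == 0:  # negative case predicted negative
            tn += 1
        else:  # negative case predicted positive
            fp += 1
    return tp, fn, tn, fp


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (Mann-Whitney form;
    ties count as 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute the four validation measures for one model."""
    tp, fn, tn, fp = confusion_counts(y_true, threshold_predictions(scores, threshold))
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "accuracy": (tp + tn) / len(y_true),
        "auc": roc_auc(y_true, scores),  # threshold-free ranking measure
    }


if __name__ == "__main__":
    # Toy validation data, invented for illustration (not from the study).
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
    print(evaluate(y_true, scores))
    # {'sensitivity': 0.75, 'specificity': 0.75, 'accuracy': 0.75, 'auc': 0.9375}
```

Because AUC depends only on how the scores rank positive cases relative to negative ones, it summarizes discrimination across all thresholds, whereas sensitivity, specificity, and accuracy all change with the chosen cut-off. The pairwise loop is easy to read and fast enough for a validation group of about 1240 cases; a rank-based formula scales better for larger datasets.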

Suggested Citation

  • Guo Li & Xiaorong Zhou & Jianbing Liu & Yuanqi Chen & Hengtao Zhang & Yanyan Chen & Jianhua Liu & Hongbo Jiang & Junjing Yang & Shaofa Nie, 2018. "Comparison of three data mining models for prediction of advanced schistosomiasis prognosis in the Hubei province," PLOS Neglected Tropical Diseases, Public Library of Science, vol. 12(2), pages 1-19, February.
  • Handle: RePEc:plo:pntd00:0006262
    DOI: 10.1371/journal.pntd.0006262

    Download full text from publisher

    File URL: https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0006262
    Download Restriction: no

    File URL: https://journals.plos.org/plosntds/article/file?id=10.1371/journal.pntd.0006262&type=printable
    Download Restriction: no


    Citations



    Cited by:

    1. Vahid Habibi & Hasan Ahmadi & Mohammad Jafari & Abolfazl Moeini, 2019. "Application of nonlinear models and groundwater index to predict desertification case study: Sharifabad watershed," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 99(2), pages 715-733, November.


