Printed from https://ideas.repec.org/a/gam/jijerp/v20y2023i5p3910-d1076934.html

Machine Learning Prediction Model of Tuberculosis Incidence Based on Meteorological Factors and Air Pollutants

Author

Listed:
  • Na Tang

    (The Key Laboratory of Model Animals and Stem Cell Biology in Hunan Province, School of Medicine, Hunan Normal University, Changsha 410013, China)

  • Maoxiang Yuan

    (Changde Center for Disease Control and Prevention, Changde 415000, China)

  • Zhijun Chen

    (The Key Laboratory of Model Animals and Stem Cell Biology in Hunan Province, School of Medicine, Hunan Normal University, Changsha 410013, China)

  • Jian Ma

    (The Key Laboratory of Model Animals and Stem Cell Biology in Hunan Province, School of Medicine, Hunan Normal University, Changsha 410013, China)

  • Rui Sun

    (The Key Laboratory of Model Animals and Stem Cell Biology in Hunan Province, School of Medicine, Hunan Normal University, Changsha 410013, China)

  • Yide Yang

    (The Key Laboratory of Model Animals and Stem Cell Biology in Hunan Province, School of Medicine, Hunan Normal University, Changsha 410013, China)

  • Quanyuan He

    (The Key Laboratory of Model Animals and Stem Cell Biology in Hunan Province, School of Medicine, Hunan Normal University, Changsha 410013, China)

  • Xiaowei Guo

    (The Key Laboratory of Model Animals and Stem Cell Biology in Hunan Province, School of Medicine, Hunan Normal University, Changsha 410013, China)

  • Shixiong Hu

    (Hunan Provincial Center for Disease Control and Prevention, Changsha 410005, China)

  • Junhua Zhou

    (The Key Laboratory of Model Animals and Stem Cell Biology in Hunan Province, School of Medicine, Hunan Normal University, Changsha 410013, China)

Abstract

Background: Tuberculosis (TB) is a public health problem worldwide, and the influence of meteorological factors and air pollutants on its incidence has been attracting interest from researchers. A machine learning model that predicts TB incidence from meteorological factors and air pollutants is of great importance for timely and practical prevention and control measures.

Methods: Data on daily TB notifications, meteorological factors and air pollutants in Changde City, Hunan Province, from 2010 to 2021 were collected. Spearman rank correlation analysis was used to assess the correlation between daily TB notifications and each meteorological factor or air pollutant. Based on the correlation results, machine learning methods, including support vector regression, random forest regression and a back-propagation (BP) neural network, were used to construct prediction models of TB incidence. Root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) were used to evaluate the constructed models and select the best one.

Results: (1) From 2010 to 2021, the overall incidence of tuberculosis in Changde City showed a downward trend. (2) Daily TB notifications were positively correlated with average temperature (r = 0.231), maximum temperature (r = 0.194), minimum temperature (r = 0.165), sunshine duration (r = 0.329), PM2.5 (r = 0.097), PM10 (r = 0.215) and O3 (r = 0.084) (p < 0.05). In contrast, daily TB notifications were significantly negatively correlated with mean air pressure (r = −0.119), precipitation (r = −0.063), relative humidity (r = −0.084), CO (r = −0.038) and SO2 (r = −0.034) (p < 0.05). (3) The random forest regression model had the best fitting effect, while the BP neural network model gave the best predictions. (4) On the validation set, the BP neural network model that included average daily temperature, sunshine hours and PM10 showed the lowest RMSE, MAE and MAPE, followed by support vector regression.

Conclusions: The trend predicted by the BP neural network model that included average daily temperature, sunshine hours and PM10 closely follows the actual incidence, and the predicted peak coincides closely with the actual aggregation time, with high accuracy and minimal error. Taken together, these data suggest that the BP neural network model can predict the incidence trend of tuberculosis in Changde City.
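
The abstract names two computational steps without showing them: a Spearman rank correlation between daily notifications and each environmental variable, and model comparison by RMSE, MAE and MAPE. The following is a minimal Python sketch of both, written from the standard textbook definitions rather than taken from the paper. The function names, the tie handling (average ranks), the choice to skip zero-count days in MAPE, and the toy numbers are all assumptions made for illustration; none of the values come from the study.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: days with zero actual notifications are skipped, because
    the percentage error is undefined there. The paper may handle this
    case differently.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)

# Toy values for illustration only (not data from the study).
temperature = [5.0, 12.0, 18.0, 25.0, 30.0]
notifications = [3, 4, 4, 6, 7]
model_output = [3.5, 3.5, 4.5, 5.0, 7.5]

print("Spearman rho:", spearman(temperature, notifications))
print("RMSE:", rmse(notifications, model_output))
print("MAE:", mae(notifications, model_output))
print("MAPE (%):", mape(notifications, model_output))
```

Under this sketch, a model would be preferred when it gives lower values of all three error measures on held-out data, which is the comparison the abstract describes for the BP neural network, support vector regression and random forest models.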

Suggested Citation

  • Na Tang & Maoxiang Yuan & Zhijun Chen & Jian Ma & Rui Sun & Yide Yang & Quanyuan He & Xiaowei Guo & Shixiong Hu & Junhua Zhou, 2023. "Machine Learning Prediction Model of Tuberculosis Incidence Based on Meteorological Factors and Air Pollutants," IJERPH, MDPI, vol. 20(5), pages 1-17, February.
  • Handle: RePEc:gam:jijerp:v:20:y:2023:i:5:p:3910-:d:1076934

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/20/5/3910/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/20/5/3910/
    Download Restriction: no
