Printed from https://ideas.repec.org/a/gam/jijerp/v18y2021i13p6750-d580451.html

Machine Learning for Analyzing Non-Countermeasure Factors Affecting Early Spread of COVID-19

Authors

  • Vito Janko

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

  • Gašper Slapničar

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

  • Erik Dovgan

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

  • Nina Reščič

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

  • Tine Kolenik

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

  • Martin Gjoreski

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

  • Maj Smerkol

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

  • Matjaž Gams

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

  • Mitja Luštrek

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

Abstract

The COVID-19 pandemic affected the whole world, but not all countries were impacted equally. This raises the question of which factors can explain the faster initial spread in some countries compared to others. Many such factors are overshadowed by the effect of countermeasures, so we studied the early phases of the infection, before countermeasures had been introduced. For this task we collected the most diverse dataset to date of potentially relevant factors and infection metrics. Using it, we show the importance of different factors and factor categories, as determined by both statistical methods and machine learning (ML) feature selection (FS) approaches. Factors related to culture (e.g., individualism, openness), development, and travel proved the most important. A more thorough factor analysis was then performed using a novel rule discovery algorithm. We also show how interconnected these factors are and caution against relying on ML analysis in isolation. Importantly, we explore potential pitfalls in the methodology of similar work and demonstrate their impact on COVID-19 data analysis. Our best models, based on a decision tree classifier, can predict the infection class with roughly 80% accuracy.
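The abstract combines two kinds of analysis: ranking factors statistically and predicting an infection class with a decision tree evaluated on held-out data. The Python sketch below illustrates both ideas in miniature. It is not the authors' code, and it does not reproduce their rule discovery algorithm. It assumes a binary infection class, uses a point-biserial (Pearson) correlation as the statistical ranking, and substitutes a one-split decision tree (a "decision stump") for a full decision tree classifier. The factor names (`individualism`, `hdi`, `air_travel`) and all values are hypothetical toy data, not figures from the article.

```python
"""Toy sketch: rank candidate factors by association with a binary
infection class, then estimate the accuracy of a one-split decision tree
("decision stump") with leave-one-out cross-validation.

All factor names and values are hypothetical placeholders, not article data.
"""
from statistics import mean, pstdev

# Hypothetical factors for 8 toy "countries"; label 1 = faster early spread.
NAMES = ["individualism", "hdi", "air_travel"]
ROWS = [
    [80, 0.90, 5.0], [75, 0.88, 1.0], [70, 0.92, 4.0], [65, 0.85, 2.0],
    [30, 0.89, 3.0], [25, 0.80, 4.5], [20, 0.91, 1.5], [15, 0.78, 2.5],
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def point_biserial(xs, ys):
    """Pearson correlation between a numeric factor xs and 0/1 labels ys."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def rank_factors(rows, labels, names):
    """Return (name, |r|) pairs sorted from strongest to weakest association."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((name, abs(point_biserial(column, labels))))
    return sorted(scores, key=lambda s: s[1], reverse=True)


def fit_stump(rows, labels):
    """Exhaustively pick the (feature, threshold, polarity) with the fewest
    training errors; ties keep the first candidate found. The stump predicts
    1 when polarity * (x - threshold) > 0."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows}):
            for polarity in (1, -1):
                preds = [1 if polarity * (row[j] - t) > 0 else 0 for row in rows]
                errors = sum(p != y for p, y in zip(preds, labels))
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    return best[1:]


def predict_stump(stump, row):
    """Apply a fitted (feature, threshold, polarity) stump to one row."""
    j, t, polarity = stump
    return 1 if polarity * (row[j] - t) > 0 else 0


def loo_accuracy(rows, labels):
    """Leave-one-out accuracy. The stump, and with it the choice of factor,
    is refit on each training fold, so the held-out row never influences
    which factor or threshold is selected."""
    correct = 0
    for i in range(len(rows)):
        train_x = rows[:i] + rows[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        stump = fit_stump(train_x, train_y)
        correct += predict_stump(stump, rows[i]) == labels[i]
    return correct / len(rows)


if __name__ == "__main__":
    for name, score in rank_factors(ROWS, LABELS, NAMES):
        print(f"{name:14s} |r| = {score:.3f}")
    print("stump on all data:", fit_stump(ROWS, LABELS))
    print("leave-one-out accuracy:", loo_accuracy(ROWS, LABELS))
```

On this toy data the ranking puts `individualism` first and the leave-one-out accuracy is 0.875 (7 of 8 rows). Refitting inside every fold is the design choice that matters here: selecting factors on the full dataset before cross-validation lets information from the test rows leak into the model and tends to inflate the reported accuracy.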

Suggested Citation

  • Vito Janko & Gašper Slapničar & Erik Dovgan & Nina Reščič & Tine Kolenik & Martin Gjoreski & Maj Smerkol & Matjaž Gams & Mitja Luštrek, 2021. "Machine Learning for Analyzing Non-Countermeasure Factors Affecting Early Spread of COVID-19," IJERPH, MDPI, vol. 18(13), pages 1-33, June.
  • Handle: RePEc:gam:jijerp:v:18:y:2021:i:13:p:6750-:d:580451

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/18/13/6750/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/18/13/6750/
    Download Restriction: no

    References listed on IDEAS

    1. Enrico Spolaore & Romain Wacziarg, 2018. "Ancestry and development: New evidence," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 33(5), pages 748-762, August.
    2. Chimmula, Vinay Kumar Reddy & Zhang, Lei, 2020. "Time series forecasting of COVID-19 transmission in Canada using LSTM networks," Chaos, Solitons & Fractals, Elsevier, vol. 135(C).
    3. Kursa, Miron B. & Rudnicki, Witold R., 2010. "Feature Selection with the Boruta Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 36(i11).
    4. Yun Qiu & Xi Chen & Wei Shi, 2020. "Impacts of social and economic factors on the transmission of coronavirus disease 2019 (COVID-19) in China," Journal of Population Economics, Springer;European Society for Population Economics, vol. 33(4), pages 1127-1172, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Peipei & Zheng, Xinqi & Ai, Gang & Liu, Dongya & Zhu, Bangren, 2020. "Time series prediction for the epidemic trends of COVID-19 using the improved LSTM deep learning method: Case studies in Russia, Peru and Iran," Chaos, Solitons & Fractals, Elsevier, vol. 140(C).
    2. Mudassar Arsalan & Omar Mubin & Fady Alnajjar & Belal Alsinglawi, 2020. "COVID-19 Global Risk: Expectation vs. Reality," IJERPH, MDPI, vol. 17(15), pages 1-10, August.
    3. Badarinza, Cristian & Ramadorai, Tarun & Shimizu, Chihiro, 2022. "Gravity, counterparties, and foreign investment," Journal of Financial Economics, Elsevier, vol. 145(2), pages 132-152.
    4. Mbassi, Christophe Martial & Messono, Omang Ombolo, 2023. "Historical technology and current economic development: Reassessing the nature of the relationship," Technological Forecasting and Social Change, Elsevier, vol. 195(C).
    5. Markus Brueckner & Ngo Van Long & Joaquin L. Vespignani, 2020. "Non-Gravity Trade," Globalization Institute Working Papers 388, Federal Reserve Bank of Dallas.
    6. Tong, Jianfeng & Liu, Zhenxing & Zhang, Yong & Zheng, Xiujuan & Jin, Junyang, 2023. "Improved multi-gate mixture-of-experts framework for multi-step prediction of gas load," Energy, Elsevier, vol. 282(C).
    7. Nicholas W. Papageorge & Matthew V. Zahn & Michèle Belot & Eline Broek-Altenburg & Syngjoo Choi & Julian C. Jamison & Egon Tripodi, 2021. "Socio-demographic factors associated with self-protecting behavior during the Covid-19 pandemic," Journal of Population Economics, Springer;European Society for Population Economics, vol. 34(2), pages 691-738, April.
    8. Asma Shaheen & Javed Iqbal, 2018. "Spatial Distribution and Mobility Assessment of Carcinogenic Heavy Metals in Soil Profiles Using Geostatistics and Random Forest, Boruta Algorithm," Sustainability, MDPI, vol. 10(3), pages 1-20, March.
    9. Alpaslan Akay & Amelie Constant & Corrado Giulietti & Martin Guzi, 2017. "Ethnic diversity and well-being," Journal of Population Economics, Springer;European Society for Population Economics, vol. 30(1), pages 265-306, January.
    10. Ramón Ferri-García & María del Mar Rueda, 2022. "Variable selection in Propensity Score Adjustment to mitigate selection bias in online surveys," Statistical Papers, Springer, vol. 63(6), pages 1829-1881, December.
    11. Chen, Simiao & Jin, Zhangfeng & Bloom, David E., 2020. "Act Early to Prevent Infections and Save Lives: Causal Impact of Diagnostic Efficiency on the COVID-19 Pandemic," IZA Discussion Papers 13749, Institute of Labor Economics (IZA).
    12. Célia Landmann Szwarcwald & Deborah Carvalho Malta & Marilisa Berti de Azevedo Barros & Paulo Roberto Borges de Souza Júnior & Dália Romero & Wanessa da Silva de Almeida & Giseli Nogueira Damacena & A, 2021. "Associations of Sociodemographic Factors and Health Behaviors with the Emotional Well-Being of Adolescents during the COVID-19 Pandemic in Brazil," IJERPH, MDPI, vol. 18(11), pages 1-13, June.
    13. Yvan Devaux & Lu Zhang & Andrew I. Lumley & Kanita Karaduzovic-Hadziabdic & Vincent Mooser & Simon Rousseau & Muhammad Shoaib & Venkata Satagopam & Muhamed Adilovic & Prashant Kumar Srivastava & Costa, 2024. "Development of a long noncoding RNA-based machine learning model to predict COVID-19 in-hospital mortality," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    14. Yulan Li & Kun Ma, 2022. "A Hybrid Model Based on Improved Transformer and Graph Convolutional Network for COVID-19 Forecasting," IJERPH, MDPI, vol. 19(19), pages 1-17, September.
    15. Ghosh, Indranil & Chaudhuri, Tamal Datta & Alfaro-Cortés, Esteban & Gámez, Matías & García, Noelia, 2022. "A hybrid approach to forecasting futures prices with simultaneous consideration of optimality in ensemble feature selection and advanced artificial intelligence," Technological Forecasting and Social Change, Elsevier, vol. 181(C).
    16. Catalina Amuedo-Dorantes & Neeraj Kaushal & Ashley N. Muchow, 2021. "Timing of social distancing policies and COVID-19 mortality: county-level evidence from the U.S," Journal of Population Economics, Springer;European Society for Population Economics, vol. 34(4), pages 1445-1472, October.
    17. Zhao, Xinxing & Li, Kainan & Ang, Candice Ke En & Cheong, Kang Hao, 2023. "A deep learning based hybrid architecture for weekly dengue incidences forecasting," Chaos, Solitons & Fractals, Elsevier, vol. 168(C).
    18. Conor Waldock & Bernhard Wegscheider & Dario Josi & Bárbara Borges Calegari & Jakob Brodersen & Luiz Jardim de Queiroz & Ole Seehausen, 2024. "Deconstructing the geography of human impacts on species’ natural distribution," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    19. Cui Zhang & Dandan Zhang, 2023. "Spatial Interactions and the Spread of COVID-19: A Network Perspective," Computational Economics, Springer;Society for Computational Economics, vol. 62(1), pages 383-405, June.
    20. Dodd, Olga & Frijns, Bart & Garel, Alexandre, 2022. "Cultural diversity among directors and corporate social responsibility," International Review of Financial Analysis, Elsevier, vol. 83(C).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jijerp:v:18:y:2021:i:13:p:6750-:d:580451. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the MDPI Indexing Manager. General contact details of the provider: https://www.mdpi.com

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.