IDEAS home Printed from https://ideas.repec.org/a/gam/jijerp/v18y2021i15p8153-d606587.html

Infectious Disease Relational Data Analysis Using String Grammar Non-Euclidean Relational Fuzzy C-Means

Author

Listed:
  • Apiwat Budwong

    (Department of Computer Engineering, Faculty of Engineering, Graduate School, Chiang Mai University, Chiang Mai 50200, Thailand)

  • Sansanee Auephanwiriyakul

    (Department of Computer Engineering, Faculty of Engineering, Excellence Center in Infrastructure Technology and Transportation Engineering, Biomedical Engineering Institute, Chiang Mai University, Chiang Mai 50200, Thailand)

  • Nipon Theera-Umpon

    (Department of Electrical Engineering, Faculty of Engineering, Biomedical Engineering Institute, Chiang Mai University, Chiang Mai 50200, Thailand)

Abstract

Statistical analysis of infectious diseases is becoming more important, especially for developing prevention policy. This requires epidemiology, the study of how disease occurrence relates to who is affected, when, and where. In this paper, we develop the string grammar non-Euclidean relational fuzzy C-means (sgNERF-CM) algorithm to find relationships within the data from the age, career, and month viewpoints for all provinces in Thailand for dengue fever, influenza, and hepatitis B virus (HBV) infection. Dunn's index is used to select the best models because of its ability to identify compact and well-separated clusters. We compare the results of the sgNERF-CM algorithm with those of the string grammar relational hard C-means (sgRHCM) algorithm. In addition, their numerical counterparts, i.e., the relational hard C-means (RHCM) and non-Euclidean relational fuzzy C-means (NERF-CM) algorithms, are also included in the comparison. We found that the sgNERF-CM algorithm is far better than the numerical counterparts and better than the sgRHCM algorithm in most cases. The results show that the month-based dataset does not help in relationship-finding, since the diseases tend to occur all year round. People in different age ranges in different regions of Thailand have different numbers of dengue fever infections. The occupations with a higher chance of dengue fever are the student and teacher groups from the central, north-east, north, and south regions. Additionally, students in all regions except the central region have a high risk of dengue infection. For the influenza dataset, we found that people aged from more than 1 year to 64 years have a higher number of influenza infections in every province. Most occupations in all regions have a higher risk of contracting influenza. For the HBV dataset, people in all regions aged between 10 and 65 years have a high risk of contracting the disease. In addition, only the farmer and general contractor groups in all regions have a high chance of contracting HBV.
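
The abstract names Dunn's index as the model-selection criterion but does not give its formula. As a rough illustration only, the sketch below computes the common textbook form of the index (smallest between-cluster dissimilarity divided by largest within-cluster dissimilarity) directly from a pairwise dissimilarity matrix, which is the kind of input relational clustering works on. The function name `dunn_index`, the toy matrix `D`, and the use of hard labels are assumptions made for this example; the paper may use a different variant, and fuzzy memberships would first have to be converted to hard labels (for instance by taking the highest membership per object).

```python
from itertools import combinations


def dunn_index(dissimilarity, labels):
    """Dunn's index from a pairwise dissimilarity matrix and hard cluster labels.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(set(labels)) < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    min_between = float("inf")  # smallest dissimilarity across different clusters
    max_within = 0.0            # largest dissimilarity inside any one cluster
    for i, j in combinations(range(len(labels)), 2):
        d = dissimilarity[i][j]
        if labels[i] == labels[j]:
            max_within = max(max_within, d)
        else:
            min_between = min(min_between, d)
    if max_within == 0.0:
        # Every cluster is a single point or a set of identical points.
        return float("inf")
    return min_between / max_within


# Toy relational data (made up): two tight pairs of objects that are far apart.
D = [
    [0.0, 1.0, 8.0, 9.0],
    [1.0, 0.0, 7.0, 8.0],
    [8.0, 7.0, 0.0, 2.0],
    [9.0, 8.0, 2.0, 0.0],
]
good = dunn_index(D, [0, 0, 1, 1])  # partition that matches the two pairs
poor = dunn_index(D, [0, 1, 0, 1])  # partition that splits each pair
print(good, poor)
```

A higher value indicates more compact and better separated clusters, so among candidate partitions the one with the largest index would be preferred; here the partition that keeps the two pairs together scores higher than the one that splits them.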

Suggested Citation

  • Apiwat Budwong & Sansanee Auephanwiriyakul & Nipon Theera-Umpon, 2021. "Infectious Disease Relational Data Analysis Using String Grammar Non-Euclidean Relational Fuzzy C-Means," IJERPH, MDPI, vol. 18(15), pages 1-18, August.
  • Handle: RePEc:gam:jijerp:v:18:y:2021:i:15:p:8153-:d:606587

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/18/15/8153/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/18/15/8153/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhang, Zhibin, 2007. "The outbreak pattern of SARS cases in China as revealed by a mathematical model," Ecological Modelling, Elsevier, vol. 204(3), pages 420-426.
    2. Artalejo, J.R. & Lopez-Herrero, M.J., 2011. "The SIS and SIR stochastic epidemic models: A maximum entropy approach," Theoretical Population Biology, Elsevier, vol. 80(4), pages 256-264.
    3. Sifat Sharmin & Md. Israt Rayhan, 2012. "Spatio-temporal modeling of infectious disease dynamics," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(4), pages 875-882, September.
    4. David Lunn & Robert J B Goudie & Chen Wei & Oliver Kaltz & Olivier Restif, 2013. "Modelling the Dynamics of an Experimental Host-Pathogen Microcosm within a Hierarchical Bayesian Framework," PLOS ONE, Public Library of Science, vol. 8(8), pages 1-15, August.
    5. Lu Tang & Yiwang Zhou & Lili Wang & Soumik Purkayastha & Leyao Zhang & Jie He & Fei Wang & Peter X.‐K. Song, 2020. "A Review of Multi‐Compartment Infectious Disease Models," International Statistical Review, International Statistical Institute, vol. 88(2), pages 462-513, August.
    6. David A Rasmussen & Oliver Ratmann & Katia Koelle, 2011. "Inference for Nonlinear Epidemiological Models Using Genealogies and Time Series," PLOS Computational Biology, Public Library of Science, vol. 7(8), pages 1-11, August.
    7. Karen M Ong & Michael S Phillips & Charles S Peskin, 2020. "A mathematical model and inference method for bacterial colonization in hospital units applied to active surveillance data for carbapenem-resistant enterobacteriaceae," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-32, November.
    8. Joe Meagher & Nial Friel, 2022. "Assessing epidemic curves for evidence of superspreading," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 2179-2202, October.
    9. Qingxia Zhang & Dingcheng Wang, 2015. "Assessing the Role of Voluntary Self-Isolation in the Control of Pandemic Influenza Using a Household Epidemic Model," IJERPH, MDPI, vol. 12(8), pages 1-18, August.
    10. Tobias S Brett & Eamon B O’Dea & Éric Marty & Paige B Miller & Andrew W Park & John M Drake & Pejman Rohani, 2018. "Anticipating epidemic transitions with imperfect data," PLOS Computational Biology, Public Library of Science, vol. 14(6), pages 1-18, June.
    11. Ángel Berihuete & Marta Sánchez-Sánchez & Alfonso Suárez-Llorens, 2021. "A Bayesian Model of COVID-19 Cases Based on the Gompertz Curve," Mathematics, MDPI, vol. 9(3), pages 1-16, January.
    12. Akira Endo & Mitsuo Uchida & Adam J Kucharski & Sebastian Funk, 2019. "Fine-scale family structure shapes influenza transmission risk in households: Insights from primary schools in Matsumoto city, 2014/15," PLOS Computational Biology, Public Library of Science, vol. 15(12), pages 1-18, December.
