Printed from https://ideas.repec.org/a/plo/pcbi00/1006173.html

Unsupervised extraction of epidemic syndromes from participatory influenza surveillance self-reported symptoms

Author

Listed:
  • Kyriaki Kalimeri
  • Matteo Delfino
  • Ciro Cattuto
  • Daniela Perrotta
  • Vittoria Colizza
  • Caroline Guerrisi
  • Clement Turbelin
  • Jim Duggan
  • John Edmunds
  • Chinelo Obi
  • Richard Pebody
  • Ana O Franco
  • Yamir Moreno
  • Sandro Meloni
  • Carl Koppeschaar
  • Charlotte Kjelsø
  • Ricardo Mexia
  • Daniela Paolotti

Abstract

Seasonal influenza surveillance is usually carried out by sentinel general practitioners (GPs) who compile weekly reports based on the number of influenza-like illness (ILI) clinical cases observed among visited patients. This traditional surveillance practice has several known issues, such as a delay of one week or more in releasing reports, population biases in health-seeking behaviour, and the lack of a common ILI case definition. On the other hand, the availability of novel data streams has recently led to the emergence of non-traditional approaches for disease surveillance that can alleviate these issues. In Europe, a participatory web-based surveillance system called Influenzanet represents a powerful tool for monitoring seasonal influenza epidemics thanks to the aid of self-selected volunteers from the general population who monitor and report their health status through Internet-based surveys, thus allowing a real-time estimate of the level of influenza circulating in the population. In this work, we propose an unsupervised probabilistic framework that combines time series analysis of self-reported symptoms collected by the Influenzanet platforms with an algorithmic detection of groups of symptoms, called syndromes. The aim of this study is to show that participatory web-based surveillance systems are capable of detecting the temporal trends of influenza-like illness even without relying on a specific case definition. The methodology was applied to data collected by Influenzanet platforms over the course of six influenza seasons, from 2011-2012 to 2016-2017, with an average of 34,000 participants per season.
Results show that our framework is capable of selecting temporal trends of syndromes that closely follow the ILI incidence rates reported by the traditional surveillance systems in the various countries (Pearson correlations ranging from 0.69 for Italy to 0.88 for the Netherlands, with the sole exception of Ireland with a correlation of 0.38). The proposed framework was able to forecast quite accurately the ILI trend of the forthcoming influenza season (2016-2017) based only on the available information of the previous years (2011-2016). Furthermore, to broaden the scope of our approach, we applied it both in a forecasting fashion to predict the ILI trend of the 2016-2017 influenza season (Pearson correlations ranging from 0.60 for Ireland and the UK to 0.85 for the Netherlands) and also to detect gastrointestinal syndrome in France (Pearson correlation of 0.66). The final result is a near-real-time flexible surveillance framework not constrained by any specific case definition and capable of capturing the heterogeneity in symptom circulation during influenza epidemics in the various European countries.

Author summary: This study suggests that web-based surveillance data can provide an epidemiological signal capable of detecting the temporal trends of influenza-like illness without relying on a specific case definition. The proposed framework was able to forecast quite accurately the ILI trend of the forthcoming influenza season based only on the available information of the previous years. Moreover, to broaden the scope of our approach, we applied it to the detection of gastrointestinal syndromes. We evaluated the approach against the traditional surveillance data and, despite the limited amount of data, the gastrointestinal trend was successfully detected. The result is a near-real-time flexible surveillance and prediction tool that is not constrained by any disease case definition.
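The idea in the abstract (decompose weekly symptom reports into latent "syndromes", then keep the syndrome whose trend best follows the reference ILI incidence) can be illustrated with a small sketch. The abstract does not name the factorization algorithm, so the code below assumes non-negative matrix factorization (NMF) with multiplicative updates as a stand-in; it is not presented as the authors' implementation. All numbers are synthetic, and the function and variable names (`nmf`, `pearson`, `ili_reference`, and so on) are hypothetical.

```python
import math
import random


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def nmf(V, k, iters=500, seed=0):
    """Approximate V (weeks x symptoms) as W @ H with non-negative factors,
    using Lee-Seung multiplicative updates for the squared error.
    Assumption: NMF stands in for the paper's unsupervised framework."""
    rng = random.Random(seed)
    n, m, eps = len(V), len(V[0]), 1e-9
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Update symptom loadings H (k x symptoms).
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, matmul(W, H))
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # Update weekly syndrome activity W (weeks x k).
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Synthetic example (not real Influenzanet data): 8 weeks x 4 symptoms,
# built from a respiratory-like and a gastrointestinal-like activity curve.
symptoms = ["fever", "cough", "diarrhoea", "nausea"]
resp = [1, 2, 5, 9, 12, 8, 4, 2]
gi = [3, 4, 3, 5, 2, 4, 3, 5]
resp_profile = [1.0, 0.8, 0.0, 0.1]
gi_profile = [0.1, 0.0, 1.0, 0.7]
V = [[r * p + g * q for p, q in zip(resp_profile, gi_profile)]
     for r, g in zip(resp, gi)]

# Made-up reference ILI incidence, standing in for sentinel GP reports.
ili_reference = [10, 22, 48, 95, 118, 83, 39, 21]

W, H = nmf(V, k=2)
trends = transpose(W)  # one weekly time series per extracted syndrome
scores = [pearson(t, ili_reference) for t in trends]
best = max(range(len(scores)), key=scores.__getitem__)
best_r = scores[best]

print(f"selected syndrome {best}: Pearson r = {best_r:.2f}")
print(dict(zip(symptoms, (round(h, 2) for h in H[best]))))
```

No case definition enters the decomposition itself: the reference series is used only afterwards, to pick and evaluate a component, which mirrors how the abstract reports Pearson correlations against traditional surveillance.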

Suggested Citation

  • Kyriaki Kalimeri & Matteo Delfino & Ciro Cattuto & Daniela Perrotta & Vittoria Colizza & Caroline Guerrisi & Clement Turbelin & Jim Duggan & John Edmunds & Chinelo Obi & Richard Pebody & Ana O Franco & Yamir Moreno & Sandro Meloni & Carl Koppeschaar & Charlotte Kjelsø & Ricardo Mexia & Daniela Paolotti, 2019. "Unsupervised extraction of epidemic syndromes from participatory influenza surveillance self-reported symptoms," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-21, April.
  • Handle: RePEc:plo:pcbi00:1006173
    DOI: 10.1371/journal.pcbi.1006173

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006173
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006173&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Edward Goldstein & Benjamin J Cowling & Allison E Aiello & Saki Takahashi & Gary King & Ying Lu & Marc Lipsitch, 2011. "Estimating Incidence Curves of Several Infections Using Symptom Surveillance Data," PLOS ONE, Public Library of Science, vol. 6(8), pages 1-8, August.
    2. Ding, Chris & Li, Tao & Peng, Wei, 2008. "On the equivalence between Non-negative Matrix Factorization and Probabilistic Latent Semantic Indexing," Computational Statistics & Data Analysis, Elsevier, vol. 52(8), pages 3913-3927, April.

    Citations


    Cited by:

    1. Canelle Poirier & Yulin Hswen & Guillaume Bouzillé & Marc Cuggia & Audrey Lavenu & John S Brownstein & Thomas Brewer & Mauricio Santillana, 2021. "Influenza forecasting for French regions combining EHR, web and climatic data sources with a machine learning ensemble approach," PLOS ONE, Public Library of Science, vol. 16(5), pages 1-26, May.
    2. Maud Thomas & Holger Rootzén, 2022. "Real‐time prediction of severe influenza epidemics using extreme value statistics," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(2), pages 376-394, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ma, Tinghuai & Suo, Xiafei & Zhou, Jinjuan & Tang, Meili & Guan, Donghai & Tian, Yuan & Al-Dhelaan, Abdullah & Al-Rodhaan, Mznah, 2016. "Augmenting matrix factorization technique with the combination of tags and genres," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 461(C), pages 101-116.
    2. Manini Madireddy & Ramasubramanian Sundararajan & Goda Doreswamy & Meisam Hejazi Nia & Amod Mital, 2017. "Constructing bundled offers for airline customers," Journal of Revenue and Pricing Management, Palgrave Macmillan, vol. 16(6), pages 532-552, December.
    3. Shota Saito & Yoshito Hirata & Kazutoshi Sasahara & Hideyuki Suzuki, 2015. "Tracking Time Evolution of Collective Attention Clusters in Twitter: Time Evolving Nonnegative Matrix Factorisation," PLOS ONE, Public Library of Science, vol. 10(9), pages 1-17, September.
    4. Nicolas Jouvin & Pierre Latouche & Charles Bouveyron & Guillaume Bataillon & Alain Livartowski, 2021. "Greedy clustering of count data through a mixture of multinomial PCA," Computational Statistics, Springer, vol. 36(1), pages 1-33, March.
    5. Bastian Schaefermeier & Gerd Stumme & Tom Hanika, 2021. "Topic space trajectories," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5759-5795, July.
    6. Travis R Meyer & Daniel Balagué & Miguel Camacho-Collados & Hao Li & Katie Khuu & P Jeffrey Brantingham & Andrea L Bertozzi, 2019. "A year in Madrid as described through the analysis of geotagged Twitter data," Environment and Planning B, , vol. 46(9), pages 1724-1740, November.
    7. Triss Ashton & Nicholas Evangelopoulos & Victor Prybutok, 2014. "Extending monitoring methods to textual data: a research agenda," Quality & Quantity: International Journal of Methodology, Springer, vol. 48(4), pages 2277-2294, July.
    8. Zhang, Zhong-Yuan & Gai, Yujie & Wang, Yu-Fei & Cheng, Hui-Min & Liu, Xin, 2018. "On equivalence of likelihood maximization of stochastic block model and constrained nonnegative matrix factorization," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 503(C), pages 687-697.
    9. Dongjin Choi & Jun-Gi Jang & U Kang, 2019. "S3CMTF: Fast, accurate, and scalable method for incomplete coupled matrix-tensor factorization," PLOS ONE, Public Library of Science, vol. 14(6), pages 1-20, June.
    10. Sun, Lijun & Axhausen, Kay W., 2016. "Understanding urban mobility patterns with a probabilistic tensor factorization framework," Transportation Research Part B: Methodological, Elsevier, vol. 91(C), pages 511-524.
    11. Danushka Bollegala & Georgios Kontonatsios & Sophia Ananiadou, 2015. "A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-28, June.
    12. Ma, Xiaoke & Wang, Bingbo & Yu, Liang, 2018. "Semi-supervised spectral algorithms for community detection in complex networks based on equivalence of clustering methods," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 490(C), pages 786-802.
    13. Pau Figuera & Alfredo Cuzzocrea & Pablo García Bringas, 2024. "Clustering Validation Inference," Mathematics, MDPI, vol. 12(15), pages 1-31, July.
    14. Alexandre L. M. Levada, 2021. "PCA-KL: a parametric dimensionality reduction approach for unsupervised metric learning," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(4), pages 829-868, December.
