Author
Listed:
- Oduwa Edo-Osagie
- Gillian Smith
- Iain Lake
- Obaghe Edeghere
- Beatriz De La Iglesia
Abstract
We investigate the use of Twitter data to deliver signals for syndromic surveillance in order to assess its ability to augment existing syndromic surveillance efforts and give a better understanding of symptomatic people who do not seek healthcare advice directly. We focus on a specific syndrome—asthma/difficulty breathing. We outline data collection using the Twitter streaming API as well as analysis and pre-processing of the collected data. Even with keyword-based data collection, many of the tweets collected are not be relevant because they represent chatter, or talk of awareness instead of an individual suffering a particular condition. In light of this, we set out to identify relevant tweets to collect a strong and reliable signal. For this, we investigate text classification techniques, and in particular we focus on semi-supervised classification techniques since they enable us to use more of the Twitter data collected while only doing very minimal labelling. In this paper, we propose a semi-supervised approach to symptomatic tweet classification and relevance filtering. We also propose alternative techniques to popular deep learning approaches. Additionally, we highlight the use of emojis and other special features capturing the tweet’s tone to improve the classification performance. Our results show that negative emojis and those that denote laughter provide the best classification performance in conjunction with a simple word-level n-gram approach. We obtain good performance in classifying symptomatic tweets with both supervised and semi-supervised algorithms and found that the proposed semi-supervised algorithms preserve more of the relevant tweets and may be advantageous in the context of a weak signal. Finally, we found some correlation (r = 0.414, p = 0.0004) between the Twitter signal generated with the semi-supervised system and data from consultations for related health conditions.
Suggested Citation
Oduwa Edo-Osagie & Gillian Smith & Iain Lake & Obaghe Edeghere & Beatriz De La Iglesia, 2019.
"Twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance,"
PLOS ONE, Public Library of Science, vol. 14(7), pages 1-29, July.
Handle:
RePEc:plo:pone00:0210689
DOI: 10.1371/journal.pone.0210689
Download full text from publisher
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0210689. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help adding them by using this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .
Please note that corrections may take a couple of weeks to filter through
the various RePEc services.