
Feature engineering for sentiment analysis in e-health forums

Author

Listed:
  • Jorge Carrillo-de-Albornoz
  • Javier Rodríguez Vidal
  • Laura Plaza

Abstract

Introduction: Exploiting information in health-related social media services is of great interest to patients, researchers and medical companies. The challenge, however, is to provide easy, quick and relevant access to the vast amount of information available. One step towards facilitating access to online health data is opinion mining. Although the classification of patient opinions into positive and negative has been tackled before, most works rely on machine learning methods and bags of words. Our first contribution is an extensive evaluation of different features, including lexical, syntactic, semantic, network-based, sentiment-based and word-embedding features, for representing patient-authored texts for polarity classification. The second contribution of this work is the study of polar facts (i.e. objective information with polar connotations). Traditionally, the presence of polar facts has been neglected and research in polarity classification has been limited to opinionated texts. We demonstrate the existence and importance of polar facts for the polarity classification of health information.
Material and methods: We annotate a set of more than 3500 posts to online health forums on breast cancer, Crohn's disease and different allergies. Each sentence in a post is manually labeled as “experience”, “fact” or “opinion”, and as “positive”, “negative” or “neutral”. Using this data, we train different machine learning algorithms and compare traditional bag-of-words representations with word embeddings in combination with lexical, syntactic, semantic, network-based and emotional properties of texts to automatically classify patient-authored content as positive, negative or neutral. In addition, we experiment with a combination of textual and semantic representations by generating concept embeddings using the UMLS Metathesaurus.
Results: We report two main findings. First, it is possible to predict the polarity of patient-authored content with very high accuracy (≈ 70 percent) using word embeddings, which considerably outperforms more traditional representations such as bags of words. Second, in medical information, negative and positive facts (i.e. objective information) are nearly as frequent as negative and positive opinions and experiences (i.e. subjective information), and they are crucial for polarity classification.
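
The comparison described in the abstract, bag-of-words features versus word embeddings for three-way polarity classification, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the training sentences are invented, the two-dimensional `TOY_EMBEDDINGS` table is hand-made, and the nearest-centroid classifier is a stand-in, since the abstract does not name the algorithms or embeddings the authors used.

```python
import math
from collections import Counter

# Invented example sentences (NOT data from the paper).
TRAIN = [
    ("the treatment worked and i feel great", "positive"),
    ("my symptoms improved after the new diet", "positive"),
    ("the pain got worse after surgery", "negative"),
    ("i feel terrible and the rash spread", "negative"),
    ("the clinic opens at nine on monday", "neutral"),
    ("the appointment is on monday", "neutral"),
]

# Hand-made 2-d vectors (hypothetical): axis 0 ~ polarity, axis 1 ~ neutrality.
TOY_EMBEDDINGS = {
    "great": [1.0, 0.0], "improved": [1.0, 0.0], "worked": [0.8, 0.0],
    "worse": [-1.0, 0.0], "terrible": [-1.0, 0.0], "pain": [-0.8, 0.0],
    "awful": [-1.0, 0.0],  # never appears in TRAIN
    "monday": [0.0, 1.0], "clinic": [0.0, 1.0], "appointment": [0.0, 1.0],
}

VOCABULARY = sorted({word for text, _ in TRAIN for word in text.split()})


def bag_of_words(text):
    """Count how often each training-vocabulary word occurs in the text."""
    counts = Counter(text.split())
    return [float(counts[word]) for word in VOCABULARY]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def mean_embedding(text):
    """Average the toy embeddings of the known words; zero vector if none."""
    vectors = [TOY_EMBEDDINGS[w] for w in text.split() if w in TOY_EMBEDDINGS]
    return centroid(vectors) if vectors else [0.0, 0.0]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(examples, featurize):
    """Nearest-centroid model: one mean feature vector per polarity label."""
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append(featurize(text))
    return {label: centroid(vectors) for label, vectors in by_label.items()}


def predict(model, featurize, text):
    """Return the label whose centroid is most similar to the text."""
    vector = featurize(text)
    return max(model, key=lambda label: cosine(model[label], vector))


bow_model = train(TRAIN, bag_of_words)
emb_model = train(TRAIN, mean_embedding)

print(predict(bow_model, bag_of_words, "the pain got worse"))
print(predict(emb_model, mean_embedding, "i feel awful"))
```

In this toy setup the word "awful" never occurs in the training sentences, so the bag-of-words vector carries no evidence for it, whereas the embedding places it next to "terrible" and "worse". That is one intuition for why dense representations can generalize better than word counts; it is not a reproduction of the reported results.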

Suggested Citation

  • Jorge Carrillo-de-Albornoz & Javier Rodríguez Vidal & Laura Plaza, 2018. "Feature engineering for sentiment analysis in e-health forums," PLOS ONE, Public Library of Science, vol. 13(11), pages 1-25, November.
  • Handle: RePEc:plo:pone00:0207996
    DOI: 10.1371/journal.pone.0207996

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0207996
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0207996&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0207996?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations


    Cited by:

    1. Silvia García-Méndez & Francisco de Arriba-Pérez & Ana Barros-Vila & Francisco J. González-Castaño, 2024. "Targeted aspect-based emotion analysis to detect opportunities and precaution in financial Twitter messages," Papers 2404.08665, arXiv.org.
    2. Emmanuel Ajayi Olajubu & Ezekiel Aliyu & Adesola Ganiyu Aderounmu & Kamagate Beman Hamidja, 2021. "Managing E-Patient Case Notes in Tertiary Hospitals: A Sub-Saharan African Experience," International Journal of Healthcare Information Systems and Informatics (IJHISI), IGI Global, vol. 16(4), pages 1-19, October.
    3. Dimitrios Kydros & Maria Argyropoulou & Vasiliki Vrana, 2021. "A Content and Sentiment Analysis of Greek Tweets during the Pandemic," Sustainability, MDPI, vol. 13(11), pages 1-21, May.
