
Disease- and Drug-Related Knowledge Extraction for Health Management from Online Health Communities Based on BERT-BiGRU-ATT

Authors
  • Yanli Zhang

    (College of Business Administration, Henan Finance University, Zhengzhou 451464, China
    Business School, Henan University, Kaifeng 475004, China
    These authors contributed equally to this work.)

  • Xinmiao Li

    (School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China
    These authors contributed equally to this work.)

  • Yu Yang

    (School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China
    China Banking and Insurance Regulatory Commission Neimengu Office, Hohhot 010019, China
    These authors contributed equally to this work.)

  • Tao Wang

    (College of Business Administration, Henan Finance University, Zhengzhou 451464, China
    These authors contributed equally to this work.)

Abstract

Knowledge extraction from rich text in online health communities can supplement and improve existing knowledge bases, supporting evidence-based medicine and clinical decision making. The time series health management data extracted from users' posts can help other users with similar conditions manage their health. Using data annotated with four relationship types, this study constructed a deep learning model, BERT-BiGRU-ATT, to extract disease–medication relationships. A Chinese-pretrained BERT model was used to generate word embeddings for question-and-answer data from online health communities in China. A bidirectional gated recurrent unit (BiGRU), combined with an attention mechanism, was then employed to capture sequence context features, after which a softmax classifier classified text related to diseases and drugs and the time series data provided by users were obtained. Experiments with various word embedding training methods and comparisons with classical models verified the superiority of our model in relation extraction. Based on the extracted knowledge, the evolution of a user's disease progression was analyzed using the time series data provided by that user. BERT word embedding, GRU, and the attention mechanism play major roles in knowledge extraction in our research. The knowledge extraction results are expected to supplement and improve existing knowledge bases, assist doctors' diagnoses, and help users with dynamic lifecycle health management, such as disease treatment management. In future studies, co-reference resolution can be introduced to further improve the extraction of relationships among diseases, drugs, and drug effects.
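The last two stages named in the abstract, attention over the sequence of hidden states followed by a softmax classifier over four relationship types, can be illustrated with a minimal sketch. This is not the authors' implementation: the random vectors stand in for BiGRU outputs computed from BERT embeddings, the attention form (tanh followed by a dot product with a query vector) is one common formulation chosen here as an assumption, and the names RELATIONS, attention_pool, and classify are hypothetical. The abstract does not name the four relationships, so placeholder labels are used.

```python
import math
import random

# Placeholder labels: the abstract mentions four annotated relationships
# but does not name them, so these names are purely illustrative.
RELATIONS = ["relation_0", "relation_1", "relation_2", "relation_3"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_pool(hidden_states, query):
    """Score each time step as tanh(h) . query, normalise the scores with
    softmax, and return the weighted sum of the hidden states."""
    scores = [dot([math.tanh(x) for x in h], query) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[i] for w, h in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return pooled, weights


def classify(pooled, class_weights, class_bias):
    """Linear layer followed by softmax over the relation classes."""
    logits = [dot(row, pooled) + b for row, b in zip(class_weights, class_bias)]
    return softmax(logits)


# Toy inputs (assumption): random vectors in place of real BiGRU outputs,
# and random, untrained parameters for the attention query and classifier.
rng = random.Random(0)
seq_len, dim = 5, 8
hidden_states = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seq_len)]
query = [rng.uniform(-1, 1) for _ in range(dim)]
class_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in RELATIONS]
class_bias = [0.0] * len(RELATIONS)

pooled, weights = attention_pool(hidden_states, query)
probs = classify(pooled, class_weights, class_bias)
predicted = RELATIONS[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

In a trained model the query vector and classifier parameters would be learned jointly with the recurrent layer; here they are random, so the predicted label carries no meaning and only the shape of the computation is shown.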

Suggested Citation

  • Yanli Zhang & Xinmiao Li & Yu Yang & Tao Wang, 2022. "Disease- and Drug-Related Knowledge Extraction for Health Management from Online Health Communities Based on BERT-BiGRU-ATT," IJERPH, MDPI, vol. 19(24), pages 1-13, December.
  • Handle: RePEc:gam:jijerp:v:19:y:2022:i:24:p:16590-:d:999314

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/19/24/16590/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/19/24/16590/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Emanuel Adrian Sârbu & Marius Marici & Simona Bostan & Liviu Gavrila-Ardelean, 2023. "Physical and Recreational Activities, Sedentary Screen Time, Time Spent with Parents and Drug Use in Adolescents," IJERPH, MDPI, vol. 20(2), pages 1-13, January.
