Sorting Through the Safety Data Haystack: Using Machine Learning to Identify Individual Case Safety Reports in Social-Digital Media

Author

Listed:

Shaun Comfort
(Genentech, A Member of the Roche Group, Roche)
Sujan Perera
(IBM Watson Health)
Zoe Hudson
(Genentech, A Member of the Roche Group, Roche)
Darren Dorrell
(Genentech, A Member of the Roche Group, Roche)
Shawman Meireis
(Genentech, A Member of the Roche Group, Roche)
Meenakshi Nagarajan
(IBM Watson Health)
Cartic Ramakrishnan
(IBM Watson Health)
Jennifer Fine
(Genentech, A Member of the Roche Group, Roche)

Abstract

Introduction: There is increasing interest in social digital media (SDM) as a data source for pharmacovigilance activities; however, SDM is considered a low information content data source for safety data. Given that pharmacovigilance itself operates in a high-noise, lower-validity environment without objective ‘gold standards’ beyond process definitions, introducing large volumes of SDM into the pharmacovigilance workflow could further strain the limited manual resources available for adverse event identification and processing. Recent advances in medical informatics have resulted in methods for developing programs that can assist human experts in detecting valid individual case safety reports (ICSRs) within SDM.

Objective: In this study, we developed rule-based and machine learning (ML) models for classifying ICSRs from SDM and compared their performance with that of human pharmacovigilance experts.

Methods: We used a random sample from a collection of 311,189 SDM posts, sourced from Twitter, Tumblr, Facebook, and a range of news media blogs, that mentioned Roche products and brands in combination with common medical and scientific terms. The sample was used to develop and evaluate three iterations of an automated ICSR classifier. The ICSR classifier models consisted of sub-components to annotate the relevant ICSR elements and a component to make the final decision on the validity of the ICSR. Agreement with human pharmacovigilance experts was chosen as the preferred performance metric and was evaluated by calculating the Gwet AC1 statistic (gKappa). The best performing model was tested against the Roche global pharmacovigilance expert using a blind dataset and put through a time test on the full 311,189-post dataset.

Results: The initial strict rule-based approach to ICSR classification resulted in a model with an accuracy of 65% and a gKappa of 46%. Adding an ML-based adverse event annotator improved the accuracy to 74% and the gKappa to 60%. Performance improved further when an ML-based ICSR detector was added. On a blind test set of 2500 posts, the final model demonstrated a gKappa of 78% and an accuracy of 83%. In the time test, the final model took 48 h to complete a task that would have taken human experts an estimated 44,000 h.

Conclusion: The results of this study indicate that an effective and scalable solution to the challenge of ICSR detection in SDM includes a workflow that uses an automated ML classifier to identify likely ICSRs for further review by human subject matter experts (SMEs).
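The two metrics reported in the abstract, accuracy and the Gwet AC1 agreement statistic, can be illustrated with a short sketch. This is not the authors' code: it assumes two raters (one expert, one model), binary labels, and the standard two-rater form of AC1, where chance agreement is computed from the mean of the two raters' marginal proportions. The function names and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def accuracy(reference, predicted):
    """Fraction of items where the predicted label equals the reference label."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def gwet_ac1(rater_a, rater_b, categories=(0, 1)):
    """Gwet's AC1 for two raters over a fixed set of categories.

    AC1 = (p_a - p_e) / (1 - p_e), where p_a is the observed agreement and
    p_e = sum_k pi_k * (1 - pi_k) / (q - 1), with pi_k the mean of the two
    raters' marginal proportions for category k and q the number of categories.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label lists must be non-empty and of equal length")
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")
    n = len(rater_a)

    # Observed agreement: share of items on which both raters give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from the averaged marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    chance = 0.0
    for k in categories:
        pi_k = (counts_a[k] + counts_b[k]) / (2 * n)
        chance += pi_k * (1 - pi_k)
    chance /= q - 1

    return (observed - chance) / (1 - chance)


# Toy labels, invented for illustration: 1 = valid ICSR, 0 = not an ICSR.
expert = [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]
model = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]

print(f"accuracy = {accuracy(expert, model):.2f}")
print(f"AC1      = {gwet_ac1(expert, model):.3f}")
```

On these toy labels the two raters agree on 8 of 10 posts, so accuracy is 0.80, while AC1 is lower (about 0.655) because part of that agreement is attributed to chance. The same pattern appears in the abstract, where each reported gKappa is below the corresponding accuracy.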

Suggested Citation

Shaun Comfort & Sujan Perera & Zoe Hudson & Darren Dorrell & Shawman Meireis & Meenakshi Nagarajan & Cartic Ramakrishnan & Jennifer Fine, 2018. "Sorting Through the Safety Data Haystack: Using Machine Learning to Identify Individual Case Safety Reports in Social-Digital Media," Drug Safety, Springer, vol. 41(6), pages 579-590, June.

Handle: RePEc:spr:drugsa:v:41:y:2018:i:6:d:10.1007_s40264-018-0641-7
DOI: 10.1007/s40264-018-0641-7

Citations

Cited by:

Bissan Audeh & Florelle Bellet & Marie-Noëlle Beyens & Agnès Lillo-Le Louët & Cédric Bousquet, 2020. "Use of Social Media for Pharmacovigilance Activities: Key Findings and Recommendations from the Vigi4Med Project," Drug Safety, Springer, vol. 43(9), pages 835-851, September.
Yiqing Zhao & Yue Yu & Hanyin Wang & Yikuan Li & Yu Deng & Guoqian Jiang & Yuan Luo, 2022. "Machine Learning in Causal Inference: Application in Pharmacovigilance," Drug Safety, Springer, vol. 45(5), pages 459-476, May.
Alex Gartland & Andrew Bate & Jeffery L. Painter & Tim A. Casperson & Gregory Eugene Powell, 2021. "Developing Crowdsourced Training Data Sets for Pharmacovigilance Intelligent Automation," Drug Safety, Springer, vol. 44(3), pages 373-382, March.
Andrew Bate & Steve F. Hobbiger, 2021. "Artificial Intelligence, Real-World Automation and the Safety of Medicines," Drug Safety, Springer, vol. 44(2), pages 125-132, February.
