Using Probabilistic Record Linkage of Structured and Unstructured Data to Identify Duplicate Cases in Spontaneous Adverse Event Reporting Systems

Author

Kory Kreimeyer
(US Food and Drug Administration)
David Menschik
(US Food and Drug Administration)
Scott Winiecki
(US Food and Drug Administration)
Wendy Paul
(US Food and Drug Administration)
Faith Barash
(US Food and Drug Administration)
Emily Jane Woo
(US Food and Drug Administration)
Meghna Alimchandani
(US Food and Drug Administration)
Deepa Arya
(US Food and Drug Administration)
Craig Zinderman
(US Food and Drug Administration)
Richard Forshee
(US Food and Drug Administration)
Taxiarchis Botsis
(US Food and Drug Administration)

Abstract

Introduction: Duplicate case reports in spontaneous adverse event reporting systems pose a challenge for medical reviewers to efficiently perform individual and aggregate safety analyses. Duplicate cases can bias data mining by generating spurious signals of disproportional reporting of product-adverse event pairs.

Objective: We have developed a probabilistic record linkage algorithm for identifying duplicate cases in the US Vaccine Adverse Event Reporting System (VAERS) and the US Food and Drug Administration Adverse Event Reporting System (FAERS).

Methods: In addition to using structured field data, the algorithm incorporates the non-structured narrative text of adverse event reports by examining clinical and temporal information extracted by the Event-based Text-mining of Health Electronic Records system, a natural language processing tool. The final component of the algorithm is a novel duplicate confidence value that is calculated by a rule-based empirical approach that looks for similarities in a number of criteria between two case reports.

Results: For VAERS, the algorithm identified 77% of known duplicate pairs with a precision (or positive predictive value) of 95%. For FAERS, it identified 13% of known duplicate pairs with a precision of 100%. The textual information did not improve the algorithm’s automated classification for VAERS or FAERS. The empirical duplicate confidence value increased performance on both VAERS and FAERS, mainly by reducing the occurrence of false positives.

Conclusions: The algorithm was shown to be effective at identifying pre-linked duplicate VAERS reports. The narrative text was not shown to be a key component in the automated detection evaluation; however, it is essential for supporting the semi-automated approach that is likely to be deployed at the Food and Drug Administration, where medical reviewers will perform some manual review of the most highly ranked reports identified by the algorithm.
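The abstract does not specify the fields, weights, or thresholds the authors used, so the following is only a minimal sketch of how probabilistic record linkage is commonly done (Fellegi-Sunter style agreement weights), together with the precision and recall measures quoted in the results. The field names, the m- and u-probabilities, and the helper names `field_weight`, `match_score`, and `precision_recall` are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical parameters, NOT from the paper. For each field:
#   m = P(field agrees | the two reports are true duplicates)
#   u = P(field agrees | the two reports are unrelated)
FIELD_PARAMS = {
    "birth_date": (0.95, 0.01),
    "sex": (0.98, 0.50),
    "vaccination_date": (0.90, 0.02),
    "state": (0.95, 0.05),
}


def field_weight(agree, m, u):
    """Log2 likelihood-ratio weight for one field comparison."""
    if agree:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(a, b, params=FIELD_PARAMS):
    """Sum the per-field weights for a pair of reports (dicts)."""
    score = 0.0
    for field, (m, u) in params.items():
        va, vb = a.get(field), b.get(field)
        if va is None or vb is None:
            continue  # a missing value contributes no evidence either way
        score += field_weight(va == vb, m, u)
    return score


def precision_recall(predicted, known):
    """Precision and recall of predicted duplicate pairs against known pairs."""
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


report_a = {"birth_date": "1980-01-02", "sex": "F",
            "vaccination_date": "2015-03-01", "state": "MD"}
report_b = dict(report_a)                      # same values in every field
report_c = {"birth_date": "1975-06-30", "sex": "F",
            "vaccination_date": "2014-11-12", "state": "TX"}

same_score = match_score(report_a, report_b)   # all fields agree
diff_score = match_score(report_a, report_c)   # only sex agrees
```

In a sketch like this, pairs whose score exceeds a chosen threshold would be flagged as candidate duplicates and ranked for review; the "identified X% of known duplicate pairs" figures in the abstract correspond to recall, and the precision figures to the share of flagged pairs that are true duplicates.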

Suggested Citation

Kory Kreimeyer & David Menschik & Scott Winiecki & Wendy Paul & Faith Barash & Emily Jane Woo & Meghna Alimchandani & Deepa Arya & Craig Zinderman & Richard Forshee & Taxiarchis Botsis, 2017. "Using Probabilistic Record Linkage of Structured and Unstructured Data to Identify Duplicate Cases in Spontaneous Adverse Event Reporting Systems," Drug Safety, Springer, vol. 40(7), pages 571-582, July.

Handle: RePEc:spr:drugsa:v:40:y:2017:i:7:d:10.1007_s40264-017-0523-4
DOI: 10.1007/s40264-017-0523-4

Citations

Cited by:

G. Niklas Norén, 2017. "The Power of the Case Narrative - Can it be Brought to Bear on Duplicate Detection?," Drug Safety, Springer, vol. 40(7), pages 543-546, July.

Keywords

Adverse Event Reporting System; Narrative Text; Vaccine Adverse Event Reporting System; Yellow Fever Vaccine; Vaccination Date