Printed from https://ideas.repec.org/a/plo/pone00/0216913.html

Using distant supervision to augment manually annotated data for relation extraction

Author

Listed:
  • Peng Su
  • Gang Li
  • Cathy Wu
  • K Vijay-Shanker

Abstract

Significant progress has recently been made in applying deep learning to natural language processing tasks. However, deep learning models typically require a large amount of annotated training data, while only small labeled datasets are available for many natural language processing tasks in biomedical literature. Building large datasets for deep learning is expensive because it involves considerable human effort and usually requires domain expertise in specialized fields. In this work, we consider augmenting manually annotated data with large amounts of data obtained using distant supervision. Because data obtained by distant supervision is often noisy, we first apply heuristics to remove some of the incorrect annotations. Then, using methods inspired by transfer learning, we show that the resulting models outperform models trained on the original manually annotated sets.
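
The distant supervision step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the knowledge base, the entity list, the whitespace tokenizer, and the trigger-word filter are all assumptions made for illustration, and the abstract does not say which heuristics the paper actually uses. The sketch labels every co-occurring entity pair as positive when the pair appears in a knowledge base, then drops positives that contain no trigger word.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    sentence: str
    entity_a: str
    entity_b: str
    label: int  # 1 = relation assumed from the knowledge base, 0 = none assumed


def tokenize(sentence):
    """Naive tokenizer (assumption): lowercase, strip periods, split on spaces."""
    return set(sentence.lower().replace(".", "").split())


def distant_label(sentences, kb_pairs, entities):
    """Label each co-occurring entity pair by looking it up in the knowledge base."""
    candidates = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        present = sorted(e for e in entities if e.lower() in tokens)
        for i, a in enumerate(present):
            for b in present[i + 1:]:
                label = 1 if frozenset((a, b)) in kb_pairs else 0
                candidates.append(Candidate(sentence, a, b, label))
    return candidates


def drop_likely_noise(candidates, trigger_words):
    """Hypothetical heuristic: keep a positive only if a trigger word is present."""
    kept = []
    for c in candidates:
        if c.label == 1 and not (tokenize(c.sentence) & trigger_words):
            continue  # positive label with no supporting cue: treat as noise
        kept.append(c)
    return kept


# Toy data, invented for illustration only.
kb_pairs = {frozenset(("BRCA1", "RAD51"))}
entities = ["BRCA1", "RAD51", "TP53"]
sentences = [
    "BRCA1 binds RAD51 in repair foci.",
    "BRCA1 and RAD51 were both measured.",
    "TP53 and RAD51 were both measured.",
]

candidates = distant_label(sentences, kb_pairs, entities)
filtered = drop_likely_noise(candidates, trigger_words={"binds", "interacts"})
```

In this toy run, the second sentence receives a positive label from the knowledge base even though it does not state an interaction, which is the kind of incorrect annotation a filtering heuristic aims to remove. The filtered examples would then be combined with the manually annotated set; how that combination is done is left out here.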

Suggested Citation

  • Peng Su & Gang Li & Cathy Wu & K Vijay-Shanker, 2019. "Using distant supervision to augment manually annotated data for relation extraction," PLOS ONE, Public Library of Science, vol. 14(7), pages 1-17, July.
  • Handle: RePEc:plo:pone00:0216913
    DOI: 10.1371/journal.pone.0216913

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0216913
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0216913&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0216913?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Andre Lamurias & Luka A Clarke & Francisco M Couto, 2017. "Extracting microRNA-gene relations from biomedical literature using distant supervision," PLOS ONE, Public Library of Science, vol. 12(3), pages 1-20, March.
    2. Domonkos Tikk & Philippe Thomas & Peter Palaga & Jörg Hakenberg & Ulf Leser, 2010. "A Comprehensive Benchmark of Kernel Methods to Extract Protein–Protein Interactions from Literature," PLOS Computational Biology, Public Library of Science, vol. 6(7), pages 1-19, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kanchan Jha & Sriparna Saha & Pratik Dutta, 2024. "Incorporation of gene ontology in identification of protein interactions from biomedical corpus: a multi-modal approach," Annals of Operations Research, Springer, vol. 339(3), pages 1793-1811, August.
    2. Kersten Döring & Ammar Qaseem & Michael Becer & Jianyu Li & Pankaj Mishra & Mingjie Gao & Pascal Kirchner & Florian Sauter & Kiran K Telukunta & Aurélien F A Moumbock & Philippe Thomas & Stefan Günthe, 2020. "Automated recognition of functional compound-protein relationships in literature," PLOS ONE, Public Library of Science, vol. 15(3), pages 1-14, March.
    3. Shandar Ahmad & Kenji Mizuguchi, 2011. "Partner-Aware Prediction of Interacting Residues in Protein-Protein Complexes from Sequence Data," PLOS ONE, Public Library of Science, vol. 6(12), pages 1-11, December.
    4. Behrouz Bokharaeian & Alberto Diaz & Hamidreza Chitsaz, 2016. "Enhancing Extraction of Drug-Drug Interaction from Literature Using Neutral Candidates, Negation, and Clause Dependency," PLOS ONE, Public Library of Science, vol. 11(10), pages 1-20, October.
    5. Haibin Liu & Lawrence Hunter & Vlado Kešelj & Karin Verspoor, 2013. "Approximate Subgraph Matching-Based Literature Mining for Biomedical Events and Relations," PLOS ONE, Public Library of Science, vol. 8(4), pages 1-16, April.
