
Automated recognition of functional compound-protein relationships in literature

Author

Listed:
  • Kersten Döring
  • Ammar Qaseem
  • Michael Becer
  • Jianyu Li
  • Pankaj Mishra
  • Mingjie Gao
  • Pascal Kirchner
  • Florian Sauter
  • Kiran K Telukunta
  • Aurélien F A Moumbock
  • Philippe Thomas
  • Stefan Günther

Abstract

Motivation: Much effort has been invested in identifying protein-protein interactions using text mining and machine learning methods. The extraction of functional relationships between chemical compounds and proteins from literature has received much less attention, and no ready-to-use open-source software has so far been available for this task.

Method: We created a new benchmark dataset of 2,613 sentences from abstracts, annotated with proteins, small molecules, and their relationships. Two kernel methods, the shallow linguistic kernel and the all-paths graph kernel, were applied to classify these relationships as functional or non-functional. Furthermore, the benefit of interaction verbs in sentences was evaluated.

Results: In cross-validation on our benchmark dataset, the all-paths graph kernel (AUC value: 84.6%, F1 score: 79.0%) shows slightly better results than the shallow linguistic kernel (AUC value: 82.5%, F1 score: 77.2%). Both models achieve state-of-the-art performance in the research area of relation extraction. Combining the shallow linguistic and all-paths graph kernels increased the overall performance slightly further. We used each of the two kernels to identify functional relationships in all PubMed abstracts (29 million) and provide the results, including recorded processing time.

Availability: The software for the tested kernels, the benchmark, the processed 29 million PubMed abstracts, all evaluation scripts, and the scripts for processing the complete PubMed database are freely available at https://github.com/KerstenDoering/CPI-Pipeline.
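To make the two reported metrics concrete, the following is a minimal sketch of how an F1 score and an AUC value can be computed for a binary functional/non-functional classifier. It is not taken from the CPI-Pipeline code: the function names, the 0.5 decision threshold, and the six toy labels and scores are assumptions chosen for illustration only and do not reproduce the benchmark results.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of positive/negative
    pairs in which the positive example receives the higher score (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = functional relationship, 0 = non-functional.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]  # made-up classifier outputs
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold of 0.5

toy_f1 = f1_score(toy_labels, toy_preds)
toy_auc = auc(toy_labels, toy_scores)
print(f"F1 = {toy_f1:.3f}, AUC = {toy_auc:.3f}")
```

F1 depends on the chosen decision threshold, whereas AUC summarizes the ranking quality over all thresholds, which is one reason both values are commonly reported together.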

Suggested Citation

  • Kersten Döring & Ammar Qaseem & Michael Becer & Jianyu Li & Pankaj Mishra & Mingjie Gao & Pascal Kirchner & Florian Sauter & Kiran K Telukunta & Aurélien F A Moumbock & Philippe Thomas & Stefan Günther, 2020. "Automated recognition of functional compound-protein relationships in literature," PLOS ONE, Public Library of Science, vol. 15(3), pages 1-14, March.
  • Handle: RePEc:plo:pone00:0220925
    DOI: 10.1371/journal.pone.0220925

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0220925
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0220925&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Domonkos Tikk & Philippe Thomas & Peter Palaga & Jörg Hakenberg & Ulf Leser, 2010. "A Comprehensive Benchmark of Kernel Methods to Extract Protein–Protein Interactions from Literature," PLOS Computational Biology, Public Library of Science, vol. 6(7), pages 1-19, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kanchan Jha & Sriparna Saha & Pratik Dutta, 2024. "Incorporation of gene ontology in identification of protein interactions from biomedical corpus: a multi-modal approach," Annals of Operations Research, Springer, vol. 339(3), pages 1793-1811, August.
    2. Peng Su & Gang Li & Cathy Wu & K Vijay-Shanker, 2019. "Using distant supervision to augment manually annotated data for relation extraction," PLOS ONE, Public Library of Science, vol. 14(7), pages 1-17, July.
    3. Shandar Ahmad & Kenji Mizuguchi, 2011. "Partner-Aware Prediction of Interacting Residues in Protein-Protein Complexes from Sequence Data," PLOS ONE, Public Library of Science, vol. 6(12), pages 1-11, December.
    4. Behrouz Bokharaeian & Alberto Diaz & Hamidreza Chitsaz, 2016. "Enhancing Extraction of Drug-Drug Interaction from Literature Using Neutral Candidates, Negation, and Clause Dependency," PLOS ONE, Public Library of Science, vol. 11(10), pages 1-20, October.
    5. Haibin Liu & Lawrence Hunter & Vlado Kešelj & Karin Verspoor, 2013. "Approximate Subgraph Matching-Based Literature Mining for Biomedical Events and Relations," PLOS ONE, Public Library of Science, vol. 8(4), pages 1-16, April.
