Printed from https://ideas.repec.org/a/plo/pcbi00/1005182.html

WORMHOLE: Novel Least Diverged Ortholog Prediction through Machine Learning

Authors
  • George L Sutphin
  • J Matthew Mahoney
  • Keith Sheppard
  • David O Walton
  • Ron Korstanje

Abstract

The rapid advancement of technology in genomics and targeted genetic manipulation has made comparative biology an increasingly prominent strategy to model human disease processes. Predicting orthology relationships between species is a vital component of comparative biology. Dozens of strategies for predicting orthologs have been developed using combinations of gene and protein sequence, phylogenetic history, and functional interaction with progressively increasing accuracy. A relatively new class of orthology prediction strategies combines aspects of multiple methods into meta-tools, resulting in improved prediction performance. Here we present WORMHOLE, a novel ortholog prediction meta-tool that applies machine learning to integrate 17 distinct ortholog prediction algorithms to identify novel least diverged orthologs (LDOs) between 6 eukaryotic species—humans, mice, zebrafish, fruit flies, nematodes, and budding yeast. Machine learning allows WORMHOLE to intelligently incorporate predictions from a wide spectrum of strategies in order to form aggregate predictions of LDOs with high confidence. In this study we demonstrate the performance of WORMHOLE across each combination of query and target species. We show that WORMHOLE is particularly adept at improving LDO prediction performance between distantly related species, expanding the pool of LDOs while maintaining low evolutionary distance and a high level of functional relatedness between genes in LDO pairs. We present extensive validation, including cross-validated prediction of PANTHER LDOs and evaluation of evolutionary divergence and functional similarity, and discuss future applications of machine learning in ortholog prediction. A WORMHOLE web tool has been developed and is available at http://wormhole.jax.org/.

Author Summary

Identifying functionally equivalent proteins between species is a fundamental problem in comparative genetics. While orthology does not guarantee functional equivalence, the identification of orthologs—genes in different organisms that diverged by speciation—is often the first step in approaching this problem. Many methods are available for predicting orthologs. Recent approaches combine methods and filter candidate predictions by “voting”—assigning confidence to ortholog pairs based on the number of predictions by independent methods. Although voting is a heuristic, it maintains precision while increasing recall. Here we employ machine learning to optimize voting by learning which methods make better predictions and, in essence, giving those methods more votes. We present a new tool called WORMHOLE that predicts a strict subclass of orthologs called least diverged orthologs (LDOs) with a high level of functional specificity by learning features of orthology that are encoded in the patterns of predictions made by 17 constituent methods. We validate WORMHOLE using multiple measures of evolutionary divergence and functional relatedness, including community standards provided by the Quest for Orthologs consortium. WORMHOLE’s particular strength lies in predicting LDOs between distantly related species, where orthology is difficult to identify and is of critical importance for comparative biology.
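The difference between plain voting and learned voting described in the Author Summary can be illustrated with a small sketch. Plain voting counts how many methods predict a gene pair; a learned weighting lets reliable methods count for more. This page does not describe WORMHOLE's actual classifier, so the sketch swaps in a Bernoulli naive Bayes model, whose per-method log-likelihood ratios reduce to a weighted vote. The three methods (A, B, C), the toy labels, and the function names `vote_count`, `learn_weights`, and `weighted_score` are hypothetical. In a real setting the labels would come from a reference set of known LDOs, such as the PANTHER LDOs mentioned in the abstract.

```python
import math
from typing import List, Sequence, Tuple

# A candidate gene pair is a 0/1 vector: entry i is 1 if (hypothetical)
# method i predicts the pair as orthologous.
Votes = Sequence[int]


def vote_count(votes: Votes) -> int:
    """Unweighted voting: every method contributes exactly one vote."""
    return sum(votes)


def learn_weights(pairs: List[Votes], labels: List[int]) -> Tuple[List[float], float]:
    """Fit a Bernoulli naive Bayes model and express it as a weighted vote.

    labels[k] is 1 if pairs[k] is in the reference LDO set, else 0.
    Returns (weights, bias) such that bias + sum(w_i * x_i) is the
    log-odds that a pair is an LDO. Laplace (+1) smoothing keeps every
    logarithm finite.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both LDO and non-LDO examples")
    bias = math.log(n_pos / n_neg)  # prior log-odds
    weights = []
    for i in range(len(pairs[0])):
        pos_yes = sum(x[i] for x, y in zip(pairs, labels) if y == 1)
        neg_yes = sum(x[i] for x, y in zip(pairs, labels) if y == 0)
        p = (pos_yes + 1) / (n_pos + 2)  # P(method i says yes | LDO)
        q = (neg_yes + 1) / (n_neg + 2)  # P(method i says yes | not LDO)
        # Evidence when method i says "no" is folded into the bias...
        bias += math.log((1 - p) / (1 - q))
        # ...so the weight is the extra evidence carried by a "yes".
        weights.append(math.log(p / q) - math.log((1 - p) / (1 - q)))
    return weights, bias


def weighted_score(votes: Votes, weights: Sequence[float], bias: float) -> float:
    """Log-odds score: reliable methods move it more than unreliable ones."""
    return bias + sum(w * x for w, x in zip(weights, votes))


# Hypothetical toy data for three methods A, B, C (not from the paper).
pairs = [
    [1, 1, 1], [1, 0, 1], [1, 1, 0], [1, 0, 0],  # reference LDOs
    [0, 1, 0], [0, 1, 1], [1, 1, 0], [0, 1, 0],  # non-LDOs
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
weights, bias = learn_weights(pairs, labels)
print([round(w, 2) for w in weights])  # [2.3, -1.61, 0.69]
for votes in ([1, 0, 0], [0, 1, 0]):
    print(votes, vote_count(votes), round(weighted_score(votes, weights, bias), 2))
# [1, 0, 0] 1 1.73
# [0, 1, 0] 1 -2.18
```

Both example pairs receive one vote, so plain voting cannot separate them. The learned weights rank the pair backed by method A well above the pair backed only by method B, which in this toy data votes yes on every non-LDO and so carries evidence against. A discriminative model such as logistic regression would also yield per-method weights; naive Bayes is used here only because its weights have a closed form that is easy to check by hand.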

Suggested Citation

  • George L Sutphin & J Matthew Mahoney & Keith Sheppard & David O Walton & Ron Korstanje, 2016. "WORMHOLE: Novel Least Diverged Ortholog Prediction through Machine Learning," PLOS Computational Biology, Public Library of Science, vol. 12(11), pages 1-35, November.
  • Handle: RePEc:plo:pcbi00:1005182
    DOI: 10.1371/journal.pcbi.1005182

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005182
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005182&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Adrian M Altenhoff & Romain A Studer & Marc Robinson-Rechavi & Christophe Dessimoz, 2012. "Resolving the Ortholog Conjecture: Orthologs Tend to Be Weakly, but Significantly, More Similar in Function than Paralogs," PLOS Computational Biology, Public Library of Science, vol. 8(5), pages 1-10, May.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Hope Dang & Raul Castro-Portuguez & Luis Espejo & Grant Backer & Samuel Freitas & Erica Spence & Jeremy Meyers & Karissa Shuck & Emily A. Gardea & Leah M. Chang & Jonah Balsa & Niall Thorns & Caroline, 2023. "On the benefits of the tryptophan metabolite 3-hydroxyanthranilic acid in Caenorhabditis elegans and mouse aging," Nature Communications, Nature, vol. 14(1), pages 1-13, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nives Škunca & Matko Bošnjak & Anita Kriško & Panče Panov & Sašo Džeroski & Tomislav Šmuc & Fran Supek, 2013. "Phyletic Profiling with Cliques of Orthologs Is Enhanced by Signatures of Paralogy Relationships," PLOS Computational Biology, Public Library of Science, vol. 9(1), pages 1-14, January.
    2. Nadezda Kryuchkova-Mostacci & Marc Robinson-Rechavi, 2016. "Tissue-Specificity of Gene Expression Diverges Slowly between Orthologs, and Rapidly between Paralogs," PLOS Computational Biology, Public Library of Science, vol. 12(12), pages 1-13, December.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.