Protein Molecular Function Prediction by Bayesian Phylogenomics

Author

Listed:
  • Barbara E Engelhardt
  • Michael I Jordan
  • Kathryn E Muratore
  • Steven E Brenner

Abstract

We present a statistical graphical model to infer specific molecular function for unannotated protein sequences using homology. Based on phylogenomic principles, SIFTER (Statistical Inference of Function Through Evolutionary Relationships) accurately predicts molecular function for members of a protein family given a reconciled phylogeny and available function annotations, even when the data are sparse or noisy. Our method produced specific and consistent molecular function predictions across 100 Pfam families in comparison to the Gene Ontology annotation database, BLAST, GOtcha, and Orthostrapper. We performed a more detailed exploration of functional predictions on the adenosine-5′-monophosphate/adenosine deaminase family and the lactate/malate dehydrogenase family, in the former case comparing the predictions against a gold standard set of published functional characterizations. Given function annotations for 3% of the proteins in the deaminase family, SIFTER achieves 96% accuracy in predicting molecular function for experimentally characterized proteins as reported in the literature. The accuracy of SIFTER on this dataset is a significant improvement over other currently available methods such as BLAST (75%), GeneQuiz (64%), GOtcha (89%), and Orthostrapper (11%). We also experimentally characterized the adenosine deaminase from Plasmodium falciparum, confirming SIFTER's prediction. The results illustrate the predictive power of exploiting a statistical model of function evolution in phylogenomic problems. A software implementation of SIFTER is available from the authors.

New genome sequences continue to be published at a prodigious rate. However, unannotated sequences are of limited use to biologists. To computationally annotate a hypothetical protein for molecular function, researchers generally attempt to carry out some form of information transfer from evolutionarily related proteins. Such transfer is most successfully achieved within the context of phylogenetic relationships, exploiting the comprehensive knowledge that is available regarding molecular evolution within a given protein family. This general approach to molecular function annotation is known as phylogenomics, and it is the best method currently available for providing high-quality annotations. A drawback of phylogenomics, however, is that it is a time-consuming manual process requiring expert knowledge. In the current paper, the authors have developed a statistical approach, referred to as SIFTER (Statistical Inference of Function Through Evolutionary Relationships), that allows phylogenomic analyses to be carried out automatically.
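
The general idea of transferring function annotations along a phylogeny can be illustrated with a small sketch. The code below is not SIFTER and does not reproduce the model in the paper: it is a generic toy example of probabilistic message passing on a tree, assuming a uniform prior over functions and a simple symmetric change model in which a function persists along a branch with probability decaying in the branch length. All names (`transition`, `posterior`, `rate`, the example tree and its branch lengths) are hypothetical and chosen only for illustration.

```python
import math

def transition(k, t, rate=1.0):
    """k x k symmetric matrix: probability that function i becomes j over branch length t.
    Assumed toy model, not the transition model used by SIFTER."""
    stay = math.exp(-rate * t)
    return [[stay * (i == j) + (1.0 - stay) / k for j in range(k)] for i in range(k)]

def posterior(tree, evidence, target, k, rate=1.0):
    """Posterior over k candidate functions at node `target`.

    tree:     {node: {neighbor: branch_length}} describing an undirected tree
    evidence: {node: function_index} for the annotated proteins
    """
    def local(node):
        # Annotated nodes are fixed to their function; others are uninformative.
        if node in evidence:
            return [1.0 if s == evidence[node] else 0.0 for s in range(k)]
        return [1.0] * k

    def message(src, dst):
        # Combine everything known on src's side of the edge, then
        # push it across the branch src-dst through the change model.
        belief = local(src)
        for nb in tree[src]:
            if nb != dst:
                belief = [b * m for b, m in zip(belief, message(nb, src))]
        p = transition(k, tree[src][dst], rate)
        return [sum(p[d][s] * belief[s] for s in range(k)) for d in range(k)]

    belief = local(target)
    for nb in tree[target]:
        belief = [b * m for b, m in zip(belief, message(nb, target))]
    total = sum(belief)
    return [b / total for b in belief]

# Hypothetical family: proteins A and B share a recent ancestor, as do C and D.
tree = {
    "anc1": {"A": 0.1, "B": 0.1, "anc2": 0.5},
    "anc2": {"anc1": 0.5, "C": 0.1, "D": 0.1},
    "A": {"anc1": 0.1}, "B": {"anc1": 0.1},
    "C": {"anc2": 0.1}, "D": {"anc2": 0.1},
}
evidence = {"A": 0, "C": 1}  # only two of four proteins are annotated

post_b = posterior(tree, evidence, "B", k=2)
post_d = posterior(tree, evidence, "D", k=2)
print("B:", post_b)
print("D:", post_d)
```

In this toy family, the unannotated protein B sits next to A on the tree and so inherits most of its probability mass from A's annotation, while D is pulled toward C's annotation. Because the assumed change model is symmetric and the prior is uniform, the result does not depend on where the tree is rooted, which is why the sketch can treat the tree as undirected.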

Suggested Citation

  • Barbara E Engelhardt & Michael I Jordan & Kathryn E Muratore & Steven E Brenner, 2005. "Protein Molecular Function Prediction by Bayesian Phylogenomics," PLOS Computational Biology, Public Library of Science, vol. 1(5), pages 1-1, October.
  • Handle: RePEc:plo:pcbi00:0010045
    DOI: 10.1371/journal.pcbi.0010045

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0010045
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.0010045&type=printable
    Download Restriction: no



    Citations


    Cited by:

    1. Duncan P Brown & Nandini Krishnamurthy & Kimmen Sjölander, 2007. "Automated Protein Subfamily Identification and Classification," PLOS Computational Biology, Public Library of Science, vol. 3(8), pages 1-13, August.
    2. Jianzhu Ma & Sheng Wang & Zhiyong Wang & Jinbo Xu, 2014. "MRFalign: Protein Homology Detection through Alignment of Markov Random Fields," PLOS Computational Biology, Public Library of Science, vol. 10(3), pages 1-12, March.
    3. Nils Weinhold & Oliver Sander & Francisco S Domingues & Thomas Lengauer & Ingolf Sommer, 2008. "Local Function Conservation in Sequence and Structure Space," PLOS Computational Biology, Public Library of Science, vol. 4(7), pages 1-13, July.
    4. Adrian Schröder & Johannes Eichner & Jochen Supper & Jonas Eichner & Dierk Wanke & Carsten Henneges & Andreas Zell, 2010. "Predicting DNA-Binding Specificities of Eukaryotic Transcription Factors," PLOS ONE, Public Library of Science, vol. 5(11), pages 1-15, November.
    5. David K Crockett & Stephen R Piccolo & Perry G Ridge & Rebecca L Margraf & Elaine Lyon & Marc S Williams & Joyce A Mitchell, 2011. "Predicting Phenotypic Severity of Uncertain Gene Variants in the RET Proto-Oncogene," PLOS ONE, Public Library of Science, vol. 6(3), pages 1-7, March.
