Printed from https://ideas.repec.org/a/plo/pcbi00/1002957.html

Functional Knowledge Transfer for High-accuracy Prediction of Under-studied Biological Processes

Author

Listed:
  • Christopher Y Park
  • Aaron K Wong
  • Casey S Greene
  • Jessica Rowland
  • Yuanfang Guan
  • Lars A Bongo
  • Rebecca D Burdine
  • Olga G Troyanskaya

Abstract

A key challenge in genetics is identifying the functional roles of genes in pathways. Numerous functional genomics techniques (e.g. machine learning) that predict protein function have been developed to address this question. These methods generally build from existing annotations of genes to pathways and thus are often unable to identify additional genes participating in processes that are not already well studied. Many of these processes are well studied in some organism, but not necessarily in an investigator's organism of interest. Sequence-based search methods (e.g. BLAST) have been used to transfer such annotation information between organisms. We demonstrate that functional genomics can complement traditional sequence similarity to improve the transfer of gene annotations between organisms. Our method transfers annotations only when functionally appropriate as determined by genomic data and can be used with any prediction algorithm to combine transferred gene function knowledge with organism-specific high-throughput data to enable accurate function prediction. We show that diverse state-of-the-art machine learning algorithms leveraging functional knowledge transfer (FKT) dramatically improve their accuracy in predicting gene-pathway membership, particularly for processes with little experimental knowledge in an organism. We also show that our method compares favorably to annotation transfer by sequence similarity. Next, we deploy FKT with a state-of-the-art SVM classifier to predict novel genes for 11,000 biological processes across six diverse organisms and expand the coverage of accurate function predictions to processes that are often ignored because of a dearth of annotated genes in an organism. Finally, we perform an in vivo experimental investigation in Danio rerio and confirm the regulatory role of our top predicted novel gene, wnt5b, in leftward cell migration during heart development.
FKT is immediately applicable to many bioinformatics techniques and will help biologists systematically integrate prior knowledge from diverse systems to direct targeted experiments in their organism of study.

Author Summary: Due to technical and ethical challenges, many human diseases or biological processes are studied in model organisms. Discoveries in these organisms are then transferred back to humans or other model organisms. Traditional methods for transferring novel gene function annotations have relied on finding genes with high sequence similarity believed to share evolutionary ancestry. However, sequence similarity does not guarantee a shared functional role in molecular pathways. In this study, we show that functional genomics can complement traditional sequence similarity measures to improve the transfer of gene annotations between organisms. We coupled our knowledge transfer method with current state-of-the-art machine learning algorithms and predicted gene function for 11,000 biological processes across six organisms. We experimentally validated our prediction of wnt5b's involvement in the determination of left-right heart asymmetry in zebrafish. Our results show that functional knowledge transfer can improve the coverage and accuracy of machine learning methods used for gene function prediction in a diverse set of organisms. Such an approach can be applied to additional organisms, and will be especially beneficial in organisms that have high-throughput genomic data with sparse annotations.
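The general idea described above, enlarging a sparse set of annotations in one organism with annotations from another organism only where the genes appear functionally related, might be sketched as follows. This is a minimal illustration and not the authors' implementation: the gene names, the process label, the similarity scores, the `min_score` threshold, and both function names are hypothetical, and the abstract does not specify how functional appropriateness is actually scored.

```python
from typing import Dict, Set, Tuple

# gene identifier -> set of biological process labels
Annotations = Dict[str, Set[str]]


def transfer_annotations(
    target: Annotations,
    source: Annotations,
    candidate_pairs: Dict[Tuple[str, str], float],
    min_score: float = 0.8,
) -> Annotations:
    """Return a copy of `target` augmented with labels from `source`.

    `candidate_pairs` maps (target_gene, source_gene) to a hypothetical
    functional-similarity score in [0, 1]. Only pairs scoring at least
    `min_score` contribute labels (an assumed cutoff for illustration).
    """
    augmented = {gene: set(procs) for gene, procs in target.items()}
    for (target_gene, source_gene), score in candidate_pairs.items():
        if score >= min_score:
            labels = source.get(source_gene, set())
            augmented.setdefault(target_gene, set()).update(labels)
    return augmented


def positives_for(process: str, annotations: Annotations) -> Set[str]:
    """Genes annotated to `process`, usable as positive training examples."""
    return {gene for gene, procs in annotations.items() if process in procs}


# Hypothetical data: one annotated gene in the organism of interest,
# two annotated genes in a better-studied organism.
target = {"target_gene_a": {"heart_asymmetry"}}
source = {
    "source_gene_b": {"heart_asymmetry"},
    "source_gene_c": {"heart_asymmetry"},
}
candidate_pairs = {
    ("target_gene_b", "source_gene_b"): 0.92,  # functionally similar: transfer
    ("target_gene_c", "source_gene_c"): 0.35,  # too dissimilar: skip
}

augmented = transfer_annotations(target, source, candidate_pairs)
positives = positives_for("heart_asymmetry", augmented)
```

The enlarged positive set could then be passed to any classifier (the abstract mentions an SVM) together with organism-specific data; that training step is omitted here.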

Suggested Citation

  • Christopher Y Park & Aaron K Wong & Casey S Greene & Jessica Rowland & Yuanfang Guan & Lars A Bongo & Rebecca D Burdine & Olga G Troyanskaya, 2013. "Functional Knowledge Transfer for High-accuracy Prediction of Under-studied Biological Processes," PLOS Computational Biology, Public Library of Science, vol. 9(3), pages 1-14, March.
  • Handle: RePEc:plo:pcbi00:1002957
    DOI: 10.1371/journal.pcbi.1002957

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002957
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1002957&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Liqi Li & Xiang Cui & Sanjiu Yu & Yuan Zhang & Zhong Luo & Hua Yang & Yue Zhou & Xiaoqi Zheng, 2014. "PSSP-RFE: Accurate Prediction of Protein Structural Class by Recursive Feature Extraction from PSI-BLAST Profile, Physical-Chemical Property and Functional Annotations," PLOS ONE, Public Library of Science, vol. 9(3), pages 1-10, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Han Yan & Kavitha Venkatesan & John E Beaver & Niels Klitgord & Muhammed A Yildirim & Tong Hao & David E Hill & Michael E Cusick & Norbert Perrimon & Frederick P Roth & Marc Vidal, 2010. "A Genome-Wide Gene Function Prediction Resource for Drosophila melanogaster," PLOS ONE, Public Library of Science, vol. 5(8), pages 1-11, August.
    2. Antigoni Elefsinioti & Marit Ackermann & Andreas Beyer, 2009. "Accounting for Redundancy when Integrating Gene Interaction Databases," PLOS ONE, Public Library of Science, vol. 4(10), pages 1-9, October.
    3. Heiko Müller & Francesco Mancuso, 2008. "Identification and Analysis of Co-Occurrence Networks with NetCutter," PLOS ONE, Public Library of Science, vol. 3(9), pages 1-16, September.
    4. Sara Mostafavi & Anna Goldenberg & Quaid Morris, 2012. "Labeling Nodes Using Three Degrees of Propagation," PLOS ONE, Public Library of Science, vol. 7(12), pages 1-10, December.
