Printed from https://ideas.repec.org/a/plo/pcbi00/1000698.html

Systematic Planning of Genome-Scale Experiments in Poorly Studied Species

Author

Listed:
  • Yuanfang Guan
  • Maitreya Dunham
  • Amy Caudy
  • Olga Troyanskaya

Abstract

Genome-scale datasets have been used extensively in model organisms to screen for specific candidates or to predict functions for uncharacterized genes. However, despite the availability of extensive knowledge in model organisms, the planning of genome-scale experiments in poorly studied species is still based on the intuition of experts or heuristic trials. We propose that computational and systematic approaches can be applied to drive the experiment planning process in poorly studied species based on available data and knowledge in closely related model organisms. In this paper, we suggest a computational strategy for recommending genome-scale experiments based on their capability to interrogate diverse biological processes to enable protein function assignment. To this end, we use the data-rich functional genomics compendium of the model organism to quantify the accuracy of each dataset in predicting each specific biological process and the overlap in such coverage between different datasets. Our approach uses an optimized combination of these quantifications to recommend an ordered list of experiments for accurately annotating most proteins in the poorly studied related organisms to most biological processes, as well as a set of experiments that target each specific biological process. The effectiveness of this experiment-planning system is demonstrated for two related yeast species: the model organism Saccharomyces cerevisiae and the comparatively poorly studied Saccharomyces bayanus. Our system recommended a set of S. bayanus experiments based on an S. cerevisiae microarray data compendium. In silico evaluations estimate that less than 10% of the experiments could achieve similar functional coverage to the whole microarray compendium. This estimation was confirmed by performing the recommended experiments in S. bayanus, thereby significantly reducing the labor devoted to characterizing the poorly studied genome. This experiment-planning framework could readily be adapted to the design of other types of large-scale experiments as well as other groups of organisms.

Author Summary

Microarray expression experiments allow fast functional profiling of an organism's entire genome, and significant efforts are devoted to analyzing the resulting data. The number of available genome sequences is also increasing quickly. However, how to use available functional genomics data to direct large-scale experiments in newly sequenced but poorly studied species remains unexplored. In this paper, we propose a strategy to systematically plan experimental treatments in the poorly studied species based on their model organism relatives. We consider both the accuracy of the datasets in capturing different biological processes and the redundancy between datasets. Quantifying this information allows us to recommend a list of experimental treatments. We demonstrate the efficacy of this approach by designing, performing and evaluating S. bayanus microarray experiments using an available S. cerevisiae data repository. We show that this systematic planning process could reduce the labor of performing microarray experiments 10-fold while achieving similar functional coverage.
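
The idea of weighing per-process accuracy against redundancy between datasets can be illustrated with a small sketch. It is not the authors' method: the abstract does not specify the optimization, so the greedy rule, the coverage score (the sum over processes of the best accuracy among chosen datasets), and the names `recommend_experiments` and `accuracy` are all assumptions made for illustration only.

```python
import numpy as np

def recommend_experiments(accuracy, k):
    """Return up to k dataset indices, ordered by greedy marginal coverage gain.

    accuracy[d, p] is a hypothetical score for how well dataset d predicts
    biological process p in the model organism. Coverage of a chosen set is
    assumed to be the sum over processes of the best score among chosen
    datasets, so a dataset that overlaps with earlier picks adds little.
    """
    accuracy = np.asarray(accuracy, dtype=float)
    n_datasets, n_processes = accuracy.shape
    best = np.zeros(n_processes)          # best score per process so far
    remaining = list(range(n_datasets))
    ordered = []
    for _ in range(min(k, n_datasets)):
        # Marginal gain in total coverage from adding each remaining dataset.
        gains = [np.maximum(best, accuracy[d]).sum() - best.sum()
                 for d in remaining]
        pick = remaining[int(np.argmax(gains))]  # ties go to the lowest index
        ordered.append(pick)
        remaining.remove(pick)
        best = np.maximum(best, accuracy[pick])
    return ordered

# Toy scores: datasets 0 and 1 are largely redundant (both cover process 0),
# while dataset 2 covers processes 1 and 2.
toy_accuracy = [
    [0.9, 0.1, 0.1],
    [0.8, 0.2, 0.1],
    [0.1, 0.7, 0.6],
]
ranking = recommend_experiments(toy_accuracy, 3)
print(ranking)
```

In this toy case the redundant dataset 1 is ranked last even though its raw scores are close to those of dataset 0, which is the behavior the redundancy term is meant to capture.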

Suggested Citation

  • Yuanfang Guan & Maitreya Dunham & Amy Caudy & Olga Troyanskaya, 2010. "Systematic Planning of Genome-Scale Experiments in Poorly Studied Species," PLOS Computational Biology, Public Library of Science, vol. 6(3), pages 1-13, March.
  • Handle: RePEc:plo:pcbi00:1000698
    DOI: 10.1371/journal.pcbi.1000698

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000698
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1000698&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1000698?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Eilon Sharon & Shai Lubliner & Eran Segal, 2008. "A Feature-Based Approach to Modeling Protein–DNA Interactions," PLOS Computational Biology, Public Library of Science, vol. 4(8), pages 1-17, August.
    2. Naomi Habib & Tommy Kaplan & Hanah Margalit & Nir Friedman, 2008. "A Novel Bayesian DNA Motif Comparison Method for Clustering and Retrieval," PLOS Computational Biology, Public Library of Science, vol. 4(2), pages 1-16, February.
    3. Tao Song & Hong Gu, 2014. "Discriminative Motif Discovery via Simulated Evolution and Random Under-Sampling," PLOS ONE, Public Library of Science, vol. 9(2), pages 1-10, February.
    4. John E Reid & Lorenz Wernisch, 2014. "STEME: A Robust, Accurate Motif Finder for Large Data Sets," PLOS ONE, Public Library of Science, vol. 9(3), pages 1-11, March.
    5. Xinyi Liu & Bin Liu & Zhimin Huang & Ting Shi & Yingyi Chen & Jian Zhang, 2012. "SPPS: A Sequence-Based Method for Predicting Probability of Protein-Protein Interaction Partners," PLOS ONE, Public Library of Science, vol. 7(1), pages 1-6, January.
    6. Lisa M Breckels & Sean B Holden & David Wojnar & Claire M Mulvey & Andy Christoforou & Arnoud Groen & Matthew W B Trotter & Oliver Kohlbacher & Kathryn S Lilley & Laurent Gatto, 2016. "Learning from Heterogeneous Data Sources: An Application in Spatial Proteomics," PLOS Computational Biology, Public Library of Science, vol. 12(5), pages 1-26, May.
    7. Alexander Kawrykow & Gary Roumanis & Alfred Kam & Daniel Kwak & Clarence Leung & Chu Wu & Eleyine Zarour & Phylo players & Luis Sarmenta & Mathieu Blanchette & Jérôme Waldispühl, 2012. "Phylo: A Citizen Science Approach for Improving Multiple Sequence Alignment," PLOS ONE, Public Library of Science, vol. 7(3), pages 1-9, March.
    8. G. Saharidis & I. Androulakis & M. Ierapetritou, 2011. "Model building using bi-level optimization," Journal of Global Optimization, Springer, vol. 49(1), pages 49-67, January.
    9. Alessandro L. V. Coradini & Christopher Ne Ville & Zachary A. Krieger & Joshua Roemer & Cara Hull & Shawn Yang & Daniel T. Lusk & Ian M. Ehrenreich, 2023. "Building synthetic chromosomes from natural DNA," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    10. Emily N Manderson & Mohan Malleshaiah & Stephen W Michnick, 2008. "A Novel Genetic Screen Implicates Elm1 in the Inactivation of the Yeast Transcription Factor SBF," PLOS ONE, Public Library of Science, vol. 3(1), pages 1-9, January.
    11. Valerie Storms & Marleen Claeys & Aminael Sanchez & Bart De Moor & Annemieke Verstuyf & Kathleen Marchal, 2010. "The Effect of Orthology and Coregulation on Detecting Regulatory Motifs," PLOS ONE, Public Library of Science, vol. 5(2), pages 1-11, February.
    12. Robert K Bradley & Adam Roberts & Michael Smoot & Sudeep Juvekar & Jaeyoung Do & Colin Dewey & Ian Holmes & Lior Pachter, 2009. "Fast Statistical Alignment," PLOS Computational Biology, Public Library of Science, vol. 5(5), pages 1-15, May.
    13. Rahul Siddharthan & Eric D Siggia & Erik van Nimwegen, 2005. "PhyloGibbs: A Gibbs Sampling Motif Finder That Incorporates Phylogeny," PLOS Computational Biology, Public Library of Science, vol. 1(7), pages 1-23, December.
    14. Harri Lähdesmäki & Alistair G Rust & Ilya Shmulevich, 2008. "Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources," PLOS ONE, Public Library of Science, vol. 3(3), pages 1-24, March.
    15. Cheemeng Tan & Robert Phillip Smith & Ming-Chi Tsai & Russell Schwartz & Lingchong You, 2014. "Phenotypic Signatures Arising from Unbalanced Bacterial Growth," PLOS Computational Biology, Public Library of Science, vol. 10(8), pages 1-10, August.
    16. Leelavati Narlikar & Raluca Gordân & Alexander J Hartemink, 2007. "A Nucleosome-Guided Map of Transcription Factor Binding Sites in Yeast," PLOS Computational Biology, Public Library of Science, vol. 3(11), pages 1-10, November.
    17. J Roman Arguello & Carolina Sellanes & Yann Ru Lou & Robert A Raguso, 2013. "Can Yeast (S. cerevisiae) Metabolic Volatiles Provide Polymorphic Signaling?," PLOS ONE, Public Library of Science, vol. 8(8), pages 1-12, August.
    18. Fabio Pardi & Nick Goldman, 2005. "Species Choice for Comparative Genomics: Being Greedy Works," PLOS Genetics, Public Library of Science, vol. 1(6), pages 1-1, December.
    19. Krishna B. S. Swamy & Hsin-Yi Lee & Carmina Ladra & Chien-Fu Jeff Liu & Jung-Chi Chao & Yi-Yun Chen & Jun-Yi Leu, 2022. "Proteotoxicity caused by perturbed protein complexes underlies hybrid incompatibility in yeast," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    20. Ron X Yu & Jie Liu & Nick True & Wei Wang, 2008. "Identification of Direct Target Genes Using Joint Sequence and Expression Likelihood with Application to DAF-16," PLOS ONE, Public Library of Science, vol. 3(3), pages 1-12, March.
