
Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development

Author

Listed:
  • Jian Zhou
  • Ignacio E Schor
  • Victoria Yao
  • Chandra L Theesfeld
  • Raquel Marco-Ferreres
  • Alicja Tadych
  • Eileen E M Furlong
  • Olga G Troyanskaya

Abstract

Comprehensive information on the timing and location of gene expression is fundamental to our understanding of embryonic development and tissue formation. While high-throughput in situ hybridization projects provide invaluable information about developmental gene expression patterns for model organisms like Drosophila, the output of these experiments is primarily qualitative, and a high proportion of protein coding genes and most non-coding genes lack any annotation. Accurate data-centric predictions of spatio-temporal gene expression will therefore complement current in situ hybridization efforts. Here, we applied a machine learning approach by training models on all public gene expression and chromatin data, even from whole-organism experiments, to provide genome-wide, quantitative spatio-temporal predictions for all genes. We developed structured in silico nano-dissection, a computational approach that predicts gene expression in >200 tissue-developmental stages. The algorithm integrates expression signals from a compendium of 6,378 genome-wide expression and chromatin profiling experiments in a cell lineage-aware fashion. We systematically evaluated our performance via cross-validation and experimentally confirmed 22 new predictions for four different embryonic tissues. The model also predicts complex, multi-tissue expression and developmental regulation with high accuracy. We further show the potential of applying these genome-wide predictions to extract tissue specificity signals from non-tissue-dissected experiments, and to prioritize tissues and stages for disease modeling. This resource, together with the exploratory tools, is freely available at our webserver http://find.princeton.edu, which provides a valuable tool for a range of applications, from predicting spatio-temporal expression patterns to recognizing tissue signatures from differential gene expression profiles.

Author summary

When and where a gene is expressed is fundamental information for understanding embryonic development. Current knowledge of such expression patterns is typically far from complete. Even for the long-standing model organism Drosophila melanogaster, for which large-scale in situ projects have provided invaluable expression information for many genes, 40% of the genes still lack spatio-temporally resolved expression information. Such data are complemented by transcriptome datasets such as microarray and RNA-seq, which have whole-genome coverage and measure expression levels with greater dynamic range, but typically lack precise spatio-temporal resolution. To bridge this gap, we developed a machine learning approach that combines the spatio-temporal resolution of in situ data with the accurate quantification and whole-genome coverage of genomic experiments, integrating information from 6,378 expression and chromatin profiling data sets. With this new approach, we present a genome-wide resource of spatio-temporal gene expression predictions for over 200 tissue-developmental stages during Drosophila embryogenesis. This resource is experimentally validated to have high-quality predictions, can guide the discovery of new tissue-specific genes, and provides a new tool to perform genome-wide analyses of spatio-temporal specificity.
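
Illustrative sketch (not the authors' method): the abstract describes training models on a compendium of genome-wide experiments and evaluating them by cross-validation. The toy example below shows only that general idea for a single hypothetical tissue-stage label, using synthetic data and a simple nearest-centroid classifier. The data, the function names, and the choice of classifier are all assumptions made for illustration; the structured, lineage-aware algorithm of the paper is not reproduced here.

```python
import random


def make_toy_compendium(n_genes=60, n_experiments=8, seed=0):
    """Synthetic stand-in for a compendium: rows are genes, columns are experiments."""
    rng = random.Random(seed)
    profiles, labels = [], []
    for g in range(n_genes):
        # Hypothetical in situ annotation for one tissue-stage (assumption).
        expressed = g % 2 == 0
        shift = 1.0 if expressed else 0.0
        profiles.append([rng.gauss(shift, 1.0) for _ in range(n_experiments)])
        labels.append(expressed)
    return profiles, labels


def centroid(rows):
    """Column-wise mean of a list of equal-length profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(profiles, labels, k=5):
    """Return held-out accuracy of a nearest-centroid classifier over k folds."""
    correct = 0
    for held_out in k_fold_indices(len(profiles), k):
        held = set(held_out)
        train = [i for i in range(len(profiles)) if i not in held]
        # Fit on training genes only: one centroid per class.
        pos = centroid([profiles[i] for i in train if labels[i]])
        neg = centroid([profiles[i] for i in train if not labels[i]])
        # Score each held-out gene by which centroid it is closer to.
        for i in held_out:
            predicted = sq_dist(profiles[i], pos) < sq_dist(profiles[i], neg)
            correct += predicted == labels[i]
    return correct / len(profiles)


profiles, labels = make_toy_compendium()
accuracy = cross_validate(profiles, labels)
print(f"held-out accuracy: {accuracy:.2f}")
```

Each gene is scored only by a model that never saw its label, which is the property that makes cross-validated accuracy a fair estimate for genes lacking annotation. A real analysis would repeat this per tissue-stage and use the actual compendium rather than simulated profiles.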

Suggested Citation

  • Jian Zhou & Ignacio E Schor & Victoria Yao & Chandra L Theesfeld & Raquel Marco-Ferreres & Alicja Tadych & Eileen E M Furlong & Olga G Troyanskaya, 2019. "Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development," PLOS Genetics, Public Library of Science, vol. 15(9), pages 1-20, September.
  • Handle: RePEc:plo:pgen00:1008382
    DOI: 10.1371/journal.pgen.1008382

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008382
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1008382&type=printable
    Download Restriction: no

