
Cross-species regulatory sequence activity prediction

Author

  • David R Kelley

Abstract

Machine learning algorithms trained to predict the regulatory activity of nucleic acid sequences have revealed principles of gene regulation and guided genetic variation analysis. While the human genome has been extensively annotated and studied, model organisms have been less explored. Model organism genomes offer both additional training sequences and unique annotations describing tissue and cell states unavailable in humans. Here, we develop a strategy to train deep convolutional neural networks simultaneously on multiple genomes and apply it to learn sequence predictors for large compendia of human and mouse data. Training on both genomes improves gene expression prediction accuracy on held-out and variant sequences. We further demonstrate a novel and powerful approach to apply mouse regulatory models to analyze human genetic variants associated with molecular phenotypes and disease. Together, these techniques unleash thousands of non-human epigenetic and transcriptional profiles toward more effective investigation of how gene regulation affects human disease.

Author summary

Human population genetic studies have highlighted thousands of genomic sites associated with traits and diseases that do not modify gene sequences directly but instead modify where and when those genes are expressed. To better understand how these sites influence traits and diseases, and to consider their relevance for drug development, we need better models of how DNA sequences determine gene expression. Recently, machine learning algorithms based on deep artificial neural networks have proven to be promising tools toward this end. In this work, we improve upon state-of-the-art model accuracy by combining training data from both humans and mice. Using these models, we can predict the effect of a genetic variant on gene expression in any tissue or cell type with available data. We further demonstrate that predictions for human variants derived from mouse training datasets are highly informative and offer unique insight into the genetic basis of gene expression and disease.

Suggested Citation

  • David R Kelley, 2020. "Cross-species regulatory sequence activity prediction," PLOS Computational Biology, Public Library of Science, vol. 16(7), pages 1-27, July.
  • Handle: RePEc:plo:pcbi00:1008050
    DOI: 10.1371/journal.pcbi.1008050

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008050
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1008050&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1008050?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Mahmoud Ghandi & Dongwon Lee & Morteza Mohammad-Noori & Michael A Beer, 2014. "Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features," PLOS Computational Biology, Public Library of Science, vol. 10(7), pages 1-15, July.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Alan E. Murphy & William Beardall & Marek Rei & Mike Phuycharoen & Nathan G. Skene, 2024. "Predicting cell type-specific epigenomic profiles accounting for distal genetic effects," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    2. Arthur S. Lee & Lauren J. Ayers & Michael Kosicki & Wai-Man Chan & Lydia N. Fozo & Brandon M. Pratt & Thomas E. Collins & Boxun Zhao & Matthew F. Rose & Alba Sanchis-Juan & Jack M. Fu & Isaac Wong & X, 2024. "A cell type-aware framework for nominating non-coding variants in Mendelian regulatory disorders," Nature Communications, Nature, vol. 15(1), pages 1-26, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jin Woo Oh & Michael A. Beer, 2024. "Gapped-kmer sequence modeling robustly identifies regulatory vocabularies and distal enhancers conserved between evolutionarily distant mammals," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    2. Seong Kyu Han & Michelle T. McNulty & Christopher J. Benway & Pei Wen & Anya Greenberg & Ana C. Onuchic-Whitford & Dongkeun Jang & Jason Flannick & Noël P. Burtt & Parker C. Wilson & Benjamin D. Humph, 2023. "Mapping genomic regulation of kidney disease and traits through high-resolution and interpretable eQTLs," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    3. Feng Jiang & Shou-Ye Hu & Wen Tian & Nai-Ning Wang & Ning Yang & Shan-Shan Dong & Hui-Miao Song & Da-Jin Zhang & Hui-Wu Gao & Chen Wang & Hao Wu & Chang-Yi He & Dong-Li Zhu & Xiao-Feng Chen & Yan Guo, 2024. "A landscape of gene expression regulation for synovium in arthritis," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    4. Manu Setty & Christina S Leslie, 2015. "SeqGL Identifies Context-Dependent Binding Signals in Genome-Wide Regulatory Element Maps," PLOS Computational Biology, Public Library of Science, vol. 11(5), pages 1-21, May.
    5. Koh Onimaru & Osamu Nishimura & Shigehiro Kuraku, 2020. "Predicting gene regulatory regions with a convolutional neural network for processing double-strand genome sequence information," PLOS ONE, Public Library of Science, vol. 15(7), pages 1-17, July.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1008050. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so. Registering allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol. General contact details of provider: https://journals.plos.org/ploscompbiol/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.