Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping

Authors
  • Simon Höllerer

    (ETH Zurich)

  • Laetitia Papaxanthos

    (ETH Zurich
    Swiss Institute of Bioinformatics)

  • Anja Cathrin Gumpinger

    (ETH Zurich
    Swiss Institute of Bioinformatics)

  • Katrin Fischer

    (ETH Zurich)

  • Christian Beisel

    (ETH Zurich)

  • Karsten Borgwardt

    (ETH Zurich
    Swiss Institute of Bioinformatics)

  • Yaakov Benenson

    (ETH Zurich)

  • Markus Jeschek

    (ETH Zurich)

Abstract

Predicting effects of gene regulatory elements (GREs) is a longstanding challenge in biology. Machine learning may address this, but requires large datasets linking GREs to their quantitative function. However, experimental methods to generate such datasets are either application-specific or technically complex and error-prone. Here, we introduce DNA-based phenotypic recording as a widely applicable, practicable approach to generate large-scale sequence-function datasets. We use a site-specific recombinase to directly record a GRE’s effect in DNA, enabling readout of both sequence and quantitative function for extremely large GRE-sets via next-generation sequencing. We record translation kinetics of over 300,000 bacterial ribosome binding sites (RBSs) in >2.7 million sequence-function pairs in a single experiment. Further, we introduce a deep learning approach employing ensembling and uncertainty modelling that predicts RBS function with high accuracy, outperforming state-of-the-art methods. DNA-based phenotypic recording combined with deep learning represents a major advance in our ability to predict function from genetic sequence.
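The abstract names "ensembling and uncertainty modelling" but does not describe the model. As a purely illustrative sketch, and not the authors' implementation, assume that each ensemble member outputs a predicted mean and variance for an RBS's function. A common way to combine such outputs, used in the generic deep-ensembles technique, is to treat the ensemble as a uniform Gaussian mixture. The function name `combine_ensemble` and the input format are hypothetical.

```python
from typing import Sequence, Tuple


def combine_ensemble(means: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: merge per-member Gaussian predictions (mean, variance)
    into one predictive mean and variance by treating the ensemble as a uniform
    mixture. Illustrative only; not the method from the paper."""
    if not means or len(means) != len(variances):
        raise ValueError("need non-empty, equal-length lists of means and variances")
    n = len(means)
    # Predictive mean: average of the member means.
    mu = sum(means) / n
    # Average of the variances each member predicts (noise the members model).
    within = sum(variances) / n
    # Spread of member means around mu (disagreement between members).
    between = sum((m - mu) ** 2 for m in means) / n
    # Mixture variance = within-member variance + between-member variance.
    return mu, within + between


# Usage: three hypothetical members scoring the same RBS.
mu, var = combine_ensemble([0.2, 0.4, 0.6], [0.01, 0.01, 0.01])
print(mu, var)
```

In the usage example the combined variance exceeds each member's 0.01 because the members disagree. That between-member term is what allows an ensemble to signal lower confidence on sequences unlike its training data.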

Suggested Citation

  • Simon Höllerer & Laetitia Papaxanthos & Anja Cathrin Gumpinger & Katrin Fischer & Christian Beisel & Karsten Borgwardt & Yaakov Benenson & Markus Jeschek, 2020. "Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping," Nature Communications, Nature, vol. 11(1), pages 1-15, December.
  • Handle: RePEc:nat:natcom:v:11:y:2020:i:1:d:10.1038_s41467-020-17222-4
    DOI: 10.1038/s41467-020-17222-4

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-020-17222-4
    File Function: Abstract
    Download Restriction: no

    File URL: https://libkey.io/10.1038/s41467-020-17222-4?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Eva Yus & Jae-Seong Yang & Adrià Sogues & Luis Serrano, 2017. "A reporter system coupled with high-throughput sequencing unveils key bacterial transcription and translation determinants," Nature Communications, Nature, vol. 8(1), pages 1-12, December.
    2. Markus Jeschek & Daniel Gerngross & Sven Panke, 2016. "Rationally reduced libraries for combinatorial pathway optimization minimizing experimental effort," Nature Communications, Nature, vol. 7(1), pages 1-10, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Charlotte Cautereels & Jolien Smets & Peter Bircham & Dries De Ruysscher & Anna Zimmermann & Peter De Rijk & Jan Steensels & Anton Gorkovskiy & Joleen Masschelein & Kevin J. Verstrepen, 2024. "Combinatorial optimization of gene expression through recombinase-mediated promoter and terminator shuffling in yeast," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    2. Evangelos-Marios Nikolados & Arin Wongprommoon & Oisin Mac Aodha & Guillaume Cambray & Diego A. Oyarzún, 2022. "Accuracy and data efficiency in deep learning models of protein expression," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    3. Alicia Broto & Erika Gaspari & Samuel Miravet-Verde & Vitor A. P. Martins Santos & Mark Isalan, 2022. "A genetic toolkit and gene switches to limit Mycoplasma growth for biosafety applications," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    4. Samuel Miravet-Verde & Rocco Mazzolini & Carolina Segura-Morales & Alicia Broto & Maria Lluch-Senar & Luis Serrano, 2024. "ProTInSeq: transposon insertion tracking by ultra-deep DNA sequencing to identify translated large and small ORFs," Nature Communications, Nature, vol. 15(1), pages 1-17, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.