A digital twin for DNA data storage based on comprehensive quantification of errors and biases

Authors

  • Andreas L. Gimpel (ETH Zürich)
  • Wendelin J. Stark (ETH Zürich)
  • Reinhard Heckel (Technical University of Munich)
  • Robert N. Grass (ETH Zürich)

Abstract

Archiving data in synthetic DNA offers unprecedented storage density and longevity. Handling and storage introduce errors and biases into DNA-based storage systems, necessitating the use of Error Correction Coding (ECC) which comes at the cost of added redundancy. However, insufficient data on these errors and biases, as well as a lack of modeling tools, limit data-driven ECC development and experimental design. In this study, we present a comprehensive characterisation of the error sources and biases present in the most common DNA data storage workflows, including commercial DNA synthesis, PCR, decay by accelerated aging, and sequencing-by-synthesis. Using the data from 40 sequencing experiments, we build a digital twin of the DNA data storage process, capable of simulating state-of-the-art workflows and reproducing their experimental results. We showcase the digital twin’s ability to replace experiments and rationalize the design of redundancy in two case studies, highlighting opportunities for tangible cost savings and data-driven ECC development.

Suggested Citation

  • Andreas L. Gimpel & Wendelin J. Stark & Reinhard Heckel & Robert N. Grass, 2023. "A digital twin for DNA data storage based on comprehensive quantification of errors and biases," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-41729-1
    DOI: 10.1038/s41467-023-41729-1

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-41729-1
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Randolph Lopez & Yuan-Jyue Chen & Siena Dumas Ang & Sergey Yekhanin & Konstantin Makarychev & Miklos Z Racz & Georg Seelig & Karin Strauss & Luis Ceze, 2019. "DNA assembly for nanopore data storage readout," Nature Communications, Nature, vol. 10(1), pages 1-9, December.
    2. Philipp L. Antkowiak & Jory Lietard & Mohammad Zalbagi Darestani & Mark M. Somoza & Wendelin J. Stark & Reinhard Heckel & Robert N. Grass, 2020. "Low cost DNA data storage using photolithographic synthesis and advanced information reconstruction and error correction," Nature Communications, Nature, vol. 11(1), pages 1-10, December.
    3. Yuan-Jyue Chen & Christopher N. Takahashi & Lee Organick & Callista Bee & Siena Dumas Ang & Patrick Weiss & Bill Peck & Georg Seelig & Luis Ceze & Karin Strauss, 2020. "Quantifying molecular bias in DNA data storage," Nature Communications, Nature, vol. 11(1), pages 1-9, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lifu Song & Feng Geng & Zi-Yi Gong & Xin Chen & Jijun Tang & Chunye Gong & Libang Zhou & Rui Xia & Ming-Zhe Han & Jing-Yi Xu & Bing-Zhi Li & Ying-Jin Yuan, 2022. "Robust data storage in DNA by de Bruijn graph-based de novo strand assembly," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
    2. Cheng Kai Lim & Jing Wui Yeoh & Aurelius Andrew Kunartama & Wen Shan Yew & Chueh Loo Poh, 2023. "A biological camera that captures and stores images directly into DNA," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    3. Marius Welzel & Peter Michael Schwarz & Hannah F. Löchel & Tolganay Kabdullayeva & Sandra Clemens & Anke Becker & Bernd Freisleben & Dominik Heider, 2023. "DNA-Aeon provides flexible arithmetic coding for constraint adherence and error correction in DNA storage," Nature Communications, Nature, vol. 14(1), pages 1-10, December.
    4. Afsaneh Sadremomtaz & Robert F. Glass & Jorge Eduardo Guerrero & Dennis R. LaJeunesse & Eric A. Josephs & Reza Zadegan, 2023. "Digital data storage on DNA tape using CRISPR base editors," Nature Communications, Nature, vol. 14(1), pages 1-10, December.
