Printed from https://ideas.repec.org/a/nat/natcom/v11y2020i1d10.1038_s41467-020-16958-3.html

Quantifying molecular bias in DNA data storage

Authors
  • Yuan-Jyue Chen

    (Microsoft Research)

  • Christopher N. Takahashi

    (University of Washington)

  • Lee Organick

    (University of Washington)

  • Callista Bee

    (University of Washington)

  • Siena Dumas Ang

    (Microsoft Research)

  • Patrick Weiss

    (Twist Bioscience)

  • Bill Peck

    (Twist Bioscience)

  • Georg Seelig

    (University of Washington)

  • Luis Ceze

    (University of Washington)

  • Karin Strauss

    (Microsoft Research)

Abstract

DNA has recently emerged as an attractive medium for archival data storage. Recent work has demonstrated proof-of-principle prototype systems; however, very uneven (biased) sequencing coverage has been reported, which indicates inefficiencies in the storage process. Deviations from the average coverage in the sequence copy distribution can cause either wasteful provisioning in sequencing or an excessive number of missing sequences. Here, we use millions of unique sequences from a DNA-based digital data archival system to study the oligonucleotide copy unevenness problem and show that the two paramount sources of bias are the synthesis and amplification (PCR) processes. Based on these findings, we develop a statistical model for each molecular process as well as the overall process. We further use our model to explore the trade-offs between synthesis bias, storage physical density, logical redundancy, and sequencing redundancy, providing insights for engineering efficient, robust DNA data storage systems.

Suggested Citation

  • Yuan-Jyue Chen & Christopher N. Takahashi & Lee Organick & Callista Bee & Siena Dumas Ang & Patrick Weiss & Bill Peck & Georg Seelig & Luis Ceze & Karin Strauss, 2020. "Quantifying molecular bias in DNA data storage," Nature Communications, Nature, vol. 11(1), pages 1-9, December.
  • Handle: RePEc:nat:natcom:v:11:y:2020:i:1:d:10.1038_s41467-020-16958-3
    DOI: 10.1038/s41467-020-16958-3

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-020-16958-3
    File Function: Abstract
    Download Restriction: no


    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Lifu Song & Feng Geng & Zi-Yi Gong & Xin Chen & Jijun Tang & Chunye Gong & Libang Zhou & Rui Xia & Ming-Zhe Han & Jing-Yi Xu & Bing-Zhi Li & Ying-Jin Yuan, 2022. "Robust data storage in DNA by de Bruijn graph-based de novo strand assembly," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
    2. Afsaneh Sadremomtaz & Robert F. Glass & Jorge Eduardo Guerrero & Dennis R. LaJeunesse & Eric A. Josephs & Reza Zadegan, 2023. "Digital data storage on DNA tape using CRISPR base editors," Nature Communications, Nature, vol. 14(1), pages 1-10, December.
    3. Andreas L. Gimpel & Wendelin J. Stark & Reinhard Heckel & Robert N. Grass, 2023. "A digital twin for DNA data storage based on comprehensive quantification of errors and biases," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
