Printed from https://ideas.repec.org/a/nat/natcom/v13y2022i1d10.1038_s41467-022-33046-w.html

Robust data storage in DNA by de Bruijn graph-based de novo strand assembly

Author

Listed:
  • Lifu Song

    (Tianjin University)

  • Feng Geng

    (Binzhou Medical University)

  • Zi-Yi Gong

    (Tianjin University)

  • Xin Chen

    (Tianjin University)

  • Jijun Tang

    (Tianjin University
    Chinese Academy of Sciences)

  • Chunye Gong

    (National SuperComputer Center in Tianjin)

  • Libang Zhou

    (Nanjing Agricultural University)

  • Rui Xia

    (National SuperComputer Center in Tianjin)

  • Ming-Zhe Han

    (Tianjin University)

  • Jing-Yi Xu

    (Tianjin University)

  • Bing-Zhi Li

    (Tianjin University)

  • Ying-Jin Yuan

    (Tianjin University)

Abstract

DNA data storage is a rapidly developing technology with great potential due to its high density, long-term durability, and low maintenance cost. The major technical challenges are the various errors, such as strand breaks, rearrangements, and insertions/deletions (indels), that frequently arise during DNA synthesis, amplification, sequencing, and preservation. In this study, a de novo strand assembly algorithm (DBGPS) is developed using a de Bruijn graph and greedy path search to meet these challenges. DBGPS shows substantial advantages in handling DNA breaks, rearrangements, and indels. Its robustness is demonstrated through accelerated aging, multiple independent data retrievals, deep error-prone PCR, and large-scale simulations. Remarkably, 6.8 MB of data is accurately recovered from a severely corrupted sample that had been treated at 70 °C for 70 days. With DBGPS, we achieve a logical density of 1.30 bits/cycle and a physical density of 295 PB/g.
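To make the general idea in the abstract concrete, the toy Python sketch below builds a de Bruijn graph from k-mers of fragmented, noisy reads and then greedily walks it to reassemble one strand. It is not the authors' DBGPS implementation and does not reproduce its handling of rearrangements or indexing. The k-mer length, the minimum-coverage threshold used to discard likely-erroneous k-mers, the known seed prefix (standing in for something like a primer or index), the known target length, and the function names `build_kmer_counts` and `greedy_assemble` are all hypothetical assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_kmer_counts(reads, k):
    """Count every k-mer across all reads. Each distinct k-mer is an edge
    from its (k-1)-mer prefix node to its (k-1)-mer suffix node."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_assemble(reads, seed, length, k=5, min_count=2):
    """Hypothetical greedy path search: starting from a known seed, extend
    one base at a time along the best-supported outgoing edge.
    Returns None if the walk hits a dead end before reaching `length`."""
    if len(seed) < k - 1:
        raise ValueError("seed must contain at least k-1 bases")
    counts = build_kmer_counts(reads, k)
    # Keep only k-mers seen at least min_count times (assumed error filter),
    # indexed by their (k-1)-mer prefix node.
    out_edges = defaultdict(list)
    for kmer, c in counts.items():
        if c >= min_count:
            out_edges[kmer[:-1]].append((kmer[-1], c))
    strand = seed
    while len(strand) < length:
        node = strand[-(k - 1):]
        candidates = out_edges.get(node)  # .get does not create empty entries
        if not candidates:
            return None  # dead end: no sufficiently supported continuation
        # Greedy choice: highest coverage first, then alphabetically smallest base.
        base, _ = min(candidates, key=lambda bc: (-bc[1], bc[0]))
        strand += base
    return strand


# Usage: overlapping fragments (simulated strand breaks) plus one read
# carrying a single substitution error.
true_strand = "ACGTTGCATGCCATAGGCTA"
reads = [
    true_strand[0:12],
    true_strand[4:16],
    true_strand[8:20],
    true_strand,
    true_strand[:10] + "T" + true_strand[11:],  # substitution at position 10
]
recovered = greedy_assemble(reads, seed="ACGT", length=len(true_strand))
print(recovered == true_strand)  # True
```

In this toy case the k-mers introduced by the erroneous read are seen only once, so the coverage threshold drops them, and the overlapping fragments still provide a complete, well-supported path through the graph. A purely greedy walk can be misled when repeated (k-1)-mers create genuine branches, so a practical assembler would need additional safeguards beyond this sketch.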

Suggested Citation

  • Lifu Song & Feng Geng & Zi-Yi Gong & Xin Chen & Jijun Tang & Chunye Gong & Libang Zhou & Rui Xia & Ming-Zhe Han & Jing-Yi Xu & Bing-Zhi Li & Ying-Jin Yuan, 2022. "Robust data storage in DNA by de Bruijn graph-based de novo strand assembly," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-33046-w
    DOI: 10.1038/s41467-022-33046-w

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-33046-w
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Philipp L. Antkowiak & Jory Lietard & Mohammad Zalbagi Darestani & Mark M. Somoza & Wendelin J. Stark & Reinhard Heckel & Robert N. Grass, 2020. "Low cost DNA data storage using photolithographic synthesis and advanced information reconstruction and error correction," Nature Communications, Nature, vol. 11(1), pages 1-10, December.
    2. Kevin N. Lin & Kevin Volkel & James M. Tuck & Albert J. Keung, 2020. "Dynamic and scalable DNA-based information storage," Nature Communications, Nature, vol. 11(1), pages 1-12, December.
    3. Tom van der Valk & Patrícia Pečnerová & David Díez-del-Molino & Anders Bergström & Jonas Oppenheimer & Stefanie Hartmann & Georgios Xenikoudakis & Jessica A. Thomas & Marianne Dehasque & Ekin Sağlıcan, 2021. "Million-year-old DNA sheds light on the genomic history of mammoths," Nature, Nature, vol. 591(7849), pages 265-269, March.
    4. Howon Lee & Daniel J. Wiegand & Kettner Griswold & Sukanya Punthambaker & Honggu Chun & Richie E. Kohman & George M. Church, 2020. "Photon-directed multiplexed enzymatic DNA synthesis for molecular digital data storage," Nature Communications, Nature, vol. 11(1), pages 1-9, December.
    5. S. Kasra Tabatabaei & Boya Wang & Nagendra Bala Murali Athreya & Behnam Enghiad & Alvaro Gonzalo Hernandez & Christopher J. Fields & Jean-Pierre Leburton & David Soloveichik & Huimin Zhao & Olgica Mil, 2020. "DNA punch cards for storing data on native DNA sequences via enzymatic nicking," Nature Communications, Nature, vol. 11(1), pages 1-10, December.
    6. Karishma Matange & James M. Tuck & Albert J. Keung, 2021. "DNA stability: a central design consideration for DNA data storage systems," Nature Communications, Nature, vol. 12(1), pages 1-9, December.
    7. Henry H. Lee & Reza Kalhor & Naveen Goela & Jean Bolot & George M. Church, 2019. "Terminator-free template-independent enzymatic DNA synthesis for digital information storage," Nature Communications, Nature, vol. 10(1), pages 1-12, December.
    8. Yuan-Jyue Chen & Christopher N. Takahashi & Lee Organick & Callista Bee & Siena Dumas Ang & Patrick Weiss & Bill Peck & Georg Seelig & Luis Ceze & Karin Strauss, 2020. "Quantifying molecular bias in DNA data storage," Nature Communications, Nature, vol. 11(1), pages 1-9, December.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Afsaneh Sadremomtaz & Robert F. Glass & Jorge Eduardo Guerrero & Dennis R. LaJeunesse & Eric A. Josephs & Reza Zadegan, 2023. "Digital data storage on DNA tape using CRISPR base editors," Nature Communications, Nature, vol. 14(1), pages 1-10, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cheng Kai Lim & Jing Wui Yeoh & Aurelius Andrew Kunartama & Wen Shan Yew & Chueh Loo Poh, 2023. "A biological camera that captures and stores images directly into DNA," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    2. Afsaneh Sadremomtaz & Robert F. Glass & Jorge Eduardo Guerrero & Dennis R. LaJeunesse & Eric A. Josephs & Reza Zadegan, 2023. "Digital data storage on DNA tape using CRISPR base editors," Nature Communications, Nature, vol. 14(1), pages 1-10, December.
    3. Andreas L. Gimpel & Wendelin J. Stark & Reinhard Heckel & Robert N. Grass, 2023. "A digital twin for DNA data storage based on comprehensive quantification of errors and biases," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    4. Chao Pan & S. Kasra Tabatabaei & S. M. Hossein Tabatabaei Yazdi & Alvaro G. Hernandez & Charles M. Schroeder & Olgica Milenkovic, 2022. "Rewritable two-dimensional DNA-based data storage with machine learning reconstruction," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    5. Nicholas C. Tang & Jonathan C. Su & Yulia Shmidov & Garrett Kelly & Sonal Deshpande & Parul Sirohi & Nikhil Peterson & Ashutosh Chilkoti, 2024. "Synthetic intrinsically disordered protein fusion tags that enhance protein solubility," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    6. Jiongyu Zhang & Chengyu Hou & Changchun Liu, 2024. "CRISPR-powered quantitative keyword search engine in DNA data storage," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    7. Adam Kuzdraliński & Marek Miśkiewicz & Hubert Szczerba & Wojciech Mazurczyk & Jeff Nivala & Bogdan Księżopolski, 2023. "Unlocking the potential of DNA-based tagging: current market solutions and expanding horizons," Nature Communications, Nature, vol. 14(1), pages 1-7, December.
    8. Marius Welzel & Peter Michael Schwarz & Hannah F. Löchel & Tolganay Kabdullayeva & Sandra Clemens & Anke Becker & Bernd Freisleben & Dominik Heider, 2023. "DNA-Aeon provides flexible arithmetic coding for constraint adherence and error correction in DNA storage," Nature Communications, Nature, vol. 14(1), pages 1-10, December.
    9. Hinako Kawabe & Christopher A. Thomas & Shuichi Hoshika & Myong-Jung Kim & Myong-Sang Kim & Logan Miessner & Nicholas Kaplan & Jonathan M. Craig & Jens H. Gundlach & Andrew H. Laszlo & Steven A. Benne, 2023. "Enzymatic synthesis and nanopore sequencing of 12-letter supernumerary DNA," Nature Communications, Nature, vol. 14(1), pages 1-16, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.