Printed from https://ideas.repec.org/a/plo/pone00/0069503.html

Optimizing Information in Next-Generation-Sequencing (NGS) Reads for Improving De Novo Genome Assembly

Authors listed:
  • Tsunglin Liu
  • Cheng-Hung Tsai
  • Wen-Bin Lee
  • Jung-Hsien Chiang

Abstract

Next-Generation-Sequencing (NGS) is advantageous because of its much higher data throughput and much lower cost compared with the traditional Sanger method. However, NGS reads are shorter than Sanger reads, which makes de novo genome assembly very challenging. Because genome assembly is essential for all downstream biological studies, great efforts have been made to improve its completeness, which requires long reads or long-distance information. To improve de novo genome assembly, we develop a computational program, ARF-PE, that increases the length of Illumina reads. ARF-PE takes Illumina paired-end (PE) reads as input and recovers the original DNA fragments from whose two ends the paired reads were obtained. On the PE data of four bacteria, ARF-PE recovered >87% of the DNA fragments and achieved a perfect DNA fragment recovery rate of >98%. Using Velvet, SOAPdenovo, Newbler, and CABOG, we evaluated the benefit of the recovered DNA fragments to genome assembly. For all four bacteria, the recovered DNA fragments increased assembly contiguity. For example, the N50 lengths of the P. brasiliensis contigs assembled by SOAPdenovo and Newbler increased from 80,524 bp to 166,573 bp and from 80,655 bp to 193,388 bp, respectively. ARF-PE also increased assembly accuracy in many cases. On the PE data of two fungi and a human chromosome, ARF-PE doubled or tripled the N50 length; however, assembly accuracy dropped, although it remained >91%. In general, ARF-PE can increase both assembly contiguity and accuracy for bacterial genomes. For complex eukaryotic genomes, ARF-PE is promising because it raises assembly contiguity, but further error correction is needed for ARF-PE to also increase assembly accuracy. ARF-PE is freely available at http://140.116.235.124/~tliu/arf-pe/.
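The abstract relies on two ideas that a short example can make concrete: rebuilding a DNA fragment from the two ends of a paired-end read pair, and the N50 contig length used to measure contiguity. The sketch below is only an illustration under stated assumptions and is not ARF-PE's algorithm, which this abstract does not describe. It substitutes simple exact-overlap merging of a read pair, which only works when the fragment is shorter than the two reads combined, and it assumes error-free, uppercase reads with read 1 on the forward strand and read 2 on the reverse strand. The function names (`reverse_complement`, `merge_overlapping_pair`, `n50`) and the `min_overlap` parameter are hypothetical.

```python
from typing import Optional, Sequence

# Complement table for uppercase DNA (assumption: reads contain only A, C, G, T, N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_overlapping_pair(read1: str, read2: str, min_overlap: int = 10) -> Optional[str]:
    """Rebuild a fragment from a read pair whose ends overlap (hypothetical helper).

    Simplifying assumptions, not taken from ARF-PE: read1 is on the forward
    strand, read2 on the reverse strand, the reads are error-free (exact
    matching), and the fragment is shorter than len(read1) + len(read2).
    Returns None when no overlap of at least min_overlap bases exists.
    """
    mate = reverse_complement(read2)  # put read 2 on the same strand as read 1
    lowest = max(min_overlap, 1)      # guard: an overlap of 0 is not a merge
    # Try the longest exact overlap first, then progressively shorter ones.
    for k in range(min(len(read1), len(mate)), lowest - 1, -1):
        if read1[-k:] == mate[:k]:
            return read1 + mate[k:]
    return None


def n50(lengths: Sequence[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input
```

For example, `n50([2, 3, 4, 5, 6])` returns 5, because the two longest contigs (6 + 5 = 11) already cover at least half of the 20 bases. A merge built this way cannot recover fragments longer than the two reads combined, which is exactly the case where filling the gap between the mates from other reads, as the abstract describes ARF-PE doing in general, becomes necessary.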

Suggested Citation

  • Tsunglin Liu & Cheng-Hung Tsai & Wen-Bin Lee & Jung-Hsien Chiang, 2013. "Optimizing Information in Next-Generation-Sequencing (NGS) Reads for Improving De Novo Genome Assembly," PLOS ONE, Public Library of Science, vol. 8(7), pages 1-16, July.
  • Handle: RePEc:plo:pone00:0069503
    DOI: 10.1371/journal.pone.0069503

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0069503
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0069503&type=printable
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fernando Lopez-Rios & Barbara Angulo & Belen Gomez & Debbie Mair & Rebeca Martinez & Esther Conde & Felice Shieh & Jeffrey Vaks & Rachel Langland & H Jeffrey Lawrence & David Gonzalez de Castro, 2013. "Comparison of Testing Methods for the Detection of BRAF V600E Mutations in Malignant Melanoma: Pre-Approval Validation Study of the Companion Diagnostic Test for Vemurafenib," PLOS ONE, Public Library of Science, vol. 8(1), pages 1-7, January.
    2. David J H F Knapp & Rachel A McGovern & Art F Y Poon & Xiaoyin Zhong & Dennison Chan & Luke C Swenson & Winnie Dong & P Richard Harrigan, 2014. "“Deep” Sequencing Accuracy and Reproducibility Using Roche/454 Technology for Inferring Co-Receptor Usage in HIV-1," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-10, June.
    3. Chongqing Wen & Liyou Wu & Yujia Qin & Joy D Van Nostrand & Daliang Ning & Bo Sun & Kai Xue & Feifei Liu & Ye Deng & Yuting Liang & Jizhong Zhou, 2017. "Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform," PLOS ONE, Public Library of Science, vol. 12(4), pages 1-20, April.
    4. Jiang Du & Robert D Bjornson & Zhengdong D Zhang & Yong Kong & Michael Snyder & Mark B Gerstein, 2009. "Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants," PLOS Computational Biology, Public Library of Science, vol. 5(7), pages 1-15, July.
    5. Peri, Alessandro, 2020. "A hardware approach to value function iteration," Journal of Economic Dynamics and Control, Elsevier, vol. 114(C).
    6. Marvin Mundry & Erich Bornberg-Bauer & Michael Sammeth & Philine G D Feulner, 2012. "Evaluating Characteristics of De Novo Assembly Software on 454 Transcriptome Data: A Simulation Approach," PLOS ONE, Public Library of Science, vol. 7(2), pages 1-10, February.
    7. Ben Jia & Liming Xuan & Kaiye Cai & Zhiqiang Hu & Liangxiao Ma & Chaochun Wei, 2013. "NeSSM: A Next-Generation Sequencing Simulator for Metagenomics," PLOS ONE, Public Library of Science, vol. 8(10), pages 1-10, October.
    8. Yen-Chun Chen & Tsunglin Liu & Chun-Hui Yu & Tzen-Yuh Chiang & Chi-Chuan Hwang, 2013. "Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly," PLOS ONE, Public Library of Science, vol. 8(4), pages 1-20, April.
    9. Cirella, Giuseppe T. & Zerbe, Stefan, 2014. "Sustainable Water Management and Wetland Restoration Strategies in Northern China," MPRA Paper 120233, University Library of Munich, Germany.
    10. Yu-Tsueng Liu & Dennis A Carson, 2007. "A Novel Approach for Determining Cancer Genomic Breakpoints in the Presence of Normal DNA," PLOS ONE, Public Library of Science, vol. 2(4), pages 1-8, April.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.