IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1000432.html

Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants

Author

Listed:
  • Jiang Du
  • Robert D Bjornson
  • Zhengdong D Zhang
  • Yong Kong
  • Michael Snyder
  • Mark B Gerstein

Abstract

The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g., medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mapability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.

Author Summary: In recent years, the development of high-throughput sequencing and array technologies has enabled the accurate re-sequencing of individual genomes, especially in identifying and reconstructing the variants in an individual's genome compared to a “reference”. The costs and sensitivities of these technologies differ considerably from each other, and even more technologies are expected to appear in the near future. To both reduce the total cost of re-sequencing to an affordable point and adapt to these constantly evolving biotechnologies, we propose to build a computationally efficient simulation framework that can help us optimize the combination of different technologies to perform low-cost comparative genome re-sequencing, especially in reconstructing large structural variants, which is considered in many respects the most challenging step in genome re-sequencing. Our simulation results quantitatively show how much improvement one can gain in reconstructing large structural variants by integrating different technologies in optimal ways. We envision that in the future, more experimental technologies will be incorporated into this simulation framework and that its results can provide informative guidelines for the actual experimental design to achieve optimal genome re-sequencing output at low cost.
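The trade-off between read lengths described above can be illustrated with a toy calculation. The sketch below is not the authors' simulation toolbox: it only applies the standard Poisson (Lander-Waterman style) approximation to a fixed budget split between one long-read and one short-read technology. The technology names, read lengths, per-base costs, genome size, and the function `evaluate_split` are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical technologies: (read length in bases, cost per sequenced base).
# These numbers are illustrative assumptions, not values from the paper.
LONG = (400, 1e-5)
SHORT = (35, 1e-6)


def reads_for_budget(budget, tech):
    """Number of reads that a budget buys for a (length, cost_per_base) technology."""
    length, cost_per_base = tech
    return budget / (length * cost_per_base)


def evaluate_split(budget, frac_long, long_tech, short_tech, repeat_len, genome_len):
    """Return (expected covered fraction, P(repeat spanned)) for one budget split.

    Read start positions are assumed uniform and independent (Poisson model).
    A read spans a repeat if it covers it with at least one flanking base on
    each side, which leaves (read_len - repeat_len - 1) valid start positions.
    """
    coverage = 0.0        # total sequencing depth across both technologies
    spanning_rate = 0.0   # expected number of reads spanning the repeat
    for frac, tech in ((frac_long, long_tech), (1.0 - frac_long, short_tech)):
        n_reads = reads_for_budget(budget * frac, tech)
        read_len = tech[0]
        coverage += n_reads * read_len / genome_len
        window = max(0, read_len - repeat_len - 1)
        spanning_rate += n_reads * window / genome_len
    covered = 1.0 - math.exp(-coverage)
    spanned = 1.0 - math.exp(-spanning_rate)
    return covered, spanned


if __name__ == "__main__":
    # Sweep the share of the budget spent on long reads.
    for tenth in range(0, 11):
        frac = tenth / 10
        covered, spanned = evaluate_split(10.0, frac, LONG, SHORT, 100, 1_000_000)
        print(f"long share {frac:.1f}: covered={covered:.4f} spanned={spanned:.4f}")
```

Under these assumed prices, spending everything on short reads gives the deepest coverage but never spans a 100-base repeat, while spending everything on long reads spans the repeat more often at much lower depth. Intermediate splits trade one quantity against the other, which is the kind of question the paper's far more detailed simulations address.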

Suggested Citation

  • Jiang Du & Robert D Bjornson & Zhengdong D Zhang & Yong Kong & Michael Snyder & Mark B Gerstein, 2009. "Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants," PLOS Computational Biology, Public Library of Science, vol. 5(7), pages 1-15, July.
  • Handle: RePEc:plo:pcbi00:1000432
    DOI: 10.1371/journal.pcbi.1000432

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000432
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1000432&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kirk E Lohmueller & Anders Albrechtsen & Yingrui Li & Su Yeon Kim & Thorfinn Korneliussen & Nicolas Vinckenbosch & Geng Tian & Emilia Huerta-Sanchez & Alison F Feder & Niels Grarup & Torben Jørgensen , 2011. "Natural Selection Affects Multiple Aspects of Genetic Variation at Putatively Neutral Sites across the Human Genome," PLOS Genetics, Public Library of Science, vol. 7(10), pages 1-15, October.
    2. Fernando Lopez-Rios & Barbara Angulo & Belen Gomez & Debbie Mair & Rebeca Martinez & Esther Conde & Felice Shieh & Jeffrey Vaks & Rachel Langland & H Jeffrey Lawrence & David Gonzalez de Castro, 2013. "Comparison of Testing Methods for the Detection of BRAF V600E Mutations in Malignant Melanoma: Pre-Approval Validation Study of the Companion Diagnostic Test for Vemurafenib," PLOS ONE, Public Library of Science, vol. 8(1), pages 1-7, January.
    3. David J H F Knapp & Rachel A McGovern & Art F Y Poon & Xiaoyin Zhong & Dennison Chan & Luke C Swenson & Winnie Dong & P Richard Harrigan, 2014. "“Deep” Sequencing Accuracy and Reproducibility Using Roche/454 Technology for Inferring Co-Receptor Usage in HIV-1," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-10, June.
    4. Chongqing Wen & Liyou Wu & Yujia Qin & Joy D Van Nostrand & Daliang Ning & Bo Sun & Kai Xue & Feifei Liu & Ye Deng & Yuting Liang & Jizhong Zhou, 2017. "Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform," PLOS ONE, Public Library of Science, vol. 12(4), pages 1-20, April.
    5. Richard Jiang & Prashant Singh & Fredrik Wrede & Andreas Hellander & Linda Petzold, 2022. "Identification of dynamic mass-action biochemical reaction networks using sparse Bayesian methods," PLOS Computational Biology, Public Library of Science, vol. 18(1), pages 1-21, January.
    6. Peri, Alessandro, 2020. "A hardware approach to value function iteration," Journal of Economic Dynamics and Control, Elsevier, vol. 114(C).
    7. Marvin Mundry & Erich Bornberg-Bauer & Michael Sammeth & Philine G D Feulner, 2012. "Evaluating Characteristics of De Novo Assembly Software on 454 Transcriptome Data: A Simulation Approach," PLOS ONE, Public Library of Science, vol. 7(2), pages 1-10, February.
    8. Ben Jia & Liming Xuan & Kaiye Cai & Zhiqiang Hu & Liangxiao Ma & Chaochun Wei, 2013. "NeSSM: A Next-Generation Sequencing Simulator for Metagenomics," PLOS ONE, Public Library of Science, vol. 8(10), pages 1-10, October.
    9. Tsunglin Liu & Cheng-Hung Tsai & Wen-Bin Lee & Jung-Hsien Chiang, 2013. "Optimizing Information in Next-Generation-Sequencing (NGS) Reads for Improving De Novo Genome Assembly," PLOS ONE, Public Library of Science, vol. 8(7), pages 1-16, July.
    10. Yen-Chun Chen & Tsunglin Liu & Chun-Hui Yu & Tzen-Yuh Chiang & Chi-Chuan Hwang, 2013. "Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly," PLOS ONE, Public Library of Science, vol. 8(4), pages 1-20, April.
    11. Cirella, Giuseppe T. & Zerbe, Stefan, 2014. "Sustainable Water Management and Wetland Restoration Strategies in Northern China," MPRA Paper 120233, University Library of Munich, Germany.
    12. Yu-Tsueng Liu & Dennis A Carson, 2007. "A Novel Approach for Determining Cancer Genomic Breakpoints in the Presence of Normal DNA," PLOS ONE, Public Library of Science, vol. 2(4), pages 1-8, April.
