Utility of long-read sequencing for All of Us

Authors
  • M. Mahmoud

    (Human Genome Sequencing Center, Baylor College of Medicine
    Baylor College of Medicine)

  • Y. Huang

    (Broad Institute of MIT and Harvard)

  • K. Garimella

    (Broad Institute of MIT and Harvard)

  • P. A. Audano

    (The Jackson Laboratory for Genomic Medicine)

  • W. Wan

    (Broad Institute of MIT and Harvard)

  • N. Prasad

    (Discovery Life Sciences)

  • R. E. Handsaker

    (Harvard Medical School
    Broad Institute of MIT and Harvard)

  • S. Hall

    (Discovery Life Sciences)

  • A. Pionzio

    (Discovery Life Sciences)

  • M. C. Schatz

    (Johns Hopkins University)

  • M. E. Talkowski

    (Broad Institute of MIT and Harvard
    Massachusetts General Hospital)

  • E. E. Eichler

    (University of Washington School of Medicine
    University of Washington)

  • S. E. Levy

    (HudsonAlpha Institute for Biotechnology)

  • F. J. Sedlazeck

    (Human Genome Sequencing Center, Baylor College of Medicine
    Baylor College of Medicine
    Rice University)

Abstract

The All of Us (AoU) initiative aims to sequence the genomes of over one million Americans from diverse ethnic backgrounds to improve personalized medical care. In a recent technical pilot, we compare the performance of traditional short-read sequencing with long-read sequencing in a small cohort of samples from the HapMap project and two AoU control samples representing eight datasets. Our analysis reveals substantial differences in the ability of these technologies to accurately sequence complex medically relevant genes, particularly in terms of gene coverage and pathogenic variant identification. We also consider the advantages and challenges of using low-coverage sequencing to increase sample numbers in large cohort analyses. Our results show that HiFi reads produce the most accurate results for both small and large variants. Further, we present a cloud-based pipeline to optimize SNV, indel and SV calling at scale for long-read analysis. These results lead to widespread improvements across AoU.

Suggested Citation

  • M. Mahmoud & Y. Huang & K. Garimella & P. A. Audano & W. Wan & N. Prasad & R. E. Handsaker & S. Hall & A. Pionzio & M. C. Schatz & M. E. Talkowski & E. E. Eichler & S. E. Levy & F. J. Sedlazeck, 2024. "Utility of long-read sequencing for All of Us," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
  • Handle: RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-44804-3
    DOI: 10.1038/s41467-024-44804-3

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-024-44804-3
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Peter Edge & Vikas Bansal, 2019. "Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing," Nature Communications, Nature, vol. 10(1), pages 1-10, December.
    2. Daniel C. Jeffares & Clemency Jolly & Mimoza Hoti & Doug Speed & Liam Shaw & Charalampos Rallis & Francois Balloux & Christophe Dessimoz & Jürg Bähler & Fritz J. Sedlazeck, 2017. "Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast," Nature Communications, Nature, vol. 8(1), pages 1-11, April.
    3. Sara Reardon, 2015. "Giant study poses DNA data-sharing dilemma," Nature, Nature, vol. 525(7567), pages 16-17, September.
    4. Yunxi Liu & Joshua Kearney & Medhat Mahmoud & Bryce Kille & Fritz J. Sedlazeck & Todd J. Treangen, 2022. "Rescuing low frequency variants within intra-host viral populations directly from Oxford Nanopore sequencing data," Nature Communications, Nature, vol. 13(1), pages 1-9, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Can Luo & Yichen Henry Liu & Xin Maizie Zhou, 2024. "VolcanoSV enables accurate and robust structural variant calling in diploid genomes from single-molecule long read sequencing," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
    2. Heng Du & Lei Zhou & Zhen Liu & Yue Zhuo & Meilin Zhang & Qianqian Huang & Shiyu Lu & Kai Xing & Li Jiang & Jian-Feng Liu, 2024. "The 1000 Chinese Indigenous Pig Genomes Project provides insights into the genomic architecture of pigs," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    3. Yichen Henry Liu & Can Luo & Staunton G. Golding & Jacob B. Ioffe & Xin Maizie Zhou, 2024. "Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data," Nature Communications, Nature, vol. 15(1), pages 1-22, December.
    4. Cheng-Kai Shiau & Lina Lu & Rachel Kieser & Kazutaka Fukumura & Timothy Pan & Hsiao-Yun Lin & Jie Yang & Eric L. Tong & GaHyun Lee & Yuanqing Yan & Jason T. Huse & Ruli Gao, 2023. "High throughput single cell long-read sequencing analyses of same-cell genotypes and phenotypes in human tumors," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    5. Liyuan Zhou & Qiongzi Qiu & Qing Zhou & Jianwei Li & Mengqian Yu & Kezhen Li & Lingling Xu & Xiaohui Ke & Haiming Xu & Bingjian Lu & Hui Wang & Weiguo Lu & Pengyuan Liu & Yan Lu, 2022. "Long-read sequencing unveils high-resolution HPV integration and its oncogenic progression in cervical cancer," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    6. Qian Zhou & Fahu Ji & Dongxiao Lin & Xianming Liu & Zexuan Zhu & Jue Ruan, 2024. "KSNP: a fast de Bruijn graph-based haplotyping tool approaching data-in time cost," Nature Communications, Nature, vol. 15(1), pages 1-7, December.
    7. Jinhyun Kim & Sungsik Kim & Huiran Yeom & Seo Woo Song & Kyoungseob Shin & Sangwook Bae & Han Suk Ryu & Ji Young Kim & Ahyoun Choi & Sumin Lee & Taehoon Ryu & Yeongjae Choi & Hamin Kim & Okju Kim & Yu, 2023. "Barcoded multiple displacement amplification for high coverage sequencing in spatial genomics," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    8. Anna Zimmermann & Julian E. Prieto-Vivas & Charlotte Cautereels & Anton Gorkovskiy & Jan Steensels & Yves Peer & Kevin J. Verstrepen, 2023. "A Cas3-base editing tool for targetable in vivo mutagenesis," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    9. Yingyan Yu & Zhen Zhang & Xiaorui Dong & Ruixin Yang & Zhongqu Duan & Zhen Xiang & Jun Li & Guichao Li & Fazhe Yan & Hongzhang Xue & Du Jiao & Jinyuan Lu & Huimin Lu & Wenmin Zhang & Yangzhen Wei & Sh, 2022. "Pangenomic analysis of Chinese gastric cancer," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    10. Xiaoling Tong & Min-Jin Han & Kunpeng Lu & Shuaishuai Tai & Shubo Liang & Yucheng Liu & Hai Hu & Jianghong Shen & Anxing Long & Chengyu Zhan & Xin Ding & Shuo Liu & Qiang Gao & Bili Zhang & Linli Zhou, 2022. "High-resolution silkworm pan-genome provides genetic insights into artificial selection and ecological adaptation," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    11. Zeyu Zheng & Mingjia Zhu & Jin Zhang & Xinfeng Liu & Liqiang Hou & Wenyu Liu & Shuai Yuan & Changhong Luo & Xinhao Yao & Jianquan Liu & Yongzhi Yang, 2024. "A sequence-aware merger of genomic structural variations at population scale," Nature Communications, Nature, vol. 15(1), pages 1-9, December.
    12. Kevin A. Kovalchik & David J. Hamelin & Peter Kubiniok & Benoîte Bourdin & Fatima Mostefai & Raphaël Poujol & Bastien Paré & Shawn M. Simpson & John Sidney & Éric Bonneil & Mathieu Courcelles & Sunil , 2024. "Machine learning-enhanced immunopeptidomics applied to T-cell epitope discovery for COVID-19 vaccines," Nature Communications, Nature, vol. 15(1), pages 1-22, December.
    13. Huiying He & Yue Leng & Xinglan Cao & Yiwang Zhu & Xiaoxia Li & Qiaoling Yuan & Bin Zhang & Wenchuang He & Hua Wei & Xiangpei Liu & Qiang Xu & Mingliang Guo & Hong Zhang & Longbo Yang & Yang Lv & Xian, 2024. "The pan-tandem repeat map highlights multiallelic variants underlying gene expression and agronomic traits in rice," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    14. Cristian Groza & Xun Chen & Travis J. Wheeler & Guillaume Bourque & Clément Goubert, 2024. "A unified framework to analyze transposable element insertion polymorphisms using graph genomes," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    15. Tuomas Hämälä & Christopher Moore & Laura Cowan & Matthew Carlile & David Gopaulchan & Marie K. Brandrud & Siri Birkeland & Matthew Loose & Filip Kolář & Marcus A. Koch & Levi Yant, 2024. "Impact of whole-genome duplications on structural variant evolution in Cochlearia," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    16. Cristian Groza & Carl Schwendinger-Schreck & Warren A. Cheung & Emily G. Farrow & Isabelle Thiffault & Juniper Lake & William B. Rizzo & Gilad Evrony & Tom Curran & Guillaume Bourque & Tomi Pastinen, 2024. "Pangenome graphs improve the analysis of structural variants in rare genetic diseases," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    17. Charlotte Cautereels & Jolien Smets & Peter Bircham & Dries De Ruysscher & Anna Zimmermann & Peter De Rijk & Jan Steensels & Anton Gorkovskiy & Joleen Masschelein & Kevin J. Verstrepen, 2024. "Combinatorial optimization of gene expression through recombinase-mediated promoter and terminator shuffling in yeast," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    18. Sina Majidian & Mohammad Hossein Kahaei & Dick de Ridder, 2020. "Minimum error correction-based haplotype assembly: Considerations for long read data," PLOS ONE, Public Library of Science, vol. 15(6), pages 1-12, June.
    19. Fatemeh Mohebbi & Alex Zelikovsky & Serghei Mangul & Gerardo Chowell & Pavel Skums, 2024. "Early detection of emerging viral variants through analysis of community structure of coordinated substitution networks," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    20. Xuezhu Liao & Dejin Xie & Tingting Bao & Mengmeng Hou & Cheng Li & Bao Nie & Shichao Sun & Dan Peng & Haixiao Hu & Hongru Wang & Yongfu Tao & Yu Zhang & Wei Li & Li Wang, 2024. "Inversions encounter relaxed genetic constraints and balance birth and death of TPS genes in Curcuma," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
