
SurVIndel2: improving copy number variant calling from next-generation sequencing using hidden split reads

Author

Listed:
  • Ramesh Rajaby

    (The Chinese University of Hong Kong
    Hong Kong Science Park
    A*STAR Genome Institute of Singapore
    University of Tokyo)

  • Wing-Kin Sung

    (The Chinese University of Hong Kong
    Hong Kong Science Park
    A*STAR Genome Institute of Singapore
    The Chinese University of Hong Kong)

Abstract

Deletions and tandem duplications (commonly called CNVs) represent the majority of structural variations in a human genome. They can be identified using short reads, but because they frequently occur in repetitive regions, existing methods fail to detect most of them. This is because CNVs in repetitive regions often do not produce the evidence needed by existing short-read-based callers (split reads, discordant pairs or read depth change). Here, we introduce a new short-read-based CNV caller named SurVIndel2. SurVIndel2 builds on statistical techniques we previously developed, but also employs a novel type of evidence, hidden split reads, that can uncover many CNVs missed by existing algorithms. We use public benchmarks to show that SurVIndel2 outperforms other popular callers on both human and non-human datasets. Then, we demonstrate the practical utility of the method by generating a catalogue of CNVs for the 1000 Genomes Project that contains hundreds of thousands of CNVs missing from the most recent public catalogue. We also show that SurVIndel2 is able to complement small indels predicted by Google DeepVariant, and that the two tools, used in tandem, produce a remarkably complete catalogue of variants in an individual. Finally, we characterise how the limitations of current sequencing technologies contribute significantly to the missing CNVs.

Suggested Citation

  • Ramesh Rajaby & Wing-Kin Sung, 2024. "SurVIndel2: improving copy number variant calling from next-generation sequencing using hidden split reads," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
  • Handle: RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-53087-7
    DOI: 10.1038/s41467-024-53087-7

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-024-53087-7
    File Function: Abstract
    Download Restriction: no

    File URL: https://libkey.io/10.1038/s41467-024-53087-7?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ramesh Rajaby & Dong-Xu Liu & Chun Hang Au & Yuen-Ting Cheung & Amy Yuet Ting Lau & Qing-Yong Yang & Wing-Kin Sung, 2023. "INSurVeyor: improving insertion calling from short read sequencing data," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    2. Zhikun Wu & Zehang Jiang & Tong Li & Chuanbo Xie & Liansheng Zhao & Jiaqi Yang & Shuai Ouyang & Yizhi Liu & Tao Li & Zhi Xie, 2021. "Structural variants in the Chinese population and their impact on phenotypes, diseases and population adaptation," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
    3. Joanna Hui Juan Tan & Zhihui Li & Mar Gonzalez Porta & Ramesh Rajaby & Weng Khong Lim & Ye An Tan & Rodrigo Toro Jimenez & Renyi Teo & Maxime Hebrard & Jack Ling Ow & Shimin Ang & Justin Jeyakani & Ya, 2024. "A Catalogue of Structural Variation across Ancestrally Diverse Asian Genomes," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    4. Ludovica Montanucci & David Lewis-Smith & Ryan L. Collins & Lisa-Marie Niestroj & Shridhar Parthasarathy & Julie Xian & Shiva Ganesan & Marie Macnee & Tobias Brünger & Rhys H. Thomas & Michael Talkows, 2023. "Genome-wide identification and phenotypic characterization of seizure-associated copy number variations in 741,075 individuals," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    5. Robin Aguilar & Conor K. Camplisson & Qiaoyi Lin & Karen H. Miga & William S. Noble & Brian J. Beliveau, 2024. "Tigerfish designs oligonucleotide-based in situ hybridization probes targeting intervals of highly repetitive DNA at the scale of genomes," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    6. Emily Kirby & Alexander Bernier & Roderic Guigó & Barbara Wold & Fabiana Arzuaga & Mayumi Kusunose & Ma’n Zawati & Bartha M. Knoppers, 2024. "Data sharing ethics toolkit: The Human Cell Atlas," Nature Communications, Nature, vol. 15(1), pages 1-7, December.
    7. Yoshitaka Sakamoto & Shuhei Miyake & Miho Oka & Akinori Kanai & Yosuke Kawai & Satoi Nagasawa & Yuichi Shiraishi & Katsushi Tokunaga & Takashi Kohno & Masahide Seki & Yutaka Suzuki & Ayako Suzuki, 2022. "Phasing analysis of lung cancer genomes using a long read sequencer," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    8. Xue Gao & Sheng Wang & Yan-Fen Wang & Shuang Li & Shi-Xin Wu & Rong-Ge Yan & Yi-Wen Zhang & Rui-Dong Wan & Zhen He & Ren-De Song & Xin-Quan Zhao & Dong-Dong Wu & Qi-En Yang, 2022. "Long read genome assemblies complemented by single cell RNA-sequencing reveal genetic and cellular mechanisms underlying the adaptive evolution of yak," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    9. Zhuoran Xu & Quan Li & Luigi Marchionni & Kai Wang, 2023. "PhenoSV: interpretable phenotype-aware model for the prioritization of genes affected by structural variants," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    10. Tingting Gong & Vanessa M Hayes & Eva K F Chan, 2020. "Shiny-SoSV: A web-based performance calculator for somatic structural variant detection," PLOS ONE, Public Library of Science, vol. 15(8), pages 1-20, August.
    11. Tobias T. Schmidt & Carly Tyer & Preeyesh Rughani & Candy Haggblom & Jeffrey R. Jones & Xiaoguang Dai & Kelly A. Frazer & Fred H. Gage & Sissel Juul & Scott Hickey & Jan Karlseder, 2024. "High resolution long-read telomere sequencing reveals dynamic mechanisms in aging and cancer," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    12. Wolfram Höps & Tobias Rausch & Michael Jendrusch & Jan O. Korbel & Fritz J. Sedlazeck, 2024. "Impact and characterization of serial structural variations across humans and great apes," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    13. Jinlong Shi & Zhilong Jia & Jinxiu Sun & Xiaoreng Wang & Xiaojing Zhao & Chenghui Zhao & Fan Liang & Xinyu Song & Jiawei Guan & Xue Jia & Jing Yang & Qi Chen & Kang Yu & Qian Jia & Jing Wu & Depeng Wa, 2023. "Structural variants involved in high-altitude adaptation detected using single-molecule long-read sequencing," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    14. Xiaoling Tong & Min-Jin Han & Kunpeng Lu & Shuaishuai Tai & Shubo Liang & Yucheng Liu & Hai Hu & Jianghong Shen & Anxing Long & Chengyu Zhan & Xin Ding & Shuo Liu & Qiang Gao & Bili Zhang & Linli Zhou, 2022. "High-resolution silkworm pan-genome provides genetic insights into artificial selection and ecological adaptation," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    15. Yilei Fu & Sergey Aganezov & Medhat Mahmoud & John Beaulaurier & Sissel Juul & Todd J. Treangen & Fritz J. Sedlazeck, 2024. "MethPhaser: methylation-based long-read haplotype phasing of human genomes," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    16. Orshay Gabay & Yoav Shoshan & Eli Kopel & Udi Ben-Zvi & Tomer D. Mann & Noam Bressler & Roni Cohen‐Fultheim & Amos A. Schaffer & Shalom Hillel Roth & Ziv Tzur & Erez Y. Levanon & Eli Eisenberg, 2022. "Landscape of adenosine-to-inosine RNA recoding across human tissues," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    17. Cristian Groza & Xun Chen & Travis J. Wheeler & Guillaume Bourque & Clément Goubert, 2024. "A unified framework to analyze transposable element insertion polymorphisms using graph genomes," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    18. Sook Wah Yee & Luis Ferrández-Peral & Pol Alentorn-Moron & Claudia Fontsere & Merve Ceylan & Megan L. Koleske & Niklas Handin & Virginia M. Artegoitia & Giovanni Lara & Huan-Chieh Chien & Xujia Zhou &, 2024. "Illuminating the function of the orphan transporter, SLC22A10, in humans and other primates," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    19. Cristian Groza & Carl Schwendinger-Schreck & Warren A. Cheung & Emily G. Farrow & Isabelle Thiffault & Juniper Lake & William B. Rizzo & Gilad Evrony & Tom Curran & Guillaume Bourque & Tomi Pastinen, 2024. "Pangenome graphs improve the analysis of structural variants in rare genetic diseases," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    20. Parithi Balachandran & Isha A. Walawalkar & Jacob I. Flores & Jacob N. Dayton & Peter A. Audano & Christine R. Beck, 2022. "Transposable element-mediated rearrangements are prevalent in human genomes," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
