
INSurVeyor: improving insertion calling from short read sequencing data

Author

Listed:
  • Ramesh Rajaby

    (Hong Kong Science Park
    A*STAR Genome Institute of Singapore)

  • Dong-Xu Liu

    (College of Informatics, Huazhong Agricultural University)

  • Chun Hang Au

    (Hong Kong Science Park)

  • Yuen-Ting Cheung

    (Hong Kong Science Park)

  • Amy Yuet Ting Lau

    (Hong Kong Science Park)

  • Qing-Yong Yang

    (College of Informatics, Huazhong Agricultural University)

  • Wing-Kin Sung

    (Hong Kong Science Park
    A*STAR Genome Institute of Singapore
    College of Informatics, Huazhong Agricultural University)

Abstract

Insertions are one of the major types of structural variation and are defined as the addition of 50 or more nucleotides into a DNA sequence. Several methods exist to detect insertions from next-generation sequencing short-read data, but they generally have low sensitivity. Our contribution is two-fold. First, we introduce INSurVeyor, a fast, sensitive and precise method that detects insertions from next-generation sequencing paired-end data. Using publicly available benchmark datasets (both human and non-human), we show that INSurVeyor is not only more sensitive than any individual caller we tested, but also more sensitive than all of them combined. Furthermore, for most types of insertions, INSurVeyor is almost as sensitive as long-read callers. Second, we provide state-of-the-art catalogues of insertions for 1047 Arabidopsis thaliana genomes from the 1001 Genomes Project and 3202 human genomes from the 1000 Genomes Project, both generated with INSurVeyor. We show that they are more complete and precise than existing resources, and that existing methods miss important insertions.

Suggested Citation

  • Ramesh Rajaby & Dong-Xu Liu & Chun Hang Au & Yuen-Ting Cheung & Amy Yuet Ting Lau & Qing-Yong Yang & Wing-Kin Sung, 2023. "INSurVeyor: improving insertion calling from short read sequencing data," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-38870-2
    DOI: 10.1038/s41467-023-38870-2

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-38870-2
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ludovica Montanucci & David Lewis-Smith & Ryan L. Collins & Lisa-Marie Niestroj & Shridhar Parthasarathy & Julie Xian & Shiva Ganesan & Marie Macnee & Tobias Brünger & Rhys H. Thomas & Michael Talkows, 2023. "Genome-wide identification and phenotypic characterization of seizure-associated copy number variations in 741,075 individuals," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    2. Zhikun Wu & Zehang Jiang & Tong Li & Chuanbo Xie & Liansheng Zhao & Jiaqi Yang & Shuai Ouyang & Yizhi Liu & Tao Li & Zhi Xie, 2021. "Structural variants in the Chinese population and their impact on phenotypes, diseases and population adaptation," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
    3. Xiaoling Tong & Min-Jin Han & Kunpeng Lu & Shuaishuai Tai & Shubo Liang & Yucheng Liu & Hai Hu & Jianghong Shen & Anxing Long & Chengyu Zhan & Xin Ding & Shuo Liu & Qiang Gao & Bili Zhang & Linli Zhou, 2022. "High-resolution silkworm pan-genome provides genetic insights into artificial selection and ecological adaptation," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    4. Xue Gao & Sheng Wang & Yan-Fen Wang & Shuang Li & Shi-Xin Wu & Rong-Ge Yan & Yi-Wen Zhang & Rui-Dong Wan & Zhen He & Ren-De Song & Xin-Quan Zhao & Dong-Dong Wu & Qi-En Yang, 2022. "Long read genome assemblies complemented by single cell RNA-sequencing reveal genetic and cellular mechanisms underlying the adaptive evolution of yak," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    5. Yirong Shi & Yiwei Niu & Peng Zhang & Huaxia Luo & Shuai Liu & Sijia Zhang & Jiajia Wang & Yanyan Li & Xinyue Liu & Tingrui Song & Tao Xu & Shunmin He, 2023. "Characterization of genome-wide STR variation in 6487 human genomes," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    6. Arthur S. Lee & Lauren J. Ayers & Michael Kosicki & Wai-Man Chan & Lydia N. Fozo & Brandon M. Pratt & Thomas E. Collins & Boxun Zhao & Matthew F. Rose & Alba Sanchis-Juan & Jack M. Fu & Isaac Wong & X, 2024. "A cell type-aware framework for nominating non-coding variants in Mendelian regulatory disorders," Nature Communications, Nature, vol. 15(1), pages 1-26, December.
    7. Yichen Henry Liu & Can Luo & Staunton G. Golding & Jacob B. Ioffe & Xin Maizie Zhou, 2024. "Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data," Nature Communications, Nature, vol. 15(1), pages 1-22, December.
    8. Yu Chen & Amy Y. Wang & Courtney A. Barkley & Yixin Zhang & Xinyang Zhao & Min Gao & Mick D. Edmonds & Zechen Chong, 2023. "Deciphering the exact breakpoints of structural variations using long sequencing reads with DeBreak," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    9. Orshay Gabay & Yoav Shoshan & Eli Kopel & Udi Ben-Zvi & Tomer D. Mann & Noam Bressler & Roni Cohen‐Fultheim & Amos A. Schaffer & Shalom Hillel Roth & Ziv Tzur & Erez Y. Levanon & Eli Eisenberg, 2022. "Landscape of adenosine-to-inosine RNA recoding across human tissues," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    10. Yoshitaka Sakamoto & Shuhei Miyake & Miho Oka & Akinori Kanai & Yosuke Kawai & Satoi Nagasawa & Yuichi Shiraishi & Katsushi Tokunaga & Takashi Kohno & Masahide Seki & Yutaka Suzuki & Ayako Suzuki, 2022. "Phasing analysis of lung cancer genomes using a long read sequencer," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    11. Cristian Groza & Xun Chen & Travis J. Wheeler & Guillaume Bourque & Clément Goubert, 2024. "A unified framework to analyze transposable element insertion polymorphisms using graph genomes," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    12. Sook Wah Yee & Luis Ferrández-Peral & Pol Alentorn-Moron & Claudia Fontsere & Merve Ceylan & Megan L. Koleske & Niklas Handin & Virginia M. Artegoitia & Giovanni Lara & Huan-Chieh Chien & Xujia Zhou &, 2024. "Illuminating the function of the orphan transporter, SLC22A10, in humans and other primates," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    13. Zhuoran Xu & Quan Li & Luigi Marchionni & Kai Wang, 2023. "PhenoSV: interpretable phenotype-aware model for the prioritization of genes affected by structural variants," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    14. Cristian Groza & Carl Schwendinger-Schreck & Warren A. Cheung & Emily G. Farrow & Isabelle Thiffault & Juniper Lake & William B. Rizzo & Gilad Evrony & Tom Curran & Guillaume Bourque & Tomi Pastinen, 2024. "Pangenome graphs improve the analysis of structural variants in rare genetic diseases," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    15. Qiliang Ding & Matthew M. Edwards & Ning Wang & Xiang Zhu & Alexa N. Bracci & Michelle L. Hulke & Ya Hu & Yao Tong & Joyce Hsiao & Christine J. Charvet & Sulagna Ghosh & Robert E. Handsaker & Kevin Eg, 2021. "The genetic architecture of DNA replication timing in human pluripotent stem cells," Nature Communications, Nature, vol. 12(1), pages 1-18, December.
    16. Tingting Gong & Vanessa M Hayes & Eva K F Chan, 2020. "Shiny-SoSV: A web-based performance calculator for somatic structural variant detection," PLOS ONE, Public Library of Science, vol. 15(8), pages 1-20, August.
    17. Parithi Balachandran & Isha A. Walawalkar & Jacob I. Flores & Jacob N. Dayton & Peter A. Audano & Christine R. Beck, 2022. "Transposable element-mediated rearrangements are prevalent in human genomes," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    18. Wenmin Sun & Dan Xiong & Jiamin Ouyang & Xueshan Xiao & Yi Jiang & Yingwei Wang & Shiqiang Li & Ziying Xie & Junwen Wang & Zhonghui Tang & Qingjiong Zhang, 2024. "Altered chromatin topologies caused by balanced chromosomal translocation lead to central iris hypoplasia," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    19. Sudha Sunil Rajderkar & Kitt Paraiso & Maria Luisa Amaral & Michael Kosicki & Laura E. Cook & Fabrice Darbellay & Cailyn H. Spurrell & Marco Osterwalder & Yiwen Zhu & Han Wu & Sarah Yasmeen Afzal & Ma, 2024. "Dynamic enhancer landscapes in human craniofacial development," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    20. Can Luo & Yichen Henry Liu & Xin Maizie Zhou, 2024. "VolcanoSV enables accurate and robust structural variant calling in diploid genomes from single-molecule long read sequencing," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
