
Identification of errors in draft genome assemblies at single-nucleotide resolution for quality assessment and improvement

Author

Listed:
  • Kunpeng Li

    (the Chinese Academy of Sciences
    University of Chinese Academy of Sciences)

  • Peng Xu

    (the Chinese Academy of Sciences
    University of Chinese Academy of Sciences)

  • Jinpeng Wang

    (the Chinese Academy of Sciences
    University of Chinese Academy of Sciences)

  • Xin Yi

    (the Chinese Academy of Sciences
    China National Botanical Garden)

  • Yuannian Jiao

    (the Chinese Academy of Sciences
    University of Chinese Academy of Sciences
    China National Botanical Garden)

Abstract

Assembly of a high-quality genome is important for downstream comparative and functional genomic studies. However, most tools for genome assembly assessment only give qualitative reports, which do not pinpoint assembly errors at specific regions. Here, we develop a new reference-free tool, Clipping information for Revealing Assembly Quality (CRAQ), which maps raw reads back to assembled sequences to identify regional and structural assembly errors based on effective clipped alignment information. Error counts are transformed into corresponding assembly evaluation indexes to reflect the assembly quality at single-nucleotide resolution. Notably, CRAQ distinguishes assembly errors from heterozygous sites or structural differences between haplotypes. This tool can clearly indicate low-quality regions and potential structural error breakpoints; thus, it can identify misjoined regions that should be split for further scaffold building and improvement of the assembly. We have benchmarked CRAQ on multiple genomes assembled using different strategies, and demonstrated the misjoin correction for improving the constructed pseudomolecules.
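The clipped-alignment idea in the abstract can be illustrated with a small sketch. This is not CRAQ's code. It is a simplified illustration that assumes alignments are available as (1-based start position, CIGAR string) pairs on a single contig, as in SAM records. The function names and the thresholds `error_ratio` and `min_clipped` are hypothetical, chosen only for the example. The sketch collects positions where read alignments are clipped, then compares the number of clipped reads with the number of reads covering each position: a position where nearly all reads are clipped is treated as a candidate assembly error, and one where only a fraction are clipped as a possible heterozygous difference.

```python
import re
from collections import Counter

# CIGAR operations as defined in the SAM format.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = "MDN=X"  # operations that advance along the reference


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def ref_span(pos, cigar):
    """Return the 1-based inclusive (start, end) span the alignment covers."""
    length = sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)
    return pos, pos + length - 1


def clip_breakpoints(pos, cigar):
    """Return positions where the alignment ends in a soft or hard clip."""
    ops = parse_cigar(cigar)
    breakpoints = []
    if ops and ops[0][1] in "SH":
        # Leading clip: the break sits at the first aligned base.
        breakpoints.append(pos)
    if len(ops) > 1 and ops[-1][1] in "SH":
        # Trailing clip: the break sits at the last aligned base.
        breakpoints.append(ref_span(pos, cigar)[1])
    return breakpoints


def classify_positions(alignments, error_ratio=0.75, min_clipped=2):
    """Label clipped positions as 'error' or 'heterozygous'.

    error_ratio and min_clipped are hypothetical thresholds for this
    sketch, not values taken from the CRAQ paper.
    """
    alignments = list(alignments)
    clipped = Counter()
    for pos, cigar in alignments:
        clipped.update(clip_breakpoints(pos, cigar))

    spans = [ref_span(pos, cigar) for pos, cigar in alignments]
    calls = {}
    for bp, n_clipped in clipped.items():
        if n_clipped < min_clipped:
            continue  # a single clipped read is treated as noise
        depth = sum(1 for start, end in spans if start <= bp <= end)
        ratio = n_clipped / depth
        calls[bp] = "error" if ratio >= error_ratio else "heterozygous"
    return calls


# Toy alignments: (start position, CIGAR string).
reads = [
    (100, "50M"), (100, "50M20S"), (110, "40M15S"), (120, "30M10S"),
    (150, "10S50M"),
    (250, "100M"), (250, "100M"), (251, "50M30S"), (261, "40M30S"),
]
calls = classify_positions(reads)
```

In the toy data, three of the four reads covering position 149 are clipped there, while only two of the four reads covering position 300 are. With the assumed thresholds the first position is flagged as a candidate error and the second as a possible haplotype difference. A real tool would also need to handle multiple contigs, mapping quality, and sequencing-platform-specific noise, which this sketch ignores.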

Suggested Citation

  • Kunpeng Li & Peng Xu & Jinpeng Wang & Xin Yi & Yuannian Jiao, 2023. "Identification of errors in draft genome assemblies at single-nucleotide resolution for quality assessment and improvement," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-42336-w
    DOI: 10.1038/s41467-023-42336-w

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-42336-w
    File Function: Abstract
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhikun Wu & Zehang Jiang & Tong Li & Chuanbo Xie & Liansheng Zhao & Jiaqi Yang & Shuai Ouyang & Yizhi Liu & Tao Li & Zhi Xie, 2021. "Structural variants in the Chinese population and their impact on phenotypes, diseases and population adaptation," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
    2. Ana Paula Zotta Mota & Georgios D. Koutsovoulos & Laetitia Perfus-Barbeoch & Evelin Despot-Slade & Karine Labadie & Jean-Marc Aury & Karine Robbe-Sermesant & Marc Bailly-Bechet & Caroline Belser & Art, 2024. "Unzipped genome assemblies of polyploid root-knot nematodes reveal unusual and clade-specific telomeric repeats," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    3. Rubén Barcia-Cruz & David Goudenège & Jorge A. Moura de Sousa & Damien Piel & Martial Marbouty & Eduardo P. C. Rocha & Frédérique Roux, 2024. "Phage-inducible chromosomal minimalist islands (PICMIs), a novel family of small marine satellites of virulent phages," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    4. Temitayo A. Olagunju & Benjamin D. Rosen & Holly L. Neibergs & Gabrielle M. Becker & Kimberly M. Davenport & Christine G. Elsik & Tracy S. Hadfield & Sergey Koren & Kristen L. Kuhn & Arang Rhie & Kati, 2024. "Telomere-to-telomere assemblies of cattle and sheep Y-chromosomes uncover divergent structure and gene content," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    5. Mohamed Awad & Xiangchao Gan, 2023. "GALA: a computational framework for de novo chromosome-by-chromosome assembly with long reads," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    6. Sarah Morrison-Smith & Christina Boucher & Aleksandra Sarcevic & Noelle Noyes & Catherine O’Brien & Nazaret Cuadros & Jaime Ruiz, 2022. "Challenges in large-scale bioinformatics projects," Palgrave Communications, Palgrave Macmillan, vol. 9(1), pages 1-9, December.
    7. Jingfen Huang & Yilin Zhang & Yapeng Li & Meng Xing & Cailin Lei & Shizhuang Wang & Yamin Nie & Yanyan Wang & Mingchao Zhao & Zhenyun Han & Xianjun Sun & Han Zhou & Yan Wang & Xiaoming Zheng & Xiaoron, 2024. "Haplotype-resolved gapless genome and chromosome segment substitution lines facilitate gene identification in wild rice," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    8. Tobias T. Schmidt & Carly Tyer & Preeyesh Rughani & Candy Haggblom & Jeffrey R. Jones & Xiaoguang Dai & Kelly A. Frazer & Fred H. Gage & Sissel Juul & Scott Hickey & Jan Karlseder, 2024. "High resolution long-read telomere sequencing reveals dynamic mechanisms in aging and cancer," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    9. Zhe Zhao & Zhi Ding & Jingjing Huang & Hengjun Meng & Zixu Zhang & Xin Gou & Huiwu Tang & Xianrong Xie & Jingyao Ping & Fangming Xiao & Yao-Guang Liu & Yongyao Xie & Letian Chen, 2023. "Copy number variation of the restorer Rf4 underlies human selection of three-line hybrid rice breeding," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    10. Joanna Hård & Jeff E. Mold & Jesper Eisfeldt & Christian Tellgren-Roth & Susana Häggqvist & Ignas Bunikis & Orlando Contreras-Lopez & Chen-Shan Chin & Jessica Nordlund & Carl-Johan Rubin & Lars Feuk &, 2023. "Long-read whole-genome analysis of human single cells," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    11. Gabriel E. Rech & Santiago Radío & Sara Guirao-Rico & Laura Aguilera & Vivien Horvath & Llewellyn Green & Hannah Lindstadt & Véronique Jamilloux & Hadi Quesneville & Josefa González, 2022. "Population-scale long-read sequencing uncovers transposable elements associated with gene expression variation and adaptive signatures in Drosophila," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    12. Yaohua You & H. M. Suraj & Linda Matz & A. Lorena Herrera Valderrama & Paul Ruigrok & Xiaoqian Shi-Kunne & Frank P. J. Pieterse & Anne Oostlander & Henriek G. Beenen & Edgar A. Chavarro-Carrero & Si Q, 2024. "Botrytis cinerea combines four molecular strategies to tolerate membrane-permeating plant compounds and to increase virulence," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    13. Souren Paul & Mark H. Kaplan & Dinesh Khanna & Preston M. McCourt & Anjan K. Saha & Pei-Suen Tsou & Mahek Anand & Alexander Radecki & Mohamad Mourad & Amr H. Sawalha & David M. Markovitz & Rafael Cont, 2022. "Centromere defects, chromosome instability, and cGAS-STING activation in systemic sclerosis," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    14. Xiao Luo & Xiongbin Kang & Alexander Schönhuth, 2022. "VeChat: correcting errors in long reads using variation graphs," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
