
Towards complete and error-free genome assemblies of all vertebrate species

Author

Listed:
  • Arang Rhie

    (National Human Genome Research Institute, National Institutes of Health)

  • Shane A. McCarthy

    (University of Cambridge
    Wellcome Sanger Institute)

  • Olivier Fedrigo

    (The Rockefeller University)

  • Joana Damas

    (University of California Davis)

  • Giulio Formenti

    (The Rockefeller University)

  • Sergey Koren

    (National Human Genome Research Institute, National Institutes of Health)

  • Marcela Uliano-Silva

    (Department of Evolutionary Genetics
    Berlin Center for Genomics in Biodiversity Research)

  • William Chow

    (Wellcome Sanger Institute)

  • Arkarachai Fungtammasan

    (DNAnexus Inc.)

  • Juwan Kim

    (Seoul National University)

  • Chul Lee

    (Seoul National University)

  • Byung June Ko

    (Seoul National University)

  • Mark Chaisson

    (University of Southern California)

  • Gregory L. Gedman

    (The Rockefeller University)

  • Lindsey J. Cantin

    (The Rockefeller University)

  • Francoise Thibaud-Nissen

    (National Library of Medicine, NIH)

  • Leanne Haggerty

    (Wellcome Genome Campus)

  • Iliana Bista

    (University of Cambridge
    Wellcome Sanger Institute)

  • Michelle Smith

    (Wellcome Sanger Institute)

  • Bettina Haase

    (The Rockefeller University)

  • Jacquelyn Mountcastle

    (The Rockefeller University)

  • Sylke Winkler

    (Max Planck Institute of Molecular Cell Biology and Genetics
    DRESDEN-concept Genome Center)

  • Sadye Paez

    (The Rockefeller University)

  • Jason Howard

    (Novogene)

  • Sonja C. Vernes

    (Max Planck Institute for Psycholinguistics
    Cognition and Behaviour
    University of St Andrews)

  • Tanya M. Lama

    (University of Massachusetts Cooperative Fish and Wildlife Research Unit)

  • Frank Grutzner

    (University of Adelaide)

  • Wesley C. Warren

    (University of Missouri)

  • Christopher N. Balakrishnan

    (East Carolina University)

  • Dave Burt

    (University of Queensland)

  • Julia M. George

    (Clemson University)

  • Matthew T. Biegler

    (The Rockefeller University)

  • David Iorns

    (The Genetic Rescue Foundation)

  • Andrew Digby

    (Kākāpō Recovery, Department of Conservation)

  • Daryl Eason

    (Kākāpō Recovery, Department of Conservation)

  • Bruce Robertson

    (University of Otago)

  • Taylor Edwards

    (University of Arizona Genetics Core)

  • Mark Wilkinson

    (Natural History Museum)

  • George Turner

    (Bangor University)

  • Axel Meyer

    (University of Konstanz)

  • Andreas F. Kautt

    (University of Konstanz
    Harvard University)

  • Paolo Franchini

    (University of Konstanz)

  • H. William Detrich

    (Northeastern University Marine Science Center)

  • Hannes Svardal

    (University of Antwerp
    Naturalis Biodiversity Center)

  • Maximilian Wagner

    (Karl-Franzens University of Graz)

  • Gavin J. P. Naylor

    (University of Florida)

  • Martin Pippel

    (Max Planck Institute of Molecular Cell Biology and Genetics
    Center for Systems Biology)

  • Milan Malinsky

    (Wellcome Sanger Institute
    University of Basel)

  • Mark Mooney

    (Tag.bio)

  • Maria Simbirsky

    (DNAnexus Inc.)

  • Brett T. Hannigan

    (DNAnexus Inc.)

  • Trevor Pesout

    (University of California)

  • Marlys Houck

    (San Diego Zoo Global)

  • Ann Misuraca

    (San Diego Zoo Global)

  • Sarah B. Kingan

    (Pacific Biosciences)

  • Richard Hall

    (Pacific Biosciences)

  • Zev Kronenberg

    (Pacific Biosciences)

  • Ivan Sović

    (Pacific Biosciences
    Digital BioLogic)

  • Christopher Dunn

    (Pacific Biosciences)

  • Zemin Ning

    (Wellcome Sanger Institute)

  • Alex Hastie

    (Bionano Genomics)

  • Joyce Lee

    (Bionano Genomics)

  • Siddarth Selvaraj

    (Arima Genomics)

  • Richard E. Green

    (University of California
    Dovetail Genomics)

  • Nicholas H. Putnam

    (Independent Researcher)

  • Ivo Gut

    (Barcelona Institute of Science and Technology
    Universitat Pompeu Fabra)

  • Jay Ghurye

    (Dovetail Genomics
    University of Maryland College Park)

  • Erik Garrison

    (University of California)

  • Ying Sims

    (Wellcome Sanger Institute)

  • Joanna Collins

    (Wellcome Sanger Institute)

  • Sarah Pelan

    (Wellcome Sanger Institute)

  • James Torrance

    (Wellcome Sanger Institute)

  • Alan Tracey

    (Wellcome Sanger Institute)

  • Jonathan Wood

    (Wellcome Sanger Institute)

  • Robel E. Dagnew

    (University of Southern California)

  • Dengfeng Guan

    (University of Cambridge
    Harbin Institute of Technology)

  • Sarah E. London

    (University of Chicago)

  • David F. Clayton

    (Clemson University)

  • Claudio V. Mello

    (Oregon Health and Science University)

  • Samantha R. Friedrich

    (Oregon Health and Science University)

  • Peter V. Lovell

    (Oregon Health and Science University)

  • Ekaterina Osipova

    (Max Planck Institute of Molecular Cell Biology and Genetics
    Center for Systems Biology
    Max Planck Institute for the Physics of Complex Systems)

  • Farooq O. Al-Ajli

    (Monash University Malaysia Genomics Facility, School of Science
    Monash University Malaysia
    Qatar Falcon Genome Project)

  • Simona Secomandi

    (University of Milan)

  • Heebal Kim

    (Seoul National University
    eGnome, Inc.)

  • Constantina Theofanopoulou

    (The Rockefeller University)

  • Michael Hiller

    (LOEWE Centre for Translational Biodiversity Genomics
    Senckenberg Research Institute
    Goethe-University, Faculty of Biosciences)

  • Yang Zhou

    (BGI-Shenzhen)

  • Robert S. Harris

    (Pennsylvania State University)

  • Kateryna D. Makova

    (Pennsylvania State University)

  • Paul Medvedev

    (Pennsylvania State University)

  • Jinna Hoffman

    (National Library of Medicine, NIH)

  • Patrick Masterson

    (National Library of Medicine, NIH)

  • Karen Clark

    (National Library of Medicine, NIH)

  • Fergal Martin

    (Wellcome Genome Campus)

  • Kevin Howe

    (Wellcome Genome Campus)

  • Paul Flicek

    (Wellcome Genome Campus)

  • Brian P. Walenz

    (National Human Genome Research Institute, National Institutes of Health)

  • Woori Kwak

    (eGnome, Inc.
    Hoonygen)

  • Hiram Clawson

    (University of California)

  • Mark Diekhans

    (University of California)

  • Luis Nassar

    (University of California)

  • Benedict Paten

    (University of California)

  • Robert H. S. Kraus

    (University of Konstanz
    Max Planck Institute of Animal Behavior)

  • Andrew J. Crawford

    (Universidad de los Andes)

  • M. Thomas P. Gilbert

    (University of Copenhagen
    University Museum, NTNU)

  • Guojie Zhang

    (China National Genebank, BGI-Shenzhen
    University of Copenhagen
    Kunming Institute of Zoology, Chinese Academy of Sciences
    Chinese Academy of Sciences)

  • Byrappa Venkatesh

    (Institute of Molecular and Cell Biology, A*STAR, Biopolis)

  • Robert W. Murphy

    (Centre for Biodiversity, Royal Ontario Museum)

  • Klaus-Peter Koepfli

    (Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park)

  • Beth Shapiro

    (University of California Santa Cruz
    Howard Hughes Medical Institute)

  • Warren E. Johnson

    (Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park
    Smithsonian Institution
    Walter Reed Army Institute of Research)

  • Federica Palma

    (University of East Anglia)

  • Tomas Marques-Bonet

    (Institute of Evolutionary Biology (UPF-CSIC), PRBB
    Catalan Institution of Research and Advanced Studies (ICREA)
    Barcelona Institute of Science and Technology (BIST)
    Universitat Autònoma de Barcelona)

  • Emma C. Teeling

    (University College Dublin)

  • Tandy Warnow

    (The University of Illinois at Urbana-Champaign)

  • Jennifer Marshall Graves

    (La Trobe University)

  • Oliver A. Ryder

    (San Diego Zoo Global
    University of California San Diego)

  • David Haussler

    (University of California
    University of California Santa Cruz)

  • Stephen J. O’Brien

    (ITMO University
    Nova Southeastern University)

  • Jonas Korlach

    (Pacific Biosciences)

  • Harris A. Lewin

    (University of California Davis)

  • Kerstin Howe

    (Wellcome Sanger Institute)

  • Eugene W. Myers

    (Max Planck Institute of Molecular Cell Biology and Genetics
    Center for Systems Biology
    Technical University Dresden)

  • Richard Durbin

    (University of Cambridge
    Wellcome Sanger Institute)

  • Adam M. Phillippy

    (National Human Genome Research Institute, National Institutes of Health)

  • Erich D. Jarvis

    (The Rockefeller University
    Howard Hughes Medical Institute)

Abstract

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

Suggested Citation

  • Arang Rhie & Shane A. McCarthy & Olivier Fedrigo & Joana Damas & Giulio Formenti & Sergey Koren & Marcela Uliano-Silva & William Chow & Arkarachai Fungtammasan & Juwan Kim & Chul Lee & Byung June Ko et al., 2021. "Towards complete and error-free genome assemblies of all vertebrate species," Nature, Nature, vol. 592(7856), pages 737-746, April.
  • Handle: RePEc:nat:nature:v:592:y:2021:i:7856:d:10.1038_s41586-021-03451-0
    DOI: 10.1038/s41586-021-03451-0

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41586-021-03451-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1038/s41586-021-03451-0?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Citations



    Cited by:

    1. Joanna Hård & Jeff E. Mold & Jesper Eisfeldt & Christian Tellgren-Roth & Susana Häggqvist & Ignas Bunikis & Orlando Contreras-Lopez & Chen-Shan Chin & Jessica Nordlund & Carl-Johan Rubin & Lars Feuk et al., 2023. "Long-read whole-genome analysis of human single cells," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    2. Sarah Morrison-Smith & Christina Boucher & Aleksandra Sarcevic & Noelle Noyes & Catherine O’Brien & Nazaret Cuadros & Jaime Ruiz, 2022. "Challenges in large-scale bioinformatics projects," Palgrave Communications, Palgrave Macmillan, vol. 9(1), pages 1-9, December.
    3. Heiner Kuhl & Kang Du & Manfred Schartl & Lukáš Kalous & Matthias Stöck & Dunja K. Lamatsch, 2022. "Equilibrated evolution of the mixed auto-/allopolyploid haplotype-resolved genome of the invasive hexaploid Prussian carp," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    4. Steen W. B. Bender & Marcus W. Dreisler & Min Zhang & Jacob Kæstel-Hansen & Nikos S. Hatzakis, 2024. "SEMORE: SEgmentation and MORphological fingErprinting by machine learning automates super-resolution data analysis," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    5. Max Lundberg & Alexander Mackintosh & Anna Petri & Staffan Bensch, 2023. "Inversions maintain differences between migratory phenotypes of a songbird," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    6. Ashley T. Sendell-Price & Frank J. Tulenko & Mats Pettersson & Du Kang & Margo Montandon & Sylke Winkler & Kathleen Kulb & Gavin P. Naylor & Adam Phillippy & Olivier Fedrigo & Jacquelyn Mountcastle et al., 2023. "Low mutation rate in epaulette sharks is consistent with a slow rate of evolution in sharks," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    7. Lewis Stevens & Isaac Martínez-Ugalde & Erna King & Martin Wagah & Dominic Absolon & Rowan Bancroft & Pablo Gonzalez de la Rosa & Jessica L. Hall & Manuela Kieninger & Agnieszka Kloch & Sarah Pelan et al., 2023. "Ancient diversity in host-parasite interaction genes in a model parasitic nematode," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    8. Zhen Huang & Ivanete De O. Furo & Jing Liu & Valentina Peona & Anderson J. B. Gomes & Wan Cen & Hao Huang & Yanding Zhang & Duo Chen & Ting Xue & Qiujin Zhang & Zhicao Yue & Quanxi Wang & Lingyu Yu et al., 2022. "Recurrent chromosome reshuffling and the evolution of neo-sex chromosomes in parrots," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    9. Matteo Sebastianelli & Sifiso M. Lukhele & Simona Secomandi & Stacey G. Souza & Bettina Haase & Michaella Moysi & Christos Nikiforou & Alexander Hutfluss & Jacquelyn Mountcastle & Jennifer Balacco et al., 2024. "A genomic basis of vocal rhythm in birds," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    10. Xiao Luo & Xiongbin Kang & Alexander Schönhuth, 2022. "VeChat: correcting errors in long reads using variation graphs," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    11. Iliana Bista & Jonathan M. D. Wood & Thomas Desvignes & Shane A. McCarthy & Michael Matschiner & Zemin Ning & Alan Tracey & James Torrance & Ying Sims & William Chow & Michelle Smith & Karen Oliver et al., 2023. "Genomics of cold adaptations in the Antarctic notothenioid fish radiation," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    12. Donna M. Bond & Oscar Ortega-Recalde & Melanie K. Laird & Takashi Hayakawa & Kyle S. Richardson & Finlay C. B. Reese & Bruce Kyle & Brooke E. McIsaac-Williams & Bruce C. Robertson & Yolanda Heezik et al., 2023. "The admixed brushtail possum genome reveals invasion history in New Zealand and novel imprinted genes," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    13. Alexander S. Leonard & Danang Crysnanto & Zih-Hua Fang & Michael P. Heaton & Brian L. Vander Ley & Carolina Herrera & Heinrich Bollwein & Derek M. Bickhart & Kristen L. Kuhn & Timothy P. L. Smith et al., 2022. "Structural variant-based pangenome construction has low sensitivity to variability of haplotype-resolved bovine assemblies," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
