Progressive Cactus is a multiple-genome aligner for the thousand-genome era

Author

Listed:
  • Joel Armstrong

    (UC Santa Cruz Genomics Institute, UC Santa Cruz)

  • Glenn Hickey

    (UC Santa Cruz Genomics Institute, UC Santa Cruz)

  • Mark Diekhans

    (UC Santa Cruz Genomics Institute, UC Santa Cruz)

  • Ian T. Fiddes

    (UC Santa Cruz Genomics Institute, UC Santa Cruz)

  • Adam M. Novak

    (UC Santa Cruz Genomics Institute, UC Santa Cruz)

  • Alden Deran

    (UC Santa Cruz Genomics Institute, UC Santa Cruz)

  • Qi Fang

    (BGI-Shenzhen, Beishan Industrial Zone
    University of Copenhagen)

  • Duo Xie

    (BGI-Shenzhen, Beishan Industrial Zone
    University of Chinese Academy of Sciences)

  • Shaohong Feng

    (BGI-Shenzhen, Beishan Industrial Zone
    Kunming Institute of Zoology, Chinese Academy of Sciences)

  • Josefin Stiller

    (University of Copenhagen)

  • Diane Genereux

    (Broad Institute of Harvard and Massachusetts Institute of Technology (MIT))

  • Jeremy Johnson

    (Broad Institute of Harvard and Massachusetts Institute of Technology (MIT))

  • Voichita Dana Marinescu

    (Uppsala University)

  • Jessica Alföldi

    (Broad Institute of Harvard and Massachusetts Institute of Technology (MIT))

  • Robert S. Harris

    (The Pennsylvania State University)

  • Kerstin Lindblad-Toh

    (Broad Institute of Harvard and Massachusetts Institute of Technology (MIT)
    Uppsala University)

  • David Haussler

    (UC Santa Cruz Genomics Institute, UC Santa Cruz
    Howard Hughes Medical Institute)

  • Elinor Karlsson

    (Broad Institute of Harvard and Massachusetts Institute of Technology (MIT)
    University of Massachusetts Medical School)

  • Erich D. Jarvis

    (Howard Hughes Medical Institute
    The Rockefeller University)

  • Guojie Zhang

    (University of Copenhagen
    Kunming Institute of Zoology, Chinese Academy of Sciences
    Chinese Academy of Sciences
    China National GeneBank, BGI-Shenzhen)

  • Benedict Paten

    (UC Santa Cruz Genomics Institute, UC Santa Cruz)

Abstract

New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies [1–3]. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database [4] increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies [5] are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus [6], a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.

Suggested Citation

  • Joel Armstrong & Glenn Hickey & Mark Diekhans & Ian T. Fiddes & Adam M. Novak & Alden Deran & Qi Fang & Duo Xie & Shaohong Feng & Josefin Stiller & Diane Genereux & Jeremy Johnson & Voichita Dana Mari, 2020. "Progressive Cactus is a multiple-genome aligner for the thousand-genome era," Nature, Nature, vol. 587(7833), pages 246-251, November.
  • Handle: RePEc:nat:nature:v:587:y:2020:i:7833:d:10.1038_s41586-020-2871-y
    DOI: 10.1038/s41586-020-2871-y

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41586-020-2871-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    Citations

    Cited by:

    1. Matthew I. M. Louder & Hannah Justen & Abigail A. Kimmitt & Koedi S. Lawley & Leslie M. Turner & J. David Dickman & Kira E. Delmore, 2024. "Gene regulation and speciation in a migratory divide between songbirds," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    2. Botong Zhou & Ping Hu & Guichun Liu & Zhou Chang & Zhiwei Dong & Zihe Li & Yuan Yin & Zunzhe Tian & Ge Han & Wen Wang & Xueyan Li, 2024. "Evolutionary patterns and functional effects of 3D chromatin structures in butterflies with extensive genome rearrangements," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    3. Cristian Groza & Carl Schwendinger-Schreck & Warren A. Cheung & Emily G. Farrow & Isabelle Thiffault & Juniper Lake & William B. Rizzo & Gilad Evrony & Tom Curran & Guillaume Bourque & Tomi Pastinen, 2024. "Pangenome graphs improve the analysis of structural variants in rare genetic diseases," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    4. Guangji Chen & Dan Yu & Yu Yang & Xiang Li & Xiaojing Wang & Danyang Sun & Yanlin Lu & Rongqin Ke & Guojie Zhang & Jie Cui & Shaohong Feng, 2024. "Adaptive expansion of ERVK solo-LTRs is associated with Passeriformes speciation events," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    5. Francesco Cicconardi & Edoardo Milanetti & Erika C. Pinheiro de Castro & Anyi Mazo-Vargas & Steven M. Van Belleghem & Angelo Alberto Ruggieri & Pasi Rastas & Joseph Hanly & Elizabeth Evans & Chris D. , 2023. "Evolutionary dynamics of genome size and content during the adaptive radiation of Heliconiini butterflies," Nature Communications, Nature, vol. 14(1), pages 1-24, December.
    6. Junhui Peng & Li Zhao, 2024. "The origin and structural evolution of de novo genes in Drosophila," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    7. Jin Woo Oh & Michael A. Beer, 2024. "Gapped-kmer sequence modeling robustly identifies regulatory vocabularies and distal enhancers conserved between evolutionarily distant mammals," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    8. A. Talenti & J. Powell & J. D. Hemmink & E. A. J. Cook & D. Wragg & S. Jayaraman & E. Paxton & C. Ezeasor & E. T. Obishakin & E. R. Agusi & A. Tijjani & W. Amanyire & D. Muhanguzi & K. Marshall & A. F, 2022. "A cattle graph genome incorporating global breed diversity," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    9. Alexander S. Leonard & Danang Crysnanto & Zih-Hua Fang & Michael P. Heaton & Brian L. Vander Ley & Carolina Herrera & Heinrich Bollwein & Derek M. Bickhart & Kristen L. Kuhn & Timothy P. L. Smith & Be, 2022. "Structural variant-based pangenome construction has low sensitivity to variability of haplotype-resolved bovine assemblies," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
