IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1007732.html

PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph

Author

Listed:
  • Guillaume Gautreau
  • Adelme Bazin
  • Mathieu Gachet
  • Rémi Planel
  • Laura Burlot
  • Mathieu Dubois
  • Amandine Perrin
  • Claudine Médigue
  • Alexandra Calteau
  • Stéphane Cruveiller
  • Catherine Matias
  • Christophe Ambroise
  • Eduardo P C Rocha
  • David Vallenet

Abstract

The use of comparative genomics for functional, evolutionary, and epidemiological studies requires methods to classify gene families in terms of occurrence in a given species. These methods usually lack multivariate statistical models to infer the partitions and the optimal number of classes and don’t account for genome organization. We introduce a graph structure to model pangenomes in which nodes represent gene families and edges represent genomic neighborhood. Our method, named PPanGGOLiN, partitions nodes using an Expectation-Maximization algorithm based on a multivariate Bernoulli Mixture Model coupled with a Markov Random Field. This approach takes into account the topology of the graph and the presence/absence of genes in pangenomes to classify gene families into persistent, cloud, and one or several shell partitions. By analyzing the partitioned pangenome graphs of isolate genomes from 439 species and metagenome-assembled genomes from 78 species, we demonstrate that our method is effective in estimating the persistent genome. Interestingly, it shows that the shell genome is a key element to understand genome dynamics, presumably because it reflects how genes present at intermediate frequencies drive adaptation of species, and its proportion in genomes is independent of genome size. The graph-based approach proposed by PPanGGOLiN is useful to depict the overall genomic diversity of thousands of strains in a compact structure and provides an effective basis for very large scale comparative genomics. The software is freely available at https://github.com/labgem/PPanGGOLiN.

Author summary: Microorganisms have the greatest biodiversity and evolutionary history on earth. At the genomic level, this is reflected by a highly variable gene content, even among organisms of the same species, which explains the ability of microbes to be pathogenic or to grow in specific environments.
We developed a new method called PPanGGOLiN which accurately represents the genomic diversity of a species (i.e. its pangenome) using a compact graph structure. Based on this pangenome graph, we classify genes by a statistical method according to their occurrence in the genomes. This method allowed us to build pangenomes even for uncultivated species at an unprecedented scale. We applied our method to all available genomes in databanks in order to depict the overall diversity of hundreds of species. Overall, our work enables microbiologists to explore and visualize pangenomes like a subway map.
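
To make the two ideas in the abstract concrete, here is a minimal Python sketch, not the PPanGGOLiN implementation. It assumes each genome is given as an ordered list of gene-family identifiers, builds a graph whose nodes are families and whose edges link families that are adjacent in at least one genome, and then fits a plain multivariate Bernoulli mixture by Expectation-Maximization on the presence/absence matrix. The Markov Random Field coupling and the automatic choice of the number of partitions described in the abstract are deliberately left out, and the number of components is fixed at three. All function names, the deterministic initialization, and the toy genomes are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def build_pangenome_graph(genomes):
    """Build a toy pangenome graph from {genome name: ordered gene-family ids}."""
    nodes = defaultdict(set)  # family -> genomes carrying it
    edges = defaultdict(set)  # unordered family pair -> genomes where they are adjacent
    for name, families in genomes.items():
        for fam in families:
            nodes[fam].add(name)
        for a, b in zip(families, families[1:]):
            if a != b:
                edges[frozenset((a, b))].add(name)
    return dict(nodes), dict(edges)


def bernoulli_mixture_em(rows, k=3, n_iter=100, eps=1e-6):
    """Fit a k-component multivariate Bernoulli mixture with plain EM (no MRF term)."""
    n, d = len(rows), len(rows[0])
    # Deterministic start (an assumption): components spread from low to high frequency.
    theta = [[(j + 1) / (k + 1)] * d for j in range(k)]
    pi = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each gene family.
        resp = []
        for x in rows:
            logp = [
                math.log(pi[j])
                + sum(math.log(t if present else 1.0 - t)
                      for t, present in zip(theta[j], x))
                for j in range(k)
            ]
            top = max(logp)
            weights = [math.exp(v - top) for v in logp]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M-step: update mixing proportions and per-genome presence probabilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), eps)
            pi[j] = nj / n
            for g in range(d):
                num = sum(r[j] * x[g] for r, x in zip(resp, rows))
                theta[j][g] = min(max(num / nj, eps), 1.0 - eps)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, theta


def partition_families(genomes):
    """Label each family as persistent, shell or cloud from its presence/absence profile."""
    nodes, _ = build_pangenome_graph(genomes)
    names = sorted(genomes)
    families = sorted(nodes)
    rows = [[1 if g in nodes[f] else 0 for g in names] for f in families]
    labels, theta = bernoulli_mixture_em(rows, k=3)
    # Rank components by mean presence probability: lowest = cloud, highest = persistent.
    order = sorted(range(3), key=lambda j: sum(theta[j]))
    tag = {order[0]: "cloud", order[1]: "shell", order[2]: "persistent"}
    return {f: tag[c] for f, c in zip(families, labels)}


# Hypothetical toy data: A and B in every genome, S1 in half of them, C1-C3 in one each.
toy_genomes = {
    "g1": ["A", "B", "S1", "C1"],
    "g2": ["A", "B", "S1", "C2"],
    "g3": ["A", "B", "S1"],
    "g4": ["A", "B"],
    "g5": ["A", "B", "C3"],
    "g6": ["A", "B"],
}

if __name__ == "__main__":
    nodes, edges = build_pangenome_graph(toy_genomes)
    print(len(nodes), "families,", len(edges), "neighborhood edges")
    print(partition_families(toy_genomes))
```

In this sketch the graph is only used to derive the presence/absence profiles. A version closer to the method described above would also use the edges to smooth the partition of neighboring families, which is the role the abstract assigns to the Markov Random Field.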

Suggested Citation

  • Guillaume Gautreau & Adelme Bazin & Mathieu Gachet & Rémi Planel & Laura Burlot & Mathieu Dubois & Amandine Perrin & Claudine Médigue & Alexandra Calteau & Stéphane Cruveiller & Catherine Matias & Chr, 2020. "PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph," PLOS Computational Biology, Public Library of Science, vol. 16(3), pages 1-27, March.
  • Handle: RePEc:plo:pcbi00:1007732
    DOI: 10.1371/journal.pcbi.1007732

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007732
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1007732&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Bouguila, Nizar, 2010. "On multivariate binary data clustering and feature weighting," Computational Statistics & Data Analysis, Elsevier, vol. 54(1), pages 120-134, January.
    2. Pedro H. Oliveira & Marie Touchon & Jean Cury & Eduardo P. C. Rocha, 2017. "The chromosomal organization of horizontal gene transfer in bacteria," Nature Communications, Nature, vol. 8(1), pages 1-11, December.

    Citations

    Cited by:

    1. Lucie Semenec & Amy K. Cain & Catherine J. Dawson & Qi Liu & Hue Dinh & Hannah Lott & Anahit Penesyan & Ram Maharjan & Francesca L. Short & Karl A. Hassan & Ian T. Paulsen, 2023. "Cross-protection and cross-feeding between Klebsiella pneumoniae and Acinetobacter baumannii promotes their co-existence," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    2. Michael J. Tisza & Derek D. N. Smith & Andrew E. Clark & Jung-Ho Youn & Pavel P. Khil & John P. Dekker, 2023. "Roving methyltransferases generate a mosaic epigenetic landscape and influence evolution in Bacteroides fragilis group," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    3. Guilhem Royer & Olivier Clermont & Julie Marin & Bénédicte Condamine & Sara Dion & François Blanquart & Marco Galardini & Erick Denamur, 2023. "Epistatic interactions between the high pathogenicity island and other iron uptake systems shape Escherichia coli extra-intestinal virulence," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
