Source: IDEAS, https://ideas.repec.org/a/plo/pone00/0258693.html

Large-scale k-mer-based analysis of the informational properties of genomes, comparative genomics and taxonomy

Authors
  • Yuval Bussi
  • Ruti Kapon
  • Ziv Reich

Abstract

Information-theoretic approaches are ubiquitous and effective in a wide variety of bioinformatics applications. In comparative genomics, alignment-free methods, based on short DNA words, or k-mers, are particularly powerful. We evaluated the utility of varying k-mer lengths for genome comparisons by analyzing their sequence space coverage of 5805 genomes in the KEGG GENOME database. In subsequent analyses of four k-mer lengths spanning the relevant range (11, 21, 31, 41), hierarchical clustering of 1634 genus-level representative genomes using pairwise 21- and 31-mer Jaccard similarities best recapitulated a phylogenetic/taxonomic tree of life with clear boundaries for superkingdom domains and high subtree similarity for named taxa at lower levels (family through phylum). By analyzing ~14.2M prokaryotic genome comparisons by their lowest-common-ancestor taxon levels, we detected many potential misclassification errors in a curated database, further demonstrating the need for wide-scale adoption of quantitative taxonomic classifications based on whole-genome similarity.
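The two quantities the abstract leans on, sequence-space coverage and pairwise k-mer Jaccard similarity, can be illustrated with a short sketch. This is not the authors' code: the function names are hypothetical, k-mers are taken from the forward strand only (no reverse-complement canonicalization), windows containing non-ACGT characters are skipped, and "coverage" is assumed here to mean the fraction of the 4^k possible DNA words present in a genome. Distances are taken as 1 − Jaccard so they could be handed to a hierarchical clustering routine; the abstract does not say which linkage was used.

```python
from __future__ import annotations

from itertools import combinations

DNA = frozenset("ACGT")


def kmer_set(seq: str, k: int) -> set[str]:
    """Distinct k-mers of `seq` (hypothetical helper).

    Assumptions: forward strand only, no reverse-complement canonicalization,
    and any window containing a non-ACGT character (e.g. N) is skipped.
    """
    seq = seq.upper()
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= DNA
    }


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coverage(kmers: set[str], k: int) -> float:
    """Assumed definition: fraction of the 4**k possible DNA k-mers observed."""
    return len(kmers) / 4 ** k


def pairwise_distances(genomes: dict[str, str], k: int) -> dict[tuple[str, str], float]:
    """1 - Jaccard similarity for every unordered pair of named genomes."""
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    return {
        (x, y): 1.0 - jaccard(sets[x], sets[y])
        for x, y in combinations(sorted(sets), 2)
    }


if __name__ == "__main__":
    toy = {"a": "ACGTACGTGA", "b": "ACGTACGTTT"}  # toy sequences, not real genomes
    print(jaccard(kmer_set(toy["a"], 3), kmer_set(toy["b"], 3)))  # 0.5
    print(coverage(kmer_set(toy["a"], 3), 3))  # 0.09375
    print(pairwise_distances(toy, 3))  # {('a', 'b'): 0.5}
```

In the toy run, each sequence has six distinct 3-mers, four of which are shared, so the Jaccard similarity is 4/8 = 0.5. Exact sets are fine for toy inputs, but storing every distinct 21- or 31-mer of thousands of genomes is memory-intensive; tools such as Mash approximate the Jaccard index from MinHash sketches instead. Whether the paper used exact sets or sketches is not stated in the abstract.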

Suggested Citation

  • Yuval Bussi & Ruti Kapon & Ziv Reich, 2021. "Large-scale k-mer-based analysis of the informational properties of genomes, comparative genomics and taxonomy," PLOS ONE, Public Library of Science, vol. 16(10), pages 1-27, October.
  • Handle: RePEc:plo:pone00:0258693
    DOI: 10.1371/journal.pone.0258693

    Download full text from publisher

    Full text (HTML): https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0258693
    Printable version: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0258693&type=printable

    References listed on IDEAS

    1. Chirag Jain & Luis M. Rodriguez-R & Adam M. Phillippy & Konstantinos T. Konstantinidis & Srinivas Aluru, 2018. "High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries," Nature Communications, Nature, vol. 9(1), pages 1-8, December.
    2. Fionn Murtagh & Pierre Legendre, 2014. "Ward’s Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward’s Criterion?," Journal of Classification, Springer;The Classification Society, vol. 31(3), pages 274-295, October.
    3. Alexandra M Schnoes & Shoshana D Brown & Igor Dodevski & Patricia C Babbitt, 2009. "Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies," PLOS Computational Biology, Public Library of Science, vol. 5(12), pages 1-13, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Maurizio Vichi & Carlo Cavicchia & Patrick J. F. Groenen, 2022. "Hierarchical Means Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 553-577, November.
    2. Jiao Jieying & Hu Guanyu & Yan Jun, 2021. "A Bayesian marked spatial point processes model for basketball shot chart," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 17(2), pages 77-90, June.
    3. Paulus, Michal & Kristoufek, Ladislav, 2015. "Worldwide clustering of the corruption perception," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 428(C), pages 351-358.
    4. Rui Fa & Domenico Cozzetto & Cen Wan & David T Jones, 2018. "Predicting human protein function with multi-task deep neural networks," PLOS ONE, Public Library of Science, vol. 13(6), pages 1-16, June.
    5. Hyeri Choi & Min Jae Park, 2019. "Evaluating the Efficiency of Governmental Excellence for Social Progress: Focusing on Low- and Lower-Middle-Income Countries," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 141(1), pages 111-130, January.
    6. Maksym Polyakov & Morteza Chalak & Md. Sayed Iftekhar & Ram Pandit & Sorada Tapsuwan & Fan Zhang & Chunbo Ma, 2018. "Authorship, Collaboration, Topics, and Research Gaps in Environmental and Resource Economics 1991–2015," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 71(1), pages 217-239, September.
    7. Grzegorz Maciejewski & Mirosława Malinowska & Barbara Kucharska & Michał Kucia & Beata Kolny, 2021. "Sustainable Development as a Factor Differentiating Consumer Behavior: The Case of Poland," European Research Studies Journal, European Research Studies Journal, vol. 0(3), pages 934-948.
    8. Giger, Markus & Mutea, Emily & Kiteme, Boniface & Eckert, Sandra & Anseeuw, Ward & Zaehringer, Julie G., 2020. "Large agricultural investments in Kenya’s Nanyuki Area: Inventory and analysis of business models," Land Use Policy, Elsevier, vol. 99(C).
    9. Walker, Nathan L. & Styles, David & Coughlan, Paul & Williams, A. Prysor, 2022. "Cross-sector sustainability benchmarking of major utilities in the United Kingdom," Utilities Policy, Elsevier, vol. 78(C).
    10. Pierre H. H. Schneeberger & Morgan Gueuning & Sophie Welsche & Eveline Hürlimann & Julian Dommann & Cécile Häberli & Jürg E. Frey & Somphou Sayasone & Jennifer Keiser, 2022. "Different gut microbial communities correlate with efficacy of albendazole-ivermectin against soil-transmitted helminthiases," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    11. Max E. Schön & Vasily V. Zlatogursky & Rohan P. Singh & Camille Poirier & Susanne Wilken & Varsha Mathur & Jürgen F. H. Strassert & Jarone Pinhassi & Alexandra Z. Worden & Patrick J. Keeling & Thijs J, 2021. "Single cell genomics reveals plastid-lacking Picozoa are close relatives of red algae," Nature Communications, Nature, vol. 12(1), pages 1-10, December.
    12. Abang Zainoren Abang Abdurahman & Syerina Azlin Md Nasir & Wan Fairos Wan Yaacob & Serah Jaya & Suhaili Mokhtar, 2021. "Spatio-Temporal Clustering of Sarawak Malaysia Total Protected Area Visitors," Sustainability, MDPI, vol. 13(21), pages 1-19, October.
    13. Mulu Abraha Woldegiorgis & Janet E. Hiller & Wubegzier Mekonnen & Jahar Bhowmik, 2018. "Disparities in maternal health services in sub-Saharan Africa," International Journal of Public Health, Springer;Swiss School of Public Health (SSPH+), vol. 63(4), pages 525-535, May.
    14. Monika Stanny & Łukasz Komorowski & Andrzej Rosner, 2021. "The Socio-Economic Heterogeneity of Rural Areas: Towards a Rural Typology of Poland," Energies, MDPI, vol. 14(16), pages 1-23, August.
    15. Renato Amorim, 2015. "Feature Relevance in Ward’s Hierarchical Clustering Using the L p Norm," Journal of Classification, Springer;The Classification Society, vol. 32(1), pages 46-62, April.
    16. Anca Gabriela Ilie & Marinela Luminita Emanuela Zlatea & Cristina Negreanu & Dan Dumitriu & Alma Pentescu, 2023. "Reliance on Russian Federation Energy Imports and Renewable Energy in the European Union," The AMFITEATRU ECONOMIC journal, Academy of Economic Studies - Bucharest, Romania, vol. 25(64), pages 780-780, August.
    17. Luiza Ossowska & Dorota Janiszewska & Natalia Bartkowiak-Bakun & Grzegorz Kwiatkowski, 2020. "Energy Consumption Versus Greenhouse Gas Emissions in EU," European Research Studies Journal, European Research Studies Journal, vol. 0(3), pages 185-198.
    18. Nenad Macesic & Jane Hawkey & Ben Vezina & Jessica A. Wisniewski & Hugh Cottingham & Luke V. Blakeway & Taylor Harshegyi & Katherine Pragastis & Gnei Zweena Badoordeen & Amanda Dennison & Denis W. Spe, 2023. "Genomic dissection of endemic carbapenem resistance reveals metallo-beta-lactamase dissemination through clonal, plasmid and integron transfer," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    19. Lerato Lerato & Thomas Niesler, 2015. "Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering," PLOS ONE, Public Library of Science, vol. 10(10), pages 1-24, October.
    20. Jon Ellingsen & Vegard H. Larsen & Leif Anders Thorsrud, 2020. "News Media vs. FRED-MD for Macroeconomic Forecasting," CESifo Working Paper Series 8639, CESifo.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.