
A phylogenetic method to perform genome-wide association studies in microbes that accounts for population structure and recombination

Author

Listed:
  • Caitlin Collins
  • Xavier Didelot

Abstract

Genome-Wide Association Studies (GWAS) in microbial organisms have the potential to vastly improve the way we understand, manage, and treat infectious diseases. Yet, microbial GWAS methods established thus far remain insufficiently able to capitalise on the growing wealth of bacterial and viral genetic sequence data. Facing clonal population structure and homologous recombination, existing GWAS methods struggle to achieve both the precision necessary to reject spurious findings and the power required to detect associations in microbes. In this paper, we introduce a novel phylogenetic approach that has been tailor-made for microbial GWAS, which is applicable to organisms ranging from purely clonal to frequently recombining, and to both binary and continuous phenotypes. Our approach is robust to the confounding effects of both population structure and recombination, while maintaining high statistical power to detect associations. Thorough testing via application to simulated data provides strong support for the power and specificity of our approach and demonstrates the advantages offered over alternative cluster-based and dimension-reduction methods. Two applications to Neisseria meningitidis illustrate the versatility and potential of our method, confirming previously-identified penicillin resistance loci and resulting in the identification of both well-characterised and novel drivers of invasive disease. Our method is implemented as an open-source R package called treeWAS which is freely available at https://github.com/caitiecollins/treeWAS.

Author summary

Measurable differences often exist within a microbial population, with important ecological or epidemiological consequences. Examples include differences in growth rates, host range, transmissibility, antimicrobial resistance, virulence, etc. Understanding the genetic factors involved in these phenotypic properties is a crucial aim in microbial genomics.
A fundamental approach for doing so is to perform a Genome-Wide Association Study (GWAS), where genomes are compared to search for genetic markers systematically correlated with the property of interest. If this strategy were implemented naively in microbes, it could lead to spurious results due to the confounding effects of population structure and recombination. Here we present treeWAS, a new phylogenetic method to perform microbial GWAS that avoids these pitfalls. We show, using simulated datasets, that treeWAS is able to distinguish between genetic markers that are truly associated with the property of interest and those that are not. Furthermore, we demonstrate that treeWAS offers advantages in both sensitivity and specificity over alternative cluster-based and dimension-reduction techniques. We also showcase treeWAS in two applications to real datasets from N. meningitidis. We have developed an easy-to-use implementation of treeWAS in the R environment, which should be useful to a wide range of researchers in microbial genomics.
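
As a rough illustration of the naive strategy described above (not the treeWAS method itself), the following Python sketch scores each marker by how consistently it matches a binary phenotype across isolates. The function name, the scoring rule, and the toy data are all assumptions made for this example. The toy population has two clonal clades, and the phenotype happens to track the clade, so a marker that merely labels the clade scores as highly as a truly causal marker would.

```python
def naive_association_score(marker, phenotype):
    """Hypothetical naive score: |agreements - disagreements| / N.

    Both inputs are equal-length lists of 0/1 values, one entry per isolate.
    Returns 1.0 for perfect (anti-)correlation and 0.0 for none.
    """
    if len(marker) != len(phenotype):
        raise ValueError("marker and phenotype must have the same length")
    n = len(marker)
    agree = sum(1 for m, p in zip(marker, phenotype) if m == p)
    disagree = n - agree
    return abs(agree - disagree) / n


# Toy data (assumed): 8 isolates, the first 4 from clade A, the last 4 from clade B.
phenotype = [1, 1, 1, 1, 0, 0, 0, 0]         # phenotype tracks the clade
lineage_marker = [1, 1, 1, 1, 0, 0, 0, 0]    # arose once, inherited by all of clade A
unrelated_marker = [1, 0, 1, 0, 1, 0, 1, 0]  # scattered across both clades

lineage_score = naive_association_score(lineage_marker, phenotype)
unrelated_score = naive_association_score(unrelated_marker, phenotype)
print(lineage_score, unrelated_score)
```

The lineage marker receives the maximum score even though it reflects a single ancestral event shared by a whole clade, which is the kind of spurious signal that a method accounting for population structure is meant to discount.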

Suggested Citation

  • Caitlin Collins & Xavier Didelot, 2018. "A phylogenetic method to perform genome-wide association studies in microbes that accounts for population structure and recombination," PLOS Computational Biology, Public Library of Science, vol. 14(2), pages 1-21, February.
  • Handle: RePEc:plo:pcbi00:1005958
    DOI: 10.1371/journal.pcbi.1005958

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005958
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005958&type=printable
    Download Restriction: no

