Source: IDEAS (RePEc), https://ideas.repec.org/a/plo/pone00/0129183.html

Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses

Authors

  • Md Shamsuzzoha Bayzid
  • Siavash Mirarab
  • Bastien Boussau
  • Tandy Warnow

Abstract

Because biological processes can cause different loci to have different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can produce discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered a dominant cause of gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, and many of them operate by combining estimated gene trees; these are called "summary methods". Because summary methods are generally fast (much faster than more complex coalescent-based methods that co-estimate gene trees and species trees), they have become popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can lose accuracy in the presence of gene tree estimation error, and that many biological datasets have substantial gene tree estimation error; summary methods may therefore not be highly accurate under biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning uses a simple heuristic to evaluate which genes are "combinable", groups them into bins, and then uses these larger sets of genes to re-estimate gene trees. It performs well empirically, but a phylogenomic pipeline that uses statistical binning does not have the desirable property of being statistically consistent. We show that weighting the re-estimated gene trees by their bin sizes makes statistical binning statistically consistent under the multi-species coalescent, while maintaining its good empirical performance.
Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning.
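The abstract describes weighted statistical binning in three steps: test pairs of genes for combinability, group combinable genes into bins and re-estimate one tree per bin, then weight each bin's tree by the bin size. The Python sketch below illustrates those steps under simplifying assumptions that are ours, not the paper's. Each gene tree is reduced to a dict from bipartitions to support values. Two genes are treated as incompatible when they contain conflicting bipartitions that both meet a support threshold. Bins come from a simple greedy coloring rather than the published heuristic. Supergene tree estimation is a caller-supplied stub. All function names (`compatible`, `genes_conflict`, `make_bins`, `weighted_trees`) are hypothetical and are not the interface of the software at https://github.com/smirarab/binning.

```python
from itertools import product

# HYPOTHETICAL data model (not the binning software's format): a gene tree is
# reduced to a dict {split: support}, where a split is one side of a
# bipartition given as a frozenset of taxon names on a shared taxon set.


def compatible(a, b, taxa):
    """Bipartitions a|taxa-a and b|taxa-b are compatible iff at least one of
    the four pairwise intersections of their sides is empty."""
    sides_a = (a, taxa - a)
    sides_b = (b, taxa - b)
    return any(not (x & y) for x in sides_a for y in sides_b)


def genes_conflict(splits_g, splits_h, taxa, threshold):
    """Assumed combinability test: two genes conflict if some pair of their
    splits, both with support >= threshold, is incompatible."""
    strong_g = [s for s, sup in splits_g.items() if sup >= threshold]
    strong_h = [s for s, sup in splits_h.items() if sup >= threshold]
    return any(not compatible(a, b, taxa) for a, b in product(strong_g, strong_h))


def make_bins(genes, taxa, threshold):
    """Greedy coloring of the incompatibility graph (a simplification, not the
    published heuristic): each gene joins the smallest bin that holds no
    conflicting gene, or opens a new bin."""
    names = list(genes)
    conflicts = {g: set() for g in names}
    for i, g in enumerate(names):
        for h in names[i + 1:]:
            if genes_conflict(genes[g], genes[h], taxa, threshold):
                conflicts[g].add(h)
                conflicts[h].add(g)
    bins = []
    for g in names:
        open_bins = [b for b in bins if not conflicts[g].intersection(b)]
        if open_bins:
            min(open_bins, key=len).append(g)
        else:
            bins.append([g])
    return bins


def weighted_trees(bins, estimate_supergene_tree):
    """Weighting by bin size: repeat each bin's supergene tree once per gene
    in the bin before handing the list to a summary method."""
    trees = []
    for b in bins:
        trees.extend([estimate_supergene_tree(b)] * len(b))
    return trees


if __name__ == "__main__":
    taxa = frozenset("ABCD")
    genes = {
        "g1": {frozenset("AB"): 95},  # strongly supports AB|CD
        "g2": {frozenset("AC"): 90},  # strongly supports AC|BD
        "g3": {frozenset("AB"): 80},  # agrees with g1
    }
    bins = make_bins(genes, taxa, threshold=75)
    print(bins)  # [['g1', 'g3'], ['g2']]
    # Stub: label each supergene by its member genes instead of estimating a tree.
    print(weighted_trees(bins, lambda b: "+".join(b)))  # ['g1+g3', 'g1+g3', 'g2']
```

Repeating each supergene tree once per member gene means the summary method receives one input tree per original locus, so a bin of k genes counts k times; this bin-size weighting is what the abstract credits with restoring statistical consistency.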

Suggested Citation

  • Md Shamsuzzoha Bayzid & Siavash Mirarab & Bastien Boussau & Tandy Warnow, 2015. "Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-40, June.
  • Handle: RePEc:plo:pone00:0129183
    DOI: 10.1371/journal.pone.0129183

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0129183
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0129183&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Alexander Suh & Martin Paus & Martin Kiefmann & Gennady Churakov & Franziska Anni Franke & Jürgen Brosius & Jan Ole Kriegs & Jürgen Schmitz, 2011. "Mesozoic retroposons reveal parrots as the closest living relatives of passerine birds," Nature Communications, Nature, vol. 2(1), pages 1-7, September.
    2. Leonidas Salichos & Antonis Rokas, 2013. "Inferring ancient divergences requires genes with strong phylogenetic signals," Nature, Nature, vol. 497(7449), pages 327-331, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ruriko Yoshida & Kenji Fukumizu & Chrysafis Vogiatzis, 2019. "Multilocus phylogenetic analysis with gene tree clustering," Annals of Operations Research, Springer, vol. 276(1), pages 293-313, May.
    2. Pahl, Cameron C. & Ruedas, Luis A., 2021. "Carnosaurs as Apex Scavengers: Agent-based simulations reveal possible vulture analogues in late Jurassic Dinosaurs," Ecological Modelling, Elsevier, vol. 458(C).
    3. Andrej Kuritzin & Tabea Kischka & Jürgen Schmitz & Gennady Churakov, 2016. "Incomplete Lineage Sorting and Hybridization Statistics for Large-Scale Retroposon Insertion Data," PLOS Computational Biology, Public Library of Science, vol. 12(3), pages 1-20, March.
    4. Justin C Havird & Scott R Santos, 2014. "Performance of Single and Concatenated Sets of Mitochondrial Genes at Inferring Metazoan Relationships Relative to Full Mitogenome Data," PLOS ONE, Public Library of Science, vol. 9(1), pages 1-10, January.
