IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1000501.html

Species Tree Inference by Minimizing Deep Coalescences

Author

Listed:
  • Cuong Than
  • Luay Nakhleh

Abstract

In a 1997 seminal paper, W. Maddison proposed minimizing deep coalescences, or MDC, as an optimization criterion for inferring the species tree from a set of incongruent gene trees, assuming the incongruence is exclusively due to lineage sorting. In a subsequent paper, Maddison and Knowles provided and implemented a search heuristic for optimizing the MDC criterion, given a set of gene trees. However, the heuristic is not guaranteed to compute optimal solutions, and its hill-climbing search makes it slow in practice.

In this paper, we provide two exact solutions to the problem of inferring the species tree from a set of gene trees under the MDC criterion. In other words, our solutions are guaranteed to find the tree that minimizes the total number of deep coalescences from a set of gene trees. One solution is based on a novel integer linear programming (ILP) formulation, and another is based on a simple dynamic programming (DP) approach. Powerful ILP solvers, such as CPLEX, make the first solution appealing, particularly for very large-scale instances of the problem, whereas the DP-based solution eliminates dependence on proprietary tools, and its simplicity makes it easy to integrate with other genomic events that may cause gene tree incongruence.

Using the exact solutions, we analyze a data set of 106 loci from eight yeast species, a data set of 268 loci from eight Apicomplexan species, and several simulated data sets. We show that the MDC criterion provides very accurate estimates of the species tree topologies, and that our solutions are very fast, thus allowing for the accurate analysis of genome-scale data sets. Further, the efficiency of the solutions allows for quick exploration of sub-optimal solutions, which is important for a parsimony-based criterion such as MDC, as we show. We show that searching for the species tree in the compatibility graph of the clusters induced by the gene trees may be sufficient in practice, a finding that helps ameliorate the computational requirements of optimization solutions. Further, we study the statistical consistency and convergence rate of the MDC criterion, as well as its optimality in inferring the species tree. Finally, we show how our solutions can be used to identify potential horizontal gene transfer events that may have caused some of the incongruence in the data, thus augmenting Maddison's original framework. We have implemented our solutions in the PhyloNet software package, which is freely available at: http://bioinfo.cs.rice.edu/phylonet.

Author Summary: Inferring the evolutionary history of a set of species, known as the species tree, is a task of utmost significance in biology and beyond. The traditional approach to accomplishing this task from molecular sequences entails sequencing a gene in the set of species under consideration, reconstructing the gene's evolutionary history, and declaring it to be the species tree. However, recent analyses of multiple gene data sets, made available thanks to advances in sequencing technologies, have indicated that gene trees in the same group of species may disagree with each other, as well as with the species tree. Therefore, the development of methods for inferring the species tree despite such disagreements is imperative.
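
As an illustration of the MDC criterion described in the abstract, the following is a minimal Python sketch that scores candidate species trees by counting extra lineages. It is not the ILP or DP solution from the paper, and it is not PhyloNet code. It assumes rooted trees written as nested tuples with string leaf names, one sampled allele per species, and identical leaf sets in every tree. All function names (`leaf_set`, `clusters`, `maximal_clades`, `extra_lineages`, `mdc_score`) are hypothetical. For each cluster of the species tree, the sketch counts the maximal gene tree clades whose leaves all lie in that cluster and charges that count minus one as extra lineages on the branch above the cluster.

```python
def leaf_set(tree):
    """Return the set of leaf names under a node (a leaf is a string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clusters(tree):
    """Return the leaf set of every node in the tree, root first."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    found = [leaf_set(tree)]
    for child in tree:
        found.extend(clusters(child))
    return found


def maximal_clades(gene_tree, cluster):
    """Count maximal clades of gene_tree whose leaves all lie in cluster."""
    if leaf_set(gene_tree) <= cluster:
        return 1  # this whole clade fits, so do not descend further
    if isinstance(gene_tree, str):
        return 0  # a single leaf outside the cluster
    return sum(maximal_clades(child, cluster) for child in gene_tree)


def extra_lineages(species_tree, gene_tree):
    """Extra lineages of one gene tree, summed over species tree clusters.

    Assumes one allele per species and identical leaf sets (see lead-in).
    """
    return sum(maximal_clades(gene_tree, b) - 1 for b in clusters(species_tree))


def mdc_score(species_tree, gene_trees):
    """Total extra lineages over all gene trees (lower is better)."""
    return sum(extra_lineages(species_tree, g) for g in gene_trees)


# Toy data, invented for illustration only.
gene_trees = [(("A", "B"), "C"), (("A", "B"), "C"), ("A", ("B", "C"))]
candidates = [(("A", "B"), "C"), ("A", ("B", "C")), (("A", "C"), "B")]
best = min(candidates, key=lambda st: mdc_score(st, gene_trees))
print(best, mdc_score(best, gene_trees))
```

On this toy input the candidate `(("A", "B"), "C")` has the lowest score, since only the third gene tree disagrees with it. Scoring an explicit list of candidates only scales to a handful of taxa; the exact solutions in the paper avoid enumerating all topologies.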

Suggested Citation

  • Cuong Than & Luay Nakhleh, 2009. "Species Tree Inference by Minimizing Deep Coalescences," PLOS Computational Biology, Public Library of Science, vol. 5(9), pages 1-12, September.
  • Handle: RePEc:plo:pcbi00:1000501
    DOI: 10.1371/journal.pcbi.1000501

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000501
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1000501&type=printable
    Download Restriction: no

    References listed on IDEAS

    1. Antonis Rokas & Barry L. Williams & Nicole King & Sean B. Carroll, 2003. "Genome-scale approaches to resolving incongruence in molecular phylogenies," Nature, Nature, vol. 425(6960), pages 798-804, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang Yuancheng & Degnan James H, 2011. "Performance of Matrix Representation with Parsimony for Inferring Species from Gene Trees," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-39, May.
    2. Martín Espariz & Federico A Zuljan & Luis Esteban & Christian Magni, 2016. "Taxonomic Identity Resolution of Highly Phylogenetically Related Strains and Selection of Phylogenetic Markers by Using Genome-Scale Methods: The Bacillus pumilus Group Case," PLOS ONE, Public Library of Science, vol. 11(9), pages 1-17, September.
    3. Rahul Siddharthan & Eric D Siggia & Erik van Nimwegen, 2005. "PhyloGibbs: A Gibbs Sampling Motif Finder That Incorporates Phylogeny," PLOS Computational Biology, Public Library of Science, vol. 1(7), pages 1-23, December.
    4. Wei-Bung Wang & Tao Jiang & Shea Gardner, 2013. "Detection of Homologous Recombination Events in Bacterial Genomes," PLOS ONE, Public Library of Science, vol. 8(10), pages 1-14, October.
    5. Roch, Sebastien & Steel, Mike, 2015. "Likelihood-based tree reconstruction on a concatenation of aligned sequence data sets can be statistically inconsistent," Theoretical Population Biology, Elsevier, vol. 100(C), pages 56-62.
    6. David Peris & Emily J. Ubbelohde & Meihua Christina Kuang & Jacek Kominek & Quinn K. Langdon & Marie Adams & Justin A. Koshalek & Amanda Beth Hulfachor & Dana A. Opulente & David J. Hall & Katie Hyma, 2023. "Macroevolutionary diversity of traits and genomes in the model yeast genus Saccharomyces," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    7. Haque Md Rejuan & Kubatko Laura, 2024. "A global test of hybrid ancestry from genome-scale data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 23(1), pages 1-18, January.
    8. Alexei J Drummond & Simon Y W Ho & Matthew J Phillips & Andrew Rambaut, 2006. "Relaxed Phylogenetics and Dating with Confidence," PLOS Biology, Public Library of Science, vol. 4(5), pages 1-1, March.
    9. Sergio Consoli & Jan Korst & Steffen Pauws & Gijs Geleijnse, 2020. "Improved metaheuristics for the quartet method of hierarchical clustering," Journal of Global Optimization, Springer, vol. 78(2), pages 241-270, October.
    10. Siewert Elizabeth A & Kechris Katerina J, 2009. "Prediction of Motifs Based on a Repeated-Measures Model for Integrating Cross-Species Sequence and Expression Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-36, September.
