
Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees

Author

Listed:
  • Tom M W Nye
  • Xiaoxian Tang
  • Grady Weyenberg
  • Ruriko Yoshida

Abstract

Summary: Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample’s structure. However, the space of all phylogenetic trees on a fixed set of species does not form a Euclidean vector space, and methods adapted to tree space are needed. Previous work introduced the notion of a principal geodesic in this space, analogous to the first principal component. Here we propose a geometric object for tree space similar to the $k$th principal component in Euclidean space: the locus of the weighted Fréchet mean of $k+1$ vertex trees when the weights vary over the $k$-simplex. We establish some basic properties of these objects, in particular showing that they have dimension $k$, and propose algorithms for projection onto these surfaces and for finding the principal locus associated with a sample of trees. Simulation studies demonstrate that these algorithms perform well, and analyses of two datasets, containing Apicomplexa and African coelacanth genomes respectively, reveal important structure from the second principal components.
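The construction in the summary can be illustrated in the Euclidean setting it generalises. The sketch below is not the authors' algorithm and does not work in tree space. It assumes ordinary Euclidean distance, where the weighted Fréchet mean (the minimiser of $\sum_i w_i \, d(x, v_i)^2$) reduces to the weighted average of the vertices. The function names and the three example vertices are hypothetical, chosen only to show that the locus traced out as the weights vary over the $k$-simplex has dimension $k$.

```python
import numpy as np


def weighted_frechet_mean_euclidean(vertices, weights):
    """Weighted Frechet mean under Euclidean distance (assumption: NOT tree space).

    In Euclidean space the minimiser of sum_i w_i * ||x - v_i||^2 is the
    weighted average of the vertices.
    """
    vertices = np.asarray(vertices, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must lie on the simplex")
    return weights @ vertices


def frechet_objective(x, vertices, weights):
    """Weighted sum of squared Euclidean distances from x to each vertex."""
    diffs = np.asarray(vertices, dtype=float) - np.asarray(x, dtype=float)
    return float(np.sum(np.asarray(weights, dtype=float) * np.sum(diffs**2, axis=1)))


def sample_locus(vertices, n_samples, seed=0):
    """Sample the locus of the weighted mean as weights vary over the simplex."""
    rng = np.random.default_rng(seed)
    # Dirichlet(1, ..., 1) draws are uniform on the k-simplex.
    weights = rng.dirichlet(np.ones(len(vertices)), size=n_samples)
    return weights @ np.asarray(vertices, dtype=float)


# Hypothetical example: k + 1 = 3 vertices in R^3, so k = 2.
vertices = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
locus = sample_locus(vertices, n_samples=200)

# The affine dimension of the sampled locus is the rank of the centred points.
centred = locus - locus.mean(axis=0)
locus_dim = int(np.linalg.matrix_rank(centred, tol=1e-9))
```

In this flat setting the locus is simply the triangle spanned by the three vertices. In tree space the Fréchet mean has no closed form, which is why the paper needs dedicated algorithms for projection and for fitting the principal locus.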

Suggested Citation

  • Tom M W Nye & Xiaoxian Tang & Grady Weyenberg & Ruriko Yoshida, 2017. "Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees," Biometrika, Biometrika Trust, vol. 104(4), pages 901-922.
  • Handle: RePEc:oup:biomet:v:104:y:2017:i:4:p:901-922.

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1093/biomet/asx047
    Download Restriction: Access to full text is restricted to subscribers.

    Citations


    Cited by:

    1. Lovato, Ilenia & Pini, Alessia & Stamm, Aymeric & Vantini, Simone, 2020. "Model-free two-sample test for network-valued data," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    2. Weiyi Ding & Xiaoxian Tang, 2021. "Projections of Tropical Fermat-Weber Points," Mathematics, MDPI, vol. 9(23), pages 1-23, December.
