Printed from https://ideas.repec.org/a/bpj/sagmbi/v13y2014i4p17n5.html

Investigating the performance of AIC in selecting phylogenetic models

Author

Listed:
  • Jhwueng Dwueng-Chwuan

    (Department of Statistics, Feng-Chia University, Taichung, Taiwan 40724, R.O.C.)

  • Huzurbazar Snehalata

    (Statistical and Applied Mathematical Sciences Institute, Research Triangle Park, NC 27709, USA; Department of Statistics, University of Wyoming, Laramie, WY 82071, USA; Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA)

  • O’Meara Brian C.

    (Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, TN 37996, USA)

  • Liu Liang

    (Department of Statistics and Institute of Bioinformatics, University of Georgia, 101 Cedar Street, Athens, GA 30606 USA)

Abstract

The popular likelihood-based model selection criterion, Akaike’s Information Criterion (AIC), is a breakthrough mathematical result derived from information theory. AIC approximates the Kullback-Leibler (KL) divergence, and its derivation relies on the assumption that the likelihood function has finite second derivatives. However, for phylogenetic estimation, given that tree space is discrete with respect to tree topology, the assumption of a continuous likelihood function with finite second derivatives is violated. In this paper, we investigate the relationship between the expected log likelihood of a candidate model and the expected KL divergence in the context of phylogenetic tree estimation. We find that, given the tree topology, AIC is an unbiased estimator of the expected KL divergence. However, when the tree topology is unknown, AIC tends to underestimate the expected KL divergence for phylogenetic models. Simulation results suggest that the degree of underestimation varies across phylogenetic models, so that even for large sample sizes the bias of AIC can result in selecting the wrong model. As the choice of phylogenetic models is essential for statistical phylogenetic inference, it is important to improve the accuracy of model selection criteria in the context of phylogenetics.
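
For readers unfamiliar with the criterion discussed in the abstract, the following is a minimal sketch of how AIC is computed and used to compare candidate models. It is not the authors' code and does not involve phylogenetic trees. It assumes the regular case in which all parameters are continuous, which is the setting where the standard derivation applies. The function names (`aic`, `gaussian_log_likelihood`, `fit_and_score`) and the simulated Gaussian data are hypothetical and chosen only for illustration.

```python
import math
import random


def aic(log_likelihood, num_params):
    """Standard AIC formula: -2 * (maximized log likelihood) + 2 * k."""
    return -2.0 * log_likelihood + 2.0 * num_params


def gaussian_log_likelihood(data, mean, variance):
    """Log likelihood of i.i.d. normal data with the given mean and variance."""
    n = len(data)
    sum_sq = sum((x - mean) ** 2 for x in data)
    return -0.5 * n * math.log(2.0 * math.pi * variance) - sum_sq / (2.0 * variance)


def fit_and_score(data, fix_mean_at_zero):
    """Fit a normal model by maximum likelihood and return its AIC.

    Illustrative candidates (an assumption of this sketch):
      fix_mean_at_zero=True  -> only the variance is estimated (k = 1)
      fix_mean_at_zero=False -> mean and variance are estimated (k = 2)
    """
    n = len(data)
    mean = 0.0 if fix_mean_at_zero else sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n  # maximum likelihood estimate
    k = 1 if fix_mean_at_zero else 2
    return aic(gaussian_log_likelihood(data, mean, variance), k)


# Simulated data whose true mean is 1, so the zero-mean model is misspecified.
random.seed(0)
data = [random.gauss(1.0, 1.0) for _ in range(200)]

aic_zero_mean = fit_and_score(data, fix_mean_at_zero=True)
aic_free_mean = fit_and_score(data, fix_mean_at_zero=False)

# The candidate with the smaller AIC is preferred.
best = "free mean" if aic_free_mean < aic_zero_mean else "zero mean"
print(f"AIC (zero mean): {aic_zero_mean:.2f}")
print(f"AIC (free mean): {aic_free_mean:.2f}")
print(f"Preferred model: {best}")
```

In this continuous-parameter setting the penalty term 2k is the usual bias correction. The abstract's point is that when a discrete tree topology must also be estimated, this correction can be too small, which a toy example like this one does not capture.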

Suggested Citation

  • Jhwueng Dwueng-Chwuan & Huzurbazar Snehalata & O’Meara Brian C. & Liu Liang, 2014. "Investigating the performance of AIC in selecting phylogenetic models," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(4), pages 459-475, August.
  • Handle: RePEc:bpj:sagmbi:v:13:y:2014:i:4:p:17:n:5
    DOI: 10.1515/sagmb-2013-0048

    Download full text from publisher

    File URL: https://doi.org/10.1515/sagmb-2013-0048
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.


    References listed on IDEAS

    1. Makio Ishiguro & Yosiyuki Sakamoto & Genshiro Kitagawa, 1997. "Bootstrapping Log Likelihood and EIC, an Extension of AIC," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 49(3), pages 411-434, September.

    Citations



    Cited by:

    1. Julieta Carril & Ricardo S. De Mendoza & Federico J. Degrange & Claudio G. Barbeito & Claudia P. Tambussi, 2024. "Evolution of avian foot morphology through anatomical network analysis," Nature Communications, Nature, vol. 15(1), pages 1-9, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    2. Ogasawara, Haruhiko, 2015. "Asymptotic cumulants of some information criteria," ビジネス創造センターディスカッション・ペーパー (Discussion papers of the Center for Business Creation) 10252/5446, Otaru University of Commerce.
    3. Davide Fiaschi & Angela Parenti, 2013. "An Estimate of the Degree of Interconnectedness between European Regions: A Bayesian Model Averaging Approach," Discussion Papers 2013/171, Dipartimento di Economia e Management (DEM), University of Pisa, Pisa, Italy.
    4. Alok Tiwari & Mohammed Aljoufie, 2021. "Modeling Spatial Distribution and Determinant of PM 2.5 at Micro-Level Using Geographically Weighted Regression (GWR) to Inform Sustainable Mobility Policies in Campus Based on Evidence from King Abdu," Sustainability, MDPI, vol. 13(21), pages 1-14, October.
    5. Ogasawara, Haruhiko, 2015. "Asymptotic cumulants of some information criteria (2nd version)," ビジネス創造センターディスカッション・ペーパー (Discussion papers of the Center for Business Creation) 10252/5497, Otaru University of Commerce.
    6. Andrew Neath & Joseph Cavanaugh & Adam Weyhaupt, 2015. "Model evaluation, discrepancy function estimation, and social choice theory," Computational Statistics, Springer, vol. 30(1), pages 231-249, March.
    7. Fábio Bayer & Francisco Cribari-Neto, 2015. "Bootstrap-based model selection criteria for beta regressions," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(4), pages 776-795, December.
    8. Philip Reiss & Lei Huang & Joseph Cavanaugh & Amy Roy, 2012. "Resampling-based information criteria for best-subset regression," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 64(6), pages 1161-1186, December.
    9. Christopher H. Jackson & Simon G. Thompson & Linda D. Sharples, 2009. "Accounting for uncertainty in health economic decision models by using model averaging," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 172(2), pages 383-404, April.
    10. Genshiro Kitagawa & Sadanori Konishi, 2010. "Bias and variance reduction techniques for bootstrap information criteria," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 62(1), pages 209-234, February.
    11. Dimova, Rositsa B. & Markatou, Marianthi & Talal, Andrew H., 2011. "Information methods for model selection in linear mixed effects models with application to HCV data," Computational Statistics & Data Analysis, Elsevier, vol. 55(9), pages 2677-2697, September.
    12. Patrick Ten Eyck & Joseph E. Cavanaugh, 2018. "An Alternate Approach to Pseudo-Likelihood Model Selection in the Generalized Linear Mixed Modeling Framework," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 80(1), pages 98-122, May.
    13. Sugiyama Masashi & Müller Klaus-Robert, 2005. "Input-dependent estimation of generalization error under covariate shift," Statistics & Risk Modeling, De Gruyter, vol. 23(4), pages 249-279, April.
    14. Yanagihara, Hirokazu & Tonda, Tetsuji & Matsumoto, Chieko, 2006. "Bias correction of cross-validation criterion based on Kullback-Leibler information under a general condition," Journal of Multivariate Analysis, Elsevier, vol. 97(9), pages 1965-1975, October.
    15. Kato Kengo, 2011. "A note on moment convergence of bootstrap M-estimators," Statistics & Risk Modeling, De Gruyter, vol. 28(1), pages 51-61, March.
    16. Davide Fiaschi & Lisa Gianmoena & Angela Parenti, 2013. "The Determinants of Growth Rate Volatility in European Regions," Discussion Papers 2013/170, Dipartimento di Economia e Management (DEM), University of Pisa, Pisa, Italy.
    17. B. Liquet & C. Sakarovitch & D. Commenges, 2003. "Bootstrap Choice of Estimators in Parametric and Semiparametric Families: An Extension of EIC," Biometrics, The International Biometric Society, vol. 59(1), pages 172-178, March.
    18. Yang, Ji-Chung, 2005. "Impact measurement for public investment evaluation: An application to Korea," Journal of Policy Modeling, Elsevier, vol. 27(5), pages 535-551, July.
    19. Patrick Ten Eyck & Joseph E. Cavanaugh, 2018. "Model selection criteria based on cross-validatory concordance statistics," Computational Statistics, Springer, vol. 33(2), pages 595-621, June.
    20. Yanagihara, Hirokazu, 2006. "Corrected version of AIC for selecting multivariate normal linear regression models in a general nonnormal case," Journal of Multivariate Analysis, Elsevier, vol. 97(5), pages 1070-1089, May.
