IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v105y2017icp1-10.html

Model-based simultaneous clustering and ordination of multivariate abundance data in ecology

Author

Listed:
  • Hui, Francis K.C.

Abstract

When studying multivariate abundance data, one of the main patterns ecologists are often interested in is whether the sites exhibit clustering in the low-dimensional ordination space representing species composition. A new model-based approach called CORAL (Clustering and Ordination Regression AnaLysis) is developed for tackling this question, based on performing simultaneous clustering and ordination using latent variable regression. By drawing the latent variables from a finite mixture density, CORAL probabilistically classifies sites based on their positions on an underlying signal space. This is similar to mixtures of factor analyzers, except CORAL is designed for non-normal responses and uses species-specific rather than cluster-specific factor loadings (regression coefficients). Estimation is performed via Bayesian MCMC sampling, with code provided in the Supplementary Material. Simulations demonstrate that, by utilizing the joint information available in the data for both classification and dimension reduction, CORAL outperforms several popular, algorithm-based methods for clustering and ordination in ecology. CORAL is applied to a dataset of presence–absence records collected at sites along the Doubs River near the France–Switzerland border, with results revealing two clusters or ecological regions partly resembling the spatial separation of upstream and downstream sites.
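To make the model structure described in the abstract concrete, the following is a minimal simulation sketch, not the author's implementation (which is in the paper's Supplementary Material). It generates presence-absence data in which each site's latent variables come from a finite Gaussian mixture and each species has its own intercept and loadings. The function name `simulate_coral_like`, the logit link, the Gaussian mixture components, and all numeric settings are assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np


def simulate_coral_like(n_sites=60, n_species=12, n_latent=2, n_clusters=2, seed=0):
    """Simulate presence-absence data from a CORAL-like generative model.

    Hypothetical helper: every distribution and value below is an
    illustrative assumption, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Mixture over sites: equal mixing proportions and random cluster centres
    # in the latent (ordination) space.
    weights = np.full(n_clusters, 1.0 / n_clusters)
    centres = rng.normal(0.0, 2.0, size=(n_clusters, n_latent))

    # Each site gets a cluster label, then a latent position drawn around
    # that cluster's centre (a finite mixture of Gaussians).
    labels = rng.choice(n_clusters, size=n_sites, p=weights)
    z = centres[labels] + rng.normal(0.0, 0.5, size=(n_sites, n_latent))

    # Species-specific intercepts and loadings, shared across all clusters
    # (in contrast to the cluster-specific loadings of mixtures of factor analyzers).
    intercepts = rng.normal(0.0, 1.0, size=n_species)
    loadings = rng.normal(0.0, 1.0, size=(n_species, n_latent))

    # Linear predictor and Bernoulli responses through an assumed logit link.
    eta = intercepts + z @ loadings.T
    prob = 1.0 / (1.0 + np.exp(-eta))
    y = rng.binomial(1, prob)
    return y, z, labels


y, z, labels = simulate_coral_like()
```

Here `y` is a sites-by-species matrix of 0/1 records, `z` holds the site positions in the latent space, and `labels` holds the cluster memberships that a method of this kind would try to recover jointly with `z`.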

Suggested Citation

  • Hui, Francis K.C., 2017. "Model-based simultaneous clustering and ordination of multivariate abundance data in ecology," Computational Statistics & Data Analysis, Elsevier, vol. 105(C), pages 1-10.
  • Handle: RePEc:eee:csdana:v:105:y:2017:i:c:p:1-10
    DOI: 10.1016/j.csda.2016.07.008

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947316301724
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. M. O. Hill, 1974. "Correspondence Analysis: A Neglected Multivariate Method," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 23(3), pages 340-354, November.
    2. Murray, Paula M. & Browne, Ryan P. & McNicholas, Paul D., 2014. "Mixtures of skew-t factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 77(C), pages 326-335.
    3. Francis K.C. Hui & David I. Warton & Scott D. Foster, 2015. "Order selection in finite mixture models: complete or observed likelihood information criteria?," Biometrika, Biometrika Trust, vol. 102(3), pages 724-730.
    4. Russell B. Millar, 2009. "Comparison of Hierarchical Bayesian Models for Overdispersed Count Data using DIC and Bayes' Factors," Biometrics, The International Biometric Society, vol. 65(3), pages 962-969, September.
    5. Dray, Stéphane & Dufour, Anne-Béatrice, 2007. "The ade4 Package: Implementing the Duality Diagram for Ecologists," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 22(i04).
    6. Polak, Marike & Heiser, Willem J. & de Rooij, Mark, 2009. "Two types of single-peaked data: Correspondence analysis as an alternative to principal component analysis," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3117-3128, June.
    7. Glenn Milligan & Martha Cooper, 1985. "An examination of procedures for determining the number of clusters in a data set," Psychometrika, Springer;The Psychometric Society, vol. 50(2), pages 159-179, June.
    8. David J. Spiegelhalter & Nicola G. Best & Bradley P. Carlin & Angelika Van Der Linde, 2002. "Bayesian measures of model complexity and fit," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(4), pages 583-639, October.
    9. Irène Gijbels & Marek Omelka, 2013. "Testing for Homogeneity of Multivariate Dispersions Using Dissimilarity Measures," Biometrics, The International Biometric Society, vol. 69(1), pages 137-145, March.
    10. Matthew Stephens, 2000. "Dealing with label switching in mixture models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 62(4), pages 795-809.
    11. Philippe Huber & Elvezio Ronchetti & Maria‐Pia Victoria‐Feser, 2004. "Estimation of generalized linear latent variable models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 66(4), pages 893-908, November.
    12. Pledger, Shirley & Arnold, Richard, 2014. "Multivariate methods using mixtures: Correspondence analysis, scaling and pattern-detection," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 241-261.

    Citations

    Cited by:

    1. Francis K. C. Hui & Samuel Müller & Alan H. Welsh, 2021. "Random Effects Misspecification Can Have Severe Consequences for Random Effects Inference in Linear Mixed Models," International Statistical Review, International Statistical Institute, vol. 89(1), pages 186-206, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cai, Jing-Heng & Song, Xin-Yuan & Lam, Kwok-Hap & Ip, Edward Hak-Sing, 2011. "A mixture of generalized latent variable models for mixed mode and heterogeneous data," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2889-2907, November.
    2. Park, Byung-Jung & Zhang, Yunlong & Lord, Dominique, 2010. "Bayesian mixture modeling approach to account for heterogeneity in speed data," Transportation Research Part B: Methodological, Elsevier, vol. 44(5), pages 662-673, June.
    3. Papastamoulis, Panagiotis, 2018. "Overfitting Bayesian mixtures of factor analyzers with an unknown number of components," Computational Statistics & Data Analysis, Elsevier, vol. 124(C), pages 220-234.
    4. Blasius, J. & Greenacre, M. & Groenen, P.J.F. & van de Velden, M., 2009. "Special issue on correspondence analysis and related methods," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3103-3106, June.
    5. Marco, Nicholas & Şentürk, Damla & Jeste, Shafali & DiStefano, Charlotte C. & Dickinson, Abigail & Telesca, Donatello, 2024. "Flexible regularized estimation in high-dimensional mixed membership models," Computational Statistics & Data Analysis, Elsevier, vol. 194(C).
    6. Royce Anders & William Batchelder, 2015. "Cultural Consensus Theory for the Ordinal Data Case," Psychometrika, Springer;The Psychometric Society, vol. 80(1), pages 151-181, March.
    7. Lu, Xiaosun & Huang, Yangxin & Zhu, Yiliang, 2016. "Finite mixture of nonlinear mixed-effects joint models in the presence of missing and mismeasured covariate, with application to AIDS studies," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 119-130.
    8. Lubrano, Michel & Ndoye, Abdoul Aziz Junior, 2016. "Income inequality decomposition using a finite mixture of log-normal distributions: A Bayesian approach," Computational Statistics & Data Analysis, Elsevier, vol. 100(C), pages 830-846.
    9. Yuan Fang & Dimitris Karlis & Sanjeena Subedi, 2022. "Infinite Mixtures of Multivariate Normal-Inverse Gaussian Distributions for Clustering of Skewed Data," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 510-552, November.
    10. Murray, Paula M. & Browne, Ryan P. & McNicholas, Paul D., 2017. "Hidden truncation hyperbolic distributions, finite mixtures thereof, and their application for clustering," Journal of Multivariate Analysis, Elsevier, vol. 161(C), pages 141-156.
    11. Terrance Savitsky & Daniel McCaffrey, 2014. "Bayesian Hierarchical Multivariate Formulation with Factor Analysis for Nested Ordinal Data," Psychometrika, Springer;The Psychometric Society, vol. 79(2), pages 275-302, April.
    12. Kathryn M. Irvine & T. J. Rodhouse & Ilai N. Keren, 2016. "Extending Ordinal Regression with a Latent Zero-Augmented Beta Distribution," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 21(4), pages 619-640, December.
    13. Iraj Kazemi & Fatemeh Hassanzadeh, 2021. "Marginalized random-effects models for clustered binomial data through innovative link functions," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 105(2), pages 197-228, June.
    14. Li, Yong & Yu, Jun & Zeng, Tao, 2018. "Integrated Deviance Information Criterion for Latent Variable Models," Economics and Statistics Working Papers 6-2018, Singapore Management University, School of Economics.
    15. Kyu Ha Lee & Virginie Rondeau & Sebastien Haneuse, 2017. "Accelerated failure time models for semi‐competing risks data in the presence of complex censoring," Biometrics, The International Biometric Society, vol. 73(4), pages 1401-1412, December.
    16. Gilles Celeux & Florence Forbes & Christian P. Robert & Michael Titterington, 2003. "Deviance Information Criteria for Missing Data Models," Working Papers 2003-30, Center for Research in Economics and Statistics.
    17. Bei Jiang & Michael R. Elliott & Mary D. Sammel & Naisyin Wang, 2015. "Joint modeling of cross-sectional health outcomes and longitudinal predictors via mixtures of means and variances," Biometrics, The International Biometric Society, vol. 71(2), pages 487-497, June.
    18. Tenan, Simone & O’Hara, Robert B. & Hendriks, Iris & Tavecchia, Giacomo, 2014. "Bayesian model selection: The steepest mountain to climb," Ecological Modelling, Elsevier, vol. 283(C), pages 62-69.
    19. Ye Yang & Osman Doğan & Süleyman Taşpınar, 2023. "Observed-data DIC for spatial panel data models," Empirical Economics, Springer, vol. 64(3), pages 1281-1314, March.
    20. Oliver J. Rutz & Garrett P. Sonnier, 2019. "VANISH regularization for generalized linear models," Quantitative Marketing and Economics (QME), Springer, vol. 17(4), pages 415-437, December.
