
Flexible regularized estimation in high-dimensional mixed membership models

Author

Listed:
  • Marco, Nicholas
  • Şentürk, Damla
  • Jeste, Shafali
  • DiStefano, Charlotte C.
  • Dickinson, Abigail
  • Telesca, Donatello

Abstract

Mixed membership models extend finite mixture models by allowing each observation to belong partially to more than one mixture component. A probabilistic framework for mixed membership models of high-dimensional continuous data is proposed, with a focus on scalability and interpretability. The novel probabilistic representation of mixed membership is based on convex combinations of dependent multivariate Gaussian random vectors. In this setting, scalability is ensured by approximating a tensor covariance structure with multivariate eigen-approximations, with adaptive regularization imposed through shrinkage priors. Conditional weak posterior consistency is established on an unconstrained model, allowing for a simple posterior sampling scheme while retaining many of the desired theoretical properties of the model. The model is motivated by two biomedical case studies: one on functional brain imaging of children with autism spectrum disorder (ASD) and one on gene expression data from breast cancer tissue. These applications highlight how the typical assumption made in cluster analysis, that each observation comes from one homogeneous subgroup, can be restrictive, leading to unnatural interpretations of data features.
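
As an illustration of the convex-combination representation described in the abstract, the following Python sketch simulates observations as weighted averages of K Gaussian random vectors. It is not the authors' implementation: the function name `simulate_mixed_membership`, the Dirichlet distribution for the membership weights, and the independent draws of the component vectors are all assumptions made for brevity. The paper works with dependent vectors and shrinkage priors, both of which are omitted here.

```python
import numpy as np


def simulate_mixed_membership(n, means, covs, alpha, seed=0):
    """Simulate n observations as convex combinations of K Gaussian vectors.

    Hypothetical helper for illustration only.
    means: array of shape (K, p), one mean vector per component.
    covs:  array of shape (K, p, p), one covariance matrix per component.
    alpha: length-K Dirichlet concentration (an assumed prior on the weights).
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    K, p = means.shape

    # Membership weights: each row is non-negative and sums to one.
    Z = rng.dirichlet(alpha, size=n)  # shape (n, K)

    # One Gaussian vector per component and observation.
    # Simplifying assumption: components are drawn independently.
    F = np.stack(
        [rng.multivariate_normal(means[k], covs[k], size=n) for k in range(K)],
        axis=1,
    )  # shape (n, K, p)

    # Convex combination: y_i = sum_k Z[i, k] * F[i, k, :].
    Y = np.einsum("nk,nkp->np", Z, F)
    return Y, Z


if __name__ == "__main__":
    means = np.array([[0.0, 0.0], [4.0, 4.0]])
    covs = np.stack([np.eye(2), np.eye(2)])
    Y, Z = simulate_mixed_membership(5, means, covs, alpha=[1.0, 1.0])
    print(Y.shape, Z.sum(axis=1))
```

An observation with weights near (1, 0) behaves like a draw from the first component alone, which is the usual clustering assumption; weights in the interior of the simplex place it between the two component means.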

Suggested Citation

  • Marco, Nicholas & Şentürk, Damla & Jeste, Shafali & DiStefano, Charlotte C. & Dickinson, Abigail & Telesca, Donatello, 2024. "Flexible regularized estimation in high-dimensional mixed membership models," Computational Statistics & Data Analysis, Elsevier, vol. 194(C).
  • Handle: RePEc:eee:csdana:v:194:y:2024:i:c:s016794732400015x
    DOI: 10.1016/j.csda.2024.107931

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S016794732400015X
    Download Restriction: Full text for ScienceDirect subscribers only.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Park, Byung-Jung & Zhang, Yunlong & Lord, Dominique, 2010. "Bayesian mixture modeling approach to account for heterogeneity in speed data," Transportation Research Part B: Methodological, Elsevier, vol. 44(5), pages 662-673, June.
    2. Papastamoulis, Panagiotis, 2018. "Overfitting Bayesian mixtures of factor analyzers with an unknown number of components," Computational Statistics & Data Analysis, Elsevier, vol. 124(C), pages 220-234.
    3. Debamita Kundu & Riten Mitra & Jeremy T. Gaskins, 2021. "Bayesian variable selection for multioutcome models through shared shrinkage," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 48(1), pages 295-320, March.
    4. Royce Anders & William Batchelder, 2015. "Cultural Consensus Theory for the Ordinal Data Case," Psychometrika, Springer;The Psychometric Society, vol. 80(1), pages 151-181, March.
    5. Lu, Xiaosun & Huang, Yangxin & Zhu, Yiliang, 2016. "Finite mixture of nonlinear mixed-effects joint models in the presence of missing and mismeasured covariate, with application to AIDS studies," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 119-130.
    6. Lubrano, Michel & Ndoye, Abdoul Aziz Junior, 2016. "Income inequality decomposition using a finite mixture of log-normal distributions: A Bayesian approach," Computational Statistics & Data Analysis, Elsevier, vol. 100(C), pages 830-846.
    7. Simon Beyeler & Sylvia Kaufmann, 2021. "Reduced‐form factor augmented VAR—Exploiting sparsity to include meaningful factors," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 36(7), pages 989-1012, November.
    8. Yuan Fang & Dimitris Karlis & Sanjeena Subedi, 2022. "Infinite Mixtures of Multivariate Normal-Inverse Gaussian Distributions for Clustering of Skewed Data," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 510-552, November.
    9. Crespo Cuaresma, Jesús & Huber, Florian & Onorante, Luca, 2020. "Fragility and the effect of international uncertainty shocks," Journal of International Money and Finance, Elsevier, vol. 108(C).
    10. Terrance Savitsky & Daniel McCaffrey, 2014. "Bayesian Hierarchical Multivariate Formulation with Factor Analysis for Nested Ordinal Data," Psychometrika, Springer;The Psychometric Society, vol. 79(2), pages 275-302, April.
    11. Dimitris Korobilis & Kenichi Shimizu, 2022. "Bayesian Approaches to Shrinkage and Sparse Estimation," Foundations and Trends(R) in Econometrics, now publishers, vol. 11(4), pages 230-354, June.
    12. Hui, Francis K.C., 2017. "Model-based simultaneous clustering and ordination of multivariate abundance data in ecology," Computational Statistics & Data Analysis, Elsevier, vol. 105(C), pages 1-10.
    13. Gilles Celeux & Florence Forbes & Christian P. Robert & Michael Titterington, 2003. "Deviance Information Criteria for Missing Data Models," Working Papers 2003-30, Center for Research in Economics and Statistics.
    14. Bei Jiang & Michael R. Elliott & Mary D. Sammel & Naisyin Wang, 2015. "Joint modeling of cross-sectional health outcomes and longitudinal predictors via mixtures of means and variances," Biometrics, The International Biometric Society, vol. 71(2), pages 487-497, June.
    15. Oliver J. Rutz & Garrett P. Sonnier, 2019. "VANISH regularization for generalized linear models," Quantitative Marketing and Economics (QME), Springer, vol. 17(4), pages 415-437, December.
    16. Brian Neelon & A. James O'Malley & Sharon-Lise T. Normand, 2011. "A Bayesian Two-Part Latent Class Model for Longitudinal Medical Expenditure Data: Assessing the Impact of Mental Health and Substance Abuse Parity," Biometrics, The International Biometric Society, vol. 67(1), pages 280-289, March.
    17. Kim, Gwangsu & Choi, Taeryon, 2019. "Asymptotic properties of nonparametric estimation and quantile regression in Bayesian structural equation models," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 68-82.
    18. Shotwell Matthew S & Slate Elizabeth H, 2010. "Bayesian Modeling of Footrace Finishing Times," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 6(3), pages 1-21, July.
    19. Komárek, Arnost, 2009. "A new R package for Bayesian estimation of multivariate normal mixtures allowing for selection of the number of components and interval-censored data," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 3932-3947, October.
    20. You, Na & Dai, Hongsheng & Wang, Xueqin & Yu, Qingyun, 2024. "Sequential estimation for mixture of regression models for heterogeneous population," Computational Statistics & Data Analysis, Elsevier, vol. 194(C).
