
Regression Models for Compositional Data: General Log-Contrast Formulations, Proximal Optimization, and Microbiome Data Applications

Author

Listed:
  • Patrick L. Combettes

    (North Carolina State University)

  • Christian L. Müller

    (Flatiron Institute
    Institute of Computational Biology, Helmholtz Zentrum München
    Ludwig-Maximilians-Universität München)

Abstract

Compositional data sets are ubiquitous in science, including geology, ecology, and microbiology. In microbiome research, compositional data primarily arise from high-throughput sequence-based profiling experiments. These data comprise microbial compositions in their natural habitat and are often paired with covariate measurements that characterize physicochemical habitat properties or the physiology of the host. Inferring parsimonious statistical associations between microbial compositions and habitat- or host-specific covariate data is an important step in exploratory data analysis. A standard statistical model linking compositional covariates to continuous outcomes is the linear log-contrast model. This model describes the response as a linear combination of log-ratios of the original compositions and has been extended to the high-dimensional setting via regularization. In this contribution, we propose a general convex optimization model for linear log-contrast regression which includes many previous proposals as special cases. We introduce a proximal algorithm that solves the resulting constrained optimization problem exactly with rigorous convergence guarantees. We illustrate the versatility of our approach by investigating the performance of several model instances on soil and gut microbiome data analysis tasks.
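The linear log-contrast model mentioned in the abstract can be illustrated with a minimal sketch. It assumes the simplest unregularized case: the response is modeled as `log(X) @ beta` with the zero-sum constraint `sum(beta) == 0`, which makes the fit equivalent to a linear combination of log-ratios. The function name `fit_log_contrast`, the synthetic Dirichlet data, and the KKT-based least-squares solve are illustrative assumptions only; this is not the proximal algorithm or the general regularized model proposed by the authors.

```python
import numpy as np


def fit_log_contrast(X, y):
    """Least-squares fit of y ~ log(X) @ beta subject to sum(beta) == 0.

    Hypothetical helper for illustration; solves the KKT system of the
    equality-constrained least-squares problem directly.
    """
    Z = np.log(X)                      # log-transformed compositions
    p = Z.shape[1]
    ones = np.ones((p, 1))
    # KKT system: [Z'Z  1] [beta]   [Z'y]
    #             [1'   0] [ mu ] = [ 0 ]
    kkt = np.block([[Z.T @ Z, ones], [ones.T, np.zeros((1, 1))]])
    rhs = np.concatenate([Z.T @ y, [0.0]])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:p]                     # drop the Lagrange multiplier


# Synthetic compositions: each row is positive and sums to one.
rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.dirichlet(np.ones(p), size=n)

# Assumed ground-truth coefficients that sum to zero (noiseless response).
beta_true = np.array([1.0, -1.0, 0.5, -0.5])
y = np.log(X) @ beta_true

beta_hat = fit_log_contrast(X, y)

# Rescaling every composition by a constant leaves the fit unchanged,
# because the zero-sum constraint cancels the added log(constant) term.
beta_hat_scaled = fit_log_contrast(10.0 * X, y)
```

Because the coefficients sum to zero, the fit depends only on ratios between components, so it is unaffected by the total count (for example, sequencing depth) of each sample. High-dimensional settings such as those described in the abstract would additionally require regularization, which this sketch omits.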

Suggested Citation

  • Patrick L. Combettes & Christian L. Müller, 2021. "Regression Models for Compositional Data: General Log-Contrast Formulations, Proximal Optimization, and Microbiome Data Applications," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(2), pages 217-242, July.
  • Handle: RePEc:spr:stabio:v:13:y:2021:i:2:d:10.1007_s12561-020-09283-2
    DOI: 10.1007/s12561-020-09283-2

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s12561-020-09283-2
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. She, Yiyuan & Owen, Art B., 2011. "Outlier Detection Using Nonconvex Penalized Regression," Journal of the American Statistical Association, American Statistical Association, vol. 106(494), pages 626-639.
    2. K. Hron & P. Filzmoser & K. Thompson, 2012. "Linear regression with compositional explanatory variables," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(5), pages 1115-1128, November.
    3. Wei Lin & Pixu Shi & Rui Feng & Hongzhe Li, 2014. "Variable selection in regression with compositional covariates," Biometrika, Biometrika Trust, vol. 101(4), pages 785-797.
    4. Tingni Sun & Cun-Hui Zhang, 2012. "Scaled sparse linear regression," Biometrika, Biometrika Trust, vol. 99(4), pages 879-898.

    Citations


    Cited by:

    1. Cristofari, Andrea, 2023. "A decomposition method for lasso problems with zero-sum constraint," European Journal of Operational Research, Elsevier, vol. 306(1), pages 358-369.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mishra, Aditya & Müller, Christian L., 2022. "Robust regression with compositional covariates," Computational Statistics & Data Analysis, Elsevier, vol. 165(C).
    2. Umberto Amato & Anestis Antoniadis & Italia De Feis & Irene Gijbels, 2021. "Penalised robust estimators for sparse and high-dimensional linear models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 1-48, March.
    3. Jacob Fiksel & Scott Zeger & Abhirup Datta, 2022. "A transformation‐free linear regression for compositional outcomes and predictors," Biometrics, The International Biometric Society, vol. 78(3), pages 974-987, September.
    4. Zemin Zheng & Jinchi Lv & Wei Lin, 2021. "Nonsparse Learning with Latent Variables," Operations Research, INFORMS, vol. 69(1), pages 346-359, January.
    5. Huiwen Wang & Zhichao Wang & Shanshan Wang, 2021. "Sliced inverse regression method for multivariate compositional data modeling," Statistical Papers, Springer, vol. 62(1), pages 361-393, February.
    6. Rieser, Christopher & Filzmoser, Peter, 2023. "Extending compositional data analysis from a graph signal processing perspective," Journal of Multivariate Analysis, Elsevier, vol. 198(C).
    7. Haixiang Zhang & Jun Chen & Zhigang Li & Lei Liu, 2021. "Testing for Mediation Effect with Application to Human Microbiome Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(2), pages 313-328, July.
    8. Juan José Egozcue & Vera Pawlowsky-Glahn, 2019. "Compositional data: the sample space and its structure," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(3), pages 599-638, September.
    9. Jiajia Chen & Xiaoqin Zhang & Shengjia Li, 2017. "Multiple linear regression with compositional response and covariates," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(12), pages 2270-2285, September.
    10. Nikola Štefelová & Andreas Alfons & Javier Palarea-Albaladejo & Peter Filzmoser & Karel Hron, 2021. "Robust regression with compositional covariates including cellwise outliers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(4), pages 869-909, December.
    11. Mauricio Velasquez, 2016. "Compositions vs Gini: A new metric to evaluate the effects of land-income disparities," 2016 Papers pve364, Job Market Papers.
    12. Janina Janurek & Sascha Abdel Hadi & Andreas Mojzisch & Jan Alexander Häusser, 2018. "The Association of the 24 Hour Distribution of Time Spent in Physical Activity, Work, and Sleep with Emotional Exhaustion," IJERPH, MDPI, vol. 15(9), pages 1-14, September.
    13. Seunghwan Lee & Sang Cheol Kim & Donghyeon Yu, 2023. "An efficient GPU-parallel coordinate descent algorithm for sparse precision matrix estimation via scaled lasso," Computational Statistics, Springer, vol. 38(1), pages 217-242, March.
    14. Zemin Zheng & Jie Zhang & Yang Li, 2022. "L0-Regularized Learning for High-Dimensional Additive Hazards Regression," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2762-2775, September.
    15. J. A. Martín-Fernández, 2021. "“Compositional Data Analysis in Practice” by Michael Greenacre Universitat Pompeu Fabra (Barcelona, Spain), Chapman and Hall/CRC, 2018," Journal of Classification, Springer;The Classification Society, vol. 38(1), pages 109-111, April.
    16. Young‐Geun Choi & Lawrence P. Hanrahan & Derek Norton & Ying‐Qi Zhao, 2022. "Simultaneous spatial smoothing and outlier detection using penalized regression, with application to childhood obesity surveillance from electronic health records," Biometrics, The International Biometric Society, vol. 78(1), pages 324-336, March.
    17. Andriansyah, Andriansyah & Messinis, George, 2016. "Intended use of IPO proceeds and firm performance: A quantile regression approach," Pacific-Basin Finance Journal, Elsevier, vol. 36(C), pages 14-30.
    18. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    19. Wentao Qu & Xianchao Xiu & Huangyue Chen & Lingchen Kong, 2023. "A Survey on High-Dimensional Subspace Clustering," Mathematics, MDPI, vol. 11(2), pages 1-39, January.
    20. Wang, Yihe & Zhao, Sihai Dave, 2021. "A nonparametric empirical Bayes approach to large-scale multivariate regression," Computational Statistics & Data Analysis, Elsevier, vol. 156(C).
