
Primal path algorithm for compositional data analysis

Author

Listed:
  • Jeon, Jong-June
  • Kim, Yongdai
  • Won, Sungho
  • Choi, Hosik

Abstract

We consider the LASSO estimator for compositional data, in which the covariates are nonnegative and always sum to one. Because the sum-to-one condition imposes a linear constraint on the regression coefficients, standard LASSO algorithms cannot be applied directly to compositional data. Hence, a specific regularized regression model with linear constraints is commonly used. However, the linear constraints incur additional computational time, which becomes severe in high-dimensional cases. Additionally, exact computation of the regression solution has not been investigated under existing methods. In this paper, we first propose an exact solution path algorithm for ℓ1-regularized regression with high-dimensional compositional data and then extend it to a classification model. We also compare its computational speed with that of previously developed algorithms and apply the proposed algorithm to income inequality data in economics and human gut microbiome data in biology. By analyzing simulated and real data sets, we illustrate that our specialized algorithm is significantly more efficient than the generalized LASSO algorithm for compositional data.
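
One common way the sum-to-one condition turns into a linear constraint is the log-contrast formulation of Lin et al. (2014), where the log-transformed compositions enter a linear model and the coefficients are required to sum to zero. The abstract does not spell out the exact model, so the sketch below only illustrates that assumed setting. The function names, the simulated data, and the requirement of strictly positive parts (needed for the logarithm) are all assumptions made for illustration, and the code is not the path algorithm proposed in the paper. It shows why the zero-sum constraint matters: with it, the fitted values do not depend on how each row was normalized.

```python
import numpy as np

def log_contrast_design(X):
    """Log-transform a compositional matrix (rows positive, summing to one)."""
    X = np.asarray(X, dtype=float)
    if np.any(X <= 0):
        raise ValueError("this sketch assumes strictly positive parts")
    return np.log(X)

def project_zero_sum(beta):
    """Orthogonal projection onto the set {beta : sum(beta) = 0}."""
    beta = np.asarray(beta, dtype=float)
    return beta - beta.mean()

def objective(beta, Z, y, lam):
    """Assumed criterion: (1/2n) * ||y - Z beta||^2 + lam * ||beta||_1."""
    n = len(y)
    resid = y - Z @ beta
    return 0.5 * (resid @ resid) / n + lam * np.abs(beta).sum()

rng = np.random.default_rng(0)
n, p = 20, 5
W = rng.uniform(0.5, 2.0, size=(n, p))    # hypothetical raw abundances
X = W / W.sum(axis=1, keepdims=True)      # compositions: each row sums to one
Z = log_contrast_design(X)                # log of normalized data
Z_raw = np.log(W)                         # log of unnormalized data

# With sum(beta) = 0 the row-wise normalizing constant cancels out,
# so normalized and raw data give the same linear predictor.
beta = project_zero_sum(rng.normal(size=p))
same_fit = np.allclose(Z @ beta, Z_raw @ beta)

# Without the constraint the predictor depends on the arbitrary row totals.
beta_free = beta + 0.3
differs = not np.allclose(Z @ beta_free, Z_raw @ beta_free)

y = Z @ beta + 0.1 * rng.normal(size=n)   # simulated response
value = objective(beta, Z, y, lam=0.1)
```

An unconstrained LASSO solver applied to `Z` would in general return coefficients like `beta_free` that violate the constraint, which is one reason constrained or specialized algorithms are used in this setting.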

Suggested Citation

  • Jeon, Jong-June & Kim, Yongdai & Won, Sungho & Choi, Hosik, 2020. "Primal path algorithm for compositional data analysis," Computational Statistics & Data Analysis, Elsevier, vol. 148(C).
  • Handle: RePEc:eee:csdana:v:148:y:2020:i:c:s0167947320300499
    DOI: 10.1016/j.csda.2020.106958


    Citations

    Cited by:

    1. Cristofari, Andrea, 2023. "A decomposition method for lasso problems with zero-sum constraint," European Journal of Operational Research, Elsevier, vol. 306(1), pages 358-369.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Capanu, Marinela & Giurcanu, Mihai & Begg, Colin B. & Gönen, Mithat, 2023. "Subsampling based variable selection for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 184(C).
    2. Skripnikov, A. & Michailidis, G., 2019. "Joint estimation of multiple network Granger causal models," Econometrics and Statistics, Elsevier, vol. 10(C), pages 120-133.
    3. Pei Wang & Shunjie Chen & Sijia Yang, 2022. "Recent Advances on Penalized Regression Models for Biological Data," Mathematics, MDPI, vol. 10(19), pages 1-24, October.
    4. Soyeon Kim & Veerabhadran Baladandayuthapani & J. Jack Lee, 2017. "Prediction-Oriented Marker Selection (PROMISE): With Application to High-Dimensional Regression," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 9(1), pages 217-245, June.
    5. Denis Chetverikov & Jesper R.-V. Sørensen, 2021. "Analytic and Bootstrap-after-Cross-Validation Methods for Selecting Penalty Parameters of High-Dimensional M-Estimators," Discussion Papers 21-04, University of Copenhagen. Department of Economics.
    6. Hsu, David, 2015. "Identifying key variables and interactions in statistical models of building energy consumption using regularization," Energy, Elsevier, vol. 83(C), pages 144-155.
    7. Rieser, Christopher & Filzmoser, Peter, 2023. "Extending compositional data analysis from a graph signal processing perspective," Journal of Multivariate Analysis, Elsevier, vol. 198(C).
    8. Raheem, S.M. Enayetur & Ahmed, S. Ejaz & Doksum, Kjell A., 2012. "Absolute penalty and shrinkage estimation in partially linear models," Computational Statistics & Data Analysis, Elsevier, vol. 56(4), pages 874-891.
    9. Pan, Yuqing & Mai, Qing, 2020. "Efficient computation for differential network analysis with applications to quadratic discriminant analysis," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    10. Eric P Xing & Ross E Curtis & Georg Schoenherr & Seunghak Lee & Junming Yin & Kriti Puniyani & Wei Wu & Peter Kinnaird, 2014. "GWAS in a Box: Statistical and Visual Analytics of Structured Associations via GenAMap," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-19, June.
    11. Gerhard Tutz & Gunther Schauberger, 2015. "Extended ordered paired comparison models with application to football data from German Bundesliga," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 99(2), pages 209-227, April.
    12. Enora Belz & Arthur Charpentier, 2019. "Aggregated Data and Compositional Variables: Methodological Note [Données Agrégées et Variables Compositionnelles : Note Méthodologique]," Working Papers hal-02097031, HAL.
    13. Abhik Ghosh & Magne Thoresen, 2018. "Non-concave penalization in linear mixed-effect models and regularized selection of fixed effects," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 102(2), pages 179-210, April.
    14. Jan Pablo Burgard & Joscha Krause & Dennis Kreber & Domingo Morales, 2021. "The generalized equivalence of regularization and min–max robustification in linear mixed models," Statistical Papers, Springer, vol. 62(6), pages 2857-2883, December.
    15. Benjamin G. Stokell & Rajen D. Shah & Ryan J. Tibshirani, 2021. "Modelling high‐dimensional categorical data using nonconvex fusion penalties," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(3), pages 579-611, July.
    16. Paul Tseng & Sangwoon Yun, 2014. "Incrementally Updated Gradient Methods for Constrained and Regularized Optimization," Journal of Optimization Theory and Applications, Springer, vol. 160(3), pages 832-853, March.
    17. Roberts, S. & Nowak, G., 2014. "Stabilizing the lasso against cross-validation variability," Computational Statistics & Data Analysis, Elsevier, vol. 70(C), pages 198-211.
    18. Huang, Shih-Ting & Xie, Fang & Lederer, Johannes, 2021. "Tuning-free ridge estimators for high-dimensional generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 159(C).
    19. Sean M Devlin & Axel Martin & Irina Ostrovnaya, 2021. "Identifying prognostic pairwise relationships among bacterial species in microbiome studies," PLOS Computational Biology, Public Library of Science, vol. 17(11), pages 1-12, November.
    20. R. Lopes & S. A. Santos & P. J. S. Silva, 2019. "Accelerating block coordinate descent methods with identification strategies," Computational Optimization and Applications, Springer, vol. 72(3), pages 609-640, April.
