Printed from https://ideas.repec.org/p/yor/hectdg/23-16.html

Use of compositional covariates in linear regression: problems and solutions

Authors

  • Zhao, T.
  • Sutton, M.
  • Meacock, M.

Abstract

Compositional variables such as proportions by age group are commonly included as covariates in aggregate-level health research. Since these proportions sum to one and only contain relative information, directly including them as covariates violates the fundamental assumptions made in linear regression analysis. We explain the compositional nature of such data and, using practice-level elective admissions rates in England as an example outcome variable, demonstrate the consequences of directly using proportions in regressions. We also provide an overview of compositional data analysis (CoDA) techniques with a focus on isometric log-ratio (ILR) transformation. Applying ILR to our example data shows that the regression results can differ significantly from those obtained using raw proportions. Health economists should apply appropriate CoDA methods when using compositional data in their research.
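The abstract makes two technical points. First, shares that sum to one make a regression intercept an exact linear combination of the share columns. Second, an ILR transform maps D shares to D-1 unconstrained coordinates. The pure-Python sketch below illustrates both. It uses pivot coordinates, one particular orthonormal ILR basis (the one used by Hron, Filzmoser & Thompson, 2012). It is not necessarily the basis or code used in the paper. The helper names (`closure`, `pivot_ilr`, `aitchison_distance`) and the practice shares are hypothetical and for illustration only.

```python
import math
from typing import List, Sequence


def closure(x: Sequence[float]) -> List[float]:
    """Rescale strictly positive parts so they sum to one."""
    if any(v <= 0 for v in x):
        # Log-ratios are undefined for zero parts; zeros need replacing first.
        raise ValueError("log-ratio methods require strictly positive parts")
    total = sum(x)
    return [v / total for v in x]


def geometric_mean(x: Sequence[float]) -> float:
    return math.exp(sum(math.log(v) for v in x) / len(x))


def clr(x: Sequence[float]) -> List[float]:
    """Centred log-ratio: log of each part relative to the geometric mean."""
    g = geometric_mean(x)
    return [math.log(v / g) for v in x]


def pivot_ilr(x: Sequence[float]) -> List[float]:
    """Pivot coordinates (an orthonormal ILR basis): D parts -> D-1 coordinates.

    z_j = sqrt((D-j)/(D-j+1)) * ln(x_j / gmean(x_{j+1}, ..., x_D)), j = 1..D-1
    """
    x = closure(x)
    d = len(x)
    coords = []
    for i in range(d - 1):      # 0-based i is pivot part j = i + 1
        k = d - i - 1           # number of parts after the pivot (D - j)
        ratio = x[i] / geometric_mean(x[i + 1:])
        coords.append(math.sqrt(k / (k + 1)) * math.log(ratio))
    return coords


def aitchison_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Distance between two compositions, computed from their clr vectors."""
    return math.dist(clr(closure(x)), clr(closure(y)))


# Hypothetical age-group shares (under 18, 18-64, 65+) for three practices;
# illustrative numbers only, not data from the paper.
practices = [
    [0.20, 0.65, 0.15],
    [0.25, 0.60, 0.15],
    [0.18, 0.57, 0.25],
]

if __name__ == "__main__":
    for shares in practices:
        # Each row sums to one, so an intercept column equals the sum of the
        # share columns: the OLS design [1, p1, p2, p3] is exactly collinear.
        print("row sum:", round(sum(shares), 12), "-> ILR:", pivot_ilr(shares))
```

Because each coordinate is a log-ratio, passing head counts instead of shares gives the same coordinates. Euclidean distances between coordinate vectors equal Aitchison distances between the compositions, which is the sense in which the transform is isometric. The D-1 coordinates can enter a linear regression without the sum-to-one constraint.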

Suggested Citation

  • Zhao, T. & Sutton, M. & Meacock, M., 2023. "Use of compositional covariates in linear regression: problems and solutions," Health, Econometrics and Data Group (HEDG) Working Papers 23/16, HEDG, c/o Department of Economics, University of York.
  • Handle: RePEc:yor:hectdg:23/16

    Download full text from publisher

    File URL: https://www.york.ac.uk/media/economics/documents/hedg/workingpapers/2023/2316.pdf
    File Function: Main text
    Download Restriction: no

    References listed on IDEAS

    1. K. Hron & P. Filzmoser & K. Thompson, 2012. "Linear regression with compositional explanatory variables," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(5), pages 1115-1128, November.
    2. Sean Urwin & Yiu‐Shing Lau & Gunn Grande & Matt Sutton, 2023. "Informal caregiving, time use and experienced wellbeing," Health Economics, John Wiley & Sons, Ltd., vol. 32(2), pages 356-374, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. J. A. Martín-Fernández, 2021. "“Compositional Data Analysis in Practice” by Michael Greenacre Universitat Pompeu Fabra (Barcelona, Spain), Chapman and Hall/CRC, 2018," Journal of Classification, Springer;The Classification Society, vol. 38(1), pages 109-111, April.
    2. Biyun Guo & Taiping Xie & M.V. Subrahmanyam, 2019. "The Impact of China’s Grain for Green Program on Rural Economy and Precipitation: A Case Study of Yan River Basin in the Loess Plateau," Sustainability, MDPI, vol. 11(19), pages 1-18, September.
    3. Thomas-Agnan, Christine & Morais, Joanna, 2019. "Covariates impacts in compositional models and simplicial derivatives," TSE Working Papers 19-1057, Toulouse School of Economics (TSE).
    4. Haixiang Zhang & Jun Chen & Zhigang Li & Lei Liu, 2021. "Testing for Mediation Effect with Application to Human Microbiome Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(2), pages 313-328, July.
    5. Defever, F. & Riaño, A., 2022. "Firm-Destination Heterogeneity and the Distribution of Export Intensity," Working Papers 22/01, Department of Economics, City University London.
    6. Mishra, Aditya & Müller, Christian L., 2022. "Robust regression with compositional covariates," Computational Statistics & Data Analysis, Elsevier, vol. 165(C).
    7. Roel Verbelen & Katrien Antonio & Gerda Claeskens, 2018. "Unravelling the predictive power of telematics data in car insurance pricing," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 67(5), pages 1275-1304, November.
    8. Dargel, Lukas & Thomas-Agnan, Christine, 2023. "Share-ratio interpretations of compositional regression models," TSE Working Papers 23-1456, Toulouse School of Economics (TSE), revised 20 Sep 2023.
    9. Morais, Joanna & Thomas-Agnan, Christine & Simioni, Michel, 2017. "Interpreting the impact of explanatory variables in compositional models," TSE Working Papers 17-805, Toulouse School of Economics (TSE).
    10. Urwin, Sean & Lau, Yiu-Shing & Grande, Gunn & Sutton, Matthew, 2023. "Informal caregiving and the allocation of time: implications for opportunity costs and measurement," Social Science & Medicine, Elsevier, vol. 334(C).
    11. Patrick L. Combettes & Christian L. Müller, 2021. "Regression Models for Compositional Data: General Log-Contrast Formulations, Proximal Optimization, and Microbiome Data Applications," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(2), pages 217-242, July.
    12. Thomas-Agnan, Christine & Simioni, Michel & Trinh, Thi-Huong, 2023. "Discrete and Smooth Scalar-on-Density Compositional Regression for Assessing the Impact of Climate Change on Rice Yield in Vietnam," TSE Working Papers 23-1410, Toulouse School of Economics (TSE), revised Apr 2024.
    13. Dargel, Lukas & Thomas-Agnan, Christine, 2024. "Pairwise share ratio interpretations of compositional regression models," Computational Statistics & Data Analysis, Elsevier, vol. 195(C).
    14. Francesca Bruno & Fedele Greco & Massimo Ventrucci, 2016. "Non-parametric regression on compositional covariates using Bayesian P-splines," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 25(1), pages 75-88, March.
    15. Monique Graf, 2020. "Regression for compositions based on a generalization of the Dirichlet distribution," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 29(4), pages 913-936, December.
    16. Nikola Štefelová & Andreas Alfons & Javier Palarea-Albaladejo & Peter Filzmoser & Karel Hron, 2021. "Robust regression with compositional covariates including cellwise outliers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(4), pages 869-909, December.
    17. Mauricio Velasquez, 2016. "Compositions vs Gini: A new metric to evaluate the effects of land-income disparities," 2016 Papers pve364, Job Market Papers.
    18. Janina Janurek & Sascha Abdel Hadi & Andreas Mojzisch & Jan Alexander Häusser, 2018. "The Association of the 24 Hour Distribution of Time Spent in Physical Activity, Work, and Sleep with Emotional Exhaustion," IJERPH, MDPI, vol. 15(9), pages 1-14, September.
    19. Andriansyah, Andriansyah & Messinis, George, 2016. "Intended use of IPO proceeds and firm performance: A quantile regression approach," Pacific-Basin Finance Journal, Elsevier, vol. 36(C), pages 14-30.

    More about this item

    Keywords

    compositional data; CoDA; age group proportion; isometric log-ratio;

    JEL classification:

    • C18 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Methodological Issues: General
    • C13 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Estimation: General
    • I10 - Health, Education, and Welfare - - Health - - - General

    Corrections

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Jane Rawlings. General contact details of provider: https://edirc.repec.org/data/deyoruk.html .

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.