Printed from https://ideas.repec.org/p/yor/hectdg/23-16.html

Use of compositional covariates in linear regression: problems and solutions

Author

Listed:
  • Zhao, T.
  • Sutton, M.
  • Meacock, M.

Abstract

Compositional variables such as proportions by age group are commonly included as covariates in aggregate-level health research. Since these proportions sum to one and only contain relative information, directly including them as covariates violates the fundamental assumptions made in linear regression analysis. We explain the compositional nature of such data and, using practice-level elective admissions rates in England as an example outcome variable, demonstrate the consequences of directly using proportions in regressions. We also provide an overview of compositional data analysis (CoDA) techniques with a focus on isometric log-ratio (ILR) transformation. Applying ILR to our example data shows that the regression results can differ significantly from those obtained using raw proportions. Health economists should apply appropriate CoDA methods when using compositional data in their research.
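The two points made in the abstract, that raw proportions carry a sum-to-one constraint and that an ILR transformation removes it, can be illustrated with a minimal sketch. It assumes one common choice of orthonormal ILR basis (often called pivot coordinates), which is not necessarily the basis used in the paper. The function name `ilr_pivot` and the age-group shares are hypothetical and are not taken from the paper's data.

```python
import numpy as np


def ilr_pivot(parts):
    """Map an (n, D) array of strictly positive compositions to
    (n, D-1) ILR coordinates, using a pivot-coordinate basis (an assumption)."""
    parts = np.asarray(parts, dtype=float)
    if np.any(parts <= 0):
        raise ValueError("ILR needs strictly positive parts; treat zeros first")
    logs = np.log(parts)
    n, D = logs.shape
    coords = np.empty((n, D - 1))
    for j in range(D - 1):
        k = D - j - 1  # number of parts that come after part j
        # Mean of logs = log of the geometric mean of the remaining parts.
        rest = logs[:, j + 1:].mean(axis=1)
        coords[:, j] = np.sqrt(k / (k + 1)) * (logs[:, j] - rest)
    return coords


# Hypothetical shares (under 18, 18-64, 65 and over) for four practices.
shares = np.array([
    [0.20, 0.60, 0.20],
    [0.25, 0.55, 0.20],
    [0.15, 0.60, 0.25],
    [0.22, 0.50, 0.28],
])

# Intercept plus all raw shares: the share columns sum to the intercept
# column, so the design matrix cannot have full column rank.
raw_design = np.column_stack([np.ones(len(shares)), shares])
raw_rank = np.linalg.matrix_rank(raw_design)

# Intercept plus the D-1 ILR coordinates: no built-in linear dependence.
ilr_design = np.column_stack([np.ones(len(shares)), ilr_pivot(shares)])
ilr_rank = np.linalg.matrix_rank(ilr_design)

print(raw_design.shape[1], raw_rank)  # columns vs. rank, raw shares
print(ilr_design.shape[1], ilr_rank)  # columns vs. rank, ILR coordinates
```

Because the coordinates depend only on ratios between parts, rescaling the shares (for example, from proportions to percentages) leaves them unchanged. With this basis, the first coordinate contrasts the first age group with the geometric mean of the others, so the interpretation of its regression coefficient depends on how the parts are ordered.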

Suggested Citation

  • Zhao, T. & Sutton, M. & Meacock, M., 2023. "Use of compositional covariates in linear regression: problems and solutions," Health, Econometrics and Data Group (HEDG) Working Papers 23/16, HEDG, c/o Department of Economics, University of York.
  • Handle: RePEc:yor:hectdg:23/16

    Download full text from publisher

    File URL: https://www.york.ac.uk/media/economics/documents/hedg/workingpapers/2023/2316.pdf
    File Function: Main text
    Download Restriction: no

    References listed on IDEAS

    1. Sean Urwin & Yiu‐Shing Lau & Gunn Grande & Matt Sutton, 2023. "Informal caregiving, time use and experienced wellbeing," Health Economics, John Wiley & Sons, Ltd., vol. 32(2), pages 356-374, February.
    2. K. Hron & P. Filzmoser & K. Thompson, 2012. "Linear regression with compositional explanatory variables," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(5), pages 1115-1128, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nikola Štefelová & Andreas Alfons & Javier Palarea-Albaladejo & Peter Filzmoser & Karel Hron, 2021. "Robust regression with compositional covariates including cellwise outliers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(4), pages 869-909, December.
    2. Mauricio Velasquez, 2016. "Compositions vs Gini: A new metric to evaluate the effects of land-income disparities," 2016 Papers pve364, Job Market Papers.
    3. Janina Janurek & Sascha Abdel Hadi & Andreas Mojzisch & Jan Alexander Häusser, 2018. "The Association of the 24 Hour Distribution of Time Spent in Physical Activity, Work, and Sleep with Emotional Exhaustion," IJERPH, MDPI, vol. 15(9), pages 1-14, September.
    4. J. A. Martín-Fernández, 2021. "“Compositional Data Analysis in Practice” by Michael Greenacre Universitat Pompeu Fabra (Barcelona, Spain), Chapman and Hall/CRC, 2018," Journal of Classification, Springer;The Classification Society, vol. 38(1), pages 109-111, April.
    5. Andriansyah, Andriansyah & Messinis, George, 2016. "Intended use of IPO proceeds and firm performance: A quantile regression approach," Pacific-Basin Finance Journal, Elsevier, vol. 36(C), pages 14-30.
    6. Jacob Fiksel & Scott Zeger & Abhirup Datta, 2022. "A transformation‐free linear regression for compositional outcomes and predictors," Biometrics, The International Biometric Society, vol. 78(3), pages 974-987, September.
    7. Biyun Guo & Taiping Xie & M.V. Subrahmanyam, 2019. "The Impact of China’s Grain for Green Program on Rural Economy and Precipitation: A Case Study of Yan River Basin in the Loess Plateau," Sustainability, MDPI, vol. 11(19), pages 1-18, September.
    8. Thomas-Agnan, Christine & Morais, Joanna, 2019. "Covariates impacts in compositional models and simplicial derivatives," TSE Working Papers 19-1057, Toulouse School of Economics (TSE).
    9. Dorothea Dumuid & Željko Pedišić & Javier Palarea-Albaladejo & Josep Antoni Martín-Fernández & Karel Hron & Timothy Olds, 2020. "Compositional Data Analysis in Time-Use Epidemiology: What, Why, How," IJERPH, MDPI, vol. 17(7), pages 1-17, March.
    10. Huiwen Wang & Zhichao Wang & Shanshan Wang, 2021. "Sliced inverse regression method for multivariate compositional data modeling," Statistical Papers, Springer, vol. 62(1), pages 361-393, February.
    11. Rieser, Christopher & Filzmoser, Peter, 2023. "Extending compositional data analysis from a graph signal processing perspective," Journal of Multivariate Analysis, Elsevier, vol. 198(C).
    12. Haixiang Zhang & Jun Chen & Zhigang Li & Lei Liu, 2021. "Testing for Mediation Effect with Application to Human Microbiome Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(2), pages 313-328, July.
    13. Defever, F. & Riaño, A., 2022. "Firm-Destination Heterogeneity and the Distribution of Export Intensity," Working Papers 22/01, Department of Economics, City University London.
    14. Quim Zaldo-Aubanell & Ferran Campillo i López & Albert Bach & Isabel Serra & Joan Olivet-Vila & Marc Saez & David Pino & Roser Maneja, 2021. "Community Risk Factors in the COVID-19 Incidence and Mortality in Catalonia (Spain). A Population-Based Study," IJERPH, MDPI, vol. 18(7), pages 1-20, April.
    15. Mishra, Aditya & Müller, Christian L., 2022. "Robust regression with compositional covariates," Computational Statistics & Data Analysis, Elsevier, vol. 165(C).
    16. Roel Verbelen & Katrien Antonio & Gerda Claeskens, 2018. "Unravelling the predictive power of telematics data in car insurance pricing," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 67(5), pages 1275-1304, November.
    17. Dargel, Lukas & Thomas-Agnan, Christine, 2023. "Share-ratio interpretations of compositional regression models," TSE Working Papers 23-1456, Toulouse School of Economics (TSE), revised 20 Sep 2023.
    18. Marie Blaise & Sandrine Juin & Hélène Le Forner & Quitterie Roquebert, 2024. "I care, you clean? Gendered effects of informal care on couple housework and leisure time," LISER Working Paper Series 2024-05, Luxembourg Institute of Socio-Economic Research (LISER).
    19. Morais, Joanna & Thomas-Agnan, Christine & Simioni, Michel, 2017. "Interpreting the impact of explanatory variables in compositional models," TSE Working Papers 17-805, Toulouse School of Economics (TSE).
    20. Charlotte Lund Rasmussen & Javier Palarea-Albaladejo & Adrian Bauman & Nidhi Gupta & Kirsten Nabe-Nielsen & Marie Birk Jørgensen & Andreas Holtermann, 2018. "Does Physically Demanding Work Hinder a Physically Active Lifestyle in Low Socioeconomic Workers? A Compositional Data Analysis Based on Accelerometer Data," IJERPH, MDPI, vol. 15(7), pages 1-23, June.

    More about this item

    Keywords

    compositional data; CoDA; age group proportion; isometric log-ratio;

    JEL classification:

    • C18 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Methodological Issues: General
    • C13 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Estimation: General
    • I10 - Health, Education, and Welfare - - Health - - - General

