Printed from https://ideas.repec.org/a/eee/csdana/v110y2017icp103-114.html

Variable selection for multiply-imputed data with penalized generalized estimating equations

Authors

  • Geronimi, J.
  • Saporta, G.

Abstract

Generalized estimating equations (GEE) are useful tools for marginal regression analysis of longitudinal data. A large number of variables combined with missing data raises complex issues in a longitudinal context. In variable selection, for instance, penalized generalized estimating equations have not been systematically developed to handle missing data. MI-PGEE (multiple imputation-penalized generalized estimating equations), an extension of the multiple imputation-least absolute shrinkage and selection operator (MI-LASSO), is presented. MI-PGEE accounts for both missing data and within-subject correlation in variable selection procedures. Missing data are handled by multiple imputation, and variable selection is performed with a group LASSO penalty. The estimated coefficients for the same variable across the multiply-imputed datasets are treated as a group when applying penalized generalized estimating equations, which yields a single model across the multiply-imputed datasets. A new BIC-like criterion is proposed to select the tuning parameter. A simulation study shows the advantage of MI-PGEE over simple imputation PGEE. The usefulness of the new method is illustrated by an application to a subgroup of the placebo arm of the strontium ranelate efficacy in knee osteoarthritis trial.
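
The grouping idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it leaves out the GEE estimating equations, the working correlation and the BIC-like criterion, and only shows how a group LASSO penalty acts on coefficients stacked across imputed datasets. The names `group_penalty`, `group_soft_threshold`, `B` and `lam` are hypothetical, and the coefficient values are made up for illustration.

```python
import numpy as np


def group_penalty(B, lam):
    """Group LASSO penalty on a (D, p) coefficient matrix.

    Row d holds the p coefficients fitted on imputed dataset d (assumed layout).
    Each column, i.e. one variable across all D datasets, forms one group.
    """
    return lam * np.sqrt((B ** 2).sum(axis=0)).sum()


def group_soft_threshold(B, lam):
    """Proximal operator of the group penalty.

    A column whose Euclidean norm is at most lam is set to zero in every
    dataset at once; other columns are shrunk toward zero by the factor
    (1 - lam / norm).
    """
    norms = np.sqrt((B ** 2).sum(axis=0))
    safe_norms = np.where(norms > 0, norms, 1.0)  # avoid division by zero
    scale = np.where(norms > lam, 1.0 - lam / safe_norms, 0.0)
    return B * scale


# Illustrative values only: D = 2 imputed datasets, p = 2 variables.
B = np.array([[3.0, 0.1],
              [4.0, -0.2]])
lam = 1.0

shrunk = group_soft_threshold(B, lam)
# Indices of variables that survive in all imputed datasets simultaneously.
selected = np.flatnonzero(np.any(shrunk != 0, axis=0))
```

Because the threshold acts on whole columns, a variable is either kept in every imputed dataset or dropped from every one, which is the property that leads to a single model across the multiply-imputed datasets.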

Suggested Citation

  • Geronimi, J. & Saporta, G., 2017. "Variable selection for multiply-imputed data with penalized generalized estimating equations," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 103-114.
  • Handle: RePEc:eee:csdana:v:110:y:2017:i:c:p:103-114
    DOI: 10.1016/j.csda.2017.01.001

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li, Gaorong & Lian, Heng & Feng, Sanying & Zhu, Lixing, 2013. "Automatic variable selection for longitudinal generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 61(C), pages 174-186.
    2. Wang, Kangning & Li, Shaomin & Sun, Xiaofei & Lin, Lu, 2019. "Modal regression statistical inference for longitudinal data semivarying coefficient models: Generalized estimating equations, empirical likelihood and variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 133(C), pages 257-276.
    3. Lv, Jing & Yang, Hu & Guo, Chaohui, 2015. "An efficient and robust variable selection method for longitudinal generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 74-88.
    4. Fan, Yali & Qin, Guoyou & Zhu, Zhongyi, 2012. "Variable selection in robust regression models for longitudinal data," Journal of Multivariate Analysis, Elsevier, vol. 109(C), pages 156-167.
    5. Lan Wang & Jianhui Zhou & Annie Qu, 2012. "Penalized Generalized Estimating Equations for High-Dimensional Longitudinal Data Analysis," Biometrics, The International Biometric Society, vol. 68(2), pages 353-360, June.
    6. Gregory Vaughan & Robert Aseltine & Kun Chen & Jun Yan, 2017. "Stagewise generalized estimating equations with grouped variables," Biometrics, The International Biometric Society, vol. 73(4), pages 1332-1342, December.
    7. Rahul Ghosal & Arnab Maity & Timothy Clark & Stefano B. Longo, 2020. "Variable selection in functional linear concurrent regression," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(3), pages 565-587, June.
    8. Blommaert, A. & Hens, N. & Beutels, Ph., 2014. "Data mining for longitudinal data under multicollinearity and time dependence using penalized generalized estimating equations," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 667-680.
    9. Jakub Stoklosa & Heloise Gibb & David I. Warton, 2014. "Fast forward selection for generalized estimating equations with a large number of predictor variables," Biometrics, The International Biometric Society, vol. 70(1), pages 110-120, March.
    10. Lu Tang & Peter X.‐K. Song, 2021. "Poststratification fusion learning in longitudinal data analysis," Biometrics, The International Biometric Society, vol. 77(3), pages 914-928, September.
    11. Wenning Feng & Abdhi Sarkar & Chae Young Lim & Tapabrata Maiti, 2016. "Variable selection for binary spatial regression: Penalized quasi‐likelihood approach," Biometrics, The International Biometric Society, vol. 72(4), pages 1164-1172, December.
    12. Heng Lian & Peng Lai & Hua Liang, 2013. "Partially Linear Structure Selection in Cox Models with Varying Coefficients," Biometrics, The International Biometric Society, vol. 69(2), pages 348-357, June.
    13. Faisal Maqbool Zahid & Shahla Faisal & Christian Heumann, 2020. "Variable selection techniques after multiple imputation in high-dimensional data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 29(3), pages 553-580, September.
    14. Fang, Jianglin, 2023. "A split-and-conquer variable selection approach for high-dimensional general semiparametric models with massive data," Journal of Multivariate Analysis, Elsevier, vol. 194(C).
    15. Tian, Ruiqin & Xue, Liugen & Xu, Dengke, 2016. "Automatic variable selection for varying coefficient models with longitudinal data," Statistics & Probability Letters, Elsevier, vol. 119(C), pages 84-90.
    16. Jie Ding & Vahid Tarokh & Yuhong Yang, 2018. "Model Selection Techniques -- An Overview," Papers 1810.09583, arXiv.org.
    17. Yang, Yuan & McMahan, Christopher S. & Wang, Yu-Bo & Ouyang, Yuyuan, 2024. "Estimation of l0 norm penalized models: A statistical treatment," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    18. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    19. Jun Yan & Jian Huang, 2012. "Model Selection for Cox Models with Time-Varying Coefficients," Biometrics, The International Biometric Society, vol. 68(2), pages 419-428, June.
    20. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
