Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models

Author

Lee, Min Cherng
Mitra, Robin

Abstract

Multiple imputation is a commonly used approach to deal with missing values. In this approach, an imputer repeatedly imputes the missing values by taking draws from the posterior predictive distribution for the missing values conditional on the observed values, and releases these completed data sets to analysts. With each completed data set the analyst performs the analysis of interest, treating the data as if it were fully observed. These analyses are then combined with standard combining rules, allowing the analyst to make appropriate inferences which take into account the uncertainty present due to the missing data. In order to preserve the statistical properties present in the data, the imputer must use a plausible distribution to generate the imputed values. In data sets containing variables with different measurement scales, e.g. some categorical and some continuous variables, this is a challenging problem. A method is proposed to multiply impute missing values in such data sets by modelling the joint distribution of the variables in the data through a sequence of generalised linear models, and data augmentation methods are used to draw imputations from a proper posterior distribution using Markov Chain Monte Carlo (MCMC). The performance of the proposed method is illustrated using simulation studies and on a data set taken from a breast feeding study.
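
The combining step described above can be sketched in a few lines. This is a minimal illustration that assumes the "standard combining rules" are Rubin's rules for a scalar estimand; the abstract does not name them, and the function name `combine_estimates` and the example numbers are hypothetical, not taken from the paper.

```python
from statistics import mean, variance


def combine_estimates(estimates, variances):
    """Pool m completed-data analyses of one scalar quantity.

    Assumes Rubin's rules: `estimates` holds the point estimate from each
    completed data set, `variances` the matching squared standard errors.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # average within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate

    # Degrees of freedom for the t reference distribution; infinite when the
    # completed-data estimates do not vary at all.
    if b == 0:
        df = float("inf")
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2

    return {"estimate": q_bar, "within": u_bar, "between": b,
            "total": t, "df": df}


# Hypothetical results from m = 3 completed data sets.
pooled = combine_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled)
```

The between-imputation term is what carries the extra uncertainty due to the missing data: if every completed data set gave the same estimate, the total variance would reduce to the average within-imputation variance.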

Suggested Citation

Lee, Min Cherng & Mitra, Robin, 2016. "Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models," Computational Statistics & Data Analysis, Elsevier, vol. 95(C), pages 24-38.

Handle: RePEc:eee:csdana:v:95:y:2016:i:c:p:24-38
DOI: 10.1016/j.csda.2015.08.004

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Jiang, Depeng & Zhao, Puying & Tang, Niansheng, 2016. "A propensity score adjustment method for regression models with nonignorable missing covariates," Computational Statistics & Data Analysis, Elsevier, vol. 94(C), pages 98-119.
Jiang, Wei & Josse, Julie & Lavielle, Marc, 2020. "Logistic regression with missing covariates—Parameter estimation, model selection and prediction within a joint-modeling framework," Computational Statistics & Data Analysis, Elsevier, vol. 145(C).
Simon Grund & Oliver Lüdtke & Alexander Robitzsch, 2021. "On the Treatment of Missing Data in Background Questionnaires in Educational Large-Scale Assessments: An Evaluation of Different Procedures," Journal of Educational and Behavioral Statistics, vol. 46(4), pages 430-465, August.
Norah Alyabs & Sy Han Chiou, 2022. "The Missing Indicator Approach for Accelerated Failure Time Model with Covariates Subject to Limits of Detection," Stats, MDPI, vol. 5(2), pages 1-13, May.
Simon Grund & Oliver Lüdtke & Alexander Robitzsch, 2018. "Multiple Imputation of Missing Data at Level 2: A Comparison of Fully Conditional and Joint Modeling in Multilevel Designs," Journal of Educational and Behavioral Statistics, vol. 43(3), pages 316-353, June.
Hammon, Angelina & Zinn, Sabine, 2020. "Multiple imputation of binary multilevel missing not at random data," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 69(3), pages 547-564.
S. Eftekhari Mahabadi & M. Ganjali, 2012. "An index of local sensitivity to non-ignorability for parametric survival models with potential non-random missing covariate: an application to the SEER cancer registry data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(11), pages 2327-2348, July.
Lei Jin & Suojin Wang, 2010. "A Model Validation Procedure when Covariate Data are Missing at Random," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 37(3), pages 403-421, September.
Amy H. Herring & Joseph G. Ibrahim & Stuart R. Lipsitz, 2002. "Frailty Models with Missing Covariates," Biometrics, The International Biometric Society, vol. 58(1), pages 98-109, March.
Burns, Christopher & Prager, Daniel & Ghosh, Sujit & Goodwin, Barry, 2015. "Imputing for Missing Data in the ARMS Household Section: A Multivariate Imputation Approach," 2015 AAEA & WAEA Joint Annual Meeting, July 26-28, San Francisco, California 205291, Agricultural and Applied Economics Association.
Jared S. Murray & Jerome P. Reiter, 2016. "Multiple Imputation of Missing Categorical and Continuous Values via Bayesian Mixture Models With Local Dependence," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1466-1479, October.
Hongbin Zhang & Lang Wu, 2018. "A non‐linear model for censored and mismeasured time varying covariates in survival models, with applications in human immunodeficiency virus and acquired immune deficiency syndrome studies," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 67(5), pages 1437-1450, November.
Tian Li & Julian M. Somers & Xiaoqiong J. Hu & Lawrence C. McCandless, 2019. "Bayesian Sensitivity Analysis for Non-ignorable Missing Data in Longitudinal Studies," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 11(1), pages 184-205, April.
Ton Waal & Jacco Daalmans, 2024. "Calibrated imputation for multivariate categorical data," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 108(3), pages 545-576, September.
Chen, Xue-Dong & Fu, Ying-Zi, 2011. "Model selection for zero-inflated regression with missing covariates," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 765-773, January.
Hongtu Zhu & Joseph G. Ibrahim & Xiaoyan Shi, 2009. "Diagnostic Measures for Generalized Linear Models with Missing Covariates," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 36(4), pages 686-712, December.
Humera Razzak & Christian Heumann, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 33-58, December.
Ruiwen Zhou & Huiqiong Li & Jianguo Sun & Niansheng Tang, 2022. "A new approach to estimation of the proportional hazards model based on interval-censored data with missing covariates," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 28(3), pages 335-355, July.
Qingxia Chen & Joseph G. Ibrahim, 2006. "Semiparametric Models for Missing Covariate and Response Data in Regression Models," Biometrics, The International Biometric Society, vol. 62(1), pages 177-184, March.
Austin Menger & Md. Tuhin Sheikh & Ming-Hui Chen, 2024. "Bayesian Modeling of Survival Data in the Presence of Competing Risks with Cure Fractions and Masked Causes," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 86(1), pages 199-227, November.

More about this item

Keywords

Data augmentation; Latent variable; Missing data; Multiple imputation
