Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models

Authors:
  • Lee, Min Cherng
  • Mitra, Robin

Abstract

Multiple imputation is a commonly used approach to deal with missing values. In this approach, an imputer repeatedly imputes the missing values by taking draws from the posterior predictive distribution for the missing values conditional on the observed values, and releases these completed data sets to analysts. With each completed data set the analyst performs the analysis of interest, treating the data as if it were fully observed. These analyses are then combined with standard combining rules, allowing the analyst to make appropriate inferences which take into account the uncertainty present due to the missing data. In order to preserve the statistical properties present in the data, the imputer must use a plausible distribution to generate the imputed values. In data sets containing variables with different measurement scales, e.g. some categorical and some continuous variables, this is a challenging problem. A method is proposed to multiply impute missing values in such data sets by modelling the joint distribution of the variables in the data through a sequence of generalised linear models, and data augmentation methods are used to draw imputations from a proper posterior distribution using Markov Chain Monte Carlo (MCMC). The performance of the proposed method is illustrated using simulation studies and on a data set taken from a breast feeding study.
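The combining step described in the abstract can be sketched in a few lines. The sketch below assumes that "standard combining rules" refers to Rubin's rules for a scalar estimand; the function name `combine_estimates` and its inputs are hypothetical and are not taken from the paper. It pools the point estimates and variances obtained from each completed data set, adding the between-imputation variance to reflect the uncertainty due to the missing data.

```python
from statistics import mean, variance


def combine_estimates(estimates, variances):
    """Pool per-imputation results for one scalar quantity (Rubin's rules).

    estimates: point estimates, one per completed data set
    variances: the corresponding estimated variances (squared standard errors)
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    pooled_estimate = mean(estimates)   # average of the m point estimates
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # sample variance across imputations (m - 1 divisor)
    total = within + (1 + 1 / m) * between

    return {
        "estimate": pooled_estimate,
        "within": within,
        "between": between,
        "total_variance": total,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    pooled = combine_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(pooled)
```

The total variance is larger than the average within-imputation variance whenever the estimates differ across completed data sets, which is how the extra uncertainty from imputation reaches the analyst's inference.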

Suggested Citation

  • Lee, Min Cherng & Mitra, Robin, 2016. "Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models," Computational Statistics & Data Analysis, Elsevier, vol. 95(C), pages 24-38.
  • Handle: RePEc:eee:csdana:v:95:y:2016:i:c:p:24-38
    DOI: 10.1016/j.csda.2015.08.004

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947315001772
    Download Restriction: Full text for ScienceDirect subscribers only.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jiang, Depeng & Zhao, Puying & Tang, Niansheng, 2016. "A propensity score adjustment method for regression models with nonignorable missing covariates," Computational Statistics & Data Analysis, Elsevier, vol. 94(C), pages 98-119.
    2. Jiang, Wei & Josse, Julie & Lavielle, Marc, 2020. "Logistic regression with missing covariates—Parameter estimation, model selection and prediction within a joint-modeling framework," Computational Statistics & Data Analysis, Elsevier, vol. 145(C).
    3. Simon Grund & Oliver Lüdtke & Alexander Robitzsch, 2021. "On the Treatment of Missing Data in Background Questionnaires in Educational Large-Scale Assessments: An Evaluation of Different Procedures," Journal of Educational and Behavioral Statistics, vol. 46(4), pages 430-465, August.
    4. Norah Alyabs & Sy Han Chiou, 2022. "The Missing Indicator Approach for Accelerated Failure Time Model with Covariates Subject to Limits of Detection," Stats, MDPI, vol. 5(2), pages 1-13, May.
    5. Amy H. Herring & Joseph G. Ibrahim & Stuart R. Lipsitz, 2002. "Frailty Models with Missing Covariates," Biometrics, The International Biometric Society, vol. 58(1), pages 98-109, March.
    6. Hongbin Zhang & Lang Wu, 2018. "A non‐linear model for censored and mismeasured time varying covariates in survival models, with applications in human immunodeficiency virus and acquired immune deficiency syndrome studies," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 67(5), pages 1437-1450, November.
    7. Humera Razzak & Christian Heumann, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 33-58, December.
    8. Qingxia Chen & Joseph G. Ibrahim, 2006. "Semiparametric Models for Missing Covariate and Response Data in Regression Models," Biometrics, The International Biometric Society, vol. 62(1), pages 177-184, March.
    9. Chen, Qingxia & Ibrahim, Joseph G. & Chen, Ming-Hui & Senchaudhuri, Pralay, 2008. "Theory and inference for regression models with missing responses and covariates," Journal of Multivariate Analysis, Elsevier, vol. 99(6), pages 1302-1331, July.
    10. Göran Kauermann & Mehboob Ali, 2021. "Semi-parametric regression when some (expensive) covariates are missing by design," Statistical Papers, Springer, vol. 62(4), pages 1675-1696, August.
    11. Faisal Maqbool Zahid & Shahla Faisal & Christian Heumann, 2020. "Variable selection techniques after multiple imputation in high-dimensional data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 29(3), pages 553-580, September.
    12. Fang, Fang & Shao, Jun, 2016. "Iterated imputation estimation for generalized linear models with missing response and covariate values," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 111-123.
    13. Richard M. Golden & Steven S. Henley & Halbert White & T. Michael Kashner, 2019. "Consequences of Model Misspecification for Maximum Likelihood Estimation with Missing Data," Econometrics, MDPI, vol. 7(3), pages 1-27, September.
    14. Nanhua Zhang & Roderick J. Little, 2012. "A Pseudo-Bayesian Shrinkage Approach to Regression with Missing Covariates," Biometrics, The International Biometric Society, vol. 68(3), pages 933-942, September.
    15. Ming‐Hui Chen & Joseph G. Ibrahim, 2001. "Maximum Likelihood Methods for Cure Rate Models with Missing Covariates," Biometrics, The International Biometric Society, vol. 57(1), pages 43-52, March.
    16. Simon Grund & Oliver Lüdtke & Alexander Robitzsch, 2018. "Multiple Imputation of Missing Data at Level 2: A Comparison of Fully Conditional and Joint Modeling in Multilevel Designs," Journal of Educational and Behavioral Statistics, vol. 43(3), pages 316-353, June.
    17. Hammon, Angelina & Zinn, Sabine, 2020. "Multiple imputation of binary multilevel missing not at random data," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 69(3), pages 547-564.
    18. S. Eftekhari Mahabadi & M. Ganjali, 2012. "An index of local sensitivity to non-ignorability for parametric survival models with potential non-random missing covariate: an application to the SEER cancer registry data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(11), pages 2327-2348, July.
    19. Lei Jin & Suojin Wang, 2010. "A Model Validation Procedure when Covariate Data are Missing at Random," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 37(3), pages 403-421, September.
    20. Burns, Christopher & Prager, Daniel & Ghosh, Sujit & Goodwin, Barry, 2015. "Imputing for Missing Data in the ARMS Household Section: A Multivariate Imputation Approach," 2015 AAEA & WAEA Joint Annual Meeting, July 26-28, San Francisco, California 205291, Agricultural and Applied Economics Association.
