
Multiple imputation using multivariate gh transformations

Author

Listed:
  • Yulei He
  • Trivellore E. Raghunathan

Abstract

Multiple imputation has emerged as a popular approach to handling data sets with missing values. For incomplete continuous variables, imputations are usually produced using multivariate normal models. However, this approach can be problematic for variables with a strongly non-normal shape, because it generates imputations that are incoherent with the actual distributions and can thus lead to incorrect inferences. For non-normal data, we consider a multivariate extension of Tukey's gh distribution/transformation [38] to accommodate skewness and/or kurtosis and to capture the correlation among the variables. We propose an algorithm to fit the model to the incomplete data and generate imputations. We apply the method to a national data set for hospital performance on several standard quality measures, which are highly skewed to the left and substantially correlated with each other. We use Monte Carlo studies to assess the performance of the proposed approach. We discuss possible generalizations and give some advice to practitioners on how to handle non-normal incomplete data.
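
For readers unfamiliar with the transformation named in the abstract, the following is a minimal sketch of the standard univariate Tukey g-and-h transformation, Y = loc + scale * ((exp(gZ) - 1)/g) * exp(hZ^2/2) with Z standard normal, applied column by column to correlated normal draws. The function names, the parameter values, and the choice to build the multivariate version from a latent correlated normal vector are assumptions made for illustration; the abstract does not spell out the authors' exact multivariate construction or their fitting algorithm, and this sketch only simulates data rather than imputing it.

```python
import numpy as np

def gh_transform(z, g=0.0, h=0.0, loc=0.0, scale=1.0):
    """Tukey's g-and-h transformation of standard normal values z.

    g controls skewness (negative g skews left), h >= 0 controls tail weight.
    """
    z = np.asarray(z, dtype=float)
    if abs(g) < 1e-12:
        skew_part = z                      # limit of (exp(g z) - 1) / g as g -> 0
    else:
        skew_part = np.expm1(g * z) / g
    return loc + scale * skew_part * np.exp(h * z**2 / 2.0)

def sample_multivariate_gh(n, corr, g, h, seed=0):
    """Hypothetical helper: draw correlated normals, then transform each column.

    This is one simple way to obtain skewed, correlated variables; it is an
    assumption for illustration, not necessarily the model used in the paper.
    """
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    p = corr.shape[0]
    z = rng.multivariate_normal(np.zeros(p), corr, size=n)
    cols = [gh_transform(z[:, j], g[j], h[j]) for j in range(p)]
    return np.column_stack(cols)

# Two left-skewed, positively correlated variables (illustrative values only).
x = sample_multivariate_gh(5000, [[1.0, 0.6], [0.6, 1.0]],
                           g=[-0.5, -0.3], h=[0.0, 0.05])
```

With g = h = 0 the transformation is the identity, so the normal model is recovered as a special case; negative g produces the kind of left skew the abstract describes for the hospital quality measures.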

Suggested Citation

  • Yulei He & Trivellore E. Raghunathan, 2012. "Multiple imputation using multivariate gh transformations," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(10), pages 2177-2198, June.
  • Handle: RePEc:taf:japsta:v:39:y:2012:i:10:p:2177-2198
    DOI: 10.1080/02664763.2012.702268

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/02664763.2012.702268
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Kobi Abayomi & Andrew Gelman & Marc Levy, 2008. "Diagnostics for multivariate imputations," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 57(3), pages 273-291, June.
    2. Velilla, Santiago, 1993. "A note on the multivariate Box--Cox transformation to normality," Statistics & Probability Letters, Elsevier, vol. 17(4), pages 259-263, July.
    3. He, Yulei & Raghunathan, Trivellore E., 2006. "Tukey's gh Distribution for Multiple Imputation," The American Statistician, American Statistical Association, vol. 60, pages 251-256, August.
    4. Hakan Demirtas & Donald Hedeker, 2008. "Imputing continuous data under some non‐Gaussian distributions," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 62(2), pages 193-205, May.
    5. Andrew Gelman & Iven Van Mechelen & Geert Verbeke & Daniel F. Heitjan & Michel Meulders, 2005. "Multiple Imputation for Model Checking: Completed-Data Plots with Missing and Latent Data," Biometrics, The International Biometric Society, vol. 61(1), pages 74-85, March.

    Citations

    Cited by:

    1. Xu, Ganggang & Genton, Marc G., 2015. "Efficient maximum approximated likelihood inference for Tukey’s g-and-h distribution," Computational Statistics & Data Analysis, Elsevier, vol. 91(C), pages 78-91.
    2. Paul T. von Hippel, 2013. "Should a Normal Imputation Model be Modified to Impute Skewed Variables?," Sociological Methods & Research, vol. 42(1), pages 105-138, February.
    3. Zhixin Lun & Ravindra Khattree, 2021. "Imputation for Skewed Data: Multivariate Lomax Case," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(1), pages 86-113, May.
    4. Ganggang Xu & Marc G. Genton, 2017. "Tukey g-and-h Random Fields," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(519), pages 1236-1249, July.
    5. Marco Geraci & Alexander McLain, 2018. "Multiple Imputation for Bounded Variables," Psychometrika, Springer;The Psychometric Society, vol. 83(4), pages 919-940, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marco Geraci & Alexander McLain, 2018. "Multiple Imputation for Bounded Variables," Psychometrika, Springer;The Psychometric Society, vol. 83(4), pages 919-940, December.
    2. Yang Zhao, 2022. "Diagnostic checking of multiple imputation models," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 106(2), pages 271-286, June.
    3. Paul T. von Hippel, 2013. "Should a Normal Imputation Model be Modified to Impute Skewed Variables?," Sociological Methods & Research, vol. 42(1), pages 105-138, February.
    4. Zhu, Xuwen & Melnykov, Volodymyr, 2018. "Manly transformation in finite mixture modeling," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 190-208.
    5. Zhongqi Liang & Qihua Wang & Yuting Wei, 2022. "Robust model selection with covariables missing at random," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(3), pages 539-557, June.
    6. Gerko Vink & Laurence E. Frank & Jeroen Pannekoek & Stef Buuren, 2014. "Predictive mean matching imputation of semicontinuous variables," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 68(1), pages 61-90, February.
    7. Jung, Hyekyung & Schafer, Joseph L. & Seo, Byungtae, 2011. "A latent class selection model for nonignorably missing data," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 802-812, January.
    8. Meulders, Michel, 2013. "An R Package for Probabilistic Latent Feature Analysis of Two-Way Two-Mode Frequencies," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 54(i14).
    9. Yajuan Si & Jerome P. Reiter, 2013. "Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys," Journal of Educational and Behavioral Statistics, vol. 38(5), pages 499-521, October.
    10. Yana Melnykov & Xuwen Zhu & Volodymyr Melnykov, 2021. "Transformation mixture modeling for skewed data groups with heavy tails and scatter," Computational Statistics, Springer, vol. 36(1), pages 61-78, March.
    11. Kobi Abayomi & Gonzalo Pizarro, 2013. "Monitoring Human Development Goals: A Straightforward (Bayesian) Methodology for Cross-National Indices," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 110(2), pages 489-515, January.
    12. Michael Wolfson & Geoff Rowe, 2014. "HealthPaths: Using functional health trajectories to quantify the relative importance of selected health determinants," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 31(31), pages 941-974.
    13. Roman Matkovskyy, 2016. "A comparison of pre- and post-crisis efficiency of OECD countries: evidence from a model with temporal heterogeneity in time and unobservable individual effect," European Journal of Comparative Economics, Cattaneo University (LIUC), vol. 13(2), pages 135-167, December.
    14. Kilic, Talip & Yacoubou Djima, Ismael & Carletto, Calogero, 2017. "Mission impossible? Exploring the promise of multiple imputation for predicting missing GPS-based land area measures in household surveys," Policy Research Working Paper Series 8138, The World Bank.
    15. Zhixin Lun & Ravindra Khattree, 2021. "Imputation for Skewed Data: Multivariate Lomax Case," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(1), pages 86-113, May.
    16. Roberto Gismondi, 2009. "Optimal Provisional Estimation in Short-term Surveys," Rivista di statistica ufficiale, ISTAT - Italian National Institute of Statistics - (Rome, ITALY), vol. 11(2-3), pages 5-34, January.
    17. Michael J. Daniels & Arkendu S. Chatterjee & Chenguang Wang, 2012. "Bayesian Model Selection for Incomplete Data Using the Posterior Predictive Distribution," Biometrics, The International Biometric Society, vol. 68(4), pages 1055-1063, December.
    18. Lee, Min Cherng & Mitra, Robin, 2016. "Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models," Computational Statistics & Data Analysis, Elsevier, vol. 95(C), pages 24-38.
    19. Roland Weigand, 2014. "Matrix Box-Cox Models for Multivariate Realized Volatility," Working Papers 144, Bavarian Graduate Program in Economics (BGPE).
    20. Caterina Giusti, 2009. "Multiple Imputation of Missing Income Data in the Survey on Income and Living Conditions," Rivista di statistica ufficiale, ISTAT - Italian National Institute of Statistics - (Rome, ITALY), vol. 11(2-3), pages 63-80, January.
