
Bias Introduced by Rounding in Multiple Imputation for Ordered Categorical Variables

Author

Listed:
  • Yan Xia
  • Yanyun Yang

Abstract

Multivariate normality is frequently assumed when multiple imputation is applied to missing data. When data are ordered categorical, fully normal imputation produces implausible values that fall outside the set of categorical values. Naïve rounding, which rounds each imputed value to its neighboring category before further analysis, has been suggested as a remedy. Previous studies showed that, for binary data, the rounded values can yield biased mean estimates when the population distribution is asymmetric. However, it has been conjectured that the bias decreases as the number of categories increases. To investigate this conjecture, the present study derives formulas for the biases of the mean and standard deviation of ordered categorical variables under naïve rounding. Results show that the biases of both the mean and the standard deviation decrease as the number of categories increases from 3 to 9. The study also finds that although symmetric population distributions lead to unbiased means of the rounded values, the standard deviations may still be substantially biased. A simulation study further shows that the biases due to naïve rounding can result in very low coverage rates for the population mean parameter.
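
The mechanism described in the abstract can be illustrated with a small sketch. This is not the authors' derivation: it assumes an idealized setting in which every value is imputed from a normal distribution whose mean and standard deviation equal those of the categorical variable, ignoring covariates and parameter uncertainty. The function names `normal_cdf` and `rounding_bias` are hypothetical, chosen only for this example, and the sketch assumes the variable is not degenerate (its standard deviation is positive).

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) at x, using only the standard library."""
    if x == math.inf:
        return 1.0
    if x == -math.inf:
        return 0.0
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def rounding_bias(probs):
    """Bias of the mean and SD after naive rounding of normal imputations.

    probs[i] is the population probability of category i + 1 (categories 1..K).
    Assumption (illustrative only): every value is imputed from a normal
    distribution whose mean and SD equal those of the categorical variable.
    """
    k_max = len(probs)
    cats = range(1, k_max + 1)
    mu = sum(k * p for k, p in zip(cats, probs))
    sigma = math.sqrt(sum((k - mu) ** 2 * p for k, p in zip(cats, probs)))

    # Cut points halfway between neighbouring categories; draws below 1 or
    # above K are pulled back to the nearest end category.
    cuts = [-math.inf] + [k + 0.5 for k in range(1, k_max)] + [math.inf]
    rounded = [
        normal_cdf(cuts[i + 1], mu, sigma) - normal_cdf(cuts[i], mu, sigma)
        for i in range(k_max)
    ]

    mean_r = sum(k * q for k, q in zip(cats, rounded))
    sd_r = math.sqrt(sum((k - mean_r) ** 2 * q for k, q in zip(cats, rounded)))
    return mean_r - mu, sd_r - sigma


if __name__ == "__main__":
    symmetric = [0.25, 0.50, 0.25]
    skewed = [0.80, 0.20]
    for name, probs in [("symmetric, 3 categories", symmetric),
                        ("skewed, 2 categories", skewed)]:
        bias_mean, bias_sd = rounding_bias(probs)
        print(f"{name}: bias(mean) = {bias_mean:+.4f}, bias(SD) = {bias_sd:+.4f}")
```

Under these assumptions, the symmetric three-category case gives a mean bias of zero while the standard deviation is still biased, and the skewed binary case gives a nonzero mean bias. Both outcomes are in line with the pattern the abstract reports, though the article's own formulas may rest on different assumptions.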

Suggested Citation

  • Yan Xia & Yanyun Yang, 2016. "Bias Introduced by Rounding in Multiple Imputation for Ordered Categorical Variables," The American Statistician, Taylor & Francis Journals, vol. 70(4), pages 358-364, October.
  • Handle: RePEc:taf:amstat:v:70:y:2016:i:4:p:358-364
    DOI: 10.1080/00031305.2016.1200486

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/00031305.2016.1200486
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Yucel, Recai M. & He, Yulei & Zaslavsky, Alan M., 2008. "Using Calibration to Improve Rounding in Imputation," The American Statistician, American Statistical Association, vol. 62, pages 125-129, May.
    2. Horton N.J. & Lipsitz S.R. & Parzen M., 2003. "A Potential for Bias When Rounding in Multiple Imputation," The American Statistician, American Statistical Association, vol. 57, pages 229-232, November.

    Citations

    Cited by:

    1. Zachary K. Collier & Minji Kong & Olushola Soyoye & Kamal Chawla & Ann M. Aviles & Yasser Payne, 2024. "Deep Learning Imputation for Asymmetric and Incomplete Likert-Type Items," Journal of Educational and Behavioral Statistics, vol. 49(2), pages 241-267, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Paul T. von Hippel, 2013. "Should a Normal Imputation Model be Modified to Impute Skewed Variables?," Sociological Methods & Research, vol. 42(1), pages 105-138, February.
    2. Matthew Desmond & Tracey Shollenberger, 2015. "Forced Displacement From Rental Housing: Prevalence and Neighborhood Consequences," Demography, Springer;Population Association of America (PAA), vol. 52(5), pages 1751-1772, October.
    3. Yajuan Si & Jerome P. Reiter, 2013. "Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys," Journal of Educational and Behavioral Statistics, vol. 38(5), pages 499-521, October.
    4. Davide Vidotto & Jeroen K. Vermunt & Katrijn van Deun, 2018. "Bayesian Multilevel Latent Class Models for the Multiple Imputation of Nested Categorical Data," Journal of Educational and Behavioral Statistics, vol. 43(5), pages 511-539, October.
    5. Stuart R. Lipsitz & Garrett M. Fitzmaurice & Roger D. Weiss, 2020. "Using Multiple Imputation with GEE with Non-monotone Missing Longitudinal Binary Outcomes," Psychometrika, Springer;The Psychometric Society, vol. 85(4), pages 890-904, December.
    6. Kajal Lahiri & Zulkarnain Pulungan, 2006. "Health Inequality and Its Determinants in New York," Discussion Papers 06-03, University at Albany, SUNY, Department of Economics.
    7. Chae, David H. & Lincoln, Karen D. & Adler, Nancy E. & Syme, S. Leonard, 2010. "Do experiences of racial discrimination predict cardiovascular disease among African American men? The moderating role of internalized negative racial group attitudes," Social Science & Medicine, Elsevier, vol. 71(6), pages 1182-1188, September.
    8. Lahiri, Kajal & Pulungan, Zulkarnain, 2007. "Income-related health disparity and its determinants in New York state: racial/ethnic and geographical comparisons," MPRA Paper 21694, University Library of Munich, Germany.
    9. Humera Razzak & Christian Heumann, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 33-58, December.
    10. Razzak Humera & Heumann Christian, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Statistics Poland, vol. 20(4), pages 33-58, December.
    11. Darrick Yee & Andrew Ho, 2015. "Discreteness Causes Bias in Percentage-Based Comparisons: A Case Study From Educational Testing," The American Statistician, Taylor & Francis Journals, vol. 69(3), pages 174-181, August.
    12. Jörg Drechsler, 2015. "Multiple Imputation of Multilevel Missing Data—Rigor Versus Simplicity," Journal of Educational and Behavioral Statistics, vol. 40(1), pages 69-95, February.
    13. G. Inan & R. Yucel, 2017. "Joint GEEs for multivariate correlated data with incomplete binary outcomes," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(11), pages 1920-1937, August.
    14. Kristian Kleinke & Jost Reinecke & Cornelia Weins, 2021. "The development of delinquency during adolescence: a comparison of missing data techniques revisited," Quality & Quantity: International Journal of Methodology, Springer, vol. 55(3), pages 877-895, June.
    15. Xu, Wan & Khachatryan, Hayk, 2014. "Multiple Imputation in the Complex National Nursery Survey Data by Fully Conditional Specification," 2014 Annual Meeting, July 27-29, 2014, Minneapolis, Minnesota 170208, Agricultural and Applied Economics Association.
    16. Yucel, Recai M. & Demirtas, Hakan, 2010. "Impact of non-normal random effects on inference by multiple imputation: A simulation assessment," Computational Statistics & Data Analysis, Elsevier, vol. 54(3), pages 790-801, March.
    17. Yeh Jason Jia-Hsing, 2009. "Missing (Completely?) At Random: Lessons from Insurance Studies," Asia-Pacific Journal of Risk and Insurance, De Gruyter, vol. 3(2), pages 1-13, April.
    18. R Florez-Lopez, 2010. "Effects of missing data in credit risk scoring. A comparative analysis of methods to achieve robustness in the absence of sufficient data," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 61(3), pages 486-501, March.
    19. Celeste Combrinck & Vanessa Scherman & David Maree & Sarah Howie, 2018. "Multiple Imputation for Dichotomous MNAR Items Using Recursive Structural Equation Modeling With Rasch Measures as Predictors," SAGE Open, vol. 8(1), pages 21582440187, February.
    20. Carsten Kuchler & Martin Spiess, 2009. "The data quality concept of accuracy in the context of publicly shared data sets," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 3(1), pages 67-80, June.
