IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v55y2011i10p2793-2806.html

Iterative stepwise regression imputation using standard and robust methods

Authors
  • Templ, Matthias
  • Kowarik, Alexander
  • Filzmoser, Peter

Abstract

Imputation of missing values is one of the major tasks of data pre-processing in many areas. When data from official statistics are imputed, several additional challenges almost always arise, such as large data sets, data sets consisting of a mixture of different variable types, and outliers. The aim is to propose an automatic algorithm called IRMI for iterative model-based imputation using robust methods, which accounts for these challenges, and to provide a software tool in R. This algorithm is compared to the algorithm IVEWARE, which is the "recommended software" for imputations in international and national statistical institutions. Using artificial data and real data sets from official statistics and other fields, the advantages of IRMI over IVEWARE, especially with respect to robustness, are demonstrated.
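
The general idea of iterative regression imputation named in the abstract can be illustrated with a minimal sketch. This is not the authors' IRMI implementation: it assumes purely numeric data, uses ordinary least squares where IRMI would use robust regression, and the function name and the parameters `max_iter` and `tol` are hypothetical choices made for this illustration only.

```python
import numpy as np

def iterative_regression_impute(X, max_iter=50, tol=1e-6):
    """Generic iterative regression imputation for a numeric array with NaNs.

    Illustrative sketch only; not the IRMI algorithm from the paper.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Initialise each missing cell with the median of its column's observed values.
    col_medians = np.nanmedian(X, axis=0)
    filled[missing] = np.take(col_medians, np.where(missing)[1])

    for _ in range(max_iter):
        previous = filled.copy()
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue  # nothing to impute in this column
            # Use all other columns (with current imputations) as predictors.
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(others)), others])
            # Fit on rows where column j was observed. Plain least squares is
            # an assumption here; a robust fit could be substituted.
            coef, *_ = np.linalg.lstsq(design[~rows], filled[~rows, j], rcond=None)
            # Replace the missing cells of column j with the model predictions.
            filled[rows, j] = design[rows] @ coef
        # Stop once the imputed values no longer change between sweeps.
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled
```

Observed values are never modified; only cells that were missing in the input are updated on each sweep. Because least squares is sensitive to outliers, this sketch does not show the robustness advantage the abstract reports for IRMI.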

Suggested Citation

  • Templ, Matthias & Kowarik, Alexander & Filzmoser, Peter, 2011. "Iterative stepwise regression imputation using standard and robust methods," Computational Statistics & Data Analysis, Elsevier, vol. 55(10), pages 2793-2806, October.
  • Handle: RePEc:eee:csdana:v:55:y:2011:i:10:p:2793-2806

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947311001411
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. White, Ian R. & Daniel, Rhian & Royston, Patrick, 2010. "Avoiding bias due to perfect prediction in multiple imputation of incomplete categorical variables," Computational Statistics & Data Analysis, Elsevier, vol. 54(10), pages 2267-2275, October.
    2. Jonathan Fisher, 2006. "Income Imputation and the Analysis of Expenditure Data in the Consumer Expenditure Survey," Working Papers 394, U.S. Bureau of Labor Statistics.
    3. Hron, K. & Templ, M. & Filzmoser, P., 2010. "Imputation of missing values for compositional data using classical and robust methods," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3095-3107, December.
    4. Cantoni E. & Ronchetti E., 2001. "Robust Inference for Generalized Linear Models," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1022-1030, September.
    5. Serneels, Sven & Verdonck, Tim, 2008. "Principal component analysis for data containing outliers and missing elements," Computational Statistics & Data Analysis, Elsevier, vol. 52(3), pages 1712-1727, January.

    Citations

    Cited by:

    1. Morehart, Mitch & Milkove, Dan & Xu, Yang, 2014. "Multivariate Farm Debt Imputation in the Agricultural Resource Management Survey (ARMS)," 2014 Annual Meeting, July 27-29, 2014, Minneapolis, Minnesota 169401, Agricultural and Applied Economics Association.
    2. Nikola Štefelová & Andreas Alfons & Javier Palarea-Albaladejo & Peter Filzmoser & Karel Hron, 2021. "Robust regression with compositional covariates including cellwise outliers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(4), pages 869-909, December.
    3. Hapfelmeier, A. & Hothorn, T. & Ulm, K., 2012. "Recursive partitioning on incomplete data using surrogate decisions and multiple imputation," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1552-1565.
    4. Pavlo Mozharovskyi & Julie Josse & François Husson, 2017. "Nonparametric imputation by data depth," Working Papers 2017-72, Center for Research in Economics and Statistics.
    5. Matthias Templ, 2023. "Enhancing Precision in Large-Scale Data Analysis: An Innovative Robust Imputation Algorithm for Managing Outliers and Missing Values," Mathematics, MDPI, vol. 11(12), pages 1-22, June.
    6. Robbins Michael W., 2014. "The Utility of Nonparametric Transformations for Imputation of Survey Data," Journal of Official Statistics, Sciendo, vol. 30(4), pages 1-26, December.
    7. Frahm, Gabriel & Nordhausen, Klaus & Oja, Hannu, 2020. "M-estimation with incomplete and dependent multivariate data," Journal of Multivariate Analysis, Elsevier, vol. 176(C).
    8. Kowarik, Alexander & Templ, Matthias, 2016. "Imputation with the R Package VIM," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i07).
    9. Arthur Stepchenko & Jurij Chizhov & Ludmila Aleksejeva, 2018. "Transfer of the data preprocessing parameters and forecasting models," Journal of Advances in Technology and Engineering Research, A/Professor Akbar A. Khatibi, vol. 4(6), pages 214-221.
    10. Alfons, Andreas & Templ, Matthias, 2013. "Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 54(i15).
    11. Gerko Vink & Laurence E. Frank & Jeroen Pannekoek & Stef Buuren, 2014. "Predictive mean matching imputation of semicontinuous variables," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 68(1), pages 61-90, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Frahm, Gabriel & Nordhausen, Klaus & Oja, Hannu, 2020. "M-estimation with incomplete and dependent multivariate data," Journal of Multivariate Analysis, Elsevier, vol. 176(C).
    2. Bianco, Ana M. & Martínez, Elena, 2009. "Robust testing in the logistic regression model," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4095-4105, October.
    3. Florence Bouvet & Chong-Uk Kim, 2014. "Are US imports really hurting US households?: an analysis of the relationship between US households' consumption and US imports," Global Business and Economics Review, Inderscience Enterprises Ltd, vol. 16(2), pages 157-178.
    4. Doidge, James C & Higgins, Daryl J & Delfabbro, Paul & Edwards, Ben & Vassallo, Suzanne & Toumbourou, John W & Segal, Leonie, 2017. "Economic predictors of child maltreatment in an Australian population-based birth cohort," Children and Youth Services Review, Elsevier, vol. 72(C), pages 14-25.
    5. Hill, Jonathan B. & Prokhorov, Artem, 2016. "GEL estimation for heavy-tailed GARCH models with robust empirical likelihood inference," Journal of Econometrics, Elsevier, vol. 190(1), pages 18-45.
    6. Ferrari, Pier Alda & Annoni, Paola & Barbiero, Alessandro & Manzi, Giancarlo, 2011. "An imputation method for categorical variables with application to nonlinear principal component analysis," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2410-2420, July.
    7. Fiaschi, Davide & Giuliani, Elisa & Nieri, Federica & Salvati, Nicola, 2020. "How bad is your company? Measuring corporate wrongdoing beyond the magic of ESG metrics," Business Horizons, Elsevier, vol. 63(3), pages 287-299.
    8. Ricardo A. Maronna & Victor J. Yohai, 2021. "Optimal robust estimators for families of distributions on the integers," Statistical Papers, Springer, vol. 62(5), pages 2269-2281, October.
    9. García-Escudero, L.A. & Gordaliza, A. & Mayo-Iscar, A. & San Martín, R., 2010. "Robust clusterwise linear regression through trimming," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3057-3069, December.
    10. Krichene, H. & Geiger, T. & Frieler, K. & Willner, S.N. & Sauer, I. & Otto, C., 2021. "Long-term impacts of tropical cyclones and fluvial floods on economic growth – Empirical evidence on transmission channels at different levels of development," World Development, Elsevier, vol. 144(C).
    11. Cantoni, Eva & Ronchetti, Elvezio, 2006. "A robust approach for skewed and heavy-tailed outcomes in the analysis of health care expenditures," Journal of Health Economics, Elsevier, vol. 25(2), pages 198-213, March.
    12. Tutz, Gerhard & Ramzan, Shahla, 2015. "Improved methods for the imputation of missing data by nearest neighbor methods," Computational Statistics & Data Analysis, Elsevier, vol. 90(C), pages 84-99.
    13. Bruce D. Meyer & Wallace K. C. Mok & James X. Sullivan, 2009. "The Under-Reporting of Transfers in Household Surveys: Its Nature and Consequences," NBER Working Papers 15181, National Bureau of Economic Research, Inc.
    14. Takahiro Yoshida & Morito Tsutsumi, 2018. "On the effects of spatial relationships in spatial compositional multivariate models," Letters in Spatial and Resource Sciences, Springer, vol. 11(1), pages 57-70, March.
    15. Ahmad R. Alsaber & Jiazhu Pan & Adeeba Al-Hurban, 2021. "Handling Complex Missing Data Using Random Forest Approach for an Air Quality Monitoring Dataset: A Case Study of Kuwait Environmental Data (2012 to 2018)," IJERPH, MDPI, vol. 18(3), pages 1-25, February.
    16. Stoklosa, Jakub & Huggins, Richard M., 2012. "A robust P-spline approach to closed population capture–recapture models with time dependence and heterogeneity," Computational Statistics & Data Analysis, Elsevier, vol. 56(2), pages 408-417.
    17. Maria Anna Di Palma & Michele Gallo, 2019. "External Information Model in a Compositional Perspective: Evaluation of Campania Adolescents’ Preferences in the Allocation of Leisure-Time," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 146(1), pages 117-133, November.
    18. Nengsih Titin Agustin & Bertrand Frédéric & Maumy-Bertrand Myriam & Meyer Nicolas, 2019. "Determining the number of components in PLS regression on incomplete data set," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(6), pages 1-28, December.
    19. Zhang, Yuexia & Qin, Guoyou & Zhu, Zhongyi & Xu, Wanghong, 2019. "A novel robust approach for analysis of longitudinal data," Computational Statistics & Data Analysis, Elsevier, vol. 138(C), pages 83-95.
    20. Ghosh, Abhik & Mandal, Abhijit & Martín, Nirian & Pardo, Leandro, 2016. "Influence analysis of robust Wald-type tests," Journal of Multivariate Analysis, Elsevier, vol. 147(C), pages 102-126.

    More about this item

    Keywords

    Regression imputation; Robustness; R
