IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v55y2011i10p2793-2806.html

Iterative stepwise regression imputation using standard and robust methods

Author

Listed:
  • Templ, Matthias
  • Kowarik, Alexander
  • Filzmoser, Peter

Abstract

Imputation of missing values is one of the major tasks in data pre-processing in many areas. When data from official statistics are imputed, several additional challenges almost always arise, such as large data sets, data sets consisting of a mixture of different variable types, or outliers in the data. The aim is to propose an automatic algorithm called IRMI for iterative model-based imputation using robust methods, accounting for the mentioned challenges, and to provide a software tool in R. This algorithm is compared to the algorithm IVEWARE, which is the "recommended software" for imputations in international and national statistical institutions. Using artificial data and real data sets from official statistics and other fields, the advantages of IRMI over IVEWARE, especially with respect to robustness, are demonstrated.
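
The general idea of iterative regression imputation can be illustrated with a minimal sketch. This is not the authors' IRMI implementation and not their R tool: it is written in Python with NumPy, handles only continuous variables, and uses ordinary least squares where a robust approach would substitute a robust regression estimator. The function name `iterative_regression_impute` and the parameters `max_iter` and `tol` are hypothetical and chosen for illustration only.

```python
import numpy as np


def iterative_regression_impute(X, max_iter=50, tol=1e-6):
    """Fill NaN cells by repeatedly regressing each variable on the others.

    Illustrative sketch only: plain least squares, continuous variables.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Initialise every missing cell with the median of its column.
    col_medians = np.nanmedian(X, axis=0)
    filled[missing] = np.take(col_medians, np.where(missing)[1])

    for _ in range(max_iter):
        previous = filled.copy()
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue  # nothing to impute in this variable
            # Use all other variables (with current imputations) as predictors.
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(filled)), others])
            # Fit on rows where variable j was observed.
            coef, *_ = np.linalg.lstsq(design[~rows], filled[~rows, j], rcond=None)
            # Update the missing cells with the fitted values.
            filled[rows, j] = design[rows] @ coef
        # Stop once the imputed values no longer change.
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled


# Toy data in which the second column equals 2 * first column + 1.
demo = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, 7.0], [4.0, np.nan]])
result = iterative_regression_impute(demo)
```

Because least squares is sensitive to outliers, a few extreme observations in the fitting rows can distort every imputed value, which is the kind of problem the abstract's emphasis on robust methods is aimed at.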

Suggested Citation

  • Templ, Matthias & Kowarik, Alexander & Filzmoser, Peter, 2011. "Iterative stepwise regression imputation using standard and robust methods," Computational Statistics & Data Analysis, Elsevier, vol. 55(10), pages 2793-2806, October.
  • Handle: RePEc:eee:csdana:v:55:y:2011:i:10:p:2793-2806

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947311001411
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. White, Ian R. & Daniel, Rhian & Royston, Patrick, 2010. "Avoiding bias due to perfect prediction in multiple imputation of incomplete categorical variables," Computational Statistics & Data Analysis, Elsevier, vol. 54(10), pages 2267-2275, October.
    2. Cantoni E. & Ronchetti E., 2001. "Robust Inference for Generalized Linear Models," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1022-1030, September.
    3. Jonathan Fisher, 2006. "Income Imputation and the Analysis of Expenditure Data in the Consumer Expenditure Survey," Working Papers 394, U.S. Bureau of Labor Statistics.
    4. Hron, K. & Templ, M. & Filzmoser, P., 2010. "Imputation of missing values for compositional data using classical and robust methods," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3095-3107, December.
    5. Serneels, Sven & Verdonck, Tim, 2008. "Principal component analysis for data containing outliers and missing elements," Computational Statistics & Data Analysis, Elsevier, vol. 52(3), pages 1712-1727, January.

    Citations

    Cited by:

    1. Morehart, Mitch & Milkove, Dan & Xu, Yang, 2014. "Multivariate Farm Debt Imputation in the Agricultural Resource Management Survey (ARMS)," 2014 Annual Meeting, July 27-29, 2014, Minneapolis, Minnesota 169401, Agricultural and Applied Economics Association.
    2. Gerko Vink & Laurence E. Frank & Jeroen Pannekoek & Stef Buuren, 2014. "Predictive mean matching imputation of semicontinuous variables," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 68(1), pages 61-90, February.
    3. Nikola Štefelová & Andreas Alfons & Javier Palarea-Albaladejo & Peter Filzmoser & Karel Hron, 2021. "Robust regression with compositional covariates including cellwise outliers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(4), pages 869-909, December.
    4. Hapfelmeier, A. & Hothorn, T. & Ulm, K., 2012. "Recursive partitioning on incomplete data using surrogate decisions and multiple imputation," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1552-1565.
    5. Pavlo Mozharovskyi & Julie Josse & François Husson, 2017. "Nonparametric imputation by data depth," Working Papers 2017-72, Center for Research in Economics and Statistics.
    6. Matthias Templ, 2023. "Enhancing Precision in Large-Scale Data Analysis: An Innovative Robust Imputation Algorithm for Managing Outliers and Missing Values," Mathematics, MDPI, vol. 11(12), pages 1-22, June.
    7. Robbins Michael W., 2014. "The Utility of Nonparametric Transformations for Imputation of Survey Data," Journal of Official Statistics, Sciendo, vol. 30(4), pages 675-700, December.
    8. Frahm, Gabriel & Nordhausen, Klaus & Oja, Hannu, 2020. "M-estimation with incomplete and dependent multivariate data," Journal of Multivariate Analysis, Elsevier, vol. 176(C).
    9. Kowarik, Alexander & Templ, Matthias, 2016. "Imputation with the R Package VIM," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i07).
    10. Arthur Stepchenko & Jurij Chizhov & Ludmila Aleksejeva, 2018. "Transfer of the data preprocessing parameters and forecasting models," Journal of Advances in Technology and Engineering Research, A/Professor Akbar A. Khatibi, vol. 4(6), pages 214-221.
    11. Alfons, Andreas & Templ, Matthias, 2013. "Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 54(i15).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Frahm, Gabriel & Nordhausen, Klaus & Oja, Hannu, 2020. "M-estimation with incomplete and dependent multivariate data," Journal of Multivariate Analysis, Elsevier, vol. 176(C).
    2. Nikola Štefelová & Andreas Alfons & Javier Palarea-Albaladejo & Peter Filzmoser & Karel Hron, 2021. "Robust regression with compositional covariates including cellwise outliers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(4), pages 869-909, December.
    3. Bianco, Ana M. & Martínez, Elena, 2009. "Robust testing in the logistic regression model," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4095-4105, October.
    4. Vincent Bauer & Keven Ruby & Robert Pape, 2017. "Solving the Problem of Unattributed Political Violence," Journal of Conflict Resolution, Peace Science Society (International), vol. 61(7), pages 1537-1564, August.
    5. Lô, Serigne N. & Ronchetti, Elvezio, 2009. "Robust and accurate inference for generalized linear models," Journal of Multivariate Analysis, Elsevier, vol. 100(9), pages 2126-2136, October.
    6. Martin, Eisele & Zhu, Junyi, 2013. "Multiple imputation in a complex household survey - the German Panel on Household Finances (PHF): challenges and solutions," MPRA Paper 57666, University Library of Munich, Germany.
    7. Florence Bouvet & Chong-Uk Kim, 2014. "Are US imports really hurting US households?: an analysis of the relationship between US households' consumption and US imports," Global Business and Economics Review, Inderscience Enterprises Ltd, vol. 16(2), pages 157-178.
    8. Giulia Romano & Nicola Salvati & Andrea Guerrini, 2014. "Factors Affecting Water Utility Companies’ Decision to Promote the Reduction of Household Water Consumption," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 28(15), pages 5491-5505, December.
    9. Andrea A. Naghi & Máté Váradi & Mikhail Zhelonkin, 2021. "Robust Estimation of Probit Models with Endogeneity," Tinbergen Institute Discussion Papers 21-004/III, Tinbergen Institute.
    10. Miron, Julien & Poilane, Benjamin & Cantoni, Eva, 2022. "Robust polytomous logistic regression," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
    11. Caroline J. Dodd-Reynolds & Dimitris Vallis & Adetayo Kasim & Nasima Akhter & Coral L. Hanson, 2020. "The Northumberland Exercise Referral Scheme as a Universal Community Weight Management Programme: A Mixed Methods Exploration of Outcomes, Expectations and Experiences across a Social Gradient," IJERPH, MDPI, vol. 17(15), pages 1-21, July.
    12. Jonathan Fisher & David S. Johnson & Timothy M. Smeeding, 2015. "Inequality of Income and Consumption in the U.S.: Measuring the Trends in Inequality from 1984 to 2011 for the Same Individuals," Review of Income and Wealth, International Association for Research in Income and Wealth, vol. 61(4), pages 630-650, December.
    13. Doidge, James C & Higgins, Daryl J & Delfabbro, Paul & Edwards, Ben & Vassallo, Suzanne & Toumbourou, John W & Segal, Leonie, 2017. "Economic predictors of child maltreatment in an Australian population-based birth cohort," Children and Youth Services Review, Elsevier, vol. 72(C), pages 14-25.
    14. Frahm, Gabriel & Jaekel, Uwe, 2010. "A generalization of Tyler's M-estimators to the case of incomplete data," Computational Statistics & Data Analysis, Elsevier, vol. 54(2), pages 374-393, February.
    15. Hill, Jonathan B. & Prokhorov, Artem, 2016. "GEL estimation for heavy-tailed GARCH models with robust empirical likelihood inference," Journal of Econometrics, Elsevier, vol. 190(1), pages 18-45.
    16. Ferrari, Pier Alda & Annoni, Paola & Barbiero, Alessandro & Manzi, Giancarlo, 2011. "An imputation method for categorical variables with application to nonlinear principal component analysis," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2410-2420, July.
    17. Fiaschi, Davide & Giuliani, Elisa & Nieri, Federica & Salvati, Nicola, 2020. "How bad is your company? Measuring corporate wrongdoing beyond the magic of ESG metrics," Business Horizons, Elsevier, vol. 63(3), pages 287-299.
    18. Ricardo A. Maronna & Victor J. Yohai, 2021. "Optimal robust estimators for families of distributions on the integers," Statistical Papers, Springer, vol. 62(5), pages 2269-2281, October.
    19. Graciela Boente & Daniela Rodriguez & Pablo Vena, 2020. "Robust estimators in a generalized partly linear regression model under monotony constraints," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(1), pages 50-89, March.
    20. Neykov, N.M. & Filzmoser, P. & Neytchev, P.N., 2012. "Robust joint modeling of mean and dispersion through trimming," Computational Statistics & Data Analysis, Elsevier, vol. 56(1), pages 34-48, January.

    More about this item

    Keywords

    Regression imputation; Robustness; R
