Nonparametric imputation by data depth

Authors

  • Pavlo Mozharovskyi

    (CREST; ENSAI; Université Bretagne Loire)

  • Julie Josse

    (CMAP; Ecole polytechnique)

  • François Husson

    (IRMAR; Applied Mathematics Unit; Agrocampus Ouest)

Abstract

The presented methodology for single imputation of missing values borrows the idea of data depth, a measure of centrality defined for an arbitrary point of the space with respect to a probability distribution or a data cloud. It consists in iteratively maximizing the depth of each observation with missing values, and it can be employed with any properly defined statistical depth function. At each iteration, imputation reduces to optimizing a quadratic, linear, or quasiconcave function, which is solved analytically, by linear programming, or by the Nelder-Mead method, respectively. Because it captures the underlying data topology, the procedure is distribution free, allows imputing close to the data, preserves prediction possibilities in contrast to local imputation methods (k-nearest neighbors, random forest), and has attractive robustness and asymptotic properties under elliptical symmetry. It is shown that its particular case using the Mahalanobis depth is directly connected to well-known treatments for the multivariate normal model, such as iterated regression and regularized PCA. The methodology is extended to multiple imputation for data stemming from an elliptically symmetric distribution. Simulation and real-data studies show that the procedure compares favorably with popular existing alternatives. The method has been implemented as an R package.
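The Mahalanobis-depth special case mentioned above can be illustrated with a short sketch. The Mahalanobis depth of a point x is D(x) = 1 / (1 + (x - mu)^T S^{-1} (x - mu)), so maximizing it over the missing coordinates of a row means minimizing a quadratic form. Its minimizer is the conditional mean mu_m + S_mo S_oo^{-1} (x_o - mu_o), which is why each sweep amounts to one step of iterated regression. This is a minimal sketch under stated assumptions, not the authors' implementation or the API of their R package: it re-estimates mu and S with the plain (non-robust) sample mean and covariance, covers only the analytic Mahalanobis case (not the linear-programming or Nelder-Mead cases used for other depths), and assumes S_oo is invertible. The function names `solve`, `mean_cov`, and `impute_mahalanobis` are hypothetical.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]
Row = List[Optional[float]]  # None marks a missing entry


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n, k = len(a), len(b[0])
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # copy; inputs untouched
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + k):
                    aug[r][c] -= f * aug[col][c]
    return [[aug[i][n + j] / aug[i][i] for j in range(k)] for i in range(n)]


def mean_cov(x: Matrix) -> Tuple[List[float], Matrix]:
    """Plain sample mean and (n - 1)-normalized sample covariance (assumption)."""
    n, d = len(x), len(x[0])
    mu = [sum(row[j] for row in x) / n for j in range(d)]
    cov = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in x) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def impute_mahalanobis(data: List[Row], max_iter: int = 100,
                       tol: float = 1e-9) -> Matrix:
    """Hypothetical helper: move each incomplete row to its Mahalanobis-deepest completion."""
    d = len(data[0])
    # Initialization: fill each gap with the mean of the observed entries in its column.
    col_means = []
    for j in range(d):
        seen = [row[j] for row in data if row[j] is not None]
        col_means.append(sum(seen) / len(seen))
    x = [[v if v is not None else col_means[j] for j, v in enumerate(row)]
         for row in data]
    for _ in range(max_iter):
        mu, cov = mean_cov(x)  # re-estimated from the current completed data
        shift = 0.0
        for i, row in enumerate(data):
            miss = [j for j, v in enumerate(row) if v is None]
            obs = [j for j, v in enumerate(row) if v is not None]
            if not miss:
                continue
            # Minimizing (x - mu)^T S^{-1} (x - mu) over the missing block gives
            # the conditional mean mu_m + S_mo S_oo^{-1} (x_o - mu_o).
            w = solve([[cov[a][b] for b in obs] for a in obs],
                      [[x[i][a] - mu[a]] for a in obs]) if obs else []
            for m in miss:
                new = mu[m] + sum(cov[m][a] * w[t][0] for t, a in enumerate(obs))
                shift = max(shift, abs(new - x[i][m]))
                x[i][m] = new
        if shift < tol:  # largest update in this sweep is negligible
            break
    return x
```

For example, with rows (0, 1), (1, 3), (2, 5), (3, 7) lying on y = 2x + 1 and a fifth row (4, None), the iteration converges to 9, the value on that line. A row with every entry missing is simply imputed by the current mean vector, which is the deepest point under this depth.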

Suggested Citation

  • Pavlo Mozharovskyi & Julie Josse & François Husson, 2017. "Nonparametric imputation by data depth," Working Papers 2017-72, Center for Research in Economics and Statistics.
  • Handle: RePEc:crs:wpaper:2017-72

    Download full text from publisher

    File URL: http://crest.science/RePEc/wpstorage/2017-72.pdf
    File Function: CREST working paper version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mia Hubert & Peter Rousseeuw & Pieter Segaert, 2015. "Multivariate functional outlier detection," Statistical Methods & Applications, Springer; Società Italiana di Statistica, vol. 24(2), pages 177-202, July.
    2. Hemant Kulkarni & Jayabrata Biswas & Kiranmoy Das, 2019. "A joint quantile regression model for multiple longitudinal outcomes," AStA Advances in Statistical Analysis, Springer; German Statistical Society, vol. 103(4), pages 453-473, December.
    3. María Edo & Walter Sosa Escudero & Marcela Svarc, 2021. "A multidimensional approach to measuring the middle class," The Journal of Economic Inequality, Springer; Society for the Study of Economic Inequality, vol. 19(1), pages 139-162, March.
    4. Gerko Vink & Laurence E. Frank & Jeroen Pannekoek & Stef van Buuren, 2014. "Predictive mean matching imputation of semicontinuous variables," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 68(1), pages 61-90, February.
    5. Sarno, Lucio & Schneider, Paul & Wagner, Christian, 2012. "Properties of foreign exchange risk premiums," Journal of Financial Economics, Elsevier, vol. 105(2), pages 279-310.
    6. Dyckerhoff, Rainer & Mozharovskyi, Pavlo, 2016. "Exact computation of the halfspace depth," Computational Statistics & Data Analysis, Elsevier, vol. 98(C), pages 19-30.
    7. L. Jeff Hong & Zhiyuan Huang & Henry Lam, 2021. "Learning-Based Robust Optimization: Procedures and Statistical Guarantees," Management Science, INFORMS, vol. 67(6), pages 3447-3467, June.
    8. Arthur Stepchenko & Jurij Chizhov & Ludmila Aleksejeva, 2018. "Transfer of the data preprocessing parameters and forecasting models," Journal of Advances in Technology and Engineering Research, A/Professor Akbar A. Khatibi, vol. 4(6), pages 214-221.
    9. Dette, Holger & Hoderlein, Stefan & Neumeyer, Natalie, 2016. "Testing multivariate economic restrictions using quantiles: The example of Slutsky negative semidefiniteness," Journal of Econometrics, Elsevier, vol. 191(1), pages 129-144.
    10. Bazovkin, Pavel, 2014. "Geometrical framework for robust portfolio optimization," Discussion Papers in Econometrics and Statistics 01/14, University of Cologne, Institute of Econometrics and Statistics.
    11. Einmahl, J.H.J. & Li, Jun & Liu, Regina, 2015. "Bridging Centrality and Extremity: Refining Empirical Data Depth using Extreme Value Statistics," Discussion Paper 2015-020, Tilburg University, Center for Economic Research.
    12. Guillaume Carlier & Victor Chernozhukov & Alfred Galichon, 2016. "Vector Quantile Regression: An Optimal Transport Approach," SciencePo Working papers hal-03567920, HAL.
    13. Paindaveine, Davy & Šiman, Miroslav, 2012. "Computing multiple-output regression quantile regions," Computational Statistics & Data Analysis, Elsevier, vol. 56(4), pages 840-853.
    14. Victor Chernozhukov & Alfred Galichon & Marc Hallin & Marc Henry, 2014. "Monge-Kantorovich Depth, Quantiles, Ranks, and Signs," Papers 1412.8434, arXiv.org, revised Sep 2015.
    15. Raúl Torres & Rosa E. Lillo & Henry Laniado, 2015. "A Directional Multivariate Value at Risk," Papers 1502.00908, arXiv.org.
    16. Merlo, Luca & Petrella, Lea & Salvati, Nicola & Tzavidis, Nikos, 2022. "Marginal M-quantile regression for multivariate dependent data," Computational Statistics & Data Analysis, Elsevier, vol. 173(C).
    17. Sulkhan Chavleishvili & Simone Manganelli, 2024. "Forecasting and stress testing with quantile vector autoregression," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 39(1), pages 66-85, January.
    18. Xiaohui Liu, 2017. "Fast implementation of the Tukey depth," Computational Statistics, Springer, vol. 32(4), pages 1395-1410, December.
    19. Guillaume Carlier & Victor Chernozhukov & Alfred Galichon, 2014. "Vector quantile regression," CeMMAP working papers CWP48/14, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    20. Nadja Klein & Thomas Kneib, 2020. "Directional bivariate quantiles: a robust approach based on the cumulative distribution function," AStA Advances in Statistical Analysis, Springer; German Statistical Society, vol. 104(2), pages 225-260, June.

    More about this item

    Keywords

    Elliptical symmetry; Outliers; Tukey depth; Zonoid depth; Nonparametric imputation; Convex optimization

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.