Printed from https://ideas.repec.org/a/spr/stpapr/v55y2014i1p15-28.html

Shape bias of robust covariance estimators: an empirical study

Author

Listed:
  • M. Hubert
  • P. Rousseeuw
  • K. Vakili

Abstract

Detecting outliers in a multivariate point cloud is not trivial, especially when dealing with a sizable fraction of contamination. Over time, it has increasingly been recognized that the safest and most feasible approach to exposing outliers starts by computing a highly robust estimator of location and scatter that can withstand a large proportion of contamination. Many such estimators have been proposed in recent years. We will compare the worst-case bias of several prominent robust multivariate estimators by means of simulation. We also propose a new tool to compare robust estimators on real data sets, and illustrate it.

Copyright Springer-Verlag Berlin Heidelberg 2014

Suggested Citation

  • M. Hubert & P. Rousseeuw & K. Vakili, 2014. "Shape bias of robust covariance estimators: an empirical study," Statistical Papers, Springer, vol. 55(1), pages 15-28, February.
  • Handle: RePEc:spr:stpapr:v:55:y:2014:i:1:p:15-28
    DOI: 10.1007/s00362-013-0544-8

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s00362-013-0544-8
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Vanja Dukic & Hedibert F. Lopes & Nicholas G. Polson, 2012. "Tracking Epidemics With Google Flu Trends Data and a State-Space SEIR Model," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(500), pages 1410-1426, December.
    2. Hubert, M. & Vandervieren, E., 2008. "An adjusted boxplot for skewed distributions," Computational Statistics & Data Analysis, Elsevier, vol. 52(12), pages 5186-5201, August.
    3. Billor, Nedret & Hadi, Ali S. & Velleman, Paul F., 2000. "BACON: blocked adaptive computationally efficient outlier nominators," Computational Statistics & Data Analysis, Elsevier, vol. 34(3), pages 279-298, September.
    4. Jeremy Ginsberg & Matthew H. Mohebbi & Rajan S. Patel & Lynnette Brammer & Mark S. Smolinski & Larry Brilliant, 2009. "Detecting influenza epidemics using search engine query data," Nature, Nature, vol. 457(7232), pages 1012-1014, February.
    5. Salibian-Barrera, Matias & Van Aelst, Stefan & Willems, Gert, 2006. "Principal Components Analysis Based on Multivariate MM Estimators With Fast and Robust Bootstrap," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1198-1211, September.
    6. Debruyne, M. & Hubert, M., 2009. "The influence function of the Stahel-Donoho covariance estimator of smallest outlyingness," Statistics & Probability Letters, Elsevier, vol. 79(3), pages 275-282, February.
    7. Todorov, Valentin & Filzmoser, Peter, 2009. "An Object-Oriented Framework for Robust Multivariate Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 32(i03).

    Citations


    Cited by:

    1. Tarr, G. & Müller, S. & Weber, N.C., 2016. "Robust estimation of precision matrices under cellwise contamination," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 404-420.
    2. Claudio Agostinelli & Andy Leung & Victor Yohai & Ruben Zamar, 2015. "Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(3), pages 441-461, September.
    3. Jakob Raymaekers & Peter J. Rousseeuw & Iwein Vranckx, 2018. "Discussion of “The power of monitoring: how to make the most of a contaminated multivariate sample”," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(4), pages 589-594, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marco Riani & Andrea Cerioli & Francesca Torti, 2014. "On consistency factors and efficiency of robust S-estimators," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 23(2), pages 356-387, June.
    2. Vilijandas Bagdonavičius & Linas Petkevičius, 2020. "A new multiple outliers identification method in linear regression," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 83(3), pages 275-296, April.
    3. Graciela Boente & Matías Salibián-Barrera, 2021. "Robust functional principal components for sparse longitudinal data," METRON, Springer;Sapienza Università di Roma, vol. 79(2), pages 159-188, August.
    4. Taesik Lee & Hayong Shin, 2016. "Combining syndromic surveillance and ILI data using particle filter for epidemic state estimation," Flexible Services and Manufacturing Journal, Springer, vol. 28(1), pages 233-253, June.
    5. Robert Finger, 2010. "Review of ‘Robustbase’ software for R," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 25(7), pages 1205-1210, November/.
    6. Zhengming Xing & Bradley Nicholson & Monica Jimenez & Timothy Veldman & Lori Hudson & Joseph Lucas & David Dunson & Aimee K. Zaas & Christopher W. Woods & Geoffrey S. Ginsburg & Lawrence Carin, 2014. "Bayesian modeling of temporal properties of infectious disease in a college student population," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(6), pages 1358-1382, June.
    7. Amir Hassan Zadeh & Hamed M. Zolbanin & Ramesh Sharda & Dursun Delen, 2019. "Social Media for Nowcasting Flu Activity: Spatio-Temporal Big Data Analysis," Information Systems Frontiers, Springer, vol. 21(4), pages 743-760, August.
    8. Garciga, Christian & Verbrugge, Randal, 2021. "Robust covariance matrix estimation and identification of unusual data points: New tools," Research in Economics, Elsevier, vol. 75(2), pages 176-202.
    9. Cevallos-Valdiviezo, Holger & Van Aelst, Stefan, 2019. "Fast computation of robust subspace estimators," Computational Statistics & Data Analysis, Elsevier, vol. 134(C), pages 171-185.
    10. Haolun Shi & Jiguo Cao, 2022. "Robust Functional Principal Component Analysis Based on a New Regression Framework," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 27(3), pages 523-543, September.
    11. Mostafa Abbas & Thomas B. Morland & Eric S. Hall & Yasser EL-Manzalawy, 2021. "Associations between Google Search Trends for Symptoms and COVID-19 Confirmed and Death Cases in the United States," IJERPH, MDPI, vol. 18(9), pages 1-24, April.
    12. Valentin Todorov & Matthias Templ & Peter Filzmoser, 2011. "Detection of multivariate outliers in business survey data with incomplete information," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 5(1), pages 37-56, April.
    13. Baek, Changryong & Davis, Richard A. & Pipiras, Vladas, 2017. "Sparse seasonal and periodic vector autoregressive modeling," Computational Statistics & Data Analysis, Elsevier, vol. 106(C), pages 103-126.
    14. Mia Hubert & Peter Rousseeuw & Pieter Segaert, 2015. "Multivariate functional outlier detection," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 24(2), pages 177-202, July.
    15. David H Chae & Sean Clouston & Mark L Hatzenbuehler & Michael R Kramer & Hannah L F Cooper & Sacoby M Wilson & Seth I Stephens-Davidowitz & Robert S Gold & Bruce G Link, 2015. "Association between an Internet-Based Measure of Area Racism and Black Mortality," PLOS ONE, Public Library of Science, vol. 10(4), pages 1-12, April.
    16. Catherine Fuss & Angelos Theodorakopoulos, 2018. "Compositional Changes in Aggregate Productivity in an Era of Globalisation and Financial Crisis," Working Papers of VIVES - Research Centre for Regional Economics 627696, KU Leuven, Faculty of Economics and Business (FEB), VIVES - Research Centre for Regional Economics.
    17. Xiaoli Wang & Shuangsheng Wu & C Raina MacIntyre & Hongbin Zhang & Weixian Shi & Xiaomin Peng & Wei Duan & Peng Yang & Yi Zhang & Quanyi Wang, 2015. "Using an Adjusted Serfling Regression Model to Improve the Early Warning at the Arrival of Peak Timing of Influenza in Beijing," PLOS ONE, Public Library of Science, vol. 10(3), pages 1-14, March.
    18. L. Pitsoulis & G. Zioutas, 2010. "A fast algorithm for robust regression with penalised trimmed squares," Computational Statistics, Springer, vol. 25(4), pages 663-689, December.
    19. Cristian BARRA & Roberto ZOTTI, 2019. "Bank Performance, Financial Stability And Market Concentration: Evidence From Cooperative And Non‐Cooperative Banks," Annals of Public and Cooperative Economics, Wiley Blackwell, vol. 90(1), pages 103-139, March.
    20. Ishani Chaudhuri & Parthajit Kayal, 2022. "Predicting Power of Ticker Search Volume in Indian Stock Market," Working Papers 2022-214, Madras School of Economics,Chennai,India.
