Printed from https://ideas.repec.org/a/eee/csdana/v55y2011i1p544-553.html

Error rates for multivariate outlier detection

Authors

  • Cerioli, Andrea
  • Farcomeni, Alessio

Abstract

Multivariate outlier identification requires the choice of reliable cut-off points for the robust distances that measure the discrepancy from the fit provided by high-breakdown estimators of location and scatter. Multiplicity issues affect the identification of appropriate cut-off points. The paper describes how, when alternatives to the Family-Wise Error Rate (FWER) are considered, a careful choice of the error rate controlled during the outlier detection process can yield a good compromise between high power and low swamping. Multivariate outlier detection rules based on the False Discovery Rate (FDR) and False Discovery Exceedance (FDE) criteria are proposed. Their properties are evaluated through simulation, and the rules are then applied to real data examples. The conclusion is that the proposed approach provides a sensible strategy in many situations of practical interest.
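The multiplicity problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes that squared robust distances d_i^2 = (y_i - m)' S^{-1} (y_i - m) have already been computed from some high-breakdown estimate (m, S) of location and scatter, and it uses the asymptotic chi-square distribution on p degrees of freedom as the reference distribution, an approximation that may be inaccurate in small samples. Each distance yields a p-value, and three rules then set the cut-off: Bonferroni (FWER control), Benjamini-Hochberg (FDR control under independence or positive dependence) and, swapped in here as one standard FDE procedure, the Lehmann-Romano step-down rule, which bounds P(FDP > gamma) by alpha under its own dependence conditions. All function names are hypothetical helpers, and the distances in the demo are made-up values.

```python
import math
from typing import List, Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(X > x) for X ~ chi-square(df), df a positive integer.

    Uses the regularized upper incomplete gamma Q(a, y) with a = df/2,
    y = x/2, and the recurrence Q(a + 1, y) = Q(a, y) + y**a e**(-y) / Gamma(a + 1).
    """
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    else:
        a, q = 1.0, math.exp(-y)  # Q(1, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def _ascending(pvals: Sequence[float]) -> List[int]:
    """Indices of pvals sorted from smallest to largest p-value."""
    return sorted(range(len(pvals)), key=lambda i: pvals[i])


def bonferroni(pvals: Sequence[float], alpha: float) -> List[int]:
    """FWER control: flag every p-value at or below alpha / m."""
    m = len(pvals)
    return [i for i, pv in enumerate(pvals) if pv <= alpha / m]


def benjamini_hochberg(pvals: Sequence[float], alpha: float) -> List[int]:
    """FDR control (step-up): flag the k smallest p-values, where k is the
    largest rank with p_(k) <= k * alpha / m."""
    m = len(pvals)
    order = _ascending(pvals)
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])


def lehmann_romano_fde(pvals: Sequence[float], alpha: float, gamma: float) -> List[int]:
    """FDE control (step-down): aims at P(FDP > gamma) <= alpha using the
    critical values (floor(gamma*i) + 1) * alpha / (m + floor(gamma*i) + 1 - i)."""
    m = len(pvals)
    flagged = []
    for rank, i in enumerate(_ascending(pvals), start=1):
        f = math.floor(gamma * rank)
        if pvals[i] > (f + 1) * alpha / (m + f + 1 - rank):
            break  # step-down: stop at the first non-rejection
        flagged.append(i)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical squared robust distances for p = 2 variables.
    d2 = [25.0, 12.0, 9.0, 1.0, 2.0, 0.5, 3.0, 1.5, 0.2, 4.0]
    pv = [chi2_sf(x, 2) for x in d2]
    print(bonferroni(pv, 0.05))               # [0, 1]
    print(benjamini_hochberg(pv, 0.05))       # [0, 1, 2]
    print(lehmann_romano_fde(pv, 0.05, 0.1))  # [0, 1]
```

With these illustrative distances the FDR rule flags one more observation (index 2) than the two stricter rules, which is the kind of power-versus-swamping trade-off that motivates the choice of error rate.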

Suggested Citation

  • Cerioli, Andrea & Farcomeni, Alessio, 2011. "Error rates for multivariate outlier detection," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 544-553, January.
  • Handle: RePEc:eee:csdana:v:55:y:2011:i:1:p:544-553

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(10)00222-7
    Download Restriction: Full text for ScienceDirect subscribers only.

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Stephan Morgenthaler, 2007. "A survey of robust statistics," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 15(3), pages 271-293, February.
    2. Maronna, Ricardo A. & Yohai, Victor J., 2010. "Correcting MM estimates for "fat" data sets," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3168-3173, December.
    3. Marco Riani & Anthony C. Atkinson & Andrea Cerioli, 2009. "Finding an unknown number of multivariate outliers," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 447-466, April.
    4. Todorov, Valentin & Filzmoser, Peter, 2010. "Robust statistic for the one-way MANOVA," Computational Statistics & Data Analysis, Elsevier, vol. 54(1), pages 37-48, January.
    5. Filzmoser, Peter & Maronna, Ricardo & Werner, Mark, 2008. "Outlier identification in high dimensions," Computational Statistics & Data Analysis, Elsevier, vol. 52(3), pages 1694-1711, January.
    6. Christophe Croux & Catherine Dehon, 2010. "Influence functions of the Spearman and Kendall correlation measures," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 19(4), pages 497-515, November.
    7. Stephan Morgenthaler, 2007. "A survey of robust statistics," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 16(1), pages 171-172, June.
    8. Mark J. van der Laan & Sandrine Dudoit & Katherine S. Pollard, 2004. "Augmentation Procedures for Control of the Generalized Family-Wise Error Rate and Tail Probabilities for the Proportion of False Positives," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 3(1), pages 1-27, June.
    9. Mark van der Laan & Sandrine Dudoit & Katherine Pollard, 2004. "Multiple Testing. Part III. Procedures for Control of the Generalized Family-Wise Error Rate and Proportion of False Positives," U.C. Berkeley Division of Biostatistics Working Paper Series 1140, Berkeley Electronic Press.
    10. Wenge Guo & Joseph Romano, 2007. "A Generalized Sidak-Holm Procedure and Control of Generalized Error Rates under Independence," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 6(1), pages 1-35, January.
    11. Alessio Farcomeni, 2007. "Some Results on the Control of the False Discovery Rate under Dependence," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 34(2), pages 275-297, June.
    12. Cerioli, Andrea, 2010. "Multivariate Outlier Detection With High-Breakdown Estimators," Journal of the American Statistical Association, American Statistical Association, vol. 105(489), pages 147-156.
    13. Alessio Farcomeni, 2009. "Generalized Augmentation to Control the False Discovery Exceedance in Multiple Testing," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 36(3), pages 501-517, September.
    14. John D. Storey, 2002. "A direct approach to false discovery rates," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(3), pages 479-498, August.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Davide Nicola Continanza & Andrea del Monaco & Marco di Lucido & Daniele Figoli & Pasquale Maddaloni & Filippo Quarta & Giuseppe Turturiello, 2023. "Stacking machine learning models for anomaly detection: comparing AnaCredit to other banking data sets," IFC Bulletins chapters, in: Bank for International Settlements (ed.), Data science in central banking: applications and tools, volume 59, Bank for International Settlements.
    2. Alessio Farcomeni & Luca Greco, 2015. "S-estimation of hidden Markov models," Computational Statistics, Springer, vol. 30(1), pages 57-80, March.
    3. Archimbaud, Aurore & Nordhausen, Klaus & Ruiz-Gazen, Anne, 2018. "ICS for multivariate outlier detection with application to quality control," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 184-199.
    4. Riani, Marco & Atkinson, Anthony Curtis & Corbellini, Aldo & Farcomeni, Alessio & Laurini, Fabrizio, 2024. "Information Criteria for Outlier Detection Avoiding Arbitrary Significance Levels," Econometrics and Statistics, Elsevier, vol. 29(C), pages 189-205.
    5. Silvia Salini & Andrea Cerioli & Fabrizio Laurini & Marco Riani, 2016. "Reliable Robust Regression Diagnostics," International Statistical Review, International Statistical Institute, vol. 84(1), pages 99-127, April.
    6. Andrea Cerioli & Marco Riani & Anthony C. Atkinson & Aldo Corbellini, 2018. "The power of monitoring: how to make the most of a contaminated multivariate sample," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(4), pages 559-587, December.
    7. Claudio Agostinelli & Luca Greco, 2019. "Weighted likelihood estimation of multivariate location and scatter," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(3), pages 756-784, September.
    8. Luca Greco, 2022. "Robust fitting of mixtures of GLMs by weighted likelihood," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 106(1), pages 25-48, March.
    9. Jan Kalina & Jan Tichavský, 2022. "The minimum weighted covariance determinant estimator for high-dimensional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(4), pages 977-999, December.
    10. Van Aelst, S. & Vandervieren, E. & Willems, G., 2012. "A Stahel–Donoho estimator based on huberized outlyingness," Computational Statistics & Data Analysis, Elsevier, vol. 56(3), pages 531-542.
    11. Luca Greco & Antonio Lucadamo & Claudio Agostinelli, 2021. "Weighted likelihood latent class linear regression," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(2), pages 711-746, June.
    12. Anthony C. Atkinson & Andrea Cerioli & Marco Riani, 2016. "Discussion of ‘Asymptotic Theory of Outlier Detection Algorithms for Linear Time Series Regression Models’ by Johansen and Nielsen," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 43(2), pages 349-352, June.
    13. Cerioli, Andrea & Farcomeni, Alessio & Riani, Marco, 2013. "Robust distances for outlier-free goodness-of-fit testing," Computational Statistics & Data Analysis, Elsevier, vol. 65(C), pages 29-45.
    14. Luca Greco & Alessio Farcomeni, 2016. "A plug-in approach to sparse and robust principal component analysis," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(3), pages 449-481, September.
    15. Sugasawa, Shonosuke & Kobayashi, Genya, 2022. "Robust fitting of mixture models using weighted complete estimating equations," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
    16. Lourenço, V.M. & Pires, A.M., 2014. "M-regression, false discovery rates and outlier detection with application to genetic association studies," Computational Statistics & Data Analysis, Elsevier, vol. 78(C), pages 33-42.
    17. Francesco Dotto & Alessio Farcomeni & Luis Angel García-Escudero & Agustín Mayo-Iscar, 2017. "A fuzzy approach to robust regression clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(4), pages 691-710, December.
    18. Luca Greco & Giovanni Saraceno & Claudio Agostinelli, 2021. "Robust Fitting of a Wrapped Normal Model to Multivariate Circular Data and Outlier Detection," Stats, MDPI, vol. 4(2), pages 1-18, June.
    19. Greco, Luca & Pacillo, Simona & Maresca, Piera, 2023. "An impartial trimming algorithm for robust circle fitting," Computational Statistics & Data Analysis, Elsevier, vol. 181(C).
    20. Luca Greco & Antonio Lucadamo & Pietro Amenta, 2020. "An Impartial Trimming Approach for Joint Dimension and Sample Reduction," Journal of Classification, Springer;The Classification Society, vol. 37(3), pages 769-788, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alessio Farcomeni, 2009. "Generalized Augmentation to Control the False Discovery Exceedance in Multiple Testing," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 36(3), pages 501-517, September.
    2. Christophe Croux & Catherine Dehon, 2010. "Influence functions of the Spearman and Kendall correlation measures," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 19(4), pages 497-515, November.
    3. Pallavi Basu & Luella Fu & Alessio Saretto & Wenguang Sun, 2021. "Empirical Bayes Control of the False Discovery Exceedance," Working Papers 2115, Federal Reserve Bank of Dallas.
    4. Cerioli, Andrea & Farcomeni, Alessio & Riani, Marco, 2013. "Robust distances for outlier-free goodness-of-fit testing," Computational Statistics & Data Analysis, Elsevier, vol. 65(C), pages 29-45.
    5. L. Finos & A. Farcomeni, 2011. "k-FWER Control without p-value Adjustment, with Application to Detection of Genetic Determinants of Multiple Sclerosis in Italian Twins," Biometrics, The International Biometric Society, vol. 67(1), pages 174-181, March.
    6. Alfons, A. & Ates, N.Y. & Groenen, P.J.F., 2018. "A Robust Bootstrap Test for Mediation Analysis," ERIM Report Series Research in Management ERS-2018-005-MKT, Erasmus Research Institute of Management (ERIM), ERIM is the joint research institute of the Rotterdam School of Management, Erasmus University and the Erasmus School of Economics (ESE) at Erasmus University Rotterdam.
    7. Silvia Salini & Andrea Cerioli & Fabrizio Laurini & Marco Riani, 2016. "Reliable Robust Regression Diagnostics," International Statistical Review, International Statistical Institute, vol. 84(1), pages 99-127, April.
    8. Youssef Allouah & Rachid Guerraoui & Lê-Nguyên Hoang & Oscar Villemaud, 2022. "Robust Sparse Voting," Papers 2202.08656, arXiv.org, revised Jan 2024.
    9. Wenge Guo & Shyamal Peddada, 2008. "Adaptive Choice of the Number of Bootstrap Samples in Large Scale Multiple Testing," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-21, March.
    10. Eugster, Manuel J.A. & Leisch, Friedrich, 2011. "Weighted and robust archetypal analysis," Computational Statistics & Data Analysis, Elsevier, vol. 55(3), pages 1215-1225, March.
    11. Todorov, Valentin & Filzmoser, Peter, 2009. "An Object-Oriented Framework for Robust Multivariate Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 32(i03).
    12. Roland Fried & Herold Dehling, 2011. "Robust nonparametric tests for the two-sample location problem," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 20(4), pages 409-422, November.
    13. Wang, Li & Xu, Xingzhong, 2012. "Step-up procedure controlling generalized family-wise error rate," Statistics & Probability Letters, Elsevier, vol. 82(4), pages 775-782.
    14. Leonid Hanin, 2021. "Cavalier Use of Inferential Statistics Is a Major Source of False and Irreproducible Scientific Findings," Mathematics, MDPI, vol. 9(6), pages 1-13, March.
    15. Debashis Ghosh, 2006. "Shrunken p-Values for Assessing Differential Expression with Applications to Genomic Data Analysis," Biometrics, The International Biometric Society, vol. 62(4), pages 1099-1106, December.
    16. Wang, Li, 2022. "New testing procedures with k-FWER control for discrete data," Statistics & Probability Letters, Elsevier, vol. 180(C).
    17. Li Wang, 2019. "Weighted multiple testing procedure for grouped hypotheses with k-FWER control," Computational Statistics, Springer, vol. 34(2), pages 885-909, June.
    18. Günther Fink & Margaret McConnell & Sebastian Vollmer, 2014. "Testing for heterogeneous treatment effects in experimental data: false discovery risks and correction procedures," Journal of Development Effectiveness, Taylor & Francis Journals, vol. 6(1), pages 44-57, January.
    19. Salvatore Ingrassia & Simona Minotti & Giorgio Vittadini, 2012. "Local Statistical Modeling via a Cluster-Weighted Approach with Elliptical Distributions," Journal of Classification, Springer;The Classification Society, vol. 29(3), pages 363-401, October.
    20. Irene Castro-Conde & Jacobo Uña-Álvarez, 2015. "Power, FDR and conservativeness of BB-SGoF method," Computational Statistics, Springer, vol. 30(4), pages 1143-1161, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.