
A further study comparing forward search multivariate outlier methods including ATLA with an application to clustering

Author

Listed:
  • Brenton R. Clarke

    (Murdoch University)

  • Andrew Grose

    (Curtin University)

Abstract

This paper makes comparisons of automated procedures for robust multivariate outlier detection through discussion and simulation. In particular, automated procedures that use the forward search along with Mahalanobis distances to identify and classify multivariate outliers subject to predefined criteria are examined. Procedures utilizing a parametric model criterion based on a $\chi^2$-distribution are among these, whereas the multivariate Adaptive Trimmed Likelihood Algorithm (ATLA) identifies outliers based on an objective function that is derived from the asymptotics of the location estimator assuming a multivariate normal distribution. Several criteria, including size (false positive rate), sensitivity, and relative efficiency, are canvassed. To illustrate relative efficiency in a multivariate setting in a new way, measures of variability of the multivariate location parameter when the underlying distribution is chosen from a multivariate generalization of the Tukey–Huber $\epsilon$-contamination model are used. Mean slippage models are also entertained. The simulation results here are illuminating and demonstrate that there is no broadly accepted procedure that outperforms the others in all situations, although one may ascertain circumstances in which a particular method may be best if implemented. Finally, the paper explores graphical monitoring for the existence of clusters and the potential of classification through the occurrence of multiple minima in the objective function using ATLA.
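As a rough illustration of the idea of flagging points by comparing Mahalanobis distances with a $\chi^2$ cutoff, the following minimal Python sketch may help. It is not the forward search, ATLA, or any procedure from the paper: it assumes bivariate data, fits the mean and covariance on a hand-picked subset of observations, and uses the closed-form $\chi^2$ quantile for 2 degrees of freedom. All function names, the subset indices, and the level `alpha` are hypothetical choices made for illustration only.

```python
import math


def mean_and_cov(points):
    """Sample mean and unbiased 2x2 covariance of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def squared_mahalanobis(point, mean, cov):
    """Squared Mahalanobis distance using the explicit 2x2 matrix inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def chi2_cutoff_2df(alpha):
    """Upper-alpha quantile of the chi-square distribution with 2 df.

    With 2 degrees of freedom the CDF is 1 - exp(-x / 2), so the quantile
    has the closed form -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)


def flag_outliers(points, subset_idx, alpha=0.01):
    """Fit on the chosen subset, then flag every point beyond the cutoff."""
    mean, cov = mean_and_cov([points[i] for i in subset_idx])
    cutoff = chi2_cutoff_2df(alpha)
    return [
        i for i, p in enumerate(points)
        if squared_mahalanobis(p, mean, cov) > cutoff
    ]


# Hypothetical toy data: five points near the unit square plus one distant point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
print(flag_outliers(points, subset_idx=[0, 1, 2, 3, 4]))  # -> [5]
```

Fitting on a subset rather than on all observations keeps a gross outlier from inflating the covariance estimate and masking itself. How such a subset is chosen and grown is exactly where the automated procedures compared in the paper differ, and this sketch does not attempt to reproduce that.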

Suggested Citation

  • Brenton R. Clarke & Andrew Grose, 2023. "A further study comparing forward search multivariate outlier methods including ATLA with an application to clustering," Statistical Papers, Springer, vol. 64(2), pages 395-420, April.
  • Handle: RePEc:spr:stpapr:v:64:y:2023:i:2:d:10.1007_s00362-022-01319-7
    DOI: 10.1007/s00362-022-01319-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00362-022-01319-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s00362-022-01319-7?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Anthony C. Atkinson & Marco Riani & Andrea Cerioli, 2018. "Cluster detection and clustering with random start forward searches," Journal of Applied Statistics, Taylor & Francis Journals, vol. 45(5), pages 777-798, April.
    2. Garciga, Christian & Verbrugge, Randal, 2021. "Robust covariance matrix estimation and identification of unusual data points: New tools," Research in Economics, Elsevier, vol. 75(2), pages 176-202.
    3. Hadi, Ali S. & Luceno, Alberto, 1997. "Maximum trimmed likelihood estimators: a unified approach, examples, and algorithms," Computational Statistics & Data Analysis, Elsevier, vol. 25(3), pages 251-272, August.
    4. Andrea Cerioli & Alessio Farcomeni & Marco Riani, 2019. "Wild adaptive trimming for robust estimation and cluster analysis," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 46(1), pages 235-256, March.
    5. Andrea Cerioli & Marco Riani & Anthony C. Atkinson & Aldo Corbellini, 2018. "The power of monitoring: how to make the most of a contaminated multivariate sample," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(4), pages 559-587, December.
    6. Marco Riani & Anthony C. Atkinson & Andrea Cerioli, 2009. "Finding an unknown number of multivariate outliers," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 447-466, April.
    7. Billor, Nedret & Hadi, Ali S. & Velleman, Paul F., 2000. "BACON: blocked adaptive computationally efficient outlier nominators," Computational Statistics & Data Analysis, Elsevier, vol. 34(3), pages 279-298, September.
    8. Peter Filzmoser & Anne Ruiz-Gazen & Christine Thomas-Agnan, 2014. "Identification of local multivariate outliers," Statistical Papers, Springer, vol. 55(1), pages 29-47, February.
    9. Elisa Cabana & Rosa E. Lillo & Henry Laniado, 2021. "Multivariate outlier detection based on a robust Mahalanobis distance with shrinkage estimators," Statistical Papers, Springer, vol. 62(4), pages 1583-1609, August.
    10. Anthony Atkinson & Marco Riani, 2004. "The forward search and data visualisation," Computational Statistics, Springer, vol. 19(1), pages 29-54, February.
    11. Cerioli, Andrea & Farcomeni, Alessio & Riani, Marco, 2014. "Strong consistency and robustness of the Forward Search estimator of multivariate location and scatter," Journal of Multivariate Analysis, Elsevier, vol. 126(C), pages 167-183.

    Citations

    Cited by:

    1. Emily Groenewald & Gary Van Vuuren, 2024. "Visualisation of Mahalanobis Distances for Trivariate JOINT Distributions," International Journal of Economics and Financial Issues, Econjournals, vol. 14(2), pages 203-206, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marco Riani & Anthony C. Atkinson & Andrea Cerioli & Aldo Corbellini, 2019. "Comments on: Data science, big data and statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(2), pages 349-352, June.
    2. Torti, Francesca & Corbellini, Aldo & Atkinson, Anthony C., 2021. "fsdaSAS: a package for robust regression for very large datasets including the batch forward search," LSE Research Online Documents on Economics 109895, London School of Economics and Political Science, LSE Library.
    3. Alessio Farcomeni & Antonio Punzo, 2020. "Robust model-based clustering with mild and gross outliers," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(4), pages 989-1007, December.
    4. Francesca Torti & Aldo Corbellini & Anthony C. Atkinson, 2021. "fsdaSAS: A Package for Robust Regression for Very Large Datasets Including the Batch Forward Search," Stats, MDPI, vol. 4(2), pages 1-21, April.
    5. Riani, Marco & Atkinson, Anthony Curtis & Corbellini, Aldo & Farcomeni, Alessio & Laurini, Fabrizio, 2024. "Information Criteria for Outlier Detection Avoiding Arbitrary Significance Levels," Econometrics and Statistics, Elsevier, vol. 29(C), pages 189-205.
    6. Reiko Aoki & Juan P. M. Bustamante & Gilberto A. Paula, 2022. "Local influence diagnostics with forward search in regression analysis," Statistical Papers, Springer, vol. 63(5), pages 1477-1497, October.
    7. Andrea Cerioli & Marco Riani & Anthony C. Atkinson & Aldo Corbellini, 2018. "The power of monitoring: how to make the most of a contaminated multivariate sample," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(4), pages 559-587, December.
    8. Anthony C. Atkinson & Aldo Corbellini & Marco Riani, 2017. "Robust Bayesian regression with the forward search: theory and data analysis," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 26(4), pages 869-886, December.
    9. Pokojovy, Michael & Jobe, J. Marcus, 2022. "A robust deterministic affine-equivariant algorithm for multivariate location and scatter," Computational Statistics & Data Analysis, Elsevier, vol. 172(C).
    10. Arismendi, Juan C. & Broda, Simon, 2017. "Multivariate elliptical truncated moments," Journal of Multivariate Analysis, Elsevier, vol. 157(C), pages 29-44.
    11. Zuppiroli, Marco & Donati, Michele & Riani, Marco & Verga, Giovanni, 2015. "The Impact of Trading Activity in Agricultural Futures Markets," 2015 Fourth Congress, June 11-12, 2015, Ancona, Italy 207848, Italian Association of Agricultural and Applied Economics (AIEAA).
    12. Baishuai Zuo & Chuancun Yin & Jing Yao, 2023. "Multivariate range Value-at-Risk and covariance risk measures for elliptical and log-elliptical distributions," Papers 2305.09097, arXiv.org.
    13. Baishuai Zuo & Chuancun Yin, 2022. "Multivariate doubly truncated moments for generalized skew-elliptical distributions with application to multivariate tail conditional risk measures," Papers 2203.00839, arXiv.org.
    14. Alessio Farcomeni & Francesco Dotto, 2018. "The power of (extended) monitoring in robust clustering," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(4), pages 651-660, December.
    15. Grané, Aurea & Salini, Silvia & Verdolini, Elena, 2021. "Robust multivariate analysis for mixed-type data: Novel algorithm and its practical application in socio-economic research," Socio-Economic Planning Sciences, Elsevier, vol. 73(C).
    16. Atkinson, Anthony C. & Riani, Marco & Torti, Francesca, 2016. "Robust methods for heteroskedastic regression," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 209-222.
    17. Francesca Torti & Marco Riani & Gianluca Morelli, 2021. "Semiautomatic robust regression clustering of international trade data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(3), pages 863-894, September.
    18. Šárka Brodinová & Peter Filzmoser & Thomas Ortner & Christian Breiteneder & Maia Rohm, 2019. "Robust and sparse k-means clustering for high-dimensional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(4), pages 905-932, December.
    19. Silvia Salini & Andrea Cerioli & Fabrizio Laurini & Marco Riani, 2016. "Reliable Robust Regression Diagnostics," International Statistical Review, International Statistical Institute, vol. 84(1), pages 99-127, April.
    20. Cappozzo, Andrea & Greselin, Francesca & Murphy, Thomas Brendan, 2021. "Robust variable selection for model-based learning in presence of adulteration," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
