IDEAS home Printed from https://ideas.repec.org/a/taf/gnstxx/v25y2013i2p447-461.html

Computationally easy outlier detection via projection pursuit with finitely many directions

Authors

  • Robert Serfling
  • Satyaki Mazumder

Abstract

Outlier detection is fundamental to data analysis. Desirable properties are affine invariance, robustness, low computational burden, and nonimposition of elliptical contours. However, leading methods fail to possess all of these features. The Mahalanobis distance outlyingness (MD) imposes elliptical contours. The projection outlyingness (SUP), powerfully involving projections of the data onto all univariate directions, is highly computationally intensive. Computationally easy variants using projection pursuit with only finitely many directions have been introduced, but these fail to capture at once the other desired properties. Here, we develop a 'robust Mahalanobis spatial outlyingness on projections' (RMSP) function, which satisfies all four desired properties. Pre-transformation to a strong invariant coordinate system yields affine invariance, 'spatial trimming' yields robustness, and 'spatial Mahalanobis outlyingness' is used to obtain computational ease and smooth, unconstrained contours. From an empirical study using artificial and actual data, our findings are that SUP is outclassed by MD and RMSP, that MD and RMSP are competitive, and that RMSP is especially advantageous in describing the intermediate outlyingness structure when elliptical contours are not assumed.
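
To make the two baseline notions in the abstract concrete, here is a minimal Python sketch for 2-D data. It is not the authors' RMSP function and omits the invariant coordinate pre-transformation and spatial trimming. The function names, the choice of median and MAD as projected location and scale, the equally spaced directions, and the toy data are all assumptions made for illustration.

```python
import math
import statistics


def mahalanobis_outlyingness(x, data):
    """Classical (non-robust) Mahalanobis distance of a 2-D point x
    from the sample mean of data, using the sample covariance."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mx, x[1] - my
    # Quadratic form d' S^{-1} d with the 2x2 inverse written out.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


def projection_outlyingness(x, data, n_directions=36):
    """Finite-direction approximation to projection outlyingness:
    the largest |u'x - median(u'X)| / MAD(u'X) over a fixed set of
    unit directions u (assumed here: equally spaced angles)."""
    worst = 0.0
    for k in range(n_directions):
        # Half a circle is enough: u and -u give the same score.
        theta = math.pi * k / n_directions
        u = (math.cos(theta), math.sin(theta))
        proj = [u[0] * p[0] + u[1] * p[1] for p in data]
        med = statistics.median(proj)
        mad = statistics.median([abs(v - med) for v in proj])
        if mad == 0.0:
            continue  # skip degenerate directions
        score = abs(u[0] * x[0] + u[1] * x[1] - med) / mad
        worst = max(worst, score)
    return worst


# Toy data (hypothetical): a small cloud symmetric about the origin.
cloud = [
    (0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0),
    (1.0, 1.0), (-1.0, -1.0), (0.5, -0.5), (-0.5, 0.5),
]

if __name__ == "__main__":
    for point in [(0.0, 0.0), (8.0, 8.0)]:
        print(point,
              mahalanobis_outlyingness(point, cloud),
              projection_outlyingness(point, cloud))
```

The Mahalanobis score is constant on ellipses by construction, which is the "elliptical contours" limitation the abstract mentions. The projection score has no such constraint, but its cost grows with the number of directions scanned.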

Suggested Citation

  • Robert Serfling & Satyaki Mazumder, 2013. "Computationally easy outlier detection via projection pursuit with finitely many directions," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 25(2), pages 447-461, June.
  • Handle: RePEc:taf:gnstxx:v:25:y:2013:i:2:p:447-461
    DOI: 10.1080/10485252.2013.766335

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/10485252.2013.766335
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Filzmoser, Peter & Maronna, Ricardo & Werner, Mark, 2008. "Outlier identification in high dimensions," Computational Statistics & Data Analysis, Elsevier, vol. 52(3), pages 1694-1711, January.
    2. Lutz Dümbgen & David E. Tyler, 2005. "On the Breakdown Properties of Some Multivariate M‐Functionals," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 32(2), pages 247-264, June.

    Citations


    Cited by:

    1. Lin Xi & Liangxing Jin & Yujie Ji & Pingting Liu & Junjie Wei, 2024. "Prediction of Ultimate Bearing Capacity of Soil–Cement Mixed Pile Composite Foundation Using SA-IRMO-BPNN Model," Mathematics, MDPI, vol. 12(11), pages 1-24, May.
    2. P. Navarro-Esteban & J. A. Cuesta-Albertos, 2021. "High-dimensional outlier detection using random projections," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(4), pages 908-934, December.
    3. Loperfido, Nicola, 2018. "Skewness-based projection pursuit: A computational approach," Computational Statistics & Data Analysis, Elsevier, vol. 120(C), pages 42-57.
    4. Wang, Shanshan & Serfling, Robert, 2018. "On masking and swamping robustness of leading nonparametric outlier identifiers for multivariate data," Journal of Multivariate Analysis, Elsevier, vol. 166(C), pages 32-49.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. G. Zioutas & C. Chatzinakos & T. D. Nguyen & L. Pitsoulis, 2017. "Optimization techniques for multivariate least trimmed absolute deviation estimation," Journal of Combinatorial Optimization, Springer, vol. 34(3), pages 781-797, October.
    2. Roelant, E. & Van Aelst, S. & Croux, C., 2009. "Multivariate generalized S-estimators," Journal of Multivariate Analysis, Elsevier, vol. 100(5), pages 876-887, May.
    3. Junlong Zhao & Chao Liu & Lu Niu & Chenlei Leng, 2019. "Multiple influential point detection in high dimensional regression spaces," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 81(2), pages 385-408, April.
    4. Van Aelst, S. & Vandervieren, E. & Willems, G., 2012. "A Stahel–Donoho estimator based on huberized outlyingness," Computational Statistics & Data Analysis, Elsevier, vol. 56(3), pages 531-542.
    5. C. Croux & C. Dehon & A. Yadine, 2010. "The k-step spatial sign covariance matrix," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 4(2), pages 137-150, September.
    6. Chung, Hee Cheol & Ahn, Jeongyoun, 2021. "Subspace rotations for high-dimensional outlier detection," Journal of Multivariate Analysis, Elsevier, vol. 183(C).
    7. Jan Kalina & Jan Tichavský, 2022. "The minimum weighted covariance determinant estimator for high-dimensional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(4), pages 977-999, December.
    8. P. Navarro-Esteban & J. A. Cuesta-Albertos, 2021. "High-dimensional outlier detection using random projections," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(4), pages 908-934, December.
    9. D. Rosadi & P. Filzmoser, 2019. "Robust second-order least-squares estimation for regression models with autoregressive errors," Statistical Papers, Springer, vol. 60(1), pages 105-122, February.
    10. Boente, Graciela & Pires, Ana M. & Rodrigues, Isabel M., 2010. "Detecting influential observations in principal components and common principal components," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 2967-2975, December.
    11. Jack Jewson & David Rossell, 2022. "General Bayesian loss function selection and the use of improper models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(5), pages 1640-1665, November.
    12. Hallin Marc & Paindaveine Davy, 2006. "Parametric and semiparametric inference for shape: the role of the scale functional," Statistics & Risk Modeling, De Gruyter, vol. 24(3), pages 327-350, December.
    13. Seija Sirkiä & Sara Taskinen & Hannu Oja & David Tyler, 2009. "Tests and estimates of shape based on spatial signs and ranks," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 21(2), pages 155-176.
    14. Erkuş, Ekin Can & Purutçuoğlu, Vilda, 2021. "Outlier detection and quasi-periodicity optimization algorithm: Frequency domain based outlier detection (FOD)," European Journal of Operational Research, Elsevier, vol. 291(2), pages 560-574.
    15. Paindaveine, Davy, 2008. "A canonical definition of shape," Statistics & Probability Letters, Elsevier, vol. 78(14), pages 2240-2247, October.
    16. Paindaveine, Davy & Van Bever, Germain, 2014. "Inference on the shape of elliptical distributions based on the MCD," Journal of Multivariate Analysis, Elsevier, vol. 129(C), pages 125-144.
    17. Cerioli, Andrea & Farcomeni, Alessio, 2011. "Error rates for multivariate outlier detection," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 544-553, January.
    18. Šárka Brodinová & Peter Filzmoser & Thomas Ortner & Christian Breiteneder & Maia Rohm, 2019. "Robust and sparse k-means clustering for high-dimensional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(4), pages 905-932, December.
    19. Sirkiä, Seija & Taskinen, Sara & Oja, Hannu, 2007. "Symmetrised M-estimators of multivariate scatter," Journal of Multivariate Analysis, Elsevier, vol. 98(8), pages 1611-1629, September.
    20. Taskinen, Sara & Koch, Inge & Oja, Hannu, 2012. "Robustifying principal component analysis with spatial sign vectors," Statistics & Probability Letters, Elsevier, vol. 82(4), pages 765-774.
