Printed from https://ideas.repec.org/a/eee/csdana/v54y2010i12p2967-2975.html

Detecting influential observations in principal components and common principal components

Author

Listed:
  • Boente, Graciela
  • Pires, Ana M.
  • Rodrigues, Isabel M.

Abstract

Detecting outlying observations is an important step in any analysis, even when robust estimates are used. In particular, the robustified Mahalanobis distance is a natural measure of outlyingness if one focuses on ellipsoidal distributions. However, it is well known that the asymptotic chi-square approximation for the cutoff value of the Mahalanobis distance based on several robust estimates (like the minimum volume ellipsoid, the minimum covariance determinant and the S-estimators) is not adequate for detecting atypical observations in small samples from the normal distribution. In the multi-population setting and under a common principal components model, aggregated measures based on standardized empirical influence functions are used to detect observations with a significant impact on the estimators. As in the one-population setting, the cutoff values obtained from the asymptotic distribution of those aggregated measures are not adequate for small samples. More appropriate cutoff values, adapted to the sample sizes, can be computed by using a cross-validation approach. Cutoff values obtained from a Monte Carlo study using S-estimators are provided for illustration. A real data set is also analyzed.
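
The point about small-sample cutoffs can be illustrated with a minimal sketch for the one-population case. It is not the authors' procedure: it assumes bivariate normal data (so the chi-square quantile has a closed form), it replaces the S-estimators with a crude trimmed mean/covariance estimate, and it calibrates the cutoff by plain Monte Carlo simulation rather than by the cross-validation approach of the paper. All function names and parameters (`trimmed_sq_distances`, `monte_carlo_cutoff`, `n_rep`, `seed`) are hypothetical.

```python
import math
import numpy as np

P = 2  # dimension; fixed at 2 so the chi-square quantile has a closed form


def chi2_cutoff_2d(level):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - level)


def sq_mahalanobis(x, center, cov):
    """Squared Mahalanobis distance of each row of x from center."""
    diff = x - center
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)


def trimmed_sq_distances(x, n_steps=10):
    """Squared distances based on a crude trimmed location/scatter estimate.

    Stand-in for a robust estimator (assumption, not an S-estimator): the
    mean and covariance are repeatedly recomputed from the h observations
    with the smallest current distances.
    """
    n = x.shape[0]
    h = (n + P + 1) // 2
    alpha = h / n
    q = chi2_cutoff_2d(alpha)
    # Consistency factor at the normal model: alpha / F(q), where F is the
    # chi-square CDF with P + 2 = 4 degrees of freedom.
    factor = alpha / (1.0 - math.exp(-q / 2.0) * (1.0 + q / 2.0))
    center, cov = x.mean(axis=0), np.cov(x, rowvar=False)
    for _ in range(n_steps):
        keep = np.argsort(sq_mahalanobis(x, center, cov))[:h]
        center = x[keep].mean(axis=0)
        cov = factor * np.cov(x[keep], rowvar=False)
    return sq_mahalanobis(x, center, cov)


def monte_carlo_cutoff(n, level=0.975, n_rep=1000, seed=0):
    """Empirical quantile of the squared distances over simulated normal samples of size n."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate(
        [trimmed_sq_distances(rng.standard_normal((n, P))) for _ in range(n_rep)]
    )
    return float(np.quantile(pooled, level))


if __name__ == "__main__":
    print("asymptotic chi-square cutoff:", chi2_cutoff_2d(0.975))
    for n in (20, 50, 200):
        print("n =", n, "Monte Carlo cutoff:", monte_carlo_cutoff(n))
```

Comparing the simulated cutoffs for several sample sizes with the asymptotic value shows how far the chi-square approximation is from the finite-sample behaviour of this particular stand-in estimator; other robust estimators would give different values.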

Suggested Citation

  • Boente, Graciela & Pires, Ana M. & Rodrigues, Isabel M., 2010. "Detecting influential observations in principal components and common principal components," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 2967-2975, December.
  • Handle: RePEc:eee:csdana:v:54:y:2010:i:12:p:2967-2975

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(10)00002-2
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Boente, Graciela & Pires, Ana M. & Rodrigues, Isabel M., 2006. "General projection-pursuit estimators for the common principal components model: influence functions and Monte Carlo study," Journal of Multivariate Analysis, Elsevier, vol. 97(1), pages 124-147, January.
    2. Hubert, Mia & Rousseeuw, Peter & Verdonck, Tim, 2009. "Robust PCA for skewed data and its outlier map," Computational Statistics & Data Analysis, Elsevier, vol. 53(6), pages 2264-2274, April.
    3. Becker, Claudia & Gather, Ursula, 2001. "The largest nonidentifiable outlier: a comparison of multivariate simultaneous outlier identification rules," Computational Statistics & Data Analysis, Elsevier, vol. 36(1), pages 119-127, March.
    4. Filzmoser, Peter & Maronna, Ricardo & Werner, Mark, 2008. "Outlier identification in high dimensions," Computational Statistics & Data Analysis, Elsevier, vol. 52(3), pages 1694-1711, January.
    5. Graciela Boente, 2002. "Influence functions and outlier detection under the common principal components model: A robust approach," Biometrika, Biometrika Trust, vol. 89(4), pages 861-875, December.
    6. Chen, Tao & Martin, Elaine & Montague, Gary, 2009. "Robust probabilistic PCA with missing data and contribution analysis for outlier detection," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3706-3716, August.
    7. Serneels, Sven & Verdonck, Tim, 2008. "Principal component analysis for data containing outliers and missing elements," Computational Statistics & Data Analysis, Elsevier, vol. 52(3), pages 1712-1727, January.

    Citations

    Cited by:

    1. Ausloos, Marcel & Cerqueti, Roy & Bartolacci, Francesca & Castellano, Nicola G., 2018. "SME investment best strategies. Outliers for assessing how to optimize performance," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 509(C), pages 754-765.
    2. Bali, Juan Lucas & Boente, Graciela, 2015. "Influence function of projection-pursuit principal components for functional data," Journal of Multivariate Analysis, Elsevier, vol. 133(C), pages 173-199.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Debruyne, Michiel & Hubert, Mia & Van Horebeek, Johan, 2010. "Detecting influential observations in Kernel PCA," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3007-3019, December.
    2. Graciela Boente & Frank Critchley & Liliana Orellana, 2007. "Influence functions of two families of robust estimators under proportional scatter matrices," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 15(3), pages 295-327, February.
    3. Bianco, Ana & Boente, Graciela & Pires, Ana M. & Rodrigues, Isabel M., 2008. "Robust discrimination under a hierarchy on the scatter matrices," Journal of Multivariate Analysis, Elsevier, vol. 99(6), pages 1332-1357, July.
    4. Boente, Graciela & Molina, Julieta & Sued, Mariela, 2010. "On the asymptotic behavior of general projection-pursuit estimators under the common principal components model," Statistics & Probability Letters, Elsevier, vol. 80(3-4), pages 228-235, February.
    5. Luca Bagnato & Antonio Punzo, 2021. "Unconstrained representation of orthogonal matrices with application to common principal components," Computational Statistics, Springer, vol. 36(2), pages 1177-1195, June.
    6. Marc Hallin & Davy Paindaveine & Thomas Verdebout, 2014. "Efficient R-Estimation of Principal and Common Principal Components," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1071-1083, September.
    7. Rob J. Hyndman & Han Lin Shang, 2008. "Rainbow plots, Bagplots and Boxplots for Functional Data," Monash Econometrics and Business Statistics Working Papers 9/08, Monash University, Department of Econometrics and Business Statistics.
    8. Paindaveine, Davy & Rasoafaraniaina, Rondrotiana Joséa & Verdebout, Thomas, 2017. "Preliminary test estimation for multi-sample principal components," Econometrics and Statistics, Elsevier, vol. 2(C), pages 106-116.
    9. G. Zioutas & C. Chatzinakos & T. D. Nguyen & L. Pitsoulis, 2017. "Optimization techniques for multivariate least trimmed absolute deviation estimation," Journal of Combinatorial Optimization, Springer, vol. 34(3), pages 781-797, October.
    10. Bali, Juan Lucas & Boente, Graciela, 2015. "Influence function of projection-pursuit principal components for functional data," Journal of Multivariate Analysis, Elsevier, vol. 133(C), pages 173-199.
    11. Jürgen Wellmann & Ursula Gather, 2003. "Identification of outliers in a one-way random effects model," Statistical Papers, Springer, vol. 44(3), pages 335-348, July.
    12. Thomas Triebs & Subal C. Kumbhakar, 2012. "Management Practice in Production," ifo Working Paper Series 129, ifo Institute - Leibniz Institute for Economic Research at the University of Munich.
    13. David E. Tyler & Frank Critchley & Lutz Dümbgen & Hannu Oja, 2009. "Invariant co‐ordinate selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 549-592, June.
    14. Graciela Boente & Ana Pires & Isabel Rodrigues, 2008. "Estimators for the common principal components model based on reweighting: influence functions and Monte Carlo study," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 67(2), pages 189-218, March.
    15. Václav Plevka & Pieter Segaert & Chris M. J. Tampère & Mia Hubert, 2016. "Analysis of travel activity determinants using robust statistics," Transportation, Springer, vol. 43(6), pages 979-996, November.
    16. Dorota Toczydlowska & Gareth W. Peters, 2018. "Financial Big Data Solutions for State Space Panel Regression in Interest Rate Dynamics," Econometrics, MDPI, vol. 6(3), pages 1-45, July.
    17. Frahm, Gabriel & Jaekel, Uwe, 2010. "A generalization of Tyler's M-estimators to the case of incomplete data," Computational Statistics & Data Analysis, Elsevier, vol. 54(2), pages 374-393, February.
    18. Kirschstein, Thomas & Liebscher, Steffen & Becker, Claudia, 2013. "Robust estimation of location and scatter by pruning the minimum spanning tree," Journal of Multivariate Analysis, Elsevier, vol. 120(C), pages 173-184.
    19. García-Escudero, L.A. & Gordaliza, A. & Mayo-Iscar, A. & San Martín, R., 2010. "Robust clusterwise linear regression through trimming," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3057-3069, December.
