Printed from https://ideas.repec.org/a/eee/csdana/v55y2011i1p752-764.html

A new and practical influence measure for subsets of covariance matrix sample principal components with applications to high dimensional datasets

Author

Listed:
  • Prendergast, Luke A.
  • Li Wai Suen, Connie

Abstract

Principal Component Analysis (PCA) is an important tool in multivariate analysis, particularly for high dimensional data. Much has been done on sensitivity analysis and on the development of influence diagnostics for the eigenvector estimators that define the sample principal components. However, little, if anything, has been done in this setting for the sample principal components themselves. In this paper we develop a sensitivity measure for principal components associated with the covariance matrix that is closely related to the influence function (Hampel, 1974). This influence measure is based on the average squared canonical correlation and differs from existing measures in that it assesses the influence of certain observational types on the sample principal components. We use this measure to derive an influence diagnostic that satisfies two key criteria: (i) it detects influential observations with respect to subsets of sample principal components, and (ii) it is efficient to calculate even in high dimensions. We use several microarray datasets to show that our measure satisfies both criteria.
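
To make the idea of an influence measure based on the average squared canonical correlation concrete, the following is a minimal sketch and not the authors' diagnostic. It assumes a brute-force leave-one-out comparison: the scores on the first k sample principal components are computed once from the full data and once with observation i deleted, and one minus the average squared canonical correlation between the two sets of scores is reported. The function names (`top_pc_directions`, `avg_sq_canonical_corr`, `loo_influence`) and the leave-one-out scheme are illustrative assumptions; the paper's diagnostic is described as efficient in high dimensions, which this recomputation-based sketch is not.

```python
import numpy as np


def top_pc_directions(X, k):
    """Top-k eigenvectors of the sample covariance matrix (p x k).

    Uses the SVD of the centred data, which avoids forming the p x p
    covariance matrix when p is much larger than n.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T


def avg_sq_canonical_corr(A, B):
    """Average squared canonical correlation between the columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    # Orthonormal bases for the two column spaces; the singular values of
    # Qa' Qb are the canonical correlations.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    s = np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1
    return float(np.mean(s ** 2))


def loo_influence(X, k):
    """Hypothetical leave-one-out influence for the first k sample PCs.

    Entry i is 1 minus the average squared canonical correlation between
    the PC scores from the full data and the scores obtained when the PC
    directions are re-estimated without observation i.
    """
    n = X.shape[0]
    full_scores = X @ top_pc_directions(X, k)
    influence = np.empty(n)
    for i in range(n):
        V_i = top_pc_directions(np.delete(X, i, axis=0), k)
        influence[i] = 1.0 - avg_sq_canonical_corr(full_scores, X @ V_i)
    return influence
```

Values near 0 indicate that deleting the observation leaves the chosen subset of principal components essentially unchanged, while larger values flag observations worth inspecting.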

Suggested Citation

  • Prendergast, Luke A. & Li Wai Suen, Connie, 2011. "A new and practical influence measure for subsets of covariance matrix sample principal components with applications to high dimensional datasets," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 752-764, January.
  • Handle: RePEc:eee:csdana:v:55:y:2011:i:1:p:752-764

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(10)00267-7
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Harrison, David Jr. & Rubinfeld, Daniel L., 1978. "Hedonic housing prices and the demand for clean air," Journal of Environmental Economics and Management, Elsevier, vol. 5(1), pages 81-102, March.
    2. Ash A. Alizadeh & Michael B. Eisen & R. Eric Davis & Chi Ma & Izidore S. Lossos & Andreas Rosenwald & Jennifer C. Boldrick & Hajeer Sabet & Truc Tran & Xin Yu & John I. Powell & Liming Yang & Gerald E, 2000. "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling," Nature, Nature, vol. 403(6769), pages 503-511, February.
    3. J. Ramsay & Jos Berge & G. Styan, 1984. "Matrix correlation," Psychometrika, Springer;The Psychometric Society, vol. 49(3), pages 403-423, September.

    Citations

    Cited by:

    1. Jacques Bénasséni, 2018. "A correction of approximations used in sensitivity study of principal component analysis," Computational Statistics, Springer, vol. 33(4), pages 1939-1955, December.
    2. Prendergast, Luke A. & Smith, Jodie A., 2022. "Influence functions for linear discriminant analysis: Sensitivity analysis and efficient influence diagnostics," Journal of Multivariate Analysis, Elsevier, vol. 190(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lucija Muehlenbachs & Elisheba Spiller & Christopher Timmins, 2015. "The Housing Market Impacts of Shale Gas Development," American Economic Review, American Economic Association, vol. 105(12), pages 3633-3659, December.
    2. Jianhong Shi & Qian Yang & Xiongya Li & Weixing Song, 2017. "Effects of measurement error on a class of single-index varying coefficient regression models," Computational Statistics, Springer, vol. 32(3), pages 977-1001, September.
    3. Sewell, Daniel K., 2018. "Visualizing data through curvilinear representations of matrices," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 255-270.
    4. Smith, Michael & Kohn, Robert, 1996. "Nonparametric regression using Bayesian variable selection," Journal of Econometrics, Elsevier, vol. 75(2), pages 317-343, December.
    5. Villalonga, Belen, 2004. "Intangible resources, Tobin's q, and sustainability of performance differences," Journal of Economic Behavior & Organization, Elsevier, vol. 54(2), pages 205-230, June.
    6. Brockmeier, M., 1991. "Entwicklung und Aufhebung von Reinheitsgeboten im Nahrungsmittelbereich – Analyse und Bewertung," Proceedings “Schriften der Gesellschaft für Wirtschafts- und Sozialwissenschaften des Landbaues e.V.”, German Association of Agricultural Economists (GEWISOLA), vol. 27.
    7. Miles M Finney, 2017. "Air Quality and the Development of Los Angeles," The Review of Regional Studies, Southern Regional Science Association, vol. 47(3), pages 271-288, Fall.
    8. M. Moghadam & K. Aminian & M. Asghari & M. Parnianpour, 2013. "How well do the muscular synergies extracted via non-negative matrix factorisation explain the variation of torque at shoulder joint?," Computer Methods in Biomechanics and Biomedical Engineering, Taylor & Francis Journals, vol. 16(3), pages 291-301.
    9. Terri Menke, 1987. "Economic Welfare and Urban Amenities Across Race-Sex Groups," Urban Studies, Urban Studies Journal Limited, vol. 24(2), pages 151-161, April.
    10. Suneel Babu Chatla, 2023. "Nonparametric inference for additive models estimated via simplified smooth backfitting," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 75(1), pages 71-97, February.
    11. Miller, Steve & Startz, Richard, 2019. "Feasible generalized least squares using support vector regression," Economics Letters, Elsevier, vol. 175(C), pages 28-31.
    12. Chunfang Zhao & Yingliang Wu & Yunfeng Chen & Guohua Chen, 2023. "Multiscale Effects of Hedonic Attributes on Airbnb Listing Prices Based on MGWR: A Case Study of Beijing, China," Sustainability, MDPI, vol. 15(2), pages 1-21, January.
    13. Umberto Amato & Anestis Antoniadis & Italia De Feis & Irene Gijbels, 2021. "Penalised robust estimators for sparse and high-dimensional linear models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 1-48, March.
    14. repec:asg:wpaper:1006 is not listed on IDEAS
    15. Tizheng Li & Xiaojuan Kang, 2022. "Variable selection of higher-order partially linear spatial autoregressive model with a diverging number of parameters," Statistical Papers, Springer, vol. 63(1), pages 243-285, February.
    16. Deac Dan Stelian & Schebesch Klaus Bruno, 2018. "Market Forecasts and Client Behavioral Data: Towards Finding Adequate Model Complexity," Studia Universitatis „Vasile Goldis” Arad – Economics Series, Sciendo, vol. 28(3), pages 50-75, September.
    17. James Hansen & James McDonald & Panayiotis Theodossiou & Brad Larsen, 2010. "Partially Adaptive Econometric Methods For Regression and Classification," Computational Economics, Springer;Society for Computational Economics, vol. 36(2), pages 153-169, August.
    18. Tang, Yanlin & Song, Xinyuan & Wang, Huixia Judy & Zhu, Zhongyi, 2013. "Variable selection in high-dimensional quantile varying coefficient models," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 115-132.
    19. Kenneth Y. Chay & Michael Greenstone, 2005. "Does Air Quality Matter? Evidence from the Housing Market," Journal of Political Economy, University of Chicago Press, vol. 113(2), pages 376-424, April.
    20. Juan Ignacio Zoloa, 2020. "Noise pollution and housing markets: A spatial hedonic analysis for La Plata City," Ensayos de Política Económica, Departamento de Investigación Francisco Valsecchi, Facultad de Ciencias Económicas, Pontificia Universidad Católica Argentina., vol. 3(2), pages 129-152, Octubre.
    21. Cheng, Tsung-Chi, 2012. "On simultaneously identifying outliers and heteroscedasticity without specific form," Computational Statistics & Data Analysis, Elsevier, vol. 56(7), pages 2258-2272.
