
Identifying the Informational/Signal Dimension in Principal Component Analysis

Author

Listed:
  • Sergio Camiz

    (Dipartimento di Matematica, Sapienza Università di Roma, 00185 Roma, Italy)

  • Valério D. Pillar

    (Departamento de Ecologia, Universidade Federal do Rio Grande do Sul, 91501-970 Porto Alegre, Brazil)

Abstract

The identification of a reduced-dimensional representation of the data is among the main issues of exploratory multidimensional data analysis, and several solutions have been proposed in the literature, depending on the method. Principal Component Analysis (PCA) is the method that has received the most attention thus far, and several identification methods, the so-called stopping rules, have been proposed for it. These give very different results in practice, and some comparative studies have been carried out. Some inconsistencies in the previous studies led us to try to clarify the distinction between signal and noise in PCA, and its limits, and to propose a new testing method. This consists in producing simulated data according to a predefined eigenvalue structure, including zero eigenvalues. From random populations built according to several such structures, reduced-size samples were extracted, and different levels of random normal noise were added to them. This controlled introduction of noise allows a clear distinction between expected signal and noise, the latter being relegated to the non-zero eigenvalues in the samples that correspond to zero eigenvalues in the population. With this new method, we tested the performance of ten different stopping rules. For every method, every structure, and every noise level, both power (the ability to correctly identify the expected dimension) and type-I error (the detection of a dimension composed only of noise) were measured by counting the relative frequencies with which the smallest non-zero eigenvalue in the population was recognized as signal in the samples and with which the largest zero eigenvalue was recognized as noise, respectively. In this way, the behaviour of the examined methods is clear, and their comparison and evaluation are possible. The reported results show that both Rencher's generalization of Bartlett's test and Pillar's bootstrap method perform much better than all the others: both show reasonable power, decreasing with noise, and a very good type-I error. Thus, more than the others, these methods deserve to be adopted.
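The simulation design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the eigenvalue structure, the sample and population sizes, and the noise level are all assumptions chosen for illustration. The stopping rule used here (retain eigenvalues above their mean) is only a simple stand-in and is not claimed to be one of the ten rules the paper tests. The type-I error is computed following the abstract's definition as the frequency with which a noise-only dimension is retained.

```python
import numpy as np


def make_population(eigenvalues, pop_size, rng):
    """Build a population whose covariance follows the given eigenvalue structure."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    p = eigenvalues.size
    # Random orthonormal basis from the QR factorization of a Gaussian matrix.
    basis, _ = np.linalg.qr(rng.standard_normal((p, p)))
    # Independent scores scaled by sqrt(eigenvalue); zero eigenvalues give zero columns.
    scores = rng.standard_normal((pop_size, p)) * np.sqrt(eigenvalues)
    return scores @ basis.T


def noisy_sample_eigenvalues(population, n_obs, noise_sd, rng):
    """Extract a reduced-size sample, add normal noise, return eigenvalues (descending)."""
    rows = rng.choice(population.shape[0], size=n_obs, replace=False)
    noise = noise_sd * rng.standard_normal((n_obs, population.shape[1]))
    sample = population[rows] + noise
    return np.linalg.eigvalsh(np.cov(sample, rowvar=False))[::-1]


def retained_by_mean_rule(eigs):
    """Stand-in stopping rule (an assumption): keep eigenvalues above their mean."""
    return int(np.sum(eigs > eigs.mean()))


def power_and_type1(eigenvalues, noise_sd, n_obs=50, n_rep=200,
                    pop_size=10_000, seed=0):
    """Estimate power and type-I error as relative frequencies over replicates."""
    rng = np.random.default_rng(seed)
    true_dim = int(np.count_nonzero(eigenvalues))
    population = make_population(eigenvalues, pop_size, rng)
    kept = np.array([
        retained_by_mean_rule(
            noisy_sample_eigenvalues(population, n_obs, noise_sd, rng))
        for _ in range(n_rep)
    ])
    # Power: the smallest non-zero population eigenvalue is kept as signal.
    power = float(np.mean(kept >= true_dim))
    # Type-I error: the largest zero population eigenvalue is kept as signal.
    type1 = float(np.mean(kept > true_dim))
    return power, type1


if __name__ == "__main__":
    structure = [4.0, 3.0, 2.0, 0.0, 0.0, 0.0]  # hypothetical structure, true dimension 3
    print(power_and_type1(structure, noise_sd=0.5))
```

Because the population is finite, its non-zero eigenvalues only approximate the prescribed values, while the zero eigenvalues are exact, so any non-zero sample eigenvalue beyond the true dimension comes from the added noise alone.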

Suggested Citation

  • Sergio Camiz & Valério D. Pillar, 2018. "Identifying the Informational/Signal Dimension in Principal Component Analysis," Mathematics, MDPI, vol. 6(11), pages 1-16, November.
  • Handle: RePEc:gam:jmathe:v:6:y:2018:i:11:p:269-:d:184261

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/6/11/269/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/6/11/269/
    Download Restriction: no

    References listed on IDEAS

    1. Louis Guttman, 1954. "Some necessary conditions for common-factor analysis," Psychometrika, Springer;The Psychometric Society, vol. 19(2), pages 149-161, June.
    2. Li, Baibing & Martin, Elaine B. & Morris, A. Julian, 2002. "On principal component analysis in L1," Computational Statistics & Data Analysis, Elsevier, vol. 40(3), pages 471-474, September.
    3. Ian T. Jolliffe, 1982. "A Note on the Use of Principal Components in Regression," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 31(3), pages 300-303, November.
    4. Dray, Stephane, 2008. "On the number of principal components: A test of dimensionality based on measurements of similarity between matrices," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 2228-2237, January.
    5. Peter Schönemann & Robert Carroll, 1970. "Fitting one matrix to another under choice of a central dilation and a rigid motion," Psychometrika, Springer;The Psychometric Society, vol. 35(2), pages 245-255, June.
    6. Peres-Neto, Pedro R. & Jackson, Donald A. & Somers, Keith M., 2005. "How many principal components? stopping rules for determining the number of non-trivial axes revisited," Computational Statistics & Data Analysis, Elsevier, vol. 49(4), pages 974-997, June.
    7. Carl Eckart & Gale Young, 1936. "The approximation of one matrix by another of lower rank," Psychometrika, Springer;The Psychometric Society, vol. 1(3), pages 211-218, September.
    8. P. Robert & Y. Escoufier, 1976. "A Unifying Tool for Linear Multivariate Statistical Methods: The RV‐Coefficient," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 25(3), pages 257-265, November.
    9. Josse, J. & Pagès, J. & Husson, F., 2008. "Testing the significance of the RV coefficient," Computational Statistics & Data Analysis, Elsevier, vol. 53(1), pages 82-91, September.
    10. I. T. Jolliffe, 1972. "Discarding Variables in a Principal Component Analysis. I: Artificial Data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 21(2), pages 160-173, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michael Brusco & Renu Singh & Douglas Steinley, 2009. "Variable Neighborhood Search Heuristics for Selecting a Subset of Variables in Principal Component Analysis," Psychometrika, Springer;The Psychometric Society, vol. 74(4), pages 705-726, December.
    2. Dray, Stephane, 2008. "On the number of principal components: A test of dimensionality based on measurements of similarity between matrices," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 2228-2237, January.
    3. Psaradakis, Zacharias & Vávra, Marián, 2014. "On testing for nonlinearity in multivariate time series," Economics Letters, Elsevier, vol. 125(1), pages 1-4.
    4. Bauer, Jan O. & Drabant, Bernhard, 2021. "Principal loading analysis," Journal of Multivariate Analysis, Elsevier, vol. 184(C).
    5. Cumming, J.A. & Wooff, D.A., 2007. "Dimension reduction via principal variables," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 550-565, September.
    6. Archimbaud, Aurore & Nordhausen, Klaus & Ruiz-Gazen, Anne, 2018. "ICS for multivariate outlier detection with application to quality control," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 184-199.
    7. Brusco, Michael J., 2014. "A comparison of simulated annealing algorithms for variable selection in principal component analysis and discriminant analysis," Computational Statistics & Data Analysis, Elsevier, vol. 77(C), pages 38-53.
    8. Paola Zuccolotto, 2012. "Principal component analysis with interval imputed missing values," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 96(1), pages 1-23, January.
    9. Iyetomi, Hiroshi & Nakayama, Yasuhiro & Yoshikawa, Hiroshi & Aoyama, Hideaki & Fujiwara, Yoshi & Ikeda, Yuichi & Souma, Wataru, 2011. "What causes business cycles? Analysis of the Japanese industrial production data," Journal of the Japanese and International Economies, Elsevier, vol. 25(3), pages 246-272, September.
    10. Jolliffe, Ian, 2022. "A 50-year personal journey through time with principal component analysis," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    11. Josse, Julie & Husson, François, 2012. "Selecting the number of components in principal component analysis using cross-validation approximations," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1869-1879.
    12. Kelly P. Murillo & Eugenio M. Rocha, 2020. "Factors Influencing the Economic Behavior of the Food, Beverages and Tobacco Industry: A Case Study for Portuguese Enterprises," World Journal of Applied Economics, WERI-World Economic Research Institute, vol. 6(2), pages 99-121, December.
    13. Adele Ravagnani & Fabrizio Lillo & Paola Deriu & Piero Mazzarisi & Francesca Medda & Antonio Russo, 2024. "Dimensionality reduction techniques to support insider trading detection," Papers 2403.00707, arXiv.org, revised May 2024.
    14. Joy R. Petway & Yu-Pin Lin & Rainer F. Wunderlich, 2019. "Analyzing Opinions on Sustainable Agriculture: Toward Increasing Farmer Knowledge of Organic Practices in Taiwan-Yuanli Township," Sustainability, MDPI, vol. 11(14), pages 1-27, July.
    15. Cling, Jean-Pierre & Delecourt, Clément, 2022. "Interlinkages between the Sustainable Development Goals," World Development Perspectives, Elsevier, vol. 25(C).
    16. Oscar Claveria & Enric Monte & Salvador Torra, 2017. "A new approach for the quantification of qualitative measures of economic expectations," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(6), pages 2685-2706, November.
    17. Rauf Ahmad, M., 2019. "A significance test of the RV coefficient in high dimensions," Computational Statistics & Data Analysis, Elsevier, vol. 131(C), pages 116-130.
    18. Marconi, Gabriele, 2014. "European higher education policies and the problem of estimating a complex model with a small cross-section," MPRA Paper 87600, University Library of Munich, Germany.
    19. Leise Kelli de Oliveira & Carla de Oliveira Leite Nascimento & Paulo Renato de Sousa & Paulo Tarso Vilela de Resende & Francisco Gildemir Ferreira da Silva, 2019. "Transport Service Provider Perception of Barriers and Urban Freight Policies in Brazil," Sustainability, MDPI, vol. 11(24), pages 1-17, December.
    20. Ard H. J. den Reijer & Pieter W. Otter & Jan P. A. M. Jacobs, 2024. "An heuristic scree plot criterion for the number of factors," Statistical Papers, Springer, vol. 65(6), pages 3991-4000, August.
