IDEAS home Printed from https://ideas.repec.org/a/bla/jorssa/v185y2022i2p543-565.html

Asymptotic theory of principal component analysis for time series data with cautionary comments

Author

Listed:
  • Xinyu Zhang
  • Howell Tong

Abstract

Principal component analysis (PCA) is a most frequently used statistical tool in almost all branches of data science. However, like many other statistical tools, there is sometimes the risk of misuse or even abuse. In this paper, we highlight possible pitfalls in using the theoretical results of PCA based on the assumption of independent data when the data are time series. For the latter, we state with proof a central limit theorem of the eigenvalues and eigenvectors (loadings), give direct and bootstrap estimation of their asymptotic covariances, and assess their efficacy via simulation. Specifically, we pay attention to the proportion of variation, which decides the number of principal components (PCs), and the loadings, which help interpret the meaning of PCs. Our findings are that while the proportion of variation is quite robust to different dependence assumptions, the inference of PC loadings requires careful attention. We initiate and conclude our investigation with an empirical example on portfolio management, in which the PC loadings play a prominent role. It is given as a paradigm of correct usage of PCA for time series data.
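
As a rough illustration of the bootstrap idea mentioned in the abstract, the sketch below estimates the standard error of the proportion of variation explained by the leading PC, once with an ordinary i.i.d. bootstrap and once with a moving-block bootstrap that keeps short-range dependence within blocks. This is not the authors' estimator: the moving-block scheme, the toy VAR(1) simulator, the block length of 25, and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def proportion_of_variation(x, k):
    """Share of total variance captured by the k largest eigenvalues
    of the sample covariance matrix of x (rows = time points)."""
    eigvals = np.linalg.eigvalsh(np.cov(x, rowvar=False))  # ascending order
    return eigvals[-k:].sum() / eigvals.sum()

def moving_block_bootstrap(x, block_len, rng):
    """Resample rows of x in contiguous blocks of length block_len.
    With block_len=1 this reduces to the ordinary i.i.d. bootstrap."""
    n = x.shape[0]
    n_blocks = -(-n // block_len)  # ceiling division
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    idx = np.concatenate([np.arange(s, s + block_len) for s in starts])[:n]
    return x[idx]

def bootstrap_se(x, k, block_len, n_boot, rng):
    """Bootstrap standard error of the proportion of variation."""
    stats = [
        proportion_of_variation(moving_block_bootstrap(x, block_len, rng), k)
        for _ in range(n_boot)
    ]
    return float(np.std(stats, ddof=1))

def simulate_var1(n, phi, rng):
    """Hypothetical toy data: 3-dimensional VAR(1), x_t = phi * x_{t-1} + e_t,
    with cross-correlated Gaussian innovations."""
    mix = np.array([[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.5, 0.3, 0.4]])
    e = rng.standard_normal((n, 3)) @ mix.T
    x = np.zeros((n, 3))
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

rng = np.random.default_rng(0)
x = simulate_var1(500, 0.7, rng)
prop = proportion_of_variation(x, k=1)
se_iid = bootstrap_se(x, k=1, block_len=1, n_boot=200, rng=rng)
se_block = bootstrap_se(x, k=1, block_len=25, n_boot=200, rng=rng)
print(f"proportion (PC1): {prop:.3f}")
print(f"SE, i.i.d. bootstrap:      {se_iid:.4f}")
print(f"SE, moving-block bootstrap: {se_block:.4f}")
```

Comparing the two standard errors on a given series gives a quick, informal sense of how much the independence assumption matters for that statistic; the same resampling loop could be pointed at the eigenvector entries to examine the loadings instead.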

Suggested Citation

  • Xinyu Zhang & Howell Tong, 2022. "Asymptotic theory of principal component analysis for time series data with cautionary comments," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(2), pages 543-565, April.
  • Handle: RePEc:bla:jorssa:v:185:y:2022:i:2:p:543-565
    DOI: 10.1111/rssa.12793

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssa.12793
    Download Restriction: no


    References listed on IDEAS

    1. A. Azzalini & A. Capitanio, 1999. "Statistical applications of the multivariate skew normal distribution," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 61(3), pages 579-602.
    2. Giraitis, Liudas & Robinson, Peter M., 2001. "Whittle Estimation Of Arch Models," Econometric Theory, Cambridge University Press, vol. 17(3), pages 608-631, June.
    3. Taniguchi, M. & Krishnaiah, P. R., 1987. "Asymptotic distributions of functions of the eigenvalues of sample covariance matrix and canonical correlation matrix in multivariate time series," Journal of Multivariate Analysis, Elsevier, vol. 22(1), pages 156-176, June.
    4. repec:ebl:ecbull:v:7:y:2004:i:3:p:1-10 is not listed on IDEAS
    5. Xiaofeng Shao, 2009. "Confidence intervals for spectral mean and ratio statistics," Biometrika, Biometrika Trust, vol. 96(1), pages 107-117.
    6. Giraitis, Liudas & Robinson, Peter M., 2001. "Whittle estimation of ARCH models," LSE Research Online Documents on Economics 316, London School of Economics and Political Science, LSE Library.
    7. B. Ahamad, 1967. "An Analysis of Crimes by the Method of Principal Components," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 16(1), pages 17-35, March.
    8. M. Hossein Partovi & Michael Caputo, 2004. "Principal Portfolios: Recasting the Efficient Frontier," Economics Bulletin, AccessEcon, vol. 7(3), pages 1-10.

    Citations



    Cited by:

    1. Thu K. Hoang & Klarizze Anne Martin Puzon & Hoai Thi Thu Dang & Rachel M. Gisselquist, 2024. "Inequality and institutional outcomes in Viet Nam: A combined principal components and clustering analysis," WIDER Working Paper Series wp-2024-38, World Institute for Development Economic Research (UNU-WIDER).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhang, Xinyu & Tong, Howell, 2022. "Asymptotic theory of principal component analysis for time series data with cautionary comments," LSE Research Online Documents on Economics 113566, London School of Economics and Political Science, LSE Library.
    2. Diongue Abdou Ka & Dominique Guegan, 2008. "Estimation of k-Factor Gigarch Process: A Monte Carlo Study," Post-Print halshs-00375758, HAL.
    3. Dominique Guegan & Bertrand K. Hassani, 2019. "Risk Measurement," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-02119256, HAL.
    4. Peter M Robinson & Paolo Zaffaroni, 2005. "Pseudo-Maximum Likelihood Estimation of ARCH(∞) Models," STICERD - Econometrics Paper Series 495, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
    5. Abdou Kâ Diongue & Dominique Guegan, 2008. "Estimation of k-factor GIGARCH process : a Monte Carlo study," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-00235179, HAL.
    6. Wei, Honglei & Zhang, Hongfan & Jiang, Hui & Huang, Lei, 2022. "On the semi-varying coefficient dynamic panel data model with autocorrelated errors," Computational Statistics & Data Analysis, Elsevier, vol. 173(C).
    7. Robinson, Peter M. & Zaffaroni, Paolo, 2005. "Pseudo-maximum likelihood estimation of ARCH(∞) models," LSE Research Online Documents on Economics 58182, London School of Economics and Political Science, LSE Library.
    8. Giraitis, Liudas & Leipus, Remigijus & Robinson, Peter M. & Surgailis, Donatas, 2003. "LARCH, leverage and long memory," LSE Research Online Documents on Economics 2020, London School of Economics and Political Science, LSE Library.
    9. Abdou Kâ Diongue & Dominique Guegan, 2004. "Estimating parameters for a k-GIGARCH process," Post-Print halshs-00188531, HAL.
    10. repec:hum:wpaper:sfb649dp2006-033 is not listed on IDEAS
    11. Zaffaroni, Paolo, 2009. "Whittle estimation of EGARCH and other exponential volatility models," Journal of Econometrics, Elsevier, vol. 151(2), pages 190-200, August.
    12. Polzehl, Jörg & Spokoiny, Vladimir, 2006. "Varying coefficient GARCH versus local constant volatility modeling: Comparison of the predictive power," SFB 649 Discussion Papers 2006-033, Humboldt University Berlin, Collaborative Research Center 649: Economic Risk.
    13. Tata Subba Rao & Granville Tunnicliffe Wilson & Joao Jesus & Richard E. Chandler, 2017. "Inference with the Whittle Likelihood: A Tractable Approach Using Estimating Functions," Journal of Time Series Analysis, Wiley Blackwell, vol. 38(2), pages 204-224, March.
    14. Todd Prono, 2016. "Closed-Form Estimation of Finite-Order ARCH Models: Asymptotic Theory and Finite-Sample Performance," Finance and Economics Discussion Series 2016-083, Board of Governors of the Federal Reserve System (U.S.).
    15. Robinson, Peter M. & Zaffaroni, Paolo, 2005. "Pseudo-maximum likelihood estimation of ARCH models," LSE Research Online Documents on Economics 4544, London School of Economics and Political Science, LSE Library.
    16. Xuejie Feng & Chiping Zhang, 2020. "A Perturbation Method to Optimize the Parameters of Autoregressive Conditional Heteroscedasticity Model," Computational Economics, Springer;Society for Computational Economics, vol. 55(3), pages 1021-1044, March.
    17. Giraitis, Liudas & Leipus, Remigijus & Robinson, Peter M. & Surgailis, Donatas, 2004. "LARCH, leverage, and long memory," LSE Research Online Documents on Economics 294, London School of Economics and Political Science, LSE Library.
    18. Huang, Da & Wang, Hansheng & Yao, Qiwei, 2008. "Estimating GARCH models: when to use what?," LSE Research Online Documents on Economics 5398, London School of Economics and Political Science, LSE Library.
    19. Royer, Julien, 2021. "Conditional asymmetry in Power ARCH(∞) models," MPRA Paper 109118, University Library of Munich, Germany.
    20. Liudas Giraitis & Remigijus Leipus & Peter M Robinson & Donatas Surgailis, 2003. "LARCH, Leverage and Long Memory," STICERD - Econometrics Paper Series 460, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
    21. Jean-Marc Bardet & Paul Doukhan & José Rafael Leon, 2005. "Uniform Limit Theorems for the Integrated Periodogram of Weakly Dependent Time Series and their Applications to Whittle's Estimate," Working Papers 2005-46, Center for Research in Economics and Statistics.
