
Divide-and-Conquer: A Distributed Hierarchical Factor Approach to Modeling Large-Scale Time Series Data

Author

Listed:
  • Zhaoxing Gao
  • Ruey S. Tsay

Abstract

This paper proposes a hierarchical approximate-factor approach to analyzing high-dimensional, large-scale heterogeneous time series data using distributed computing. The new method employs a multiple-fold dimension reduction procedure using Principal Component Analysis (PCA) and shows great promise for modeling large-scale data that cannot be stored or analyzed by a single machine. Each computer at the basic level performs a PCA to extract common factors among the time series assigned to it and transfers those factors to one and only one node of the second level. Each 2nd-level computer collects the common factors from its subordinates and performs another PCA to select the 2nd-level common factors. This process is repeated until the central server is reached, which collects common factors from its direct subordinates and performs a final PCA to select the global common factors. The noise terms of the 2nd-level approximate factor model are the unique common factors of the 1st-level clusters. We focus on the case of 2 levels in our theoretical derivations, but the idea can easily be generalized to any finite number of hierarchies. We discuss some clustering methods when the group memberships are unknown and introduce a new diffusion index approach to forecasting. We further extend the analysis to unit-root nonstationary time series. Asymptotic properties of the proposed method are derived for the diverging dimension of the data in each computing unit and the sample size $T$. We use both simulated data and real examples to assess the performance of the proposed method in finite samples, and compare our method with the commonly used ones in the literature concerning the forecastability of extracted factors.
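
The two-level procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pca_factors`, `two_level_factors`), the simulated data, and the assumption that the numbers of factors (`r_local`, `r_global`) are known in advance are all hypothetical choices made for illustration, and the "workers" are simulated in a single process rather than on separate machines.

```python
import numpy as np

def pca_factors(X, r):
    """Return r principal-component factors (T x r) of a T x N panel.

    Factors are the leading left singular vectors of the centered data,
    scaled by sqrt(T) so that F'F / T is the identity matrix.
    """
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return np.sqrt(X.shape[0]) * U[:, :r]

def two_level_factors(groups, r_local, r_global):
    """Hypothetical two-level scheme: one PCA per group, then a PCA on the results."""
    # Level 1: each (simulated) worker extracts factors from its own series.
    local = [pca_factors(X, r_local) for X in groups]
    # Level 2: the central node stacks the local factors and runs a final PCA.
    stacked = np.hstack(local)
    return pca_factors(stacked, r_global), local

# Simulated example (assumed design): one global factor shared by all groups
# plus one factor unique to each group.
rng = np.random.default_rng(0)
T, n_groups, n_per_group = 200, 4, 30
g = rng.standard_normal((T, 1))
groups = []
for _ in range(n_groups):
    f_unique = rng.standard_normal((T, 1))
    factors = np.hstack([g, f_unique])
    loadings = rng.standard_normal((2, n_per_group))
    noise = 0.5 * rng.standard_normal((T, n_per_group))
    groups.append(factors @ loadings + noise)

G_hat, local = two_level_factors(groups, r_local=2, r_global=1)
# PCA factors are identified only up to sign, hence the absolute correlation.
corr = abs(np.corrcoef(G_hat[:, 0], g[:, 0])[0, 1])
print(f"|corr(estimated global factor, true global factor)| = {corr:.3f}")
```

Only the `T x r_local` factor matrices travel from level 1 to level 2, so the central node never needs the raw series. In this simulated design the global factor loads on every group while each unique factor appears in only one, which is why the final PCA separates them.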

Suggested Citation

  • Zhaoxing Gao & Ruey S. Tsay, 2021. "Divide-and-Conquer: A Distributed Hierarchical Factor Approach to Modeling Large-Scale Time Series Data," Papers 2103.14626, arXiv.org.
  • Handle: RePEc:arx:papers:2103.14626

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2103.14626
    File Function: Latest version
    Download Restriction: no

    Citations


    Cited by:

    1. Gao, Zhaoxing & Tsay, Ruey S., 2023. "A Two-Way Transformed Factor Model for Matrix-Variate Time Series," Econometrics and Statistics, Elsevier, vol. 27(C), pages 83-101.
