Printed from https://ideas.repec.org/a/eee/econom/v233y2023i1p113-131.html

Factor-based imputation of missing values and covariances in panel data of large dimensions

Authors

  • Cahan, Ercument
  • Bai, Jushan
  • Ng, Serena

Abstract

Economists are blessed with a wealth of data for analysis, but more often than not, values in some entries of the data matrix are missing. Various methods have been proposed to handle missing observations in a few variables. We exploit the factor structure in panel data of large dimensions. Our tall-project algorithm first estimates the factors from a tall block in which data for all rows are observed, and then uses projections with unit-specific sample sizes to estimate the factor loadings. A missing value is imputed by its estimated common component, which we show is consistent and asymptotically normal without further iteration. Implications for using imputed data in factor-augmented regressions are then discussed. To compensate for the downward bias in sample covariance matrices created by the omitted noise in each imputed value, we overlay the imputed data with re-sampled idiosyncratic residuals many times and use the average of the covariances to estimate the parameters of interest. Simulations show that the procedures have desirable finite sample properties.
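
The two procedures described in the abstract (tall-project imputation, then covariance estimation with re-sampled residuals) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names tall_project_impute and resampled_covariance are hypothetical, the number of factors r is assumed known, the panel is assumed already demeaned, every unit must be observed in at least r periods, and residuals are drawn with replacement from each unit's own observed residuals, which may differ from the paper's exact resampling scheme.

```python
import numpy as np


def tall_project_impute(X, r):
    """Hypothetical sketch of tall-project imputation (not the authors' code).

    X: (T, N) array, np.nan marks missing entries; assumed demeaned.
    r: number of factors, assumed known.
    Returns the imputed panel, the factor estimates and the loadings.
    """
    T, N = X.shape
    observed = ~np.isnan(X)
    # Tall block: units (columns) observed in every period (row).
    tall = observed.all(axis=0)
    if tall.sum() < r:
        raise ValueError("tall block needs at least r fully observed columns")
    X_tall = X[:, tall]
    # Principal components on the tall block, normalized so F'F/T = I.
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigvec = np.linalg.eigh(X_tall @ X_tall.T)
    F_hat = np.sqrt(T) * eigvec[:, ::-1][:, :r]
    # Project each unit on F_hat using only the periods where it is observed,
    # so each regression has a unit-specific sample size.
    Lam_hat = np.empty((N, r))
    for i in range(N):
        rows = observed[:, i]
        Lam_hat[i], *_ = np.linalg.lstsq(F_hat[rows], X[rows, i], rcond=None)
    common = F_hat @ Lam_hat.T
    # Keep observed values; fill gaps with the estimated common component.
    return np.where(observed, X, common), F_hat, Lam_hat


def resampled_covariance(X, F_hat, Lam_hat, n_draws=200, seed=0):
    """Hypothetical sketch: average the sample covariance over panels in which
    each missing entry is its common component plus a re-sampled residual."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)
    common = F_hat @ Lam_hat.T
    resid = X - common  # NaN wherever X is missing
    N = X.shape[1]
    cov_sum = np.zeros((N, N))
    for _ in range(n_draws):
        Z = np.where(observed, X, common)
        for i in range(N):
            miss = ~observed[:, i]
            if miss.any():
                # Draw from unit i's own observed residuals, with replacement.
                pool = resid[observed[:, i], i]
                Z[miss, i] += rng.choice(pool, size=miss.sum(), replace=True)
        cov_sum += np.cov(Z, rowvar=False)
    return cov_sum / n_draws
```

In this sketch the factors are estimated once from the tall block and each loading comes from a single least-squares projection, so no EM-style iteration is involved. Averaging the covariance over many draws puts back, on average, the idiosyncratic variance that plain common-component imputation leaves out.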

Suggested Citation

  • Cahan, Ercument & Bai, Jushan & Ng, Serena, 2023. "Factor-based imputation of missing values and covariances in panel data of large dimensions," Journal of Econometrics, Elsevier, vol. 233(1), pages 113-131.
  • Handle: RePEc:eee:econom:v:233:y:2023:i:1:p:113-131
    DOI: 10.1016/j.jeconom.2022.01.006

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0304407622000215
    Download Restriction: Full text for ScienceDirect subscribers only

    References listed on IDEAS

    1. Giannone, Domenico & Reichlin, Lucrezia & Small, David, 2008. "Nowcasting: The real-time informational content of macroeconomic data," Journal of Monetary Economics, Elsevier, vol. 55(4), pages 665-676, May.
    2. Jushan Bai & Serena Ng, 2002. "Determining the Number of Factors in Approximate Factor Models," Econometrica, Econometric Society, vol. 70(1), pages 191-221, January.
    3. Gonçalves, Sílvia & Perron, Benoit, 2020. "Bootstrapping factor models with cross sectional dependence," Journal of Econometrics, Elsevier, vol. 218(2), pages 476-495.
    4. Xiong, Ruoxuan & Pelger, Markus, 2023. "Large dimensional latent factor modeling with missing observations and applications to causal inference," Journal of Econometrics, Elsevier, vol. 233(1), pages 271-301.
    5. Jushan Bai & Serena Ng, 2021. "Matrix Completion, Counterfactuals, and Factor Analysis of Missing Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(536), pages 1746-1763, October.
    6. Susan Athey & Mohsen Bayati & Nikolay Doudchenko & Guido Imbens & Khashayar Khosravi, 2021. "Matrix Completion Methods for Causal Panel Data Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(536), pages 1716-1730, October.
    7. Jin, Sainan & Miao, Ke & Su, Liangjun, 2021. "On factor models with random missing: EM estimation, inference, and cross validation," Journal of Econometrics, Elsevier, vol. 222(1), pages 745-777.
    8. Domenico Giannone & Lucrezia Reichlin & David Small, 2008. "Nowcasting: the real time informational content of macroeconomic data releases," ULB Institutional Repository 2013/6409, ULB -- Universite Libre de Bruxelles.
    9. Domenico Giannone & Lucrezia Reichlin & David H. Small, 2005. "Nowcasting GDP and inflation: the real-time informational content of macroeconomic data releases," Finance and Economics Discussion Series 2005-42, Board of Governors of the Federal Reserve System (U.S.).
    10. Jungbacker, B. & Koopman, S.J. & van der Wel, M., 2011. "Maximum likelihood estimation for dynamic factor models with missing data," Journal of Economic Dynamics and Control, Elsevier, vol. 35(8), pages 1358-1368, August.
    11. Jushan Bai, 2003. "Inferential Theory for Factor Models of Large Dimensions," Econometrica, Econometric Society, vol. 71(1), pages 135-171, January.
    12. Jushan Bai & Serena Ng, 2006. "Confidence Intervals for Diffusion Index Forecasts and Inference for Factor-Augmented Regressions," Econometrica, Econometric Society, vol. 74(4), pages 1133-1150, July.
    13. Chamberlain, Gary & Rothschild, Michael, 1983. "Arbitrage, Factor Structure, and Mean-Variance Analysis on Large Asset Markets," Econometrica, Econometric Society, vol. 51(5), pages 1281-1304, September.
    14. J. B. Taylor & Harald Uhlig (ed.), 2016. "Handbook of Macroeconomics," Handbook of Macroeconomics, Elsevier, edition 1, volume 2, number 2.
    15. Marta Bańbura & Michele Modugno, 2014. "Maximum Likelihood Estimation Of Factor Models On Datasets With Arbitrary Pattern Of Missing Data," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 29(1), pages 133-160, January.
    16. B. Jungbacker & S.J. Koopman & M. van der Wel, 2009. "Dynamic Factor Analysis in The Presence of Missing Data," Tinbergen Institute Discussion Papers 09-010/4, Tinbergen Institute, revised 11 Mar 2011.
    17. Stock, J.H. & Watson, M.W., 2002. "Forecasting Using Principal Components From a Large Number of Predictors," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 1167-1179, December.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Helena Chuliá & Sabuhi Khalili & Jorge M. Uribe, 2024. "Monitoring time-varying systemic risk in sovereign debt and currency markets with generative AI," IREA Working Papers 202402, University of Barcelona, Research Institute of Applied Economics, revised Feb 2024.
    2. Juan, Aranzazu de & Poncela, Maria Pilar, 2023. "Economic activity and CO2 emissions in Spain," DES - Working Papers. Statistics and Econometrics. WS 37975, Universidad Carlos III de Madrid. Departamento de Estadística.
    3. Zhou, Ruichao & Wu, Jianhong, 2023. "Determining the number of change-points in high-dimensional factor models by cross-validation with matrix completion," Economics Letters, Elsevier, vol. 232(C).
    4. Christian Fieberg & Daniel Metko & Thorsten Poddig & Thomas Loy, 2023. "Machine learning techniques for cross-sectional equity returns’ prediction," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 45(1), pages 289-323, March.
    5. Gomez-Gonzalez, Jose E. & Uribe, Jorge M. & Valencia, Oscar, 2024. "Asymmetric Sovereign Risk: Implications for Climate Change Preparation," IDB Publications (Working Papers) 13447, Inter-American Development Bank.
    6. Jungjun Choi & Ming Yuan, 2023. "Matrix Completion When Missing Is Not at Random and Its Applications in Causal Panel Data Models," Papers 2308.02364, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xiong, Ruoxuan & Pelger, Markus, 2023. "Large dimensional latent factor modeling with missing observations and applications to causal inference," Journal of Econometrics, Elsevier, vol. 233(1), pages 271-301.
    2. Stock, J.H. & Watson, M.W., 2016. "Dynamic Factor Models, Factor-Augmented Vector Autoregressions, and Structural Vector Autoregressions in Macroeconomics," Handbook of Macroeconomics, in: J. B. Taylor & Harald Uhlig (ed.), Handbook of Macroeconomics, edition 1, volume 2, chapter 0, pages 415-525, Elsevier.
    3. Jianqing Fan & Kunpeng Li & Yuan Liao, 2020. "Recent Developments on Factor Models and its Applications in Econometric Learning," Papers 2009.10103, arXiv.org.
    4. Catherine Doz & Peter Fuleky, 2019. "Dynamic Factor Models," Working Papers 2019-4, University of Hawaii Economic Research Organization, University of Hawaii at Manoa.
    5. Pilar Poncela & Esther Ruiz, 2016. "Small- Versus Big-Data Factor Extraction in Dynamic Factor Models: An Empirical Assessment," Advances in Econometrics, in: Dynamic Factor Models, volume 35, pages 401-434, Emerald Group Publishing Limited.
    6. Jushan Bai & Serena Ng, 2021. "Matrix Completion, Counterfactuals, and Factor Analysis of Missing Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(536), pages 1746-1763, October.
    7. Poncela, Pilar & Ruiz, Esther & Miranda, Karen, 2021. "Factor extraction using Kalman filter and smoothing: This is not just another survey," International Journal of Forecasting, Elsevier, vol. 37(4), pages 1399-1425.
    8. Jin, Sainan & Miao, Ke & Su, Liangjun, 2021. "On factor models with random missing: EM estimation, inference, and cross validation," Journal of Econometrics, Elsevier, vol. 222(1), pages 745-777.
    9. Matteo Barigozzi & Matteo Luciani, 2019. "Quasi Maximum Likelihood Estimation and Inference of Large Approximate Dynamic Factor Models via the EM algorithm," Papers 1910.03821, arXiv.org, revised Sep 2024.
    10. Daniel J. Lewis & Karel Mertens & James H. Stock & Mihir Trivedi, 2022. "Measuring real activity using a weekly economic index," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(4), pages 667-687, June.
    11. Kaufmann, Daniel & Scheufele, Rolf, 2017. "Business tendency surveys and macroeconomic fluctuations," International Journal of Forecasting, Elsevier, vol. 33(4), pages 878-893.
    12. De Mol, Christine & Giannone, Domenico & Reichlin, Lucrezia, 2008. "Forecasting using a large number of predictors: Is Bayesian shrinkage a valid alternative to principal components?," Journal of Econometrics, Elsevier, vol. 146(2), pages 318-328, October.
    13. Marcellino, Massimiliano & Sivec, Vasja, 2016. "Monetary, fiscal and oil shocks: Evidence based on mixed frequency structural FAVARs," Journal of Econometrics, Elsevier, vol. 193(2), pages 335-348.
    14. Ma, Tao & Zhou, Zhou & Antoniou, Constantinos, 2018. "Dynamic factor model for network traffic state forecast," Transportation Research Part B: Methodological, Elsevier, vol. 118(C), pages 281-317.
    15. Monica Defend & Aleksey Min & Lorenzo Portelli & Franz Ramsauer & Francesco Sandrini & Rudi Zagst, 2021. "Quantifying Drivers of Forecasted Returns Using Approximate Dynamic Factor Models for Mixed-Frequency Panel Data," Forecasting, MDPI, vol. 3(1), pages 1-35, February.
    16. Martin Solberger & Erik Spånberg, 2020. "Estimating a Dynamic Factor Model in EViews Using the Kalman Filter and Smoother," Computational Economics, Springer;Society for Computational Economics, vol. 55(3), pages 875-900, March.
    17. Catherine Doz & Domenico Giannone & Lucrezia Reichlin, 2012. "A Quasi–Maximum Likelihood Approach for Large, Approximate Dynamic Factor Models," The Review of Economics and Statistics, MIT Press, vol. 94(4), pages 1014-1024, November.
    18. Alvarez, Rocio & Camacho, Maximo & Perez-Quiros, Gabriel, 2016. "Aggregate versus disaggregate information in dynamic factor models," International Journal of Forecasting, Elsevier, vol. 32(3), pages 680-694.
    19. Bai, Jushan & Liao, Yuan, 2016. "Efficient estimation of approximate factor models via penalized maximum likelihood," Journal of Econometrics, Elsevier, vol. 191(1), pages 1-18.
    20. Bräuning, Falk & Koopman, Siem Jan, 2014. "Forecasting macroeconomic variables using collapsed dynamic factor analysis," International Journal of Forecasting, Elsevier, vol. 30(3), pages 572-584.

    More about this item

    Keywords

    Risk management; Covariance structure; Matrix completion; Incomplete data

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General
    • C2 - Mathematical and Quantitative Methods - - Single Equation Models; Single Variables

    Corrections

    All material on this site has been provided by the respective publishers and authors. When requesting a correction, please mention this item's handle: RePEc:eee:econom:v:233:y:2023:i:1:p:113-131.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.elsevier.com/locate/jeconom.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.