
Online missing value imputation for high-dimensional mixed-type data via generalized factor models

Author

Listed:
  • Liu, Wei
  • Luo, Lan
  • Zhou, Ling

Abstract

Most machine learning methods require complete observations, which calls for new statistical methods to handle messy datasets with missing values. The need is especially urgent for streaming data, which are generated at high speed and with little quality control, so missing data imputation becomes an inevitable preprocessing step before subsequent analysis. A practical online imputation algorithm should not only scale to large datasets but also handle high-dimensional mixed-type data containing binary, count and continuous variables. To fill this gap, a novel online imputation algorithm, called OMIG, is proposed for streaming data under the framework of generalized factor models. To obtain deeper insight, OMIG is compared theoretically and empirically with two other versions, the oracle version and the offline version. Theoretical and numerical findings show that (a) in imputation accuracy, OMIG is not equivalent to its oracle version but instead converges at a slower rate; (b) OMIG outperforms its offline version in imputation accuracy; and (c) OMIG is equivalent to its oracle version in estimation accuracy for the factor loading, which greatly facilitates interpretation and follow-up analysis. Extensive numerical experiments and two real datasets are used to demonstrate the performance of the proposed method.

Suggested Citation

  • Liu, Wei & Luo, Lan & Zhou, Ling, 2023. "Online missing value imputation for high-dimensional mixed-type data via generalized factor models," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
  • Handle: RePEc:eee:csdana:v:187:y:2023:i:c:s0167947323001330
    DOI: 10.1016/j.csda.2023.107822

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947323001330
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Bai, Jushan & Ng, Serena, 2013. "Principal components estimation and identification of static factors," Journal of Econometrics, Elsevier, vol. 176(1), pages 18-29.
    2. Jushan Bai & Serena Ng, 2002. "Determining the Number of Factors in Approximate Factor Models," Econometrica, Econometric Society, vol. 70(1), pages 191-221, January.
    3. Lan Luo & Peter X.‐K. Song, 2020. "Renewable estimation and incremental inference in generalized linear models with streaming data sets," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(1), pages 69-97, February.
    4. Jin, Sainan & Miao, Ke & Su, Liangjun, 2021. "On factor models with random missing: EM estimation, inference, and cross validation," Journal of Econometrics, Elsevier, vol. 222(1), pages 745-777.
    5. Ruoxuan Xiong & Markus Pelger, 2019. "Large Dimensional Latent Factor Modeling with Missing Observations and Applications to Causal Inference," Papers 1910.08273, arXiv.org, revised Jan 2022.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cahan, Ercument & Bai, Jushan & Ng, Serena, 2023. "Factor-based imputation of missing values and covariances in panel data of large dimensions," Journal of Econometrics, Elsevier, vol. 233(1), pages 113-131.
    2. Yinchu Zhu, 2019. "How well can we learn large factor models without assuming strong factors?," Papers 1910.10382, arXiv.org, revised Nov 2019.
    3. Jianqing Fan & Kunpeng Li & Yuan Liao, 2020. "Recent Developments on Factor Models and its Applications in Econometric Learning," Papers 2009.10103, arXiv.org.
    4. Jushan Bai & Serena Ng, 2020. "Simpler Proofs for Approximate Factor Models of Large Dimensions," Papers 2008.00254, arXiv.org.
    5. Thomas Despois & Catherine Doz, 2023. "Identifying and interpreting the factors in factor models via sparsity: Different approaches," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 38(4), pages 533-555, June.
    6. Yunus Emre Ergemen & Carlos Vladimir Rodríguez-Caballero, 2016. "A Dynamic Multi-Level Factor Model with Long-Range Dependence," CREATES Research Papers 2016-23, Department of Economics and Business Economics, Aarhus University.
    7. Ruoxuan Xiong & Markus Pelger, 2019. "Large Dimensional Latent Factor Modeling with Missing Observations and Applications to Causal Inference," Papers 1910.08273, arXiv.org, revised Jan 2022.
    8. Aleksandra Halka & Grzegorz Szafranski, 2018. "What Common Factors are Driving Inflation in CEE Countries?," Prague Economic Papers, Prague University of Economics and Business, vol. 2018(2), pages 131-148.
    9. Mao Takongmo, Charles Olivier & Stevanovic, Dalibor, 2015. "Selection Of The Number Of Factors In Presence Of Structural Instability: A Monte Carlo Study," L'Actualité Economique, Société Canadienne de Science Economique, vol. 91(1-2), pages 177-233, Mars-Juin.
    10. Stock, J.H. & Watson, M.W., 2016. "Dynamic Factor Models, Factor-Augmented Vector Autoregressions, and Structural Vector Autoregressions in Macroeconomics," Handbook of Macroeconomics, in: J. B. Taylor & Harald Uhlig (ed.), Handbook of Macroeconomics, edition 1, volume 2, chapter 0, pages 415-525, Elsevier.
    11. Antoine A. Djogbenou, 2020. "Comovements in the real activity of developed and emerging economies: A test of global versus specific international factors," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 35(3), pages 344-370, April.
    12. Matteo Barigozzi, 2023. "Asymptotic equivalence of Principal Components and Quasi Maximum Likelihood estimators in Large Approximate Factor Models," Papers 2307.09864, arXiv.org, revised May 2024.
    13. Poncela, Pilar & Ruiz, Esther & Miranda, Karen, 2021. "Factor extraction using Kalman filter and smoothing: This is not just another survey," International Journal of Forecasting, Elsevier, vol. 37(4), pages 1399-1425.
    14. Simon Beyeler & Sylvia Kaufmann, 2021. "Reduced‐form factor augmented VAR—Exploiting sparsity to include meaningful factors," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 36(7), pages 989-1012, November.
    15. Yoshimasa Uematsu & Takashi Yamagata, 2019. "Estimation of Weak Factor Models," DSSR Discussion Papers 96, Graduate School of Economics and Management, Tohoku University.
    16. Juho Koistinen & Bernd Funovits, 2022. "Estimation of Impulse-Response Functions with Dynamic Factor Models: A New Parametrization," Papers 2202.00310, arXiv.org, revised Feb 2022.
    17. Haruo Iwakura & Ryo Okui, 2014. "Asymptotic Efficiency in Factor Models and Dynamic Panel Data Models," KIER Working Papers 887, Kyoto University, Institute of Economic Research.
    18. Francisco Corona & Pilar Poncela & Esther Ruiz, 2017. "Determining the number of factors after stationary univariate transformations," Empirical Economics, Springer, vol. 53(1), pages 351-372, August.
    19. Ergemen, Yunus Emre, 2023. "Parametric estimation of long memory in factor models," Journal of Econometrics, Elsevier, vol. 235(2), pages 1483-1499.
    20. Yohei Yamamoto, 2019. "Bootstrap inference for impulse response functions in factor‐augmented vector autoregressions," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 34(2), pages 247-267, March.
