Block-diagonal precision matrix regularization for ultra-high dimensional data

Author

Listed:
  • Yang, Yihe
  • Dai, Hongsheng
  • Pan, Jianxin

Abstract

A method is proposed for estimating the precision matrix of multiple variables in the extreme setting of “ultrahigh dimension” and “small sample size”. First, a covariance column-wise screening method identifies a small sub-group of significantly correlated variables from among thousands or even millions of variables. Then, a block-diagonal regularization of the covariance structure is imposed, in which only the covariances among variables in that small sub-group are retained and all others are set to zero. It is further proven that, under some mild conditions, the vital sub-group identified by the covariance column-wise screening method is consistent. A major advantage of the proposed method is its efficiency: it produces a reliable precision matrix estimator for thousands of variables within a few seconds, whereas existing methods take at least several hours and still yield inaccurate estimators. Empirical data studies and numerical simulations show that, on ultrahigh dimensional data, the proposed precision matrix estimator greatly outperforms existing methods, requiring much less computing time and producing much more accurate estimates.
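
The two steps described in the abstract (screen a small sub-group, then keep only that sub-group's covariances) can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the abstract does not state the screening statistic, so the column score used here (sum of squared off-diagonal sample correlations), the fixed sub-group size `k`, the `ridge` term, and the function name `block_diagonal_precision` are all assumptions made for illustration.

```python
import numpy as np

def block_diagonal_precision(X, k, ridge=1e-3):
    """Hypothetical sketch: screen k columns, then invert a block-diagonal covariance.

    X is an (n, p) data matrix. The screening score, k, and ridge are
    illustrative assumptions, not taken from the paper.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)                 # center each variable
    var = Xc.var(axis=0, ddof=1)            # sample variances
    Z = Xc / np.sqrt(var)                   # standardized columns
    R = (Z.T @ Z) / (n - 1)                 # sample correlation matrix (p x p)
    np.fill_diagonal(R, 0.0)
    score = (R ** 2).sum(axis=0)            # assumed column-wise screening score
    selected = np.sort(np.argsort(score)[-k:])

    # Covariance block of the screened sub-group; the small ridge (an
    # assumption) keeps the block invertible when k is close to n.
    S_block = (Xc[:, selected].T @ Xc[:, selected]) / (n - 1)
    S_block += ridge * np.eye(k)

    # Block-diagonal structure: full inverse inside the sub-group,
    # reciprocal variances for every other variable.
    omega = np.diag(1.0 / var)
    omega[np.ix_(selected, selected)] = np.linalg.inv(S_block)
    return omega, selected

# Simulated example: the first 5 of 200 variables share a common factor.
rng = np.random.default_rng(0)
n, p, k = 100, 200, 5
X = rng.standard_normal((n, p))
factor = rng.standard_normal(n)
X[:, :k] = factor[:, None] + 0.3 * rng.standard_normal((n, k))
omega, selected = block_diagonal_precision(X, k)
print(selected)
```

Because the estimated covariance is block-diagonal, only a `k` by `k` block has to be inverted, which is where the speed advantage described in the abstract would come from. The sketch forms the full `p` by `p` correlation matrix and a dense `omega` for readability; at a scale of millions of variables a practical implementation would presumably compute the column scores in chunks and store the result sparsely.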

Suggested Citation

  • Yang, Yihe & Dai, Hongsheng & Pan, Jianxin, 2023. "Block-diagonal precision matrix regularization for ultra-high dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).
  • Handle: RePEc:eee:csdana:v:179:y:2023:i:c:s0167947322002109
    DOI: 10.1016/j.csda.2022.107630

