
Variable selection for high‐dimensional generalized linear model with block‐missing data

Author

Listed:
  • Yifan He
  • Yang Feng
  • Xinyuan Song

Abstract

In modern scientific research, multiblock missing data emerge when information is synthesized across multiple studies. However, existing imputation methods for handling block-wise missing data either focus on the single-block missing pattern or rely heavily on the model structure. In this study, we propose a single regression-based imputation algorithm for multiblock missing data. First, we estimate a sparse precision matrix based on the structure of the block-wise missing data. Second, we impute the missing blocks with their means conditional on the observed blocks. Theoretical results on variable selection and estimation consistency are established in the context of a generalized linear model. Moreover, simulation studies show that, compared with existing methods, the proposed imputation procedure is robust to various missing mechanisms owing to the favorable properties of regression imputation. An application to Alzheimer's Disease Neuroimaging Initiative data also confirms the superiority of the proposed method.
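The two-step idea in the abstract (sparse precision matrix estimation, then conditional-mean imputation of the missing blocks) can be sketched in a few lines. The example below is not the authors' procedure: it substitutes scikit-learn's off-the-shelf GraphicalLasso fitted on complete cases for the paper's block-structure-aware precision matrix estimator, uses simulated Gaussian data with a single missing block, and all variable names are illustrative. It only shows the Gaussian conditional-mean step, E[X_M | X_O = x_O] = mu_M - Omega_MM^{-1} Omega_MO (x_O - mu_O), where Omega is the precision matrix, M indexes the missing block, and O the observed blocks.

    # Minimal sketch, not the paper's estimator: graphical-lasso precision
    # matrix from complete cases, then conditional-mean imputation of a
    # missing block (Python; numpy and scikit-learn assumed available).
    import numpy as np
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(0)

    # Simulated data: 200 samples, 12 features split into 3 blocks of 4.
    n, p = 200, 12
    X = rng.multivariate_normal(np.zeros(p), np.eye(p) + 0.4, size=n)
    blocks = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]

    # Block-missing pattern: the last block is unobserved for half the rows.
    X_miss = X.copy()
    miss_rows = rng.choice(n, size=n // 2, replace=False)
    X_miss[np.ix_(miss_rows, blocks[2])] = np.nan

    # Step 1 (simplified): sparse precision matrix from the complete cases.
    complete = ~np.isnan(X_miss).any(axis=1)
    gl = GraphicalLasso(alpha=0.1).fit(X_miss[complete])
    Omega, mu = gl.precision_, gl.location_

    # Step 2: impute the missing block by its mean conditional on the
    # observed blocks: E[X_M | X_O] = mu_M - Omega_MM^{-1} Omega_MO (x_O - mu_O).
    M, O = blocks[2], np.concatenate(blocks[:2])
    Omega_MM = Omega[np.ix_(M, M)]
    Omega_MO = Omega[np.ix_(M, O)]
    resid = X_miss[np.ix_(miss_rows, O)] - mu[O]
    X_miss[np.ix_(miss_rows, M)] = mu[M] - resid @ np.linalg.solve(Omega_MM, Omega_MO).T

    # The completed matrix can then enter a penalized GLM (for example, a
    # lasso-penalized logistic regression) for variable selection.

Working in terms of the precision matrix is convenient here because the conditional mean of a missing block depends only on the corresponding rows of Omega, which is exactly the object the sparse estimation step targets.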

Suggested Citation

  • Yifan He & Yang Feng & Xinyuan Song, 2023. "Variable selection for high‐dimensional generalized linear model with block‐missing data," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 50(3), pages 1279-1297, September.
  • Handle: RePEc:bla:scjsta:v:50:y:2023:i:3:p:1279-1297
    DOI: 10.1111/sjos.12632

    Download full text from publisher

    File URL: https://doi.org/10.1111/sjos.12632
    Download Restriction: no

    File URL: https://libkey.io/10.1111/sjos.12632?utm_source=ideas
    LibKey link: If access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item.

    References listed on IDEAS

    1. Sang Su Kwak & Kevin J. Washicosky & Emma Brand & Djuna Maydell & Jenna Aronson & Susan Kim & Diane E. Capen & Murat Cetinbas & Ruslan Sadreyev & Shen Ning & Enjana Bylykbashi & Weiming Xia & Steven L, 2020. "Amyloid-β42/40 ratio drives tau pathology in 3D human neural cell culture models of Alzheimer’s disease," Nature Communications, Nature, vol. 11(1), pages 1-14, December.
    2. Daniela M. Witten & Robert Tibshirani, 2009. "Covariance‐regularized regression and classification for high dimensional problems," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 615-636, June.
    3. Lam, Clifford & Fan, Jianqing, 2009. "Sparsistency and rates of convergence in large covariance matrix estimation," LSE Research Online Documents on Economics 31540, London School of Economics and Political Science, LSE Library.
    4. Patrick Royston, 2004. "Multiple imputation of missing values," Stata Journal, StataCorp LP, vol. 4(3), pages 227-241, September.
    5. Tianxi Cai & T. Tony Cai & Anru Zhang, 2016. "Structured Matrix Completion with Applications to Genomic Data Integration," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(514), pages 621-633, April.
    6. Yingying Fan & Jinchi Lv, 2013. "Asymptotic Equivalence of Regularization Methods in Thresholded Parameter Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(503), pages 1044-1061, September.
    7. Ming Yuan & Yi Lin, 2007. "Model selection and estimation in the Gaussian graphical model," Biometrika, Biometrika Trust, vol. 94(1), pages 19-35.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Aaron J Molstad & Adam J Rothman, 2018. "Shrinking characteristics of precision matrix estimators," Biometrika, Biometrika Trust, vol. 105(3), pages 563-574.
    2. Benjamin Poignard & Manabu Asai, 2023. "Estimation of high-dimensional vector autoregression via sparse precision matrix," The Econometrics Journal, Royal Economic Society, vol. 26(2), pages 307-326.
    3. Huangdi Yi & Qingzhao Zhang & Cunjie Lin & Shuangge Ma, 2022. "Information‐incorporated Gaussian graphical model for gene expression data," Biometrics, The International Biometric Society, vol. 78(2), pages 512-523, June.
    4. S Klaassen & J Kueck & M Spindler & V Chernozhukov, 2023. "Uniform inference in high-dimensional Gaussian graphical models," Biometrika, Biometrika Trust, vol. 110(1), pages 51-68.
    5. Zamar, Rubén, 2015. "Ranking Edges and Model Selection in High-Dimensional Graphs," DES - Working Papers. Statistics and Econometrics. WS ws1511, Universidad Carlos III de Madrid. Departamento de Estadística.
    6. Lam, Clifford, 2020. "High-dimensional covariance matrix estimation," LSE Research Online Documents on Economics 101667, London School of Economics and Political Science, LSE Library.
    7. Khai X. Chiong & Hyungsik Roger Moon, 2017. "Estimation of Graphical Models using the $L_{1,2}$ Norm," Papers 1709.10038, arXiv.org, revised Oct 2017.
    8. Wang, Luheng & Chen, Zhao & Wang, Christina Dan & Li, Runze, 2020. "Ultrahigh dimensional precision matrix estimation via refitted cross validation," Journal of Econometrics, Elsevier, vol. 215(1), pages 118-130.
    9. Gautam Sabnis & Debdeep Pati & Anirban Bhattacharya, 2019. "Compressed Covariance Estimation with Automated Dimension Learning," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 81(2), pages 466-481, December.
    10. Maboudou-Tchao, Edgard M. & Agboto, Vincent, 2013. "Monitoring the covariance matrix with fewer observations than variables," Computational Statistics & Data Analysis, Elsevier, vol. 64(C), pages 99-112.
    11. Xiao Guo & Hai Zhang, 2020. "Sparse directed acyclic graphs incorporating the covariates," Statistical Papers, Springer, vol. 61(5), pages 2119-2148, October.
    12. Lin Zhang & Andrew DiLernia & Karina Quevedo & Jazmin Camchong & Kelvin Lim & Wei Pan, 2021. "A random covariance model for bi‐level graphical modeling with application to resting‐state fMRI data," Biometrics, The International Biometric Society, vol. 77(4), pages 1385-1396, December.
    13. Kang, Xiaoning & Wang, Mingqiu, 2021. "Ensemble sparse estimation of covariance structure for exploring genetic disease data," Computational Statistics & Data Analysis, Elsevier, vol. 159(C).
    14. Bailey, Natalia & Pesaran, M. Hashem & Smith, L. Vanessa, 2019. "A multiple testing approach to the regularisation of large sample correlation matrices," Journal of Econometrics, Elsevier, vol. 208(2), pages 507-534.
    15. Johannes Lederer & Christian L. Müller, 2022. "Topology Adaptive Graph Estimation in High Dimensions," Mathematics, MDPI, vol. 10(8), pages 1-10, April.
    16. Chen, Shuo & Kang, Jian & Xing, Yishi & Zhao, Yunpeng & Milton, Donald K., 2018. "Estimating large covariance matrix with network topology for high-dimensional biomedical data," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 82-95.
    17. Tan, Kean Ming & Witten, Daniela & Shojaie, Ali, 2015. "The cluster graphical lasso for improved estimation of Gaussian graphical models," Computational Statistics & Data Analysis, Elsevier, vol. 85(C), pages 23-36.
    18. Pan, Yuqing & Mai, Qing, 2020. "Efficient computation for differential network analysis with applications to quadratic discriminant analysis," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    19. Fan, Xinyan & Zhang, Qingzhao & Ma, Shuangge & Fang, Kuangnan, 2021. "Conditional score matching for high-dimensional partial graphical models," Computational Statistics & Data Analysis, Elsevier, vol. 153(C).
    20. Katayama, Shota & Imori, Shinpei, 2014. "Lasso penalized model selection criteria for high-dimensional multivariate linear regression analysis," Journal of Multivariate Analysis, Elsevier, vol. 132(C), pages 138-150.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bla:scjsta:v:50:y:2023:i:3:p:1279-1297. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to do so here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Wiley Content Delivery. General contact details of provider: http://www.blackwellpublishing.com/journal.asp?ref=0303-6898 .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.