
Imputation-Based Variable Selection Method for Block-Wise Missing Data When Integrating Multiple Longitudinal Studies

Authors

  • Zhongzhe Ouyang

    (Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA)

  • Lu Wang

    (Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA)

  • Alzheimer’s Disease Neuroimaging Initiative

    (Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
    Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf (accessed on 29 February 2024).)

Abstract

When integrating data from multiple sources, a common challenge is block-wise missing data. Most existing methods address this issue only in cross-sectional studies. In this paper, we propose a variable selection method for combining datasets from multiple sources in longitudinal studies. To account for block-wise missingness in the covariates, we impute the missing values multiple times, based on combinations of samples from different missing patterns and predictors from different data sources. We then use these imputed data to construct estimating equations and aggregate the information across subjects and sources with the generalized method of moments. We employ the smoothly clipped absolute deviation (SCAD) penalty for variable selection and use the extended Bayesian information criterion (EBIC) for tuning parameter selection. We establish the asymptotic properties of the proposed estimator and demonstrate the superior performance of the proposed method through numerical experiments. Furthermore, we apply the proposed method to the Alzheimer’s Disease Neuroimaging Initiative study to identify sensitive early-stage biomarkers of Alzheimer’s disease; such biomarkers are crucial for early disease detection and personalized treatment.
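The abstract combines several standard ingredients. As a rough illustration only, the Python sketch below writes out the textbook SCAD penalty of Fan and Li (2001), its derivative (the weight used when the penalty is locally approximated during optimization), a generic penalized GMM criterion of the form n·ḡ(β)ᵀWḡ(β) + n·Σⱼ p_λ(|βⱼ|), and the usual extended BIC. It is not the authors' implementation: the function names, the default a = 3.7, the scaling by n, the weight matrix W, and the use of a generic `loss` term inside the EBIC are assumptions, and the multiple-imputation and estimating-equation steps are not shown (ḡ is simply taken as an input).

```python
import math
from typing import Sequence

# Hypothetical helper names; a sketch of standard formulas, not the paper's code.


def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p_lambda(|theta|) of Fan and Li (2001); a = 3.7 is their suggested default."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # L1-like near zero, so small coefficients can be set to zero
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))  # quadratic transition
    return (a + 1) * lam * lam / 2  # constant: large coefficients are not penalized further


def scad_derivative(theta: float, lam: float, a: float = 3.7) -> float:
    """p'_lambda(|theta|); used as a weight in local quadratic or linear approximations."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)


def penalized_gmm_objective(gbar: Sequence[float], weight: Sequence[Sequence[float]],
                            beta: Sequence[float], lam: float, n: int,
                            a: float = 3.7) -> float:
    """Assumed form: n * gbar' W gbar + n * sum_j p_lambda(|beta_j|).

    gbar: stacked estimating functions averaged over subjects, evaluated at beta
    (built from the imputed data in the paper; here simply passed in).
    weight: the GMM weight matrix W, given as a list of rows.
    """
    m = len(gbar)
    quad = sum(gbar[i] * weight[i][j] * gbar[j] for i in range(m) for j in range(m))
    return n * quad + n * sum(scad_penalty(b, lam, a) for b in beta)


def ebic(loss: float, n: int, p: int, k: int, gamma: float = 1.0) -> float:
    """Extended BIC: loss + k*log(n) + 2*gamma*log(C(p, k)).

    loss: assumed to be -2*log-likelihood or an analogous goodness-of-fit term;
    k: number of nonzero coefficients among p candidates; gamma in [0, 1].
    """
    return loss + k * math.log(n) + 2 * gamma * math.log(math.comb(p, k))
```

In practice one would minimize the penalized criterion over a grid of λ values and keep the λ whose fitted model gives the smallest `ebic`. For instance, `scad_penalty(5.0, 1.0)` equals the constant (a + 1)λ²/2 and `scad_derivative(5.0, 1.0)` is 0, which is how SCAD avoids shrinking large coefficients.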

Suggested Citation

  • Zhongzhe Ouyang & Lu Wang & Alzheimer’s Disease Neuroimaging Initiative, 2024. "Imputation-Based Variable Selection Method for Block-Wise Missing Data When Integrating Multiple Longitudinal Studies," Mathematics, MDPI, vol. 12(7), pages 1-14, March.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:7:p:951-:d:1362481

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/7/951/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/7/951/
    Download Restriction: no

    References listed on IDEAS

    1. Hansen, Lars Peter, 1982. "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, Econometric Society, vol. 50(4), pages 1029-1054, July.
    2. Guan Yu & Quefeng Li & Dinggang Shen & Yufeng Liu, 2020. "Optimal Sparse Linear Prediction for Block-missing Multi-modality Data Without Imputation," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(531), pages 1406-1419, July.
    3. Wang, Tao & Zhu, Lixing, 2011. "Consistent tuning parameter selection in high dimensional sparse linear regression," Journal of Multivariate Analysis, Elsevier, vol. 102(7), pages 1141-1151, August.
    4. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    5. Fei Xue & Annie Qu, 2021. "Integrating Multisource Block-Wise Missing Data in Model Selection," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(536), pages 1914-1927, October.
    6. Johnson, Brent A. & Lin, D.Y. & Zeng, Donglin, 2008. "Penalized Estimating Functions and Variable Selection in Semiparametric Regression Models," Journal of the American Statistical Association, American Statistical Association, vol. 103, pages 672-680, June.
    7. José R. Zubizarreta, 2015. "Stable Weights that Balance Covariates for Estimation With Incomplete Outcome Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(511), pages 910-922, September.
    8. Tian, Ruiqin & Xue, Liugen & Liu, Chunling, 2014. "Penalized quadratic inference functions for semiparametric varying coefficient partially linear models with longitudinal data," Journal of Multivariate Analysis, Elsevier, vol. 132(C), pages 94-110.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dasom Lee & Shu Yang & Lin Dong & Xiaofei Wang & Donglin Zeng & Jianwen Cai, 2023. "Improving trial generalizability using observational studies," Biometrics, The International Biometric Society, vol. 79(2), pages 1213-1225, June.
    2. Alexandre Belloni & Victor Chernozhukov & Ivan Fernandez-Val & Christian Hansen, 2013. "Program evaluation with high-dimensional data," CeMMAP working papers CWP77/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    3. Yongjin Li & Qingzhao Zhang & Qihua Wang, 2017. "Penalized estimation equation for an extended single-index model," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 69(1), pages 169-187, February.
    4. Zhang, Tonglin, 2024. "Variables selection using L0 penalty," Computational Statistics & Data Analysis, Elsevier, vol. 190(C).
    5. Brittany Green & Heng Lian & Yan Yu & Tianhai Zu, 2021. "Ultra high‐dimensional semiparametric longitudinal data analysis," Biometrics, The International Biometric Society, vol. 77(3), pages 903-913, September.
    6. Xiaochao Xia & Binyan Jiang & Jialiang Li & Wenyang Zhang, 2016. "Low-dimensional confounder adjustment and high-dimensional penalized estimation for survival analysis," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 22(4), pages 547-569, October.
    7. Xingwei Tong & Xin He & Liuquan Sun & Jianguo Sun, 2009. "Variable Selection for Panel Count Data via Non‐Concave Penalized Estimating Function," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 36(4), pages 620-635, December.
    8. Jin, Fei & Lee, Lung-fei, 2018. "Irregular N2SLS and LASSO estimation of the matrix exponential spatial specification model," Journal of Econometrics, Elsevier, vol. 206(2), pages 336-358.
    9. Hongyu An & Boping Tian, 2024. "Varying Index Coefficient Model for Tail Index Regression," Mathematics, MDPI, vol. 12(13), pages 1-35, June.
    10. Dong, Chaohua & Gao, Jiti & Linton, Oliver, 2023. "High dimensional semiparametric moment restriction models," Journal of Econometrics, Elsevier, vol. 232(2), pages 320-345.
    11. A. Belloni & V. Chernozhukov & I. Fernández‐Val & C. Hansen, 2017. "Program Evaluation and Causal Inference With High‐Dimensional Data," Econometrica, Econometric Society, vol. 85, pages 233-298, January.
    12. Hu, Jianwei & Chai, Hao, 2013. "Adjusted regularized estimation in the accelerated failure time model with high dimensional covariates," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 96-114.
    13. Paola Stolfi & Mauro Bernardi & Lea Petrella, 2018. "The sparse method of simulated quantiles: An application to portfolio optimization," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 72(3), pages 375-398, August.
    14. Ye Yang & Osman Dogan & Suleyman Taspinar & Fei Jin, 2023. "A Review of Cross-Sectional Matrix Exponential Spatial Models," Papers 2311.14813, arXiv.org.
    15. Liang, X. & Sanderson, E. & Windmeijer, F., 2022. "Selecting Valid Instrumental Variables in Linear Models with Multiple Exposure Variables: Adaptive Lasso and the Median-of-Medians Estimator," Health, Econometrics and Data Group (HEDG) Working Papers 22/22, HEDG, c/o Department of Economics, University of York.
    16. Peng Lai & Fangjian Wang & Tingyu Zhu & Qingzhao Zhang, 2021. "Model identification and selection for single-index varying-coefficient models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(3), pages 457-480, June.
    17. Bingduo Yang & Christian M. Hafner & Guannan Liu & Wei Long, 2021. "Semiparametric estimation and variable selection for single‐index copula models," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 36(7), pages 962-988, November.
    18. Feng, Sanying & He, Wenqi & Li, Feng, 2020. "Model detection and estimation for varying coefficient panel data models with fixed effects," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    19. Cheng, Chao & Feng, Xingdong & Huang, Jian & Jiao, Yuling & Zhang, Shuang, 2022. "ℓ0-Regularized high-dimensional accelerated failure time model," Computational Statistics & Data Analysis, Elsevier, vol. 170(C).
    20. Xiao, Zhen & Zhang, Qi, 2022. "Dimension reduction for block-missing data based on sparse sliced inverse regression," Computational Statistics & Data Analysis, Elsevier, vol. 167(C).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:12:y:2024:i:7:p:951-:d:1362481. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.