Printed from https://ideas.repec.org/a/eee/jmvana/v196y2023ics0047259x23000210.html

Semiparametric penalized quadratic inference functions for longitudinal data in ultra-high dimensions

Authors

  • Green, Brittany
  • Lian, Heng
  • Yu, Yan
  • Zu, Tianhai

Abstract

In many biomedical and health studies, multivariate data arise from repeated measurements on a sample of subjects over time. Analyzing such longitudinal data requires accounting for the correlation among measurements from the same subject, so a simple multivariate model that assumes an independence structure is inappropriate. Motivated by a large-scale longitudinal public health study that requires analysis of correlated multivariate discrete responses from repeated measurements with very high-dimensional covariates, we adopt a flexible semiparametric approach for simultaneous variable selection and estimation that does not require specifying the full likelihood. Specifically, we propose generalized partially linear single-index models estimated with penalized quadratic inference functions for longitudinal data in ultra-high dimensions. A key feature is that the number of single-index covariates in the nonparametric term is allowed to diverge and may even be ultra-high-dimensional. The penalized quadratic inference functions readily incorporate within-subject correlation and pursue efficient estimation, and the single-index structure captures nonlinearity and some interactions without incurring the curse of dimensionality. In this challenging setting, we contribute both an efficient algorithm and new asymptotic theory for the proposed approach with diverging, and even ultra-high-dimensional, covariates and a multivariate correlated response in longitudinal data. We apply our method to investigate diabetes status in an ongoing longitudinal public health study with very high-dimensional genetic and phenotype variables.
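The quadratic inference function (QIF) at the core of the abstract can be illustrated with a minimal sketch. It follows the general QIF construction of Qu, Lindsay and Li (2000, Biometrika): the inverse working correlation is approximated by a combination of basis matrices, each basis matrix contributes one block of an extended score g_i, and the estimate minimizes Q_N(beta) = N * gbar' C_N^{-1} gbar, a GMM-type quadratic form in the spirit of Hansen (1982). Everything specific here is an assumption for illustration, not the authors' estimator: a linear model with identity link, one scalar covariate, unit working variance, basis matrices M1 = I and M2 = J - I (exchangeable), simulated data, and grid search in place of a Newton-type solver. The penalty, the single-index nonparametric term and the ultra-high-dimensional setting are all omitted, and the function names `simulate`, `subject_score`, `qif` and `fit_qif` are hypothetical.

```python
import random


def simulate(n_subjects=200, m=4, beta=2.0, seed=0):
    """Simulate y_ij = beta * x_ij + b_i + e_ij for n_subjects with m visits.

    The shared subject effect b_i induces exchangeable within-subject
    correlation. Illustrative data only, not the study in the paper.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, 1.0)
        x = [rng.gauss(0.0, 1.0) for _ in range(m)]
        y = [beta * xj + b + rng.gauss(0.0, 1.0) for xj in x]
        data.append((x, y))
    return data


def subject_score(x, y, beta):
    """Extended score g_i(beta) = (x' M1 r, x' M2 r) for one subject.

    Assumes an identity link, unit working variance, M1 = I and
    M2 = J - I (ones off the diagonal), so x' M2 r = sum(x)*sum(r) - x'r.
    """
    r = [yj - beta * xj for xj, yj in zip(x, y)]
    g1 = sum(xj * rj for xj, rj in zip(x, r))
    g2 = sum(x) * sum(r) - g1
    return g1, g2


def qif(data, beta):
    """Quadratic inference function Q_N(beta) = N * gbar' C_N^{-1} gbar."""
    n = len(data)
    scores = [subject_score(x, y, beta) for x, y in data]
    gbar1 = sum(s[0] for s in scores) / n
    gbar2 = sum(s[1] for s in scores) / n
    # Sample second-moment matrix C_N = (1/N) sum_i g_i g_i' (2 x 2).
    c11 = sum(s[0] * s[0] for s in scores) / n
    c12 = sum(s[0] * s[1] for s in scores) / n
    c22 = sum(s[1] * s[1] for s in scores) / n
    det = c11 * c22 - c12 * c12
    # gbar' C_N^{-1} gbar via the closed-form inverse of a 2 x 2 matrix.
    quad = (c22 * gbar1 * gbar1 - 2.0 * c12 * gbar1 * gbar2
            + c11 * gbar2 * gbar2) / det
    return n * quad


def fit_qif(data, lo=0.0, hi=4.0, steps=400):
    """Minimize Q_N over an evenly spaced grid of beta values."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=lambda b: qif(data, b))


data = simulate()
beta_hat = fit_qif(data)
print(f"beta_hat = {beta_hat:.2f}, Q_N(beta_hat) = {qif(data, beta_hat):.3f}")
```

Because a constant working variance rescales every g_i and C_N by matching factors, it cancels from Q_N, which is why the sketch can fix it at one. Unlike a GEE with a fixed working correlation, the data decide how to weight the two moment conditions through C_N, so a misspecified correlation structure costs efficiency rather than consistency.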

Suggested Citation

  • Green, Brittany & Lian, Heng & Yu, Yan & Zu, Tianhai, 2023. "Semiparametric penalized quadratic inference functions for longitudinal data in ultra-high dimensions," Journal of Multivariate Analysis, Elsevier, vol. 196(C).
  • Handle: RePEc:eee:jmvana:v:196:y:2023:i:c:s0047259x23000210
    DOI: 10.1016/j.jmva.2023.105175

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047259X23000210
    Download Restriction: Full text for ScienceDirect subscribers only

    References listed on IDEAS

    1. Hansheng Wang & Bo Li & Chenlei Leng, 2009. "Shrinkage tuning parameter selection with a diverging number of parameters," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 671-683, June.
    2. Hansen, Lars Peter, 1982. "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, Econometric Society, vol. 50(4), pages 1029-1054, July.
    3. Jianhui Zhou & Annie Qu, 2012. "Informative Estimation and Selection of Correlation Structure for Longitudinal Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(498), pages 701-710, June.
    4. Lan Wang & Jianhui Zhou & Annie Qu, 2012. "Penalized Generalized Estimating Equations for High-Dimensional Longitudinal Data Analysis," Biometrics, The International Biometric Society, vol. 68(2), pages 353-360, June.
    5. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    6. Bai, Yang & Fung, Wing K. & Zhu, Zhong Yi, 2009. "Penalized quadratic inference functions for single-index models with longitudinal data," Journal of Multivariate Analysis, Elsevier, vol. 100(1), pages 152-161, January.
    7. Lihua Cai & Honglong Wu & Dongfang Li & Ke Zhou & Fuhao Zou, 2015. "Type 2 Diabetes Biomarkers of Human Gut Microbiota Selected via Iterative Sure Independent Screening Method," PLOS ONE, Public Library of Science, vol. 10(10), pages 1-15, October.
    8. Xuming He, 2002. "Estimation in a semiparametric model for longitudinal data with unspecified dependence structure," Biometrika, Biometrika Trust, vol. 89(3), pages 579-590, August.
    9. Lai, Peng & Li, Gaorong & Lian, Heng, 2013. "Quadratic inference functions for partially linear single-index models with longitudinal data," Journal of Multivariate Analysis, Elsevier, vol. 118(C), pages 115-127.
    10. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    11. Annie Qu & Runze Li, 2006. "Quadratic Inference Functions for Varying-Coefficient Models with Longitudinal Data," Biometrics, The International Biometric Society, vol. 62(2), pages 379-391, June.
    12. Peng Wang & Guei-feng Tsai & Annie Qu, 2012. "Conditional Inference Functions for Mixed-Effects Models With Unspecified Random-Effects Distribution," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(498), pages 725-736, June.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Geng, Shuli & Zhang, Lixin, 2024. "Decorrelated empirical likelihood for generalized linear models with high-dimensional longitudinal data," Statistics & Probability Letters, Elsevier, vol. 211(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lei Wang & Wei Ma, 2021. "Improved empirical likelihood inference and variable selection for generalized linear models with longitudinal nonignorable dropouts," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(3), pages 623-647, June.
    2. Ma, Shujie & Liang, Hua & Tsai, Chih-Ling, 2014. "Partially linear single index models for repeated measurements," Journal of Multivariate Analysis, Elsevier, vol. 130(C), pages 354-375.
    3. Rui Li & Chenlei Leng & Jinhong You, 2017. "A Semiparametric Regression Model for Longitudinal Data with Non-stationary Errors," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 44(4), pages 932-950, December.
    4. Zhang, Ting & Wang, Lei, 2020. "Smoothed empirical likelihood inference and variable selection for quantile regression with nonignorable missing response," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    5. Guang Cheng & Hao Zhang & Zuofeng Shang, 2015. "Sparse and efficient estimation for partial spline models with increasing dimension," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 67(1), pages 93-127, February.
    6. Huicong Yu & Jiaqi Wu & Weiping Zhang, 2024. "Simultaneous subgroup identification and variable selection for high dimensional data," Computational Statistics, Springer, vol. 39(6), pages 3181-3205, September.
    7. Brittany Green & Heng Lian & Yan Yu & Tianhai Zu, 2021. "Ultra high‐dimensional semiparametric longitudinal data analysis," Biometrics, The International Biometric Society, vol. 77(3), pages 903-913, September.
    8. Xiaochao Xia & Binyan Jiang & Jialiang Li & Wenyang Zhang, 2016. "Low-dimensional confounder adjustment and high-dimensional penalized estimation for survival analysis," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 22(4), pages 547-569, October.
    9. Fan, Jianqing & Liao, Yuan, 2012. "Endogeneity in ultrahigh dimension," MPRA Paper 38698, University Library of Munich, Germany.
    10. Jin, Fei & Lee, Lung-fei, 2018. "Irregular N2SLS and LASSO estimation of the matrix exponential spatial specification model," Journal of Econometrics, Elsevier, vol. 206(2), pages 336-358.
    11. Zhang, Shucong & Zhou, Yong, 2018. "Variable screening for ultrahigh dimensional heterogeneous data via conditional quantile correlations," Journal of Multivariate Analysis, Elsevier, vol. 165(C), pages 1-13.
    12. Fei Wang & Lu Wang & Peter X.‐K. Song, 2016. "Fused lasso with the adaptation of parameter ordering in combining multiple studies with repeated measurements," Biometrics, The International Biometric Society, vol. 72(4), pages 1184-1193, December.
    13. Joel L. Horowitz, 2015. "Variable selection and estimation in high-dimensional models," CeMMAP working papers 35/15, Institute for Fiscal Studies.
    14. Huiwen Wang & Ruiping Liu & Shanshan Wang & Zhichao Wang & Gilbert Saporta, 2020. "Ultra-high dimensional variable screening via Gram–Schmidt orthogonalization," Computational Statistics, Springer, vol. 35(3), pages 1153-1170, September.
    15. Zheng, Xueying & Xue, Lan & Qu, Annie, 2018. "Time-varying correlation structure estimation and local-feature detection for spatio-temporal data," Journal of Multivariate Analysis, Elsevier, vol. 168(C), pages 221-239.
    16. Hou, Zhaohan & Wang, Lei, 2024. "Heterogeneous quantile regression for longitudinal data with subgroup structures," Computational Statistics & Data Analysis, Elsevier, vol. 194(C).
    17. Joel L. Horowitz, 2015. "Variable selection and estimation in high‐dimensional models," Canadian Journal of Economics/Revue canadienne d'économique, John Wiley & Sons, vol. 48(2), pages 389-407, May.
    18. Lai, Peng & Li, Gaorong & Lian, Heng, 2013. "Quadratic inference functions for partially linear single-index models with longitudinal data," Journal of Multivariate Analysis, Elsevier, vol. 118(C), pages 115-127.
    19. Mozhgan Taavoni & Mohammad Arashi & Samuel Manda, 2023. "Multicollinearity and Linear Predictor Link Function Problems in Regression Modelling of Longitudinal Data," Mathematics, MDPI, vol. 11(3), pages 1-9, January.
    20. Lian, Heng & Li, Jianbo & Tang, Xingyu, 2014. "SCAD-penalized regression in additive partially linear proportional hazards models with an ultra-high-dimensional linear part," Journal of Multivariate Analysis, Elsevier, vol. 125(C), pages 50-64.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.