Printed from https://ideas.repec.org/a/eee/csdana/v194y2024ics0167947324000033.html

A unified framework of analyzing missing data and variable selection using regularized likelihood

Authors

  • Bian, Yuan
  • Yi, Grace Y.
  • He, Wenqing

Abstract

Missing data arise commonly in applications, and the topic has received extensive research attention over the past few decades. Various inference methods have been developed under different missing data mechanisms, including missing at random and missing not at random. Assessing which missing data mechanism is plausible is, however, difficult because validation data are lacking. The problem is further complicated by the presence of spurious variables among the covariates. Focusing on missingness in the response variable, a unified modeling scheme is proposed that uses a parametric generalized additive model to characterize various types of missing data processes. With a generalized linear model describing the dependence of the response on the associated covariates, concurrent estimation and variable selection procedures are developed using regularized likelihood, and the asymptotic properties of the resulting estimators are rigorously established. The proposed methods are appealing for their flexibility and generality: they circumvent the need to assume a particular missing data mechanism, which most available methods require. Empirical studies demonstrate that the proposed methods perform satisfactorily in finite samples. Extensions that accommodate missingness in both the response and the covariates are also discussed.
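The abstract describes the approach only at a high level, so the Python sketch below illustrates one simple way such a scheme could be set up; it is not the authors' model, and every modeling choice here is an assumption. It assumes a binary response with a logistic GLM, and a logistic missingness model logit P(R = 1 | x, y) = α0 + α_xᵀx + α_y·y, in which α_x = α_y = 0 corresponds to MCAR, α_y = 0 to MAR, and α_y ≠ 0 to MNAR, so one parametric form covers all three mechanisms. It also assumes the SCAD penalty of Fan and Li (2001) with a = 3.7, applied only to the response-model slopes. The function names (`observed_loglik`, `penalized_loglik`, `mechanism`) are hypothetical. Because Y is binary, integrating over a missing response reduces to a sum over y ∈ {0, 1}.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), applied elementwise to |beta|."""
    b = np.abs(np.asarray(beta, dtype=float))
    linear = lam * b                                                # |b| <= lam
    quadratic = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))  # lam < |b| <= a*lam
    constant = lam**2 * (a + 1) / 2                                 # |b| > a*lam
    return np.where(b <= lam, linear, np.where(b <= a * lam, quadratic, constant))


def mechanism(alpha, p, tol=1e-8):
    """Classify the (hypothetical) missingness model by which coefficients are zero."""
    alpha = np.asarray(alpha, dtype=float)
    if abs(alpha[p + 1]) > tol:                # depends on the possibly missing response
        return "MNAR"
    if np.any(np.abs(alpha[1:p + 1]) > tol):   # depends on observed covariates only
        return "MAR"
    return "MCAR"


def observed_loglik(beta, alpha, X, y, r):
    """Observed-data log-likelihood for a logistic response model and the assumed
    missingness model logit P(R=1|x,y) = alpha[0] + x @ alpha[1:p+1] + alpha[p+1]*y.
    Entries of y where r == 0 are ignored (they may be NaN)."""
    p = X.shape[1]
    mu = sigmoid(beta[0] + X @ beta[1:])     # P(Y = 1 | x)
    base = alpha[0] + X @ alpha[1:p + 1]
    pi0 = sigmoid(base)                      # P(R = 1 | x, y = 0)
    pi1 = sigmoid(base + alpha[p + 1])       # P(R = 1 | x, y = 1)
    y0 = np.where(r == 1, y, 0.0)            # placeholder value where y is missing
    # Observed case: log P(Y = y | x) + log P(R = 1 | x, y)
    ll_obs = np.where(y0 == 1, np.log(mu) + np.log(pi1), np.log1p(-mu) + np.log(pi0))
    # Missing case: log of sum over y in {0, 1} of P(Y = y | x) * P(R = 0 | x, y)
    ll_mis = np.log((1 - mu) * (1 - pi0) + mu * (1 - pi1))
    return float(np.sum(np.where(r == 1, ll_obs, ll_mis)))


def penalized_loglik(beta, alpha, X, y, r, lam, a=3.7):
    """Observed-data log-likelihood minus n times the SCAD penalties on the slopes."""
    n = X.shape[0]
    penalty = float(np.sum(scad_penalty(beta[1:], lam, a)))
    return observed_loglik(beta, alpha, X, y, r) - n * penalty


# Usage on simulated data (all values are illustrative)
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([0.5, 1.0, 0.0, -1.0])          # second covariate is spurious
y = rng.binomial(1, sigmoid(beta_true[0] + X @ beta_true[1:])).astype(float)
alpha_true = np.array([1.0, 0.5, 0.0, 0.0, -1.0])     # response-dependent missingness
r = rng.binomial(1, sigmoid(alpha_true[0] + X @ alpha_true[1:p + 1] + alpha_true[p + 1] * y))
y[r == 0] = np.nan                                    # hide the missing responses
print(mechanism(alpha_true, p))                       # MNAR
print(penalized_loglik(beta_true, alpha_true, X, y, r, lam=0.1))
```

The sketch only evaluates the objective. Maximizing it over `beta` and `alpha` would typically need a dedicated algorithm because SCAD is non-convex (Fan and Li, 2001, proposed a local quadratic approximation), with `lam` chosen by a criterion such as BIC or GIC; which optimizer and tuning rule the paper itself uses is not stated in the abstract.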

Suggested Citation

  • Bian, Yuan & Yi, Grace Y. & He, Wenqing, 2024. "A unified framework of analyzing missing data and variable selection using regularized likelihood," Computational Statistics & Data Analysis, Elsevier, vol. 194(C).
  • Handle: RePEc:eee:csdana:v:194:y:2024:i:c:s0167947324000033
    DOI: 10.1016/j.csda.2024.107919

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947324000033
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2024.107919?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Wang Miao & Eric J. Tchetgen Tchetgen, 2016. "On varieties of doubly robust estimators under missingness not at random with a shadow variable," Biometrika, Biometrika Trust, vol. 103(2), pages 475-482.
    2. Ruoxu Tan, 2023. "Nonparametric regression with nonignorable missing covariates and outcomes using bounded inverse weighting," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 35(4), pages 927-946, October.
    3. Richard M. Golden & Steven S. Henley & Halbert White & T. Michael Kashner, 2019. "Consequences of Model Misspecification for Maximum Likelihood Estimation with Missing Data," Econometrics, MDPI, vol. 7(3), pages 1-27, September.
    4. Geert Molenberghs & Caroline Beunckens & Cristina Sotto & Michael G. Kenward, 2008. "Every missingness not at random model has a missingness at random counterpart with equal fit," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(2), pages 371-388, April.
    5. Ibrahim, Joseph G. & Zhu, Hongtu & Tang, Niansheng, 2008. "Model Selection Criteria for Missing-Data Problems Using the EM Algorithm," Journal of the American Statistical Association, American Statistical Association, vol. 103(484), pages 1648-1658.
    6. P. Diggle & M. G. Kenward, 1994. "Informative Drop‐Out in Longitudinal Data Analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 43(1), pages 49-73, March.
    7. Zhang, Yiyun & Li, Runze & Tsai, Chih-Ling, 2010. "Regularization Parameter Selections via Generalized Information Criterion," Journal of the American Statistical Association, American Statistical Association, vol. 105(489), pages 312-323.
    8. Jun Shao & Lei Wang, 2016. "Semiparametric inverse propensity weighting for nonignorable missing data," Biometrika, Biometrika Trust, vol. 103(1), pages 175-187.
    9. A. Qu & G. Y. Yi & P. X.-K. Song & P. Wang, 2011. "Assessing the validity of weighted generalized estimating equations," Biometrika, Biometrika Trust, vol. 98(1), pages 215-224.
    10. Mee Young Park & Trevor Hastie, 2007. "L1‐regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 69(4), pages 659-677, September.
    11. Chen, Qingxia & Ibrahim, Joseph G. & Chen, Ming-Hui & Senchaudhuri, Pralay, 2008. "Theory and inference for regression models with missing responses and covariates," Journal of Multivariate Analysis, Elsevier, vol. 99(6), pages 1302-1331, July.
    12. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    13. Yang Ning & Grace Yi & Nancy Reid, 2018. "A Class of Weighted Estimating Equations for Semiparametric Transformation Models with Missing Covariates," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 45(1), pages 87-109, March.
    14. Xuerong Chen & Guoqing Diao & Jing Qin, 2020. "Pseudo likelihood‐based estimation and testing of missingness mechanism function in nonignorable missing data problems," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(4), pages 1377-1400, December.
    15. Chen, Baojiang & Yi, Grace Y. & Cook, Richard J., 2010. "Weighted Generalized Estimating Functions for Longitudinal Response and Covariate Data That Are Missing at Random," Journal of the American Statistical Association, American Statistical Association, vol. 105(489), pages 336-353.
    16. repec:mpr:mprres:8160 is not listed on IDEAS
    17. Jiwei Zhao & Yanyuan Ma, 2022. "A Versatile Estimation Procedure Without Estimating the Nonignorable Missingness Mechanism," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 117(540), pages 1916-1930, October.
    18. Lyall, Jason, 2010. "Do Democracies Make Inferior Counterinsurgents? Reassessing Democracy's Impact on War Outcomes and Duration," International Organization, Cambridge University Press, vol. 64(1), pages 167-192, January.
    19. d'Haultfoeuille, Xavier, 2010. "A new instrumental method for dealing with endogenous selection," Journal of Econometrics, Elsevier, vol. 154(1), pages 1-15, January.
    20. Hansheng Wang & Runze Li & Chih-Ling Tsai, 2007. "Tuning parameter selectors for the smoothly clipped absolute deviation method," Biometrika, Biometrika Trust, vol. 94(3), pages 553-568.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fei Jin & Lung-fei Lee, 2018. "Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices," Econometrics, MDPI, vol. 6(1), pages 1-24, February.
    2. Ramon I. Garcia & Joseph G. Ibrahim & Hongtu Zhu, 2010. "Variable Selection in the Cox Regression Model with Covariates Missing at Random," Biometrics, The International Biometric Society, vol. 66(1), pages 97-104, March.
    3. Jin, Fei & Lee, Lung-fei, 2018. "Irregular N2SLS and LASSO estimation of the matrix exponential spatial specification model," Journal of Econometrics, Elsevier, vol. 206(2), pages 336-358.
    4. Alan T. K. Wan & Jinhong You & Riquan Zhang, 2016. "A Seemingly Unrelated Nonparametric Additive Model with Autoregressive Errors," Econometric Reviews, Taylor & Francis Journals, vol. 35(5), pages 894-928, May.
    5. Qian, Junhui & Su, Liangjun, 2016. "Shrinkage estimation of common breaks in panel data models via adaptive group fused Lasso," Journal of Econometrics, Elsevier, vol. 191(1), pages 86-109.
    6. Yuta Umezu & Yusuke Shimizu & Hiroki Masuda & Yoshiyuki Ninomiya, 2019. "AIC for the non-concave penalized likelihood method," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(2), pages 247-274, April.
    7. Francis K. C. Hui & Samuel Müller & A. H. Welsh, 2017. "Joint Selection in Mixed Models using Regularized PQL," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(519), pages 1323-1333, July.
    8. Bindele, Huybrechts F. & Nguelifack, Brice M., 2019. "Generalized signed-rank estimation for regression models with non-ignorable missing responses," Computational Statistics & Data Analysis, Elsevier, vol. 139(C), pages 14-33.
    9. Weihua Zhao & Riquan Zhang & Yazhao Lv & Jicai Liu, 2014. "Variable selection for varying dispersion beta regression model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(1), pages 95-108, January.
    10. Yingying Fan & Cheng Yong Tang, 2013. "Tuning parameter selection in high dimensional penalized likelihood," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 75(3), pages 531-552, June.
    11. Fan, Rui & Lee, Ji Hyung & Shin, Youngki, 2023. "Predictive quantile regression with mixed roots and increasing dimensions: The ALQR approach," Journal of Econometrics, Elsevier, vol. 237(2).
    12. Joseph G. Ibrahim & Hongtu Zhu & Ramon I. Garcia & Ruixin Guo, 2011. "Fixed and Random Effects Selection in Mixed Effects Models," Biometrics, The International Biometric Society, vol. 67(2), pages 495-503, June.
    13. Ling Zhou & Huazhen Lin & Xinyuan Song & Yi Li, 2014. "Selection of Latent Variables for Multiple Mixed-outcome Models," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 41(4), pages 1064-1082, December.
    14. Xin Cheng & Wenbin Lu & Mengling Liu, 2015. "Identification of homogeneous and heterogeneous variables in pooled cohort studies," Biometrics, The International Biometric Society, vol. 71(2), pages 397-403, June.
    15. Ping Zeng & Yongyue Wei & Yang Zhao & Jin Liu & Liya Liu & Ruyang Zhang & Jianwei Gou & Shuiping Huang & Feng Chen, 2014. "Variable selection approach for zero-inflated count data via adaptive lasso," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(4), pages 879-894, April.
    16. Shohoudi, Azadeh & Khalili, Abbas & Wolfson, David B. & Asgharian, Masoud, 2016. "Simultaneous variable selection and de-coarsening in multi-path change-point models," Journal of Multivariate Analysis, Elsevier, vol. 147(C), pages 202-217.
    17. Lei Wang & Wei Ma, 2021. "Improved empirical likelihood inference and variable selection for generalized linear models with longitudinal nonignorable dropouts," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(3), pages 623-647, June.
    18. Eun Ryung Lee & Hohsuk Noh & Byeong U. Park, 2014. "Model Selection via Bayesian Information Criterion for Quantile Regression Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 216-229, March.
    19. Mehrabani, Ali, 2023. "Estimation and identification of latent group structures in panel data," Journal of Econometrics, Elsevier, vol. 235(2), pages 1464-1482.
    20. Shonosuke Sugasawa & Kosuke Morikawa & Keisuke Takahata, 2022. "Bayesian semiparametric modeling of response mechanism for nonignorable missing data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 101-117, March.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.