
The E-MS Algorithm: Model Selection With Incomplete Data

Author

Listed:
  • Jiming Jiang
  • Thuan Nguyen
  • J. Sunil Rao

Abstract

We propose a procedure associated with the idea of the E-M algorithm for model selection in the presence of missing data. The idea extends the concept of parameters to include both the model and the parameters under the model, and thus allows the model to be part of the E-M iterations. We develop the procedure, known as the E-MS algorithm, under the assumption that the class of candidate models is finite. Some special cases of the procedure are considered, including E-MS with the generalized information criteria (GIC), and E-MS with the adaptive fence (AF; Jiang et al.). We prove numerical convergence of the E-MS algorithm as well as consistency in model selection of the limiting model of the E-MS convergence, for E-MS with GIC and E-MS with AF. We study the impact on model selection of different missing data mechanisms. Furthermore, we carry out extensive simulation studies on the finite-sample performance of the E-MS with comparisons to other procedures. The methodology is also illustrated on a real data analysis involving QTL mapping for an agricultural study on barley grains. Supplementary materials for this article are available online.
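
The abstract's central idea, treating the model itself as part of what the E-M iterations update, can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes a normal linear regression in which only some responses are missing, a finite candidate class made of all non-empty covariate subsets, and a BIC-type penalty (log n per parameter) as one instance of GIC. The function name `e_ms_gic` and all of its parameters are hypothetical.

```python
import itertools
import numpy as np


def e_ms_gic(X, y, penalty=None, max_iter=200, tol=1e-8):
    """Illustrative E-MS loop; np.nan in y marks a missing response."""
    n, p = X.shape
    miss = np.isnan(y)
    lam = np.log(n) if penalty is None else penalty  # BIC-type penalty (assumption)
    # Finite class of candidate models: every non-empty subset of columns.
    candidates = [c for k in range(1, p + 1)
                  for c in itertools.combinations(range(p), k)]

    # Start from the full model fitted on the complete cases.
    model = tuple(range(p))
    beta = np.linalg.lstsq(X[~miss], y[~miss], rcond=None)[0]
    sigma2 = np.mean((y[~miss] - X[~miss] @ beta) ** 2)

    for _ in range(max_iter):
        # E-step: under the current (model, beta, sigma2), a missing y_i has
        # conditional mean x_i' beta and conditional variance sigma2.
        y_hat = np.where(miss, X[:, list(model)] @ beta, y)
        extra = miss.sum() * sigma2  # variance contribution of missing y_i

        # MS-step: for each candidate, maximize the expected complete-data
        # log-likelihood, then pick the candidate with the smallest GIC.
        best = None
        for cand in candidates:
            Xc = X[:, list(cand)]
            b = np.linalg.lstsq(Xc, y_hat, rcond=None)[0]
            s2 = (np.sum((y_hat - Xc @ b) ** 2) + extra) / n
            gic = n * np.log(s2) + lam * len(cand)
            if best is None or gic < best[0]:
                best = (gic, cand, b, s2)

        _, new_model, new_beta, new_sigma2 = best
        converged = (new_model == model
                     and np.allclose(new_beta, beta, atol=tol)
                     and abs(new_sigma2 - sigma2) < tol)
        model, beta, sigma2 = new_model, new_beta, new_sigma2
        if converged:
            break
    return model, beta, sigma2


# Simulated example: columns 0 and 2 are the true covariates and about
# 20% of the responses are missing completely at random.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(size=200)
y[rng.random(200) < 0.2] = np.nan

model, beta, sigma2 = e_ms_gic(X, y)
print("selected columns:", model)
print("coefficients:", beta)
```

Enumerating every subset is only workable for a handful of covariates; it is used here because the abstract assumes a finite candidate class. With missing responses alone the E-step adds little information beyond the complete cases, so the sketch shows the mechanics of alternating expectation and model selection rather than a setting where the method pays off.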

Suggested Citation

  • Jiming Jiang & Thuan Nguyen & J. Sunil Rao, 2015. "The E-MS Algorithm: Model Selection With Incomplete Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(511), pages 1136-1147, September.
  • Handle: RePEc:taf:jnlasa:v:110:y:2015:i:511:p:1136-1147
    DOI: 10.1080/01621459.2014.948545

    References listed on IDEAS

    1. Samuel Müller & Alan H. Welsh, 2010. "On Model Selection Curves," International Statistical Review, International Statistical Institute, vol. 78(2), pages 240-256, August.
    2. Ibrahim, Joseph G. & Zhu, Hongtu & Tang, Niansheng, 2008. "Model Selection Criteria for Missing-Data Problems Using the EM Algorithm," Journal of the American Statistical Association, American Statistical Association, vol. 103(484), pages 1648-1658.
    3. J. G. Booth & J. P. Hobert, 1999. "Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 61(1), pages 265-285.
    4. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    5. Jiang, Jiming & Nguyen, Thuan & Rao, J. Sunil, 2009. "A simplified adaptive fence procedure," Statistics & Probability Letters, Elsevier, vol. 79(5), pages 625-629, March.
    6. Jiang, Jiming & Nguyen, Thuan & Rao, J. Sunil, 2011. "Best Predictive Small Area Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 106(494), pages 732-745.
    7. Karl W. Broman & Terence P. Speed, 2002. "A model selection approach for the identification of quantitative trait loci in experimental crosses," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(4), pages 641-656, October.
    8. Muller, Samuel & Welsh, A.H., 2005. "Outlier Robust Model Selection in Linear Regression," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 1297-1310, December.
    9. Gerda Claeskens & Fabrizio Consentino, 2008. "Variable Selection with Incomplete Covariate Data," Biometrics, The International Biometric Society, vol. 64(4), pages 1062-1069, December.
    10. John Copas & Shinto Eguchi, 2005. "Local model uncertainty and incomplete‐data bias (with discussion)," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(4), pages 459-513, September.
    11. Howard D. Bondell & Arun Krishna & Sujit K. Ghosh, 2010. "Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models," Biometrics, The International Biometric Society, vol. 66(4), pages 1069-1077, December.
    12. Schomaker, Michael & Wan, Alan T.K. & Heumann, Christian, 2010. "Frequentist Model Averaging with missing observations," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3336-3347, December.

    Citations


    Cited by:

    1. Sierra A. Bainter & Thomas G. McCauley & Mahmoud M. Fahmy & Zachary T. Goodman & Lauren B. Kupis & J. Sunil Rao, 2023. "Comparing Bayesian Variable Selection to Lasso Approaches for Applications in Psychology," Psychometrika, Springer;The Psychometric Society, vol. 88(3), pages 1032-1055, September.
    2. Jiang, Wei & Josse, Julie & Lavielle, Marc, 2020. "Logistic regression with missing covariates—Parameter estimation, model selection and prediction within a joint-modeling framework," Computational Statistics & Data Analysis, Elsevier, vol. 145(C).
    3. Wei, Yuting & Wang, Qihua & Duan, Xiaogang & Qin, Jing, 2021. "Bias-corrected Kullback–Leibler distance criterion based model selection with covariables missing at random," Computational Statistics & Data Analysis, Elsevier, vol. 160(C).
    4. Zhongqi Liang & Qihua Wang & Yuting Wei, 2022. "Robust model selection with covariables missing at random," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(3), pages 539-557, June.
    5. Yuting Wei & Qihua Wang & Wei Liu, 2021. "Model averaging for linear models with responses missing at random," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(3), pages 535-553, June.
    6. María José Lombardía & Esther López‐Vizcaíno & Cristina Rueda, 2017. "Mixed generalized Akaike information criterion for small area models," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 180(4), pages 1229-1252, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Joseph G. Ibrahim & Hongtu Zhu & Ramon I. Garcia & Ruixin Guo, 2011. "Fixed and Random Effects Selection in Mixed Effects Models," Biometrics, The International Biometric Society, vol. 67(2), pages 495-503, June.
    2. Simona Buscemi & Antonella Plaia, 2020. "Model selection in linear mixed-effect models," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 104(4), pages 529-575, December.
    3. Francis K. C. Hui & Samuel Müller & A. H. Welsh, 2017. "Joint Selection in Mixed Models using Regularized PQL," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(519), pages 1323-1333, July.
    4. Zhimeng Sun & Zhi Su & Jingyi Ma, 2014. "Focused vector information criterion model selection and model averaging regression with missing response," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 77(3), pages 415-432, April.
    5. Adriano Zanin Zambom & Gregory J. Matthews, 2021. "Sure independence screening in the presence of missing data," Statistical Papers, Springer, vol. 62(2), pages 817-845, April.
    6. Zak-Szatkowska, Malgorzata & Bogdan, Malgorzata, 2011. "Modified versions of the Bayesian Information Criterion for sparse Generalized Linear Models," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2908-2924, November.
    7. Ping Wu & Xinchao Luo & Peirong Xu & Lixing Zhu, 2017. "New variable selection for linear mixed-effects models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 69(3), pages 627-646, June.
    8. Zhongqi Liang & Qihua Wang & Yuting Wei, 2022. "Robust model selection with covariables missing at random," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(3), pages 539-557, June.
    9. Mojtaba Ganjali & Taban Baghfalaki, 2018. "Application of Penalized Mixed Model in Identification of Genes in Yeast Cell-Cycle Gene Expression Data," Biostatistics and Biometrics Open Access Journal, Juniper Publishers Inc., vol. 6(2), pages 38-41, April.
    10. Ramon I. Garcia & Joseph G. Ibrahim & Hongtu Zhu, 2010. "Variable Selection in the Cox Regression Model with Covariates Missing at Random," Biometrics, The International Biometric Society, vol. 66(1), pages 97-104, March.
    11. Bian, Yuan & Yi, Grace Y. & He, Wenqing, 2024. "A unified framework of analyzing missing data and variable selection using regularized likelihood," Computational Statistics & Data Analysis, Elsevier, vol. 194(C).
    12. Luoying Yang & Tong Tong Wu, 2023. "Model‐based clustering of high‐dimensional longitudinal data via regularization," Biometrics, The International Biometric Society, vol. 79(2), pages 761-774, June.
    13. Peirong Xu & Lixing Zhu & Yi Li, 2014. "Ultrahigh dimensional time course feature selection," Biometrics, The International Biometric Society, vol. 70(2), pages 356-365, June.
    14. Gerhard Tutz & Gunther Schauberger, 2015. "A Penalty Approach to Differential Item Functioning in Rasch Models," Psychometrika, Springer;The Psychometric Society, vol. 80(1), pages 21-43, March.
    15. Tang, Niansheng & Xia, Linli & Yan, Xiaodong, 2019. "Feature screening in ultrahigh-dimensional partially linear models with missing responses at random," Computational Statistics & Data Analysis, Elsevier, vol. 133(C), pages 208-227.
    16. Zhang, Yan-Qing & Tian, Guo-Liang & Tang, Nian-Sheng, 2016. "Latent variable selection in structural equation models," Journal of Multivariate Analysis, Elsevier, vol. 152(C), pages 190-205.
    17. Zhao, Shangwei & Zhou, Jianhong & Li, Hongjun, 2016. "Model averaging with high-dimensional dependent data," Economics Letters, Elsevier, vol. 148(C), pages 68-71.
    18. Keiji Takai & Kenichi Hayashi, 2023. "Model Selection with Missing Data Embedded in Missing-at-Random Data," Stats, MDPI, vol. 6(2), pages 1-11, April.
    19. Yawei He & Zehua Chen, 2016. "The EBIC and a sequential procedure for feature selection in interactive linear models with high-dimensional data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 68(1), pages 155-180, February.
    20. Nitzan Cohen & Yakir Berchenko, 2021. "Normalized Information Criteria and Model Selection in the Presence of Missing Data," Mathematics, MDPI, vol. 9(19), pages 1-23, October.
