
Normalized Information Criteria and Model Selection in the Presence of Missing Data

Author

Listed:
  • Nitzan Cohen

    (Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, P.O. Box 653, Beer-Sheva 84105, Israel)

  • Yakir Berchenko

    (Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, P.O. Box 653, Beer-Sheva 84105, Israel)

Abstract

Information criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are commonly used for model selection. However, the current theory does not support unconventional data, so naive use of these criteria is not suitable for data with missing values. Imputation, at the core of most alternative methods, both introduces distortion and is computationally demanding. We propose a new approach that enables the use of classic, well-known information criteria for model selection when there are missing data. We adapt the current theory of information criteria through normalization, accounting for the different sample sizes used for each candidate model (focusing on AIC and BIC). Interestingly, when the sample sizes are different, our theoretical analysis finds that $\mathrm{AIC}_j / n_j$ is the proper correction for $\mathrm{AIC}_j$ that we need to optimize (where $n_j$ is the sample size available to the $j$th model), while $-(\mathrm{BIC}_j - \mathrm{BIC}_i) / (n_j - n_i)$ is the correction of BIC. Furthermore, we find that the computational complexity of normalized information criteria methods is exponentially better than that of imputation methods. In a series of simulation studies, we find that normalized-AIC and normalized-BIC outperform previous methods (i.e., normalized-AIC is more efficient, and normalized-BIC includes only important variables, although it tends to exclude some of them in cases of large correlation). We propose three additional methods aimed at increasing the statistical efficiency of normalized-AIC: post-selection imputation, Akaike sub-model averaging, and minimum-variance averaging. The latter succeeds in increasing efficiency further.
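
The normalization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Gaussian linear regression fitted by ordinary least squares, synthetic data, and a setting where each candidate model is fitted on the rows in which its own covariates are observed, so that the sample size n_j differs between models. The function names `fit_on_available_rows` and `bic_pairwise` are hypothetical. The sketch ranks candidates by AIC_j / n_j and only evaluates the pairwise BIC quantity quoted in the abstract; the full decision rule for normalized-BIC is given in the paper and is not reproduced here.

```python
from itertools import combinations

import numpy as np


def fit_on_available_rows(y, X, subset):
    """Fit OLS on the rows where every covariate in `subset` is observed.

    Returns the sample size n_j used by this candidate model together
    with its AIC, BIC, and AIC divided by n_j.
    """
    cols = list(subset)
    rows = ~np.isnan(X[:, cols]).any(axis=1)      # complete cases for this model
    n_j = int(rows.sum())
    design = np.column_stack([np.ones(n_j), X[rows][:, cols]])
    beta, *_ = np.linalg.lstsq(design, y[rows], rcond=None)
    resid = y[rows] - design @ beta
    sigma2 = float(resid @ resid) / n_j           # ML estimate of the noise variance
    loglik = -0.5 * n_j * (np.log(2.0 * np.pi * sigma2) + 1.0)
    k = design.shape[1] + 1                       # coefficients plus the variance
    aic = 2.0 * k - 2.0 * loglik
    bic = k * np.log(n_j) - 2.0 * loglik
    return {"subset": tuple(subset), "n": n_j, "aic": aic, "bic": bic,
            "aic_per_n": aic / n_j}


def bic_pairwise(model_j, model_i):
    """Evaluate -(BIC_j - BIC_i) / (n_j - n_i) for two candidate models."""
    if model_j["n"] == model_i["n"]:
        raise ValueError("the pairwise quantity needs different sample sizes")
    return -(model_j["bic"] - model_i["bic"]) / (model_j["n"] - model_i["n"])


# Synthetic data (an assumption for illustration): only x0 drives y,
# while x1 and x2 are irrelevant and partly missing.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)
X[rng.random(n) < 0.4, 1] = np.nan
X[rng.random(n) < 0.3, 2] = np.nan

# Evaluate every non-empty subset of covariates on its own available rows.
results = [fit_on_available_rows(y, X, s)
           for r in range(1, 4) for s in combinations(range(3), r)]
by_subset = {m["subset"]: m for m in results}

# Normalized-AIC: choose the candidate with the smallest AIC_j / n_j.
best = min(results, key=lambda m: m["aic_per_n"])
print(best["subset"], best["n"], round(best["aic_per_n"], 3))
```

Comparing raw AIC values here would favour models fitted on fewer rows simply because their log-likelihood sums over fewer observations; dividing by n_j puts the candidates on a per-observation scale, which is the motivation the abstract gives for the correction.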

Suggested Citation

  • Nitzan Cohen & Yakir Berchenko, 2021. "Normalized Information Criteria and Model Selection in the Presence of Missing Data," Mathematics, MDPI, vol. 9(19), pages 1-23, October.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:19:p:2474-:d:649347

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/19/2474/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/19/2474/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jiming Jiang & Thuan Nguyen & J. Sunil Rao, 2015. "The E-MS Algorithm: Model Selection With Incomplete Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(511), pages 1136-1147, September.
    2. Zhimeng Sun & Zhi Su & Jingyi Ma, 2014. "Focused vector information criterion model selection and model averaging regression with missing response," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 77(3), pages 415-432, April.
    3. Adriano Zanin Zambom & Gregory J. Matthews, 2021. "Sure independence screening in the presence of missing data," Statistical Papers, Springer, vol. 62(2), pages 817-845, April.
    4. Schomaker, Michael & Heumann, Christian, 2014. "Model selection and model averaging after multiple imputation," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 758-770.
    5. Hai Wang & Xinjie Chen & Nancy Flournoy, 2016. "The focused information criterion for varying-coefficient partially linear measurement error models," Statistical Papers, Springer, vol. 57(1), pages 99-113, March.
    6. Riccardo (Jack) Lucchetti & Luca Pedini, 2020. "ParMA: Parallelised Bayesian Model Averaging for Generalised Linear Models," Working Papers 2020:28, Department of Economics, University of Venice "Ca' Foscari".
    7. Anna Sokolova, 2023. "Marginal Propensity to Consume and Unemployment: a Meta-analysis," Review of Economic Dynamics, Elsevier for the Society for Economic Dynamics, vol. 51, pages 813-846, December.
    8. Zhongqi Liang & Qihua Wang & Yuting Wei, 2022. "Robust model selection with covariables missing at random," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(3), pages 539-557, June.
    9. Schomaker, Michael & Heumann, Christian, 2011. "Model Averaging in Factor Analysis: An Analysis of Olympic Decathlon Data," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 7(1), pages 1-15, January.
    10. Janus, Jakub, 2021. "The COVID-19 shock and long-term interest rates in emerging market economies," Finance Research Letters, Elsevier, vol. 43(C).
    11. Rajeev K. Goel & James W. Saunoris, 2020. "A Replication of “Sorting through Global Corruption Determinants: Institutions and Education Matter—Not Culture” (World Development 2018)," Public Finance Review, vol. 48(4), pages 538-567, July.
    12. Morvillier, Florian, 2020. "Do currency undervaluations affect the impact of inflation on growth?," Economic Modelling, Elsevier, vol. 84(C), pages 275-292.
    13. Mark F. J. Steel, 2020. "Model Averaging and Its Use in Economics," Journal of Economic Literature, American Economic Association, vol. 58(3), pages 644-719, September.
    14. Roman Horvath & Ali Elminejad & Tomas Havranek, 2020. "Publication and Identification Biases in Measuring the Intertemporal Substitution of Labor Supply," Working Papers IES 2020/32, Charles University Prague, Faculty of Social Sciences, Institute of Economic Studies, revised Sep 2020.
    15. Tomas Havranek & Zuzana Irsova & Lubica Laslopova & Olesia Zeynalova, 2020. "Skilled and Unskilled Labor Are Less Substitutable than Commonly Thought," Working Papers IES 2020/29, Charles University Prague, Faculty of Social Sciences, Institute of Economic Studies, revised Sep 2020.
    16. Gric, Zuzana & Bajzík, Josef & Badura, Ondřej, 2023. "Does sentiment affect stock returns? A meta-analysis across survey-based measures," International Review of Financial Analysis, Elsevier, vol. 89(C).
    17. Njindan Iyke, Bernard, 2015. "Macro Determinants of the Real Exchange Rate in a Small Open Small Island Economy: Evidence from Mauritius via BMA," MPRA Paper 68968, University Library of Munich, Germany.
    18. Dražanová, Lenka & Gonnot, Jérôme & Heidland, Tobias & Krüger, Finja, 2022. "Understanding differences in attitudes to immigration: A meta-analysis of individual-level factors," Kiel Working Papers 2235, Kiel Institute for the World Economy (IfW Kiel).
    19. Elminejad, Ali & Havranek, Tomas & Irsova, Zuzana, 2022. "Relative Risk Aversion: A Meta-Analysis," MetaArXiv b8uhe, Center for Open Science.
    20. Petr Cala & Tomas Havranek & Zuzana Irsova & Jindrich Matousek & Jiri Novak, 2022. "Financial Incentives and Performance: A Meta-Analysis of Economics Evidence," Working Papers IES 2022/27, Charles University Prague, Faculty of Social Sciences, Institute of Economic Studies, revised Nov 2022.
