Printed from https://ideas.repec.org/a/gam/jecnmx/v7y2019i3p37-d264548.html

Consequences of Model Misspecification for Maximum Likelihood Estimation with Missing Data

Authors

  • Richard M. Golden

    (School of Behavioral and Brain Sciences, GR4.1, 800 W. Campbell Rd., University of Texas at Dallas, Richardson, TX 75080, USA)

  • Steven S. Henley

    (Martingale Research Corporation, 101 E. Park Blvd., Suite 600, Plano, TX 75074, USA
    Department of Medicine, Loma Linda University School of Medicine, Loma Linda, CA 92357, USA
    Center for Advanced Statistics in Education, VA Loma Linda Healthcare System, Loma Linda, CA 92357, USA)

  • Halbert White

    (Department of Economics, University of California San Diego, La Jolla, CA 92093, USA
    Halbert White sadly passed away before this article was published.)

  • T. Michael Kashner

    (Department of Medicine, Loma Linda University School of Medicine, Loma Linda, CA 92357, USA
    Center for Advanced Statistics in Education, VA Loma Linda Healthcare System, Loma Linda, CA 92357, USA
    Office of Academic Affiliations (10X1), Department of Veterans Affairs, 810 Vermont Ave. NW, Washington, DC 20420, USA)

Abstract

Researchers are often faced with the challenge of developing statistical models with incomplete data. Exacerbating this situation is the possibility that either the researcher’s complete-data model or the model of the missing-data mechanism is misspecified. In this article, we create a formal theoretical framework for developing statistical models and detecting model misspecification in the presence of incomplete data where maximum likelihood estimates are obtained by maximizing the observable-data likelihood function when the missing-data mechanism is assumed ignorable. First, we provide sufficient regularity conditions on the researcher’s complete-data model to characterize the asymptotic behavior of maximum likelihood estimates in the simultaneous presence of both missing data and model misspecification. These results are then used to derive robust hypothesis testing methods for possibly misspecified models in the presence of Missing at Random (MAR) or Missing Not at Random (MNAR) missing data. Second, we introduce a method for the detection of model misspecification in missing data problems using recently developed Generalized Information Matrix Tests (GIMT). Third, we identify regularity conditions for the Missing Information Principle (MIP) to hold in the presence of model misspecification so as to provide useful computational covariance matrix estimation formulas. Fourth, we provide regularity conditions that ensure the observable-data expected negative log-likelihood function is convex in the presence of partially observable data when the amount of missingness is sufficiently small and the complete-data likelihood is convex. Fifth, we show that when the researcher has correctly specified a complete-data model with a convex negative likelihood function and an ignorable missing-data mechanism, then its strict local minimizer is the true parameter value for the complete-data model when the amount of missingness is sufficiently small. 
Our results thus provide new robust estimation, inference, and specification analysis methods for developing statistical models with incomplete data.
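
As a rough illustration of the kind of robust covariance estimation the abstract refers to, the sketch below computes a sandwich (Huber-White) variance estimate for a deliberately misspecified one-parameter model fitted to data with missing entries. It is a toy example under stated assumptions, not the paper's own procedure: the exponential model, the function name `sandwich_exponential`, and the use of `np.nan` to mark missing values are all hypothetical choices made here. Missingness is treated as ignorable, so a record with a missing value contributes a constant to the observable-data likelihood and therefore a zero score and zero Hessian.

```python
import numpy as np


def sandwich_exponential(x):
    """Fit an exponential(rate) model by maximum likelihood to the observed
    entries of x (np.nan marks a missing entry) and return
    (rate_hat, model_based_variance, sandwich_variance).

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    observed = ~np.isnan(x)
    x_obs = x[observed]
    n = x.size  # total number of records, observed or not

    # MLE of the rate under the exponential model: 1 / sample mean.
    rate = 1.0 / x_obs.mean()

    # Per-record negative log-likelihood: -log(rate) + rate * x.
    # Its derivative in rate is x - 1/rate; its second derivative is 1/rate**2.
    # Records with a missing value contribute zero to both.
    score = np.zeros(n)
    score[observed] = x_obs - 1.0 / rate
    hessian = np.zeros(n)
    hessian[observed] = 1.0 / rate**2

    A = hessian.mean()        # average Hessian of the negative log-likelihood
    B = np.mean(score**2)     # average squared score (outer product in 1-D)

    model_based_variance = 1.0 / (A * n)   # valid only if the model is correct
    sandwich_variance = B / (A**2 * n)     # A^{-1} B A^{-1} / n
    return rate, model_based_variance, sandwich_variance


# Usage: gamma-distributed data (so the exponential model is misspecified)
# with about 20% of entries deleted completely at random.
rng = np.random.default_rng(0)
data = rng.gamma(shape=4.0, scale=0.5, size=5000)
data[rng.random(data.size) < 0.2] = np.nan
rate_hat, var_model, var_robust = sandwich_exponential(data)
```

When the model is correctly specified, `A` and `B` agree asymptotically (the information matrix equality) and the two variance estimates coincide. A marked gap between them, as in this gamma example, is the kind of discrepancy that information-matrix-type tests are designed to detect.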

Suggested Citation

  • Richard M. Golden & Steven S. Henley & Halbert White & T. Michael Kashner, 2019. "Consequences of Model Misspecification for Maximum Likelihood Estimation with Missing Data," Econometrics, MDPI, vol. 7(3), pages 1-27, September.
  • Handle: RePEc:gam:jecnmx:v:7:y:2019:i:3:p:37-:d:264548

    Download full text from publisher

    File URL: https://www.mdpi.com/2225-1146/7/3/37/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2225-1146/7/3/37/
    Download Restriction: no

    References listed on IDEAS

    1. Jin Seo Cho & Halbert White, 2014. "Notations in "Testing the Equality of Two Positive-Definite Matrices with Application to Information Matrix Testing" by Cho and White (2014)," Working papers 2014rwp-67a, Yonsei University, Yonsei Economics Research Institute.
    2. Verbeke, Geert & Lesaffre, Emmanuel, 1997. "The effect of misspecifying the random-effects distribution in linear mixed models for longitudinal data," Computational Statistics & Data Analysis, Elsevier, vol. 23(4), pages 541-556, February.
    3. Wanling Huang & Artem Prokhorov, 2014. "A Goodness-of-fit Test for Copulas," Econometric Reviews, Taylor & Francis Journals, vol. 33(7), pages 751-771, October.
    4. Andrzej S. Kosinski & Huiman X. Barnhart, 2003. "Accounting for Nonignorable Verification Bias in Assessment of Diagnostic Tests," Biometrics, The International Biometric Society, vol. 59(1), pages 163-171, March.
    5. White, Halbert, 1982. "Maximum Likelihood Estimation of Misspecified Models," Econometrica, Econometric Society, vol. 50(1), pages 1-25, January.
    6. Gourieroux, Christian & Monfort, Alain & Trognon, Alain, 1984. "Pseudo Maximum Likelihood Methods: Theory," Econometrica, Econometric Society, vol. 52(3), pages 681-700, May.
    7. Rhoads Christopher H., 2012. "Problems with Tests of the Missingness Mechanism in Quantitative Policy Studies," Statistics, Politics and Policy, De Gruyter, vol. 3(1), pages 1-25, March.
    8. White, Halbert, 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, Econometric Society, vol. 48(4), pages 817-838, May.
    9. Wooldridge, Jeffrey M., 2007. "Inverse probability weighted estimation for general missing data problems," Journal of Econometrics, Elsevier, vol. 141(2), pages 1281-1301, December.
    10. M. Jamshidian & R. I. Jennrich, 2000. "Standard errors for EM estimation," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 62(2), pages 257-270.
    11. Jason Abrevaya & Stephen G. Donald, 2017. "A GMM Approach for Dealing with Missing Data on Regressors," The Review of Economics and Statistics, MIT Press, vol. 99(4), pages 657-662, July.
    12. Ernst R. Berndt & Bronwyn H. Hall & Robert E. Hall & Jerry A. Hausman, 1974. "Estimation and Inference in Nonlinear Structural Models," NBER Chapters, in: Annals of Economic and Social Measurement, Volume 3, number 4, pages 653-665, National Bureau of Economic Research, Inc.
    13. King, Gary & Honaker, James & Joseph, Anne & Scheve, Kenneth, 2001. "Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation," American Political Science Review, Cambridge University Press, vol. 95(1), pages 49-69, March.
    14. James W. Hardin, 2003. "The Sandwich Estimate Of Variance," Advances in Econometrics, in: Maximum Likelihood Estimation of Misspecified Models: Twenty Years Later, pages 45-73, Emerald Group Publishing Limited.
    15. Breunig, Christoph, 2017. "Testing Missing At Random Using Instrumental Variables," Rationality and Competition Discussion Paper Series 59, CRC TRR 190 Rationality and Competition.
    16. Geert Molenberghs & Caroline Beunckens & Cristina Sotto & Michael G. Kenward, 2008. "Every missingness not at random model has a missingness at random counterpart with equal fit," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(2), pages 371-388, April.
    17. Bo-Cheng Wei & Jian-Qing Shi & Wing-Kam Fung & Yue-Qing Hu, 1998. "Testing for Varying Dispersion in Exponential Family Nonlinear Models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 50(2), pages 277-294, June.
    18. White, Halbert, 1996. "Estimation, Inference and Specification Analysis," Cambridge Books, Cambridge University Press, number 9780521574464.
    19. Artem Prokhorov & Ulf Schepsmeier & Yajing Zhu, 2019. "Generalized information matrix tests for copulas," Econometric Reviews, Taylor & Francis Journals, vol. 38(9), pages 1024-1054, October.
    20. McDonough, Ian K. & Millimet, Daniel L., 2017. "Missing data, imputation, and endogeneity," Journal of Econometrics, Elsevier, vol. 199(2), pages 141-155.
    21. R. Golden, 2003. "Discrepancy Risk Model Selection Test theory for comparing possibly misspecified or nonnested models," Psychometrika, Springer;The Psychometric Society, vol. 68(2), pages 229-249, June.
    22. Christoph Breunig, 2019. "Testing Missing at Random Using Instrumental Variables," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 37(2), pages 223-234, April.
    23. Breunig, Christoph, 2017. "Testing missing at random using instrumental variables," SFB 649 Discussion Papers 2017-007, Humboldt University Berlin, Collaborative Research Center 649: Economic Risk.
    24. Cho, Jin Seo & Phillips, Peter C.B., 2018. "Pythagorean generalization of testing the equality of two symmetric positive definite matrices," Journal of Econometrics, Elsevier, vol. 202(1), pages 45-56.
    25. J. Isaac Miller, 2010. "Cointegrating regressions with messy regressors and an application to mixed‐frequency series," Journal of Time Series Analysis, Wiley Blackwell, vol. 31(4), pages 255-277, July.
    26. David Clayton & David Spiegelhalter & Graham Dunn & Andrew Pickles, 1998. "Analysis of longitudinal binary data from multiphase sampling," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 60(1), pages 71-87.
    27. Joseph G. Ibrahim & Ming-Hui Chen & Stuart R. Lipsitz & Amy H. Herring, 2005. "Missing-Data Methods for Generalized Linear Models: A Comparative Review," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 332-346, March.
    28. Yuan, Ke-Hai, 2009. "Normal distribution based pseudo ML for missing data: With applications to mean and covariance structure analysis," Journal of Multivariate Analysis, Elsevier, vol. 100(9), pages 1900-1918, October.
    29. Hua Yun Chen, 2004. "Nonparametric and Semiparametric Models for Missing Covariates in Parametric Regression," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 1176-1189, December.
    30. Xiaohong Chen & Norman R. Swanson (ed.), 2013. "Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis," Springer Books, Springer, edition 127, number 978-1-4614-1653-1, December.
    31. Richard M. Golden & Steven S. Henley & Halbert White & T. Michael Kashner, 2016. "Generalized Information Matrix Tests for Detecting Model Misspecification," Econometrics, MDPI, vol. 4(4), pages 1-24, November.
    32. J. G. Ibrahim & S. R. Lipsitz & M.‐H. Chen, 1999. "Missing covariates in generalized linear models when the missing data mechanism is non‐ignorable," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 61(1), pages 173-190.
    33. Schepsmeier, Ulf, 2015. "Efficient information based goodness-of-fit tests for vine copula models with fixed margins: A comprehensive review," Journal of Multivariate Analysis, Elsevier, vol. 138(C), pages 34-52.
    34. Jin Seo Cho & Halbert White, 2014. "Testing the Equality of Two Positive-Definite Matrices with Application to Information Matrix Testing," Working papers 2014rwp-67, Yonsei University, Yonsei Economics Research Institute.

    Citations

    Cited by:

    1. Bian, Yuan & Yi, Grace Y. & He, Wenqing, 2024. "A unified framework of analyzing missing data and variable selection using regularized likelihood," Computational Statistics & Data Analysis, Elsevier, vol. 194(C).
    2. Wenbo Ren & Xinran Bian & Jiayuan Gong & Anqing Chen & Ming Li & Zhuofei Xia & Jingnan Wang, 2022. "Analysis and Visualization of New Energy Vehicle Battery Data," Future Internet, MDPI, vol. 14(8), pages 1-16, July.
    3. Chih-Wen Hsiao & Ya-Chuan Chan & Mei-Yu Lee & Hsi-Peng Lu, 2021. "Heteroscedasticity and Precise Estimation Model Approach for Complex Financial Time-Series Data: An Example of Taiwan Stock Index Futures before and during COVID-19," Mathematics, MDPI, vol. 9(21), pages 1-18, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Richard M. Golden & Steven S. Henley & Halbert White & T. Michael Kashner, 2016. "Generalized Information Matrix Tests for Detecting Model Misspecification," Econometrics, MDPI, vol. 4(4), pages 1-24, November.
    2. Lijuan Huo & Jin Seo Cho, 2021. "Testing for the sandwich-form covariance matrix of the quasi-maximum likelihood estimator," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(2), pages 293-317, June.
    3. Jin Seo Cho & Peter C.B. Phillips, "undated". "Testing Equality of Covariance Matrices via Pythagorean Means," Cowles Foundation Discussion Papers 1970, Cowles Foundation for Research in Economics, Yale University.
    4. Cho, Jin Seo & Phillips, Peter C.B., 2018. "Pythagorean generalization of testing the equality of two symmetric positive definite matrices," Journal of Econometrics, Elsevier, vol. 202(1), pages 45-56.
    5. MacKinnon, James G. & Nielsen, Morten Ørregaard & Webb, Matthew D., 2023. "Testing for the appropriate level of clustering in linear regression models," Journal of Econometrics, Elsevier, vol. 235(2), pages 2027-2056.
    6. Jin Seo Cho & Peter C. B. Phillips & Juwon Seo, 2022. "Parametric Conditional Mean Inference With Functional Data Applied To Lifetime Income Curves," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 63(1), pages 391-456, February.
    7. Bang, Minji & Gao, Wayne Yuan & Postlewaite, Andrew & Sieg, Holger, 2023. "Using monotonicity restrictions to identify models with partially latent covariates," Journal of Econometrics, Elsevier, vol. 235(2), pages 892-921.
    8. Gabriele Fiorentini & Enrique Sentana, 2021. "Specification tests for non‐Gaussian maximum likelihood estimators," Quantitative Economics, Econometric Society, vol. 12(3), pages 683-742, July.
    9. de Luna, Xavier & Johansson, Per, 2006. "Exogeneity in structural equation models," Journal of Econometrics, Elsevier, vol. 132(2), pages 527-543, June.
    10. Gouriéroux, Christian, 1994. "Modèles économétriques : utilisation et interprétation (les)," CEPREMAP Working Papers (Couverture Orange) 9423, CEPREMAP.
    11. White, Halbert & Pettenuzzo, Davide, 2014. "Granger causality, exogeneity, cointegration, and economic policy analysis," Journal of Econometrics, Elsevier, vol. 178(P2), pages 316-330.
    12. Michael R. Baye & J. Rupert J. Gatti & Paul Kattuman & John Morgan, 2009. "Clicks, Discontinuities, and Firm Demand Online," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 18(4), pages 935-975, December.
    13. Demos, Antonis & Sentana, Enrique, 1998. "Testing for GARCH effects: a one-sided approach," Journal of Econometrics, Elsevier, vol. 86(1), pages 97-127, June.
    14. Fiorentini, Gabriele & Calzolari, Giorgio & Panattoni, Lorenzo, 1996. "Analytic Derivatives and the Computation of GARCH Estimates," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 11(4), pages 399-417, July-Aug.
    15. Calzolari, Giorgio & Panattoni, Lorenzo, 1983. "Hessian and approximated Hessian matrices in maximum likelihood estimation: a Monte Carlo study," MPRA Paper 28847, University Library of Munich, Germany.
    16. C. Gouriéroux & A. Monfort & J.‐M. Zakoïan, 2019. "Consistent Pseudo‐Maximum Likelihood Estimators and Groups of Transformations," Econometrica, Econometric Society, vol. 87(1), pages 327-345, January.
    17. Sin, C.Y. (Chor-yiu) & Lee, Cheng-Few, 2021. "Using heteroscedasticity-non-consistent or heteroscedasticity-consistent variances in linear regression," Econometrics and Statistics, Elsevier, vol. 18(C), pages 117-142.
    18. Yuan, Ke-Hai, 2009. "Normal distribution based pseudo ML for missing data: With applications to mean and covariance structure analysis," Journal of Multivariate Analysis, Elsevier, vol. 100(9), pages 1900-1918, October.
    19. Baye, Michael & Gatti, Rupert J. & Kattuman, Paul & Morgan, John, 2004. "Estimating Firm-Level Demand at a Price Comparison Site: Accounting for Shoppers and the Number of Competitors," Competition Policy Center, Working Paper Series qt923692d1, Competition Policy Center, Institute for Business and Economic Research, UC Berkeley.
    20. Magnus, Jan R., 2007. "The Asymptotic Variance Of The Pseudo Maximum Likelihood Estimator," Econometric Theory, Cambridge University Press, vol. 23(5), pages 1022-1032, October.
