Printed from https://ideas.repec.org/p/ehl/lserod/87227.html

Missing data: a unified taxonomy guided by conditional independence

Authors

  • Doretti, Marco
  • Geneletti, Sara
  • Stanghellini, Elena

Abstract

Recent work (Seaman et al., 2013; Mealli & Rubin, 2015) attempts to clarify the not always well-understood difference between realised and everywhere definitions of missing at random (MAR) and missing completely at random. Another branch of the literature (Mohan et al., 2013; Pearl & Mohan, 2013) exploits always-observed covariates to give variable-based definitions of MAR and missing completely at random. In this paper, we develop a unified taxonomy encompassing all approaches. In this taxonomy, the new concept of ‘complementary MAR’ is introduced, and its relationship with the concept of data observed at random is discussed. All relationships among these definitions are analysed and represented graphically. Conditional independence, both at the random variable and at the event level, is the formal language we adopt to connect all these definitions. Our paper covers both the univariate and the multivariate case, where attention is paid to monotone missingness and to the concept of sequential MAR. Specifically, for monotone missingness, we propose a sequential MAR definition that might be more appropriate than both everywhere and variable-based MAR to model dropout in certain contexts.
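
As an informal illustration of the variable-based definitions mentioned in the abstract, the following Python sketch simulates an outcome Y together with an always-observed binary covariate X. It is not taken from the paper: the function name `simulate`, the response probabilities and the sample size are all assumptions chosen for this example. Under MCAR the response indicator R is generated independently of (X, Y); under the variable-based MAR mechanism R depends on X only, so R is independent of Y given X.

```python
import random


def simulate(n=50_000, seed=0):
    """Compare complete-case means of Y under MCAR and covariate-driven MAR.

    All numeric settings here are illustrative assumptions, not values
    from the paper.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0   # always-observed covariate
        y = x + rng.gauss(0.0, 1.0)          # outcome, true mean E[Y] = 0.5
        r_mcar = rng.random() < 0.5          # MCAR: R independent of (X, Y)
        p_obs = 0.9 if x == 1 else 0.2       # MAR: P(R = 1) depends on X only
        r_mar = rng.random() < p_obs
        rows.append((x, y, r_mcar, r_mar))

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    # Complete-case means: average Y over units where Y is observed.
    mcar_cc_mean = mean(y for _, y, r, _ in rows if r)
    mar_cc_mean = mean(y for _, y, _, r in rows if r)

    # Under MAR given X, average the observed stratum means of Y,
    # weighted by the empirical distribution of X (X is never missing).
    mar_adjusted_mean = 0.0
    for level in (0, 1):
        weight = mean(1.0 if x == level else 0.0 for x, _, _, _ in rows)
        stratum_mean = mean(y for x, y, _, r in rows if r and x == level)
        mar_adjusted_mean += weight * stratum_mean

    return {
        "mcar_cc_mean": mcar_cc_mean,
        "mar_cc_mean": mar_cc_mean,
        "mar_adjusted_mean": mar_adjusted_mean,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumed settings the complete-case mean should stay close to the true mean 0.5 under MCAR but drift towards roughly 0.8 under MAR, because units with X = 1 are over-represented among the observed outcomes. Averaging the stratum means over the distribution of X should recover a value near 0.5.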

Suggested Citation

  • Doretti, Marco & Geneletti, Sara & Stanghellini, Elena, 2018. "Missing data: a unified taxonomy guided by conditional independence," LSE Research Online Documents on Economics 87227, London School of Economics and Political Science, LSE Library.
  • Handle: RePEc:ehl:lserod:87227

    Download full text from publisher

    File URL: http://eprints.lse.ac.uk/87227/
    File Function: Open access version.
    Download Restriction: no
    References listed on IDEAS

    1. M. G. Kenward, 2003. "Pattern-mixture models with proper time dependence," Biometrika, Biometrika Trust, vol. 90(1), pages 53-71, March.
    2. Geert Molenberghs & Caroline Beunckens & Cristina Sotto & Michael G. Kenward, 2008. "Every missingness not at random model has a missingness at random counterpart with equal fit," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(2), pages 371-388, April.
    3. Fabrizia Mealli & Donald B. Rubin, 2015. "Clarifying missing at random and related definitions, and implications when coupled with exchangeability," Biometrika, Biometrika Trust, vol. 102(4), pages 995-1000.
    4. G. Molenberghs & B. Michiels & M. G. Kenward & P. J. Diggle, 1998. "Monotone missing data and pattern‐mixture models," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 52(2), pages 153-161, June.

    Citations

    Cited by:

    1. Mehboob Ali & Göran Kauermann, 2021. "A split questionnaire survey design in the context of statistical matching," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(4), pages 1219-1236, October.
    2. Nitzan Cohen & Yakir Berchenko, 2021. "Normalized Information Criteria and Model Selection in the Presence of Missing Data," Mathematics, MDPI, vol. 9(19), pages 1-23, October.
    3. Thakur Narendra Singh & Shukla Diwakar, 2022. "Missing data estimation based on the chaining technique in survey sampling," Statistics in Transition New Series, Polish Statistical Association, vol. 23(4), pages 91-111, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bunouf, Pierre & Molenberghs, Geert & Grouin, Jean-Marie & Thijs, Herbert, 2015. "A SAS Program Combining R Functionalities to Implement Pattern-Mixture Models," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 68(i08).
    2. D. M. Farewell & C. Huang & V. Didelez, 2017. "Ignorability for general longitudinal data," Biometrika, Biometrika Trust, vol. 104(2), pages 317-326.
    3. Shu Xu & Shelley A. Blozis, 2011. "Sensitivity Analysis of Mixed Models for Incomplete Longitudinal Data," Journal of Educational and Behavioral Statistics, vol. 36(2), pages 237-256, April.
    4. Bian, Yuan & Yi, Grace Y. & He, Wenqing, 2024. "A unified framework of analyzing missing data and variable selection using regularized likelihood," Computational Statistics & Data Analysis, Elsevier, vol. 194(C).
    5. Kott Phillip S. & Liao Dan, 2018. "Calibration Weighting for Nonresponse with Proxy Frame Variables (So that Unit Nonresponse Can Be Not Missing at Random)," Journal of Official Statistics, Sciendo, vol. 34(1), pages 107-120, March.
    6. A. R. Linero, 2017. "Bayesian nonparametric analysis of longitudinal studies in the presence of informative missingness," Biometrika, Biometrika Trust, vol. 104(2), pages 327-341.
    7. Hairu Wang & Zhiping Lu & Yukun Liu, 2023. "Score test for missing at random or not under logistic missingness models," Biometrics, The International Biometric Society, vol. 79(2), pages 1268-1279, June.
    8. Andrew T. Karl & Yan Yang & Sharon L. Lohr, 2013. "A Correlated Random Effects Model for Nonignorable Missing Data in Value-Added Assessment of Teacher Effects," Journal of Educational and Behavioral Statistics, vol. 38(6), pages 577-603, December.
    9. Xiaojun Mao & Zhonglei Wang & Shu Yang, 2023. "Matrix completion under complex survey sampling," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 75(3), pages 463-492, June.
    10. Rianne Margaretha Schouten & Gerko Vink, 2021. "The Dance of the Mechanisms: How Observed Information Influences the Validity of Missingness Assumptions," Sociological Methods & Research, vol. 50(3), pages 1243-1258, August.
    11. Daniel, Rhian M. & Kenward, Michael G., 2012. "A method for increasing the robustness of multiple imputation," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1624-1643.
    12. Janicki, Ryan & Malec, Donald, 2013. "A Bayesian model averaging approach to analyzing categorical data with nonignorable nonresponse," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 600-614.
    13. Yu Cao & Nitai D. Mukhopadhyay, 2021. "Statistical Modeling of Longitudinal Data with Non-Ignorable Non-Monotone Missingness with Semiparametric Bayesian and Machine Learning Components," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(1), pages 152-169, May.
    14. Florian M. Hollenbach & Iavor Bojinov & Shahryar Minhas & Nils W. Metternich & Michael D. Ward & Alexander Volfovsky, 2021. "Multiple Imputation Using Gaussian Copulas," Sociological Methods & Research, vol. 50(3), pages 1259-1283, August.
    15. Aidan G. O’Keeffe & Daniel M. Farewell & Brian D. M. Tom & Vernon T. Farewell, 2016. "Multiple Imputation of Missing Composite Outcomes in Longitudinal Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 8(2), pages 310-332, October.
    16. Yuan, Ke-Hai, 2009. "Normal distribution based pseudo ML for missing data: With applications to mean and covariance structure analysis," Journal of Multivariate Analysis, Elsevier, vol. 100(9), pages 1900-1918, October.
    17. Margarita Moreno-Betancur & Grégoire Rey & Aurélien Latouche, 2015. "Direct likelihood inference and sensitivity analysis for competing risks regression with missing causes of failure," Biometrics, The International Biometric Society, vol. 71(2), pages 498-507, June.
    18. Michael J. Daniels & Arkendu S. Chatterjee & Chenguang Wang, 2012. "Bayesian Model Selection for Incomplete Data Using the Posterior Predictive Distribution," Biometrics, The International Biometric Society, vol. 68(4), pages 1055-1063, December.
    19. Yuriko Takeda & Toshihiro Misumi & Kouji Yamamoto, 2022. "Joint Models for Incomplete Longitudinal Data and Time-to-Event Data," Mathematics, MDPI, vol. 10(19), pages 1-7, October.
    20. Roula Tsonaka & Dimitris Rizopoulos & Geert Verbeke & Emmanuel Lesaffre, 2010. "Nonignorable Models for Intermittently Missing Categorical Longitudinal Responses," Biometrics, The International Biometric Society, vol. 66(3), pages 834-844, September.

    More about this item

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics
