Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i21p3419-d1511819.html

Missing Data Imputation in Balanced Construction for Incomplete Block Designs

Authors
  • Haiyan Yu

    (Center for Data and Decision Sciences, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
    These authors contributed equally to this work.)

  • Bing Han

    (School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
    These authors contributed equally to this work.)

  • Nicholas Rios

    (Department of Statistics, George Mason University, Fairfax, VA 22031, USA
    These authors contributed equally to this work.)

  • Jianbin Chen

    (School of Mathematics and Statistics, Beijing Institute of Technology, Beijing 100081, China
    These authors contributed equally to this work.)

Abstract

Observational data with massive sample sizes are often distributed across many local machines. From an experimental design perspective, investigators often want to identify the effects of new treatments (which may even be ML algorithms) on many blocks of experimental data. Under time requirements or budget constraints, assigning all treatments to every block is not always feasible. This creates incomplete responses with respect to a randomized complete block design (RCBD). These incomplete responses are missing by design; however, whether they can be estimated with missing-data imputation methods is not well understood, which makes it challenging to identify the treatment effects correctly. To this end, this paper provides a method for imputing and analyzing responses with missing data. The proposed method consists of three steps: Reconstruction, Imputation, and ‘Complete’-data Analysis (RICA). The incomplete responses are imputed with the expectation-maximization (EM) algorithm, and the RCBD model is then fitted to the resulting dataset. The identifiability result suggests that the missingness may be nonignorable within each block, but that the data of an incomplete design as a whole are missing by design when the design is balanced. Theoretical results on relative efficiency also indicate when the missing responses of incomplete designs should be imputed, highlighting the role of balanced variance. Applications to real-world data verify the efficacy of this method.
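To make the Imputation and ‘Complete’-data Analysis steps more concrete, here is a minimal Python sketch. It assumes an additive RCBD model y_ij = mu + tau_i + beta_j + e_ij with one response per treatment-block cell, and it assumes the unassigned cells are stored as NaN in a treatments-by-blocks table, which stands in here for the Reconstruction step. The sketch uses the classical EM iteration for missing cells in a two-way layout. It is not the authors' RICA implementation, and the function names `em_impute_rcbd` and `rcbd_fit` are hypothetical.

```python
import numpy as np


def em_impute_rcbd(y, tol=1e-10, max_iter=1000):
    """Hypothetical helper: EM imputation of NaN cells in a treatments-by-blocks
    table under the additive model y_ij = mu + tau_i + beta_j + e_ij.

    Rows are treatments, columns are blocks, one observation per cell.
    Returns the completed table and the number of iterations used.
    """
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    filled = y.copy()
    # Start every missing cell at the mean of the observed responses.
    filled[missing] = np.nanmean(y)
    for it in range(1, max_iter + 1):
        # M-step: least-squares effects of the additive model on the completed table.
        mu = filled.mean()
        tau = filled.mean(axis=1) - mu   # treatment (row) effects
        beta = filled.mean(axis=0) - mu  # block (column) effects
        fitted = mu + tau[:, None] + beta[None, :]
        # E-step: replace each missing cell by its current fitted value.
        new_vals = fitted[missing]
        change = np.max(np.abs(new_vals - filled[missing])) if missing.any() else 0.0
        filled[missing] = new_vals
        if change < tol:
            return filled, it
    return filled, max_iter


def rcbd_fit(filled, n_missing):
    """Hypothetical helper: fit the additive RCBD model to a completed table.

    Returns treatment effects, block effects, the residual variance estimate,
    and the error degrees of freedom, reduced by one per imputed cell.
    """
    t, b = filled.shape
    mu = filled.mean()
    tau = filled.mean(axis=1) - mu
    beta = filled.mean(axis=0) - mu
    resid = filled - (mu + tau[:, None] + beta[None, :])
    df_err = (t - 1) * (b - 1) - n_missing
    if df_err <= 0:
        raise ValueError("too many missing cells for an error variance estimate")
    sigma2 = np.sum(resid ** 2) / df_err
    return tau, beta, sigma2, df_err


if __name__ == "__main__":
    # Illustrative table: 3 treatments x 4 blocks, one cell not assigned.
    table = np.array([[10.0, 11.0, 13.0, 15.0],
                      [12.0, 13.0, np.nan, 17.0],
                      [14.0, 15.0, 17.0, 19.0]])
    completed, iters = em_impute_rcbd(table)
    tau, beta, sigma2, df_err = rcbd_fit(completed, int(np.isnan(table).sum()))
    print(completed[1, 2], iters, tau, df_err)
```

For a single missing cell, the EM fixed point coincides with the classical Yates missing-plot value (t R' + b C' - G') / ((t - 1)(b - 1)), where t and b are the numbers of treatments and blocks, R' and C' are the observed totals of that cell's treatment and block, and G' is the observed grand total. Because imputed cells have zero residual at convergence, reducing the error degrees of freedom by one per imputed cell makes the residual variance match the fit to the observed cells alone.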

Suggested Citation

  • Haiyan Yu & Bing Han & Nicholas Rios & Jianbin Chen, 2024. "Missing Data Imputation in Balanced Construction for Incomplete Block Designs," Mathematics, MDPI, vol. 12(21), pages 1-22, October.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:21:p:3419-:d:1511819

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/21/3419/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/21/3419/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cheng, Xiaoyue & Cook, Dianne & Hofmann, Heike, 2015. "Visually Exploring Missing Values in Multivariable Data Using a Graphical User Interface," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 68(i06).
    2. Josse, Julie & Husson, François, 2016. "missMDA: A Package for Handling Missing Values in Multivariate Data Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 70(i01).
    3. Tendeiro, Jorge N. & Meijer, Rob R. & Niessen, A. Susan M., 2016. "PerFit: An R Package for Person-Fit Analysis in IRT," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i05).
    4. Y Cui & E J Tchetgen Tchetgen, 2024. "Selective machine learning of doubly robust functionals," Biometrika, Biometrika Trust, vol. 111(2), pages 517-535.
    5. Joost Ginkel & Pieter Kroonenberg, 2014. "Using Generalized Procrustes Analysis for Multiple Imputation in Principal Component Analysis," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 242-269, July.
    6. Robert A. Jackson & Matthew Pietryka, 2022. "The influence of becoming a parent on political participation in the United States," Social Science Quarterly, Southwestern Social Science Association, vol. 103(3), pages 565-580, May.
    7. Cohen, Joseph N, 2010. "Neoliberalism’s relationship with economic growth in the developing world: Was it the power of the market or the resolution of financial crisis?," MPRA Paper 24527, University Library of Munich, Germany.
    8. Andrés López-Sepulcre & Sebastiano De Bona & Janne K. Valkonen & Kate D.L. Umbers & Johanna Mappes, 2015. "Item Response Trees: a recommended method for analyzing categorical data in behavioral studies," Behavioral Ecology, International Society for Behavioral Ecology, vol. 26(5), pages 1268-1273.
    9. Matei Demetrescu & Christoph Hanck & Robinson Kruse‐Becher, 2022. "Robust inference under time‐varying volatility: A real‐time evaluation of professional forecasters," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(5), pages 1010-1030, August.
    10. Bokelmann, Björn & Lessmann, Stefan, 2024. "Improving uplift model evaluation on randomized controlled trial data," European Journal of Operational Research, Elsevier, vol. 313(2), pages 691-707.
    11. Joel Podgorski & Oliver Kracht & Luis Araguas-Araguas & Stefan Terzer-Wassmuth & Jodie Miller & Ralf Straub & Rolf Kipfer & Michael Berg, 2024. "Groundwater vulnerability to pollution in Africa’s Sahel region," Nature Sustainability, Nature, vol. 7(5), pages 558-567, May.
    12. Gerko Vink & Laurence E. Frank & Jeroen Pannekoek & Stef Buuren, 2014. "Predictive mean matching imputation of semicontinuous variables," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 68(1), pages 61-90, February.
    13. Hassan, Mahmoud & Oueslati, Walid & Rousselière, Damien, 2020. "Environmental taxes, reforms and economic growth: an empirical analysis of panel data," Economic Systems, Elsevier, vol. 44(3).
    14. Segaro, Ethiopia L. & Larimo, Jorma & Jones, Marian V., 2014. "Internationalisation of family small and medium sized enterprises: The role of stewardship orientation, family commitment culture and top management team," International Business Review, Elsevier, vol. 23(2), pages 381-395.
    15. Joseph A. Lewnard & Parag Mahale & Debbie Malden & Vennis Hong & Bradley K. Ackerson & Bruno J. Lewin & Ruth Link-Gelles & Leora R. Feldstein & Marc Lipsitch & Sara Y. Tartof, 2024. "Immune escape and attenuated severity associated with the SARS-CoV-2 BA.2.86/JN.1 lineage," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    16. Shige Song, 2013. "Prenatal malnutrition and subsequent foetal loss risk: Evidence from the 1959-1961 Chinese famine," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 29(26), pages 707-728.
    17. Chakravorty, Bhaskar & Arulampalam, Wiji & Bhatiya, Apurav Yash & Imbert, Clément & Rathelot, Roland, 2024. "Can information about jobs improve the effectiveness of vocational training? Experimental evidence from India," Journal of Development Economics, Elsevier, vol. 169(C).
    18. Albert Stuart Reece & Gary Kenneth Hulse, 2022. "European Epidemiological Patterns of Cannabis- and Substance-Related Congenital Neurological Anomalies: Geospatiotemporal and Causal Inferential Study," IJERPH, MDPI, vol. 20(1), pages 1-35, December.
    19. Phillipp Schwarzfischer & Dariusz Gruszfeld & Piotr Socha & Veronica Luque & Ricardo Closa-Monasterolo & Déborah Rousseaux & Melissa Moretti & Alice ReDionigi & Elvira Verduci & Berthold Koletzko & Ve, 2020. "Effects of screen time and playing outside on anthropometric measures in preschool aged children," PLOS ONE, Public Library of Science, vol. 15(3), pages 1-15, March.
    20. Foutzopoulos, Giorgos & Pandis, Nikolaos & Tsagris, Michail, 2024. "Predicting full retirement attainment of NBA players," MPRA Paper 121540, University Library of Munich, Germany.