
Matrix completion under complex survey sampling

Authors

  • Xiaojun Mao (Shanghai Jiao Tong University)
  • Zhonglei Wang (Xiamen University)
  • Shu Yang (North Carolina State University)

Abstract

Multivariate nonresponse is often encountered in complex survey sampling, and simply ignoring it leads to erroneous inference. In this paper, we propose a new matrix completion method for complex survey sampling. Unlike existing methods, which impute either row-wise or column-wise, the proposed method treats the data matrix as a whole, which allows it to exploit row and column patterns simultaneously. We adopt a column-space-decomposition model that combines a low-rank structured matrix for the finite population with easy-to-obtain demographic information as covariates. We also propose a computationally efficient projection strategy to identify the model parameters under complex survey sampling. An augmented inverse probability weighting estimator is then used to estimate the parameter of interest, and the corresponding asymptotic upper bound of the estimation error is derived. Simulation studies show that the proposed estimator has a smaller mean squared error than its competitors and that the corresponding variance estimator performs well. The proposed method is applied to assess the health status of the U.S. population.
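
To illustrate the augmented inverse probability weighting (AIPW) step named in the abstract, the sketch below computes a textbook AIPW estimate of a finite-population mean from a survey sample with nonresponse. It is not taken from the paper: the function name `aipw_mean` and all argument names are hypothetical, the response probabilities are assumed known or already estimated, and the imputed values are assumed to come from some fitted model (in the paper's setting, presumably the completed matrix).

```python
from typing import Optional, Sequence


def aipw_mean(
    y: Sequence[Optional[float]],
    responded: Sequence[bool],
    design_weight: Sequence[float],
    response_prob: Sequence[float],
    imputed: Sequence[float],
    population_size: float,
) -> float:
    """Generic AIPW estimate of a finite-population mean (illustrative only).

    For each sampled unit i the contribution is
        d_i * ( m_i + r_i * (y_i - m_i) / p_i ),
    where d_i is the design weight (inverse inclusion probability),
    m_i the imputed value, r_i the response indicator and p_i the
    response probability. The sum is divided by the population size N.
    """
    total = 0.0
    for y_i, r_i, d_i, p_i, m_i in zip(
        y, responded, design_weight, response_prob, imputed
    ):
        term = m_i  # model-based prediction, available for every unit
        if r_i:
            # Inverse-probability-weighted residual correction,
            # applied only to respondents (y_i is observed here).
            term += (y_i - m_i) / p_i
        total += d_i * term
    return total / population_size


if __name__ == "__main__":
    # Toy sample of 4 units drawn from a population of 10 (weights 2.5 each).
    estimate = aipw_mean(
        y=[1.0, None, 3.0, None],
        responded=[True, False, True, False],
        design_weight=[2.5, 2.5, 2.5, 2.5],
        response_prob=[0.5, 0.5, 0.5, 0.5],
        imputed=[2.0, 2.0, 2.0, 2.0],
        population_size=10.0,
    )
    print(estimate)
```

When every unit responds with probability one, the residual correction replaces each imputed value with the observed one, so the estimate reduces to the usual design-weighted (Horvitz-Thompson type) mean.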

Suggested Citation

  • Xiaojun Mao & Zhonglei Wang & Shu Yang, 2023. "Matrix completion under complex survey sampling," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 75(3), pages 463-492, June.
  • Handle: RePEc:spr:aistmt:v:75:y:2023:i:3:d:10.1007_s10463-022-00851-5
    DOI: 10.1007/s10463-022-00851-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10463-022-00851-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Danhyang Lee & Jae Kwang Kim, 2022. "Semiparametric imputation using conditional Gaussian mixture models under item nonresponse," Biometrics, The International Biometric Society, vol. 78(1), pages 227-237, March.
    2. Christopher J Greenwood & George J Youssef & Primrose Letcher & Jacqui A Macdonald & Lauryn J Hagg & Ann Sanson & Jenn Mcintosh & Delyse M Hutchinson & John W Toumbourou & Matthew Fuller-Tyszkiewicz &, 2020. "A comparison of penalised regression methods for informing the selection of predictive markers," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-14, November.
    3. Christopher Kath & Florian Ziel, 2018. "The value of forecasts: Quantifying the economic gains of accurate quarter-hourly electricity price forecasts," Papers 1811.08604, arXiv.org.
    4. Zhang, Ting & Wang, Lei, 2020. "Smoothed empirical likelihood inference and variable selection for quantile regression with nonignorable missing response," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    5. Kath, Christopher & Ziel, Florian, 2018. "The value of forecasts: Quantifying the economic gains of accurate quarter-hourly electricity price forecasts," Energy Economics, Elsevier, vol. 76(C), pages 411-423.
    6. Jiang, Depeng & Zhao, Puying & Tang, Niansheng, 2016. "A propensity score adjustment method for regression models with nonignorable missing covariates," Computational Statistics & Data Analysis, Elsevier, vol. 94(C), pages 98-119.
    7. Jeongsub Choi & Youngdoo Son & Myong K. Jeong, 2024. "Gaussian kernel with correlated variables for incomplete data," Annals of Operations Research, Springer, vol. 341(1), pages 223-244, October.
    8. Yilin Li & Wang Miao & Ilya Shpitser & Eric J. Tchetgen Tchetgen, 2023. "A self‐censoring model for multivariate nonignorable nonmonotone missing data," Biometrics, The International Biometric Society, vol. 79(4), pages 3203-3214, December.
    9. Li, Mengyan & Ma, Yanyuan & Zhao, Jiwei, 2022. "Efficient estimation in a partially specified nonignorable propensity score model," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
    10. Liang, Lixing & Zhuang, Yipeng & Yu, Philip L.H., 2024. "Variable selection for high-dimensional incomplete data," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    11. Ballering, Aranka V. & Bonvanie, Irma J. & Olde Hartman, Tim C. & Monden, Rei & Rosmalen, Judith G.M., 2020. "Gender and sex independently associate with common somatic symptoms and lifetime prevalence of chronic disease," Social Science & Medicine, Elsevier, vol. 253(C).
    12. Puying Zhao & Hui Zhao & Niansheng Tang & Zhaohai Li, 2017. "Weighted composite quantile regression analysis for nonignorable missing data using nonresponse instrument," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 29(2), pages 189-212, April.
    13. Halewijn M. Drent & Barbara van den Hoofdakker & Jan K. Buitelaar & Pieter J. Hoekstra & Andrea Dietrich, 2022. "Factors Related to Perceived Stigma in Parents of Children and Adolescents in Outpatient Mental Healthcare," IJERPH, MDPI, vol. 19(19), pages 1-14, October.
    14. Ling Peng & Xiaohui Liu & Xiangyong Tan & Yiweng Zhou & Shihua Luo, 2024. "The statistical rate for support matrix machines under low rankness and row (column) sparsity," Statistical Papers, Springer, vol. 65(7), pages 4567-4598, September.
    15. Faisal Maqbool Zahid & Shahla Faisal & Christian Heumann, 2020. "Variable selection techniques after multiple imputation in high-dimensional data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 29(3), pages 553-580, September.
    16. Michael Bergrab & Christian Aßmann, 2024. "Automated Bayesian variable selection methods for binary regression models with missing covariate data," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 18(2), pages 203-244, June.
    17. Shonosuke Sugasawa & Kosuke Morikawa & Keisuke Takahata, 2022. "Bayesian semiparametric modeling of response mechanism for nonignorable missing data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 101-117, March.
    18. Kreutzmann, Ann-Kristin & Marek, Philipp & Salvati, Nicola & Schmid, Timo, 2019. "Estimating regional wealth in Germany: How different are East and West really?," Discussion Papers 35/2019, Deutsche Bundesbank.
    19. Pengfei Li & Jing Qin & Yukun Liu, 2023. "Instability of inverse probability weighting methods and a remedy for nonignorable missing data," Biometrics, The International Biometric Society, vol. 79(4), pages 3215-3226, December.
    20. Stephanie Houle & Ryan Macdonald, 2023. "Identifying Nascent High-Growth Firms Using Machine Learning," Staff Working Papers 23-53, Bank of Canada.
