
An Approach to Integrating a Non-Probability Sample in the Population Census

Author

Listed:
  • Ieva Burakauskaitė

    (Institute of Data Science and Digital Technologies, Vilnius University, Akademijos Str. 4, LT-08412 Vilnius, Lithuania)

  • Andrius Čiginas

    (Institute of Data Science and Digital Technologies, Vilnius University, Akademijos Str. 4, LT-08412 Vilnius, Lithuania)

Abstract

Population censuses are increasingly using administrative information and sampling as alternatives to collecting detailed data from individuals. Non-probability samples can also be an additional, relatively inexpensive data source, although they require special treatment. In this paper, we consider methods for integrating a non-representative volunteer sample into a population census survey, where the complementary probability sample is drawn from the rest of the population. We investigate two approaches to correcting non-probability sample selection bias: adjustment using propensity scores, which models participation in the voluntary sample, and doubly robust estimation, which is designed to withstand possible misspecification of the latter model. We combine the estimators of population parameters that correct the selection bias with the estimators based on a representative union of both samples. Our analysis shows that the availability of detailed auxiliary information simplifies the applied estimation procedures, and that these procedures are efficient in the Lithuanian census survey. Our findings also reveal the biased nature of the non-probability sample. For instance, when estimating the proportions of professed religions, smaller religious communities exhibit a higher participation rate than other groups. The combination of estimators corrects such selection bias. Our methodology for combining the voluntary and probability samples can be applied to other sample surveys.
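
The two bias corrections named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy numbers, and the normalized (Hájek-type) weighting are assumptions made for illustration. It also assumes that participation propensities and outcome-model predictions have already been estimated, and that the predictions are available for every population unit.

```python
import numpy as np


def ipw_mean(y_vol, pi_vol):
    """Propensity-adjusted mean: weight each volunteer by 1 / propensity."""
    y = np.asarray(y_vol, dtype=float)
    w = 1.0 / np.asarray(pi_vol, dtype=float)
    return float(np.sum(w * y) / np.sum(w))


def doubly_robust_mean(y_vol, pi_vol, m_vol, m_pop):
    """Mean of model predictions over the population, plus a
    propensity-weighted correction from the volunteers' residuals."""
    y = np.asarray(y_vol, dtype=float)
    w = 1.0 / np.asarray(pi_vol, dtype=float)
    resid = y - np.asarray(m_vol, dtype=float)
    return float(np.mean(m_pop) + np.sum(w * resid) / np.sum(w))


# Hypothetical toy data: 10 population units in two groups (6 and 4 units),
# with 2 volunteers from each group, so the smaller group over-participates.
y_vol = [1, 0, 1, 1]                    # binary outcome among volunteers
pi_vol = [1 / 3, 1 / 3, 1 / 2, 1 / 2]   # assumed participation propensities
m_vol = [0.4, 0.4, 0.9, 0.9]            # assumed outcome predictions, volunteers
m_pop = [0.4] * 6 + [0.9] * 4           # assumed outcome predictions, population

naive = float(np.mean(y_vol))
ipw = ipw_mean(y_vol, pi_vol)
dr = doubly_robust_mean(y_vol, pi_vol, m_vol, m_pop)

# Deliberately misspecified propensities (a constant) for comparison.
pi_wrong = [0.4] * 4
ipw_wrong = ipw_mean(y_vol, pi_wrong)
dr_wrong = doubly_robust_mean(y_vol, pi_wrong, m_vol, m_pop)

print(naive, ipw, dr, ipw_wrong, dr_wrong)
```

In this toy setup, the propensity-adjusted estimate falls back to the unweighted volunteer mean once the propensities are replaced by a constant, whereas the doubly robust estimate still leans on the outcome predictions. That is the kind of protection against a misspecified participation model that the abstract refers to.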

Suggested Citation

  • Ieva Burakauskaitė & Andrius Čiginas, 2023. "An Approach to Integrating a Non-Probability Sample in the Population Census," Mathematics, MDPI, vol. 11(8), pages 1-14, April.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:8:p:1782-:d:1118775

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/8/1782/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/8/1782/
    Download Restriction: no
