
Variable selection in Propensity Score Adjustment to mitigate selection bias in online surveys

Author

Listed:
  • Ramón Ferri-García

    (University of Granada)

  • María del Mar Rueda

    (University of Granada)

Abstract

The development of new survey data collection methods such as online surveys has been particularly advantageous for social studies in terms of reduced costs, immediacy and enhanced questionnaire possibilities. However, many such methods are strongly affected by selection bias, leading to unreliable estimates. Calibration and Propensity Score Adjustment (PSA) have been proposed as methods to remove selection bias in online nonprobability surveys. Calibration requires population totals to be known for the auxiliary variables used in the procedure, while PSA estimates the volunteering propensity of an individual using predictive modelling. The variables included in these models must be carefully selected in order to maximise the accuracy of the final estimates. This study presents an application, using synthetic and real data, of variable selection techniques developed for knowledge discovery in data to choose the best subset of variables for propensity estimation. We also compare the performance of PSA using different classification algorithms, after which calibration is applied. We also present an application of this methodology in a real-world situation, using it to obtain estimates of population parameters. The results obtained show that variable selection using appropriate methods can provide less biased and more efficient estimates than using all available covariates.
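The core PSA step described in the abstract (estimating each volunteer's propensity to participate with a predictive model, then reweighting) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a plain logistic regression fitted by Newton-Raphson, an unweighted simple random reference sample, and the odds-type weight (1 - p) / p, which is only one of several weighting choices. The variable selection and calibration steps are not shown. All function names and the simulated data are hypothetical.

```python
import numpy as np

def fit_logistic(X, z, n_iter=25, ridge=1e-8):
    """Fit a logistic regression by Newton-Raphson (intercept added here)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (z - p)
        # Tiny ridge term keeps the Hessian invertible (an assumption of this sketch).
        hess = A.T @ (A * (p * (1.0 - p))[:, None]) + ridge * np.eye(A.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def psa_weights(X_vol, X_ref):
    """Pool volunteer (z=1) and reference (z=0) samples, estimate the
    propensity p of being a volunteer, and return (1 - p) / p for volunteers."""
    X = np.vstack([X_vol, X_ref])
    z = np.concatenate([np.ones(len(X_vol)), np.zeros(len(X_ref))])
    beta = fit_logistic(X, z)
    A_vol = np.column_stack([np.ones(len(X_vol)), X_vol])
    p = 1.0 / (1.0 + np.exp(-A_vol @ beta))
    return (1.0 - p) / p

# Simulated population: volunteering is more likely for large x,
# and y depends on x, so the raw volunteer mean of y is biased upwards.
rng = np.random.default_rng(0)
N = 20000
x = rng.normal(size=N)
y = 2.0 + x + rng.normal(scale=0.5, size=N)
p_sel = np.minimum(1.0, 0.05 * np.exp(0.8 * x))
vol = rng.random(N) < p_sel                        # self-selected online sample
ref_idx = rng.choice(N, size=2000, replace=False)  # reference probability sample

w = psa_weights(x[vol][:, None], x[ref_idx][:, None])
naive = y[vol].mean()
adjusted = np.sum(w * y[vol]) / np.sum(w)          # weighted (Hajek-type) mean

print("population mean:", y.mean())
print("naive volunteer mean:", naive)
print("PSA-adjusted mean:", adjusted)
```

In this toy setting the single covariate drives both participation and the outcome, so reweighting on it should pull the estimate toward the population mean. With many candidate covariates, which ones enter the propensity model is the question the paper addresses.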

Suggested Citation

  • Ramón Ferri-García & María del Mar Rueda, 2022. "Variable selection in Propensity Score Adjustment to mitigate selection bias in online surveys," Statistical Papers, Springer, vol. 63(6), pages 1829-1881, December.
  • Handle: RePEc:spr:stpapr:v:63:y:2022:i:6:d:10.1007_s00362-022-01296-x
    DOI: 10.1007/s00362-022-01296-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00362-022-01296-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Kursa, Miron B. & Rudnicki, Witold R., 2010. "Feature Selection with the Boruta Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 36(i11).
    2. Matthias Schonlau & Arthur Van Soest & Arie Kapteyn, 2007. "Are 'Webographic' or Attitudinal Questions Useful for Adjusting Estimates From Web Surveys Using Propensity Scoring?," Working Papers WR-506, RAND Corporation.
    3. Jack Kuang Tsung Chen & Richard L. Valliant & Michael R. Elliott, 2019. "Calibrating non‐probability surveys to estimated control totals using LASSO, with an application to political polling," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 68(3), pages 657-681, April.
    4. Jelke Bethlehem, 2010. "Selection Bias in Web Surveys," International Statistical Review, International Statistical Institute, vol. 78(2), pages 161-188, August.
    5. Bart Buelens & Joep Burger & Jan A. van den Brakel, 2018. "Comparing Inference Methods for Non‐probability Samples," International Statistical Review, International Statistical Institute, vol. 86(2), pages 322-343, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Luis Castro-Martín & Maria del Mar Rueda & Ramón Ferri-García, 2020. "Inference from Non-Probability Surveys with Statistical Matching and Propensity Score Adjustment Using Modern Prediction Techniques," Mathematics, MDPI, vol. 8(6), pages 1-19, June.
    2. Luis Castro-Martín & María del Mar Rueda & Ramón Ferri-García, 2020. "Estimating General Parameters from Non-Probability Surveys Using Propensity Score Adjustment," Mathematics, MDPI, vol. 8(11), pages 1-14, November.
    3. Stéphane Legleye & Géraldine Charrance & Nicolas Razafindratsima & Nathalie Bajos & Aline Bohet & Caroline Moreau, 2018. "The Use of a Nonprobability Internet Panel to Monitor Sexual and Reproductive Health in the General Population," Sociological Methods & Research, vol. 47(2), pages 314-348, March.
    4. Buil-Gil, David & Solymosi, Reka & Moretti, Angelo, 2019. "Non-parametric bootstrap and small area estimation to mitigate bias in crowdsourced data. Simulation study and application to perceived safety," SocArXiv 8hgjt, Center for Open Science.
    5. Maciej Beręsewicz & Greta Białkowska & Krzysztof Marcinkowski & Magdalena Maślak & Piotr Opiela & Robert Pater & Katarzyna Zadroga, 2019. "Enhancing the Demand for Labour survey by including skills from online job advertisements using model-assisted calibration," Papers 1908.06731, arXiv.org.
    6. Sunghee Lee & Richard Valliant, 2009. "Estimation for Volunteer Panel Web Surveys Using Propensity Score Adjustment and Calibration Adjustment," Sociological Methods & Research, vol. 37(3), pages 319-343, February.
    7. Richard Valliant & Jill A. Dever, 2011. "Estimating Propensity Adjustments for Volunteer Web Surveys," Sociological Methods & Research, vol. 40(1), pages 105-137, February.
    8. María del Mar Rueda & Sergio Martínez-Puertas & Luis Castro-Martín, 2022. "Methods to Counter Self-Selection Bias in Estimations of the Distribution Function and Quantiles," Mathematics, MDPI, vol. 10(24), pages 1-19, December.
    9. Maria del Mar Rueda, 2019. "Comments on: Deville and Särndal’s calibration: revisiting a 25 years old successful optimization problem," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(4), pages 1077-1081, December.
    10. Tong, Jianfeng & Liu, Zhenxing & Zhang, Yong & Zheng, Xiujuan & Jin, Junyang, 2023. "Improved multi-gate mixture-of-experts framework for multi-step prediction of gas load," Energy, Elsevier, vol. 282(C).
    11. Asma Shaheen & Javed Iqbal, 2018. "Spatial Distribution and Mobility Assessment of Carcinogenic Heavy Metals in Soil Profiles Using Geostatistics and Random Forest, Boruta Algorithm," Sustainability, MDPI, vol. 10(3), pages 1-20, March.
    12. Yvan Devaux & Lu Zhang & Andrew I. Lumley & Kanita Karaduzovic-Hadziabdic & Vincent Mooser & Simon Rousseau & Muhammad Shoaib & Venkata Satagopam & Muhamed Adilovic & Prashant Kumar Srivastava & Costa, 2024. "Development of a long noncoding RNA-based machine learning model to predict COVID-19 in-hospital mortality," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    13. Ghosh, Indranil & Chaudhuri, Tamal Datta & Alfaro-Cortés, Esteban & Gámez, Matías & García, Noelia, 2022. "A hybrid approach to forecasting futures prices with simultaneous consideration of optimality in ensemble feature selection and advanced artificial intelligence," Technological Forecasting and Social Change, Elsevier, vol. 181(C).
    14. Lehmann, Nico & Sloot, Daniel & Schüle, Christopher & Ardone, Armin & Fichtner, Wolf, 2023. "The motivational drivers behind consumer preferences for regional electricity – Results of a choice experiment in Southern Germany," Energy Economics, Elsevier, vol. 120(C).
    15. Giulia Casu & Marco Giovanni Mariani & Rita Chiesa & Dina Guglielmi & Paola Gremigni, 2021. "The Role of Organizational Citizenship Behavior and Gender between Job Satisfaction and Task Performance," IJERPH, MDPI, vol. 18(18), pages 1-15, September.
    16. Conor Waldock & Bernhard Wegscheider & Dario Josi & Bárbara Borges Calegari & Jakob Brodersen & Luiz Jardim de Queiroz & Ole Seehausen, 2024. "Deconstructing the geography of human impacts on species’ natural distribution," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    17. Bertram, Christine & Rehdanz, Katrin, 2015. "The role of urban green space for human well-being," Ecological Economics, Elsevier, vol. 120(C), pages 139-152.
    18. Manuel J. García Rodríguez & Vicente Rodríguez Montequín & Francisco Ortega Fernández & Joaquín M. Villanueva Balsera, 2019. "Public Procurement Announcements in Spain: Regulations, Data Analysis, and Award Price Estimator Using Machine Learning," Complexity, Hindawi, vol. 2019, pages 1-20, November.
    19. Sangjin Kim & Jong-Min Kim, 2019. "Two-Stage Classification with SIS Using a New Filter Ranking Method in High Throughput Data," Mathematics, MDPI, vol. 7(6), pages 1-16, May.
    20. Galperin, Hernan & Arcidiacono, Malena, 2021. "Employment and the gender digital divide in Latin America: A decomposition analysis," Telecommunications Policy, Elsevier, vol. 45(7).
