IDEAS home Printed from https://ideas.repec.org/a/gam/jmathe/v8y2020i6p879-d365757.html

Inference from Non-Probability Surveys with Statistical Matching and Propensity Score Adjustment Using Modern Prediction Techniques

Author

Listed:
  • Luis Castro-Martín

    (Department of Statistics and Operational Research, Faculty of Sciences, University of Granada, 18071 Granada, Spain)

  • Maria del Mar Rueda

    (Department of Statistics and Operational Research, Faculty of Sciences, University of Granada, 18071 Granada, Spain)

  • Ramón Ferri-García

    (Department of Statistics and Operational Research, Faculty of Sciences, University of Granada, 18071 Granada, Spain)

Abstract

Online surveys are increasingly common in social and health studies, as they provide fast and inexpensive results in comparison to traditional ones. However, these surveys often rely on biased samples, because data collection is typically non-probabilistic: internet coverage is lacking in certain population groups, and many online surveys rely on self-selection. Some procedures have been proposed to mitigate the bias, such as propensity score adjustment (PSA) and statistical matching. In PSA, the propensity to participate in a non-probability survey is estimated using a probability reference survey, and is then used to obtain weighted estimates. In statistical matching, the non-probability sample is used to train models that predict the values of the target variable, and the models' predictions for the probability sample are used to estimate population values. In this study, both methods are compared using three datasets to simulate pseudopopulations from which non-probability and probability samples are drawn and used to estimate population parameters. In addition, the study compares the use of linear models and machine learning prediction algorithms for propensity estimation in PSA and for predictive modeling in statistical matching. The results show that statistical matching outperforms PSA in terms of bias reduction and root mean square error (RMSE), and that simpler prediction models, such as linear models and k-nearest neighbors, provide better outcomes than bagging algorithms.
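
The two adjustments described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single categorical covariate, for which a saturated propensity model reduces to cell proportions and a saturated prediction model reduces to cell means. The function names, the toy data, and the PSA weight (1 - pi) / pi (one common choice among several) are all illustrative assumptions.

```python
from collections import defaultdict

def naive_mean(volunteer):
    """Unadjusted mean of y in the non-probability (volunteer) sample."""
    return sum(y for _, y in volunteer) / len(volunteer)

def psa_estimate(volunteer, reference):
    """Propensity score adjustment with one categorical covariate x.

    volunteer: list of (x, y) from the non-probability sample.
    reference: list of (x, design_weight) from the probability sample.
    Assumption: the propensity pi(x) is the share of pooled units with
    covariate x that belong to the volunteer sample.
    """
    n_vol = defaultdict(int)
    n_ref = defaultdict(int)
    for x, _ in volunteer:
        n_vol[x] += 1
    for x, _ in reference:
        n_ref[x] += 1
    num = den = 0.0
    for x, y in volunteer:
        pi = n_vol[x] / (n_vol[x] + n_ref[x])
        w = (1.0 - pi) / pi  # assumed weight; 1 / pi is another option
        num += w * y
        den += w
    return num / den  # weighted (Hajek-type) mean

def matching_estimate(volunteer, reference):
    """Statistical matching with one categorical covariate x.

    The "model" trained on the volunteer sample is the cell mean of y.
    Its predictions for the reference units are averaged with the
    reference design weights. Every x in the reference sample must
    also occur in the volunteer sample.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y in volunteer:
        sums[x] += y
        counts[x] += 1
    num = den = 0.0
    for x, d in reference:
        prediction = sums[x] / counts[x]
        num += d * prediction
        den += d
    return num / den

# Hypothetical toy data: group 1 is over-represented among volunteers,
# while the reference sample is balanced (equal design weights).
volunteer = [(0, 9), (0, 11),
             (1, 19), (1, 21), (1, 20), (1, 20), (1, 18), (1, 22)]
reference = [(0, 100.0)] * 4 + [(1, 100.0)] * 4

print(naive_mean(volunteer))
print(psa_estimate(volunteer, reference))
print(matching_estimate(volunteer, reference))
```

In this toy setting the unadjusted mean is 17.5, while both adjusted estimates come out at approximately 15, the mean implied by the balanced reference sample. With continuous or multiple covariates, the cell proportions and cell means would be replaced by fitted models such as the logistic, linear, or machine learning models mentioned in the abstract.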

Suggested Citation

  • Luis Castro-Martín & Maria del Mar Rueda & Ramón Ferri-García, 2020. "Inference from Non-Probability Surveys with Statistical Matching and Propensity Score Adjustment Using Modern Prediction Techniques," Mathematics, MDPI, vol. 8(6), pages 1-19, June.
  • Handle: RePEc:gam:jmathe:v:8:y:2020:i:6:p:879-:d:365757

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/8/6/879/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/8/6/879/
    Download Restriction: no

    References listed on IDEAS

    1. J. B. Copas, 1993. "The Shrinkage of Point Scoring Methods," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 42(2), pages 315-331, June.
    2. Matthias Schonlau & Arthur Van Soest & Arie Kapteyn, 2007. "Are 'Webographic' or Attitudinal Questions Useful for Adjusting Estimates From Web Surveys Using Propensity Scoring?," Working Papers WR-506, RAND Corporation.
    3. Jack Kuang Tsung Chen & Richard L. Valliant & Michael R. Elliott, 2019. "Calibrating non‐probability surveys to estimated control totals using LASSO, with an application to political polling," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 68(3), pages 657-681, April.
    4. J. C. Van Houwelingen, 2001. "Shrinkage and Penalized Likelihood as Methods to Improve Predictive Accuracy," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 55(1), pages 17-34, March.
    5. Matthias Schonlau & Arthur Van Soest & Arie Kapteyn, 2007. "Are 'Webographic' or Attitudinal Questions Useful for Adjusting Estimates From Web Surveys Using Propensity Scoring?," Working Papers 506, RAND Corporation.
    6. Park, Trevor & Casella, George, 2008. "The Bayesian Lasso," Journal of the American Statistical Association, American Statistical Association, vol. 103, pages 681-686, June.
    7. Bart Buelens & Joep Burger & Jan A. van den Brakel, 2018. "Comparing Inference Methods for Non‐probability Samples," International Statistical Review, International Statistical Institute, vol. 86(2), pages 322-343, August.
    8. Baesens, Bart & Viaene, Stijn & Van den Poel, Dirk & Vanthienen, Jan & Dedene, Guido, 2002. "Bayesian neural network learning for repeat purchase modelling in direct marketing," European Journal of Operational Research, Elsevier, vol. 138(1), pages 191-211, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ramón Ferri-García & María del Mar Rueda, 2022. "Variable selection in Propensity Score Adjustment to mitigate selection bias in online surveys," Statistical Papers, Springer, vol. 63(6), pages 1829-1881, December.
    2. Stéphane Legleye & Géraldine Charrance & Nicolas Razafindratsima & Nathalie Bajos & Aline Bohet & Caroline Moreau, 2018. "The Use of a Nonprobability Internet Panel to Monitor Sexual and Reproductive Health in the General Population," Sociological Methods & Research, vol. 47(2), pages 314-348, March.
    3. Richard Valliant & Jill A. Dever, 2011. "Estimating Propensity Adjustments for Volunteer Web Surveys," Sociological Methods & Research, vol. 40(1), pages 105-137, February.
    4. Luis Castro-Martín & María del Mar Rueda & Ramón Ferri-García, 2020. "Estimating General Parameters from Non-Probability Surveys Using Propensity Score Adjustment," Mathematics, MDPI, vol. 8(11), pages 1-14, November.
    5. Ferri-García, Ramón & Castro-Martín, Luis & Rueda, María del Mar, 2021. "Evaluating Machine Learning methods for estimation in online surveys with superpopulation modeling," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 186(C), pages 19-28.
    6. Buil-Gil, David & Solymosi, Reka & Moretti, Angelo, 2019. "Non-parametric bootstrap and small area estimation to mitigate bias in crowdsourced data. Simulation study and application to perceived safety," SocArXiv 8hgjt, Center for Open Science.
    7. Maciej Beręsewicz & Greta Białkowska & Krzysztof Marcinkowski & Magdalena Maślak & Piotr Opiela & Robert Pater & Katarzyna Zadroga, 2019. "Enhancing the Demand for Labour survey by including skills from online job advertisements using model-assisted calibration," Papers 1908.06731, arXiv.org.
    8. Sunghee Lee & Richard Valliant, 2009. "Estimation for Volunteer Panel Web Surveys Using Propensity Score Adjustment and Calibration Adjustment," Sociological Methods & Research, vol. 37(3), pages 319-343, February.
    9. repec:aia:aiaswp:wp76 is not listed on IDEAS
    10. Li, Chunyu & Lou, Chenxin & Luo, Dan & Xing, Kai, 2021. "Chinese corporate distress prediction using LASSO: The role of earnings management," International Review of Financial Analysis, Elsevier, vol. 76(C).
    11. Chou, Ping & Chuang, Howard Hao-Chun & Chou, Yen-Chun & Liang, Ting-Peng, 2022. "Predictive analytics for customer repurchase: Interdisciplinary integration of buy till you die modeling and machine learning," European Journal of Operational Research, Elsevier, vol. 296(2), pages 635-651.
    12. Armagan, Artin & Dunson, David, 2011. "Sparse variational analysis of linear mixed models for large data sets," Statistics & Probability Letters, Elsevier, vol. 81(8), pages 1056-1062, August.
    13. Van den Poel, Dirk & Lariviere, Bart, 2004. "Customer attrition analysis for financial services using proportional hazard models," European Journal of Operational Research, Elsevier, vol. 157(1), pages 196-217, August.
    14. Martin Feldkircher & Florian Huber & Gary Koop & Michael Pfarrhofer, 2022. "Approximate Bayesian Inference and Forecasting in Huge‐Dimensional Multicountry VARs," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 63(4), pages 1625-1658, November.
    15. Eliaz, Kfir & Spiegler, Ran, 2022. "On incentive-compatible estimators," Games and Economic Behavior, Elsevier, vol. 132(C), pages 204-220.
    16. Oguzhan Cepni & I. Ethem Guney & Norman R. Swanson, 2020. "Forecasting and nowcasting emerging market GDP growth rates: The role of latent global economic policy uncertainty and macroeconomic data surprise factors," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 39(1), pages 18-36, January.
    17. Baesens, Bart & Verstraeten, Geert & Van den Poel, Dirk & Egmont-Petersen, Michael & Van Kenhove, Patrick & Vanthienen, Jan, 2004. "Bayesian network classifiers for identifying the slope of the customer lifecycle of long-life customers," European Journal of Operational Research, Elsevier, vol. 156(2), pages 508-523, July.
    18. Hauzenberger, Niko, 2021. "Flexible Mixture Priors for Large Time-varying Parameter Models," Econometrics and Statistics, Elsevier, vol. 20(C), pages 87-108.
    19. Korobilis, Dimitris, 2015. "Quantile forecasts of inflation under model uncertainty," MPRA Paper 64341, University Library of Munich, Germany.
    20. Bernardi, Mauro & Costola, Michele, 2019. "High-dimensional sparse financial networks through a regularised regression model," SAFE Working Paper Series 244, Leibniz Institute for Financial Research SAFE.
    21. Damien Rousselière, 2019. "A Flexible Approach to Age Dependence in Organizational Mortality: Comparing the Life Duration for Cooperative and Non-Cooperative Enterprises Using a Bayesian Generalized Additive Discrete Time Survi," Journal of Quantitative Economics, Springer;The Indian Econometric Society (TIES), vol. 17(4), pages 829-855, December.
