Printed from https://ideas.repec.org/a/eee/csdana/v168y2022ics0167947321002164.html

Correcting for sample selection bias in Bayesian distributional regression models

Author

Listed:
  • Wiemann, Paul F.V.
  • Klein, Nadja
  • Kneib, Thomas

Abstract

In the presence of non-randomly selected samples, many statistical models, including standard regression models, can fail. In particular, without accounting for the underlying selection process, estimates might be biased. Sample selection models can correct this bias when an informative selection process governs the availability of the outcome of interest. A copula-based approach is presented for correcting the sample selection bias in Bayesian structured additive distributional regression models. This framework relaxes the distributional assumption on the response of the linear or the generalized linear model and models all distributional parameters as functions of the covariates. Covariate effects are not limited to being purely linear; other effect types, such as smooth functional effects, are also available. As a consequence, the approach provides increased flexibility with respect to the dependence structure, the available predictor specifications and the choice of the marginal distributions compared to Heckman's classic sample selection model. To facilitate estimation in such a complex model, a fully Bayesian approach based on Markov chain Monte Carlo simulations is developed and the methodology is empirically evaluated. Furthermore, the approach is compared to a frequentist competitor, and an application to a data set from psychological judge-advisor research is presented.
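
The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method: it only generates data from a classic Heckman-type setup (normal margins joined by a Gaussian copula, i.e. bivariate normal errors) and compares ordinary least squares on the full sample with ordinary least squares on the selected sample. The function name `simulate_selection_bias`, the coefficient values and the correlation `rho` are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_selection_bias(n=200_000, rho=0.7, seed=0):
    """Return OLS coefficients (intercept, slope) on the full and the selected sample.

    All numeric settings are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)

    # Errors of the selection and outcome equations, correlated with strength rho.
    cov = [[1.0, rho], [rho, 1.0]]
    errors = rng.multivariate_normal([0.0, 0.0], cov, size=n)

    # Selection equation: the outcome is only observed when this latent index is positive.
    selected = (0.5 + x + errors[:, 0]) > 0
    # Outcome equation with true intercept 1 and true slope 2.
    y = 1.0 + 2.0 * x + errors[:, 1]

    def ols(xv, yv):
        design = np.column_stack([np.ones_like(xv), xv])
        beta, *_ = np.linalg.lstsq(design, yv, rcond=None)
        return beta

    return ols(x, y), ols(x[selected], y[selected])


if __name__ == "__main__":
    full, sel = simulate_selection_bias()
    print("full sample     (intercept, slope):", full)
    print("selected sample (intercept, slope):", sel)
```

With `rho=0` the selection is uninformative in this setup and both fits should agree up to sampling noise; with a nonzero `rho` the fit on the selected sample drifts away from the true coefficients, which is the bias that sample selection models aim to correct.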

Suggested Citation

  • Wiemann, Paul F.V. & Klein, Nadja & Kneib, Thomas, 2022. "Correcting for sample selection bias in Bayesian distributional regression models," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
  • Handle: RePEc:eee:csdana:v:168:y:2022:i:c:s0167947321002164
    DOI: 10.1016/j.csda.2021.107382

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947321002164
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Giampiero Marra & Rosalba Radice & Till Bärnighausen & Simon N. Wood & Mark E. McGovern, 2017. "A Simultaneous Equation Approach to Estimating HIV Prevalence With Nonignorable Missing Responses," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 484-496, April.
    2. Sniezek, Janet A. & Buckley, Timothy, 1995. "Cueing and Cognitive Conflict in Judge-Advisor Decision Making," Organizational Behavior and Human Decision Processes, Elsevier, vol. 62(2), pages 159-174, May.
    3. Murray D. Smith, 2003. "Modelling sample selection using Archimedean copulas," Econometrics Journal, Royal Economic Society, vol. 6(1), pages 99-123, June.
    4. Emmanuel O. Ogundimu & Jane L. Hutton, 2016. "A Sample Selection Model with Skew-normal Distribution," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 43(1), pages 172-190, March.
    5. Chen, Songnian & Zhou, Yahong, 2010. "Semiparametric and nonparametric estimation of sample selection models under symmetry," Journal of Econometrics, Elsevier, vol. 157(1), pages 143-150, July.
    6. Whitney K. Newey, 2009. "Two-step series estimation of sample selection models," Econometrics Journal, Royal Economic Society, vol. 12(s1), pages 217-229, January.
    7. Elif F. Acar & Radu V. Craiu & Fang Yao, 2011. "Dependence Calibration in Conditional Copulas: A Nonparametric Approach," Biometrics, The International Biometric Society, vol. 67(2), pages 445-453, June.
    8. Francis Vella, 1998. "Estimating Models with Sample Selection Bias: A Survey," Journal of Human Resources, University of Wisconsin Press, vol. 33(1), pages 127-169.
    9. Koenker, Roger, 2005. "Quantile Regression," Cambridge Books, Cambridge University Press, number 9780521845731.
    10. Manuel Wiesenfarth & Thomas Kneib, 2010. "Bayesian geoadditive sample selection models," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 59(3), pages 381-404, May.
    11. David J. Spiegelhalter & Nicola G. Best & Bradley P. Carlin & Angelika Van Der Linde, 2002. "Bayesian measures of model complexity and fit," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(4), pages 583-639, October.
    12. David S. Lee, 2009. "Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 76(3), pages 1071-1102.
    13. Marra, Giampiero & Radice, Rosalba, 2013. "Estimation of a regression spline sample selection model," Computational Statistics & Data Analysis, Elsevier, vol. 61(C), pages 158-173.
    14. Wojtyś, Małgorzata & Marra, Giampiero & Radice, Rosalba, 2018. "Copula based generalized additive models for location, scale and shape with non-random sample selection," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 1-14.
    15. Patrick Puhani, 2000. "The Heckman Correction for Sample Selection and Its Critique," Journal of Economic Surveys, Wiley Blackwell, vol. 14(1), pages 53-68, February.
    16. Ding, Peng, 2014. "Bayesian robust inference of sample selection using selection-t models," Journal of Multivariate Analysis, Elsevier, vol. 124(C), pages 451-464.
    17. Ahn, Hyungtaik & Powell, James L., 1993. "Semiparametric estimation of censored selection models with a nonparametric selection mechanism," Journal of Econometrics, Elsevier, vol. 58(1-2), pages 3-29, July.
    18. Lee, Lung-Fei, 1983. "Generalized Econometric Models with Selectivity," Econometrica, Econometric Society, vol. 51(2), pages 507-512, March.
    19. Heckman, James, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.
    20. Torsten Hothorn & Thomas Kneib & Peter Bühlmann, 2014. "Conditional transformation models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 76(1), pages 3-27, January.
    21. Nadja Klein & Thomas Kneib & Stefan Lang, 2015. "Bayesian Generalized Additive Models for Location, Scale, and Shape for Zero-Inflated and Overdispersed Count Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 405-419, March.
    22. Omori, Yasuhiro, 2007. "Efficient Gibbs sampler for Bayesian analysis of a sample selection model," Statistics & Probability Letters, Elsevier, vol. 77(12), pages 1300-1311, July.
    23. repec:cup:judgdm:v:10:y:2015:i:2:p:144-171 is not listed on IDEAS
    24. Margarita Genius & Elisabetta Strazzera, 2008. "Applying the copula approach to sample selection modelling," Applied Economics, Taylor & Francis Journals, vol. 40(11), pages 1443-1455.
    25. van Hasselt, Martijn, 2011. "Bayesian inference in a sample selection model," Journal of Econometrics, Elsevier, vol. 165(2), pages 221-232.
    26. R. A. Rigby & D. M. Stasinopoulos, 2005. "Generalized additive models for location, scale and shape," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 54(3), pages 507-554, June.
    27. Newey, Whitney K & Powell, James L, 1987. "Asymmetric Least Squares Estimation and Testing," Econometrica, Econometric Society, vol. 55(4), pages 819-847, July.
    28. Marra, Giampiero & Radice, Rosalba, 2017. "Bivariate copula additive models for location, scale and shape," Computational Statistics & Data Analysis, Elsevier, vol. 112(C), pages 99-113.
    29. Noël Veraverbeke & Marek Omelka & Irène Gijbels, 2011. "Estimation of a Conditional Copula and Association Measures," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 38(4), pages 766-780, December.
    30. Yulia V. Marchenko & Marc G. Genton, 2012. "A Heckman Selection-t Model," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(497), pages 304-317, March.
    31. Chib, Siddhartha, 1992. "Bayes inference in the Tobit censored regression model," Journal of Econometrics, Elsevier, vol. 51(1-2), pages 79-99.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wojtyś, Małgorzata & Marra, Giampiero & Radice, Rosalba, 2018. "Copula based generalized additive models for location, scale and shape with non-random sample selection," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 1-14.
    2. Mikhail Zhelonkin & Marc G. Genton & Elvezio Ronchetti, 2016. "Robust inference in sample selection models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(4), pages 805-827, September.
    3. Marra, Giampiero & Wyszynski, Karol, 2016. "Semi-parametric copula sample selection models for count responses," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 110-129.
    4. Marra, Giampiero & Radice, Rosalba, 2013. "Estimation of a regression spline sample selection model," Computational Statistics & Data Analysis, Elsevier, vol. 61(C), pages 158-173.
    5. Karol Wyszynski & Giampiero Marra, 2018. "Sample selection models for count data in R," Computational Statistics, Springer, vol. 33(3), pages 1385-1412, September.
    6. Emmanuel O. Ogundimu & Jane L. Hutton, 2016. "A Sample Selection Model with Skew-normal Distribution," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 43(1), pages 172-190, March.
    7. Hasebe, Takuya & Vijverberg, Wim P., 2012. "A Flexible Sample Selection Model: A GTL-Copula Approach," IZA Discussion Papers 7003, Institute of Labor Economics (IZA).
    8. Liu, Ruixuan & Yu, Zhengfei, 2022. "Sample selection models with monotone control functions," Journal of Econometrics, Elsevier, vol. 226(2), pages 321-342.
    9. Schwiebert, Jörg, 2012. "Analyzing the Composition of the Female Workforce - A Semiparametric Copula Approach," Hannover Economic Papers (HEP) dp-503, Leibniz Universität Hannover, Wirtschaftswissenschaftliche Fakultät.
    10. Wojtyś, Małgorzata & Marra, Giampiero & Radice, Rosalba, 2016. "Copula Regression Spline Sample Selection Models: The R Package SemiParSampleSel," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 71(i06).
    11. Kneib, Thomas & Silbersdorff, Alexander & Säfken, Benjamin, 2023. "Rage Against the Mean – A Review of Distributional Regression Approaches," Econometrics and Statistics, Elsevier, vol. 26(C), pages 99-123.
    12. Victor Chernozhukov & Ivan Fernandez-Val & Siyi Luo, 2018. "Distribution regression with sample selection, with an application to wage decompositions in the UK," CeMMAP working papers CWP68/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    13. Yulia V. Marchenko & Marc G. Genton, 2012. "A Heckman Selection-t Model," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(497), pages 304-317, March.
    14. Manuel Arellano & Stéphane Bonhomme, 2017. "Quantile Selection Models With an Application to Understanding Changes in Wage Inequality," Econometrica, Econometric Society, vol. 85, pages 1-28, January.
    15. Lachos, Victor H. & Prates, Marcos O. & Dey, Dipak K., 2021. "Heckman selection-t model: Parameter estimation via the EM-algorithm," Journal of Multivariate Analysis, Elsevier, vol. 184(C).
    16. McGovern, Mark E. & Canning, David & Bärnighausen, Till, 2018. "Accounting for non-response bias using participation incentives and survey design: An application using gift vouchers," Economics Letters, Elsevier, vol. 171(C), pages 239-244.
    17. Victor Chernozhukov & Ivan Fernandez-Val & Siyi Luo, 2023. "Distribution regression with sample selection and UK wage decomposition," CeMMAP working papers 09/23, Institute for Fiscal Studies.
    18. Sizhong Sun, 2023. "Firm heterogeneity, worker training and labor productivity: the role of endogenous self-selection," Journal of Productivity Analysis, Springer, vol. 59(2), pages 121-133, April.
    19. Seonho Shin, 2022. "To work or not? Wages or subsidies?: Copula-based evidence of subsidized refugees’ negative selection into employment," Empirical Economics, Springer, vol. 63(4), pages 2209-2252, October.
    20. Nadja Klein & Torsten Hothorn & Luisa Barbanti & Thomas Kneib, 2022. "Multivariate conditional transformation models," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(1), pages 116-142, March.
