
Semi-parametric copula sample selection models for count responses

Author

Listed:
  • Marra, Giampiero
  • Wyszynski, Karol

Abstract

In observational studies, a response of interest (as well as some individual-level characteristics) may be observed only for a non-randomly selected sample of the population. In this situation, standard models such as linear and probit regressions will yield biased and inconsistent parameter estimates. Selection models can address this issue and mainly consist of two regressions: a binary selection equation that determines whether the statistical units will enter the sample, and an outcome equation that models the response. While sample selection models for continuous and binary outcomes have been widely studied in the literature, the case of count responses has not received as much attention. Sample selection models for count data which allow for the use of potentially any discrete distribution, non-Gaussian dependencies between the selection and outcome equations, and flexible covariate effects are introduced. The estimation algorithm is based on the penalized likelihood estimation framework. The method is illustrated with simulations and with data from a United States Veterans Administration Survey.
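
As a rough illustration of the two-equation structure described in the abstract, the sketch below computes one observation's log-likelihood contribution for a copula sample selection model with a count outcome. The probit selection equation, the Poisson margin, the Frank copula, and every function and parameter name are assumptions chosen for illustration; they are not taken from the article or from any package, and the penalty term of the penalized likelihood is left out.

```python
import math


def norm_cdf(x):
    """Standard normal CDF (probit link for the selection equation)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def poisson_pmf(y, mu):
    """Poisson probability mass at count y with mean mu > 0."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def poisson_cdf(y, mu):
    """Poisson CDF; returns 0 for y < 0 so that F(y - 1) works at y = 0."""
    if y < 0:
        return 0.0
    return min(1.0, sum(poisson_pmf(k, mu) for k in range(y + 1)))


def frank_copula(u, v, theta):
    """Frank copula C(u, v; theta); theta near 0 gives independence."""
    if abs(theta) < 1e-8:
        return u * v
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def loglik_contribution(selected, y, eta_sel, mu, theta):
    """Log-likelihood contribution of one unit.

    selected : True if the count y was observed for this unit
    eta_sel  : linear predictor of the selection equation (hypothetical name)
    mu       : Poisson mean of the outcome equation (hypothetical name)
    theta    : copula dependence parameter
    """
    p0 = norm_cdf(-eta_sel)  # P(not selected)
    if not selected:
        return math.log(p0)
    # P(not selected, Y = y) as a copula difference over the discrete margin
    joint0 = (frank_copula(p0, poisson_cdf(y, mu), theta)
              - frank_copula(p0, poisson_cdf(y - 1, mu), theta))
    # P(selected, Y = y) = P(Y = y) - P(not selected, Y = y);
    # the floor only guards against rounding error far in the tail
    return math.log(max(poisson_pmf(y, mu) - joint0, 1e-300))
```

Summing these contributions over all units gives an unpenalized log-likelihood. With theta equal to zero the selected-unit term reduces to the sum of the log Poisson mass and the log selection probability, which is the case where selection is ignorable.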

Suggested Citation

  • Marra, Giampiero & Wyszynski, Karol, 2016. "Semi-parametric copula sample selection models for count responses," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 110-129.
  • Handle: RePEc:eee:csdana:v:104:y:2016:i:c:p:110-129
    DOI: 10.1016/j.csda.2016.06.003

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947316301402
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Cameron, A. Colin & Trivedi, Pravin K., 2013. "Regression Analysis of Count Data," Cambridge Books, Cambridge University Press, number 9781107667273.
    2. James J. Heckman, 1976. "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," NBER Chapters, in: Annals of Economic and Social Measurement, Volume 5, number 4, pages 475-492, National Bureau of Economic Research, Inc.
    3. Omori, Yasuhiro & Miyawaki, Koji, 2010. "Tobit model with covariate dependent thresholds," Computational Statistics & Data Analysis, Elsevier, vol. 54(11), pages 2736-2752, November.
    4. Murray D. Smith, 2003. "Modelling sample selection using Archimedean copulas," Econometrics Journal, Royal Economic Society, vol. 6(1), pages 99-123, June.
    5. Trivedi, Pravin K. & Zimmer, David M., 2007. "Copula Modeling: An Introduction for Practitioners," Foundations and Trends(R) in Econometrics, now publishers, vol. 1(1), pages 1-111, April.
    6. Kajal Lahiri & Guibo Xing, 2004. "An econometric analysis of veterans’ health care utilization using two-part models," Empirical Economics, Springer, vol. 29(2), pages 431-449, May.
    7. William H. Greene, 1997. "FIML Estimation of Sample Selection Models for Count Data," Working Papers 97-02, New York University, Leonard N. Stern School of Business, Department of Economics.
    8. McGovern, Mark E. & Bärnighausen, Till & Giampiero Marra & Rosalba Radice, 2015. "On the Assumption of Bivariate Normality in Selection Models: A Copula Approach Applied to Estimating HIV Prevalence," Working Paper 199101, Harvard University OpenScholar.
    9. Terza, Joseph V., 1998. "Estimating count data models with endogenous switching: Sample selection and endogenous treatment effects," Journal of Econometrics, Elsevier, vol. 84(1), pages 129-154, May.
    10. A. Colin Cameron & Tong Li & Pravin K. Trivedi & David M. Zimmer, 2004. "Modelling the differences in counted outcomes using bivariate copula models with application to mismeasured counts," Econometrics Journal, Royal Economic Society, vol. 7(2), pages 566-584, December.
    11. Aristidis Nikoloulopoulos & Dimitris Karlis, 2010. "Regression in a copula model for bivariate count data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 37(9), pages 1555-1568.
    12. Genest, Christian & Nešlehová, Johanna, 2007. "A Primer on Copulas for Count Data," ASTIN Bulletin, Cambridge University Press, vol. 37(2), pages 475-515, November.
    13. Hasebe, Takuya & Vijverberg, Wim P., 2012. "A Flexible Sample Selection Model: A GTL-Copula Approach," IZA Discussion Papers 7003, Institute of Labor Economics (IZA).
    14. Gronau, Reuben, 1974. "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, University of Chicago Press, vol. 82(6), pages 1119-1143, Nov.-Dec.
    15. Simon N. Wood, 2004. "Stable and Efficient Multiple Smoothing Parameter Estimation for Generalized Additive Models," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 673-686, January.
    16. Manuel Wiesenfarth & Thomas Kneib, 2010. "Bayesian geoadditive sample selection models," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 59(3), pages 381-404, May.
    17. Wojtyś, Małgorzata & Marra, Giampiero & Radice, Rosalba, 2016. "Copula Regression Spline Sample Selection Models: The R Package SemiParSampleSel," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 71(i06).
    18. Marra, Giampiero & Radice, Rosalba, 2013. "Estimation of a regression spline sample selection model," Computational Statistics & Data Analysis, Elsevier, vol. 61(C), pages 158-173.
    19. Alfonso Miranda & Sophia Rabe-Hesketh, 2006. "Maximum likelihood estimation of endogenous switching and sample selection models for binary, ordinal, and count variables," Stata Journal, StataCorp LP, vol. 6(3), pages 285-308, September.
    20. Patrick Puhani, 2000. "The Heckman Correction for Sample Selection and Its Critique," Journal of Economic Surveys, Wiley Blackwell, vol. 14(1), pages 53-68, February.
    21. Ruppert, David & Wand, M. P. & Carroll, R. J., 2003. "Semiparametric Regression," Cambridge Books, Cambridge University Press, number 9780521785167, September.
    22. Li, Phillip, 2011. "Estimation of sample selection models with two selection mechanisms," Computational Statistics & Data Analysis, Elsevier, vol. 55(2), pages 1099-1108, February.
    23. Mealli, Fabrizia & Pacini, Barbara, 2008. "Comparing principal stratification and selection models in parametric causal inference with nonignorable missingness," Computational Statistics & Data Analysis, Elsevier, vol. 53(2), pages 507-516, December.
    24. Brechmann, Eike Christian & Schepsmeier, Ulf, 2013. "Modeling Dependence with C- and D-Vine Copulas: The R Package CDVine," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 52(i03).
    25. Simon N. Wood, 2013. "On p-values for smooth components of an extended generalized additive model," Biometrika, Biometrika Trust, vol. 100(1), pages 221-228.
    26. Heckman, James, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.
    27. Margarita Genius & Elisabetta Strazzera, 2008. "Applying the copula approach to sample selection modelling," Applied Economics, Taylor & Francis Journals, vol. 40(11), pages 1443-1455.
    28. Lewis, H Gregg, 1974. "Comments on Selectivity Biases in Wage Comparisons," Journal of Political Economy, University of Chicago Press, vol. 82(6), pages 1145-1155, Nov.-Dec.
    29. Braun, Michael, 2014. "trustOptim: An R Package for Trust Region Optimization with Sparse Hessians," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 60(i04).
    30. R. A. Rigby & D. M. Stasinopoulos, 2005. "Generalized additive models for location, scale and shape," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 54(3), pages 507-554, June.
    31. Stasinopoulos, D. Mikis & Rigby, Robert A., 2007. "Generalized Additive Models for Location Scale and Shape (GAMLSS) in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 23(i07).
    32. Ruppert, David & Wand, M. P. & Carroll, R. J., 2003. "Semiparametric Regression," Cambridge Books, Cambridge University Press, number 9780521780506, September.
    33. Giampiero Marra & Simon N. Wood, 2012. "Coverage Properties of Confidence Intervals for Generalized Additive Model Components," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 39(1), pages 53-74, March.
    34. Inyoung Kim & Noah D. Cohen & Raymond J. Carroll, 2003. "Semiparametric Regression Splines in Matched Case-Control Studies," Biometrics, The International Biometric Society, vol. 59(4), pages 1158-1169, December.
    35. Yulia V. Marchenko & Marc G. Genton, 2012. "A Heckman Selection-t Model," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(497), pages 304-317, March.
    36. Claudia Pigini, 2012. "Of Butterflies and Caterpillars: Bivariate Normality in the Sample Selection Model," Working Papers 377, Universita' Politecnica delle Marche (I), Dipartimento di Scienze Economiche e Sociali.

    Citations

    Cited by:

    1. Karol Wyszynski & Giampiero Marra, 2018. "Sample selection models for count data in R," Computational Statistics, Springer, vol. 33(3), pages 1385-1412, September.
    2. Adelchi Azzalini & Hyoung-Moon Kim & Hea-Jung Kim, 2019. "Sample selection models for discrete and other non-Gaussian response variables," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 28(1), pages 27-56, March.
    3. Wojtyś, Małgorzata & Marra, Giampiero & Radice, Rosalba, 2018. "Copula based generalized additive models for location, scale and shape with non-random sample selection," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 1-14.
    4. Seonho Shin, 2022. "To work or not? Wages or subsidies?: Copula-based evidence of subsidized refugees’ negative selection into employment," Empirical Economics, Springer, vol. 63(4), pages 2209-2252, October.
    5. Hamori, Shigeyuki & Motegi, Kaiji & Zhang, Zheng, 2019. "Calibration estimation of semiparametric copula models with data missing at random," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 85-109.
    6. Yajie Zou & Xinzhi Zhong & Jinjun Tang & Xin Ye & Lingtao Wu & Muhammad Ijaz & Yinhai Wang, 2019. "A Copula-Based Approach for Accommodating the Underreporting Effect in Wildlife‒Vehicle Crash Analysis," Sustainability, MDPI, vol. 11(2), pages 1-13, January.
    7. Pierfrancesco Alaimo Di Loro & Daria Scacciatelli & Giovanna Tagliaferri, 2023. "2-step Gradient Boosting approach to selectivity bias correction in tax audit: an application to the VAT gap in Italy," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(1), pages 237-270, March.
    8. Ding, Chuan & Cao, Xinyu & Yu, Bin & Ju, Yang, 2021. "Non-linear associations between zonal built environment attributes and transit commuting mode choice accounting for spatial heterogeneity," Transportation Research Part A: Policy and Practice, Elsevier, vol. 148(C), pages 22-35.
    9. Tzougas, George & Makariou, Despoina, 2022. "The multivariate Poisson-Generalized Inverse Gaussian claim count regression model with varying dispersion and shape parameters," LSE Research Online Documents on Economics 117197, London School of Economics and Political Science, LSE Library.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Karol Wyszynski & Giampiero Marra, 2018. "Sample selection models for count data in R," Computational Statistics, Springer, vol. 33(3), pages 1385-1412, September.
    2. Marra, Giampiero & Radice, Rosalba, 2013. "Estimation of a regression spline sample selection model," Computational Statistics & Data Analysis, Elsevier, vol. 61(C), pages 158-173.
    3. Wojtyś, Małgorzata & Marra, Giampiero & Radice, Rosalba, 2018. "Copula based generalized additive models for location, scale and shape with non-random sample selection," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 1-14.
    4. Marra Giampiero & Radice Rosalba, 2017. "A joint regression modeling framework for analyzing bivariate binary data in R," Dependence Modeling, De Gruyter, vol. 5(1), pages 268-294, December.
    5. Wojtyś, Małgorzata & Marra, Giampiero & Radice, Rosalba, 2016. "Copula Regression Spline Sample Selection Models: The R Package SemiParSampleSel," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 71(i06).
    6. Giampiero Marra & Rosalba Radice & Till Bärnighausen & Simon N. Wood & Mark E. McGovern, 2017. "A Simultaneous Equation Approach to Estimating HIV Prevalence With Nonignorable Missing Responses," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 484-496, April.
    7. Wiemann, Paul F.V. & Klein, Nadja & Kneib, Thomas, 2022. "Correcting for sample selection bias in Bayesian distributional regression models," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    8. Marra, Giampiero & Radice, Rosalba, 2017. "Bivariate copula additive models for location, scale and shape," Computational Statistics & Data Analysis, Elsevier, vol. 112(C), pages 99-113.
    9. Maike Hohberg & Francesco Donat & Giampiero Marra & Thomas Kneib, 2021. "Beyond unidimensional poverty analysis using distributional copula models for mixed ordered‐continuous outcomes," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(5), pages 1365-1390, November.
    10. Bhat, Chandra R. & Eluru, Naveen, 2009. "A copula-based approach to accommodate residential self-selection effects in travel behavior modeling," Transportation Research Part B: Methodological, Elsevier, vol. 43(7), pages 749-765, August.
    11. Schmidt, Rouven & Kneib, Thomas, 2023. "Multivariate distributional stochastic frontier models," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    12. Nadja Klein & Thomas Kneib & Giampiero Marra & Rosalba Radice & Slawa Rokicki & Mark E. McGovern, 2018. "Mixed Binary-Continuous Copula Regression Models with Application to Adverse Birth Outcomes," CHaRMS Working Papers 18-06, Centre for HeAlth Research at the Management School (CHaRMS).
    13. Seonho Shin, 2022. "To work or not? Wages or subsidies?: Copula-based evidence of subsidized refugees’ negative selection into employment," Empirical Economics, Springer, vol. 63(4), pages 2209-2252, October.
    14. Mikhail Zhelonkin & Marc G. Genton & Elvezio Ronchetti, 2016. "Robust inference in sample selection models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(4), pages 805-827, September.
    15. Øystein Sørensen & Anders M. Fjell & Kristine B. Walhovd, 2023. "Longitudinal Modeling of Age-Dependent Latent Traits with Generalized Additive Latent and Mixed Models," Psychometrika, Springer;The Psychometric Society, vol. 88(2), pages 456-486, June.
    16. Klein, Nadja & Denuit, Michel & Lang, Stefan & Kneib, Thomas, 2013. "Nonlife Ratemaking and Risk Management with Bayesian Additive Models for Location, Scale and Shape," LIDAM Discussion Papers ISBA 2013045, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    17. Giampiero Marra & Rosalba Radice & David M. Zimmer, 2020. "Estimating the binary endogenous effect of insurance on doctor visits by copula‐based regression additive models," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(4), pages 953-971, August.
    18. Tzougas, George & Makariou, Despoina, 2022. "The multivariate Poisson-Generalized Inverse Gaussian claim count regression model with varying dispersion and shape parameters," LSE Research Online Documents on Economics 117197, London School of Economics and Political Science, LSE Library.
    19. Hasebe, Takuya & Vijverberg, Wim P., 2012. "A Flexible Sample Selection Model: A GTL-Copula Approach," IZA Discussion Papers 7003, Institute of Labor Economics (IZA).
    20. Nathaniel E. Helwig, 2022. "Robust Permutation Tests for Penalized Splines," Stats, MDPI, vol. 5(3), pages 1-18, September.
