
Double Machine Learning for Sample Selection Models

Authors

  • Michela Bia
  • Martin Huber
  • Lukáš Lafférs

Abstract

This article considers the evaluation of discretely distributed treatments when outcomes are only observed for a subpopulation due to sample selection or outcome attrition. For identification, we combine a selection-on-observables assumption for treatment assignment with either selection-on-observables or instrumental variable assumptions concerning the outcome attrition/sample selection process. We also consider dynamic confounding, meaning that covariates that jointly affect sample selection and the outcome may (at least partly) be influenced by the treatment. To control in a data-driven way for a potentially high dimensional set of pre- and/or post-treatment covariates, we adapt the double machine learning framework for treatment evaluation to sample selection problems. We make use of (a) Neyman-orthogonal, doubly robust, and efficient score functions, which imply the robustness of treatment effect estimation to moderate regularization biases in the machine learning-based estimation of the outcome, treatment, or sample selection models and (b) sample splitting (or cross-fitting) to prevent overfitting bias. We demonstrate that the proposed estimators are asymptotically normal and root-n consistent and investigate their finite sample properties in a simulation study. We also apply our proposed methodology to the Job Corps data. The estimator is available in the causalweight package for the statistical software R.
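The two ingredients named in the abstract, a doubly robust score and cross-fitting, can be illustrated with a minimal sketch for a binary treatment when selection depends on observables only. This is not the authors' implementation (that is the causalweight package in R). The function names (`fit_logit`, `fit_ols`, `dml_mar_ate`), the simulated data, the form of the score, and the clipping of small propensity products are all assumptions of this sketch, and plain logistic and linear regressions stand in for the machine learners.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit(X, y, iters=50, ridge=1e-3):
    """Ridge-penalised logistic regression via Newton steps (stand-in for an ML learner)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = sigmoid(Xb @ beta)
        grad = Xb.T @ (y - p) - ridge * beta
        hess = (Xb * (p * (1 - p))[:, None]).T @ Xb + ridge * np.eye(Xb.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return lambda Z: sigmoid(np.column_stack([np.ones(len(Z)), Z]) @ beta)

def fit_ols(X, y):
    """Least-squares regression with intercept (stand-in for an ML learner)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ beta

def dml_mar_ate(y, d, s, x, n_folds=3, trim=0.01, seed=0):
    """Cross-fitted doubly robust ATE estimate; y may be NaN wherever s == 0."""
    n = len(d)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    psi = np.zeros(n)
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        # Nuisance models are fitted on the training folds only.
        p1 = fit_logit(x[tr], d[tr])(x[te])                      # Pr(D=1 | X)
        pi_fit = fit_logit(np.column_stack([d[tr], x[tr]]), s[tr])  # Pr(S=1 | D, X)
        score = {}
        for dv in (0, 1):
            grp = tr & (s == 1) & (d == dv)
            mu = fit_ols(x[grp], y[grp])(x[te])                  # E[Y | D=dv, S=1, X]
            p_d = p1 if dv == 1 else 1.0 - p1
            pi = pi_fit(np.column_stack([np.full(te.sum(), dv), x[te]]))
            w = np.clip(p_d * pi, trim, None)                    # assumption: clip, not drop
            resid = np.where(s[te] == 1, y[te] - mu, 0.0)        # unobserved outcomes contribute 0
            score[dv] = (d[te] == dv) * s[te] * resid / w + mu
        psi[te] = score[1] - score[0]
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)

# Simulated data (hypothetical): true average treatment effect is 1.
rng = np.random.default_rng(42)
n = 4000
x = rng.normal(size=(n, 2))
d = rng.binomial(1, sigmoid(0.5 * x[:, 0]))
s = rng.binomial(1, sigmoid(0.5 + 0.5 * d + 0.5 * x[:, 1]))
y = 1.0 * d + x[:, 0] + x[:, 1] + rng.normal(size=n)
y = np.where(s == 1, y, np.nan)  # outcome observed only for the selected subsample

ate_hat, se_hat = dml_mar_ate(y, d, s, x)
print(f"ATE estimate: {ate_hat:.3f} (s.e. {se_hat:.3f})")
```

Each observation's score is evaluated with nuisance models that were fitted without that observation, which is the role cross-fitting plays in guarding against overfitting bias. The standard error here is simply the sample standard deviation of the scores divided by root-n.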

Suggested Citation

  • Michela Bia & Martin Huber & Lukáš Lafférs, 2024. "Double Machine Learning for Sample Selection Models," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(3), pages 958-969, July.
  • Handle: RePEc:taf:jnlbes:v:42:y:2024:i:3:p:958-969
    DOI: 10.1080/07350015.2023.2271071

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/07350015.2023.2271071
    Download Restriction: Access to full text is restricted to subscribers.

    Citations

    Cited by:

    1. Zhewen Pan & Yifan Zhang, 2024. "Locally robust semiparametric estimation of sample selection models without exclusion restrictions," Papers 2412.01208, arXiv.org.
    2. Haowen Bao & Yongmiao Hong & Yuying Sun & Shouyang Wang, 2024. "Sparse Interval-valued Time Series Modeling with Machine Learning," Papers 2411.09452, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Martin Huber & Anna Solovyeva, 2020. "Direct and Indirect Effects under Sample Selection and Outcome Attrition," Econometrics, MDPI, vol. 8(4), pages 1-25, December.
    2. Martin Huber, 2012. "Identification of Average Treatment Effects in Social Experiments Under Alternative Forms of Attrition," Journal of Educational and Behavioral Statistics, vol. 37(3), pages 443-474, June.
    3. Martin Huber, 2014. "Treatment Evaluation in the Presence of Sample Selection," Econometric Reviews, Taylor & Francis Journals, vol. 33(8), pages 869-905, November.
    4. Martin Huber, 2010. "Identification of average treatment effects in social experiments under different forms of attrition," University of St. Gallen Department of Economics working paper series 2010 2010-22, Department of Economics, University of St. Gallen.
    5. Markus Frölich & Martin Huber, 2014. "Treatment Evaluation With Multiple Outcome Periods Under Endogeneity and Attrition," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(508), pages 1697-1711, December.
    6. Blundell, Richard & Powell, James L., 2007. "Censored regression quantiles with endogenous regressors," Journal of Econometrics, Elsevier, vol. 141(1), pages 65-83, November.
    7. Martin Huber & Giovanni Mellace, 2015. "Sharp Bounds on Causal Effects under Sample Selection," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 77(1), pages 129-151, February.
    8. Martin Huber & Giovanni Mellace, 2014. "Testing exclusion restrictions and additive separability in sample selection models," Empirical Economics, Springer, vol. 47(1), pages 75-92, August.
    9. Guido W. Imbens & Jeffrey M. Wooldridge, 2009. "Recent Developments in the Econometrics of Program Evaluation," Journal of Economic Literature, American Economic Association, vol. 47(1), pages 5-86, March.
    10. Lewbel, Arthur, 2007. "Endogenous selection or treatment model estimation," Journal of Econometrics, Elsevier, vol. 141(2), pages 777-806, December.
    11. Richard Blundell & Monica Costa Dias, 2009. "Alternative Approaches to Evaluation in Empirical Microeconomics," Journal of Human Resources, University of Wisconsin Press, vol. 44(3).
    12. Hans Fricke & Markus Frölich & Martin Huber & Michael Lechner, 2020. "Endogeneity and non‐response bias in treatment evaluation – nonparametric identification of causal effects by instruments," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 35(5), pages 481-504, August.
    13. Rahul Singh, 2021. "Generalized Kernel Ridge Regression for Causal Inference with Missing-at-Random Sample Selection," Papers 2111.05277, arXiv.org.
    14. Escanciano, Juan Carlos & Jacho-Chávez, David T. & Lewbel, Arthur, 2014. "Uniform convergence of weighted sums of non and semiparametric residuals for estimation and testing," Journal of Econometrics, Elsevier, vol. 178(P3), pages 426-443.
    15. Ruoyao Shi, 2021. "An Averaging Estimator for Two Step M Estimation in Semiparametric Models," Working Papers 202105, University of California at Riverside, Department of Economics.
    16. Huber, Martin & Mellace, Giovanni, 2011. "Testing instrument validity in sample selection models," Economics Working Paper Series 1145, University of St. Gallen, School of Economics and Political Science.
    17. Bodory, Hugo & Huber, Martin, 2018. "The causalweight package for causal inference in R," FSES Working Papers 493, Faculty of Economics and Social Sciences, University of Freiburg/Fribourg Switzerland.
    18. Juan Carlos Escanciano & Telmo Pérez-Izquierdo, 2023. "Automatic Locally Robust Estimation with Generated Regressors," Papers 2301.10643, arXiv.org, revised Nov 2023.
    19. Hamermesh, Daniel S. & Donald, Stephen G., 2008. "The effect of college curriculum on earnings: An affinity identifier for non-ignorable non-response bias," Journal of Econometrics, Elsevier, vol. 144(2), pages 479-491, June.
    20. Hugo Bodory & Martin Huber & Lukáš Lafférs, 2022. "Evaluating (weighted) dynamic treatment effects by double machine learning," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 628-648.
