Printed from https://ideas.repec.org/p/arx/papers/2409.09894.html

Estimating Wage Disparities Using Foundation Models

Author

Listed:
  • Keyon Vafa
  • Susan Athey
  • David M. Blei

Abstract

One thread of empirical work in social science focuses on decomposing group differences in outcomes into unexplained components and components explained by observable factors. In this paper, we study gender wage decompositions, which require estimating the portion of the gender wage gap explained by career histories of workers. Classical methods for decomposing the wage gap employ simple predictive models of wages which condition on a small set of simple summaries of labor history. The problem is that these predictive models cannot take advantage of the full complexity of a worker's history, and the resulting decompositions thus suffer from omitted variable bias (OVB), where covariates that are correlated with both gender and wages are not included in the model. Here we explore an alternative methodology for wage gap decomposition that employs powerful foundation models, such as large language models, as the predictive engine. Foundation models excel at making accurate predictions from complex, high-dimensional inputs. We use a custom-built foundation model, designed to predict wages from full labor histories, to decompose the gender wage gap. We prove that the way such models are usually trained might still lead to OVB, but develop fine-tuning algorithms that empirically mitigate this issue. Our model captures a richer representation of career history than simple models and predicts wages more accurately. In detail, we first provide a novel set of conditions under which an estimator of the wage gap based on a fine-tuned foundation model is $\sqrt{n}$-consistent. Building on the theory, we then propose methods for fine-tuning foundation models that minimize OVB. Using data from the Panel Study of Income Dynamics, we find that history explains more of the gender wage gap than standard econometric models can measure, and we identify elements of history that are important for reducing OVB.
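The omitted variable bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' foundation-model estimator: it uses plain least squares on synthetic data, and every name and number in it (`years_exp`, `career_gap`, the coefficients, the sample size) is a hypothetical chosen for illustration. It compares the "unexplained" gap, taken here as the coefficient on a gender indicator, when the wage model sees only a simple summary of history versus when it also sees a history feature that is correlated with both gender and wages.

```python
import numpy as np


def unexplained_gap(wage, female, X):
    """Coefficient on the female indicator in an OLS of wage on [1, female, X]."""
    design = np.column_stack([np.ones_like(wage), female, X])
    coef, *_ = np.linalg.lstsq(design, wage, rcond=None)
    return coef[1]


# Synthetic data; all quantities below are assumptions, not values from the paper.
rng = np.random.default_rng(0)
n = 20_000
female = rng.integers(0, 2, size=n).astype(float)

# A simple summary of labor history that a classical model would condition on.
years_exp = rng.normal(10.0, 3.0, size=n)

# A hypothetical detail of the full history (for example, a career interruption)
# that is more common for women and lowers wages.
career_gap = rng.binomial(1, 0.15 + 0.25 * female).astype(float)

log_wage = (
    2.0
    + 0.03 * years_exp
    - 0.20 * career_gap
    - 0.05 * female  # the "true" unexplained component in this simulation
    + rng.normal(0.0, 0.3, size=n)
)

raw_gap = log_wage[female == 1].mean() - log_wage[female == 0].mean()

# Simple model: conditions only on the summary, so career_gap is omitted.
gap_simple = unexplained_gap(log_wage, female, years_exp[:, None])

# Richer model: also conditions on the history feature.
gap_full = unexplained_gap(
    log_wage, female, np.column_stack([years_exp, career_gap])
)

print(f"raw gap:                      {raw_gap:.3f}")
print(f"unexplained gap, simple model: {gap_simple:.3f}")
print(f"unexplained gap, richer model: {gap_full:.3f}")
```

In this setup the simple model attributes the effect of the omitted feature to gender, so its unexplained gap is larger in magnitude than the one built into the simulation, while the richer model recovers something close to it. The abstract's argument is that real career histories contain many such features, which is why a model that reads the full history can explain more of the gap.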

Suggested Citation

  • Keyon Vafa & Susan Athey & David M. Blei, 2024. "Estimating Wage Disparities Using Foundation Models," Papers 2409.09894, arXiv.org.
  • Handle: RePEc:arx:papers:2409.09894

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2409.09894
    File Function: Latest version
    Download Restriction: no


