
LABOR-LLM: Language-Based Occupational Representations with Large Language Models

Authors

  • Tianyu Du
  • Ayush Kanodia
  • Herman Brunborg
  • Keyon Vafa
  • Susan Athey

Abstract

Many empirical studies of labor market questions rely on estimating relatively simple predictive models using small, carefully constructed longitudinal survey datasets based on hand-engineered features. Large Language Models (LLMs), trained on massive datasets, encode vast quantities of world knowledge and can be used for the next job prediction problem. However, while an off-the-shelf LLM produces plausible career trajectories when prompted, the probability with which an LLM predicts a particular job transition conditional on career history will not, in general, align with the true conditional probability in a given population. Recently, Vafa et al. (2024) introduced a transformer-based "foundation model", CAREER, trained using a large, unrepresentative resume dataset, that predicts transitions between jobs; it further demonstrated how transfer learning techniques can be used to leverage the foundation model to build better predictive models of both transitions and wages that reflect conditional transition probabilities found in nationally representative survey datasets. This paper considers an alternative where the fine-tuning of the CAREER foundation model is replaced by fine-tuning LLMs. For the task of next job prediction, we demonstrate that models trained with our approach outperform several alternatives in terms of predictive performance on the survey data, including traditional econometric models, CAREER, and LLMs with in-context learning, even though the LLM can in principle predict job titles that are not allowed in the survey data. Further, we show that our fine-tuned LLM-based models' predictions are more representative of the career trajectories of various workforce subpopulations than off-the-shelf LLMs and CAREER. We conduct experiments and analyses that highlight the sources of the gains in the performance of our models for representative predictions.

Suggested Citation

  • Tianyu Du & Ayush Kanodia & Herman Brunborg & Keyon Vafa & Susan Athey, 2024. "LABOR-LLM: Language-Based Occupational Representations with Large Language Models," Papers 2406.17972, arXiv.org.
  • Handle: RePEc:arx:papers:2406.17972

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2406.17972
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Jon Kleinberg & Jens Ludwig & Sendhil Mullainathan & Ziad Obermeyer, 2015. "Prediction Policy Problems," American Economic Review, American Economic Association, vol. 105(5), pages 491-495, May.
    2. Victor Chernozhukov & Mert Demirer & Esther Duflo & Iván Fernández-Val, 2017. "Fisher-Schultz Lecture: Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments, with an Application to Immunization in India," Papers 1712.04802, arXiv.org, revised Oct 2023.
    3. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    4. Robert Donnelly & Francisco J. R. Ruiz & David Blei & Susan Athey, 2021. "Correction to: Counterfactual inference for consumer choice across many product categories," Quantitative Marketing and Economics (QME), Springer, vol. 19(3), pages 409-409, December.
    5. Susan Athey & Lisa K. Simon & Oskar N. Skans & Johan Vikström & Yaroslav Yakymovych, 2023. "The Heterogeneous Earnings Impact of Job Loss Across Workers, Establishments, and Markets," Papers 2307.06684, arXiv.org, revised Feb 2024.
    6. David S. Johnson & Katherine A. McGonagle & Vicki A. Freedman & Narayan Sastry, 2018. "Fifty Years of the Panel Study of Income Dynamics: Past, Present, and Future," The ANNALS of the American Academy of Political and Social Science, vol. 680(1), pages 9-28, November.
    7. Michael J. Boskin, 1974. "A Conditional Logit Model of Occupational Choice," Journal of Political Economy, University of Chicago Press, vol. 82(2), pages 389-398, Part I, March-April.
    8. Robert Donnelly & Francisco J. R. Ruiz & David Blei & Susan Athey, 2021. "Counterfactual inference for consumer choice across many product categories," Quantitative Marketing and Economics (QME), Springer, vol. 19(3), pages 369-407, December.
    9. Susan Athey & Guido W. Imbens, 2019. "Machine Learning Methods That Economists Should Know About," Annual Review of Economics, Annual Reviews, vol. 11(1), pages 685-725, August.
    10. Susan Athey & Guido W. Imbens, 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    11. Keyon Vafa & Emil Palikot & Tianyu Du & Ayush Kanodia & Susan Athey & David M. Blei, 2022. "CAREER: A Foundation Model for Labor Sequence Data," Papers 2202.08370, arXiv.org, revised Feb 2024.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tatiana de Macedo Nogueira Lima, 2022. "Documento de Trabalho 03/2022 - Aprendizado de máquina e antitruste" [Working Paper 03/2022: Machine learning and antitrust], Documentos de Trabalho 2022030, Conselho Administrativo de Defesa Econômica (Cade), Departamento de Estudos Econômicos.
    2. Adam N. Smith & Stephan Seiler & Ishant Aggarwal, 2023. "Optimal Price Targeting," Marketing Science, INFORMS, vol. 42(3), pages 476-499, May.
    3. Sophie-Charlotte Klose & Johannes Lederer, 2020. "A Pipeline for Variable Selection and False Discovery Rate Control With an Application in Labor Economics," Papers 2006.12296, arXiv.org, revised Jun 2020.
    4. Hannes Wallimann & Silvio Sticher, 2023. "On suspicious tracks: machine-learning based approaches to detect cartels in railway-infrastructure procurement," Papers 2304.11888, arXiv.org.
    5. Urmat Dzhunkeev, 2022. "Forecasting Unemployment in Russia Using Machine Learning Methods," Russian Journal of Money and Finance, Bank of Russia, vol. 81(1), pages 73-87, March.
    6. Deon P. Filmer & Vatsal Nahata & Shwetlena Sabarwal, 2021. "Preparation, Practice, and Beliefs: A Machine Learning Approach to Understanding Teacher Effectiveness," Policy Research Working Paper Series 9847, The World Bank.
    7. Tobias Cagala & Ulrich Glogowsky & Johannes Rincke & Anthony Strittmatter, 2021. "Optimal Targeting in Fundraising: A Machine-Learning Approach," Economics working papers 2021-08, Department of Economics, Johannes Kepler University Linz, Austria.
    8. Falco J. Bargagli Stoffi & Kenneth De Beckker & Joana E. Maldonado & Kristof De Witte, 2021. "Assessing Sensitivity of Machine Learning Predictions. A Novel Toolbox with an Application to Financial Literacy," Papers 2102.04382, arXiv.org.
    9. Jorge Gallego & Gonzalo Rivero & Juan Martínez, 2021. "Preventing rather than punishing: An early warning model of malfeasance in public procurement," International Journal of Forecasting, Elsevier, vol. 37(1), pages 360-377.
    10. Hannes Wallimann & David Imhof & Martin Huber, 2023. "A Machine Learning Approach for Flagging Incomplete Bid-Rigging Cartels," Computational Economics, Springer; Society for Computational Economics, vol. 62(4), pages 1669-1720, December.
    11. Guido de Blasio & Alessio D'Ignazio & Marco Letta, 2022. "Gotham city. Predicting ‘corrupted’ municipalities with machine learning," Technological Forecasting and Social Change, Elsevier, vol. 184(C).
    12. Hannes Mueller & Christopher Rauh, 2022. "The Hard Problem of Prediction for Conflict Prevention," Journal of the European Economic Association, European Economic Association, vol. 20(6), pages 2440-2467.
    13. James Uguccioni, 2022. "The long-run effects of parental unemployment in childhood," CLEF Working Paper Series 45, Canadian Labour Economics Forum (CLEF), University of Waterloo.
    14. Liyang Tang, 2020. "Application of Nonlinear Autoregressive with Exogenous Input (NARX) neural network in macroeconomic forecasting, national goal setting and global competitiveness assessment," Papers 2005.08735, arXiv.org.
    15. Tobias Cagala & Ulrich Glogowsky & Johannes Rincke & Anthony Strittmatter, 2021. "Optimal Targeting in Fundraising: A Causal Machine-Learning Approach," Papers 2103.10251, arXiv.org, revised Sep 2021.
    16. Mark Musumba & Naureen Fatema & Shahriar Kibriya, 2021. "Prevention Is Better Than Cure: Machine Learning Approach to Conflict Prediction in Sub-Saharan Africa," Sustainability, MDPI, vol. 13(13), pages 1-18, July.
    17. Douglas Silveira & Silvinha Vasconcelos & Marcelo Resende & Daniel O. Cajueiro, 2022. "Won’t Get Fooled Again: A supervised machine learning approach for screening gasoline cartels," Energy Economics, Elsevier, vol. 105(C).
    18. Falco J. Bargagli-Stoffi & Jan Niederreiter & Massimo Riccaboni, 2020. "Supervised learning for the prediction of firm dynamics," Papers 2009.06413, arXiv.org.
    19. Alessandra Garbero & Marco Letta, 2022. "Predicting household resilience with machine learning: preliminary cross-country tests," Empirical Economics, Springer, vol. 63(4), pages 2057-2070, October.
    20. Ian Lundberg & Jennie E. Brand & Nanum Jeon, 2022. "Researcher reasoning meets computational capacity: Machine learning for social science," SocArXiv s5zc8, Center for Open Science.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.