LABOR-LLM: Language-Based Occupational Representations with Large Language Models

Author

Listed:

Susan Athey
Herman Brunborg
Tianyu Du
Ayush Kanodia
Keyon Vafa

Registered:

Susan Carleton Athey

Abstract

Vafa et al. (2024) introduced a transformer-based econometric model, CAREER, that predicts a worker's next job as a function of career history (an "occupation model"). CAREER was initially estimated ("pre-trained") using a large, unrepresentative resume dataset, which served as a "foundation model," and parameter estimation was continued ("fine-tuned") using data from a representative survey. CAREER had better predictive performance than benchmarks. This paper considers an alternative where the resume-based foundation model is replaced by a large language model (LLM). We convert tabular data from the survey into text files that resemble resumes and fine-tune the LLMs using these text files with the objective to predict the next token (word). The resulting fine-tuned LLM is used as an input to an occupation model. Its predictive performance surpasses all prior models. We demonstrate the value of fine-tuning and further show that by adding more career data from a different population, fine-tuning smaller LLMs surpasses the performance of fine-tuning larger models.
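The conversion step described in the abstract, turning tabular survey rows into resume-like text whose continuation a language model is trained to predict, can be illustrated with a minimal sketch. This is not the authors' template or code: the `YearRecord` fields, the line format, and the helper names `career_to_text` and `prompt_and_target` are all hypothetical, chosen only to show how a career history might be serialized so that the next occupation becomes the text to be predicted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class YearRecord:
    """One survey row for a worker (hypothetical schema, not the paper's)."""
    year: int
    education: str
    occupation: str


def career_to_text(records: List[YearRecord]) -> str:
    """Serialize a career history as one resume-like line per year."""
    ordered = sorted(records, key=lambda r: r.year)
    return "\n".join(f"{r.year} ({r.education}): {r.occupation}" for r in ordered)


def prompt_and_target(records: List[YearRecord]) -> Tuple[str, str]:
    """Split a history into a prompt (everything before the final job title)
    and a target (the final job title) for next-token style training."""
    if not records:
        raise ValueError("need at least one record")
    ordered = sorted(records, key=lambda r: r.year)
    history, last = ordered[:-1], ordered[-1]
    prefix = career_to_text(history)
    stem = f"{last.year} ({last.education}):"
    prompt = f"{prefix}\n{stem}" if prefix else stem
    return prompt, f" {last.occupation}"


# Illustrative, made-up rows; deliberately out of order to show the sorting.
example = [
    YearRecord(2002, "college", "Software developer"),
    YearRecord(2001, "college", "Cashier"),
]

prompt, target = prompt_and_target(example)
print(prompt)
print(repr(target))
```

Concatenating `prompt` and `target` reproduces the full serialized history, so the same text can serve both as a fine-tuning document and as a prompt for scoring candidate next occupations.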

Suggested Citation

Susan Athey & Herman Brunborg & Tianyu Du & Ayush Kanodia & Keyon Vafa, 2024. "LABOR-LLM: Language-Based Occupational Representations with Large Language Models," Papers 2406.17972, arXiv.org, revised Feb 2025.

Handle: RePEc:arx:papers:2406.17972

Other versions of this item:

Du, Tianyu & Kanodia, Ayush & Brunborg, Herman & Vafa, Keyon & Athey, Susan, 2024. "Labor-LLM: Language-Based Occupational Representations with Large Language Models," Research Papers 4188, Stanford University, Graduate School of Business.

References listed on IDEAS

Jon Kleinberg & Jens Ludwig & Sendhil Mullainathan & Ziad Obermeyer, 2015. "Prediction Policy Problems," American Economic Review, American Economic Association, vol. 105(5), pages 491-495, May.
Victor Chernozhukov & Mert Demirer & Esther Duflo & Iván Fernández-Val, 2017. "Fisher-Schultz Lecture: Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments, with an Application to Immunization in India," Papers 1712.04802, arXiv.org, revised Oct 2023.
- Victor Chernozhukov & Mert Demirer & Esther Duflo & Iván Fernández-Val, 2023. "Fisher-Schultz Lecture: Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments, with an Application to Immunization in India," Working Papers hal-04238425, HAL.
Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
- Athey, Susan & Tibshirani, Julie & Wager, Stefan, 2017. "Generalized Random Forests," Research Papers 3575, Stanford University, Graduate School of Business.
Robert Donnelly & Francisco J. R. Ruiz & David Blei & Susan Athey, 2021. "Correction to: Counterfactual inference for consumer choice across many product categories," Quantitative Marketing and Economics (QME), Springer, vol. 19(3), pages 409-409, December.
Susan Athey & Lisa K. Simon & Oskar N. Skans & Johan Vikstrom & Yaroslav Yakymovych, 2023. "The Heterogeneous Earnings Impact of Job Loss Across Workers, Establishments, and Markets," Papers 2307.06684, arXiv.org, revised Feb 2024.
- Athey, Susan & Simon, Lisa & Skans, Oskar & Vikström, Johan & Yakymovych, Yaroslav, 2024. "The heterogeneous earnings impact of job loss across workers, establishments, and markets," Working Paper Series 2024:10, IFAU - Institute for Evaluation of Labour Market and Education Policy.
- Athey, Susan & Simon, Lisa K. & Skans, Oskar N. & Vikstrom, Johan & Yakymovych, Yaroslav, 2023. "The Heterogeneous Earnings Impact of Job Loss across Workers, Establishments, and Markets," Research Papers 4148, Stanford University, Graduate School of Business.
David S. Johnson & Katherine A. McGonagle & Vicki A. Freedman & Narayan Sastry, 2018. "Fifty Years of the Panel Study of Income Dynamics: Past, Present, and Future," The ANNALS of the American Academy of Political and Social Science, vol. 680(1), pages 9-28, November.
Robert E. Hall, 1972. "Turnover in the Labor Force," Brookings Papers on Economic Activity, Economic Studies Program, The Brookings Institution, vol. 3(3), pages 709-764.
Boskin, Michael J, 1974. "A Conditional Logit Model of Occupational Choice," Journal of Political Economy, University of Chicago Press, vol. 82(2), pages 389-398, Part I, M.
Robert Donnelly & Francisco J.R. Ruiz & David Blei & Susan Athey, 2021. "Counterfactual inference for consumer choice across many product categories," Quantitative Marketing and Economics (QME), Springer, vol. 19(3), pages 369-407, December.
- Rob Donnelly & Francisco R. Ruiz & David Blei & Susan Athey, 2019. "Counterfactual Inference for Consumer Choice Across Many Product Categories," Papers 1906.02635, arXiv.org, revised Aug 2023.
Susan Athey & Guido W. Imbens, 2019. "Machine Learning Methods That Economists Should Know About," Annual Review of Economics, Annual Reviews, vol. 11(1), pages 685-725, August.
- Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
- Susan Athey & Guido Imbens, 2019. "Machine Learning Methods Economists Should Know About," Papers 1903.10075, arXiv.org.
Keyon Vafa & Emil Palikot & Tianyu Du & Ayush Kanodia & Susan Athey & David M. Blei, 2022. "CAREER: A Foundation Model for Labor Sequence Data," Papers 2202.08370, arXiv.org, revised Feb 2024.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Tatiana de Macedo Nogueira Lima, 2022. "Documento de Trabalho 03/2022 - Aprendizado de máquina e antitruste," Documentos de Trabalho 2022030, Conselho Administrativo de Defesa Econômica (Cade), Departamento de Estudos Econômicos.
Adam N. Smith & Stephan Seiler & Ishant Aggarwal, 2023. "Optimal Price Targeting," Marketing Science, INFORMS, vol. 42(3), pages 476-499, May.
Sophie-Charlotte Klose & Johannes Lederer, 2020. "A Pipeline for Variable Selection and False Discovery Rate Control With an Application in Labor Economics," Papers 2006.12296, arXiv.org, revised Jun 2020.
Rama K. Malladi, 2024. "Benchmark Analysis of Machine Learning Methods to Forecast the U.S. Annual Inflation Rate During a High-Decile Inflation Period," Computational Economics, Springer;Society for Computational Economics, vol. 64(1), pages 335-375, July.
Tranos, Emmanouil & Incera, Andre Carrascal & Willis, George, 2022. "Using the web to predict regional trade flows: data extraction, modelling, and validation," OSF Preprints 9bu5z, Center for Open Science.
Hannes Wallimann & Silvio Sticher, 2023. "On suspicious tracks: machine-learning based approaches to detect cartels in railway-infrastructure procurement," Papers 2304.11888, arXiv.org.
Delogu, Marco & Lagravinese, Raffaele & Paolini, Dimitri & Resce, Giuliano, 2024. "Predicting dropout from higher education: Evidence from Italy," Economic Modelling, Elsevier, vol. 130(C).
- Marco Delogu & Raffaelle Lagravinese & Dimitri Paolini & Giuliano Resce, 2020. "Predicting dropout from higher education: Evidence from Italy," DEM Discussion Paper Series 22-06, Department of Economics at the University of Luxembourg.
Urmat Dzhunkeev, 2022. "Forecasting Unemployment in Russia Using Machine Learning Methods," Russian Journal of Money and Finance, Bank of Russia, vol. 81(1), pages 73-87, March.
Filmer,Deon P. & Nahata,Vatsal & Sabarwal,Shwetlena, 2021. "Preparation, Practice, and Beliefs : A Machine Learning Approach to Understanding Teacher Effectiveness," Policy Research Working Paper Series 9847, The World Bank.
Bas Bosma & Arjen Witteloostuijn, 2024. "Machine learning in international business," Journal of International Business Studies, Palgrave Macmillan;Academy of International Business, vol. 55(6), pages 676-702, August.
Tobias Cagala & Ulrich Glogowsky & Johannes Rincke & Anthony Strittmatter, 2021. "Optimal Targeting in Fundraising: A Machine-Learning Approach," Economics working papers 2021-08, Department of Economics, Johannes Kepler University Linz, Austria.
- Tobias Cagala & Ulrich Glogowsky & Johannes Rincke & Anthony Strittmatter, 2021. "Optimal Targeting in Fundraising: A Machine-Learning Approach," CESifo Working Paper Series 9037, CESifo.
Falco J. Bargagli Stoffi & Kenneth De Beckker & Joana E. Maldonado & Kristof De Witte, 2021. "Assessing Sensitivity of Machine Learning Predictions. A Novel Toolbox with an Application to Financial Literacy," Papers 2102.04382, arXiv.org.
Gallego, Jorge & Rivero, Gonzalo & Martínez, Juan, 2021. "Preventing rather than punishing: An early warning model of malfeasance in public procurement," International Journal of Forecasting, Elsevier, vol. 37(1), pages 360-377.
- Gallego, J & Rivero, G & Martínez, J.D., 2018. "Preventing rather than Punishing: An Early Warning Model of Malfeasance in Public Procurement," Documentos de Trabajo 16724, Universidad del Rosario.
Hannes Wallimann & David Imhof & Martin Huber, 2023. "A Machine Learning Approach for Flagging Incomplete Bid-Rigging Cartels," Computational Economics, Springer;Society for Computational Economics, vol. 62(4), pages 1669-1720, December.
- Hannes Wallimann & David Imhof & Martin Huber, 2020. "A Machine Learning Approach for Flagging Incomplete Bid-rigging Cartels," Papers 2004.05629, arXiv.org.
- Wallimann, Hannes & Imhof, David & Huber, Martin, 2020. "A Machine Learning Approach for Flagging Incomplete Bid-rigging Cartels," FSES Working Papers 513, Faculty of Economics and Social Sciences, University of Freiburg/Fribourg Switzerland.
Erokhin, Dmitry & Zagler, Martin, 2024. "Who will sign a double tax treaty next? A prediction based on economic determinants and machine learning algorithms," Economic Modelling, Elsevier, vol. 139(C).
Wallimann, Hannes & Sticher, Silvio, 2023. "On suspicious tracks: Machine-learning based approaches to detect cartels in railway-infrastructure procurement," Transport Policy, Elsevier, vol. 143(C), pages 121-131.
de Blasio, Guido & D'Ignazio, Alessio & Letta, Marco, 2022. "Gotham city. Predicting ‘corrupted’ municipalities with machine learning," Technological Forecasting and Social Change, Elsevier, vol. 184(C).
Zhu, Jingjing & Huang, Tianyuan, 2024. "Public debt and welfare with machine learning," Finance Research Letters, Elsevier, vol. 69(PA).
Strittmatter, Anthony, 2023. "What is the value added by using causal machine learning methods in a welfare experiment evaluation?," Labour Economics, Elsevier, vol. 84(C).
Hannes Mueller & Christopher Rauh, 2022. "The Hard Problem of Prediction for Conflict Prevention," Journal of the European Economic Association, European Economic Association, vol. 20(6), pages 2440-2467.
- Hannes Mueller & Christopher Rauh, 2019. "The hard problem of prediction for conflict prevention," Cahiers de recherche 2019-02, Universite de Montreal, Departement de sciences economiques.
- Mueller, H. & Rauh, C., 2020. "The Hard Problem of Prediction for Conflict Prevention," Cambridge Working Papers in Economics 2015, Faculty of Economics, University of Cambridge.
- Hannes Mueller, 2021. "The Hard Problem of Prediction for Conflict Prevention," Working Papers 1244, Barcelona School of Economics.
- Mueller, H. & Rauh, C., 2021. "The Hard Problem of Prediction for Conflict Prevention," Cambridge Working Papers in Economics 2103, Faculty of Economics, University of Cambridge.
- Hannes Mueller & Christopher Rauh, 2019. "The Hard Problem of Prediction for Conflict Prevention," Cahiers de recherche 02-2019, Centre interuniversitaire de recherche en économie quantitative, CIREQ.
- Mueller, Hannes & Rauh, Christopher, 2019. "The Hard Problem of Prediction for Conflict Prevention," CEPR Discussion Papers 13748, C.E.P.R. Discussion Papers.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-AIN-2024-07-29 (Artificial Intelligence)
NEP-BIG-2024-07-29 (Big Data)
NEP-CMP-2024-07-29 (Computational Economics)
NEP-MAC-2024-07-29 (Macroeconomics)
