Recovering Overlooked Information in Categorical Variables with LLMs: An Application to Labor Market Mismatch

Author

Listed:

Yi Chen
(ShanghaiTech University)
Hanming Fang
(University of Pennsylvania)
Yi Zhao
(Tsinghua University)
Zibo Zhao
(ShanghaiTech University)

Abstract

Categorical variables have no intrinsic ordering, and researchers often adopt a fixed-effect (FE) approach in empirical analysis. However, this approach has two significant limitations: it overlooks textual information associated with the categorical variables, and it produces unstable results when a category contains only a few observations. In this paper, we propose a novel method that uses recent advances in large language models (LLMs) to recover overlooked information in categorical variables. We apply this method to investigate labor market mismatch. Specifically, we task LLMs with simulating the role of a human resources specialist to assess the suitability of an applicant with specific characteristics for a given job. Our main findings can be summarized in three parts. First, using comprehensive administrative data from an online job posting platform, we show that our new match quality measure is positively correlated with several traditional measures in the literature, and we highlight the LLM's capability to provide additional information beyond that contained in the traditional measures. Second, we demonstrate the broad applicability of the new method with survey data that contain significantly less information than the administrative data, which makes it impossible to compute most of the traditional match quality measures. Our LLM measure successfully replicates, using easily accessible survey data, most of the salient patterns observed in the hard-to-access administrative dataset. Third, we investigate the gender gap in match quality and explore whether gender stereotypes exist in the hiring process. We simulate an audit study, examining whether revealing gender information to LLMs influences their assessment. We show that when gender information is disclosed to the LLMs, the models deem females better suited for traditionally female-dominated roles.
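The abstract gives no implementation details, so the following is only a minimal sketch of how the simulated audit comparison might be set up. It is not the authors' code. The `Applicant` fields, the prompt wording, the 0 to 10 rating scale, and the `score` callable (a stand-in for whatever LLM call returns a numeric rating) are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Applicant:
    # Hypothetical applicant characteristics; the paper's actual fields are not given here.
    education: str
    major: str
    gender: Optional[str] = None  # used only in the "gender revealed" arm


def build_prompt(applicant: Applicant, job_title: str, reveal_gender: bool) -> str:
    """Compose an HR-specialist prompt. The wording and 0-10 scale are assumptions."""
    lines = [
        "You are a human resources specialist.",
        f"Job: {job_title}",
        f"Applicant education: {applicant.education}",
        f"Applicant major: {applicant.major}",
    ]
    if reveal_gender and applicant.gender is not None:
        lines.append(f"Applicant gender: {applicant.gender}")
    lines.append(
        "Rate the applicant's suitability for this job from 0 to 10. "
        "Reply with a number only."
    )
    return "\n".join(lines)


def audit_gap(
    applicant: Applicant, job_title: str, score: Callable[[str], float]
) -> float:
    """Return the rating with gender revealed minus the rating with gender hidden.

    `score` is a caller-supplied function that sends a prompt to an LLM and
    parses the numeric reply; it is deliberately left abstract here.
    """
    blind = score(build_prompt(applicant, job_title, reveal_gender=False))
    revealed = score(build_prompt(applicant, job_title, reveal_gender=True))
    return revealed - blind
```

The two prompts differ only in the gender line, so any difference in the returned ratings can be attributed to that disclosure, which mirrors the paired design of an audit study.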

Suggested Citation

Yi Chen & Hanming Fang & Yi Zhao & Zibo Zhao, 2024. "Recovering Overlooked Information in Categorical Variables with LLMs: An Application to Labor Market Mismatch," PIER Working Paper Archive 24-017, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania.

Handle: RePEc:pen:papers:24-017

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Christoph Riedl & Eric Bogert, 2024. "Effects of AI Feedback on Learning, the Skill Gap, and Intellectual Diversity," Papers 2409.18660, arXiv.org.
Carvajal, Daniel & Franco, Catalina & Isaksson, Siri, 2024. "Will Artificial Intelligence Get in the Way of Achieving Gender Equality?," Discussion Paper Series in Economics 3/2024, Norwegian School of Economics, Department of Economics, revised 20 Mar 2025.
Evangelos Katsamakas & Oleg V. Pavlov & Ryan Saklad, 2024. "Artificial intelligence and the transformation of higher education institutions," Papers 2402.08143, arXiv.org.
Draca, Mirko & Nathan, Max & Nguyen-Tien, Viet & Oliveira-Cunha, Juliana & Rosso, Anna & Valero, Anna, 2024. "The New Wave? The Role of Human Capital and STEM Skills in Technology Adoption in the UK," The Warwick Economics Research Paper Series (TWERPS) 1521, University of Warwick, Department of Economics.
- Mirko Draca & Max Nathan & Viet Nguyen-Tien & Juliana Oliveira-Cunha & Anna Rosso & Anna Valero, 2024. "The New Wave? The Role of Human Capital and STEM Skills in Technology Adoption in the UK," Development Working Papers 495, Centro Studi Luca d'Agliano, University of Milano.
- Draca, Mirko & Nathan, Max & Nguyen-Tien, Viet & Oliveira-Cunha, Juliana & Rosso, Anna & Valero, Anna, 2024. "The New Wave? The Role of Human Capital and STEM Skills in Technology Adoption in the UK," CAGE Online Working Paper Series 726, Competitive Advantage in the Global Economy (CAGE).
- Mirko Draca & Max Nathan & Viet Nguyen-Tien & Juliana Oliveira-Cunha & Anna Rosso & Anna Valero, 2024. "The new wave? The role of human capital and STEM skills in technology adoption in the UK," CEP Discussion Papers dp2040, Centre for Economic Performance, LSE.
- Draca, Mirko & Nathan, Max & Nguyen-Tien, Viet & Oliveira-Cunha, Juliana & Rosso, Anna & Valero, Anna, 2024. "The New Wave? The Role of Human Capital and STEM Skills in Technology Adoption in the UK," IZA Discussion Papers 17329, Institute of Labor Economics (IZA).
Berlinski, Elise & Morales, Jérémy & Sponem, Samuel, 2024. "Artificial imaginaries: Generative AIs as an advanced form of capitalism," CRITICAL PERSPECTIVES ON ACCOUNTING, Elsevier, vol. 99(C).
Caleb Peppiatt, 2024. "The Future of Work: Inequality, Artificial Intelligence, and What Can Be Done About It. A Literature Review," Papers 2408.13300, arXiv.org.
D'Alessandro, Francesco & Santarelli, Enrico & Vivarelli, Marco, 2024. "The KSTE+I approach and the advent of AI technologies: evidence from the European regions," GLO Discussion Paper Series 1473, Global Labor Organization (GLO).
Amali Matharaarachchi & Wishmitha Mendis & Kanishka Randunu & Daswin De Silva & Gihan Gamage & Harsha Moraliyage & Nishan Mills & Andrew Jennings, 2024. "Optimizing Generative AI Chatbots for Net-Zero Emissions Energy Internet-of-Things Infrastructure," Energies, MDPI, vol. 17(8), pages 1-19, April.
Anna Davies & Betsy Donald & Mia Gray, 2023. "The power of platforms—precarity and place," Cambridge Journal of Regions, Economy and Society, Cambridge Political Economy Society, vol. 16(2), pages 245-256.
Samir Huseynov, 2023. "ChatGPT and the Labor Market: Unraveling the Effect of AI Discussions on Students' Earnings Expectations," Papers 2305.11900, arXiv.org, revised Aug 2023.
Christian Peukert & Florian Abeillon & Jérémie Haese & Franziska Kaiser & Alexander Staub, 2024. "Strategic Behavior and AI Training Data," CESifo Working Paper Series 11099, CESifo.
Francesco D'Alessandro & Enrico Santarelli & Marco Vivarelli, 2024. "The KSTE+I approach and the AI technologies," DISCE - Working Papers del Dipartimento di Politica Economica dipe0039, Università Cattolica del Sacro Cuore, Dipartimenti e Istituti di Scienze Economiche (DISCE).
- D'Alessandro, Francesco & Santarelli, Enrico & Vivarelli, Marco, 2024. "The KSTE+I approach and the AI technologies," MERIT Working Papers 2024-016, United Nations University - Maastricht Economic and Social Research Institute on Innovation and Technology (MERIT).
Thomas Cantens, 2023. "How will the State think with the assistance of ChatGPT? The case of customs as an example of generative artificial intelligence in public administrations," CERDI Working papers hal-04233370, HAL.
Avi Goldfarb, 2024. "Pause artificial intelligence research? Understanding AI policy challenges," Canadian Journal of Economics/Revue canadienne d'économique, John Wiley & Sons, vol. 57(2), pages 363-377, May.
Anil R. Doshi & Oliver P. Hauser, 2023. "Generative artificial intelligence enhances creativity but reduces the diversity of novel content," Papers 2312.00506, arXiv.org, revised Mar 2024.
James Bono & Alec Xu, 2024. "Randomized Controlled Trials for Security Copilot for IT Administrators," Papers 2411.01067, arXiv.org, revised Nov 2024.
Ekaterina Novozhilova & Kate Mays & James E. Katz, 2024. "Looking towards an automated future: U.S. attitudes towards future artificial intelligence instantiations and their effect," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-11, December.
Anton Korinek & Donghyun Suh, 2024. "Scenarios for the Transition to AGI," NBER Working Papers 32255, National Bureau of Economic Research, Inc.
- Korinek, Anton & Suh, Donghyun, 2024. "Scenarios for the Transition to AGI," CEPR Discussion Papers 18928, C.E.P.R. Discussion Papers.
- Anton Korinek & Donghyun Suh, 2024. "Scenarios for the Transition to AGI," Papers 2403.12107, arXiv.org.
D’Alessandro, Francesco & Santarelli, Enrico & Vivarelli, Marco, 2024. "The Knowledge Spillover Theory of Entrepreneurship and Innovation (KSTE+I) Approach and the Advent of AI Technologies: Evidence from the European Regions," IZA Discussion Papers 17206, Institute of Labor Economics (IZA).
Albanesi, Stefania & Dias da Silva, Antonio & Jimeno, Juan Francisco & Lamo, Ana & Wabitsch, Alena, 2023. "New Technologies and Jobs in Europe," CEPR Discussion Papers 18220, C.E.P.R. Discussion Papers.
- Albanesi, Stefania & Da Silva, António Dias & Jimeno, Juan F. & Lamo, Ana & Wabitsch, Alena, 2023. "New technologies and jobs in Europe," Working Paper Series 2831, European Central Bank.
- Stefania Albanesi & António Dias da Silva & Juan F. Jimeno & Ana Lamo & Alena Wabitsch, 2023. "New Technologies and Jobs in Europe," NBER Working Papers 31357, National Bureau of Economic Research, Inc.
- Stefania Albanesi, 2023. "New Technologies and Jobs in Europe," Working Papers 2023-01, University of Miami, Department of Economics.
- Albanesi, Stefania & Silva, António Dias da & Jimeno, Juan F. & Lamo, Ana & Wabitsch, Alena, 2023. "New Technologies and Jobs in Europe," IZA Discussion Papers 16227, Institute of Labor Economics (IZA).
- Stefania Albanesi & António Dias da Silva & Juan F. Jimeno & Ana Lamo & Alena Wabitsch, 2024. "New Technologies and Jobs in Europe," Opportunity and Inclusive Growth Institute Working Papers 105, Federal Reserve Bank of Minneapolis.
- Stefania Albanesi & António Dias da Silva & Juan F. Jimeno & Ana Lamo & Alena Wabitsch, 2023. "New technologies and jobs in Europe," Working Papers 2322, Banco de España.

More about this item

Keywords

Large Language Models; Categorical Variables; Labor Market Mismatch;

JEL classification:

C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
J16 - Labor and Demographic Economics - - Demographic Economics - - - Economics of Gender; Non-labor Discrimination
J24 - Labor and Demographic Economics - - Demand and Supply of Labor - - - Human Capital; Skills; Occupational Choice; Labor Productivity
J31 - Labor and Demographic Economics - - Wages, Compensation, and Labor Costs - - - Wage Level and Structure; Wage Differentials

NEP fields

This paper has been announced in the following NEP Reports:

NEP-AIN-2024-08-19 (Artificial Intelligence)
NEP-BIG-2024-08-19 (Big Data)
NEP-LMA-2024-08-19 (Labor Markets - Supply, Demand, and Wages)
