Multilingual hierarchical classification of job advertisements for job vacancy statistics

Authors

  • Maciej Beręsewicz
  • Marek Wydmuch
  • Herman Cherniaiev
  • Robert Pater

Abstract

The goal of this paper is to develop a multilingual classifier and conditional probability estimator of occupation codes for online job advertisements in accordance with the International Standard Classification of Occupations (ISCO) extended with the Polish Classification of Occupations and Specializations (KZiS), which is analogous to the European Classification of Occupations. In this paper, we utilise a range of data sources, including a novel one, namely the Central Job Offers Database, which is a register of all vacancies submitted to Public Employment Offices. Staff at these offices code the vacancies according to the ISCO and KZiS. A hierarchical multi-class classifier has been developed based on the transformer architecture. The classifier begins by assigning the jobs found in advertisements to the broadest 1-digit occupational group, and then narrows the assignment to a 6-digit occupation code. We show that incorporating the hierarchical structure of occupations improves prediction accuracy by 1-2 percentage points, particularly for the hand-coded online job advertisements. Finally, bilingual (Polish and English) and multilingual (24 languages) models are developed based on data translated using closed and open-source software. The open-source software is provided for the benefit of the official statistics community, with a particular focus on international comparability.
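
The top-down assignment and the conditional probability estimate described in the abstract can be illustrated with a minimal sketch. It assumes that, for a single advertisement, some model supplies the probability of each child code given its parent group; the `COND` table, the occupation codes, and all numbers below are invented for illustration and are not taken from the paper, whose classifier is transformer-based. The toy hierarchy has only three levels instead of the full chain down to 6 digits.

```python
from math import prod

# Hypothetical P(child | parent) for ONE advertisement, keyed by parent prefix.
# The root is the empty string. Codes and probabilities are made up.
COND = {
    "": {"2": 0.6, "5": 0.4},
    "2": {"25": 0.9, "24": 0.1},
    "5": {"52": 1.0},
    "25": {"251401": 0.7, "251202": 0.3},
    "24": {"243106": 1.0},
    "52": {"522301": 1.0},
}


def path_to(code):
    """Return the (parent, child) steps leading from the root to `code`."""
    steps = []
    parent = ""
    while parent != code:
        # Pick the child of the current node that is a prefix of the target code.
        child = next(c for c in COND[parent] if code.startswith(c))
        steps.append((parent, child))
        parent = child
    return steps


def joint_probability(code):
    """Probability of a leaf code: product of conditionals along its path."""
    return prod(COND[parent][child] for parent, child in path_to(code))


def greedy_decode():
    """Start at the broadest group and follow the most probable child each time."""
    node = ""
    while node in COND:
        node = max(COND[node], key=COND[node].get)
    return node


if __name__ == "__main__":
    best = greedy_decode()
    print(best, round(joint_probability(best), 3))
```

Because each leaf probability is a product of conditionals along one path, the leaf probabilities of the whole tree sum to one, which is what makes the output usable as a probability estimate rather than only a label. Greedy descent is the simplest decoding choice here; it is an assumption of this sketch, not a statement about the paper's inference procedure.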

Suggested Citation

  • Maciej Beręsewicz & Marek Wydmuch & Herman Cherniaiev & Robert Pater, 2024. "Multilingual hierarchical classification of job advertisements for job vacancy statistics," Papers 2411.03779, arXiv.org.
  • Handle: RePEc:arx:papers:2411.03779

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2411.03779
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Maciej Beręsewicz & Herman Cherniaiev & Robert Pater, 2021. "Estimating the number of entities with vacancies using administrative and online data," Papers 2106.03263, arXiv.org.
    2. Nicholas Bloom & Tarek Alexander Hassan & Aakash Kalyani & Josh Lerner & Ahmed Tahoun, 2021. "The diffusion of disruptive technologies," CEP Discussion Papers dp1798, Centre for Economic Performance, LSE.
    3. José Azar & Emiliano Huet & Ioana Marinescu & Bledi Taska & Till von, 2024. "Minimum Wage Employment Effects and Labour Market Concentration," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 91(4), pages 1843-1883.
    4. John Carter Braxton & Kyle F. Herkenhoff & Jonathan Rothbaum & Lawrence Schmidt, 2021. "Changing Income Risk across the US Skill Distribution: Evidence from a Generalized Kalman Filter," Opportunity and Inclusive Growth Institute Working Papers 55, Federal Reserve Bank of Minneapolis.
    5. Hershbein, Brad, 2018. "Discussion for JME special issue: APST paper," Journal of Monetary Economics, Elsevier, vol. 97(C), pages 68-70.
    6. Barth, Erling & Davis, James C. & Freeman, Richard B. & McElheran, Kristina, 2023. "Twisting the demand curve: Digitalization and the older workforce," Journal of Econometrics, Elsevier, vol. 233(2), pages 443-467.
    7. Grinis, Inna, 2017. "The STEM requirements of "non-STEM" jobs: evidence from UK online vacancy postings and implications for skills & knowledge shortages," LSE Research Online Documents on Economics 85123, London School of Economics and Political Science, LSE Library.
    8. Qin, Fei & Wu, Steven Y., 2022. "Estimating Consumer Segments and Choices from Limited Information: The Application of Machine Learning Methods," 2022 Annual Meeting, July 31-August 2, Anaheim, California 322473, Agricultural and Applied Economics Association.
    9. Stephan Brunow & Stefanie Lösch & Ostap Okhrin, 2022. "Labor market tightness and individual wage growth: evidence from Germany," Journal for Labour Market Research, Springer;Institute for Employment Research/ Institut für Arbeitsmarkt- und Berufsforschung (IAB), vol. 56(1), pages 1-21, December.
    10. David Deming & Lisa B. Kahn, 2018. "Skill Requirements across Firms and Labor Markets: Evidence from Job Postings for Professionals," Journal of Labor Economics, University of Chicago Press, vol. 36(S1), pages 337-369.
    11. Filippos Petroulakis, 2023. "Task Content and Job Losses in the Great Lockdown," ILR Review, Cornell University, ILR School, vol. 76(3), pages 586-613, May.
    12. Zilian, Laura S. & Zilian, Stella S. & Jäger, Georg, 2021. "Labour market polarisation revisited: evidence from Austrian vacancy data," Journal for Labour Market Research, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany], vol. 55, pages 1-7.
    13. Georg Graetz & Guy Michaels, 2017. "Is Modern Technology Responsible for Jobless Recoveries?," American Economic Review, American Economic Association, vol. 107(5), pages 168-173, May.
    14. Audra Bowlus & Lance Lochner & Chris Robinson & Eda Suleymanoglu, 2023. "Wages, Skills, and Skill-Biased Technical Change: The Canonical Model Revisited," Journal of Human Resources, University of Wisconsin Press, vol. 58(6), pages 1783-1819.
    15. Stephen Hansen & Tejas Ramdas & Raffaella Sadun & Joe Fuller, 2021. "The Demand for Executive Skills," NBER Working Papers 28959, National Bureau of Economic Research, Inc.
    16. Brancatelli, Calogero & Marguerie, Alicia Charlene & Koettl-Brodmann, Stefanie, 2020. "Job Creation and Demand for Skills in Kosovo: What Can We Learn from Job Portal Data?," Policy Research Working Paper Series 9266, The World Bank.
    17. Karelin, Iliya & Kapelyuk, Sergey, 2023. "Digital Skills of Russian Citizens: Regional Differences," MPRA Paper 119494, University Library of Munich, Germany.
    18. Alex Chernoff & Casey Warman, 2023. "COVID-19 and implications for automation," Applied Economics, Taylor & Francis Journals, vol. 55(17), pages 1939-1957, April.
    19. Mary A. Burke & Alicia Sasser Modestino & Shahriar Sadighi & Rachel B. Sederberg & Bledi Taska, 2019. "No Longer Qualified? Changes in the Supply and Demand for Skills within Occupations," Working Papers 20-3, Federal Reserve Bank of Boston.
    20. Klein, Daniel & Ludwig, Christopher A. & Nicolay, Katharina, 2020. "Internal digitalization and tax-efficient decision making," ZEW Discussion Papers 20-051, ZEW - Leibniz Centre for European Economic Research.