
Three Methods for Occupation Coding Based on Statistical Learning

Author

Listed:
  • Gweon Hyukjun

    (Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1 Canada)

  • Schonlau Matthias

    (Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1 Canada)

  • Kaczmirek Lars

    (GESIS – Leibniz-Institute for the Social Sciences, PO Box 12 21 55, D-68072 Mannheim, Germany)

  • Blohm Michael

    (GESIS – Leibniz-Institute for the Social Sciences, PO Box 12 21 55, D-68072 Mannheim, Germany)

  • Steiner Stefan

    (Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1 Canada)

Abstract

Occupation coding, an important task in official statistics, refers to coding a respondent’s text answer into one of many hundreds of occupation codes. To date, occupation coding is still at least partially conducted manually, at great expense. We propose three methods for automatic coding: combining separate models for the detailed occupation codes and for aggregate occupation codes, a hybrid method that combines a duplicate-based approach with a statistical learning algorithm, and a modified nearest neighbor approach. Using data from the German General Social Survey (ALLBUS), we show that the proposed methods improve on both the coding accuracy of the underlying statistical learning algorithm and the coding accuracy of duplicates where duplicates exist. Further, we find defining duplicates based on ngram variables (a concept from text mining) is preferable to one based on exact string matches.
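The difference between the two duplicate definitions, and the general shape of a hybrid method, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that "ngram variables" can be approximated by the set of lowercased word unigrams in an answer, and that the statistical learning algorithm can be stood in for by an arbitrary fallback function. All function names, the toy answers, and the codes "A", "B", "C" are hypothetical.

```python
import re
from collections import Counter, defaultdict


def exact_key(text):
    """Duplicate key for exact string matching (only outer whitespace is stripped)."""
    return text.strip()


def ngram_key(text):
    """Duplicate key from word unigrams (assumption): lowercase the answer,
    drop punctuation, and ignore word order and repeated words."""
    return frozenset(re.findall(r"\w+", text.lower()))


def build_index(answers, codes, key_fn):
    """Map each duplicate key to a count of the codes seen for it in training data."""
    index = defaultdict(Counter)
    for answer, code in zip(answers, codes):
        index[key_fn(answer)][code] += 1
    return index


def hybrid_predict(text, index, key_fn, fallback):
    """Use the most frequent code among training duplicates if any exist;
    otherwise defer to the fallback (a stand-in for a learned model)."""
    votes = index.get(key_fn(text))
    if votes:
        return votes.most_common(1)[0][0]
    return fallback(text)


# Toy training data with hypothetical codes (not real occupation codes).
train_answers = ["Nurse", "nurse", "Truck driver", "driver, truck", "Teacher"]
train_codes = ["A", "A", "B", "B", "C"]


def fallback(text):
    """Placeholder for a statistical learning algorithm (assumption)."""
    return "UNKNOWN"


exact_index = build_index(train_answers, train_codes, exact_key)
ngram_index = build_index(train_answers, train_codes, ngram_key)

new_answer = "truck  Driver"
print(hybrid_predict(new_answer, exact_index, exact_key, fallback))
print(hybrid_predict(new_answer, ngram_index, ngram_key, fallback))
```

In this toy setup the exact-match definition finds no duplicate for the new answer and defers to the fallback, while the unigram-based definition treats differences in case, punctuation, spacing, and word order as irrelevant and reuses the code of the matching training answers.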

Suggested Citation

  • Gweon Hyukjun & Schonlau Matthias & Kaczmirek Lars & Blohm Michael & Steiner Stefan, 2017. "Three Methods for Occupation Coding Based on Statistical Learning," Journal of Official Statistics, Sciendo, vol. 33(1), pages 101-122, March.
  • Handle: RePEc:vrs:offsta:v:33:y:2017:i:1:p:101-122:n:6
    DOI: 10.1515/jos-2017-0006

    Download full text from publisher

    File URL: https://doi.org/10.1515/jos-2017-0006
    Download Restriction: no


    References listed on IDEAS

    1. repec:aia:aiaswp:151 is not listed on IDEAS
    2. Tijdens Kea, 2014. "Dropout Rates and Response Times of an Occupation Search Tree in a Web Survey," Journal of Official Statistics, Sciendo, vol. 30(1), pages 23-43, March.
    3. Michele Belloni & Agar Brugiavini & Elena Maschi & Kea Tijdens, 2014. "Measurement error in occupational coding:an analysis on SHARE data," Working Papers 2014: 24, Department of Economics, University of Venice "Ca' Foscari".
    4. Peter Elias, 1997. "Occupational Classification (ISCO-88): Concepts, Methods, Reliability, Validity and Cross-National Comparability," OECD Labour Market and Social Policy Occasional Papers 20, OECD Publishing.

    Citations

    Cited by:

    1. Jyldyz Djumalieva & Antonio Lima & Cath Sleeman, 2018. "Classifying Occupations According to Their Skill Requirements in Job Advertisements," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2018-04, Economic Statistics Centre of Excellence (ESCoE).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Schierholz, Malte & Gensicke, Miriam & Tschersich, Nikolai, 2016. "Occupation coding during the interview," IAB-Discussion Paper 201617, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].
    2. Massing Natascha & Wasmer Martina & Wolf Christof & Zuell Cornelia, 2019. "How Standardized is Occupational Coding? A Comparison of Results from Different Coding Agencies in Germany," Journal of Official Statistics, Sciendo, vol. 35(1), pages 167-187, March.
    3. Tijdens Kea, 2014. "Dropout Rates and Response Times of an Occupation Search Tree in a Web Survey," Journal of Official Statistics, Sciendo, vol. 30(1), pages 23-43, March.
    4. Brian Fabo & Miroslav BEBLAVY & Karolien LENAERTS & Zachary KILHOFFER, 2017. "An overview of European Platforms: Scope and Business Models," JRC Research Reports JRC109190, Joint Research Centre.
    5. Alejandra Bellatin & Gabriela Galassi, 2022. "What COVID-19 May Leave Behind: Technology-Related Job Postings in Canada," Staff Working Papers 22-17, Bank of Canada.
    6. Malgorzata Mikucka, 2016. "How to Measure Employment Status and Occupation in Analyses of Survey Data? (Jak mierzyc status zatrudnienia i pozycjê zawodowa w analizach danych sondazowych?)," Problemy Zarzadzania, University of Warsaw, Faculty of Management, vol. 14(60), pages 40-60.
    7. Michael Fritsch & Michael Stützer, 2012. "The Geography of Creative People in Germany revisited," Jena Economics Research Papers 2012-065, Friedrich-Schiller-University Jena.
    8. Latorre, Maria C., 2014. "CGE analysis of the impact of foreign direct investment and tariff reform on female and male wages," Policy Research Working Paper Series 7073, The World Bank.
    9. Kässi, Otto & Lehdonvirta, Vili, 2018. "Online labour index: Measuring the online gig economy for policy and research," Technological Forecasting and Social Change, Elsevier, vol. 137(C), pages 241-248.
    10. Necker, Sarah & Voskort, Andrea, 2014. "Intergenerational transmission of risk attitudes – A revealed preference approach," European Economic Review, Elsevier, vol. 65(C), pages 66-89.
    11. Rafael Muñoz de Bustillo & Enrique Fernández-Macías & José-Ignacio Antón & Fernando Esteve, 2011. "Measuring More than Money," Books, Edward Elgar Publishing, number 14072.
    12. Turrell, Arthur & Thurgood, James & Djumalieva, Jyldyz & Copple, David & Speigner, Bradley, 2018. "Using online job vacancies to understand the UK labour market from the bottom-up," Bank of England working papers 742, Bank of England.
    13. Magdalena Smyk & Joanna Tyrowicz & Lucas van der Velde, 2021. "A Cautionary Note on the Reliability of the Online Survey Data: The Case of Wage Indicator," Sociological Methods & Research, , vol. 50(1), pages 429-464, February.
    14. Latorre, María C., 2016. "A CGE Analysis of the Impact of Foreign Direct Investment and Tariff Reform on Female and Male Workers in Tanzania," World Development, Elsevier, vol. 77(C), pages 346-366.
    15. Fauser, Margit & Liebau, Elisabeth & Voigtländer, Sven & Tuncer, Hidayet & Faist, Thomas & Razum, Oliver, 2015. "Measuring Transnationality of Immigrants in Germany: Prevalence and Relationship with Social Inequalities," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 38(9), pages 1497-1519.
    16. Leuze, Kathrin, 2010. "Smooth Path or Long and Winding Road? How Institutions Shape the Transition from Higher Education to Work," EconStor Books, ZBW - Leibniz Information Centre for Economics, number 251573, December.
    17. Malte Schierholz, 2018. "Eine Hilfsklassifikation mit Tätigkeitsbeschreibungen für Zwecke der Berufskodierung [An auxiliary classification with work activity descriptions for occupation coding]," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 12(3), pages 285-298, December.
    18. Han, Seong Won, 2016. "National education systems and gender gaps in STEM occupational expectations," International Journal of Educational Development, Elsevier, vol. 49(C), pages 175-187.
    19. Fritsch, Michael & Stuetzer, Michael, 2008. "The Geography of Creative People in Germany," MPRA Paper 21965, University Library of Munich, Germany.
    20. Jyldyz Djumalieva & Antonio Lima & Cath Sleeman, 2018. "Classifying Occupations According to Their Skill Requirements in Job Advertisements," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2018-04, Economic Statistics Centre of Excellence (ESCoE).
