Printed from https://ideas.repec.org/a/gam/jmathe/v9y2021i19p2475-d649449.html

Data Mining of Job Requirements in Online Job Advertisements Using Machine Learning and SDCA Logistic Regression

Authors
  • Bogdan Walek

    (Department of Informatics and Computers, University of Ostrava, 30 dubna 22, 70103 Ostrava, Czech Republic)

  • Ondrej Pektor

    (Department of Informatics and Computers, University of Ostrava, 30 dubna 22, 70103 Ostrava, Czech Republic)

Abstract

Many job portals currently offer job positions in the form of job advertisements. In this article, we propose an approach to mining data from job advertisements on job portals, focusing on mining job requirements from individual advertisements. The proposed system consists of a data mining module, a machine learning module, and a postprocessing module. The machine learning module is based on SDCA logistic regression. The postprocessing module includes several approaches to increase the success rate of job requirement identification. The proposed system was verified on the 20 most searched IT job positions from the selected job portal. In total, 9971 job advertisements were analyzed. In verification, the system found all job requirements in 80% of the analyzed advertisements. The detected job requirements were also compared with the Open Skills database. Based on this database and the extension of IT job positions with other typical job skills, we created a list of the most frequent job skills in the selected IT job positions. The main contribution is the development of a universal system for detecting job requirements in job advertisements. The proposed approach can be used not only for IT positions but also for other job positions. The presented data mining module can also be used with various job portals.
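
The abstract names SDCA (stochastic dual coordinate ascent) logistic regression as the basis of the machine learning module, but this record gives no implementation details. As a rough illustration only, the sketch below trains an L2-regularized logistic regression with SDCA to separate requirement-like text from other advertisement text. The toy sentences, the bag-of-words features, the regularization value, and every function name are assumptions made for this example and are not taken from the article.

```python
import math
import random
import re

# Hypothetical toy data: +1 = job requirement, -1 = other advertisement text.
TRAIN = [
    ("Experience with Java and SQL required", 1),
    ("Knowledge of Python and Linux", 1),
    ("Good command of English", 1),
    ("Experience with Docker and Git", 1),
    ("We offer a competitive salary", -1),
    ("Flexible working hours and home office", -1),
    ("Our company is based in Ostrava", -1),
    ("We offer five weeks of holiday", -1),
]

def tokenize(text):
    return re.findall(r"[a-z0-9+#]+", text.lower())

def build_vocab(texts):
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab) + 1)  # index 0 is the bias
    return vocab

def featurize(text, vocab):
    # Sparse bag-of-words counts plus a constant bias feature at index 0.
    x = {0: 1.0}
    for tok in tokenize(text):
        j = vocab.get(tok)
        if j is not None:
            x[j] = x.get(j, 0.0) + 1.0
    return x

def dot(w, x):
    return sum(w[j] * v for j, v in x.items())

def solve_coordinate(margin, b_old, q):
    # Maximize the dual objective in one coordinate b = y_i * alpha_i.
    # Its derivative g(b) = log((1-b)/b) - margin - q*(b - b_old) is strictly
    # decreasing on (0, 1), so the root is found by bisection.
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        g = math.log((1.0 - mid) / mid) - margin - q * (mid - b_old)
        if g > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def train_sdca(X, y, dim, lam=0.01, epochs=300, seed=0):
    n = len(X)
    rng = random.Random(seed)
    alpha = [0.0] * n   # one dual variable per training example
    w = [0.0] * dim     # primal weights, kept equal to sum(alpha_i * x_i) / (lam * n)
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi, yi = X[i], y[i]
            q = sum(v * v for v in xi.values()) / (lam * n)
            b_old = yi * alpha[i]
            b_new = solve_coordinate(yi * dot(w, xi), b_old, q)
            delta = yi * (b_new - b_old)  # change in alpha_i
            alpha[i] += delta
            for j, v in xi.items():
                w[j] += delta * v / (lam * n)
    return w

def predict_proba(w, x):
    return 1.0 / (1.0 + math.exp(-dot(w, x)))

VOCAB = build_vocab(t for t, _ in TRAIN)
X = [featurize(t, VOCAB) for t, _ in TRAIN]
Y = [label for _, label in TRAIN]
W = train_sdca(X, Y, dim=len(VOCAB) + 1)

def is_requirement(text):
    return predict_proba(W, featurize(text, VOCAB)) > 0.5

print(is_requirement("Experience with Python and Git"))
```

The logistic loss has no closed-form dual coordinate step, so the sketch solves each one-dimensional subproblem by bisection. Production SDCA trainers typically use faster approximate updates.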

Suggested Citation

  • Bogdan Walek & Ondrej Pektor, 2021. "Data Mining of Job Requirements in Online Job Advertisements Using Machine Learning and SDCA Logistic Regression," Mathematics, MDPI, vol. 9(19), pages 1-32, October.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:19:p:2475-:d:649449

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/19/2475/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/19/2475/
    Download Restriction: no

    References listed on IDEAS

    1. Jyldyz Djumalieva & Antonio Lima & Cath Sleeman, 2018. "Classifying Occupations According to Their Skill Requirements in Job Advertisements," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2018-04, Economic Statistics Centre of Excellence (ESCoE).
    2. Nik Dawson & Marian-Andrei Rizoiu & Benjamin Johnston & Mary-Anne Williams, 2020. "Predicting Skill Shortages in Labor Markets: A Machine Learning Approach," Papers 2004.01311, arXiv.org, revised Aug 2020.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ziqiao Ao & Gergely Horvath & Chunyuan Sheng & Yifan Song & Yutong Sun, 2022. "Skill requirements in job advertisements: A comparison of skill-categorization methods based on explanatory power in wage regressions," Papers 2207.12834, arXiv.org.
    2. Caglayan, Mustafa & Talavera, Oleksandr & Xiong, Lin, 2022. "Female small business owners in China: Discouraged, not discriminated," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 80(C).
    3. Faryna, Oleksandr & Pham, Tho & Talavera, Oleksandr & Tsapin, Andriy, 2020. "Wage Setting and Unemployment: Evidence from Online Job Vacancy Data," GLO Discussion Paper Series 503, Global Labor Organization (GLO).
    4. Josten, Cecily & Lordan, Grace, 2022. "Automation and the changing nature of work," LSE Research Online Documents on Economics 114539, London School of Economics and Political Science, LSE Library.
    5. Alvin Vista, 2020. "Data-Driven Identification of Skills for the Future: 21st-Century Skills for the 21st-Century Workforce," SAGE Open, vol. 10(2), pages 21582440209, April.
    6. Jyldyz Djumalieva & Cath Sleeman, 2018. "An Open and Data-driven Taxonomy of Skills Extracted from Online Job Adverts," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2018-13, Economic Statistics Centre of Excellence (ESCoE).
