Printed from https://ideas.repec.org/a/taf/jnlasa/v118y2023i542p805-817.html

Feature Screening for Interval-Valued Response with Application to Study Association between Posted Salary and Required Skills

Authors

  • Wei Zhong
  • Chen Qian
  • Wanjun Liu
  • Liping Zhu
  • Runze Li

Abstract

Quantifying the differences in returns to skills using online job advertisement data is important and has attracted great interest in both labor economics and statistics. In this article, we study the relationship between the posted salary and the job requirements in online labor markets. Two challenges arise. First, the posted salary is always presented in an interval-valued form, for example, 5k–10k yuan per month. Simply taking the mid-point or the lower bound as a proxy for the salary may result in biased estimators. Second, the number of potential skill words generated from the job advertisements by word segmentation and used as predictors is very large, and many of them may not contribute to the salary. To this end, we propose a new feature screening method, Absolute Distribution Difference Sure Independence Screening (ADD-SIS), to select important skill words for the interval-valued response. The marginal utility for feature screening is based on the difference of estimated distribution functions via nonparametric maximum likelihood estimation, which makes full use of the interval information. It is model-free and robust to outliers. Numerical simulations show that the new method, which uses the interval information, selects important predictors more efficiently than methods based only on single points of the intervals. In the real data application, we study the text data of job advertisements for data scientists and data analysts on a major Chinese online job posting website, and explore the important skill words for the salary. We find that skill words such as optimization, long short-term memory (LSTM), convolutional neural networks (CNN), and collaborative filtering are positively correlated with the salary, while words such as Excel, Office, and data collection may contribute negatively to the salary. Supplementary materials for this article are available online.
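
The abstract does not state the exact formula of the ADD-SIS marginal utility, so the following is only a rough sketch of the general idea it describes, not the authors' implementation. It assumes a binary skill-word indicator per advertisement, estimates the salary distribution function in each group from the posted intervals with a simple self-consistency (EM) form of nonparametric maximum likelihood, and scores each word by the mean absolute difference between the two estimated distribution functions. The function names `interval_npmle_cdf` and `add_utility`, the averaging over a pooled endpoint grid, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def interval_npmle_cdf(left, right, grid, n_iter=2000, tol=1e-10):
    """Self-consistency (EM) estimate of a CDF from intervals (left, right].

    Probability mass is placed on the cells (grid[j], grid[j+1]]; the
    returned array holds the estimated CDF at grid[1:]. Every interval
    must contain at least one cell, which holds when left < right and
    the grid includes all interval endpoints.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    grid = np.asarray(grid, dtype=float)
    lo, hi = grid[:-1], grid[1:]
    # alpha[i, j] = 1 when cell j lies inside observation i's interval.
    alpha = ((left[:, None] <= lo) & (hi <= right[:, None])).astype(float)
    p = np.full(lo.size, 1.0 / lo.size)  # start from uniform cell masses
    for _ in range(n_iter):
        w = alpha * p
        w /= w.sum(axis=1, keepdims=True)  # E-step: spread each observation over its cells
        p_new = w.mean(axis=0)             # M-step: average the cell memberships
        done = np.abs(p_new - p).max() < tol
        p = p_new
        if done:
            break
    return np.cumsum(p)


def add_utility(x, left, right):
    """Illustrative utility for a 0/1 predictor: mean |F_1 - F_0| on a shared grid."""
    x = np.asarray(x)
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if x.min() == x.max():
        return 0.0  # only one group present, nothing to compare
    grid = np.unique(np.concatenate([left, right]))  # pooled endpoints
    f1 = interval_npmle_cdf(left[x == 1], right[x == 1], grid)
    f0 = interval_npmle_cdf(left[x == 0], right[x == 0], grid)
    return float(np.mean(np.abs(f1 - f0)))


# Simulated data (hypothetical): only the first skill word shifts the latent salary.
rng = np.random.default_rng(0)
n, p = 400, 5
X = rng.integers(0, 2, size=(n, p))
latent = 10.0 + 5.0 * X[:, 0] + rng.normal(0.0, 2.0, size=n)
# Posted intervals with integer endpoints and random widths around the latent value.
left = np.floor(latent) - rng.integers(0, 3, size=n)
right = np.floor(latent) + 1 + rng.integers(0, 3, size=n)

utilities = np.array([add_utility(X[:, j], left, right) for j in range(p)])
ranking = np.argsort(-utilities)  # predictors ordered from most to least relevant
print(ranking)
```

Screening then amounts to keeping the top-ranked predictors. Because the utility compares whole estimated distribution functions rather than fitting a regression on mid-points, it needs no model for the salary, which is in the spirit of the model-free property claimed in the abstract.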

Suggested Citation

  • Wei Zhong & Chen Qian & Wanjun Liu & Liping Zhu & Runze Li, 2023. "Feature Screening for Interval-Valued Response with Application to Study Association between Posted Salary and Required Skills," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 118(542), pages 805-817, April.
  • Handle: RePEc:taf:jnlasa:v:118:y:2023:i:542:p:805-817
    DOI: 10.1080/01621459.2022.2152342

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/01621459.2022.2152342
    Download Restriction: Access to full text is restricted to subscribers.

    File URL: https://libkey.io/10.1080/01621459.2022.2152342?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Shuaishuai Chen & Jun Lu, 2023. "Quantile-Composited Feature Screening for Ultrahigh-Dimensional Data," Mathematics, MDPI, vol. 11(10), pages 1-21, May.
    2. Haowen Bao & Yongmiao Hong & Yuying Sun & Shouyang Wang, 2024. "Sparse Interval-valued Time Series Modeling with Machine Learning," Papers 2411.09452, arXiv.org.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.