
Active learning with biased non-response to label requests

Authors

  • Robinson, Thomas
  • Tax, Niek
  • Mudd, Richard
  • Guy, Ido

Abstract

Active learning can improve the efficiency of training prediction models by identifying the most informative new labels to acquire. However, non-response to label requests can degrade active learning's effectiveness in real-world contexts. We conceptualise this degradation by considering the type of non-response present in the data, demonstrating that biased non-response is particularly detrimental to model performance. We argue that biased non-response is likely in contexts where the labelling process, by nature, relies on user interactions. To mitigate the impact of biased non-response, we propose a cost-based correction to the sampling strategy, the Upper Confidence Bound of the Expected Utility (UCB-EU), which can plausibly be applied to any active learning algorithm. Through experiments, we demonstrate that our method successfully reduces the harm from labelling non-response in many settings. However, we also characterise settings where the non-response bias in the annotations remains detrimental under UCB-EU for specific sampling methods and data generating processes. Finally, we evaluate our method on a real-world dataset from an e-commerce platform. We show that UCB-EU yields substantial performance improvements to conversion models that are trained on clicked impressions. Most generally, this research serves both to better conceptualise the interplay between types of non-response and model improvement via active learning, and to provide a practical, easy-to-implement correction that mitigates model degradation.
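The difference between random and biased non-response can be illustrated with a small simulation. This is a minimal sketch, not the authors' UCB-EU method or experimental setup: the function name, its parameters, and the assumption that the chance of a response depends only on the binary label are hypothetical. When responses are random, the labels that come back have roughly the same positive rate as the full population. When positives respond more often (for example, if impressions that would convert are also more likely to be clicked), the observed labels overstate the positive rate.

```python
import random


def observed_positive_rate(n, p_pos, p_respond_pos, p_respond_neg, seed=0):
    """Hypothetical simulation of n label requests whose response
    probability may depend on the (binary) label.

    Returns (positive rate over all requests, positive rate among responders).
    """
    rng = random.Random(seed)
    # True binary labels, e.g. whether an impression would convert.
    labels = [rng.random() < p_pos for _ in range(n)]
    # A label is only observed if the request gets a response; the response
    # probability is allowed to differ between positive and negative labels.
    observed = [y for y in labels
                if rng.random() < (p_respond_pos if y else p_respond_neg)]
    if not observed:
        raise ValueError("no label requests received a response")
    return sum(labels) / n, sum(observed) / len(observed)


# Random non-response: responders resemble the full population.
print(observed_positive_rate(100_000, 0.2, 0.5, 0.5))
# Biased non-response: positives respond more often, so the observed rate
# is inflated towards 0.18 / (0.18 + 0.24), roughly 0.43, instead of 0.2.
print(observed_positive_rate(100_000, 0.2, 0.9, 0.3))
```

A model trained only on the responders' labels inherits this shift, which is the kind of bias a correction such as UCB-EU is meant to address.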

Suggested Citation

  • Robinson, Thomas & Tax, Niek & Mudd, Richard & Guy, Ido, 2024. "Active learning with biased non-response to label requests," LSE Research Online Documents on Economics 123029, London School of Economics and Political Science, LSE Library.
  • Handle: RePEc:ehl:lserod:123029

    Download full text from publisher

    File URL: http://eprints.lse.ac.uk/123029/
    File Function: Open access version.
    Download Restriction: no

    More about this item

    Keywords

    active learning; non-response; missing data; e-commerce; CTR prediction

    JEL classification:

    • L81 - Industrial Organization - - Industry Studies: Services - - - Retail and Wholesale Trade; e-Commerce

