
Using Large Language Models for Qualitative Analysis can Introduce Serious Bias

Author

Listed:
  • Julian Ashwin
  • Aditya Chhabra
  • Vijayendra Rao

Abstract

Large Language Models (LLMs) are quickly becoming ubiquitous, but the implications for social science research are not yet well understood. This paper asks whether LLMs can help us analyse large-N qualitative data from open-ended interviews, with an application to transcripts of interviews with Rohingya refugees in Cox's Bazaar, Bangladesh. We find that a great deal of caution is needed in using LLMs to annotate text, as there is a risk of introducing biases that can lead to misleading inferences. We mean bias here in the technical sense: the errors that LLMs make in annotating interview transcripts are not random with respect to the characteristics of the interview subjects. Training simpler supervised models on high-quality human annotations with flexible coding leads to less measurement error and bias than LLM annotation. Therefore, given that some high-quality annotations are necessary in order to assess whether an LLM introduces bias, we argue that it is probably preferable to train a bespoke model on these annotations than to use an LLM for annotation.
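
The comparison described in the abstract can be made concrete with a small sketch. The snippet below is illustrative only and is not the authors' pipeline: the column names (text, human_label, llm_label, group), the TF-IDF features, and the logistic-regression classifier are assumptions chosen for brevity. It trains a simple supervised model on human annotations and then compares that model's error rate with the LLM's error rate within respondent subgroups, since bias in the paper's technical sense means annotation errors that are non-random with respect to respondent characteristics.

    # Illustrative sketch only, not the authors' actual method.
    # Assumed columns: text, human_label, llm_label, group.
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def compare_annotation_bias(df: pd.DataFrame) -> pd.DataFrame:
        """Return mean error rates (vs. human labels) by respondent group
        for (a) a simple supervised model trained on the human labels and
        (b) the LLM annotations."""
        train, test = train_test_split(df, test_size=0.4, random_state=0)

        # (a) Bespoke supervised model trained on high-quality human annotations.
        vec = TfidfVectorizer(min_df=1)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(vec.fit_transform(train["text"]), train["human_label"])

        test = test.copy()
        test["model_error"] = (clf.predict(vec.transform(test["text"]))
                               != test["human_label"]).astype(int)
        # (b) LLM annotations, scored against the same human "ground truth".
        test["llm_error"] = (test["llm_label"] != test["human_label"]).astype(int)

        # If error rates differ systematically across groups, the annotations
        # are biased in the paper's technical sense.
        return test.groupby("group")[["model_error", "llm_error"]].mean()

A large, systematic gap in either error column across groups would be the kind of bias the paper warns about; comparing the two columns shows whether the bespoke model or the LLM introduces more of it.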

Suggested Citation

  • Julian Ashwin & Aditya Chhabra & Vijayendra Rao, 2023. "Using Large Language Models for Qualitative Analysis can Introduce Serious Bias," Papers 2309.17147, arXiv.org, revised Oct 2023.
  • Handle: RePEc:arx:papers:2309.17147

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2309.17147
    File Function: Latest version
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.

      More about this item

      NEP fields

      This paper has been announced in the following NEP Reports:

      Statistics

      Access and download statistics

      Corrections

      All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2309.17147. See general information about how to correct material in RePEc.

      If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

      If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

      If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

      For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/ .

      Please note that corrections may take a couple of weeks to filter through the various RePEc services.

      IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.