IDEAS home Printed from https://ideas.repec.org/a/gam/jpubli/v13y2025i1p12-d1610567.html
   My bibliography  Save this article

The Origins and Veracity of References ‘Cited’ by Generative Artificial Intelligence Applications: Implications for the Quality of Responses

Author

Listed:
  • Dirk H. R. Spennemann

    (School of Agricultural, Environmental and Veterinary Sciences, Charles Sturt University, Albury, NSW 2640, Australia
    Libraries Research Group, Charles Sturt University, Wagga Wagga, NSW 2678, Australia)

Abstract

The public release of ChatGPT in late 2022 has resulted in considerable publicity and has led to widespread discussion of the usefulness and capabilities of generative Artificial intelligence (Ai) language models. Its ability to extract and summarise data from textual sources and present them as human-like contextual responses makes it an eminently suitable tool to answer questions users might ask. Expanding on a previous analysis of the capabilities of ChatGPT3.5, this paper tested what archaeological literature appears to have been included in the training phase of three recent generative Ai language models: ChatGPT4o, ScholarGPT, and DeepSeek R1. While ChatGPT3.5 offered seemingly pertinent references, a large percentage proved to be fictitious. While the more recent model ScholarGPT, which is purportedly tailored towards academic needs, performed much better, it still offered a high rate of fictitious references compared to the general models ChatGPT4o and DeepSeek. Using ‘cloze’ analysis to make inferences on the sources ‘memorized’ by a generative Ai model, this paper was unable to prove that any of the four genAi models had perused the full texts of the genuine references. It can be shown that all references provided by ChatGPT and other OpenAi models, as well as DeepSeek, that were found to be genuine, have also been cited on Wikipedia pages. This strongly indicates that the source base for at least some, if not most, of the data is found in those pages and thus represents, at best, third-hand source material. This has significant implications in relation to the quality of the data available to generative Ai models to shape their answers. The implications of this are discussed.

Suggested Citation

  • Dirk H. R. Spennemann, 2025. "The Origins and Veracity of References ‘Cited’ by Generative Artificial Intelligence Applications: Implications for the Quality of Responses," Publications, MDPI, vol. 13(1), pages 1-23, March.
  • Handle: RePEc:gam:jpubli:v:13:y:2025:i:1:p:12-:d:1610567
    as

    Download full text from publisher

    File URL: https://www.mdpi.com/2304-6775/13/1/12/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2304-6775/13/1/12/
    Download Restriction: no
    ---><---

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jpubli:v:13:y:2025:i:1:p:12-:d:1610567. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.