Exploring the applicability of large language models to citation context analysis

Author

Listed:
  • Kai Nishikawa

    (University of Tsukuba
    Ministry of Education, Culture, Sports, Science and Technology (MEXT))

  • Hitoshi Koshiba

    (Ministry of Education, Culture, Sports, Science and Technology (MEXT))

Abstract

Unlike traditional citation analysis, which assumes that all citations in a paper are equivalent, citation context analysis considers the contextual information of individual citations. However, citation context analysis requires creating a large amount of data through annotation, which hinders its widespread use. This study explored the applicability of large language models (LLMs), particularly Generative Pre-trained Transformer (GPT) models, to citation context analysis by comparing LLM and human annotation results. The results showed that LLM annotation is as good as or better than human annotation in terms of consistency but poor in terms of predictive performance. Thus, having LLMs immediately replace human annotators in citation context analysis is inappropriate. However, the annotation results obtained by an LLM can be used as reference information when narrowing the annotation results obtained by multiple human annotators down to one; alternatively, an LLM can be used as an annotator when it is difficult to recruit enough human annotators. This study provides basic findings important for the future development of citation context analysis.

Suggested Citation

  • Kai Nishikawa & Hitoshi Koshiba, 2024. "Exploring the applicability of large language models to citation context analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(11), pages 6751-6777, November.
  • Handle: RePEc:spr:scient:v:129:y:2024:i:11:d:10.1007_s11192-024-05142-9
    DOI: 10.1007/s11192-024-05142-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-024-05142-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kai Nishikawa, 2023. "How and why are citations between disciplines made? A citation context analysis focusing on natural sciences and social sciences and humanities," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2975-2997, May.
    2. Xiaorui Jiang & Jingqiang Chen, 2023. "Contextualised segment-wise citation function classification," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 5117-5158, September.
    3. Frederique Bordignon, 2022. "Critical citations in knowledge construction and citation analysis: from paradox to definition," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 959-972, February.
    4. Sehrish Iqbal & Saeed-Ul Hassan & Naif Radi Aljohani & Salem Alelyani & Raheel Nawaz & Lutz Bornmann, 2021. "A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6551-6599, August.
    5. Tahamtan, Iman & Bornmann, Lutz, 2018. "Core elements in the process of citing publications: Conceptual overview of the literature," Journal of Informetrics, Elsevier, vol. 12(1), pages 203-216.
    6. Wang, Shiyun & Mao, Jin & Lu, Kun & Cao, Yujie & Li, Gang, 2021. "Understanding interdisciplinary knowledge integration through citance analysis: A case study on eHealth," Journal of Informetrics, Elsevier, vol. 15(4).
    7. Iman Tahamtan & Lutz Bornmann, 2019. "What do citation counts measure? An updated review of studies on citations in scientific documents published between 2006 and 2018," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1635-1684, December.
    8. Liu, Xiaojuan & Wang, Chenlin & Chen, Dar-Zen & Huang, Mu-Hsuan, 2022. "Exploring perception of retraction based on mentioned status in post-retraction citations," Journal of Informetrics, Elsevier, vol. 16(3).
    9. Kim, Ha Jin & Jeong, Yoo Kyung & Song, Min, 2016. "Content- and proximity-based author co-citation analysis using citation sentences," Journal of Informetrics, Elsevier, vol. 10(4), pages 954-966.
    10. Luca Cagliero & Paolo Garza & Mohammad Reza Kavoosifar & Elena Baralis, 2018. "Discovering cross-topic collaborations among researchers by exploiting weighted association rules," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1273-1301, August.
    11. Chao Lu & Ying Ding & Chengzhi Zhang, 2017. "Understanding the impact change of a highly cited article: a content-based citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(2), pages 927-945, August.
    12. Zhang, Chengzhi & Liu, Lifan & Wang, Yuzhuo, 2021. "Characterizing references from different disciplines: A perspective of citation content analysis," Journal of Informetrics, Elsevier, vol. 15(2).
    13. Hamid R. Jamali & Majid Nabavi & Saeid Asadi, 2018. "How video articles are cited, the case of JoVE: Journal of Visualized Experiments," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 1821-1839, December.
    14. Jiang, Xiaorui & Zhuge, Hai, 2019. "Forward search path count as an alternative indirect citation impact indicator," Journal of Informetrics, Elsevier, vol. 13(4).
    15. Kong, Ling & Zhang, Wei & Hu, Haotian & Liang, Zhu & Han, Yonggang & Wang, Dongbo & Song, Min, 2024. "Transdisciplinary fine-grained citation content analysis: A multi-task learning perspective for citation aspect and sentiment classification," Journal of Informetrics, Elsevier, vol. 18(3).
    16. Yunxue Cui & Yongzhen Wang & Xiaozhong Liu & Xianwen Wang & Xuhong Zhang, 2023. "Multidimensional scholarly citations: Characterizing and understanding scholars' citation behaviors," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(1), pages 115-127, January.
    17. Ruhao Zhang & Junpeng Yuan, 2022. "Enhanced author bibliographic coupling analysis using semantic and syntactic citation information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 7681-7706, December.
    18. Shiyun Wang & Jin Mao & Yujie Cao & Gang Li, 2022. "Integrated knowledge content in an interdisciplinary field: identification, classification, and application," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6581-6614, November.
    19. Chen, Lixin, 2017. "Do patent citations indicate knowledge linkage? The evidence from text similarities between patents and their citations," Journal of Informetrics, Elsevier, vol. 11(1), pages 63-79.
    20. Lutz Bornmann & Robin Haunschild & Sven E. Hug, 2018. "Visualizing the context of citations referencing papers published by Eugene Garfield: a new type of keyword co-occurrence analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(2), pages 427-437, February.