Printed from https://ideas.repec.org/a/spr/scient/v126y2021i8d10.1007_s11192-021-04055-1.html

A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies

Author

Listed:
  • Sehrish Iqbal

    (Information Technology University)

  • Saeed-Ul Hassan

    (Information Technology University)

  • Naif Radi Aljohani

    (King Abdulaziz University)

  • Salem Alelyani

    (King Khalid University)

  • Raheel Nawaz

    (Manchester Metropolitan University)

  • Lutz Bornmann

    (Administrative Headquarters of the Max Planck Society)

Abstract

In-text citation analysis is one of the most frequently used methods in research evaluation. We are seeing significant growth in citation analysis through bibliometric metadata, primarily due to the availability of citation databases such as the Web of Science, Scopus, Google Scholar, Microsoft Academic, and Dimensions. Due to better access to full-text publication corpora in recent years, information scientists have gone far beyond traditional bibliometrics by tapping into advancements in full-text data processing techniques to measure the impact of scientific publications in contextual terms. This has led to technical developments in citation classifications, citation sentiment analysis, citation summarisation, and citation-based recommendation. This article aims to narratively review the studies on these developments. Its primary focus is on publications that have used natural language processing and machine learning techniques to analyse citations.

Suggested Citation

  • Sehrish Iqbal & Saeed-Ul Hassan & Naif Radi Aljohani & Salem Alelyani & Raheel Nawaz & Lutz Bornmann, 2021. "A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6551-6599, August.
  • Handle: RePEc:spr:scient:v:126:y:2021:i:8:d:10.1007_s11192-021-04055-1
    DOI: 10.1007/s11192-021-04055-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-021-04055-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-021-04055-1?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Because access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Ding, Ying & Liu, Xiaozhong & Guo, Chun & Cronin, Blaise, 2013. "The distribution of references across texts: Some implications for citation analysis," Journal of Informetrics, Elsevier, vol. 7(3), pages 583-592.
    2. Iqra Safder & Saeed-Ul Hassan, 2019. "Bibliometric-enhanced information retrieval: a novel deep feature engineering approach for algorithm searching from full-text publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(1), pages 257-277, April.
    3. Saeed-Ul Hassan & Mubashir Imran & Sehrish Iqbal & Naif Radi Aljohani & Raheel Nawaz, 2018. "Deep context of citations using machine-learning models in scholarly full-text articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 1645-1662, December.
    4. Guo Zhang & Ying Ding & Staša Milojević, 2013. "Citation content analysis (CCA): A framework for syntactic and semantic analysis of citation content," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 64(7), pages 1490-1503, July.
    5. Riaz Ahmad & Muhammad Tanvir Afzal, 2018. "CAD: an algorithm for citation-anchors detection in research papers," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 1405-1423, December.
    6. Michael H. MacRoberts & Barbara R. MacRoberts, 1989. "Problems of citation analysis: A critical review," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 40(5), pages 342-349, September.
    7. Guo Zhang & Ying Ding & Staša Milojević, 2013. "Citation content analysis (CCA): A framework for syntactic and semantic analysis of citation content," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 64(7), pages 1490-1503, July.
    8. Small, Henry, 2018. "Characterizing highly cited method and non-method papers using citation contexts: The role of uncertainty," Journal of Informetrics, Elsevier, vol. 12(2), pages 461-480.
    9. Lutz Bornmann & Robin Haunschild & Sven E. Hug, 2018. "Visualizing the context of citations referencing papers published by Eugene Garfield: a new type of keyword co-occurrence analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(2), pages 427-437, February.
    10. Patricia A. Hooten, 1991. "Frequency and functional use of cited documents in information science," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 42(6), pages 397-404, July.
    11. Marc Bertin & Iana Atanassova & Yves Gingras & Vincent Larivière, 2016. "The invariant distribution of references in scientific articles," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(1), pages 164-177, January.
    12. Ying Ding & Guo Zhang & Tamy Chambers & Min Song & Xiaolong Wang & Chengxiang Zhai, 2014. "Content-based citation analysis: The next generation of citation analysis," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(9), pages 1820-1833, September.
    13. Xiaodan Zhu & Peter Turney & Daniel Lemire & André Vellino, 2015. "Measuring academic influence: Not all citations are equal," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 66(2), pages 408-427, February.
    14. Shutian Ma & Chengzhi Zhang & Xiaozhong Liu, 2020. "A review of citation recommendation: from textual content to enriched context," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(3), pages 1445-1472, March.
    15. Metin Doslu & Haluk O. Bingol, 2016. "Context sensitive article ranking with citation context analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 108(2), pages 653-671, August.
    16. Iman Tahamtan & Lutz Bornmann, 2019. "What do citation counts measure? An updated review of studies on citations in scientific documents published between 2006 and 2018," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1635-1684, December.
    17. Zehra Taşkın & Umut Al, 2018. "A content-based citation analysis study based on text categorization," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(1), pages 335-357, January.
    18. Samaneh Karimi & Luis Moraes & Avisha Das & Azadeh Shakery & Rakesh Verma, 2018. "Citance-based retrieval and summarization using IR and machine learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1331-1366, August.
    19. Jeong, Yoo Kyung & Song, Min & Ding, Ying, 2014. "Content-based author co-citation analysis," Journal of Informetrics, Elsevier, vol. 8(1), pages 197-211.
    20. Chandra G. Prabha, 1983. "Some aspects of citation behavior: A pilot study in business administration," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 34(3), pages 202-206, May.
    21. Aaron Elkiss & Siwei Shen & Anthony Fader & Güneş Erkan & David States & Dragomir Radev, 2008. "Blind men and elephants: What do citation summaries tell us about a research article?," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 59(1), pages 51-62, January.
    22. Yu-Wei Chang, 2013. "A comparison of citation contexts between natural sciences and social sciences and humanities," Scientometrics, Springer;Akadémiai Kiadó, vol. 96(2), pages 535-553, August.
    23. Boyack, Kevin W. & van Eck, Nees Jan & Colavizza, Giovanni & Waltman, Ludo, 2018. "Characterizing in-text citations in scientific articles: A large-scale analysis," Journal of Informetrics, Elsevier, vol. 12(1), pages 59-73.
    24. Susan Bonzi, 1982. "Characteristics of a Literature as Predictors of Relatedness Between Cited and Citing Works," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 33(4), pages 208-216, July.
    25. Charles Oppenheim & Susan P. Renn, 1978. "Highly cited old papers and the reasons why they continue to be cited," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 29(5), pages 225-231, September.
    26. Hu, Zhigang & Lin, Gege & Sun, Taian & Hou, Haiyan, 2017. "Understanding multiply mentioned references," Journal of Informetrics, Elsevier, vol. 11(4), pages 948-958.
    27. Muhammad Touseef Ikram & Muhammad Tanvir Afzal, 2019. "Aspect based citation sentiment analysis using linguistic patterns for better comprehension of scientific knowledge," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(1), pages 73-95, April.
    28. Henry Small, 2004. "On the shoulders of Robert Merton: Towards a normative theory of citation," Scientometrics, Springer;Akadémiai Kiadó, vol. 60(1), pages 71-79, May.
    29. Shutian Ma & Jin Xu & Chengzhi Zhang, 2018. "Automatic identification of cited text spans: a multi-classifier approach over imbalanced dataset," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1303-1330, August.
    30. Hu, Zhigang & Chen, Chaomei & Liu, Zeyuan, 2013. "Where are citations located in the body of scientific articles? A study of the distributions of citation locations," Journal of Informetrics, Elsevier, vol. 7(4), pages 887-896.
    31. Small, Henry & Tseng, Hung & Patek, Mike, 2017. "Discovering discoveries: Identifying biomedical discoveries using citation contexts," Journal of Informetrics, Elsevier, vol. 11(1), pages 46-62.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Xin An & Xin Sun & Shuo Xu, 2022. "Important citations identification with semi-supervised classification model," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6533-6555, November.
    2. Yasher Ali & Osman Khalid & Imran Ali Khan & Syed Sajid Hussain & Faisal Rehman & Sajid Siraj & Raheel Nawaz, 2022. "A hybrid group-based movie recommendation framework with overlapping memberships," PLOS ONE, Public Library of Science, vol. 17(3), pages 1-28, March.
    3. Xiaorui Jiang & Jingqiang Chen, 2023. "Contextualised segment-wise citation function classification," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 5117-5158, September.
    4. Xiaorui Jiang & Junjun Liu, 2023. "Extracting the evolutionary backbone of scientific domains: The semantic main path network analysis approach based on citation context analysis," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(5), pages 546-569, May.
    5. Yunxue Cui & Yongzhen Wang & Xiaozhong Liu & Xianwen Wang & Xuhong Zhang, 2023. "Multidimensional scholarly citations: Characterizing and understanding scholars' citation behaviors," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(1), pages 115-127, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Naif Radi Aljohani & Ayman Fayoumi & Saeed-Ul Hassan, 2021. "An in-text citation classification predictive model for a scholarly search system," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5509-5529, July.
    2. Wang, Shiyun & Mao, Jin & Lu, Kun & Cao, Yujie & Li, Gang, 2021. "Understanding interdisciplinary knowledge integration through citance analysis: A case study on eHealth," Journal of Informetrics, Elsevier, vol. 15(4).
    3. Iman Tahamtan & Lutz Bornmann, 2019. "What do citation counts measure? An updated review of studies on citations in scientific documents published between 2006 and 2018," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1635-1684, December.
    4. Dongqing Lyu & Xuanmin Ruan & Juan Xie & Ying Cheng, 2021. "The classification of citing motivations: a meta-synthesis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(4), pages 3243-3264, April.
    5. Hamid R. Jamali & Majid Nabavi & Saeid Asadi, 2018. "How video articles are cited, the case of JoVE: Journal of Visualized Experiments," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 1821-1839, December.
    6. Dangzhi Zhao & Andreas Strotmann, 2020. "Telescopic and panoramic views of library and information science research 2011–2018: a comparison of four weighting schemes for author co-citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 255-270, July.
    7. Chao Lu & Ying Ding & Chengzhi Zhang, 2017. "Understanding the impact change of a highly cited article: a content-based citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(2), pages 927-945, August.
    8. Tahamtan, Iman & Bornmann, Lutz, 2018. "Core elements in the process of citing publications: Conceptual overview of the literature," Journal of Informetrics, Elsevier, vol. 12(1), pages 203-216.
    9. Zhang, Chengzhi & Liu, Lifan & Wang, Yuzhuo, 2021. "Characterizing references from different disciplines: A perspective of citation content analysis," Journal of Informetrics, Elsevier, vol. 15(2).
    10. Liyue Chen & Jielan Ding & Vincent Larivière, 2022. "Measuring the citation context of national self‐references," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(5), pages 671-686, May.
    11. Boyack, Kevin W. & van Eck, Nees Jan & Colavizza, Giovanni & Waltman, Ludo, 2018. "Characterizing in-text citations in scientific articles: A large-scale analysis," Journal of Informetrics, Elsevier, vol. 12(1), pages 59-73.
    12. Ruhao Zhang & Junpeng Yuan, 2022. "Enhanced author bibliographic coupling analysis using semantic and syntactic citation information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 7681-7706, December.
    13. Dangzhi Zhao & Andreas Strotmann, 2020. "Deep and narrow impact: introducing location filtered citation counting," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(1), pages 503-517, January.
    14. Toluwase Victor Asubiaro & Isola Ajiferuke, 2022. "Semantic similarity-based credit attribution on citation paths: a method for allocating residual citation to and investigating depth of influence of scientific communications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6257-6277, November.
    15. Mingyang Wang & Jiaqi Zhang & Shijia Jiao & Xiangrong Zhang & Na Zhu & Guangsheng Chen, 2020. "Important citation identification by exploiting the syntactic and contextual information of citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2109-2129, December.
    16. Shengzhi Huang & Jiajia Qian & Yong Huang & Wei Lu & Yi Bu & Jinqing Yang & Qikai Cheng, 2022. "Disclosing the relationship between citation structure and future impact of a publication," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(7), pages 1025-1042, July.
    17. Weibin Wang & Zheng Wang & Tian Yu & CholMyong Pak & Guang Yu, 2020. "Research on citation mention times and contributions using a neural network," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2383-2400, December.
    18. Kai Nishikawa, 2023. "How and why are citations between disciplines made? A citation context analysis focusing on natural sciences and social sciences and humanities," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2975-2997, May.
    19. Aurora González-Teruel & Francisca Abad-García, 2018. "The influence of Elfreda Chatman’s theories: a citation context analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 1793-1819, December.
    20. Matthias Sebastian Rüdiger & David Antons & Torsten-Oliver Salge, 2021. "The explanatory power of citations: a new approach to unpacking impact in science," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(12), pages 9779-9809, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.