Printed from https://ideas.repec.org/a/spr/scient/v117y2018i3d10.1007_s11192-018-2920-6.html

CAD: an algorithm for citation-anchors detection in research papers

Author

Listed:
  • Riaz Ahmad

    (Capital University of Science & Technology)

  • Muhammad Tanvir Afzal

    (Capital University of Science & Technology)

Abstract

Citations are important parameters used in many decisions, such as ranking researchers, institutions, and countries, and in measuring the relationship between research papers. All of these require accurate counting of citations and of their occurrences (in-text citation counts) within the citing papers. Citation anchors are the citations made within the full text of the citing paper, for example ‘[1]’, ‘(Afzal et al, 2015)’, ‘[Afzal, 2015]’. Identifying citation anchors in plain text is challenging because of the various styles and formats of citations. Recently, Shahid et al. highlighted some of the problems in automatically identifying in-text citation frequencies, such as commonality in content, wrong allotment, mathematical ambiguities, and string variations. The paper proposes an algorithm, CAD, that uses a set of rules to identify citation anchors and their in-text citation frequencies. For a comprehensive analysis, a dataset of research papers was prepared from two digital libraries: (1) the Journal of Universal Computer Science (J.UCS) and (2) CiteSeer. The experimental study comprised two experiments. In the first experiment, the proposed approach is compared with a state-of-the-art technique on both datasets. The J.UCS dataset consists of 1200 research papers with 16,000 citation strings or references, while the CiteSeer dataset consists of 52 research papers with 1850 references. The combined dataset thus contains 1252 citing documents and 17,850 references. The experiments showed that the CAD algorithm improved the F-score by 44% and 37% on the J.UCS and CiteSeer datasets, respectively, over the contemporary technique (Shahid et al. in Int J Arab Inf Technol 12:481–488, 2014). The average improvement on both datasets is 41%. In the second experiment, the proposed approach is further analyzed against the existing state-of-the-art tools CERMINE and GROBID. According to our results, the proposed approach performs best with an F1 of 0.99, followed by GROBID (F1 0.89) and CERMINE (F1 0.82).
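
To make the notion of a citation anchor and its in-text frequency concrete, the following is a minimal Python sketch. It is not the CAD algorithm: the abstract does not list CAD's rules, so the two regular expressions and the names `NUMERIC`, `AUTHOR_YEAR`, `expand_numeric`, and `count_anchors` are assumptions made only for illustration. They cover just the three anchor styles quoted in the abstract, plus simple numeric lists and ranges.

```python
import re
from collections import Counter

# Illustrative patterns only (assumed, not taken from the paper).
# Numeric anchors such as [1], [2, 3] or [4-6].
NUMERIC = re.compile(r"\[(\d+(?:\s*[,\-–]\s*\d+)*)\]")
# Author-year anchors such as (Afzal et al, 2015) or [Afzal, 2015].
AUTHOR_YEAR = re.compile(
    r"[\[(]"
    r"([A-Z][A-Za-z'\-]+(?:\s+et\s+al\.?|\s+(?:and|&)\s+[A-Z][A-Za-z'\-]+)?)"
    r",?\s+((?:19|20)\d{2}[a-z]?)"
    r"[\])]"
)


def expand_numeric(body):
    """Expand the inside of a numeric anchor, e.g. '1, 3-5' -> ['1', '3', '4', '5']."""
    keys = []
    for part in body.split(","):
        part = part.strip()
        bounds = re.split(r"\s*[-–]\s*", part)
        if len(bounds) == 2:
            low, high = int(bounds[0]), int(bounds[1])
            keys.extend(str(n) for n in range(low, high + 1))
        else:
            keys.append(part)
    return keys


def count_anchors(text):
    """Count how often each reference is anchored in the given plain text."""
    counts = Counter()
    for match in NUMERIC.finditer(text):
        counts.update(expand_numeric(match.group(1)))
    for match in AUTHOR_YEAR.finditer(text):
        # Normalize whitespace and drop the trailing period of "et al."
        name = " ".join(match.group(1).split()).rstrip(".")
        counts[f"{name} {match.group(2)}"] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "Prior work [1] and [2, 3] is extended in (Afzal et al, 2015). "
        "See [1] and [Afzal, 2015]; ranges such as [4-6] also occur."
    )
    for key, freq in sorted(count_anchors(sample).items()):
        print(key, freq)
```

A pattern-only approach like this would also count a mathematical interval such as ‘[0, 1]’ as two citations, which is one of the ambiguities the abstract attributes to Shahid et al.; handling such cases is presumably where more careful rules are needed.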

Suggested Citation

  • Riaz Ahmad & Muhammad Tanvir Afzal, 2018. "CAD: an algorithm for citation-anchors detection in research papers," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 1405-1423, December.
  • Handle: RePEc:spr:scient:v:117:y:2018:i:3:d:10.1007_s11192-018-2920-6
    DOI: 10.1007/s11192-018-2920-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-018-2920-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Ding, Ying & Liu, Xiaozhong & Guo, Chun & Cronin, Blaise, 2013. "The distribution of references across texts: Some implications for citation analysis," Journal of Informetrics, Elsevier, vol. 7(3), pages 583-592.
    2. Hu, Zhigang & Chen, Chaomei & Liu, Zeyuan, 2013. "Where are citations located in the body of scientific articles? A study of the distributions of citation locations," Journal of Informetrics, Elsevier, vol. 7(4), pages 887-896.
    3. Kevin W. Boyack & Henry Small & Richard Klavans, 2013. "Improving the accuracy of co-citation clustering using full text," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 64(9), pages 1759-1767, September.
    4. Henry Small, 1973. "Co‐citation in the scientific literature: A new measure of the relationship between two documents," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 24(4), pages 265-269, July.
    5. Kevin W. Boyack & Henry Small & Richard Klavans, 2013. "Improving the accuracy of co‐citation clustering using full text," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 64(9), pages 1759-1767, September.

    Citations



    Cited by:

    1. Sehrish Iqbal & Saeed-Ul Hassan & Naif Radi Aljohani & Salem Alelyani & Raheel Nawaz & Lutz Bornmann, 2021. "A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6551-6599, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ruhao Zhang & Junpeng Yuan, 2022. "Enhanced author bibliographic coupling analysis using semantic and syntactic citation information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 7681-7706, December.
    2. Raja Habib & Muhammad Tanvir Afzal, 2019. "Sections-based bibliographic coupling for research paper recommendation," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(2), pages 643-656, May.
    3. Dangzhi Zhao & Andreas Strotmann, 2020. "Deep and narrow impact: introducing location filtered citation counting," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(1), pages 503-517, January.
    4. Dangzhi Zhao & Andreas Strotmann, 2020. "Telescopic and panoramic views of library and information science research 2011–2018: a comparison of four weighting schemes for author co-citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 255-270, July.
    5. Yun, Jinhyuk, 2022. "Generalization of bibliographic coupling and co-citation using the node split network," Journal of Informetrics, Elsevier, vol. 16(2).
    6. Boyack, Kevin W. & van Eck, Nees Jan & Colavizza, Giovanni & Waltman, Ludo, 2018. "Characterizing in-text citations in scientific articles: A large-scale analysis," Journal of Informetrics, Elsevier, vol. 12(1), pages 59-73.
    7. Kamal Sanguri & Atanu Bhuyan & Sabyasachi Patra, 2020. "A semantic similarity adjusted document co-citation analysis: a case of tourism supply chain," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 233-269, October.
    8. Rey-Long Liu, 2017. "A new bibliographic coupling measure with descriptive capability," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(2), pages 915-935, February.
    9. Michel Zitt, 2015. "Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2223-2245, March.
    10. Shenghui Wang & Rob Koopman, 2017. "Clustering articles based on semantic similarity," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1017-1031, May.
    11. Hamid R. Jamali & Majid Nabavi & Saeid Asadi, 2018. "How video articles are cited, the case of JoVE: Journal of Visualized Experiments," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 1821-1839, December.
    12. John P A Ioannidis & Kevin Boyack & Paul F Wouters, 2016. "Citation Metrics: A Primer on How (Not) to Normalize," PLOS Biology, Public Library of Science, vol. 14(9), pages 1-7, September.
    13. Tahamtan, Iman & Bornmann, Lutz, 2018. "Core elements in the process of citing publications: Conceptual overview of the literature," Journal of Informetrics, Elsevier, vol. 12(1), pages 203-216.
    14. CholMyong Pak & Guang Yu & Weibin Wang, 2018. "A study on the citation situation within the citing paper: citation distribution of references according to mention frequency," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 905-918, March.
    15. Waltman, Ludo, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.
    16. Mengyu Yu & Mazie Krehbiel & Samantha Thompson & Tatjana Miljkovic, 2020. "An exploration of gender gap using advanced data science tools: actuarial research community," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 767-789, May.
    17. Bikun Chen & Dannan Deng & Zhouyan Zhong & Chengzhi Zhang, 2020. "Exploring linguistic characteristics of highly browsed and downloaded academic articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(3), pages 1769-1790, March.
    18. Weibin Wang & Zheng Wang & Tian Yu & CholMyong Pak & Guang Yu, 2020. "Research on citation mention times and contributions using a neural network," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2383-2400, December.
    19. Rotolo, Daniele & Hicks, Diana & Martin, Ben R., 2015. "What is an emerging technology?," Research Policy, Elsevier, vol. 44(10), pages 1827-1843.
    20. Wang, Shiyun & Mao, Jin & Lu, Kun & Cao, Yujie & Li, Gang, 2021. "Understanding interdisciplinary knowledge integration through citance analysis: A case study on eHealth," Journal of Informetrics, Elsevier, vol. 15(4).
