
RefCit2vec: embedding models considering references and citations for measuring document similarity

Author

Listed:
  • Chien-chih Huang

    (National Taiwan University)

  • Kuang-hua Chen

    (National Taiwan University)

Abstract

This study outlines the intellectual structure of Library and Information Science (LIS) at the venue level using RefCit2vec, an embedding method inspired by word2vec. The reference lists or cited-by lists of 62,077 articles published in 35 venues (journals and proceedings) between 1928 and 2022 are converted into real-valued vectors by four independent RefCit2vec models. The document similarities measured by two of the models correlate moderately with bibliographic coupling metrics, whereas the similarities from the other two models correlate moderately or strongly with co-citation metrics. Each venue is represented by its centroid, the average vector of its constituent documents. When hierarchical agglomerative clustering is applied to the venue centroids, 69% of venues robustly emerge in 6 of the 8 clusters. Four clusters consistently form the library-related branch. The bibliometrics/scientometrics branch contains only 1 cluster, whereas the information-related branch contains 3 clusters. 43% of venues fall into six subgroups with consistent tree structures. An article is defined as SCIM-alike if it is closer to the SCIM centroid than half of the SCIM articles are. 10% of JASIST articles are SCIM-alike based on their reference lists, and 5% are SCIM-alike based on their cited-by lists. The percentage of SCIM-alike articles in JASIST rose above the average between 2008 and 2018 but has dropped below the average since 2019. As this demonstration of the dynamics in LIS suggests, citation embedding methods such as RefCit2vec can incorporate citation-based, text-based, or authorship features and contribute to varied scenarios for investigating or exploring research fronts and scientific knowledge transfer.
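
The venue centroid and the SCIM-alike definition in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the distance measure, so cosine distance is assumed here, "closer than half of the venue's articles" is read as "below the median distance to the centroid", and the function names and two-dimensional toy vectors are hypothetical.

```python
import math
import statistics


def centroid(vectors):
    """Average vector of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity; assumed here, the abstract names no metric."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def is_alike(article, venue_vectors):
    """True if `article` is closer to the venue centroid than half of the
    venue's own articles, i.e. below their median distance (assumed reading)."""
    c = centroid(venue_vectors)
    threshold = statistics.median(cosine_distance(v, c) for v in venue_vectors)
    return cosine_distance(article, c) < threshold


def alike_share(articles, venue_vectors):
    """Fraction of `articles` that are alike to the venue."""
    return sum(is_alike(a, venue_vectors) for a in articles) / len(articles)


# Toy embeddings (hypothetical values, not data from the study).
scim = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.6, 0.6]]
jasist = [[0.85, 0.2], [0.1, 1.0]]

print(centroid(scim))
print([is_alike(a, scim) for a in jasist])
print(alike_share(jasist, scim))
```

Computing `alike_share` separately per publication year would give a time series of the kind the abstract describes for JASIST, under the same assumptions.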

Suggested Citation

  • Chien-chih Huang & Kuang-hua Chen, 2024. "RefCit2vec: embedding models considering references and citations for measuring document similarity," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(8), pages 4669-4693, August.
  • Handle: RePEc:spr:scient:v:129:y:2024:i:8:d:10.1007_s11192-024-05067-3
    DOI: 10.1007/s11192-024-05067-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-024-05067-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-024-05067-3?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shicheng Tan & Tao Zhang & Shu Zhao & Yanping Zhang, 2023. "Self-supervised scientific document recommendation based on contrastive learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 5027-5049, September.
    2. Sabrina Petersohn & Thomas Heinze, 2018. "Professionalization of bibliometric research assessment. Insights from the history of the Leiden Centre for Science and Technology Studies (CWTS)," Science and Public Policy, Oxford University Press, vol. 45(4), pages 565-578.
    3. Yuen-Hsien Tseng & Ming-Yueh Tsay, 2013. "Journal clustering of library and information science for subfield delineation using the bibliometric analysis toolkit: CATAR," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(2), pages 503-528, May.
    4. Jianhua Hou & Xiucai Yang & Chaomei Chen, 2018. "Emerging trends and new developments in information science: a document co-citation analysis (2009–2016)," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(2), pages 869-892, May.
    5. Daria Maltseva & Vladimir Batagelj, 2020. "iMetrics: the development of the discipline with many names," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 313-359, October.
    6. Yonghe Lu & Meilu Yuan & Jiaxin Liu & Minghong Chen, 2023. "Research on semantic representation and citation recommendation of scientific papers with multiple semantics fusion," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(2), pages 1367-1393, February.
    7. Xiaojuan Zhang & Shuqi Song & Yuping Xiong, 2024. "Personalized global citation recommendation with diversification awareness," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 3625-3657, July.
    8. María Pinto & Rosaura Fernández-Pascual & David Caballero-Mariscal & Dora Sales, 2020. "Information literacy trends in higher education (2006–2019): visualizing the emerging field of mobile information literacy," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 1479-1510, August.
    9. Yi Bu & Binglu Wang & Win-bin Huang & Shangkun Che & Yong Huang, 2018. "Using the appearance of citations in full text on author co-citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 275-289, July.
    10. Yang, Siluo & Han, Ruizhen & Wolfram, Dietmar & Zhao, Yuehua, 2016. "Visualizing the intellectual structure of information science (2006–2015): Introducing author keyword coupling analysis," Journal of Informetrics, Elsevier, vol. 10(1), pages 132-150.
    11. Shesen Guo & Ganzhou Zhang, 2017. "Analyzing concept complexity, knowledge ageing and diffusion pattern of Mooc," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(1), pages 413-430, July.
    12. Hao Wang & Sanhong Deng & Xinning Su, 2016. "A study on construction and analysis of discipline knowledge structure of Chinese LIS based on CSSCI," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 1725-1759, December.
    13. Michel Zitt, 2015. "Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2223-2245, March.
    14. Lin, Yiling & Evans, James A. & Wu, Lingfei, 2022. "New directions in science emerge from disconnection and discord," Journal of Informetrics, Elsevier, vol. 16(1).
    15. A. Abrizah & A. Noorhidawati & A. N. Zainab, 2015. "LIS journals categorization in the Journal Citation Report: a stated preference study," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(2), pages 1083-1099, February.
    16. Chaoqun Ni & Cassidy R. Sugimoto & Blaise Cronin, 2013. "Visualizing and comparing four facets of scholarly communication: producers, artifacts, concepts, and gatekeepers," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(3), pages 1161-1173, March.
    17. Milojević, Staša & Sugimoto, Cassidy R. & Larivière, Vincent & Thelwall, Mike & Ding, Ying, 2014. "The role of handbooks in knowledge creation and diffusion: A case of science and technology studies," Journal of Informetrics, Elsevier, vol. 8(3), pages 693-709.
    18. Xie, Qing & Zhang, Xinyuan & Song, Min, 2021. "A network embedding-based scholar assessment indicator considering four facets: Research topic, author credit allocation, field-normalized journal impact, and published time," Journal of Informetrics, Elsevier, vol. 15(4).
    19. Bo Wang & Shengbo Liu & Kun Ding & Zeyuan Liu & Jing Xu, 2014. "Identifying technological topics and institution-topic distribution probability for patent competitive intelligence analysis: a case study in LTE technology," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 685-704, October.
    20. Bar-Ilan, Judit, 2008. "Informetrics at the beginning of the 21st century—A review," Journal of Informetrics, Elsevier, vol. 2(1), pages 1-52.
