RefCit2vec: embedding models considering references and citations for measuring document similarity

Author

Listed:
  • Chien-chih Huang

    (National Taiwan University)

  • Kuang-hua Chen

    (National Taiwan University)

Abstract

This study outlines the intellectual structure of Library and Information Science (LIS) at the venue level using RefCit2vec, an embedding method inspired by word2vec. The reference lists or cited-by lists of 62,077 articles in 35 venues (journals and proceedings) published between 1928 and 2022 are converted into real-valued vectors by four independent RefCit2vec models. Document similarities measured by two of the models correlate moderately with bibliographic coupling metrics, whereas similarities from the other two models correlate moderately or strongly with co-citation metrics. Each venue is represented by its centroid, the average vector of its constituent documents. When hierarchical agglomerative clustering is applied to the venue centroids, 69% of venues robustly emerge in 6 out of 8 clusters. Four clusters consistently form the library-related branch. The bibliometrics/scientometrics branch contains only 1 cluster, whereas the information-related branch contains 3 clusters. 43% of venues fall into six subgroups with consistent tree structures. An article is defined as SCIM-alike if it is closer to the SCIM centroid than half of the SCIM articles are. 10% of JASIST articles are SCIM-alike based on their reference lists, and 5% are SCIM-alike based on their cited-by lists. The percentage of SCIM-alike articles in JASIST rose above the average between 2008 and 2018 but has fallen below the average since 2019. As these dynamics in LIS demonstrate, citation embedding methods like RefCit2vec can incorporate citation-based, text-based, or authorship features and can support varied scenarios for investigating or exploring research fronts and scientific knowledge transfer.
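
The centroid and "SCIM-alike" definitions in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which distance measure is used, so cosine distance is an assumption here, and the function names and toy vectors are hypothetical. "Closer than half of the target venue's articles" is read as "distance to the centroid below the median distance of the target venue's own articles".

```python
import numpy as np

def centroid(vectors):
    """Average vector of a venue's document embeddings."""
    return np.asarray(vectors, dtype=float).mean(axis=0)

def cosine_distance(a, b):
    """1 - cosine similarity (assumed metric; not stated in the abstract)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def alike_fraction(target_docs, other_docs):
    """Share of other_docs lying closer to the target venue's centroid
    than the median target-venue document does."""
    c = centroid(target_docs)
    # Median distance of the target venue's own documents to its centroid.
    threshold = np.median([cosine_distance(d, c) for d in target_docs])
    flags = [cosine_distance(d, c) < threshold for d in other_docs]
    return float(np.mean(flags))

# Toy 2-dimensional embeddings (hypothetical data, for illustration only).
target_venue = [[1, 0], [1, 1], [1, -1]]
other_venue = [[2, 0.1], [0, 1], [1, 2], [3, 1]]
print(alike_fraction(target_venue, other_venue))
```

With real embeddings, `target_docs` would hold the vectors of one venue's articles and `other_docs` those of the venue being compared, computed separately for the reference-list and cited-by models.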

Suggested Citation

  • Chien-chih Huang & Kuang-hua Chen, 2024. "RefCit2vec: embedding models considering references and citations for measuring document similarity," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(8), pages 4669-4693, August.
  • Handle: RePEc:spr:scient:v:129:y:2024:i:8:d:10.1007_s11192-024-05067-3
    DOI: 10.1007/s11192-024-05067-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-024-05067-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

