IDEAS home Printed from https://ideas.repec.org/a/spr/scient/v125y2020i2d10.1007_s11192-020-03583-6.html

Using neural-network based paragraph embeddings for the calculation of within and between document similarities

Author

Listed:
  • Bart Thijs

    (ECOOM, FEB
    LIG, SIGMA)

Abstract

Science mapping using document networks often comes with the implicit assumption that scientific papers are indivisible units with unique links to neighbouring documents. Research on proximity in co-citation analysis and the study of lexical properties of sections and citation contexts indicate that this assumption does not always hold. Moreover, the meaning of words and co-words depends on the context in which they appear. This study proposes the use of a neural network architecture for word and paragraph embeddings (Doc2Vec) to measure similarity among these smaller units of analysis. It is shown that paragraphs in the “Introduction” and “Discussion” sections are more similar to the abstract, and that the similarity among paragraphs is related, though not linearly, to the distance between them. The “Methodology” section is the least similar to the other sections. Abstracts of citing-cited document pairs are more similar than random pairs, and the context in which a reference appears is most similar to the abstract of the cited document. This novel approach with higher granularity can be used for bibliometric-aided retrieval and to assist in measuring interdisciplinarity through the application of network-based centrality measures.
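
As an illustration only, the two within-document comparisons described in the abstract (each paragraph against the abstract, and paragraph pairs grouped by how far apart they are) might be sketched as follows once paragraph vectors are available. The abstract does not name the similarity measure, so cosine similarity is an assumption here, and the function names and toy three-dimensional vectors are hypothetical placeholders, not the paper's data or code.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_to_abstract(abstract_vec, paragraph_vecs):
    """Map each paragraph label to its similarity with the abstract vector."""
    return {
        label: cosine_similarity(abstract_vec, vec)
        for label, vec in paragraph_vecs.items()
    }


def similarity_by_distance(ordered_vecs):
    """Mean similarity of paragraph pairs, keyed by positional distance.

    Distance 1 means adjacent paragraphs, 2 means one paragraph in between, etc.
    """
    buckets = {}
    for (i, u), (j, v) in combinations(enumerate(ordered_vecs), 2):
        buckets.setdefault(j - i, []).append(cosine_similarity(u, v))
    return {d: sum(s) / len(s) for d, s in sorted(buckets.items())}


if __name__ == "__main__":
    # Toy vectors standing in for embeddings inferred by a trained model.
    abstract = [0.9, 0.1, 0.0]
    paragraphs = {
        "Introduction": [0.8, 0.2, 0.1],
        "Methodology": [0.1, 0.9, 0.3],
        "Discussion": [0.7, 0.1, 0.2],
    }
    for label, score in similarity_to_abstract(abstract, paragraphs).items():
        print(f"{label}: {score:.3f}")
    print(similarity_by_distance(list(paragraphs.values())))
```

In practice the vectors would come from a trained paragraph-embedding model rather than being written by hand; the grouping by distance is one simple way to inspect whether similarity falls off with distance, linearly or otherwise.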

Suggested Citation

  • Bart Thijs, 2020. "Using neural-network based paragraph embeddings for the calculation of within and between document similarities," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 835-849, November.
  • Handle: RePEc:spr:scient:v:125:y:2020:i:2:d:10.1007_s11192-020-03583-6
    DOI: 10.1007/s11192-020-03583-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-020-03583-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Bart Thijs & Wolfgang Glänzel, 2018. "The contribution of the lexical component in hybrid clustering, the case of four decades of “Scientometrics”," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(1), pages 21-33, April.
    2. Kevin W. Boyack, 2017. "Investigating the effect of global data on topic detection," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 999-1015, May.
    3. Leydesdorff, Loet & Rafols, Ismael, 2011. "Indicators of the interdisciplinarity of journals: Diversity, centrality, and citations," Journal of Informetrics, Elsevier, vol. 5(1), pages 87-100.
    4. Shenghui Wang & Rob Koopman, 2017. "Clustering articles based on semantic similarity," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1017-1031, May.
    5. Giovanni Abramo & Ciriaco Andrea D'Angelo & Flavia Di Costa, 2012. "Identifying interdisciplinarity through the disciplinary classification of coauthors of scientific publications," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 63(11), pages 2206-2222, November.
    6. Giovanni Abramo & Ciriaco Andrea D'Angelo & Flavia Costa, 2012. "Identifying interdisciplinarity through the disciplinary classification of coauthors of scientific publications," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(11), pages 2206-2222, November.
    7. Lin Zhang & Ronald Rousseau & Wolfgang Glänzel, 2016. "Diversity of references as an indicator of the interdisciplinarity of journals: Taking similarity between subject fields into account," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(5), pages 1257-1265, May.
    8. Jian Wang & Bart Thijs & Wolfgang Glänzel, 2015. "Interdisciplinarity and Impact: Distinct Effects of Variety, Balance, and Disparity," PLOS ONE, Public Library of Science, vol. 10(5), pages 1-18, May.
    9. Loet Leydesdorff & Iina Hellsten, 2006. "Measuring the meaning of words in contexts: An automated analysis of controversies about 'Monarch butterflies,' 'Frankenfoods,' and 'stem cells'," Scientometrics, Springer;Akadémiai Kiadó, vol. 67(2), pages 231-258, May.
    10. Marc Bertin & Iana Atanassova & Cassidy R. Sugimoto & Vincent Lariviere, 2016. "The linguistic patterns and rhetorical structure of citation context: an approach using n-grams," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 1417-1434, December.
    11. Ahlgren, Per & Colliander, Cristian, 2009. "Document–document similarity approaches and science mapping: Experimental comparison of five approaches," Journal of Informetrics, Elsevier, vol. 3(1), pages 49-63.

    Citations

    Cited by:

    1. Ruhao Zhang & Junpeng Yuan, 2022. "Enhanced author bibliographic coupling analysis using semantic and syntactic citation information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 7681-7706, December.
    2. Hongshu Chen & Xinna Song & Qianqian Jin & Ximeng Wang, 2022. "Network dynamics in university-industry collaboration: a collaboration-knowledge dual-layer network perspective," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6637-6660, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wolfgang Glänzel & Koenraad Debackere, 2022. "Various aspects of interdisciplinarity in research and how to quantify and measure those," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5551-5569, September.
    2. Alfonso Ávila-Robinson & Cristian Mejia & Shintaro Sengoku, 2021. "Are bibliometric measures consistent with scientists’ perceptions? The case of interdisciplinarity in research," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 7477-7502, September.
    3. Lin Zhang & Beibei Sun & Zaida Chinchilla-Rodríguez & Lixin Chen & Ying Huang, 2018. "Interdisciplinarity and collaboration: on the relationship between disciplinary diversity in departmental affiliations and reference lists," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 271-291, October.
    4. Bei Zeng & Haihua Lyu & Zhenyue Zhao & Jiang Li, 2021. "Exploring the direction and diversity of interdisciplinary knowledge diffusion: A case study of professor Zeyuan Liu's scientific publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 6253-6272, July.
    5. Abramo, Giovanni & D’Angelo, Ciriaco Andrea & Zhang, Lin, 2018. "A comparison of two approaches for measuring interdisciplinary research output: The disciplinary diversity of authors vs the disciplinary diversity of the reference list," Journal of Informetrics, Elsevier, vol. 12(4), pages 1182-1193.
    6. Ugo Moschini & Elena Fenialdi & Cinzia Daraio & Giancarlo Ruocco & Elisa Molinari, 2020. "A comparison of three multidisciplinarity indices based on the diversity of Scopus subject areas of authors’ documents, their bibliography and their citing papers," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 1145-1158, November.
    7. Hoang-Son Pham & Bram Vancraeynest & Hanne Poelmans & Sadia Vancauwenbergh & Amr Ali-Eldin, 2023. "Identifying interdisciplinary research in research projects," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(10), pages 5521-5544, October.
    8. Núria Bautista-Puig & Jorge Mañana-Rodríguez & Antonio Eleazar Serrano-López, 2021. "Role taxonomy of green and sustainable science and technology journals: exportation, importation, specialization and interdisciplinarity," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 3871-3892, May.
    9. Carusi, Chiara & Bianchi, Giuseppe, 2019. "Scientific community detection via bipartite scholar/journal graph co-clustering," Journal of Informetrics, Elsevier, vol. 13(1), pages 354-386.
    10. Zhichao Ba & Yujie Cao & Jin Mao & Gang Li, 2019. "A hierarchical approach to analyzing knowledge integration between two fields—a case study on medical informatics and computer science," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1455-1486, June.
    11. Yu-Wei Chang, 2019. "Are articles in library and information science (LIS) journals primarily contributed to by LIS authors?," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(1), pages 81-104, October.
    12. Giovanni Abramo & Ciriaco Andrea D’Angelo & Flavia Costa, 2017. "Specialization versus diversification in research activities: the extent, intensity and relatedness of field diversification by individual scientists," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1403-1418, September.
    13. Feng Shi & James Evans, 2023. "Surprising combinations of research contents and contexts are related to impact and emerge with scientific outsiders from distant disciplines," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    14. Liang Hu & Win-bin Huang & Yi Bu, 2024. "Interdisciplinary research attracts greater attention from policy documents: evidence from COVID-19," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-10, December.
    15. Jingjing Ren & Fang Wang & Minglu Li, 2023. "Dynamics and characteristics of interdisciplinary research in scientific breakthroughs: case studies of Nobel-winning research in the past 120 years," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(8), pages 4383-4419, August.
    16. Shengli Deng & Sudi Xia, 2020. "Mapping the interdisciplinarity in information behavior research: a quantitative study using diversity measure and co-occurrence analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 489-513, July.
    17. Shiji Chen & Clément Arsenault & Yves Gingras & Vincent Larivière, 2015. "Exploring the interdisciplinary evolution of a discipline: the case of Biochemistry and Molecular Biology," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(2), pages 1307-1323, February.
    18. Takahiro Kawamura & Katsutaro Watanabe & Naoya Matsumoto & Shusaku Egami & Mari Jibu, 2018. "Funding map using paragraph embedding based on semantic diversity," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 941-958, August.
    19. Meijun Liu & Sijie Yang & Yi Bu & Ning Zhang, 2023. "Female early-career scientists have conducted less interdisciplinary research in the past six decades: evidence from doctoral theses," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-16, December.
    20. Mehdi Rhaiem & Nabil Amara, 2020. "Determinants of research efficiency in Canadian business schools: evidence from scholar-level data," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 53-99, October.
