Printed from https://ideas.repec.org/a/eee/infome/v18y2024i3s1751157724000427.html

Comparing semantic representation methods for keyword analysis in bibliometric research

Authors

  • Chen, Guo
  • Hong, Siqi
  • Du, Chenxin
  • Wang, Panting
  • Yang, Zeyu
  • Xiao, Lu

Abstract

Semantic representation methods play a crucial role in text mining tasks. Although numerous approaches have been proposed and compared in text mining research, the comparison of semantic representation methods specifically for publication keywords in bibliometric studies has received limited attention. This lack of practical evidence makes it challenging for researchers to select suitable methods to obtain keyword vectors for downstream bibliometric tasks, potentially leading to suboptimal results. To address this gap, this study conducts an experimental comparison of various typical semantic representation methods for keywords, aiming to provide quantitative evidence for bibliometric studies. The experiment focuses on keyword clustering as the fundamental task and evaluates 22 variations of five typical methods across four scientific domains. The methods compared are co-word matrix, co-word network, word embedding, network embedding, and “semantic + structure” integration. The comparison is based on how well the clustering results of each method fit the “evaluation standard” specific to each domain. The empirical findings demonstrate that the co-word matrix performs poorly, whereas the co-word network and word embedding techniques perform satisfactorily. Among the five network embedding algorithms, LINE and Node2Vec outperform DeepWalk, Struc2Vec, and SDNE. Remarkably, both the “pre-training and fine-tuning” model and the “semantic + structure” model perform unsatisfactorily. Despite these differences in performance, no single approach stands out as universally superior. When selecting methods in practical applications, researchers should carefully weigh factors such as corpus size and the semantic cohesion of domain keywords.
This study advances our understanding of semantic representation methods for keyword analysis and supports bibliometric research by offering practical recommendations for selecting appropriate methods.
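To make the first of the compared representations concrete, the sketch below builds a co-word (keyword co-occurrence) matrix, treats each row as a keyword vector, clusters keywords, and scores the clusters against a reference partition. It is a minimal illustration under stated assumptions, not the authors' pipeline: the diagonal convention (keyword document frequency), cosine similarity, threshold-based single-linkage clustering, and the Rand index as the agreement measure are all illustrative choices, the function names are hypothetical, and the keyword lists and "evaluation standard" are invented toy data.

```python
import math
from itertools import combinations


def coword_matrix(papers):
    """Build a symmetric keyword co-occurrence (co-word) matrix.

    papers: one list of keywords per publication.
    The diagonal holds each keyword's document frequency (an assumed
    convention); off-diagonal cells count the papers in which the two
    keywords appear together. Each row serves as that keyword's vector.
    """
    vocab = sorted({kw for kws in papers for kw in kws})
    index = {kw: i for i, kw in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for kws in papers:
        unique = sorted(set(kws))  # ignore duplicate keywords within a paper
        for kw in unique:
            matrix[index[kw]][index[kw]] += 1
        for a, b in combinations(unique, 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1
    return vocab, matrix


def cosine(u, v):
    """Cosine similarity between two keyword vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def threshold_clusters(vectors, threshold):
    """Hypothetical single-linkage clustering: keywords whose cosine
    similarity is at least `threshold` are merged (via union-find) into
    the same cluster. Returns one cluster label per keyword."""
    parent = list(range(len(vectors)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(vectors))]


def rand_index(reference, predicted):
    """Share of keyword pairs on which two partitions agree
    (both 'same cluster' or both 'different clusters')."""
    pairs = list(combinations(range(len(reference)), 2))
    if not pairs:
        raise ValueError("need at least two keywords")
    agree = sum(
        (reference[i] == reference[j]) == (predicted[i] == predicted[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy usage with made-up keyword lists (not data from the study).
papers = [
    ["co-word analysis", "bibliometrics", "clustering"],
    ["co-word analysis", "bibliometrics"],
    ["word embedding", "word2vec"],
    ["word embedding", "word2vec", "clustering"],
]
vocab, matrix = coword_matrix(papers)
labels = threshold_clusters(matrix, threshold=0.8)
reference = [0, 0, 0, 1, 1]  # hypothetical "evaluation standard", in vocab order
print(dict(zip(vocab, labels)))
print(rand_index(reference, labels))  # 0.8
```

Because cosine similarity compares whole co-occurrence profiles, two keywords are judged similar when they co-occur with the same neighbors, not merely when they co-occur with each other. Swapping in a different vector source (for example, word or network embeddings) while keeping the clustering and scoring steps fixed mirrors, in spirit, the kind of controlled comparison the abstract describes.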

Suggested Citation

  • Chen, Guo & Hong, Siqi & Du, Chenxin & Wang, Panting & Yang, Zeyu & Xiao, Lu, 2024. "Comparing semantic representation methods for keyword analysis in bibliometric research," Journal of Informetrics, Elsevier, vol. 18(3).
  • Handle: RePEc:eee:infome:v:18:y:2024:i:3:s1751157724000427
    DOI: 10.1016/j.joi.2024.101529

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157724000427
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.joi.2024.101529?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Jia Feng & Yun Qiu Zhang & Hao Zhang, 2017. "Improving the co-word analysis method based on semantic distance," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(3), pages 1521-1531, June.
    2. Chen, Guo & Xiao, Lu, 2016. "Selecting publication keywords for domain analysis in bibliometrics: A comparison of three methods," Journal of Informetrics, Elsevier, vol. 10(1), pages 212-223.
    3. Zhong-Yi Wang & Gang Li & Chun-Ya Li & Ang Li, 2012. "Research on the semantic-based co-word analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 90(3), pages 855-875, March.
    4. Lu Xiao & Guo Chen & Jianjun Sun & Shuguang Han & Chengzhi Zhang, 2016. "Exploring the topic hierarchy of digital library research in China using keyword networks: a K-core decomposition approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 108(3), pages 1085-1101, September.
    5. Chunmei Gan & Weijun Wang, 2015. "Research characteristics and status on social media in China: A bibliometric and co-word analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(2), pages 1167-1182, November.
    6. Ding, Ying, 2011. "Community detection: Topological vs. topical," Journal of Informetrics, Elsevier, vol. 5(4), pages 498-514.
    7. Zhang, Yi & Shang, Lining & Huang, Lu & Porter, Alan L. & Zhang, Guangquan & Lu, Jie & Zhu, Donghua, 2016. "A hybrid similarity measure method for patent portfolio analysis," Journal of Informetrics, Elsevier, vol. 10(4), pages 1108-1130.
    8. Yi Bu & Mengyang Li & Weiye Gu & Win‐bin Huang, 2021. "Topic diversity: A discipline scheme‐free diversity measurement for journals," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(5), pages 523-539, May.
    9. Leydesdorff, Loet & Welbers, Kasper, 2011. "The semantic mapping of words and co-words in contexts," Journal of Informetrics, Elsevier, vol. 5(3), pages 469-475.
    10. Luo, Zhuoran & Lu, Wei & He, Jiangen & Wang, Yuqi, 2022. "Combination of research questions and methods: A new measurement of scientific novelty," Journal of Informetrics, Elsevier, vol. 16(2).
    11. Eoghan Cunningham & Barry Smyth & Derek Greene, 2021. "Collaboration in the time of COVID: a scientometric analysis of multidisciplinary SARS-CoV-2 research," Palgrave Communications, Palgrave Macmillan, vol. 8(1), pages 1-8, December.
    12. Zhang, Yi & Lu, Jie & Liu, Feng & Liu, Qian & Porter, Alan & Chen, Hongshu & Zhang, Guangquan, 2018. "Does deep learning help topic extraction? A kernel k-means clustering method with word embedding," Journal of Informetrics, Elsevier, vol. 12(4), pages 1099-1117.
    13. Zhang, Xinyuan & Xie, Qing & Song, Min, 2021. "Measuring the impact of novelty, bibliometric, and academic-network factors on citation count using a neural network," Journal of Informetrics, Elsevier, vol. 15(2).
    14. Lubna Zafar & Nayyer Masood & Samreen Ayaz, 2023. "Impact of field of study (FoS) on authors’ citation trend," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(4), pages 2557-2576, April.
    15. An, Xin & Li, Jinghong & Xu, Shuo & Chen, Liang & Sun, Wei, 2021. "An improved patent similarity measurement based on entities and semantic relations," Journal of Informetrics, Elsevier, vol. 15(2).
    16. Hou, Jianhua & Wang, Dongyi & Li, Jing, 2022. "A new method for measuring the originality of academic articles based on knowledge units in semantic networks," Journal of Informetrics, Elsevier, vol. 16(3).
    17. Jeong, Yoo Kyung & Song, Min & Ding, Ying, 2014. "Content-based author co-citation analysis," Journal of Informetrics, Elsevier, vol. 8(1), pages 197-211.
    18. Bei-Ni Yan & Tian-Shyug Lee & Tsung-Pei Lee, 2015. "Mapping the intellectual structure of the Internet of Things (IoT) field (2000–2014): a co-word analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(2), pages 1285-1300, November.
    19. Si Shen & Jiangfeng Liu & Litao Lin & Ying Huang & Lin Zhang & Chang Liu & Yutong Feng & Dongbo Wang, 2023. "SsciBERT: a pre-trained language model for social science texts," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(2), pages 1241-1263, February.
    20. Chen, Baitong & Tsutsui, Satoshi & Ding, Ying & Ma, Feicheng, 2017. "Understanding the topic evolution in a scientific domain: An exploratory study for the field of information retrieval," Journal of Informetrics, Elsevier, vol. 11(4), pages 1175-1189.
    21. Eoghan Cunningham & Barry Smyth & Derek Greene, 2021. "Correction: Collaboration in the time of COVID: a scientometric analysis of multidisciplinary SARS-CoV-2 research," Palgrave Communications, Palgrave Macmillan, vol. 8(1), pages 1-1, December.
    22. Kai Hu & Huayi Wu & Kunlun Qi & Jingmin Yu & Siluo Yang & Tianxing Yu & Jie Zheng & Bo Liu, 2018. "A domain keyword analysis approach extending Term Frequency-Keyword Active Index with Google Word2Vec model," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 1031-1068, March.
    23. Jung, Sukhwan & Yoon, Wan Chul, 2020. "An alternative topic model based on Common Interest Authors for topic evolution analysis," Journal of Informetrics, Elsevier, vol. 14(3).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lu Huang & Xiang Chen & Yi Zhang & Changtian Wang & Xiaoli Cao & Jiarun Liu, 2022. "Identification of topic evolution: network analytics with piecewise linear representation and word embedding," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5353-5383, September.
    2. Qikai Cheng & Jiamin Wang & Wei Lu & Yong Huang & Yi Bu, 2020. "Keyword-citation-keyword network: a new perspective of discipline knowledge structure analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 1923-1943, September.
    3. Kai Hu & Huayi Wu & Kunlun Qi & Jingmin Yu & Siluo Yang & Tianxing Yu & Jie Zheng & Bo Liu, 2018. "A domain keyword analysis approach extending Term Frequency-Keyword Active Index with Google Word2Vec model," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 1031-1068, March.
    4. Xiang Zhu & Yunqiu Zhang, 2020. "Co-word analysis method based on meta-path of subject knowledge network," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 753-766, May.
    5. Jung, Sukhwan & Segev, Aviv, 2022. "DAC: Descendant-aware clustering algorithm for network-based topic emergence prediction," Journal of Informetrics, Elsevier, vol. 16(3).
    6. Chen, Hongshu & Jin, Qianqian & Wang, Ximeng & Xiong, Fei, 2022. "Profiling academic-industrial collaborations in bibliometric-enhanced topic networks: A case study on digitalization research," Technological Forecasting and Social Change, Elsevier, vol. 175(C).
    7. Wang, Xiaoguang & He, Jing & Huang, Han & Wang, Hongyu, 2022. "MatrixSim: A new method for detecting the evolution paths of research topics," Journal of Informetrics, Elsevier, vol. 16(4).
    8. Yang, Siluo & Han, Ruizhen & Wolfram, Dietmar & Zhao, Yuehua, 2016. "Visualizing the intellectual structure of information science (2006–2015): Introducing author keyword coupling analysis," Journal of Informetrics, Elsevier, vol. 10(1), pages 132-150.
    9. Jinkai Yu & Wenjing Bi, 2019. "Evolution of Marine Environmental Governance Policy in China," Sustainability, MDPI, vol. 11(18), pages 1-14, September.
    10. Liu Yang & Keping Li & Hangfei Huang, 2018. "A new network model for extracting text keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 339-361, July.
    11. Danilo Silva Carvalho & Lucas Lopes Felipe & Priscila Costa Albuquerque & Fabio Zicker & Bruna de Paula Fonseca, 2023. "Leadership and international collaboration on COVID-19 research: reducing the North–South divide?," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(8), pages 4689-4705, August.
    12. Lu Huang & Yijie Cai & Erdong Zhao & Shengting Zhang & Yue Shu & Jiao Fan, 2022. "Measuring the interdisciplinarity of Information and Library Science interactions using citation analysis and semantic analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6733-6761, November.
    13. Chen, Liang & Xu, Shuo & Zhu, Lijun & Zhang, Jing & Yang, Guancan & Xu, Haiyun, 2022. "A deep learning based method benefiting from characteristics of patents for semantic relation classification," Journal of Informetrics, Elsevier, vol. 16(3).
    14. Aliakbar Pourhatami & Mohammad Kaviyani-Charati & Bahareh Kargar & Hamed Baziyad & Maryam Kargar & Carlos Olmeda-Gómez, 2021. "Mapping the intellectual structure of the coronavirus field (2000–2020): a co-word analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6625-6657, August.
    15. Sung Kim & Derek Hansen & Richard Helps, 2018. "Computing research in the academy: insights from theses and dissertations," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(1), pages 135-158, January.
    16. Xiaoyu Liu & Xuefeng Wang & Donghua Zhu, 2022. "Reviewer recommendation method for scientific research proposals: a case for NSFC," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(6), pages 3343-3366, June.
    17. Faraji, Omid & Ezadpour, Mostafa & Rahrovi Dastjerdi, Alireza & Dolatzarei, Ehsan, 2022. "Conceptual structure of balanced scorecard research: A co-word analysis," Evaluation and Program Planning, Elsevier, vol. 94(C).
    18. Chaker Jebari & Enrique Herrera-Viedma & Manuel Jesus Cobo, 2021. "The use of citation context to detect the evolution of research topics: a large-scale analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(4), pages 2971-2989, April.
    19. Chen, Guo & Xiao, Lu, 2016. "Selecting publication keywords for domain analysis in bibliometrics: A comparison of three methods," Journal of Informetrics, Elsevier, vol. 10(1), pages 212-223.
    20. Hao Tan & Yuyue Hao, 2022. "Mapping the Global Evolution and Research Directions of Information Seeking, Sharing and Communication in Disasters: A Bibliometric Study," IJERPH, MDPI, vol. 19(22), pages 1-20, November.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:infome:v:18:y:2024:i:3:s1751157724000427. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Catherine Liu. General contact details of the provider: http://www.elsevier.com/locate/joi.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.