
A new network model for extracting text keywords

Author

Listed:
  • Liu Yang

    (Beijing Jiaotong University)

  • Keping Li

    (Beijing Jiaotong University)

  • Hangfei Huang

    (Beijing Jiaotong University)

Abstract

Text keywords are the meaningful and important words in a document; they provide a precise overview of its content and reflect the author’s intent. Keyword extraction methods have received considerable attention, and network-based methods are among them. However, existing network-based keyword extraction methods consider only the connections between words in a document and ignore the impact of sentences. Because a sentence is made up of many words, and the words within a sentence affect one another, neglecting the influence of sentences results in a loss of information. In this paper, we introduce a word network whose nodes represent the words in a document, and we refer to any keyword extraction method based on a word network as a Word-net method. We then propose a new network model that takes the influence of sentences into account, and a new word-sentence method based on this model. Experimental results demonstrate that our method outperforms the Word-net method, the classical term frequency-inverse document frequency (TF-IDF) method, the most-frequent method, and the TextRank method. The precision, recall, and F-measure of our result are 7.95%, 8.27%, and 6.54% higher, respectively, than those of the Word-net result, and the average precision of our result is 17.56% higher than that of the TF-IDF result. A two-way analysis of variance is employed to validate the empirical analysis; it indicates that both the keyword extraction method and the number of keywords have statistically significant effects on the values of the evaluation metrics.
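
As a rough illustration of the contrast drawn in the abstract, the Python sketch below scores words in two ways: from a word-only co-occurrence network, and from a two-layer network in which words and sentences reinforce each other. The abstract does not specify how the authors build or score their network, so everything here is an assumption made for illustration and is not the authors' method: the function names (word_net_scores, word_sentence_scores, top_k, precision_recall_f), the co-occurrence weighting, and the mutual-reinforcement update are all hypothetical. Only the precision, recall, and F-measure definitions are the standard ones.

```python
from collections import defaultdict
from itertools import combinations


def word_net_scores(sentences):
    """Word-only baseline (assumed form): a word's score is its weighted
    degree in a graph linking words that co-occur in the same sentence."""
    degree = defaultdict(float)
    for words in sentences:
        for a, b in combinations(sorted(set(words)), 2):
            degree[a] += 1.0
            degree[b] += 1.0
    return dict(degree)


def word_sentence_scores(sentences, iterations=20):
    """Hypothetical word-sentence scoring: a sentence is scored by the words
    it contains, and a word by the sentences it appears in (a mutual
    reinforcement loop, normalised each round). Not the paper's model."""
    vocab = {w for words in sentences for w in words}
    word_score = {w: 1.0 for w in vocab}
    for _ in range(iterations):
        # Sentence layer: sum of the current scores of its distinct words.
        sent_score = [sum(word_score[w] for w in set(words)) for words in sentences]
        # Word layer: sum of the scores of the sentences containing the word.
        updated = {w: 0.0 for w in vocab}
        for words, score in zip(sentences, sent_score):
            for w in set(words):
                updated[w] += score
        total = sum(updated.values()) or 1.0
        word_score = {w: v / total for w, v in updated.items()}
    return word_score


def top_k(scores, k):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


def precision_recall_f(extracted, reference):
    """Standard precision, recall, and F-measure against reference keywords."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy, pre-tokenised document (made-up data, one list of words per sentence).
    doc = [
        ["network", "keyword", "extraction"],
        ["network", "sentence", "influence"],
        ["keyword", "sentence", "network"],
    ]
    print(top_k(word_net_scores(doc), 2))
    print(top_k(word_sentence_scores(doc), 2))
    print(precision_recall_f(top_k(word_sentence_scores(doc), 2), ["network", "keyword"]))
```

The sketch omits stop-word removal, stemming, and any tuning of the number of extracted keywords, all of which would matter in a real comparison such as the one reported in the abstract.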

Suggested Citation

  • Liu Yang & Keping Li & Hangfei Huang, 2018. "A new network model for extracting text keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 339-361, July.
  • Handle: RePEc:spr:scient:v:116:y:2018:i:1:d:10.1007_s11192-018-2743-5
    DOI: 10.1007/s11192-018-2743-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-018-2743-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-018-2743-5?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item



    Citations


    Cited by:

    1. Yang, Jinqing & Bu, Yi & Lu, Wei & Huang, Yong & Hu, Jiming & Huang, Shengzhi & Zhang, Li, 2022. "Identifying keyword sleeping beauties: A perspective on the knowledge diffusion process," Journal of Informetrics, Elsevier, vol. 16(1).
    2. Samuel Zanferdini Oliva & Livia Oliveira-Ciabati & Denise Gazotto Dezembro & Mário Sérgio Adolfi Júnior & Maísa Carvalho Silva & Hugo Cesar Pessotti & Juliana Tarossi Pollettini, 2021. "Text structuring methods based on complex network: a systematic review," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1471-1493, February.
    3. Liu Yang & Keping Li & Dan Zhao & Shuang Gu & Dongyang Yan, 2019. "A Network Method for Identifying the Root Cause of High-Speed Rail Faults Based on Text Data," Energies, MDPI, vol. 12(10), pages 1-17, May.
    4. YiJun Liu & Li Zhang & Xiaoli Lian, 2020. "A document-structure-based complex network model for extracting text keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 1765-1791, September.
    5. Tingting Zhang & Baozhen Lee & Qinghua Zhu & Xi Han & Ke Chen, 2023. "Document keyword extraction based on semantic hierarchical graph model," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2623-2647, May.
    6. Chengzhi Zhang & Lei Zhao & Mengyuan Zhao & Yingyi Zhang, 2022. "Enhancing keyphrase extraction from academic articles with their reference information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 703-731, February.
    7. Xicheng Yin & Hongwei Wang & Pei Yin & Hengmin Zhu & Zhenyu Zhang, 2020. "A co-occurrence based approach of automatic keyword expansion using mass diffusion," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 1885-1905, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kai Hu & Huayi Wu & Kunlun Qi & Jingmin Yu & Siluo Yang & Tianxing Yu & Jie Zheng & Bo Liu, 2018. "A domain keyword analysis approach extending Term Frequency-Keyword Active Index with Google Word2Vec model," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 1031-1068, March.
    2. Chen, Guo & Xiao, Lu, 2016. "Selecting publication keywords for domain analysis in bibliometrics: A comparison of three methods," Journal of Informetrics, Elsevier, vol. 10(1), pages 212-223.
    3. Qikai Cheng & Jiamin Wang & Wei Lu & Yong Huang & Yi Bu, 2020. "Keyword-citation-keyword network: a new perspective of discipline knowledge structure analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 1923-1943, September.
    4. Peng Liu & Haoxiang Xia, 2015. "Structure and evolution of co-authorship network in an interdisciplinary research field," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(1), pages 101-134, April.
    5. Kim, Jinseok & Diesner, Jana, 2015. "The effect of data pre-processing on understanding the evolution of collaboration networks," Journal of Informetrics, Elsevier, vol. 9(1), pages 226-236.
    6. Weimao Ke, 2013. "A fitness model for scholarly impact analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(3), pages 981-998, March.
    7. Xiang Zhu & Yunqiu Zhang, 2020. "Co-word analysis method based on meta-path of subject knowledge network," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 753-766, May.
    8. Cherry C. I. Lau & Christina W. Y. Wong, 2024. "Achieving sustainable development with sustainable packaging: A natural‐resource‐based view perspective," Business Strategy and the Environment, Wiley Blackwell, vol. 33(5), pages 4766-4787, July.
    9. Li, Qing & Zhang, Huaige & Hong, Xianpei, 2020. "Knowledge structure of technology licensing based on co-keywords network: A review and future directions," International Review of Economics & Finance, Elsevier, vol. 66(C), pages 154-165.
    10. Maxim Kotsemir & Tatiana Kuznetsova & Elena Nasybulina & Anna Pikalova, 2015. "Identifying Directions for Russia’s Science and Technology Cooperation," Foresight-Russia Форсайт, CyberLeninka;Федеральное государственное автономное образовательное учреждение высшего образования «Национальный исследовательский университет «Высшая школа экономики», vol. 9(4 (eng)), pages 54-72.
    11. Guo Chen & Jing Chen & Yu Shao & Lu Xiao, 2023. "Automatic noise reduction of domain-specific bibliographic datasets using positive-unlabeled learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(2), pages 1187-1204, February.
    12. Carla Martínez-Climent & Ana Zorio-Grima & Domingo Ribeiro-Soriano, 2018. "Financial return crowdfunding: literature review and bibliometric analysis," International Entrepreneurship and Management Journal, Springer, vol. 14(3), pages 527-553, September.
    13. Lara-Cabrera, R. & Cotta, C. & Fernández-Leiva, A.J., 2014. "An analysis of the structure and evolution of the scientific collaboration network of computer intelligence in games," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 395(C), pages 523-536.
    14. Yuetong Chen & Hao Wang & Baolong Zhang & Wei Zhang, 2022. "A method of measuring the article discriminative capacity and its distribution," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(6), pages 3317-3341, June.
    15. Jinseok Kim & Liang Tao & Seok-Hyoung Lee & Jana Diesner, 2016. "Evolution and structure of scientific co-publishing network in Korea between 1948–2011," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(1), pages 27-41, April.
    16. Liang, Wenyan & Gu, Jun & Nyland, Chris, 2022. "China's new research evaluation policy: Evidence from economics faculty of Elite Chinese universities," Research Policy, Elsevier, vol. 51(1).
    17. Xiaoyan Wang & Guocai Wang & Yanhui Zhao & Wyatt A. Schrock, 2024. "The Intellectual Structure of Sales Ethics Research: A Multi-method Bibliometric Analysis," Journal of Business Ethics, Springer, vol. 193(1), pages 133-157, August.
    18. Tomaz Bartol & Gordana Budimir & Doris Dekleva-Smrekar & Miro Pusnik & Primoz Juznic, 2014. "Assessment of research fields in Scopus and Web of Science in the view of national research evaluation in Slovenia," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(2), pages 1491-1504, February.
    19. Liliana Arroyo Moliner & Eva Gallardo-Gallardo & Pedro Gallo de Puelles, 2017. "Understanding scientific communities: a social network approach to collaborations in Talent Management research," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1439-1462, December.
    20. Zhang, Ronda J. & Ye, Fred Y., 2020. "Measuring similarity for clarifying layer difference in multiplex ad hoc duplex information networks," Journal of Informetrics, Elsevier, vol. 14(1).
