
A new network model for extracting text keywords

Authors

  • Liu Yang (Beijing Jiaotong University)
  • Keping Li (Beijing Jiaotong University)
  • Hangfei Huang (Beijing Jiaotong University)

Abstract

Text keywords are the meaningful and important words in a document; they provide a precise overview of its content and reflect the author’s writing intention. Keyword extraction methods have received considerable attention, and network-based methods are among them. However, existing network-based keyword extraction methods consider only the connections between words in a document and ignore the impact of sentences. Because a sentence is made up of many words, and words within a sentence affect one another, neglecting the influence of sentences results in a loss of information. In this paper, we introduce a word network whose nodes represent the words in a document, and we define any keyword extraction method based on a word network as a Word-net method. We then propose a new network model that considers the influence of sentences, and a new word-sentence method based on that model. Experimental results demonstrate that our method outperforms the Word-net method, the classical term frequency-inverse document frequency (TF-IDF) method, the most frequent method, and the TextRank method. The precision, recall, and F-measure of our result are respectively 7.95, 8.27 and 6.54% higher than the Word-net result, and the average precision of our result is 17.56% higher than the TF-IDF result. A two-way analysis of variance is employed to validate the empirical analysis; it indicates that both the keyword extraction method and the number of keywords have statistically significant effects on the evaluation metric values.
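
As a rough illustration of the general ideas named in the abstract, the Python sketch below builds a simple word network in which two words are linked whenever they appear in the same sentence, ranks words by the total weight of their links, and scores an extracted keyword list with precision, recall, and F-measure. This is not the authors' word-sentence model, whose details are not given on this page; the function names, the naive sentence splitter, and the strength-based ranking are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from itertools import combinations


def split_sentences(text):
    # Naive splitter (assumption): break on ., ! and ? only.
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphabetic tokens only (assumption).
    return re.findall(r"[a-z]+", sentence.lower())


def build_word_network(text, stopwords=frozenset()):
    # Nodes are words; an edge's weight counts the sentences
    # in which both of its endpoint words occur.
    graph = defaultdict(lambda: defaultdict(int))
    for sentence in split_sentences(text):
        words = sorted(set(tokenize(sentence)) - set(stopwords))
        for a, b in combinations(words, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def top_keywords(graph, k):
    # Rank by node strength (sum of edge weights); ties break alphabetically.
    strength = {w: sum(nbrs.values()) for w, nbrs in graph.items()}
    ranked = sorted(strength, key=lambda w: (-strength[w], w))
    return ranked[:k]


def precision_recall_f(extracted, reference):
    # Standard set-based precision, recall, and F-measure (F1).
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

A call such as `top_keywords(build_word_network(text), 5)` returns the five highest-ranked words, which can then be compared against a reference keyword list with `precision_recall_f`.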

Suggested Citation

  • Liu Yang & Keping Li & Hangfei Huang, 2018. "A new network model for extracting text keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 339-361, July.
  • Handle: RePEc:spr:scient:v:116:y:2018:i:1:d:10.1007_s11192-018-2743-5
    DOI: 10.1007/s11192-018-2743-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-018-2743-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Chen, Guo & Xiao, Lu, 2016. "Selecting publication keywords for domain analysis in bibliometrics: A comparison of three methods," Journal of Informetrics, Elsevier, vol. 10(1), pages 212-223.
    2. Zhong-Yi Wang & Gang Li & Chun-Ya Li & Ang Li, 2012. "Research on the semantic-based co-word analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 90(3), pages 855-875, March.
    3. Perc, Matjaž, 2010. "Growth and structure of Slovenia’s scientific collaboration network," Journal of Informetrics, Elsevier, vol. 4(4), pages 475-482.
    4. Xinning Su & Sanhong Deng & Si Shen, 2014. "The design and application value of the Chinese Social Science Citation Index," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(3), pages 1567-1582, March.

    Citations


    Cited by:

    1. Yang, Jinqing & Bu, Yi & Lu, Wei & Huang, Yong & Hu, Jiming & Huang, Shengzhi & Zhang, Li, 2022. "Identifying keyword sleeping beauties: A perspective on the knowledge diffusion process," Journal of Informetrics, Elsevier, vol. 16(1).
    2. Samuel Zanferdini Oliva & Livia Oliveira-Ciabati & Denise Gazotto Dezembro & Mário Sérgio Adolfi Júnior & Maísa Carvalho Silva & Hugo Cesar Pessotti & Juliana Tarossi Pollettini, 2021. "Text structuring methods based on complex network: a systematic review," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1471-1493, February.
    3. Liu Yang & Keping Li & Dan Zhao & Shuang Gu & Dongyang Yan, 2019. "A Network Method for Identifying the Root Cause of High-Speed Rail Faults Based on Text Data," Energies, MDPI, vol. 12(10), pages 1-17, May.
    4. YiJun Liu & Li Zhang & Xiaoli Lian, 2020. "A document-structure-based complex network model for extracting text keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 1765-1791, September.
    5. Tingting Zhang & Baozhen Lee & Qinghua Zhu & Xi Han & Ke Chen, 2023. "Document keyword extraction based on semantic hierarchical graph model," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2623-2647, May.
    6. Chengzhi Zhang & Lei Zhao & Mengyuan Zhao & Yingyi Zhang, 2022. "Enhancing keyphrase extraction from academic articles with their reference information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 703-731, February.
    7. Xicheng Yin & Hongwei Wang & Pei Yin & Hengmin Zhu & Zhenyu Zhang, 2020. "A co-occurrence based approach of automatic keyword expansion using mass diffusion," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 1885-1905, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kai Hu & Huayi Wu & Kunlun Qi & Jingmin Yu & Siluo Yang & Tianxing Yu & Jie Zheng & Bo Liu, 2018. "A domain keyword analysis approach extending Term Frequency-Keyword Active Index with Google Word2Vec model," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 1031-1068, March.
    2. Chen, Guo & Xiao, Lu, 2016. "Selecting publication keywords for domain analysis in bibliometrics: A comparison of three methods," Journal of Informetrics, Elsevier, vol. 10(1), pages 212-223.
    3. Qikai Cheng & Jiamin Wang & Wei Lu & Yong Huang & Yi Bu, 2020. "Keyword-citation-keyword network: a new perspective of discipline knowledge structure analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 1923-1943, September.
    4. Marian-Gabriel Hâncean & Matjaž Perc & Lazăr Vlăsceanu, 2014. "Fragmented Romanian Sociology: Growth and Structure of the Collaboration Network," PLOS ONE, Public Library of Science, vol. 9(11), pages 1-9, November.
    5. Marian-Gabriel Hâncean & Matjaž Perc & Jürgen Lerner, 2021. "The coauthorship networks of the most productive European researchers," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(1), pages 201-224, January.
    6. Andrej Kastrin & Jelena Klisara & Borut Lužar & Janez Povh, 2017. "Analysis of Slovenian research community through bibliographic networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(2), pages 791-813, February.
    7. Peng Liu & Haoxiang Xia, 2015. "Structure and evolution of co-authorship network in an interdisciplinary research field," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(1), pages 101-134, April.
    8. Mikel Alayo & Txomin Iturralde & Amaia Maseda & Gloria Aparicio, 2021. "Mapping family firm internationalization research: bibliometric and literature review," Review of Managerial Science, Springer, vol. 15(6), pages 1517-1560, August.
    9. Kim, Jinseok & Diesner, Jana, 2015. "The effect of data pre-processing on understanding the evolution of collaboration networks," Journal of Informetrics, Elsevier, vol. 9(1), pages 226-236.
    10. Yunwei Chen & Katy Börner & Shu Fang, 2013. "Evolving collaboration networks in Scientometrics in 1978–2010: a micro–macro analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(3), pages 1051-1070, June.
    11. Yang, Siluo & Han, Ruizhen & Wolfram, Dietmar & Zhao, Yuehua, 2016. "Visualizing the intellectual structure of information science (2006–2015): Introducing author keyword coupling analysis," Journal of Informetrics, Elsevier, vol. 10(1), pages 132-150.
    12. Weimao Ke, 2013. "A fitness model for scholarly impact analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(3), pages 981-998, March.
    13. Xiang Zhu & Yunqiu Zhang, 2020. "Co-word analysis method based on meta-path of subject knowledge network," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 753-766, May.
    14. Jinkai Yu & Wenjing Bi, 2019. "Evolution of Marine Environmental Governance Policy in China," Sustainability, MDPI, vol. 11(18), pages 1-14, September.
    15. Cherry C. I. Lau & Christina W. Y. Wong, 2024. "Achieving sustainable development with sustainable packaging: A natural‐resource‐based view perspective," Business Strategy and the Environment, Wiley Blackwell, vol. 33(5), pages 4766-4787, July.
    16. Li, Qing & Zhang, Huaige & Hong, Xianpei, 2020. "Knowledge structure of technology licensing based on co-keywords network: A review and future directions," International Review of Economics & Finance, Elsevier, vol. 66(C), pages 154-165.
    17. Maxim Kotsemir & Tatiana Kuznetsova & Elena Nasybulina & Anna Pikalova, 2015. "Identifying Directions for Russia’s Science and Technology Cooperation," Foresight-Russia Форсайт, CyberLeninka;Федеральное государственное автономное образовательное учреждение высшего образования «Национальный исследовательский университет «Высшая школа экономики», vol. 9(4 (eng)), pages 54-72.
    18. Guo Chen & Jing Chen & Yu Shao & Lu Xiao, 2023. "Automatic noise reduction of domain-specific bibliographic datasets using positive-unlabeled learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(2), pages 1187-1204, February.
    19. Olga Moskaleva & Vladimir Pislyakov & Ivan Sterligov & Mark Akoev & Svetlana Shabanova, 2018. "Russian Index of Science Citation: Overview and review," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 449-462, July.
    20. Behrouzi, Saman & Shafaeipour Sarmoor, Zahra & Hajsadeghi, Khosrow & Kavousi, Kaveh, 2020. "Predicting scientific research trends based on link prediction in keyword networks," Journal of Informetrics, Elsevier, vol. 14(4).
