Document keyword extraction based on semantic hierarchical graph model

Author

Listed:
  • Tingting Zhang

    (Nanjing Audit University)

  • Baozhen Lee

    (Nanjing Audit University)

  • Qinghua Zhu

    (Nanjing University)

  • Xi Han

    (Guangdong University of Finance and Economics)

  • Ke Chen

    (Nanjing Audit University)

Abstract

Keywords provide a brief profile of a document’s contents and serve as an important means of quickly identifying its themes. Traditional keyword extraction methods are mostly based on statistical relationships between words, with no deeper understanding of the words’ structures. In addition, most keyword extraction studies to date rely on ranking-related measure values, without considering the cohesion of the extracted keyword set. In this paper, a keyword extraction method based on a semantic hierarchical graph model is proposed. First, the semantic graph for the document is constructed based on the hierarchical extraction of feature terms. Then, the keyword collection of the document is chosen from the constructed semantic graph. The method accounts for both the context of the keywords and the internal structure by which they are related. By mining the deep hidden structure of feature terms, the proposed method can effectively reveal the hierarchical association between terms within the semantic graph and obtain a keyword collection result with high probability. Moreover, several experiments conducted on released datasets show that our method outperforms the existing methods in terms of precision, recall, and F-measure.
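
The abstract reports results in terms of precision, recall, and F-measure. As a minimal sketch of how these three scores are commonly computed for one document's extracted keyword set against a gold-standard set, the following Python function may help. It is an illustration only and not the authors' evaluation code: the function name `keyword_prf`, the exact-match comparison after lowercasing, and the sample keyword lists are all assumptions, and the paper may use a different matching rule or averaging scheme.

```python
def keyword_prf(extracted, gold):
    """Return (precision, recall, f_measure) for one document.

    Assumption: keywords match only if they are identical after
    trimming whitespace and lowercasing (no stemming, no partial match).
    """
    extracted_set = {k.strip().lower() for k in extracted}
    gold_set = {k.strip().lower() for k in gold}

    # Keywords that appear in both the extracted set and the gold set.
    hits = len(extracted_set & gold_set)

    # Guard against empty sets to avoid division by zero.
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0

    # F-measure as the harmonic mean of precision and recall (F1).
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical keyword lists, used only to exercise the function.
    extracted = ["graph", "keyword", "model", "term"]
    gold = ["keyword", "graph", "semantic"]
    print(keyword_prf(extracted, gold))
```

With the hypothetical lists above, two of the four extracted keywords appear in the three-keyword gold set, so precision is 2/4 and recall is 2/3. Scores over a whole dataset would typically be averaged across documents, which this sketch leaves out.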

Suggested Citation

  • Tingting Zhang & Baozhen Lee & Qinghua Zhu & Xi Han & Ke Chen, 2023. "Document keyword extraction based on semantic hierarchical graph model," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2623-2647, May.
  • Handle: RePEc:spr:scient:v:128:y:2023:i:5:d:10.1007_s11192-023-04677-7
    DOI: 10.1007/s11192-023-04677-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-023-04677-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xiaofang Wo & Guichen Li & Yuantian Sun & Jinghua Li & Sen Yang & Haoran Hao, 2022. "The Changing Tendency and Association Analysis of Intelligent Coal Mines in China: A Policy Text Mining Study," Sustainability, MDPI, vol. 14(18), pages 1-14, September.
    2. Yikun Su & Hong Xue & Huakang Liang, 2019. "An Evaluation Model for Urban Comprehensive Carrying Capacity: An Empirical Case from Harbin City," IJERPH, MDPI, vol. 16(3), pages 1-25, January.
    3. Xicheng Yin & Hongwei Wang & Pei Yin & Hengmin Zhu & Zhenyu Zhang, 2020. "A co-occurrence based approach of automatic keyword expansion using mass diffusion," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 1885-1905, September.
    4. Yang, Jinqing & Bu, Yi & Lu, Wei & Huang, Yong & Hu, Jiming & Huang, Shengzhi & Zhang, Li, 2022. "Identifying keyword sleeping beauties: A perspective on the knowledge diffusion process," Journal of Informetrics, Elsevier, vol. 16(1).
    5. Liu Yang & Keping Li & Dan Zhao & Shuang Gu & Dongyang Yan, 2019. "A Network Method for Identifying the Root Cause of High-Speed Rail Faults Based on Text Data," Energies, MDPI, vol. 12(10), pages 1-17, May.
    6. Chengzhi Zhang & Lei Zhao & Mengyuan Zhao & Yingyi Zhang, 2022. "Enhancing keyphrase extraction from academic articles with their reference information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 703-731, February.
    7. Pengcheng Li & Wei Lu & Qikai Cheng, 2022. "Generating a related work section for scientific papers: an optimized approach with adopting problem and method information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(8), pages 4397-4417, August.
    8. Yuting Hu & Xinyu Ma, 2023. "Research on the Structure and Characteristics of Adolescent Physical Health Policy in China Based on Policy Text Tool," Sustainability, MDPI, vol. 15(11), pages 1-17, May.
    9. Esra Gündoğan & Mehmet Kaya, 2022. "A novel hybrid paper recommendation system using deep learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(7), pages 3837-3855, July.
    10. Quispe, Laura V.C. & Tohalino, Jorge A.V. & Amancio, Diego R., 2021. "Using virtual edges to improve the discriminability of co-occurrence text networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 562(C).
    11. Wenhan Chao & Mengyuan Chen & Xian Zhou & Zhunchen Luo, 2023. "A joint framework for identifying the type and arguments of scientific contribution," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(6), pages 3347-3376, June.
    12. Xiaojie Yao & Yuan Hu & Huaping Gong & Dongyou Chen, 2021. "Characteristics and Evolution of China’s Industry–University–Research Collaboration to Promote the Sustainable Development: Based on Policy Text Analysis," Sustainability, MDPI, vol. 13(23), pages 1-18, November.
    13. YiJun Liu & Li Zhang & Xiaoli Lian, 2020. "A document-structure-based complex network model for extracting text keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 1765-1791, September.
    14. Tian-Yuan Huang & Liangping Ding & Yong-Qiang Yu & Lei Huang & Liying Yang, 2023. "From AR5 to AR6: exploring research advancement in climate change based on scientific evidence from IPCC WGI reports," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 5227-5245, September.
    15. Samuel Zanferdini Oliva & Livia Oliveira-Ciabati & Denise Gazotto Dezembro & Mário Sérgio Adolfi Júnior & Maísa Carvalho Silva & Hugo Cesar Pessotti & Juliana Tarossi Pollettini, 2021. "Text structuring methods based on complex network: a systematic review," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1471-1493, February.
    16. Tarek Saier & Michael Färber, 2020. "unarXive: a large scholarly data set with publications’ full-text, annotated in-text citations, and links to metadata," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 3085-3108, December.
