Enhancing keyphrase extraction from academic articles with their reference information

Author

Listed:
  • Chengzhi Zhang

    (Nanjing University of Science and Technology)

  • Lei Zhao

    (Nanjing University of Science and Technology)

  • Mengyuan Zhao

    (Nanjing University of Science and Technology)

  • Yingyi Zhang

    (Nanjing University of Science and Technology)

Abstract

As Internet technology develops, information overload is becoming increasingly apparent, and users spend considerable time finding the information they need. Keyphrases that concisely summarize a document help users locate and understand documents quickly. For academic resources, most existing studies extract keyphrases from the title and abstract of a paper. We find that the titles of a paper's references also contain author-assigned keyphrases. This article therefore uses reference information and applies two typical unsupervised extraction methods (TF*IDF and TextRank), two representative traditional supervised learning algorithms (Naïve Bayes and Conditional Random Field), and a supervised deep learning model (BiLSTM-CRF) to analyze the effect of reference information on keyphrase extraction. The aim is to improve the quality of keyphrase recognition by expanding the source text. The experimental results show that reference information can increase the precision, recall, and F1 of automatic keyphrase extraction to a certain extent. This indicates that reference information is useful for keyphrase extraction from academic papers and suggests a new direction for research on automatic keyphrase extraction.
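The idea of expanding the source text can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes single-word candidates rather than multi-word keyphrases, a smoothed IDF variant, and exact-match evaluation, and the helper names (`build_source`, `tfidf_scores`, `top_k`, `prf1`) are hypothetical. It only shows how reference titles could be appended to the title and abstract before TF*IDF scoring, and how precision, recall, and F1 might then be computed against author-assigned keywords.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_source(title, abstract, reference_titles=()):
    """Return the tokens of the source text (hypothetical helper).

    With reference_titles empty this is the usual title + abstract input;
    passing reference titles expands the source text.
    """
    return tokenize(" ".join([title, abstract, *reference_titles]))


def tfidf_scores(doc_tokens, corpus_token_sets):
    """Score each term of one document by TF * IDF.

    corpus_token_sets is a list of sets, one per background document.
    The IDF is smoothed (an assumption of this sketch) so that terms
    unseen in the background corpus do not cause a division by zero.
    """
    tf = Counter(doc_tokens)
    n_docs = len(corpus_token_sets)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus_token_sets if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf
    return scores


def top_k(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def prf1(predicted, gold):
    """Exact-match precision, recall, and F1 between two term lists."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    # Toy inputs invented for illustration only; they are not data from the paper.
    title = "Ranking words in documents"
    abstract = "We rank candidate words by their weight in a document."
    refs = ["Graph ranking for keyword extraction", "Keyword extraction with graphs"]
    background = [set(tokenize("We study documents and words")),
                  set(tokenize("A document model for retrieval"))]
    gold = ["keyword", "extraction", "ranking"]

    for label, tokens in [("title+abstract", build_source(title, abstract)),
                          ("with references", build_source(title, abstract, refs))]:
        predicted = top_k(tfidf_scores(tokens, background), 5)
        print(label, predicted, prf1(predicted, gold))
```

Running the two configurations side by side mirrors the comparison described in the abstract, although a toy example of this kind says nothing about the size or direction of the effect on real data.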

Suggested Citation

  • Chengzhi Zhang & Lei Zhao & Mengyuan Zhao & Yingyi Zhang, 2022. "Enhancing keyphrase extraction from academic articles with their reference information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 703-731, February.
  • Handle: RePEc:spr:scient:v:127:y:2022:i:2:d:10.1007_s11192-021-04230-4
    DOI: 10.1007/s11192-021-04230-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-021-04230-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-021-04230-4?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Chen, Guo & Xiao, Lu, 2016. "Selecting publication keywords for domain analysis in bibliometrics: A comparison of three methods," Journal of Informetrics, Elsevier, vol. 10(1), pages 212-223.
    2. Liu Yang & Keping Li & Hangfei Huang, 2018. "A new network model for extracting text keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 339-361, July.
    3. Shimelis G. Assefa & Abebe Rorissa, 2013. "A bibliometric mapping of the structure of STEM education using co‐word analysis," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 64(12), pages 2513-2536, December.
    4. Shimelis G. Assefa & Abebe Rorissa, 2013. "A bibliometric mapping of the structure of STEM education using co-word analysis," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 64(12), pages 2513-2536, December.
    5. Scott Deerwester & Susan T. Dumais & George W. Furnas & Thomas K. Landauer & Richard Harshman, 1990. "Indexing by latent semantic analysis," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 41(6), pages 391-407, September.
    6. Yingyi Zhang & Chengzhi Zhang, 2021. "Enhancing keyphrase extraction from microblogs using human reading time," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(5), pages 611-626, May.

    Citations


    Cited by:

    1. Mohammed Azmi Al-Betar & Ammar Kamal Abasi & Ghazi Al-Naymat & Kamran Arshad & Sharif Naser Makhadmeh, 2023. "Optimization of scientific publications clustering with ensemble approach for topic extraction," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2819-2877, May.
    2. Jinqing Yang & Zhifeng Liu & Xiufeng Cheng & Guanghui Ye, 2024. "Understanding the keyword adoption behavior patterns of researchers from a functional structure perspective," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(6), pages 3359-3384, June.
    3. Ebadi, Ashkan & Auger, Alain & Gauthier, Yvan, 2022. "Detecting emerging technologies and their evolution using deep learning and weak signal analysis," Journal of Informetrics, Elsevier, vol. 16(4).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jamali, Seyedh Mahboobeh & Nader, Ale Ebrahim & Jamali, Fatemeh, 2021. "The Role of STEM Education in Improving the Quality of Education: A Bibliometric Study," MPRA Paper 114214, University Library of Munich, Germany, revised 02 May 2022.
    2. Guan, Jiancheng & Yan, Yan & Zhang, Jing Jing, 2017. "The impact of collaboration and knowledge networks on citations," Journal of Informetrics, Elsevier, vol. 11(2), pages 407-422.
    3. Alan L. Porter & David J. Schoeneck & Jan Youtie & Gregg E. A. Solomon & Seokbeom Kwon & Stephen F. Carley, 2019. "Learning about learning: patterns of sharing of research knowledge among Education, Border, and Cognitive Science fields," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(3), pages 1093-1117, March.
    4. Guo Chen & Lu Xiao & Chang-ping Hu & Xue-qin Zhao, 2015. "Identifying the research focus of Library and Information Science institutions in China with institution-specific keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(2), pages 707-724, May.
    5. Ping Liu & Qiong Wu & Xiangming Mu & Kaipeng Yu & Yiting Guo, 2015. "Detecting the intellectual structure of library and information science based on formal concept analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 104(3), pages 737-762, September.
    6. Marie Katsurai & Shunsuke Ono, 2019. "TrendNets: mapping emerging research trends from dynamic co-word networks via sparse representation," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1583-1598, December.
    7. Víctor Meseguer-Sánchez & Emilio Abad-Segura & Luis Jesús Belmonte-Ureña & Valentín Molina-Moreno, 2020. "Examining the Research Evolution on the Socio-Economic and Environmental Dimensions on University Social Responsibility," IJERPH, MDPI, vol. 17(13), pages 1-30, July.
    8. Vibhav Singh & Surabhi Verma & Sushil S. Chaurasia, 2020. "Mapping the themes and intellectual structure of corporate university: co-citation and cluster analyses," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(3), pages 1275-1302, March.
    9. Chen, Guo & Xiao, Lu, 2016. "Selecting publication keywords for domain analysis in bibliometrics: A comparison of three methods," Journal of Informetrics, Elsevier, vol. 10(1), pages 212-223.
    10. Hae Ok Choi, 2020. "An Evolutionary Approach to Technology Innovation of Cadastre for Smart Land Management Policy," Land, MDPI, vol. 9(2), pages 1-19, February.
    11. Liang Zhuang & Chao Ye & Scott N. Lieske, 2020. "Intertwining globality and locality: bibliometric analysis based on the top geography annual conferences in America and China," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 1075-1096, February.
    12. Curci, Ylenia & Mongeau Ospina, Christian A., 2016. "Investigating biofuels through network analysis," Energy Policy, Elsevier, vol. 97(C), pages 60-72.
    13. Chao Wei & Senlin Luo & Xincheng Ma & Hao Ren & Ji Zhang & Limin Pan, 2016. "Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation," PLOS ONE, Public Library of Science, vol. 11(1), pages 1-20, January.
    14. Kai Hu & Huayi Wu & Kunlun Qi & Jingmin Yu & Siluo Yang & Tianxing Yu & Jie Zheng & Bo Liu, 2018. "A domain keyword analysis approach extending Term Frequency-Keyword Active Index with Google Word2Vec model," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 1031-1068, March.
    15. Maksym Polyakov & Morteza Chalak & Md. Sayed Iftekhar & Ram Pandit & Sorada Tapsuwan & Fan Zhang & Chunbo Ma, 2018. "Authorship, Collaboration, Topics, and Research Gaps in Environmental and Resource Economics 1991–2015," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 71(1), pages 217-239, September.
    16. Ding, Ying, 2011. "Community detection: Topological vs. topical," Journal of Informetrics, Elsevier, vol. 5(4), pages 498-514.
    17. Juan Shi & Kin Keung Lai & Ping Hu & Gang Chen, 2018. "Factors dominating individual information disseminating behavior on social networking sites," Information Technology and Management, Springer, vol. 19(2), pages 121-139, June.
    18. Ganesh Dash & Chetan Sharma & Shamneesh Sharma, 2023. "Sustainable Marketing and the Role of Social Media: An Experimental Study Using Natural Language Processing (NLP)," Sustainability, MDPI, vol. 15(6), pages 1-16, March.
    19. Paola Cerchiello & Giancarlo Nicola, 2018. "Assessing News Contagion in Finance," Econometrics, MDPI, vol. 6(1), pages 1-19, February.
    20. Shr-Wei Kao & Pin Luarn, 2020. "Topic Modeling Analysis of Social Enterprises: Twitter Evidence," Sustainability, MDPI, vol. 12(8), pages 1-20, April.
