
Identification of highly-cited papers using topic-model-based and bibliometric features: the consideration of keyword popularity

Author

Listed:
  • Hu, Ya-Han
  • Tai, Chun-Tien
  • Liu, Kang Ernest
  • Cai, Cheng-Fang

Abstract

The number of citations received has been used as an indicator of the impact of academic publications. Developing tools to find papers that have the potential to become highly cited has recently attracted increasing scientific attention. Topics of concern to scholars may change over time in accordance with research trends, resulting in changes in received citations. Author-defined keywords, titles, and abstracts provide valuable information about a research article. This study applies latent Dirichlet allocation to extract topics and keywords from articles; five keyword popularity (KP) features are defined as indicators of the emerging trends of articles. Binary classification models, built with a number of supervised learning techniques, are used to predict whether papers are highly cited or less highly cited. We empirically compare the KP features of articles with other commonly used journal-related and author-related features proposed in previous studies. The results show that prediction models with KP features are more effective than those with journal and/or author features, especially in the management information systems discipline.
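
The idea of a keyword popularity feature can be illustrated with a minimal sketch. The abstract does not define the five KP features, so the function below is a hypothetical stand-in (the mean share of earlier papers that used each of a paper's keywords), and the top-fraction cutoff used for the "highly cited" label is likewise an assumption, not the authors' threshold. The toy records and all names are invented for illustration.

```python
from collections import Counter


def keyword_popularity(papers, target_year, keywords):
    """Hypothetical KP feature (NOT one of the paper's five definitions):
    the mean, over the given keywords, of the share of papers published
    before target_year that list that keyword."""
    prior = [p for p in papers if p["year"] < target_year]
    if not prior or not keywords:
        return 0.0
    # Count each keyword at most once per paper.
    counts = Counter(kw for p in prior for kw in set(p["keywords"]))
    return sum(counts[kw] / len(prior) for kw in keywords) / len(keywords)


def label_highly_cited(citations, top_fraction=0.25):
    """Binary labels: 1 for papers at or above the citation count that
    bounds the top `top_fraction` of papers, else 0.
    The 25% default is an assumed cutoff for illustration only."""
    ranked = sorted(citations, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    threshold = ranked[k - 1]
    return [1 if c >= threshold else 0 for c in citations]


# Toy records, invented for this example.
papers = [
    {"year": 2015, "keywords": ["big data", "cloud"]},
    {"year": 2016, "keywords": ["big data", "privacy"]},
    {"year": 2016, "keywords": ["erp"]},
    {"year": 2017, "keywords": ["big data", "erp"]},
]

# Popularity of the 2017 paper's keywords among the three earlier papers.
kp_2017 = keyword_popularity(papers, 2017, papers[3]["keywords"])

# Labels for four papers with assumed citation counts.
labels = label_highly_cited([50, 3, 12, 7])
```

In a full pipeline, a feature like `kp_2017` would be computed per paper and passed, together with journal-related and author-related features, to a supervised classifier trained on labels such as `labels`.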

Suggested Citation

  • Hu, Ya-Han & Tai, Chun-Tien & Liu, Kang Ernest & Cai, Cheng-Fang, 2020. "Identification of highly-cited papers using topic-model-based and bibliometric features: the consideration of keyword popularity," Journal of Informetrics, Elsevier, vol. 14(1).
  • Handle: RePEc:eee:infome:v:14:y:2020:i:1:s1751157719301099
    DOI: 10.1016/j.joi.2019.101004

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157719301099
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.joi.2019.101004?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Citations


    Cited by:

    1. Wanjun Xia & Tianrui Li & Chongshou Li, 2023. "A review of scientific impact prediction: tasks, features and methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(1), pages 543-585, January.
    2. Qianqian Jin & Hongshu Chen & Ximeng Wang & Tingting Ma & Fei Xiong, 2022. "Exploring funding patterns with word embedding-enhanced organization–topic networks: a case study on big data," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5415-5440, September.
    3. Santosh Kumar Srivastava & Surajit Bag, 2023. "Recent Developments on Flexible Manufacturing in the Digital Era: A Review and Future Research Directions," Global Journal of Flexible Systems Management, Springer;Global Institute of Flexible Systems Management, vol. 24(4), pages 483-516, December.
    4. Haochuan Cui & Tiewei Li & Cheng-Jun Wang, 2023. "Climbing up the ladder of abstraction: how to span the boundaries of knowledge space in the online knowledge market?," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-12, December.
    5. Wumei Du & Zheng Xie & Yiqin Lv, 2021. "Predicting publication productivity for authors: Shallow or deep architecture?," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5855-5879, July.
    6. Chowdhury, K.P., 2021. "Functional analysis of generalized linear models under non-linear constraints with applications to identifying highly-cited papers," Journal of Informetrics, Elsevier, vol. 15(1).
    7. Xu, Ran & Baghaei Lakeh, Arash & Ghaffarzadegan, Navid, 2021. "Examining the characteristics of impactful research topics: A case of three decades of HIV-AIDS research," Journal of Informetrics, Elsevier, vol. 15(1).
    8. Anqi Ma & Yu Liu & Xiujuan Xu & Tao Dong, 2021. "A deep-learning based citation count prediction model with paper metadata semantic features," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6803-6823, August.
    9. Sepideh Fahimifar & Khadijeh Mousavi & Fatemeh Mozaffari & Marcel Ausloos, 2023. "Identification of the most important external features of highly cited scholarly papers through 3 (i.e., Ridge, Lasso, and Boruta) feature selection data mining methods," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(4), pages 3685-3712, August.
    10. Basma Albanna & Julia Handl & Richard Heeks, 2021. "Publication outperformance among global South researchers: An analysis of individual-level and publication-level predictors of positive deviance," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(10), pages 8375-8431, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wanjun Xia & Tianrui Li & Chongshou Li, 2023. "A review of scientific impact prediction: tasks, features and methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(1), pages 543-585, January.
    2. Chowdhury, K.P., 2021. "Functional analysis of generalized linear models under non-linear constraints with applications to identifying highly-cited papers," Journal of Informetrics, Elsevier, vol. 15(1).
    3. Anqi Ma & Yu Liu & Xiujuan Xu & Tao Dong, 2021. "A deep-learning based citation count prediction model with paper metadata semantic features," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6803-6823, August.
    4. Wang, Xing & Zhang, Zhihui, 2020. "Improving the reliability of short-term citation impact indicators by taking into account the correlation between short- and long-term citation impact," Journal of Informetrics, Elsevier, vol. 14(2).
    5. Martorell Cunil, Onofre & Otero González, Luis & Durán Santomil, Pablo & Mulet Forteza, Carlos, 2023. "How to accomplish a highly cited paper in the tourism, leisure and hospitality field," Journal of Business Research, Elsevier, vol. 157(C).
    6. Li, Xin & Ma, Xiaodi & Feng, Ye, 2024. "Early identification of breakthrough research from sleeping beauties using machine learning," Journal of Informetrics, Elsevier, vol. 18(2).
    7. Wumei Du & Zheng Xie & Yiqin Lv, 2021. "Predicting publication productivity for authors: Shallow or deep architecture?," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5855-5879, July.
    8. Zhao, Qihang & Feng, Xiaodong, 2022. "Utilizing citation network structure to predict paper citation counts: A Deep learning approach," Journal of Informetrics, Elsevier, vol. 16(1).
    9. Stegehuis, Clara & Litvak, Nelly & Waltman, Ludo, 2015. "Predicting the long-term citation impact of recent publications," Journal of Informetrics, Elsevier, vol. 9(3), pages 642-657.
    10. Shengzhi Huang & Jiajia Qian & Yong Huang & Wei Lu & Yi Bu & Jinqing Yang & Qikai Cheng, 2022. "Disclosing the relationship between citation structure and future impact of a publication," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(7), pages 1025-1042, July.
    11. Xie, Zheng, 2020. "Predicting publication productivity for researchers: A piecewise Poisson model," Journal of Informetrics, Elsevier, vol. 14(3).
    12. Ruan, Xuanmin & Zhu, Yuanyang & Li, Jiang & Cheng, Ying, 2020. "Predicting the citation counts of individual papers via a BP neural network," Journal of Informetrics, Elsevier, vol. 14(3).
    13. Waltman, Ludo, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.
    14. Zhang, Xinyuan & Xie, Qing & Song, Min, 2021. "Measuring the impact of novelty, bibliometric, and academic-network factors on citation count using a neural network," Journal of Informetrics, Elsevier, vol. 15(2).
    15. Akella, Akhil Pandey & Alhoori, Hamed & Kondamudi, Pavan Ravikanth & Freeman, Cole & Zhou, Haiming, 2021. "Early indicators of scientific impact: Predicting citations with altmetrics," Journal of Informetrics, Elsevier, vol. 15(2).
    16. Cao, Xuanyu & Chen, Yan & Ray Liu, K.J., 2016. "A data analytic approach to quantifying scientific impact," Journal of Informetrics, Elsevier, vol. 10(2), pages 471-484.
    17. Cristina López-Duarte & Marta M. Vidal-Suárez & Belén González-Díaz, 2019. "Cross-national distance and international business: an analysis of the most influential recent models," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(1), pages 173-208, October.
    18. Gerson Pech & Catarina Delgado, 2020. "Percentile and stochastic-based approach to the comparison of the number of citations of articles indexed in different bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(1), pages 223-252, April.
    19. Lu, Wei & Liu, Zhifeng & Huang, Yong & Bu, Yi & Li, Xin & Cheng, Qikai, 2020. "How do authors select keywords? A preliminary study of author keyword selection behavior," Journal of Informetrics, Elsevier, vol. 14(4).
    20. Andrea Fronzetti Colladon & Ciriaco Andrea D’Angelo & Peter A. Gloor, 2020. "Predicting the future success of scientific publications through social network and semantic analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 357-377, July.
