Analysing academic paper ranking algorithms using test data and benchmarks: an investigation

Authors

  • Yu Zhang (UNSW Canberra)
  • Min Wang (UNSW Canberra)
  • Morteza Saberi (University of Technology Sydney)
  • Elizabeth Chang (UNSW Canberra)

Abstract

Research on academic paper ranking has received considerable attention in recent years, and many algorithms have been proposed to automatically assess large numbers of papers. How to evaluate or analyse the performance of these ranking algorithms remains an open research question. In theory, evaluating an algorithm requires comparing its ranking result against a ground-truth paper list. However, no such ground truth exists in scholarly ranking, because there is not, and will never be, an absolutely unbiased, objective, and unified standard for formulating the impact of papers. In practice, therefore, researchers evaluate or analyse their proposed ranking algorithms by different methods, such as using domain expert decisions (test data) and comparing against predefined ranking benchmarks. The question is whether different methods lead to different analysis results and, if so, how the performance of ranking algorithms should be analysed. To answer these questions, this study compares test data with different citation-based benchmarks by examining their relationships and assessing the effect of the choice of method on the analysis results. The experimental results show that the analysis results do differ when test data and different benchmarks are employed, and that relying exclusively on one benchmark or on test data may yield inadequate analysis results. In addition, the study summarises a guideline for conducting a comprehensive analysis using multiple benchmarks from different perspectives, which can help provide a systematic understanding and profile of the analysed algorithms.
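
The two kinds of evaluation the abstract contrasts can be illustrated with a minimal sketch. It assumes Spearman rank correlation as the way to compare an algorithm's scores with a citation-based benchmark, and precision at k as the way to compare them with expert-selected test data. The abstract does not name the measures the authors used, so both are illustrative choices, and the function names, paper identifiers, and numbers below are all hypothetical.

```python
from typing import Dict, Set


def average_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank papers by descending score (1 = best); tied scores share the mean rank."""
    ordered = sorted(scores, key=lambda p: scores[p], reverse=True)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of papers tied with the paper at position i.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    papers = sorted(set(a) & set(b))
    ra = average_ranks({p: a[p] for p in papers})
    rb = average_ranks({p: b[p] for p in papers})
    n = len(papers)
    mean_a = sum(ra.values()) / n
    mean_b = sum(rb.values()) / n
    cov = sum((ra[p] - mean_a) * (rb[p] - mean_b) for p in papers)
    var_a = sum((ra[p] - mean_a) ** 2 for p in papers)
    var_b = sum((rb[p] - mean_b) ** 2 for p in papers)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when all ranks are tied")
    return cov / (var_a * var_b) ** 0.5


def precision_at_k(scores: Dict[str, float], relevant: Set[str], k: int) -> float:
    """Fraction of the k top-scored papers that appear in the expert-selected set."""
    top = sorted(scores, key=lambda p: scores[p], reverse=True)[:k]
    return sum(p in relevant for p in top) / k


# Hypothetical inputs: scores from some ranking algorithm, a citation-count
# benchmark, and a small set of expert-selected (e.g. awarded) papers.
algo_scores = {"p1": 0.9, "p2": 0.7, "p3": 0.4, "p4": 0.2, "p5": 0.1}
citation_counts = {"p1": 120, "p2": 30, "p3": 45, "p4": 10, "p5": 2}
expert_selected = {"p1", "p3"}

benchmark_agreement = spearman(algo_scores, citation_counts)
test_data_precision = precision_at_k(algo_scores, expert_selected, k=2)
print(f"Spearman vs. citation benchmark: {benchmark_agreement:.2f}")
print(f"Precision@2 vs. expert test data: {test_data_precision:.2f}")
```

In this toy example the algorithm agrees closely with the citation benchmark while retrieving only one of the two expert-selected papers in its top two, which is the kind of divergence between evaluation methods that the study investigates.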

Suggested Citation

  • Yu Zhang & Min Wang & Morteza Saberi & Elizabeth Chang, 2022. "Analysing academic paper ranking algorithms using test data and benchmarks: an investigation," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(7), pages 4045-4074, July.
  • Handle: RePEc:spr:scient:v:127:y:2022:i:7:d:10.1007_s11192-022-04429-z
    DOI: 10.1007/s11192-022-04429-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-022-04429-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fang Zhang & Shengli Wu, 2021. "Measuring academic entities’ impact by content-based citation analysis in a heterogeneous academic network," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 7197-7222, August.
    2. Yuanyuan Liu & Qiang Wu & Shijie Wu & Yong Gao, 2021. "Weighted citation based on ranking-related contribution: a new index for evaluating article impact," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(10), pages 8653-8672, October.
    3. Zhang, Fang & Wu, Shengli, 2020. "Predicting future influence of papers, researchers, and venues in a dynamic academic network," Journal of Informetrics, Elsevier, vol. 14(2).
    4. Dunaiski, Marcel & Geldenhuys, Jaco & Visser, Willem, 2019. "On the interplay between normalisation, bias, and performance of paper impact metrics," Journal of Informetrics, Elsevier, vol. 13(1), pages 270-290.
    5. Dejian Yu & Wanru Wang & Shuai Zhang & Wenyu Zhang & Rongyu Liu, 2017. "A multiple-link, mutually reinforced journal-ranking model to measure the prestige of journals," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(1), pages 521-542, April.
    6. Xu, Shuqi & Mariani, Manuel Sebastian & Lü, Linyuan & Medo, Matúš, 2020. "Unbiased evaluation of ranking metrics reveals consistent performance in science and technology citation data," Journal of Informetrics, Elsevier, vol. 14(1).
    7. Dunaiski, Marcel & Geldenhuys, Jaco & Visser, Willem, 2019. "Globalised vs averaged: Bias and ranking performance on the author level," Journal of Informetrics, Elsevier, vol. 13(1), pages 299-313.
    8. Yu Zhang & Min Wang & Morteza Saberi & Elizabeth Chang, 2020. "Knowledge fusion through academic articles: a survey of definitions, techniques, applications and challenges," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2637-2666, December.
    9. Jiang, Xiaorui & Zhuge, Hai, 2019. "Forward search path count as an alternative indirect citation impact indicator," Journal of Informetrics, Elsevier, vol. 13(4).
    10. Dunaiski, Marcel & Geldenhuys, Jaco & Visser, Willem, 2018. "Author ranking evaluation at scale," Journal of Informetrics, Elsevier, vol. 12(3), pages 679-702.
    11. Walters, William H., 2017. "Do subjective journal ratings represent whole journals or typical articles? Unweighted or weighted citation impact?," Journal of Informetrics, Elsevier, vol. 11(3), pages 730-744.
    12. Xipeng Liu & Xinmiao Li, 2024. "Unbiased evaluation of ranking algorithms applied to the Chinese green patents citation network," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(6), pages 2999-3021, June.
    13. Shunshun Shi & Wenyu Zhang & Shuai Zhang & Jie Chen, 2018. "Does prestige dimension influence the interdisciplinary performance of scientific entities in knowledge flow? Evidence from the e-government field," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(2), pages 1237-1264, November.
    14. Lin Feng & Jian Zhou & Sheng-Lan Liu & Ning Cai & Jie Yang, 2020. "Analysis of journal evaluation indicators: an experimental study based on unsupervised Laplacian score," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 233-254, July.
    15. Vaccario, Giacomo & Medo, Matúš & Wider, Nicolas & Mariani, Manuel Sebastian, 2017. "Quantifying and suppressing ranking bias in a large citation network," Journal of Informetrics, Elsevier, vol. 11(3), pages 766-782.
    16. Wang, Jingjing & Xu, Shuqi & Mariani, Manuel S. & Lü, Linyuan, 2021. "The local structure of citation networks uncovers expert-selected milestone papers," Journal of Informetrics, Elsevier, vol. 15(4).
    17. Ana Teresa Santos & Sandro Mendonça, 2022. "Do papers (really) match journals’ “aims and scope”? A computational assessment of innovation studies," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 7449-7470, December.
    18. Tehmina Amjad & Ying Ding & Ali Daud & Jian Xu & Vincent Malic, 2015. "Topic-based heterogeneous rank," Scientometrics, Springer;Akadémiai Kiadó, vol. 104(1), pages 313-334, July.
    19. Mariani, Manuel Sebastian & Medo, Matúš & Lafond, François, 2019. "Early identification of important patents: Design and validation of citation network metrics," Technological Forecasting and Social Change, Elsevier, vol. 146(C), pages 644-654.
    20. Yanbo Zhou & Xin-Li Xu & Xu-Hua Yang & Qu Li, 2022. "The influence of disruption on evaluating the scientific significance of papers," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(10), pages 5931-5945, October.
