
Article-level classification of scientific publications: A comparison of deep learning, direct citation and bibliographic coupling

Author

Listed:
  • Maxime Rivest
  • Etienne Vignola-Gagné
  • Éric Archambault

Abstract

Classification schemes for scientific activity and publications underpin a large swath of research evaluation practices at the organizational, governmental, and national levels. Several research classifications are currently in use, and they require continuous work as new classification techniques become available and as new research topics emerge. Convolutional neural networks, a subset of “deep learning” approaches, have recently offered novel and highly performant methods for classifying voluminous corpora of text. This article benchmarks a deep learning classification technique on more than 40 million scientific articles and on tens of thousands of scholarly journals. The comparison is performed against bibliographic coupling-, direct citation-, and manual-based classifications, which are the established and most widely used approaches in the field of bibliometrics and, by extension, in many science and innovation policy activities such as grant competition management. The results reveal that the performance of this first iteration of a deep learning approach is equivalent to that of the graph-based bibliometric approaches. All methods presented are also on par with manual classification. Somewhat surprisingly, no machine learning approach was found to clearly outperform direct citation, which is a simple label propagation approach. In conclusion, deep learning is promising because it performed just as well as the other approaches but has more flexibility to be further improved. For example, a deep neural network incorporating information from the citation network is likely to hold the key to an even better classification algorithm.
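
To make the two graph-based baselines named in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy citation dictionary and a simple majority vote. The function names, article identifiers, and field labels are all hypothetical. Direct citation assigns an article the most common label among articles it cites or is cited by, while bibliographic coupling weights labelled articles by the number of references they share with it.

```python
from collections import Counter

# Toy data (hypothetical): article -> set of articles it cites.
cites = {
    "A": {"R1", "R2"},
    "B": {"R1", "R2", "R3"},
    "C": {"R4"},
    "X": {"A", "R1", "R2"},  # unlabelled article to classify
}
# Known field labels (hypothetical).
labels = {
    "A": "physics", "B": "physics", "C": "biology",
    "R1": "physics", "R2": "physics", "R3": "chemistry", "R4": "biology",
}


def direct_citation_label(article, cites, labels):
    """Majority label among articles linked to `article` by a citation,
    in either direction (a one-step label propagation)."""
    neighbours = set(cites.get(article, set()))
    neighbours |= {a for a, refs in cites.items() if article in refs}
    votes = Counter(labels[n] for n in neighbours if n in labels)
    return votes.most_common(1)[0][0] if votes else None


def bibliographic_coupling_label(article, cites, labels):
    """Label whose articles share the most references with `article`."""
    refs = cites.get(article, set())
    votes = Counter()
    for other, other_refs in cites.items():
        if other != article and other in labels:
            shared = len(refs & other_refs)  # coupling strength
            if shared:
                votes[labels[other]] += shared
    return votes.most_common(1)[0][0] if votes else None


print(direct_citation_label("X", cites, labels))
print(bibliographic_coupling_label("X", cites, labels))
```

In this toy example both functions return the same label for "X". At the scale described in the abstract, a real system would need a proper graph store and a tie-breaking rule, neither of which is modelled here.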

Suggested Citation

  • Maxime Rivest & Etienne Vignola-Gagné & Éric Archambault, 2021. "Article-level classification of scientific publications: A comparison of deep learning, direct citation and bibliographic coupling," PLOS ONE, Public Library of Science, vol. 16(5), pages 1-18, May.
  • Handle: RePEc:plo:pone00:0251493
    DOI: 10.1371/journal.pone.0251493

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0251493
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0251493&type=printable
    Download Restriction: no

    References listed on IDEAS

    1. Richard Klavans & Kevin W. Boyack, 2017. "Which Type of Citation Analysis Generates the Most Accurate Taxonomy of Scientific and Technical Knowledge?," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 68(4), pages 984-998, April.
    2. Theresa Velden & Kevin W. Boyack & Jochen Gläser & Rob Koopman & Andrea Scharnhorst & Shenghui Wang, 2017. "Comparison of topic extraction approaches and their results," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1169-1221, May.
    3. Waltman, Ludo & van Eck, Nees Jan, 2015. "Field-normalized citation impact indicators and the choice of an appropriate counting method," Journal of Informetrics, Elsevier, vol. 9(4), pages 872-894.

    Citations

    Cited by:

    1. Zhang, Lin & Qi, Fan & Sivertsen, Gunnar & Liang, Liming & Campbell, David, 2023. "Gender differences in the patterns and consequences of changing specialization in scientific careers," SocArXiv ep5bx, Center for Open Science.
    2. Yugang He, 2022. "A Study on the Dynamic Relationship between Wealth Gap and Economic Growth in China," European Journal of Marketing and Economics Articles, Revistia Research and Publishing, vol. 5, ejme_v5_i.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Paul Donner, 2021. "Validation of the Astro dataset clustering solutions with external data," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1619-1645, February.
    2. Matthias Held & Grit Laudel & Jochen Gläser, 2021. "Challenges to the validity of topic reconstruction," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4511-4536, May.
    3. Ballester, Omar & Penner, Orion, 2022. "Robustness, replicability and scalability in topic modelling," Journal of Informetrics, Elsevier, vol. 16(1).
    4. Zhang, Yi & Lu, Jie & Liu, Feng & Liu, Qian & Porter, Alan & Chen, Hongshu & Zhang, Guangquan, 2018. "Does deep learning help topic extraction? A kernel k-means clustering method with word embedding," Journal of Informetrics, Elsevier, vol. 12(4), pages 1099-1117.
    5. Sjögårde, Peter & Ahlgren, Per, 2018. "Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics," Journal of Informetrics, Elsevier, vol. 12(1), pages 133-152.
    6. Nees Jan van Eck & Ludo Waltman, 2017. "Citation-based clustering of publications using CitNetExplorer and VOSviewer," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1053-1070, May.
    7. Carlos Olmeda-Gómez & Carlos Romá-Mateo & Maria-Antonia Ovalle-Perandones, 2019. "Overview of trends in global epigenetic research (2009–2017)," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1545-1574, June.
    8. Xu, Haiyun & Winnink, Jos & Yue, Zenghui & Zhang, Huiling & Pang, Hongshen, 2021. "Multidimensional Scientometric indicators for the detection of emerging research topics," Technological Forecasting and Social Change, Elsevier, vol. 163(C).
    9. Peter Sjögårde & Per Ahlgren & Ludo Waltman, 2021. "Algorithmic labeling in hierarchical classifications of publications: Evaluation of bibliographic fields and term weighting approaches," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(7), pages 853-869, July.
    10. Calof, Jonathan & Søilen, Klaus Solberg & Klavans, Richard & Abdulkader, Bisan & Moudni, Ismail El, 2022. "Understanding the structure, characteristics, and future of collective intelligence using local and global bibliometric analyses," Technological Forecasting and Social Change, Elsevier, vol. 178(C).
    11. Luis Araya-Castillo & Felipe Hernández-Perlines & Hugo Moraga & Antonio Ariza-Montes, 2021. "Scientometric Analysis of Research on Socioemotional Wealth," Sustainability, MDPI, vol. 13(7), pages 1-26, March.
    12. Gao, Qiang & Liang, Zhentao & Wang, Ping & Hou, Jingrui & Chen, Xiuxiu & Liu, Manman, 2021. "Potential index: Revealing the future impact of research topics based on current knowledge networks," Journal of Informetrics, Elsevier, vol. 15(3).
    13. Takahiro Kawamura & Katsutaro Watanabe & Naoya Matsumoto & Shusaku Egami & Mari Jibu, 2018. "Funding map using paragraph embedding based on semantic diversity," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 941-958, August.
    14. Serhat Burmaoglu & Ozcan Saritas, 2019. "An evolutionary analysis of the innovation policy domain: Is there a paradigm shift?," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(3), pages 823-847, March.
    15. Jeffrey Demaine, 2022. "Fractionalization of research impact reveals global trends in university collaboration," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2235-2247, May.
    16. Thelwall, Mike & Bailey, Carol & Makita, Meiko & Sud, Pardeep & Madalli, Devika P., 2019. "Gender and research publishing in India: Uniformly high inequality?," Journal of Informetrics, Elsevier, vol. 13(1), pages 118-131.
    17. Peter Sjögårde & Fereshteh Didegah, 2022. "The association between topic growth and citation impact of research publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 1903-1921, April.
    18. Stephen, Dimity & Stahlschmidt, Stephan, 2021. "Performance and structures of the German science system 2021," Studien zum deutschen Innovationssystem 5-2021, Expertenkommission Forschung und Innovation (EFI) - Commission of Experts for Research and Innovation, Berlin.
    19. Jinyuan Ma & Fan Jiang & Liujian Gu & Xiang Zheng & Xiao Lin & Chuanyi Wang, 2020. "Patterns of the Network of Cross-Border University Research Collaboration in the Guangdong-Hong Kong-Macau Greater Bay Area," Sustainability, MDPI, vol. 12(17), pages 1-17, August.
    20. Lepori, Benedetto & Geuna, Aldo & Mira, Antonietta, 2018. "Scientific Output of US and European Universities Scales Super-linearly with Resources," Department of Economics and Statistics Cognetti de Martiis LEI & BRICK - Laboratory of Economics of Innovation "Franco Momigliano", Bureau of Research in Innovation, Complexity and Knowledge, Collegio 201806, University of Turin.
