Printed from https://ideas.repec.org/a/plo/pone00/0154404.html

Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods

Authors

  • Lovro Šubelj
  • Nees Jan van Eck
  • Ludo Waltman

Abstract

Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community.

Suggested Citation

  • Lovro Šubelj & Nees Jan van Eck & Ludo Waltman, 2016. "Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-23, April.
  • Handle: RePEc:plo:pone00:0154404
    DOI: 10.1371/journal.pone.0154404

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0154404
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0154404&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. van Eck, Nees Jan & Waltman, Ludo, 2014. "CitNetExplorer: A new software tool for analyzing and visualizing citation networks," Journal of Informetrics, Elsevier, vol. 8(4), pages 802-823.
    2. Ludo Waltman & Nees Jan van Eck, 2012. "A new methodology for constructing a publication‐level classification system of science," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 63(12), pages 2378-2392, December.
    3. Atieh Mirshahvalad & Olivier H Beauchesne & Éric Archambault & Martin Rosvall, 2013. "Resampling Effects on Significance Analysis of Network Clustering and Ranking," PLOS ONE, Public Library of Science, vol. 8(1), pages 1-7, January.
    4. Ludo Waltman & Nees Jan van Eck, 2013. "A smart local moving algorithm for large-scale modularity-based community detection," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 86(11), pages 1-14, November.
    5. Yong-Yeol Ahn & James P. Bagrow & Sune Lehmann, 2010. "Link communities reveal multiscale complexity in networks," Nature, Nature, vol. 466(7307), pages 761-764, August.
    6. Kevin W. Boyack & Richard Klavans, 2010. "Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately?," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 61(12), pages 2389-2404, December.
    7. Ludo Waltman & Nees Jan van Eck, 2012. "A new methodology for constructing a publication-level classification system of science," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(12), pages 2378-2392, December.
    8. L. Šubelj & M. Bajec, 2011. "Robust network community detection using balanced propagation," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 81(3), pages 353-362, June.
    9. Jarneving, Bo, 2007. "Bibliographic coupling and its application to research-front and other core documents," Journal of Informetrics, Elsevier, vol. 1(4), pages 287-307.
    10. Kevin W Boyack & David Newman & Russell J Duhon & Richard Klavans & Michael Patek & Joseph R Biberstine & Bob Schijvenaars & André Skupin & Nianli Ma & Katy Börner, 2011. "Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches," PLOS ONE, Public Library of Science, vol. 6(3), pages 1-11, March.
    11. Martin Rosvall & Carl T Bergstrom, 2011. "Multilevel Compression of Random Walks on Networks Reveals Hierarchical Organization in Large Integrated Systems," PLOS ONE, Public Library of Science, vol. 6(4), pages 1-10, April.
    12. Martin Rosvall & Carl T Bergstrom, 2010. "Mapping Change in Large Networks," PLOS ONE, Public Library of Science, vol. 5(1), pages 1-7, January.
    13. Kevin W. Boyack & Richard Klavans, 2010. "Co‐citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately?," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 61(12), pages 2389-2404, December.
    14. Boyack, Kevin W. & Klavans, Richard, 2014. "Including cited non-source items in a large-scale map of science: What difference does it make?," Journal of Informetrics, Elsevier, vol. 8(3), pages 569-580.
    15. Meila, Marina, 2007. "Comparing clusterings--an information based distance," Journal of Multivariate Analysis, Elsevier, vol. 98(5), pages 873-895, May.
    16. Andrea Lancichinetti & Filippo Radicchi & José J Ramasco & Santo Fortunato, 2011. "Finding Statistically Significant Communities in Networks," PLOS ONE, Public Library of Science, vol. 6(4), pages 1-18, April.
    17. Šubelj, Lovro & Bajec, Marko, 2014. "Group detection in complex networks: An algorithm and comparison of the state of the art," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 397(C), pages 144-156.
    18. Waltman, Ludo & van Eck, Nees Jan & Noyons, Ed C.M., 2010. "A unified approach to mapping and clustering of bibliometric networks," Journal of Informetrics, Elsevier, vol. 4(4), pages 629-635.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Haitham Nobanee & Fayrouz Aksam Elsaied & Nouf Alhammadi & Noora Wazir, 2023. "Bibliometric analysis and visualization of green, sustainable, and environmental insurance research," Journal of Financial Services Marketing, Palgrave Macmillan, vol. 28(4), pages 631-648, December.
    2. Carusi, Chiara & Bianchi, Giuseppe, 2019. "Scientific community detection via bipartite scholar/journal graph co-clustering," Journal of Informetrics, Elsevier, vol. 13(1), pages 354-386.
    3. Diana Maynard & Benedetto Lepori & Johann Petrak & Xingyi Song & Philippe Laredo, 2020. "Using ontologies to map between research data and policymakers’ presumptions: the experience of the KNOWMAK project," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 1275-1290, November.
    4. Jesús M. Álvarez-Llorente & Vicente P. Guerrero-Bote & Félix Moya-Anegón, 2024. "New fractional classifications of papers based on two generations of references and on the ASJC scopus scheme," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(6), pages 3493-3515, June.
    5. Wirapong Chansanam & Chunqiu Li, 2022. "Scientometrics of Poverty Research for Sustainability Development: Trend Analysis of the 1964–2022 Data through Scopus," Sustainability, MDPI, vol. 14(9), pages 1-19, April.
    6. Yang Liu & Jianjun Dong & Ling Shen, 2020. "A Conceptual Development Framework for Prefabricated Construction Supply Chain Management: An Integrated Overview," Sustainability, MDPI, vol. 12(5), pages 1-29, March.
    7. Leo Capari & Harald Wilfing & Andreas Exner & Thomas Höflehner & Daniela Haluza, 2022. "Cooling the City? A Scientometric Study on Urban Green and Blue Infrastructure and Climate Change-Induced Public Health Effects," Sustainability, MDPI, vol. 14(9), pages 1-19, April.
    8. Tomasz Zema & Adam Sulich, 2022. "Models of Electricity Price Forecasting: Bibliometric Research," Energies, MDPI, vol. 15(15), pages 1-18, August.
    9. Chiara Carusi & Giuseppe Bianchi, 2020. "A look at interdisciplinarity using bipartite scholar/journal networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 867-894, February.
    10. Sjögårde, Peter & Ahlgren, Per, 2018. "Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics," Journal of Informetrics, Elsevier, vol. 12(1), pages 133-152.
    11. Kevin W. Boyack, 2017. "Investigating the effect of global data on topic detection," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 999-1015, May.
    12. Carlos Olmeda-Gómez & Carlos Romá-Mateo & Maria-Antonia Ovalle-Perandones, 2019. "Overview of trends in global epigenetic research (2009–2017)," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1545-1574, June.
    13. Paul Donner, 2021. "Validation of the Astro dataset clustering solutions with external data," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1619-1645, February.
    14. Jochen Gläser & Wolfgang Glänzel & Andrea Scharnhorst, 2017. "Same data—different results? Towards a comparative approach to the identification of thematic structures in science," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 981-998, May.
    15. Matthias Held & Grit Laudel & Jochen Gläser, 2021. "Challenges to the validity of topic reconstruction," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4511-4536, May.
    16. Jiang, Kai & Ashworth, Peta, 2021. "The development of Carbon Capture Utilization and Storage (CCUS) research in China: A bibliometric perspective," Renewable and Sustainable Energy Reviews, Elsevier, vol. 138(C).
    17. Hua Zheng & Min Guo & Qian Wang & Qinghai Zhang & Noriko Akita, 2023. "A Bibliometric Analysis of Current Knowledge Structure and Research Progress Related to Urban Community Garden Systems," Land, MDPI, vol. 12(1), pages 1-34, January.
    18. Luis Gerardo Hernández García, 2022. "Transport equipment network analysis: the value-added contribution," Journal of Economic Structures, Springer;Pan-Pacific Association of Input-Output Studies (PAPAIOS), vol. 11(1), pages 1-25, December.
    19. Sitaram Devarakonda & Dmitriy Korobskiy & Tandy Warnow & George Chacko, 2020. "Viewing computer science through citation analysis: Salton and Bergmark Redux," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 271-287, October.
    20. Li, Menghui & Yang, Liying & Zhang, Huina & Shen, Zhesi & Wu, Chensheng & Wu, Jinshan, 2017. "Do mathematicians, economists and biomedical scientists trace large topics more strongly than physicists?," Journal of Informetrics, Elsevier, vol. 11(2), pages 598-607.
    21. Maria Magdalena Turek Rahoveanu & Valentin Serban & Adrian Gheorghe Zugravu & Adrian Turek Rahoveanu & Dragoș Sebastian Cristea & Petronela Nechita & Cristian Silviu Simionescu, 2022. "Perspectives on Smart Villages from a Bibliometric Approach," Sustainability, MDPI, vol. 14(17), pages 1-17, August.
    22. Fan, Yangliu & Lehmann, Sune & Blok, Anders, 2022. "Extracting the interdisciplinary specialty structures in social media data-based research: A clustering-based network approach," Journal of Informetrics, Elsevier, vol. 16(3).
    23. Giovanni Colavizza, 2017. "The structural role of the core literature in history," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1787-1809, December.
    24. Yuya Kajikawa, 2022. "Reframing evidence in evidence-based policy making and role of bibliometrics: toward transdisciplinary scientometric research," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5571-5585, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nees Jan van Eck & Ludo Waltman, 2017. "Citation-based clustering of publications using CitNetExplorer and VOSviewer," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1053-1070, May.
    2. Sjögårde, Peter & Ahlgren, Per, 2018. "Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics," Journal of Informetrics, Elsevier, vol. 12(1), pages 133-152.
    3. R. Fileto Maciel & P. Saskia Bayerl & Marta Macedo Kerr Pinheiro, 2019. "Technical research innovations of the US national security system," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 539-565, August.
    4. Yun, Jinhyuk, 2022. "Generalization of bibliographic coupling and co-citation using the node split network," Journal of Informetrics, Elsevier, vol. 16(2).
    5. Fang Han & Christopher L. Magee, 2018. "Testing the science/technology relationship by analysis of patent citations of scientific papers after decomposition of both science and technology," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 767-796, August.
    6. Xu, Shuo & Hao, Liyuan & Yang, Guancan & Lu, Kun & An, Xin, 2021. "A topic models based framework for detecting and forecasting emerging technologies," Technological Forecasting and Social Change, Elsevier, vol. 162(C).
    7. Leydesdorff, Loet & Bornmann, Lutz & Zhou, Ping, 2016. "Construction of a pragmatic base line for journal classifications and maps based on aggregated journal-journal citation relations," Journal of Informetrics, Elsevier, vol. 10(4), pages 902-918.
    8. Zang, Yuzhu & Yang, Yuanyuan & Liu, Yansui, 2021. "Toward serving land consolidation on the table of sustainability: An overview of the research landscape and future directions," Land Use Policy, Elsevier, vol. 109(C).
    9. Xu, Shuo & Hao, Liyuan & An, Xin & Yang, Guancan & Wang, Feifei, 2019. "Emerging research topics detection with multiple machine learning models," Journal of Informetrics, Elsevier, vol. 13(4).
    10. Sitaram Devarakonda & Dmitriy Korobskiy & Tandy Warnow & George Chacko, 2020. "Viewing computer science through citation analysis: Salton and Bergmark Redux," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 271-287, October.
    11. Wang, Feifei & Jia, Chenran & Wang, Xiaohan & Liu, Junwan & Xu, Shuo & Liu, Yang & Yang, Chenyuyan, 2019. "Exploring all-author tripartite citation networks: A case study of gene editing," Journal of Informetrics, Elsevier, vol. 13(3), pages 856-873.
    12. Urdiales, Cristina & Guzmán, Eduardo, 2024. "An automatic and association-based procedure for hierarchical publication subject categorization," Journal of Informetrics, Elsevier, vol. 18(1).
    13. Shu, Fei & Julien, Charles-Antoine & Zhang, Lin & Qiu, Junping & Zhang, Jing & Larivière, Vincent, 2019. "Comparing journal and paper level classifications of science," Journal of Informetrics, Elsevier, vol. 13(1), pages 202-225.
    14. Gómez-Núñez, Antonio J. & Batagelj, Vladimir & Vargas-Quesada, Benjamín & Moya-Anegón, Félix & Chinchilla-Rodríguez, Zaida, 2014. "Optimizing SCImago Journal & Country Rank classification by community detection," Journal of Informetrics, Elsevier, vol. 8(2), pages 369-383.
    15. Fei Shu & Yue Ma & Junping Qiu & Vincent Larivière, 2020. "Classifications of science and their effects on bibliometric evaluations," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2727-2744, December.
    16. Nina Sakinah Ahmad Rofaie & Seuk Wai Phoong & Muzalwana Abdul Talib & Ainin Sulaiman, 2023. "Light-emitting diode (LED) research: A bibliometric analysis during 2003–2018," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(1), pages 173-191, February.
    17. Ignacio Rodríguez-Rodríguez & José-Víctor Rodríguez & Niloofar Shirvanizadeh & Andrés Ortiz & Domingo-Javier Pardo-Quiles, 2021. "Applications of Artificial Intelligence, Machine Learning, Big Data and the Internet of Things to the COVID-19 Pandemic: A Scientometric Review Using Text Mining," IJERPH, MDPI, vol. 18(16), pages 1-29, August.
    18. Peter Sjögårde & Fereshteh Didegah, 2022. "The association between topic growth and citation impact of research publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 1903-1921, April.
    19. Ludo Waltman & Nees Jan van Eck, 2012. "A new methodology for constructing a publication-level classification system of science," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(12), pages 2378-2392, December.
    20. Lin Zhang & Beibei Sun & Fei Shu & Ying Huang, 2022. "Comparing paper level classifications across different methods and systems: an investigation of Nature publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 7633-7651, December.
