Return to basics: Clustering of scientific literature using structural information

Author

Listed:
  • Yun, Jinhyuk
  • Ahn, Sejung
  • Lee, June Young

Abstract

Scholars frequently employ relatedness measures to estimate the similarity between two different items (e.g., documents, authors, and institutes). Such relatedness measures are commonly based on overlapping references (i.e., bibliographic coupling) or citations (i.e., co-citation) and can then be used with cluster analysis to find boundaries between research fields. Unfortunately, calculating a relatedness measure is challenging, especially for a large number of items, because the computational complexity is greater than linear. We propose an alternative method for identifying research fronts that uses direct citation inspired by relatedness measures. Our novel approach simply replicates a node into two distinct nodes: a citing node and cited node. We then apply typical clustering methods to the modified network. Clusters of citing nodes should emulate those from the bibliographic coupling relatedness network, while clusters of cited nodes should act like those from the co-citation relatedness network. In validation tests, our proposed method demonstrated high levels of similarity with conventional relatedness-based methods. We also found that the clustering results of the proposed method outperformed those of conventional relatedness-based measures regarding similarity with natural language processing-based classification.
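
The node-splitting step described in the abstract can be illustrated with a minimal sketch. It assumes a toy citation list invented for this example, and it uses connected components as a deliberately simple stand-in for the "typical clustering methods" the abstract mentions; the paper's actual clustering algorithm is not stated here. All function names (`split_nodes`, `connected_components`, `clusters_by_role`) are hypothetical.

```python
from collections import defaultdict

# Toy direct-citation edges (citing paper, cited paper); invented for illustration.
CITATIONS = [
    ("A", "X"), ("A", "Y"),
    ("B", "X"), ("B", "Y"),
    ("C", "Z"), ("D", "Z"),
    ("X", "Z"),  # X appears both as a cited paper and as a citing paper
]


def split_nodes(citations):
    """Replicate each paper into a citing copy and a cited copy.

    Every direct citation becomes an edge between a ("citing", paper)
    node and a ("cited", paper) node, so the result is bipartite.
    """
    return [(("citing", src), ("cited", dst)) for src, dst in citations]


def connected_components(edges):
    """Group nodes with union-find (a stand-in for a real clustering method)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return list(groups.values())


def clusters_by_role(components, role):
    """Keep only the citing (or cited) copies within each cluster."""
    result = []
    for component in components:
        members = sorted(name for r, name in component if r == role)
        if members:
            result.append(members)
    return sorted(result)


components = connected_components(split_nodes(CITATIONS))
# Citing copies group papers with shared references (bibliographic-coupling-like).
citing_clusters = clusters_by_role(components, "citing")
# Cited copies group papers that are cited together (co-citation-like).
cited_clusters = clusters_by_role(components, "cited")
print(citing_clusters)
print(cited_clusters)
```

In this toy data, paper X lands in one cluster as a cited node (alongside Y, because A and B cite both) and in another as a citing node (alongside C and D, because all three cite Z). The split network needs only one edge per direct citation, so no pairwise relatedness matrix is computed.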

Suggested Citation

  • Yun, Jinhyuk & Ahn, Sejung & Lee, June Young, 2020. "Return to basics: Clustering of scientific literature using structural information," Journal of Informetrics, Elsevier, vol. 14(4).
  • Handle: RePEc:eee:infome:v:14:y:2020:i:4:s1751157720301978
    DOI: 10.1016/j.joi.2020.101099

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157720301978
    Download Restriction: Full text for ScienceDirect subscribers only

    Citations

    Cited by:

    1. Yun, Jinhyuk, 2022. "Generalization of bibliographic coupling and co-citation using the node split network," Journal of Informetrics, Elsevier, vol. 16(2).
    2. Skrjanc, T. & Mihalic, R. & Rudez, U., 2023. "A systematic literature review on under-frequency load shedding protection using clustering methods," Renewable and Sustainable Energy Reviews, Elsevier, vol. 180(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yun, Jinhyuk, 2022. "Generalization of bibliographic coupling and co-citation using the node split network," Journal of Informetrics, Elsevier, vol. 16(2).
    2. Fabian Meyer-Brötz & Edgar Schiebel & Leo Brecht, 2017. "Experimental evaluation of parameter settings in calculation of hybrid similarities: effects of first- and second-order similarity, edge cutting, and weighting factors," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(3), pages 1307-1325, June.
    3. Sjögårde, Peter & Ahlgren, Per, 2018. "Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics," Journal of Informetrics, Elsevier, vol. 12(1), pages 133-152.
    4. Paul Donner, 2021. "Validation of the Astro dataset clustering solutions with external data," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1619-1645, February.
    5. Li, Menghui & Yang, Liying & Zhang, Huina & Shen, Zhesi & Wu, Chensheng & Wu, Jinshan, 2017. "Do mathematicians, economists and biomedical scientists trace large topics more strongly than physicists?," Journal of Informetrics, Elsevier, vol. 11(2), pages 598-607.
    6. Matthias Held & Grit Laudel & Jochen Gläser, 2021. "Challenges to the validity of topic reconstruction," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4511-4536, May.
    7. Takano, Yasutomo & Kajikawa, Yuya, 2019. "Extracting commercialization opportunities of the Internet of Things: Measuring text similarity between papers and patents," Technological Forecasting and Social Change, Elsevier, vol. 138(C), pages 45-68.
    8. Shome, Samik & Hassan, M. Kabir & Verma, Sushma & Panigrahi, Tushar Ranjan, 2023. "Impact investment for sustainable development: A bibliometric analysis," International Review of Economics & Finance, Elsevier, vol. 84(C), pages 770-800.
    9. Nees Jan van Eck & Ludo Waltman, 2017. "Citation-based clustering of publications using CitNetExplorer and VOSviewer," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1053-1070, May.
    10. Wang, Feifei & Jia, Chenran & Wang, Xiaohan & Liu, Junwan & Xu, Shuo & Liu, Yang & Yang, Chenyuyan, 2019. "Exploring all-author tripartite citation networks: A case study of gene editing," Journal of Informetrics, Elsevier, vol. 13(3), pages 856-873.
    11. Fang Han & Christopher L. Magee, 2018. "Testing the science/technology relationship by analysis of patent citations of scientific papers after decomposition of both science and technology," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 767-796, August.
    12. Fei Shu & Yue Ma & Junping Qiu & Vincent Larivière, 2020. "Classifications of science and their effects on bibliometric evaluations," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2727-2744, December.
    13. Peter Sjögårde & Fereshteh Didegah, 2022. "The association between topic growth and citation impact of research publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 1903-1921, April.
    14. Lin Zhang & Beibei Sun & Fei Shu & Ying Huang, 2022. "Comparing paper level classifications across different methods and systems: an investigation of Nature publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 7633-7651, December.
    15. Sitaram Devarakonda & Dmitriy Korobskiy & Tandy Warnow & George Chacko, 2020. "Viewing computer science through citation analysis: Salton and Bergmark Redux," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 271-287, October.
    16. Shu, Fei & Julien, Charles-Antoine & Zhang, Lin & Qiu, Junping & Zhang, Jing & Larivière, Vincent, 2019. "Comparing journal and paper level classifications of science," Journal of Informetrics, Elsevier, vol. 13(1), pages 202-225.
    17. Carlos Olmeda-Gómez & Carlos Romá-Mateo & Maria-Antonia Ovalle-Perandones, 2019. "Overview of trends in global epigenetic research (2009–2017)," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1545-1574, June.
    18. Serhat Burmaoglu & Ozcan Saritas, 2019. "An evolutionary analysis of the innovation policy domain: Is there a paradigm shift?," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(3), pages 823-847, March.
    19. June Young Lee & Sejung Ahn & Dohyun Kim, 2021. "Deep learning-based prediction of future growth potential of technologies," PLOS ONE, Public Library of Science, vol. 16(6), pages 1-16, June.
    20. Michel Zitt, 2015. "Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2223-2245, March.
