
Investigating the effect of global data on topic detection

Author

Listed:
  • Kevin W. Boyack

    (SciTech Strategies, Inc.)

Abstract

A dataset containing 111,616 documents in astronomy and astrophysics (Astro-set) has been created and is being partitioned by several research groups using different algorithms. For this paper, rather than partitioning the dataset directly, we locate the data in a previously created model of the full Scopus database. This allows comparisons between using local and global data for community detection, which is done in an accompanying paper. We can begin to answer the question of the extent to which the rest of a large database (a global solution) affects the partitioning of a smaller journal-based set of documents (a local solution). We find that the Astro-set, while spread across hundreds of partitions in the Scopus map, is concentrated in only a few regions of the map. From this perspective there seems to be some correspondence between local information and the global cluster solution. However, we also show that the within-Astro-set links are only one-third of the total links that are available to these papers in the full Scopus database. The non-Astro-set links are significant in two ways: (1) in areas where the Astro-set papers are concentrated, related papers from non-astronomy journals are included in clusters with the Astro-set papers, and (2) Astro-set papers that have a very low fraction of within-set links tend to end up in clusters that are not astronomy-based. Overall, this work highlights limitations of the use of journal-based document sets to identify the structure of scientific fields.
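The within-set link fraction mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the paper IDs, the undirected link list, and the function names `within_set_fractions` and `overall_within_fraction` are all hypothetical, and the abstract does not specify how links are defined in the Scopus model. The sketch only shows the arithmetic of comparing links that stay inside a local document set with all links available to those documents in a larger database.

```python
from collections import defaultdict


def within_set_fractions(links, local_set):
    """For each paper in local_set that has at least one link, return the
    fraction of its links whose other endpoint is also in local_set."""
    total = defaultdict(int)
    within = defaultdict(int)
    for a, b in links:
        # Treat each link as undirected and count it once per endpoint.
        for paper, other in ((a, b), (b, a)):
            if paper in local_set:
                total[paper] += 1
                if other in local_set:
                    within[paper] += 1
    return {paper: within[paper] / total[paper] for paper in total}


def overall_within_fraction(links, local_set):
    """Share of all links touching local_set that have both endpoints in it."""
    touching = [(a, b) for a, b in links if a in local_set or b in local_set]
    inside = [(a, b) for a, b in touching if a in local_set and b in local_set]
    return len(inside) / len(touching) if touching else 0.0


# Toy data (invented): A* papers form the local set, P* and C* papers
# belong to the rest of the hypothetical global database.
local_set = {"A1", "A2", "A3"}
links = [
    ("A1", "A2"),  # within-set link
    ("A1", "P1"),
    ("A2", "P2"),
    ("A3", "P1"),
    ("A3", "P2"),
    ("A3", "C1"),
]

fractions = within_set_fractions(links, local_set)
overall = overall_within_fraction(links, local_set)
print(fractions)
print(overall)
```

In this toy example, paper A3 has no within-set links at all, which corresponds to the kind of paper the abstract describes as tending to end up in clusters that are not astronomy-based when the global data are used.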

Suggested Citation

  • Kevin W. Boyack, 2017. "Investigating the effect of global data on topic detection," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 999-1015, May.
  • Handle: RePEc:spr:scient:v:111:y:2017:i:2:d:10.1007_s11192-017-2297-y
    DOI: 10.1007/s11192-017-2297-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-017-2297-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Ludo Waltman & Nees Jan van Eck, 2013. "A smart local moving algorithm for large-scale modularity-based community detection," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 86(11), pages 1-14, November.
    2. Richard Klavans & Kevin W. Boyack, 2011. "Using global mapping to create more accurate document-level maps of research fields," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 62(1), pages 1-18, January.
    3. Ludo Waltman & Nees Jan van Eck, 2012. "A new methodology for constructing a publication-level classification system of science," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(12), pages 2378-2392, December.
    4. András Schubert, 2013. "Measuring the similarity between the reference and citation distributions of journals," Scientometrics, Springer;Akadémiai Kiadó, vol. 96(1), pages 305-313, July.
    5. Theresa Velden & Kevin W. Boyack & Jochen Gläser & Rob Koopman & Andrea Scharnhorst & Shenghui Wang, 2017. "Comparison of topic extraction approaches and their results," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1169-1221, May.
    6. Nees Jan van Eck & Ludo Waltman, 2017. "Citation-based clustering of publications using CitNetExplorer and VOSviewer," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1053-1070, May.
    7. Small, Henry & Boyack, Kevin W. & Klavans, Richard, 2014. "Identifying emerging topics in science and technology," Research Policy, Elsevier, vol. 43(8), pages 1450-1467.
    8. Boyack, Kevin W. & Klavans, Richard, 2014. "Including cited non-source items in a large-scale map of science: What difference does it make?," Journal of Informetrics, Elsevier, vol. 8(3), pages 569-580.
    9. Kevin W. Boyack & Richard Klavans, 2014. "Creation of a highly detailed, dynamic, global model and map of science," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(4), pages 670-685, April.
    10. Ludo Waltman & Nees Jan van Eck, 2012. "A new methodology for constructing a publication‐level classification system of science," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 63(12), pages 2378-2392, December.
    11. Lovro Šubelj & Nees Jan van Eck & Ludo Waltman, 2016. "Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-23, April.
    12. Richard Klavans & Kevin W. Boyack, 2011. "Using global mapping to create more accurate document‐level maps of research fields," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 62(1), pages 1-18, January.
    13. Scott Emmons & Stephen Kobourov & Mike Gallant & Katy Börner, 2016. "Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale," PLOS ONE, Public Library of Science, vol. 11(7), pages 1-18, July.

    Citations



    Cited by:

    1. Keungoui Kim & Dieter F. Kogler & Sira Maliphol, 2024. "Identifying interdisciplinary emergence in the science of science: combination of network analysis and BERTopic," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-15, December.
    2. Rob Koopman & Shenghui Wang & Andrea Scharnhorst, 2017. "Contextualization of topics: browsing through the universe of bibliographic information," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1119-1139, May.
    3. Jochen Gläser & Wolfgang Glänzel & Andrea Scharnhorst, 2017. "Same data—different results? Towards a comparative approach to the identification of thematic structures in science," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 981-998, May.
    4. Calof, Jonathan & Søilen, Klaus Solberg & Klavans, Richard & Abdulkader, Bisan & Moudni, Ismail El, 2022. "Understanding the structure, characteristics, and future of collective intelligence using local and global bibliometric analyses," Technological Forecasting and Social Change, Elsevier, vol. 178(C).
    5. Yuan Zhou & Heng Lin & Yufei Liu & Wei Ding, 2019. "A novel method to identify emerging technologies using a semi-supervised topic clustering model: a case of 3D printing industry," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(1), pages 167-185, July.
    6. Diana Maynard & Benedetto Lepori & Johann Petrak & Xingyi Song & Philippe Laredo, 2020. "Using ontologies to map between research data and policymakers’ presumptions: the experience of the KNOWMAK project," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 1275-1290, November.
    7. Theresa Velden & Kevin W. Boyack & Jochen Gläser & Rob Koopman & Andrea Scharnhorst & Shenghui Wang, 2017. "Comparison of topic extraction approaches and their results," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1169-1221, May.
    8. Sjögårde, Peter & Ahlgren, Per, 2018. "Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics," Journal of Informetrics, Elsevier, vol. 12(1), pages 133-152.
    9. Frank Havemann & Jochen Gläser & Michael Heinz, 2017. "Memetic search for overlapping topics based on a local evaluation of link communities," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1089-1118, May.
    10. Sitaram Devarakonda & Dmitriy Korobskiy & Tandy Warnow & George Chacko, 2020. "Viewing computer science through citation analysis: Salton and Bergmark Redux," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 271-287, October.
    11. Bart Thijs, 2020. "Using neural-network based paragraph embeddings for the calculation of within and between document similarities," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 835-849, November.
    12. Paul Donner, 2021. "Validation of the Astro dataset clustering solutions with external data," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1619-1645, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sjögårde, Peter & Ahlgren, Per, 2018. "Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics," Journal of Informetrics, Elsevier, vol. 12(1), pages 133-152.
    2. Jochen Gläser & Wolfgang Glänzel & Andrea Scharnhorst, 2017. "Same data—different results? Towards a comparative approach to the identification of thematic structures in science," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 981-998, May.
    3. Matthias Held & Grit Laudel & Jochen Gläser, 2021. "Challenges to the validity of topic reconstruction," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4511-4536, May.
    4. Nees Jan van Eck & Ludo Waltman, 2017. "Citation-based clustering of publications using CitNetExplorer and VOSviewer," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1053-1070, May.
    5. Frank Havemann & Jochen Gläser & Michael Heinz, 2017. "Memetic search for overlapping topics based on a local evaluation of link communities," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1089-1118, May.
    6. Shuo Xu & Junwan Liu & Dongsheng Zhai & Xin An & Zheng Wang & Hongshen Pang, 2018. "Overlapping thematic structures extraction with mixed-membership stochastic blockmodel," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 61-84, October.
    7. R. Fileto Maciel & P. Saskia Bayerl & Marta Macedo Kerr Pinheiro, 2019. "Technical research innovations of the US national security system," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 539-565, August.
    8. Rons, Nadine, 2018. "Bibliometric approximation of a scientific specialty by combining key sources, title words, authors and references," Journal of Informetrics, Elsevier, vol. 12(1), pages 113-132.
    9. Xu, Haiyun & Winnink, Jos & Yue, Zenghui & Zhang, Huiling & Pang, Hongshen, 2021. "Multidimensional Scientometric indicators for the detection of emerging research topics," Technological Forecasting and Social Change, Elsevier, vol. 163(C).
    10. Peter Sjögårde & Per Ahlgren & Ludo Waltman, 2021. "Algorithmic labeling in hierarchical classifications of publications: Evaluation of bibliographic fields and term weighting approaches," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(7), pages 853-869, July.
    11. Li, Menghui & Yang, Liying & Zhang, Huina & Shen, Zhesi & Wu, Chensheng & Wu, Jinshan, 2017. "Do mathematicians, economists and biomedical scientists trace large topics more strongly than physicists?," Journal of Informetrics, Elsevier, vol. 11(2), pages 598-607.
    12. Ricardo Arencibia-Jorge & Rosa Lidia Vega-Almeida & José Luis Jiménez-Andrade & Humberto Carrillo-Calvet, 2022. "Evolutionary stages and multidisciplinary nature of artificial intelligence research," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5139-5158, September.
    13. Theresa Velden & Kevin W. Boyack & Jochen Gläser & Rob Koopman & Andrea Scharnhorst & Shenghui Wang, 2017. "Comparison of topic extraction approaches and their results," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1169-1221, May.
    14. Keungoui Kim & Dieter F. Kogler & Sira Maliphol, 2024. "Identifying interdisciplinary emergence in the science of science: combination of network analysis and BERTopic," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-15, December.
    15. Chiara Carusi & Giuseppe Bianchi, 2020. "A look at interdisciplinarity using bipartite scholar/journal networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 867-894, February.
    16. Fang Han & Christopher L. Magee, 2018. "Testing the science/technology relationship by analysis of patent citations of scientific papers after decomposition of both science and technology," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 767-796, August.
    17. Natalya Ivanova & Ekaterina Zolotova, 2023. "Landolt Indicator Values in Modern Research: A Review," Sustainability, MDPI, vol. 15(12), pages 1-22, June.
    18. Shuo Xu & Liyuan Hao & Xin An & Hongshen Pang & Ting Li, 2020. "Review on emerging research topics with key-route main path analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(1), pages 607-624, January.
    19. Carusi, Chiara & Bianchi, Giuseppe, 2019. "Scientific community detection via bipartite scholar/journal graph co-clustering," Journal of Informetrics, Elsevier, vol. 13(1), pages 354-386.
    20. Lovro Šubelj & Nees Jan van Eck & Ludo Waltman, 2016. "Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-23, April.
