
An open-source tool for merging data from multiple citation databases

Authors

  • Dušan Nikolić (University of Novi Sad)
  • Dragan Ivanović (University of Novi Sad)
  • Lidija Ivanović (University of Novi Sad)

Abstract

A bibliometric analysis based on records from a single citation database may be limited in its comprehensiveness and, therefore, in the reliability of its results. Combining and deduplicating records from multiple citation index databases for a bibliometric analysis is often a manual process that requires significant effort, especially for large amounts of data. This paper presents an open-source tool that automatically preprocesses and deduplicates records based on similarity and user-configurable strategies. To validate the tool, the authors first manually deduplicated a use-case set of 11,307 records from Scopus and Web of Science (WoS). The performance of the tool was then evaluated against the manually deduplicated results. With the best-performing similarity configuration on this deduplication use case, the tool reached up to 99% precision and a 98% F-measure, which minimizes the time researchers would otherwise spend on data wrangling when combining Scopus and WoS records. The tool has practical implications for bibliometric studies. For instance, the authors conducted a bibliometric analysis of the most productive researchers at a university, once using a single citation database and once using merged data from multiple citation databases. The study used the VOSviewer tool and showed that merged data may produce different outcomes from those obtained with a single citation database.
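The idea described in the abstract, similarity-based deduplication scored against a manually deduplicated reference set, can be illustrated with a minimal Python sketch. This is not the authors' tool and makes no claim about its configuration: the record fields (`title`, `year`, `doi`), the function names, the use of the standard library's `difflib.SequenceMatcher` as the similarity measure, and the toy records are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split())


def is_duplicate(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Hypothetical matching strategy: trust DOIs when both records have
    one, otherwise require the same year and a similar normalized title."""
    doi_a, doi_b = a.get("doi"), b.get("doi")
    if doi_a and doi_b:
        return doi_a.lower() == doi_b.lower()
    ratio = SequenceMatcher(
        None, normalize(a["title"]), normalize(b["title"])
    ).ratio()
    return ratio >= threshold and a.get("year") == b.get("year")


def find_duplicates(left: list, right: list, threshold: float = 0.9) -> set:
    """Return index pairs (i, j) judged to describe the same publication."""
    return {
        (i, j)
        for i, rec_l in enumerate(left)
        for j, rec_r in enumerate(right)
        if is_duplicate(rec_l, rec_r, threshold)
    }


def evaluate(predicted: set, gold: set) -> tuple:
    """Precision, recall and F-measure of predicted pairs against gold pairs."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy records invented for this sketch (not real publications or DOIs).
scopus = [
    {"title": "Merging Citation Data: A Case Study", "year": 2020, "doi": "10.1234/toy.1"},
    {"title": "Deduplication of bibliographic records", "year": 2021, "doi": None},
    {"title": "Mapping research on open science", "year": 2019, "doi": None},
]
wos = [
    {"title": "MERGING CITATION DATA - A CASE STUDY", "year": 2020, "doi": "10.1234/TOY.1"},
    {"title": "Deduplication of Bibliographic Records.", "year": 2021, "doi": None},
    {"title": "Mapping research on open data", "year": 2019, "doi": None},
]

# Manually identified duplicate pairs, standing in for the manual baseline.
gold = {(0, 0), (1, 1)}

strict = evaluate(find_duplicates(scopus, wos, threshold=0.9), gold)
loose = evaluate(find_duplicates(scopus, wos, threshold=0.8), gold)
```

In this toy data, lowering the threshold from 0.9 to 0.8 admits one false match (two distinct titles that differ only in their last word), which lowers precision while recall stays the same. A quadratic pairwise comparison like this one would be slow on a dataset the size of the paper's use case; a practical implementation would likely block or index records first.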

Suggested Citation

  • Dušan Nikolić & Dragan Ivanović & Lidija Ivanović, 2024. "An open-source tool for merging data from multiple citation databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 4573-4595, July.
  • Handle: RePEc:spr:scient:v:129:y:2024:i:7:d:10.1007_s11192-024-05076-2
    DOI: 10.1007/s11192-024-05076-2

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-024-05076-2
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michael Gusenbauer, 2022. "Search where you will find most: Comparing the disciplinary coverage of 56 bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2683-2745, May.
    2. Kristina Galjanić & Ivan Marović & Nikša Jajac, 2022. "Decision Support Systems for Managing Construction Projects: A Scientific Evolution Analysis," Sustainability, MDPI, vol. 14(9), pages 1-23, April.
    3. Qin, Yong & Xu, Zeshui & Wang, Xinxin & Škare, Marinko, 2022. "Green energy adoption and its determinants: A bibliometric analysis," Renewable and Sustainable Energy Reviews, Elsevier, vol. 153(C).
    4. Adriana Ana Maria Davidescu & Margareta-Stela Florescu & Liviu Cosmin Mosora & Mihaela Hrisanta Mosora & Eduard Mihai Manta, 2022. "A Bibliometric Analysis of Research Publications of the Bucharest University of Economic Studies in Time of Pandemics: Implications for Teachers’ Professional Publishing Activity," IJERPH, MDPI, vol. 19(14), pages 1-36, July.
    5. Fernando Morante-Carballo & Néstor Montalván-Burbano & Maribel Aguilar-Aguilar & Paúl Carrión-Mero, 2022. "A Bibliometric Analysis of the Scientific Research on Artisanal and Small-Scale Mining," IJERPH, MDPI, vol. 19(13), pages 1-29, July.
    6. Gabriel Alves Vieira & Jacqueline Leta, 2024. "biblioverlap: an R package for document matching across bibliographic datasets," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 4513-4527, July.
    7. Faisal Bin Sulaiman, 2023. "Compact City: What Is the Extent of Our Exploration for Its Meanings? A Systematic Review," Sustainability, MDPI, vol. 15(13), pages 1-22, June.
    8. Andrea Caputo & Mariya Kargina, 2022. "A user-friendly method to merge Scopus and Web of Science data during bibliometric analysis," Journal of Marketing Analytics, Palgrave Macmillan, vol. 10(1), pages 82-88, March.
    9. Carlo Dindorf & Eva Bartaguiz & Freya Gassmann & Michael Fröhlich, 2022. "Conceptual Structure and Current Trends in Artificial Intelligence, Machine Learning, and Deep Learning Research in Sports: A Bibliometric Review," IJERPH, MDPI, vol. 20(1), pages 1-23, December.
    10. Shir Aviv-Reuven & Ariel Rosenfeld, 2023. "A logical set theory approach to journal subject classification analysis: intra-system irregularities and inter-system discrepancies in Web of Science and Scopus," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(1), pages 157-175, January.
    11. Raminta Pranckutė, 2021. "Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World," Publications, MDPI, vol. 9(1), pages 1-59, March.
    12. Congying Fang & Riken Homma & Tianfu Qiu, 2024. "A Bibliometrics Analysis Related to the Built Environment and Walking," Sustainability, MDPI, vol. 16(7), pages 1-17, March.
    13. Gaviria-Marin, Magaly & Merigó, José M. & Baier-Fuentes, Hugo, 2019. "Knowledge management: A global examination based on bibliometric analysis," Technological Forecasting and Social Change, Elsevier, vol. 140(C), pages 194-220.
    14. Zhichao Wang & Valentin Zelenyuk, 2021. "Performance Analysis of Hospitals in Australia and its Peers: A Systematic Review," CEPA Working Papers Series WP012021, School of Economics, University of Queensland, Australia.
    15. Zhichao Wang & Bao Hoang Nguyen & Valentin Zelenyuk, 2024. "Performance analysis of hospitals in Australia and its peers: a systematic and critical review," Journal of Productivity Analysis, Springer, vol. 62(2), pages 139-173, October.
    16. Mehdi Toloo & Rouhollah Khodabandelou & Amar Oukil, 2022. "A Comprehensive Bibliometric Analysis of Fractional Programming (1965–2020)," Mathematics, MDPI, vol. 10(11), pages 1-21, May.
    17. Zamani, Mehdi & Yalcin, Haydar & Naeini, Ali Bonyadi & Zeba, Gordana & Daim, Tugrul U, 2022. "Developing metrics for emerging technologies: identification and assessment," Technological Forecasting and Social Change, Elsevier, vol. 176(C).
    18. Pahrudin Pahrudin & Li-Wei Liu & Shao-Yu Li, 2022. "What Is the Role of Tourism Management and Marketing toward Sustainable Tourism? A Bibliometric Analysis Approach," Sustainability, MDPI, vol. 14(7), pages 1-18, April.
    19. Moaaz Kabil & Mohamed Abouelseoud & Faisal Alsubaie & Heba Mostafa Hassan & Imre Varga & Katalin Csobán & Lóránt Dénes Dávid, 2022. "Evolutionary Relationship between Tourism and Real Estate: Evidence and Research Trends," Sustainability, MDPI, vol. 14(16), pages 1-19, August.
    20. Andrzej Lis & Agata Sudolska & Mateusz Tomanek, 2020. "Mapping Research on Sustainable Supply-Chain Management," Sustainability, MDPI, vol. 12(10), pages 1-26, May.
