Printed from https://ideas.repec.org/a/spr/scient/v123y2020i1d10.1007_s11192-020-03386-9.html

Percentile and stochastic-based approach to the comparison of the number of citations of articles indexed in different bibliographic databases

Author

Listed:
  • Gerson Pech (Rio de Janeiro State University; University of Porto)

  • Catarina Delgado (University of Porto)

Abstract

Recent studies have shown that the coverage of the Scopus and Web of Science (WoS) databases differs substantially. Consequently, the citation count of a paper depends on the database used, which makes it difficult to use the two databases together. To address this problem, this paper examines whether a percentile- and stochastic-based approach is effective for converting citation counts between the two databases while preserving time normalization. For this analysis, we collected a dataset of 326,345 papers published in 1987–2017 in the top 10% of source titles in the following fields: Industrial and Manufacturing Engineering, Aquatic Science, Social Psychology, and Archaeology. First, we applied a linear regression model to the citation percentiles of papers indexed in both databases. Second, we combined the predictions of this linear model with Monte Carlo simulations to obtain the probability density function of the percentile of a paper in the database from which it is missing. The results indicate that the proposed method makes it possible to convert the citation counts of articles between Scopus and WoS. It also predicts the citation impact of a paper missing from one database, based on its citation impact in the other. Tests on subsamples, using Lin’s concordance coefficient, suggest substantial agreement between estimated and real citation values. This allows the citation counts of the two databases to be used together, improving the coverage and accuracy of both bibliometric studies and bibliometric indicators.
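
The two-step procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`percentile_ranks`, `fit_percentile_regression`, `simulate_missing_percentile`, `lin_ccc`), the synthetic citation counts, and the choice of normally distributed regression residuals in the Monte Carlo step are all assumptions made for this example. In a real study the percentiles would presumably be computed within each field and publication year, which is what would keep them time-normalized.

```python
import numpy as np

def percentile_ranks(citations):
    """Percentile rank (0-100) of each paper within its set; tied papers share a rank."""
    c = np.asarray(citations, dtype=float)
    below = (c[:, None] > c[None, :]).sum(axis=1)   # papers with strictly fewer citations
    ties = (c[:, None] == c[None, :]).sum(axis=1)   # papers with equal citations (incl. itself)
    return 100.0 * (below + 0.5 * ties) / c.size

def fit_percentile_regression(p_source, p_target):
    """Ordinary least squares of target percentiles on source percentiles."""
    slope, intercept = np.polyfit(p_source, p_target, 1)
    residuals = p_target - (intercept + slope * p_source)
    sigma = residuals.std(ddof=2)                   # residual standard deviation
    return slope, intercept, sigma

def simulate_missing_percentile(p_source, slope, intercept, sigma, rng, n_draws=10_000):
    """Monte Carlo draws of the unknown target percentile (assumes normal residuals)."""
    draws = intercept + slope * p_source + rng.normal(0.0, sigma, size=n_draws)
    return np.clip(draws, 0.0, 100.0)               # percentiles are bounded

def lin_ccc(x, y):
    """Lin's concordance correlation coefficient, using population moments."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return 2.0 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

# Synthetic stand-in for papers indexed in both databases (hypothetical data).
rng = np.random.default_rng(0)
n = 1000
latent = rng.lognormal(mean=1.5, sigma=1.0, size=n)
scopus = rng.poisson(1.2 * latent)                  # assumed: Scopus counts slightly higher
wos = rng.poisson(latent)
p_scopus = percentile_ranks(scopus)
p_wos = percentile_ranks(wos)

# Step 1: fit the linear model on one half of the papers.
train = np.arange(n) < n // 2
test = ~train
slope, intercept, sigma = fit_percentile_regression(p_scopus[train], p_wos[train])

# Step 2: simulate the WoS percentile distribution of one paper treated as missing from WoS.
draws = simulate_missing_percentile(p_scopus[test][0], slope, intercept, sigma, rng)
print("median simulated WoS percentile:", np.median(draws))

# Agreement between predicted and actual percentiles on the held-out half.
predicted = np.clip(intercept + slope * p_scopus[test], 0.0, 100.0)
ccc = lin_ccc(predicted, p_wos[test])
print("Lin's CCC on held-out papers:", ccc)
```

A histogram of `draws` approximates the probability density of the missing percentile under the stated assumptions. Converting a simulated percentile back to a citation count would additionally require the target database's empirical citation distribution, which this sketch does not cover.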

Suggested Citation

  • Gerson Pech & Catarina Delgado, 2020. "Percentile and stochastic-based approach to the comparison of the number of citations of articles indexed in different bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(1), pages 223-252, April.
  • Handle: RePEc:spr:scient:v:123:y:2020:i:1:d:10.1007_s11192-020-03386-9
    DOI: 10.1007/s11192-020-03386-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-020-03386-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations



    Cited by:

    1. Gerson Pech & Catarina Delgado & Silvio Paolo Sorella, 2022. "Classifying papers into subfields using Abstracts, Titles, Keywords and KeyWords Plus through pattern detection and optimization procedures: An application in Physics," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(11), pages 1513-1528, November.
    2. Gerson Pech & Catarina Delgado, 2020. "Assessing the publication impact using citation data from both Scopus and WoS databases: an approach validated in 15 research fields," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 909-924, November.
    3. Pech, Gerson & Delgado, Catarina, 2021. "Screening the most highly cited papers in longitudinal bibliometric studies and systematic literature reviews of a research field or journal: Widespread used metrics vs a percentile citation-based app," Journal of Informetrics, Elsevier, vol. 15(3).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gerson Pech & Catarina Delgado, 2020. "Assessing the publication impact using citation data from both Scopus and WoS databases: an approach validated in 15 research fields," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 909-924, November.
    2. Waltman, Ludo, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.
    3. Pech, Gerson & Delgado, Catarina, 2021. "Screening the most highly cited papers in longitudinal bibliometric studies and systematic literature reviews of a research field or journal: Widespread used metrics vs a percentile citation-based app," Journal of Informetrics, Elsevier, vol. 15(3).
    4. Raminta Pranckutė, 2021. "Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World," Publications, MDPI, vol. 9(1), pages 1-59, March.
    5. Michael Gusenbauer, 2022. "Search where you will find most: Comparing the disciplinary coverage of 56 bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2683-2745, May.
    6. Alonso Rodríguez-Navarro & Ricardo Brito, 2019. "Probability and expected frequency of breakthroughs: basis and use of a robust method of research assessment," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(1), pages 213-235, April.
    7. Lutz Bornmann & Klaus Wohlrabe, 2019. "Normalisation of citation impact in economics," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 841-884, August.
    8. Shirley Ainsworth & Jane M. Russell, 2018. "Has hosting on science direct improved the visibility of Latin American scholarly journals? A preliminary analysis of data quality," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(3), pages 1463-1484, June.
    9. Mike Thelwall, 2019. "The influence of highly cited papers on field normalised indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(2), pages 519-537, February.
    10. Wang, Xing & Zhang, Zhihui, 2020. "Improving the reliability of short-term citation impact indicators by taking into account the correlation between short- and long-term citation impact," Journal of Informetrics, Elsevier, vol. 14(2).
    11. Lutz Bornmann & Richard Williams, 2020. "An evaluation of percentile measures of citation impact, and a proposal for making them better," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 1457-1478, August.
    12. Hamdi A. Al-Jamimi & Galal M. BinMakhashen & Lutz Bornmann & Yousif Ahmed Al Wajih, 2023. "Saudi Arabia research: academic insights and trend analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(10), pages 5595-5627, October.
    13. Gordana Budimir & Sophia Rahimeh & Sameh Tamimi & Primož Južnič, 2021. "Comparison of self-citation patterns in WoS and Scopus databases based on national scientific production in Slovenia (1996–2020)," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(3), pages 2249-2267, March.
    14. Bornmann, Lutz & Ganser, Christian & Tekles, Alexander, 2022. "Simulation of the h index use at university departments within the bibliometrics-based heuristics framework: Can the indicator be used to compare individual researchers?," Journal of Informetrics, Elsevier, vol. 16(1).
    15. Brito, Ricardo & Rodríguez-Navarro, Alonso, 2018. "Research assessment by percentile-based double rank analysis," Journal of Informetrics, Elsevier, vol. 12(1), pages 315-329.
    16. Gabriel-Alexandru Vȋiu & Mihai Păunescu, 2021. "The lack of meaningful boundary differences between journal impact factor quartiles undermines their independent use in research evaluation," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1495-1525, February.
    17. Zhenyu Gou & Fan Meng & Zaida Chinchilla-Rodríguez & Yi Bu, 2022. "Encoding the citation life-cycle: the operationalization of a literature-aging conceptual model," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(8), pages 5027-5052, August.
    18. Franceschini, Fiorenzo & Maisano, Domenico & Mastrogiacomo, Luca, 2016. "Empirical analysis and classification of database errors in Scopus and Web of Science," Journal of Informetrics, Elsevier, vol. 10(4), pages 933-953.
    19. Yves Fassin, 2020. "The HF-rating as a universal complement to the h-index," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 965-990, November.
    20. Vivek Kumar Singh & Prashasti Singh & Mousumi Karmakar & Jacqueline Leta & Philipp Mayr, 2021. "The journal coverage of Web of Science, Scopus and Dimensions: A comparative analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(6), pages 5113-5142, June.
