
Missing institutions in OpenAlex: possible reasons, implications, and solutions

Authors
  • Lin Zhang

    (Wuhan University; KU Leuven)

  • Zhe Cao

    (Wuhan University)

  • Yuanyuan Shang

    (Chinese Academy of Social Sciences Evaluation Studies)

  • Gunnar Sivertsen

    (Nordic Institute for Studies in Innovation, Research and Education (NIFU))

  • Ying Huang

    (Wuhan University; KU Leuven)

Abstract

The advent of open science calls for open data platforms with high data quality. OpenAlex, a fully open catalog of the global research system launched in January 2022, offers two main advantages, easy data accessibility and broad data coverage, and has been widely used in quantitative science studies. Notably, OpenAlex has been adopted as an important data source for the Leiden Ranking. However, journal article metadata in OpenAlex suffer from a severe data quality problem: missing institutions. This study investigates the possible reasons for the problem, its consequences, and possible solutions by defining three types of institutional information: full institutional information (FII), partially missing institutional information (PMII), and completely missing institutional information (CMII). Our results show that the problem of missing institutions occurs in more than 60% of the journal articles in OpenAlex. The problem is particularly widespread in metadata from the early years and in the social sciences and humanities. Using sub-samples of the data, we further explore the possible reasons for the problem, the risk of distorted results it might pose, and possible solutions. The aim is to raise awareness of the importance of data quality improvements in open resources, and thus to support the responsible use of open resources in quantitative science studies and in broader contexts.
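
The three categories named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not give operational definitions, so the rules below are assumptions. The sketch assumes a work record shaped like an OpenAlex work object, with an `authorships` list whose entries each carry an `institutions` list. It labels a work FII when every authorship has at least one institution, CMII when none does, and PMII otherwise. Treating a work with no authorships at all as CMII is a further assumption, and the function name `classify_work` is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def classify_work(work: Dict[str, Any]) -> str:
    """Label one work record as 'FII', 'PMII', or 'CMII'.

    Assumed rules (not taken from the paper):
      FII  - every authorship lists at least one institution
      PMII - some, but not all, authorships list an institution
      CMII - no authorship lists an institution
    """
    authorships = work.get("authorships") or []
    if not authorships:
        # Assumption: no authorship data is treated as completely missing.
        return "CMII"

    # Count authorships that have a non-empty institutions list.
    with_institution = sum(1 for a in authorships if a.get("institutions"))

    if with_institution == len(authorships):
        return "FII"
    if with_institution == 0:
        return "CMII"
    return "PMII"


def tally(works: Iterable[Dict[str, Any]]) -> Counter:
    """Count how many works fall into each category."""
    return Counter(classify_work(w) for w in works)


# Hand-made sample records (illustrative only, not real OpenAlex data).
sample = [
    {"authorships": [{"institutions": [{"id": "I1"}]},
                     {"institutions": [{"id": "I2"}]}]},
    {"authorships": [{"institutions": [{"id": "I1"}]},
                     {"institutions": []}]},
    {"authorships": [{"institutions": []}]},
]
counts = tally(sample)
```

With records like these, dividing the PMII and CMII counts by the total would give a share of works affected by missing institutions, the kind of quantity the abstract reports.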

Suggested Citation

  • Lin Zhang & Zhe Cao & Yuanyuan Shang & Gunnar Sivertsen & Ying Huang, 2024. "Missing institutions in OpenAlex: possible reasons, implications, and solutions," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(10), pages 5869-5891, October.
  • Handle: RePEc:spr:scient:v:129:y:2024:i:10:d:10.1007_s11192-023-04923-y
    DOI: 10.1007/s11192-023-04923-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-023-04923-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hoekman, Jarno & Rake, Bastian, 2024. "Geography of authorship: How geography shapes authorship attribution in big team science," Research Policy, Elsevier, vol. 53(2).
    2. Marco Cavallaro & Benedetto Lepori, 2021. "Institutional barriers to participation in EU framework programs: contrasting the Swiss and UK cases," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1311-1328, February.
    3. Ying Guo & Xiantao Xiao, 2022. "Author-level altmetrics for the evaluation of Chinese scholars," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 973-990, February.
    4. Jeffrey Demaine, 2022. "Fractionalization of research impact reveals global trends in university collaboration," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2235-2247, May.
    5. Cinzia Daraio & Simone Di Leo & Loet Leydesdorff, 2022. "Using the Leiden Rankings as a Heuristics: Evidence from Italian universities in the European landscape," LEM Papers Series 2022/08, Laboratory of Economics and Management (LEM), Sant'Anna School of Advanced Studies, Pisa, Italy.
    6. Lepori, Benedetto & Geuna, Aldo & Mira, Antonietta, 2018. "Scientific Output of US and European Universities Scales Super-linearly with Resources," Department of Economics and Statistics Cognetti de Martiis LEI & BRICK - Laboratory of Economics of Innovation "Franco Momigliano", Bureau of Research in Innovation, Complexity and Knowledge, Collegio 201806, University of Turin.
    7. Shahryar Rahnamayan & Sedigheh Mahdavi & Kalyanmoy Deb & Azam Asilian Bidgoli, 2020. "Ranking Multi-Metric Scientific Achievements Using a Concept of Pareto Optimality," Mathematics, MDPI, vol. 8(6), pages 1-46, June.
    8. Fernando García & Francisco Guijarro & Javier Oliver, 2021. "A Multicriteria Goal Programming Model for Ranking Universities," Mathematics, MDPI, vol. 9(5), pages 1-17, February.
    9. Lavinia Mustea, 2022. "An Overview of Public Sector Performance in Europe," Ovidius University Annals, Economic Sciences Series, Ovidius University of Constantza, Faculty of Economic Sciences, vol. 0(1), pages 339-345, September.
    10. Lutz Bornmann, 2020. "Bibliometrics-based decision tree (BBDT) for deciding whether two universities in the Leiden ranking differ substantially in their performance," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 1255-1258, February.
    11. Wenceslao Arroyo‐Machado & Adrián A. Díaz‐Faes & Enrique Herrera‐Viedma & Rodrigo Costas, 2024. "From academic to media capital: To what extent does the scientific reputation of universities translate into Wikipedia attention?," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 75(4), pages 423-437, April.
    12. Lutz Bornmann & Werner Marx & Andreas Barth, 2013. "The Normalization of Citation Counts Based on Classification Systems," Publications, MDPI, vol. 1(2), pages 1-9, August.
    13. Fabio S. V. Silva & Peter A. Schulz & Everard C. M. Noyons, 2019. "Co-authorship networks and research impact in large research facilities: benchmarking internal reports and bibliometric databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 93-108, January.
    14. Klaus Wohlrabe & Sabine Gralka & Lutz Bornmann, 2019. "Zur Effizienz deutscher Universitäten und deren Entwicklung zwischen 2004 und 2015," ifo Schnelldienst, ifo Institute - Leibniz Institute for Economic Research at the University of Munich, vol. 72(21), pages 15-21, November.
    15. Loet Leydesdorff & Lutz Bornmann & Jonathan Adams, 2019. "The integrated impact indicator revisited (I3*): a non-parametric alternative to the journal impact factor," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1669-1694, June.
    16. Fredrik Niclas Piro, 2019. "The R&D composition of European countries: concentrated versus dispersed profiles," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(2), pages 1095-1119, May.
    17. Tomaz Bartol & Gordana Budimir & Primoz Juznic & Karmen Stopar, 2016. "Mapping and classification of agriculture in Web of Science: other subject categories and research fields may benefit," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(2), pages 979-996, November.
    18. Mohammad Ashraful Mobin & Masnun Mahi & M. Kabir Hassan & Marzia Habib & Shabiha Akter & Tahmina Hassan, 2023. "An analysis of COVID-19 and WHO global research roadmap: knowledge mapping and future research agenda," Eurasian Economic Review, Springer;Eurasia Business and Economics Society, vol. 13(1), pages 35-56, March.
    19. Zhesi Shen & Liying Yang & Zengru Di & Jinshan Wu, 2019. "Large enough sample size to rank two groups of data reliably according to their means," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(2), pages 653-671, February.
    20. Massucci, Francesco Alessandro & Docampo, Domingo, 2019. "Measuring the academic reputation through citation networks via PageRank," Journal of Informetrics, Elsevier, vol. 13(1), pages 185-201.
