
Identifying and correcting invalid citations due to DOI errors in Crossref data

Authors
  • Alessia Cioffi

    (University of Bologna)

  • Sara Coppini

    (University of Bologna)

  • Arcangelo Massari

    (University of Bologna)

  • Arianna Moretti

    (University of Bologna)

  • Silvio Peroni

    (University of Bologna)

  • Cristian Santini

    (FIZ Karlsruhe – Leibniz Institute for Information Infrastructure
    Karlsruhe Institute of Technology, Institute AIFB)

  • Nooshin Shahidzadeh Asadi

    (University of Antwerp)

Abstract

This work aims to identify classes of DOI mistakes by analysing the open bibliographic metadata available in Crossref, highlighting which publishers were responsible for such mistakes and how many of these incorrect DOIs could be corrected through automatic processes. Using a list of invalid cited DOIs gathered by OpenCitations over the past two years while processing the OpenCitations Index of Crossref open DOI-to-DOI citations (COCI), we retrieved from the January 2021 Crossref dump the citations pointing to such invalid DOIs. We processed these citations, keeping track of their validity and of the publishers responsible for uploading the related citation data to Crossref. Finally, we identified patterns of factual errors in the invalid DOIs and the regular expressions needed to catch and correct them. The outcomes of this research show that only a few publishers were responsible for and/or affected by the majority of invalid citations. We extended the taxonomy of DOI name errors proposed in past studies and defined more elaborate regular expressions that can clean a higher number of mistakes in invalid DOIs than prior approaches. The data gathered in our study can support the qualitative investigation of possible reasons for DOI mistakes, helping publishers identify the problems underlying their production of invalid citation data. In addition, the DOI cleaning mechanism we present could be integrated into existing processes (e.g. the one that builds COCI) to add citations by automatically correcting wrong DOIs. This study was run strictly following Open Science principles, and, as such, our research outcomes are fully reproducible.
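The abstract describes two mechanical steps: cleaning invalid DOIs with regular expressions, and tracking which publishers deposited the invalid citations. The paper's actual patterns and error taxonomy are not reproduced on this page, so what follows is only a minimal Python sketch under stated assumptions. The three noise patterns (a resolver URL or "doi:" label in front, trailing punctuation, embedded whitespace), the `(publisher, cited_doi, is_valid)` tuple layout, and the function names `clean_doi` and `tally_invalid` are all hypothetical. Validity is taken as a given flag rather than checked against Crossref.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Illustrative patterns only (assumptions), NOT the regular expressions
# defined in the paper.

# Loose shape of a DOI name: "10." + 4-9 digit registrant code + "/" + suffix.
DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefix noise: a resolver URL or a "doi:" label glued in front of the DOI.
PREFIX_NOISE = re.compile(
    r"^(?:(?:https?://)?(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE
)

# Suffix noise: trailing sentence punctuation copied from the reference string.
SUFFIX_NOISE = re.compile(r"[.,;:]+$")


def clean_doi(raw: str) -> Optional[str]:
    """Return a normalised candidate DOI, or None if no DOI shape is recovered."""
    doi = PREFIX_NOISE.sub("", raw.strip())
    doi = re.sub(r"\s+", "", doi)  # assumption: embedded whitespace is noise
    doi = SUFFIX_NOISE.sub("", doi)
    doi = doi.lower()  # DOI names are case-insensitive
    return doi if DOI_SHAPE.match(doi) else None


def tally_invalid(
    citations: Iterable[Tuple[str, str, bool]]
) -> Tuple[Counter, Counter]:
    """Count invalid citations, and those with a candidate fix, per publisher.

    Each citation is a hypothetical (publisher, cited_doi, is_valid) tuple;
    is_valid is assumed to come from an external check (e.g. DOI resolution).
    """
    invalid: Counter = Counter()
    fixable: Counter = Counter()
    for publisher, cited_doi, is_valid in citations:
        if is_valid:
            continue
        invalid[publisher] += 1
        candidate = clean_doi(cited_doi)
        # A fix only counts if cleaning actually changed the string.
        if candidate is not None and candidate != cited_doi.strip().lower():
            fixable[publisher] += 1
    return invalid, fixable


# Usage with made-up records (publisher names are placeholders).
sample = [
    ("Publisher A", "https://doi.org/10.1007/s11192-022-04367-w.", False),
    ("Publisher A", "10.1007/s11192-022-04367-w", True),
    ("Publisher B", "DOI: 10.1007/S11192-022-04367-W", False),
    ("Publisher B", "not-a-doi", False),
]
invalid, fixable = tally_invalid(sample)
print(invalid.most_common())  # [('Publisher B', 2), ('Publisher A', 1)]
print(fixable.most_common())  # [('Publisher A', 1), ('Publisher B', 1)]
```

A cleaned candidate is only syntactically plausible. In a pipeline like the one the abstract envisages for COCI, it would still have to be confirmed (for instance, by checking that it resolves or exists in Crossref) before a corrected citation is added. Sorting the per-publisher counters is one simple way to see whether a few publishers account for most invalid citations.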

Suggested Citation

  • Alessia Cioffi & Sara Coppini & Arcangelo Massari & Arianna Moretti & Silvio Peroni & Cristian Santini & Nooshin Shahidzadeh Asadi, 2022. "Identifying and correcting invalid citations due to DOI errors in Crossref data," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(6), pages 3593-3612, June.
  • Handle: RePEc:spr:scient:v:127:y:2022:i:6:d:10.1007_s11192-022-04367-w
    DOI: 10.1007/s11192-022-04367-w

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-022-04367-w
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-022-04367-w?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item.

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Junwen Zhu & Guangyuan Hu & Weishu Liu, 2019. "DOI errors and possible solutions for Web of Science," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(2), pages 709-718, February.
    2. Juan-Carlos Valderrama-Zurián & Remedios Aguilar-Moya & David Melero-Fuentes & Rafael Aleixandre-Benavent, 2015. "A systematic analysis of duplicate records in Scopus," Journal of Informetrics, Elsevier, vol. 9(3), pages 570-576.
    3. Ivan Heibi & Silvio Peroni & David Shotton, 2019. "Software review: COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(2), pages 1213-1228, November.
    4. Fiorenzo Franceschini & Domenico Maisano & Luca Mastrogiacomo, 2015. "Errors in DOI indexing by bibliometric databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2181-2186, March.
    5. Juan Gorraiz & David Melero-Fuentes & Christian Gumpenberger & Juan-Carlos Valderrama-Zurián, 2016. "Availability of digital object identifiers (DOIs) in Web of Science and Scopus," Journal of Informetrics, Elsevier, vol. 10(1), pages 98-109.
    6. Shuo Xu & Liyuan Hao & Xin An & Dongsheng Zhai & Hongshen Pang, 2019. "Types of DOI errors of cited references in Web of Science with a cleaning method," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(3), pages 1427-1437, September.
    7. Fiorenzo Franceschini & Domenico Maisano & Luca Mastrogiacomo, 2016. "The museum of errors/horrors in Scopus," Journal of Informetrics, Elsevier, vol. 10(1), pages 174-182.
    8. Christophe Boudry & Ghislaine Chartron, 2017. "Availability of digital object identifiers in publications archived by PubMed," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(3), pages 1453-1469, March.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Wentao Cui & Meng Xiao & Ludi Wang & Xuezhi Wang & Yi Du & Yuanchun Zhou, 2024. "Automated taxonomy alignment via large language models: bridging the gap between knowledge domains," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(9), pages 5287-5312, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gabriel Alves Vieira & Jacqueline Leta, 2024. "biblioverlap: an R package for document matching across bibliographic datasets," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 4513-4527, July.
    2. Shuo Xu & Liyuan Hao & Xin An & Dongsheng Zhai & Hongshen Pang, 2019. "Types of DOI errors of cited references in Web of Science with a cleaning method," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(3), pages 1427-1437, September.
    3. Raminta Pranckutė, 2021. "Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World," Publications, MDPI, vol. 9(1), pages 1-59, March.
    4. Junwen Zhu & Fang Liu & Weishu Liu, 2019. "The secrets behind Web of Science’s DOI search," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1745-1753, June.
    5. Junwen Zhu & Guangyuan Hu & Weishu Liu, 2019. "DOI errors and possible solutions for Web of Science," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(2), pages 709-718, February.
    6. Ludo Waltman, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.
    7. Weishu Liu & Meiting Huang & Haifeng Wang, 2021. "Same journal but different numbers of published records indexed in Scopus and Web of Science Core Collection: causes, consequences, and solutions," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4541-4550, May.
    8. Shirley Ainsworth & Jane M. Russell, 2018. "Has hosting on science direct improved the visibility of Latin American scholarly journals? A preliminary analysis of data quality," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(3), pages 1463-1484, June.
    9. Weishu Liu, 2020. "Accuracy of funding information in Scopus: a comparative case study," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 803-811, July.
    10. Xiaoling Huang & Lei Wang & Weishu Liu, 2023. "Identification of national research output using Scopus/Web of Science Core Collection: a revisit and further investigation," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(4), pages 2337-2347, April.
    11. Fiorenzo Franceschini & Domenico Maisano & Luca Mastrogiacomo, 2016. "Empirical analysis and classification of database errors in Scopus and Web of Science," Journal of Informetrics, Elsevier, vol. 10(4), pages 933-953.
    12. Christophe Boudry & Ghislaine Chartron, 2017. "Availability of digital object identifiers in publications archived by PubMed," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(3), pages 1453-1469, March.
    13. Mike Thelwall, 2017. "Are Mendeley reader counts useful impact indicators in all fields?," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1721-1731, December.
    14. Mike Thelwall, 2018. "Microsoft Academic automatic document searches: Accuracy for journal articles and suitability for citation analysis," Journal of Informetrics, Elsevier, vol. 12(1), pages 1-9.
    15. Sergio Copiello, 2019. "The open access citation premium may depend on the openness and inclusiveness of the indexing database, but the relationship is controversial because it is ambiguous where the open access boundary lie," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(2), pages 995-1018, November.
    16. Abdelghani Maddi & Lesya Baudoin, 2022. "The quality of the web of science data: a longitudinal study on the completeness of authors-addresses links," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6279-6292, November.
    17. Gerson Pech & Catarina Delgado, 2020. "Assessing the publication impact using citation data from both Scopus and WoS databases: an approach validated in 15 research fields," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 909-924, November.
    18. Igor Savchenko & Denis Kosyakov, 2022. "Lost in affiliation: apatride publications in international databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(6), pages 3471-3487, June.
    19. Paul Donner, 2017. "Document type assignment accuracy in the journal citation index data of Web of Science," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(1), pages 219-236, October.
    20. Rogério Mugnaini & Grischa Fraumann & Esteban F. Tuesta & Abel L. Packer, 2021. "Openness trends in Brazilian citation data: factors related to the use of DOIs," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(3), pages 2523-2556, March.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:scient:v:127:y:2022:i:6:d:10.1007_s11192-022-04367-w. See general information about how to correct material in RePEc.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing. General contact details of provider: http://www.springer.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.