Printed from https://ideas.repec.org/a/bla/jinfst/v75y2024i1p43-58.html

Why are these publications missing? Uncovering the reasons behind the exclusion of documents in free‐access scholarly databases

Authors

Listed:
  • Lorena Delgado‐Quirós
  • Isidro F. Aguillo
  • Alberto Martín‐Martín
  • Emilio Delgado López‐Cózar
  • Enrique Orduña‐Malea
  • José Luis Ortega

Abstract

This study analyses the coverage of seven free‐access bibliographic databases (Crossref, the non‐subscription version of Dimensions, Google Scholar, Lens, Microsoft Academic, Scilit, and Semantic Scholar) to identify why scholarly documents might be excluded and how those reasons could influence coverage. A baseline of 116k randomly selected bibliographic records from Crossref was used, and each database was queried through API endpoints and web scraping. The results show that coverage differences are mainly caused by the way each service builds its database. While classic bibliographic databases ingest almost exactly the same content as Crossref (Lens and Scilit miss 0.1% and 0.2% of the records, respectively), academic search engines present lower coverage (Google Scholar misses 9.8% of the records, Semantic Scholar 10%, and Microsoft Academic 12%). These differences are mainly attributed to external factors, such as web accessibility and robot exclusion policies (39.2%–46%), and to internal requirements that exclude secondary content (6.5%–11.6%). In the case of Dimensions, the classic bibliographic database with the lowest coverage (7.6% of records missing), internal selection criteria such as the indexation of full books instead of book chapters (65%) and the exclusion of secondary content (15%) are the main reasons for missing publications.
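The percentages in the abstract can be read as the share of baseline records that a database fails to return. The following is a minimal sketch of that calculation, assuming records are matched by DOI; the abstract does not describe the matching procedure, so the function names, the normalization rule, and the toy data are all hypothetical and not taken from the study.

```python
from typing import Dict, Iterable, Set


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip a resolver prefix so equal DOIs compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def missing_share(
    baseline: Iterable[str], found: Dict[str, Iterable[str]]
) -> Dict[str, float]:
    """Return, per database, the percentage of baseline DOIs that were not found."""
    base: Set[str] = {normalize_doi(d) for d in baseline}
    if not base:
        raise ValueError("baseline sample is empty")
    shares: Dict[str, float] = {}
    for name, dois in found.items():
        # Only hits that belong to the baseline count towards coverage.
        hits = base & {normalize_doi(d) for d in dois}
        shares[name] = 100.0 * (len(base) - len(hits)) / len(base)
    return shares


# Toy data, invented for illustration only (not from the study).
baseline_sample = ["10.1000/a", "10.1000/b", "10.1000/c", "10.1000/d"]
lookup_results = {
    "database_x": ["10.1000/A", "https://doi.org/10.1000/b", "10.1000/c", "10.1000/d"],
    "database_y": ["10.1000/a", "10.1000/c", "10.9999/unrelated"],
}

shares = missing_share(baseline_sample, lookup_results)
print(shares)
```

With the toy data, database_x finds every baseline record and database_y misses two of four. Normalizing DOIs before comparison is a design choice made here because DOIs are case-insensitive and are often stored with a resolver prefix; without it, formatting differences would be counted as missing records.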

Suggested Citation

  • Lorena Delgado‐Quirós & Isidro F. Aguillo & Alberto Martín‐Martín & Emilio Delgado López‐Cózar & Enrique Orduña‐Malea & José Luis Ortega, 2024. "Why are these publications missing? Uncovering the reasons behind the exclusion of documents in free‐access scholarly databases," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 75(1), pages 43-58, January.
  • Handle: RePEc:bla:jinfst:v:75:y:2024:i:1:p:43-58
    DOI: 10.1002/asi.24839

    Download full text from publisher

    File URL: https://doi.org/10.1002/asi.24839
    Download Restriction: no

    File URL: https://libkey.io/10.1002/asi.24839?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bla:jinfst:v:75:y:2024:i:1:p:43-58. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Wiley Content Delivery (email available below). General contact details of provider: http://www.asis.org.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.