
Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content

Author

Listed:
  • Shawn M Jones
  • Herbert Van de Sompel
  • Harihar Shankar
  • Martin Klein
  • Richard Tobin
  • Claire Grover

Abstract

Increasingly, scholarly articles contain URI references to “web at large” resources including project web sites, scholarly wikis, ontologies, online debates, presentations, blogs, and videos. Authors reference such resources to provide essential context for the research they report on. A reader who visits a web at large resource by following a URI reference in an article, some time after its publication, is led to believe that the resource’s content is representative of what the author originally referenced. However, due to the dynamic nature of the web, that may very well not be the case. We reuse a dataset from a previous study in which several authors of this paper were involved, and investigate to what extent the textual content of web at large resources referenced in a vast collection of Science, Technology, and Medicine (STM) articles published between 1997 and 2012 has remained stable since the publication of the referencing article. We do so in a two-step approach that relies on various well-established similarity measures to compare textual content. In a first step, we use 19 web archives to find snapshots of referenced web at large resources that have textual content that is representative of the state of the resource around the time of publication of the referencing paper. We find that representative snapshots exist for about 30% of all URI references. In a second step, we compare the textual content of representative snapshots with that of their live web counterparts. We find that for over 75% of references the content has drifted away from what it was when referenced. These results raise significant concerns regarding the long-term integrity of the web-based scholarly record and call for the deployment of techniques to combat these problems.
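
The second step described above, comparing the text of an archived snapshot with the text of its live web counterpart, can be illustrated with a minimal sketch. The abstract does not name the similarity measures used, so the two shown here (Jaccard similarity on word sets and cosine similarity on term frequencies) are assumptions chosen as common examples. The function names and the 0.9 threshold are likewise hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' word sets (1.0 means identical sets)."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of the two texts' term-frequency vectors."""
    tf_a, tf_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def has_drifted(snapshot_text: str, live_text: str, threshold: float = 0.9) -> bool:
    """Flag drift when either measure falls below a (hypothetical) threshold."""
    scores = (
        jaccard_similarity(snapshot_text, live_text),
        cosine_similarity(snapshot_text, live_text),
    )
    return min(scores) < threshold


if __name__ == "__main__":
    snapshot = "Project home page with data sets and documentation"
    live = "This domain is for sale"
    print(has_drifted(snapshot, snapshot))  # same text compared with itself
    print(has_drifted(snapshot, live))      # unrelated replacement content
```

Taking the minimum of the two scores is one conservative design choice: a reference is counted as stable only if both measures agree that the texts are close. A real study would also need to extract the text from HTML first, which this sketch leaves out.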

Suggested Citation

  • Shawn M Jones & Herbert Van de Sompel & Harihar Shankar & Martin Klein & Richard Tobin & Claire Grover, 2016. "Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content," PLOS ONE, Public Library of Science, vol. 11(12), pages 1-32, December.
  • Handle: RePEc:plo:pone00:0167475
    DOI: 10.1371/journal.pone.0167475

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0167475
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0167475&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Carmine Sellitto, 2005. "The impact of impermanent Web‐located citations: A study of 123 scholarly conference publications," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 56(7), pages 695-703, May.

    Citations

    Cited by:

    1. Chris Hartgerink, 2019. "Verified, Shared, Modular, and Provenance Based Research Communication with the Dat Protocol," Publications, MDPI, vol. 7(2), pages 1-19, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Judit Bar-Ilan & Bluma C. Peritz, 2009. "The lifespan of “informetrics” on the Web: An eight year study (1998–2006)," Scientometrics, Springer;Akadémiai Kiadó, vol. 79(1), pages 7-25, April.
    2. Sampath Kumar, B.T. & Vinay Kumar, D., 2013. "HTTP 404-page (not) found: Recovery of decayed URL citations," Journal of Informetrics, Elsevier, vol. 7(1), pages 145-157.
    3. Zhiqiang Wu, 2009. "An empirical study of the accessibility of web references in two Chinese academic journals," Scientometrics, Springer;Akadémiai Kiadó, vol. 78(3), pages 481-503, March.
