Printed from https://ideas.repec.org/a/bla/jinfst/v72y2021i12p1461-1476.html

Prevalence of nonsensical algorithmically generated papers in the scientific literature

Authors

  • Guillaume Cabanac
  • Cyril Labbé

Abstract

In 2014, leading publishers withdrew more than 120 nonsensical publications automatically generated with the SCIgen program. Casual observations suggested that similar problematic papers are still published and sold, without follow‐up retractions. No systematic screening has been performed, and the prevalence of such nonsensical publications in the scientific literature is unknown. Our contribution is twofold. First, we designed a detector that combs the scientific literature for grammar‐based computer‐generated papers. Applied to SCIgen, it has an 83.6% precision. Second, we performed a scientometric study of the 243 detected SCIgen‐papers from 19 publishers. We estimate the prevalence of SCIgen‐papers to be 75 per million papers in Information and Computing Sciences. Only 19% of the 243 problematic papers were dealt with: formal retraction (12) or silent removal (34). Publishers still serve, and sometimes sell, the remaining 197 papers without any caveat. We found evidence of citation manipulation via edited SCIgen bibliographies. This work reveals metric gaming up to the point of absurdity: fraudsters publish nonsensical algorithmically generated papers featuring genuine references. It stresses the need to screen papers for nonsense before peer review and to chase citation manipulation in published papers. Overall, this is yet another illustration of the harmful effects of the pressure to publish or perish.

Suggested Citation

  • Guillaume Cabanac & Cyril Labbé, 2021. "Prevalence of nonsensical algorithmically generated papers in the scientific literature," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(12), pages 1461-1476, December.
  • Handle: RePEc:bla:jinfst:v:72:y:2021:i:12:p:1461-1476
    DOI: 10.1002/asi.24495

    Download full text from publisher

    File URL: https://doi.org/10.1002/asi.24495
    Download Restriction: no

    File URL: https://libkey.io/10.1002/asi.24495?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Philip Ball, 2005. "Computer conference welcomes gobbledegook paper," Nature, Nature, vol. 434(7036), pages 946-946, April.
    2. Diego Raphael Amancio, 2015. "Comparing the topological properties of real and artificially generated scientific manuscripts," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1763-1779, December.
    3. Paul Ginsparg, 2014. "ArXiv screens spot fake papers," Nature, Nature, vol. 508(7494), pages 44-44, April.
    4. Priyanka Pulla, 2019. "The plan to mine the world’s research papers," Nature, Nature, vol. 571(7765), pages 316-318, July.
    5. Anne-Wil Harzing, 2019. "Two new kids on the block: How do Crossref and Dimensions compare with Google Scholar, Microsoft Academic, Scopus and the Web of Science?," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(1), pages 341-349, July.

    Citations

    Citations are extracted by the CitEc Project; you can subscribe to its RSS feed for this item.


    Cited by:

    1. repec:hal:journl:hal-04794323 is not listed on IDEAS
    2. Howell, Bronwyn E. & Potgieter, Petrus H., 2023. "AI-generated lemons: a sour outlook for content producers?," 32nd European Regional ITS Conference, Madrid 2023: Realising the digital decade in the European Union – Easier said than done? 277971, International Telecommunications Society (ITS).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jennifer A. Byrne & Cyril Labbé, 2017. "Striking similarities between publications from China describing single gene knockdown experiments in human cancer cell lines," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(3), pages 1471-1493, March.
    2. Nguyen Minh Tien & Cyril Labbé, 2018. "Detecting automatically generated sentences with grammatical structure similarity," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1247-1271, August.
    3. Kyle J. Burghardt & Bradley H. Howlett & Audrey S. Khoury & Stephanie M. Fern & Paul R. Burghardt, 2020. "Three Commonly Utilized Scholarly Databases and a Social Network Site Provide Different, But Related, Metrics of Pharmacy Faculty Publication," Publications, MDPI, vol. 8(2), pages 1-10, April.
    4. Marek Kwiek & Wojciech Roszka, 2022. "Academic vs. biological age in research on academic careers: a large-scale study with implications for scientifically developing systems," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(6), pages 3543-3575, June.
    5. Vivek Kumar Singh & Prashasti Singh & Mousumi Karmakar & Jacqueline Leta & Philipp Mayr, 2021. "The journal coverage of Web of Science, Scopus and Dimensions: A comparative analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(6), pages 5113-5142, June.
    6. Steve J. Bickley & Ho Fai Chan & Benno Torgler, 2022. "Artificial intelligence in the field of economics," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 2055-2084, April.
    7. Raminta Pranckutė, 2021. "Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World," Publications, MDPI, vol. 9(1), pages 1-59, March.
    8. Corrêa, Edilson A. & Marinho, Vanessa Q. & Amancio, Diego R., 2020. "Semantic flow in language networks discriminates texts by genre and publication date," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 557(C).
    9. Mehdi Toloo & Rouhollah Khodabandelou & Amar Oukil, 2022. "A Comprehensive Bibliometric Analysis of Fractional Programming (1965–2020)," Mathematics, MDPI, vol. 10(11), pages 1-21, May.
    10. Zhentao Liang & Jin Mao & Kun Lu & Gang Li, 2021. "Finding citations for PubMed: a large-scale comparison between five freely available bibliographic data sources," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(12), pages 9519-9542, December.
    11. Olena Horobets, 2021. "Research Data as a Result of Research Activities: the Role and Significance for the Official Statistics," Journal of the Knowledge Economy, Springer;Portland International Center for Management of Engineering and Technology (PICMET), vol. 12(3), pages 1424-1436, September.
    12. Toluwase Victor Asubiaro & Sodiq Onaolapo, 2023. "A comparative study of the coverage of African journals in Web of Science, Scopus, and CrossRef," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(7), pages 745-758, July.
    13. Hunter Bennett & Flynn Slattery, 2023. "Graphical abstracts are associated with greater Altmetric attention scores, but not citations, in sport science," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(6), pages 3793-3804, June.
    14. Dejian Yu & Wanru Wang & Shuai Zhang & Wenyu Zhang & Rongyu Liu, 2017. "Hybrid self-optimized clustering model based on citation links and textual features to detect research topics," PLOS ONE, Public Library of Science, vol. 12(10), pages 1-21, October.
    15. Xu, Fang & Ou, Guiyan & Ma, Tingcan & Wang, Xianwen, 2021. "The consistency of impact of preprints and their journal publications," Journal of Informetrics, Elsevier, vol. 15(2).
    16. Tohalino, Jorge V. & Amancio, Diego R., 2018. "Extractive multi-document summarization using multilayer networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 503(C), pages 526-539.
    17. William E Savage & Anthony J Olejniczak, 2022. "More journal articles and fewer books: Publication practices in the social sciences in the 2010’s," PLOS ONE, Public Library of Science, vol. 17(2), pages 1-16, February.
    18. Stephan Stahlschmidt & Dimity Stephen, 2022. "From indexation policies through citation networks to normalized citation impacts: Web of Science, Scopus, and Dimensions as varying resonance chambers," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2413-2431, May.
    19. Cyril Labbé & Dominique Labbé, 2013. "Duplicate and fake publications in the scientific literature: how many SCIgen papers in computer science?," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(1), pages 379-396, January.
    20. Juan Andrés Cabral & Florencia Iara Pucci, 2020. "¿Cuál es el alcance de la revolución de la credibilidad?" [What is the scope of the credibility revolution?], Asociación Argentina de Economía Política: Working Papers 4318, Asociación Argentina de Economía Política.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bla:jinfst:v:72:y:2021:i:12:p:1461-1476. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. Registering allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Wiley Content Delivery (email available below). General contact details of provider: http://www.asis.org .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.