Printed from https://ideas.repec.org/a/bla/jinfst/v72y2021i12p1461-1476.html

Prevalence of nonsensical algorithmically generated papers in the scientific literature

Authors

  • Guillaume Cabanac
  • Cyril Labbé

Abstract

In 2014, leading publishers withdrew more than 120 nonsensical publications automatically generated with the SCIgen program. Casual observations suggested that similar problematic papers are still published and sold, without follow‐up retractions. No systematic screening has been performed, and the prevalence of such nonsensical publications in the scientific literature is unknown. Our contribution is twofold. First, we designed a detector that combs the scientific literature for grammar‐based computer‐generated papers. Applied to SCIgen, it has a precision of 83.6%. Second, we performed a scientometric study of the 243 detected SCIgen‐papers from 19 publishers. We estimate the prevalence of SCIgen‐papers to be 75 per million papers in Information and Computing Sciences. Only 19% of the 243 problematic papers were dealt with, through formal retraction (12) or silent removal (34). Publishers still serve and sometimes sell the remaining 197 papers without any caveat. We found evidence of citation manipulation via edited SCIgen bibliographies. This work reveals metric gaming up to the point of absurdity: fraudsters publish nonsensical algorithmically generated papers featuring genuine references. It stresses the need to screen papers for nonsense before peer review and to chase citation manipulation in published papers. Overall, this is yet another illustration of the harmful effects of the pressure to publish or perish.
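
The counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the figures stated above (243 detected papers, 12 retractions, 34 silent removals), the variable names are assumptions introduced here, and it is not the authors' code.

```python
# Figures quoted in the abstract; variable names are illustrative.
detected = 243          # SCIgen papers found by the detector
retracted = 12          # formally retracted
silently_removed = 34   # removed without a retraction notice

dealt_with = retracted + silently_removed   # papers publishers acted on
remaining = detected - dealt_with           # papers still served or sold
share_dealt_with = dealt_with / detected    # fraction acted on

print(f"dealt with: {dealt_with} ({share_dealt_with:.1%})")
print(f"still served: {remaining}")
```

The share works out to about 18.9%, which the abstract rounds to 19%, and the remainder matches the 197 papers reported as still available.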

Suggested Citation

  • Guillaume Cabanac & Cyril Labbé, 2021. "Prevalence of nonsensical algorithmically generated papers in the scientific literature," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(12), pages 1461-1476, December.
  • Handle: RePEc:bla:jinfst:v:72:y:2021:i:12:p:1461-1476
    DOI: 10.1002/asi.24495

    References listed on IDEAS

    1. Priyanka Pulla, 2019. "The plan to mine the world’s research papers," Nature, Nature, vol. 571(7765), pages 316-318, July.
    2. Anne-Wil Harzing, 2019. "Two new kids on the block: How do Crossref and Dimensions compare with Google Scholar, Microsoft Academic, Scopus and the Web of Science?," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(1), pages 341-349, July.
    3. Diego Raphael Amancio, 2015. "Comparing the topological properties of real and artificially generated scientific manuscripts," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1763-1779, December.
    4. Philip Ball, 2005. "Computer conference welcomes gobbledegook paper," Nature, Nature, vol. 434(7036), pages 946-946, April.
    5. Paul Ginsparg, 2014. "ArXiv screens spot fake papers," Nature, Nature, vol. 508(7494), pages 44-44, April.

    Citations



    Cited by:

    1. Howell, Bronwyn E. & Potgieter, Petrus H., 2023. "AI-generated lemons: a sour outlook for content producers?," 32nd European Regional ITS Conference, Madrid 2023: Realising the digital decade in the European Union – Easier said than done? 277971, International Telecommunications Society (ITS).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jennifer A. Byrne & Cyril Labbé, 2017. "Striking similarities between publications from China describing single gene knockdown experiments in human cancer cell lines," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(3), pages 1471-1493, March.
    2. Nguyen Minh Tien & Cyril Labbé, 2018. "Detecting automatically generated sentences with grammatical structure similarity," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1247-1271, August.
    3. Marek Kwiek & Wojciech Roszka, 2022. "Academic vs. biological age in research on academic careers: a large-scale study with implications for scientifically developing systems," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(6), pages 3543-3575, June.
    4. Mehdi Toloo & Rouhollah Khodabandelou & Amar Oukil, 2022. "A Comprehensive Bibliometric Analysis of Fractional Programming (1965–2020)," Mathematics, MDPI, vol. 10(11), pages 1-21, May.
    5. Zhentao Liang & Jin Mao & Kun Lu & Gang Li, 2021. "Finding citations for PubMed: a large-scale comparison between five freely available bibliographic data sources," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(12), pages 9519-9542, December.
    6. Olena Horobets, 2021. "Research Data as a Result of Research Activities: the Role and Significance for the Official Statistics," Journal of the Knowledge Economy, Springer;Portland International Center for Management of Engineering and Technology (PICMET), vol. 12(3), pages 1424-1436, September.
    7. Hunter Bennett & Flynn Slattery, 2023. "Graphical abstracts are associated with greater Altmetric attention scores, but not citations, in sport science," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(6), pages 3793-3804, June.
    8. Xu, Fang & Ou, Guiyan & Ma, Tingcan & Wang, Xianwen, 2021. "The consistency of impact of preprints and their journal publications," Journal of Informetrics, Elsevier, vol. 15(2).
    9. Antonio Miceli & Birgit Hagen & Maria Pia Riccardi & Francesco Sotti & Davide Settembre-Blundo, 2021. "Thriving, Not Just Surviving in Changing Times: How Sustainability, Agility and Digitalization Intertwine with Organizational Resilience," Sustainability, MDPI, vol. 13(4), pages 1-17, February.
    10. Jorge A. V. Tohalino & Laura V. C. Quispe & Diego R. Amancio, 2021. "Analyzing the relationship between text features and grants productivity," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4255-4275, May.
    11. Adélie Ranville & Marcos Barros, 2022. "Towards Normative Theories of Social Entrepreneurship. A Review of the Top Publications of the Field," Journal of Business Ethics, Springer, vol. 180(2), pages 407-438, October.
    12. Kyle J. Burghardt & Bradley H. Howlett & Audrey S. Khoury & Stephanie M. Fern & Paul R. Burghardt, 2020. "Three Commonly Utilized Scholarly Databases and a Social Network Site Provide Different, But Related, Metrics of Pharmacy Faculty Publication," Publications, MDPI, vol. 8(2), pages 1-10, April.
    13. Steve J. Bickley & Ho Fai Chan & Benno Torgler, 2022. "Artificial intelligence in the field of economics," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 2055-2084, April.
    14. Tohalino, Jorge V. & Amancio, Diego R., 2018. "Extractive multi-document summarization using multilayer networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 503(C), pages 526-539.
    15. Cyril Labbé & Dominique Labbé, 2013. "Duplicate and fake publications in the scientific literature: how many SCIgen papers in computer science?," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(1), pages 379-396, January.
    16. Diego Raphael Amancio, 2015. "Comparing the topological properties of real and artificially generated scientific manuscripts," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1763-1779, December.
    17. Ahmed Idi Kato, 2023. "Unlocking the Potential of Microfinance Solutions on Urban Woman Entrepreneurship Development in East Africa: A Bibliometric Analysis Perspective," Sustainability, MDPI, vol. 15(20), pages 1-22, October.
    18. Silva, Filipi N. & Amancio, Diego R. & Bardosova, Maria & Costa, Luciano da F. & Oliveira, Osvaldo N., 2016. "Using network science and text analytics to produce surveys in a scientific topic," Journal of Informetrics, Elsevier, vol. 10(2), pages 487-502.
    19. Adam Day, 2022. "Exploratory analysis of text duplication in peer-review reveals peer-review fraud and paper mills," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(10), pages 5965-5987, October.
    20. Vivek Kumar Singh & Prashasti Singh & Mousumi Karmakar & Jacqueline Leta & Philipp Mayr, 2021. "The journal coverage of Web of Science, Scopus and Dimensions: A comparative analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(6), pages 5113-5142, June.
