
Detecting fabrication in large-scale molecular omics data

Author

Listed:
  • Michael S Bradshaw
  • Samuel H Payne

Abstract

Fraud is a pervasive problem and can occur as fabrication, falsification, plagiarism, or theft. The scientific community is not exempt from this universal problem, and several studies have recently been caught manipulating or fabricating data. Current measures to prevent and deter scientific misconduct come in the form of the peer-review process and on-site clinical trial auditors. As recent advances in high-throughput omics technologies have moved biology into the realm of big data, fraud detection methods must be updated to handle sophisticated computational fraud. In the financial sector, machine learning and digit frequencies are successfully used to detect fraud. Drawing from these sources, we develop methods of fabrication detection in biomedical research and show that machine learning can be used to detect fraud in large-scale omics experiments. Using gene copy-number data as input, machine learning models correctly predicted fraud with 58–100% accuracy. With digit frequencies as input features, the models detected fraud with 82–100% accuracy. All of the data and analysis scripts used in this project are available at https://github.com/MSBradshaw/FakeData.
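
The abstract mentions digit frequencies as input features but does not say how they are computed. The following is a minimal sketch of one possible feature, the relative frequency of each leading significant digit (1 to 9) in a sample's values. The function name `leading_digit_frequencies` and the choice of the leading digit are assumptions made for illustration; the authors' actual feature definition is in their repository and may differ.

```python
import math
from collections import Counter


def leading_digit_frequencies(values):
    """Return relative frequencies of leading significant digits 1-9.

    Hypothetical helper for illustration only; not the authors' code.
    Zeros and non-finite values carry no leading digit and are skipped.
    """
    counts = Counter()
    for v in values:
        if v == 0 or not math.isfinite(v):
            continue
        # Scientific notation places the first significant digit first,
        # e.g. 0.0034 -> '3.4000000000000000e-03'.
        digit = int(f"{abs(v):.16e}"[0])
        counts[digit] += 1
    total = sum(counts.values())
    # One feature per digit 1..9; all zeros if nothing was countable.
    return [counts[d] / total if total else 0.0 for d in range(1, 10)]


if __name__ == "__main__":
    sample = [1.5, 0.012, 150.0, 2.3, 0.0, -0.9]
    print(leading_digit_frequencies(sample))
```

A vector like this, computed per sample, could then be passed to any standard classifier in place of the raw measurements. Whether that matches the models used in the paper is not stated in the abstract.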

Suggested Citation

  • Michael S Bradshaw & Samuel H Payne, 2021. "Detecting fabrication in large-scale molecular omics data," PLOS ONE, Public Library of Science, vol. 16(11), pages 1-15, November.
  • Handle: RePEc:plo:pone00:0260395
    DOI: 10.1371/journal.pone.0260395

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0260395
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0260395&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Matteo Di Scipio & Mohammad Khan & Shihong Mao & Michael Chong & Conor Judge & Nazia Pathan & Nicolas Perrot & Walter Nelson & Ricky Lali & Shuang Di & Robert Morton & Jeremy Petch & Guillaume Paré, 2023. "A versatile, fast and unbiased method for estimation of gene-by-environment interaction effects on biobank-scale datasets," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    2. Jacob Joseph & Chang Liu & Qin Hui & Krishna Aragam & Zeyuan Wang & Brian Charest & Jennifer E. Huffman & Jacob M. Keaton & Todd L. Edwards & Serkalem Demissie & Luc Djousse & Juan P. Casas & J. Micha, 2022. "Genetic architecture of heart failure with preserved versus reduced ejection fraction," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    3. Vincent Michaud & Eulalie Lasseaux & David J. Green & Dave T. Gerrard & Claudio Plaisant & Tomas Fitzgerald & Ewan Birney & Benoît Arveiler & Graeme C. Black & Panagiotis I. Sergouniotis, 2022. "The contribution of common regulatory and protein-coding TYR variants to the genetic architecture of albinism," Nature Communications, Nature, vol. 13(1), pages 1-8, December.
    4. Moustafa, Khaled, 2018. "Don't fall in common science pitfall!," FrenXiv ycjha, Center for Open Science.
    5. Natalie DeForest & Yuqi Wang & Zhiyi Zhu & Jacqueline S. Dron & Ryan Koesterer & Pradeep Natarajan & Jason Flannick & Tiffany Amariuta & Gina M. Peloso & Amit R. Majithia, 2024. "Genome-wide discovery and integrative genomic characterization of insulin resistance loci using serum triglycerides to HDL-cholesterol ratio as a proxy," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    6. Dick Schijven & Sourena Soheili-Nezhad & Simon E. Fisher & Clyde Francks, 2024. "Exome-wide analysis implicates rare protein-altering variants in human handedness," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    7. Lili Liu & Atlas Khan & Elena Sanchez-Rodriguez & Francesca Zanoni & Yifu Li & Nicholas Steers & Olivia Balderes & Junying Zhang & Priya Krithivasan & Robert A. LeDesma & Clara Fischman & Scott J. Heb, 2022. "Genetic regulation of serum IgA levels and susceptibility to common immune, infectious, kidney, and cardio-metabolic traits," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    8. Shahram Bahrami & Kaja Nordengen & Jaroslav Rokicki & Alexey A. Shadrin & Zillur Rahman & Olav B. Smeland & Piotr P. Jaholkowski & Nadine Parker & Pravesh Parekh & Kevin S. O’Connell & Torbjørn Elvsås, 2024. "The genetic landscape of basal ganglia and implications for common brain disorders," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    9. Sylvia Hartmann & Summaira Yasmeen & Benjamin M. Jacobs & Spiros Denaxas & Munir Pirmohamed & Eric R. Gamazon & Mark J. Caulfield & Harry Hemingway & Maik Pietzner & Claudia Langenberg, 2023. "ADRA2A and IRX1 are putative risk genes for Raynaud’s phenomenon," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    10. Love, Peter E.D. & Ika, Lavagnon A. & Ahiaga-Dagbui, Dominic D., 2019. "On de-bunking ‘fake news’ in a post truth era: Why does the Planning Fallacy explanation for cost overruns fall short?," Transportation Research Part A: Policy and Practice, Elsevier, vol. 126(C), pages 397-408.
    11. Jeremy Hall & Ben R. Martin, 2019. "Towards a Taxonomy of Academic Misconduct: The Case of Business School Research," SPRU Working Paper Series 2019-02, SPRU - Science Policy Research Unit, University of Sussex Business School.
    12. Kartal, Melis & Tremewan, James, 2018. "An offer you can refuse: The effect of transparency with endogenous conflict of interest," Journal of Public Economics, Elsevier, vol. 161(C), pages 44-55.
    13. Robert J Warren II & Joshua R King & Charlene Tarsa & Brian Haas & Jeremy Henderson, 2017. "A systematic review of context bias in invasion biology," PLOS ONE, Public Library of Science, vol. 12(8), pages 1-12, August.
    14. Mit Shah & Marco H. A. Inácio & Chang Lu & Pierre-Raphaël Schiratti & Sean L. Zheng & Adam Clement & Antonio Marvao & Wenjia Bai & Andrew P. King & James S. Ware & Martin R. Wilkins & Johanna Mielke &, 2023. "Environmental and genetic predictors of human cardiovascular ageing," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    15. Mathias Seviiri & Matthew H. Law & Jue-Sheng Ong & Puya Gharahkhani & Pierre Fontanillas & Catherine M. Olsen & David C. Whiteman & Stuart MacGregor, 2022. "A multi-phenotype analysis reveals 19 susceptibility loci for basal cell carcinoma and 15 for squamous cell carcinoma," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    16. Jasper Brinkerink, 2023. "When Shooting for the Stars Becomes Aiming for Asterisks: P-Hacking in Family Business Research," Entrepreneurship Theory and Practice, vol. 47(2), pages 304-343, March.
    17. Frederique Bordignon, 2020. "Self-correction of science: a comparative study of negative citations and post-publication peer review," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 1225-1239, August.
    18. Hensel, Przemysław G., 2019. "Supporting replication research in management journals: Qualitative analysis of editorials published between 1970 and 2015," European Management Journal, Elsevier, vol. 37(1), pages 45-57.
    19. Zhaotong Lin & Wei Pan, 2024. "A robust cis-Mendelian randomization method with application to drug target discovery," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    20. Zhening Liu & Hangkai Huang & Jiarong Xie & Yingying Xu & Chengfu Xu, 2024. "Circulating fatty acids and risk of hepatocellular carcinoma and chronic liver disease mortality in the UK Biobank," Nature Communications, Nature, vol. 15(1), pages 1-10, December.

