IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1009279.html

Eliminating accidental deviations to minimize generalization error and maximize replicability: Applications in connectomics and genomics

Author

Listed:
  • Eric W Bridgeford
  • Shangsi Wang
  • Zeyi Wang
  • Ting Xu
  • Cameron Craddock
  • Jayanta Dey
  • Gregory Kiar
  • William Gray-Roncal
  • Carlo Colantuoni
  • Christopher Douville
  • Stephanie Noble
  • Carey E Priebe
  • Brian Caffo
  • Michael Milham
  • Xi-Nian Zuo
  • Consortium for Reliability and Reproducibility
  • Joshua T Vogelstein

Abstract

Replicability, the ability to replicate scientific findings, is a prerequisite for scientific discovery and clinical utility. Troublingly, we are in the midst of a replicability crisis. A key to replicability is that multiple measurements of the same item (e.g., experimental sample or clinical participant) under fixed experimental constraints are relatively similar to one another. Thus, statistics that quantify the relative contributions of accidental deviations—such as measurement error—as compared to systematic deviations—such as individual differences—are critical. We demonstrate that existing replicability statistics, such as the intra-class correlation coefficient and fingerprinting, fail to adequately differentiate between accidental and systematic deviations in very simple settings. We therefore propose a novel statistic, discriminability, which quantifies the degree to which an individual’s samples are relatively similar to one another, without restricting the data to be univariate, Gaussian, or even Euclidean. Using this statistic, we introduce the possibility of optimizing experimental design via increasing discriminability and prove that optimizing discriminability improves performance bounds in subsequent inference tasks. In extensive simulated and real datasets (focusing on brain imaging and demonstrating on genomics), only optimizing data discriminability improves performance on all subsequent inference tasks for each dataset. We therefore suggest that designing experiments and analyses to optimize discriminability may be a crucial step in solving the replicability crisis, and more generally, mitigating accidental measurement error.

Author summary

In recent decades, the size and complexity of data have grown exponentially. Unfortunately, the increased scale of modern datasets brings many new challenges. At present, we are in the midst of a replicability crisis, in which scientific discoveries fail to replicate to new datasets. Difficulties in the measurement procedure and in measurement-processing pipelines, coupled with the influx of complex high-resolution measurements, are, we believe, at the core of the replicability crisis. If the measurements themselves are not replicable, what hope can we have of using them for replicable scientific findings? We introduce the “discriminability” statistic, which quantifies how discriminable measurements are from one another, without limitations on the structure of the underlying measurements. We prove that discriminable strategies tend to be those that provide better accuracy on downstream scientific questions. We demonstrate the utility of discriminability over competing approaches on two disparate datasets, one from neuroimaging and one from genomics. Together, we believe these results suggest the value of designing experimental protocols and analysis procedures that optimize discriminability.
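The summary above describes discriminability as the degree to which repeated measurements of the same individual are closer to one another than to measurements of other individuals. As a rough illustration only, the following minimal Python sketch estimates such a statistic as the fraction of (same-subject, different-subject) distance comparisons in which the same-subject pair is closer; the function name, the choice of Euclidean distance, and the tie handling are assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

def discriminability(X, subject_ids):
    """Estimate a discriminability-style statistic: the fraction of
    comparisons in which two measurements of the same subject are
    closer to each other than to a measurement of another subject."""
    X = np.asarray(X, dtype=float)
    ids = np.asarray(subject_ids)
    # Euclidean distance matrix between all measurements.
    diffs = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diffs ** 2).sum(axis=-1))
    n = len(ids)
    hits = total = 0
    for i in range(n):
        for j in range(n):
            if i == j or ids[i] != ids[j]:
                continue  # j must be a repeat measurement of subject i
            for k in range(n):
                if ids[k] == ids[i]:
                    continue  # k must belong to a different subject
                total += 1
                if D[i, j] < D[i, k]:  # ties count as misses here
                    hits += 1
    return hits / total

# Two subjects, two well-separated repeat measurements each:
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(discriminability(X, [0, 0, 1, 1]))  # → 1.0
```

A value near 1 indicates that within-subject variation (accidental deviation) is small relative to between-subject variation (systematic deviation), which is the property the paper argues experimental pipelines should be designed to maximize.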

Suggested Citation

  • Eric W Bridgeford & Shangsi Wang & Zeyi Wang & Ting Xu & Cameron Craddock & Jayanta Dey & Gregory Kiar & William Gray-Roncal & Carlo Colantuoni & Christopher Douville & Stephanie Noble & Carey E Priebe & Brian Caffo & Michael Milham & Xi-Nian Zuo & Consortium for Reliability and Reproducibility & Joshua T Vogelstein, 2021. "Eliminating accidental deviations to minimize generalization error and maximize replicability: Applications in connectomics and genomics," PLOS Computational Biology, Public Library of Science, vol. 17(9), pages 1-20, September.
  • Handle: RePEc:plo:pcbi00:1009279
    DOI: 10.1371/journal.pcbi.1009279

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009279
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1009279&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1009279?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. John P A Ioannidis, 2005. "Why Most Published Research Findings Are False," PLOS Medicine, Public Library of Science, vol. 2(8), pages 1-1, August.
    2. Vogelstein, Joshua T., 2020. "P-Values in a Post-Truth World," OSF Preprints yw6sr, Center for Open Science.
    3. Zeileis, Achim, 2006. "Object-oriented Computation of Sandwich Estimators," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 16(i09).
    4. Nathan W Churchill & Robyn Spring & Babak Afshin-Pour & Fan Dong & Stephen C Strother, 2015. "An Automated, Adaptive Framework for Optimizing Preprocessing Pipelines in Task-Based Functional MRI," PLOS ONE, Public Library of Science, vol. 10(7), pages 1-25, July.
    5. Xi-Nian Zuo & Ting Xu & Michael Peter Milham, 2019. "Harnessing reliability for neuroscience research," Nature Human Behaviour, Nature, vol. 3(8), pages 768-771, August.
    6. Ronald D. Fricker & Katherine Burke & Xiaoyan Han & William H. Woodall, 2019. "Assessing the Statistical Analyses Used in Basic and Applied Social Psychology After Their p-Value Ban," The American Statistician, Taylor & Francis Journals, vol. 73(S1), pages 374-384, March.
    7. Jeffrey T. Leek & Roger D. Peng, 2015. "Statistics: P values are just the tip of the iceberg," Nature, Nature, vol. 520(7549), pages 612-612, April.
    8. Berna Devezer & Luis G Nardin & Bert Baumgaertner & Erkan Ozge Buzbas, 2019. "Scientific discovery in a model-centric framework: Reproducibility, innovation, and epistemic diversity," PLOS ONE, Public Library of Science, vol. 14(5), pages 1-23, May.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Gregory Kiar & Jeanette A. Mumford & Ting Xu & Joshua T. Vogelstein & Tristan Glatard & Michael P. Milham, 2024. "Why experimental variation in neuroimaging should be embraced," Nature Communications, Nature, vol. 15(1), pages 1-9, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Keith R Lohse & Kristin L Sainani & J Andrew Taylor & Michael L Butson & Emma J Knight & Andrew J Vickers, 2020. "Systematic review of the use of “magnitude-based inference” in sports science and medicine," PLOS ONE, Public Library of Science, vol. 15(6), pages 1-22, June.
    2. Uwe Hassler & Marc‐Oliver Pohle, 2022. "Unlucky Number 13? Manipulating Evidence Subject to Snooping," International Statistical Review, International Statistical Institute, vol. 90(2), pages 397-410, August.
    3. Jyotirmoy Sarkar, 2018. "Will P-Value Triumph over Abuses and Attacks?," Biostatistics and Biometrics Open Access Journal, Juniper Publishers Inc., vol. 7(4), pages 66-71, July.
    4. Timo Dimitriadis & Xiaochun Liu & Julie Schnaitmann, 2023. "Encompassing Tests for Value at Risk and Expected Shortfall Multistep Forecasts Based on Inference on the Boundary," Journal of Financial Econometrics, Oxford University Press, vol. 21(2), pages 412-444.
    5. Jiang, Xianfeng & Packer, Frank, 2019. "Credit ratings of Chinese firms by domestic and global agencies: Assessing the determinants and impact," Journal of Banking & Finance, Elsevier, vol. 105(C), pages 178-193.
    6. Kevin J. Boyle & Mark Morrison & Darla Hatton MacDonald & Roderick Duncan & John Rose, 2016. "Investigating Internet and Mail Implementation of Stated-Preference Surveys While Controlling for Differences in Sample Frames," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 64(3), pages 401-419, July.
    7. Jelte M Wicherts & Marjan Bakker & Dylan Molenaar, 2011. "Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results," PLOS ONE, Public Library of Science, vol. 6(11), pages 1-7, November.
    8. Ball, Laurence & Carvalho, Carlos & Evans, Christopher & Antonio Ricci, Luca, 2024. "Weighted Median Inflation Around the World: A Measure of Core Inflation," Journal of International Money and Finance, Elsevier, vol. 142(C).
    9. Frederique Bordignon, 2020. "Self-correction of science: a comparative study of negative citations and post-publication peer review," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 1225-1239, August.
    10. Omar Al-Ubaydli & John A. List, 2015. "Do Natural Field Experiments Afford Researchers More or Less Control than Laboratory Experiments? A Simple Model," NBER Working Papers 20877, National Bureau of Economic Research, Inc.
    11. Guan, Sihai & Wan, Dongyu & Yang, Yanmiao & Biswal, Bharat, 2022. "Sources of multifractality of the brain rs-fMRI signal," Chaos, Solitons & Fractals, Elsevier, vol. 160(C).
    12. Aurelie Seguin & Wolfgang Forstmeier, 2012. "No Band Color Effects on Male Courtship Rate or Body Mass in the Zebra Finch: Four Experiments and a Meta-Analysis," PLOS ONE, Public Library of Science, vol. 7(6), pages 1-11, June.
    13. Sviták, Jan & Tichem, Jan & Haasbeek, Stefan, 2021. "Price effects of search advertising restrictions," International Journal of Industrial Organization, Elsevier, vol. 77(C).
    14. Dragana Radicic & Geoffrey Pugh & Hugo Hollanders & René Wintjes & Jon Fairburn, 2016. "The impact of innovation support programs on small and medium enterprises innovation in traditional manufacturing industries: An evaluation for seven European Union regions," Environment and Planning C, , vol. 34(8), pages 1425-1452, December.
    15. Colin F. Camerer & Anna Dreber & Felix Holzmeister & Teck-Hua Ho & Jürgen Huber & Magnus Johannesson & Michael Kirchler & Gideon Nave & Brian A. Nosek & Thomas Pfeiffer & Adam Altmejd & Nick Buttrick , 2018. "Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015," Nature Human Behaviour, Nature, vol. 2(9), pages 637-644, September.
    16. Li, Lunzheng & Maniadis, Zacharias & Sedikides, Constantine, 2021. "Anchoring in Economics: A Meta-Analysis of Studies on Willingness-To-Pay and Willingness-To-Accept," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 90(C).
    17. Christopher F. Parmeter, 2018. "Estimation of the two-tiered stochastic frontier model with the scaling property," Journal of Productivity Analysis, Springer, vol. 49(1), pages 37-47, February.
    18. Hasler Mario, 2013. "Multiple Contrasts for Repeated Measures," The International Journal of Biostatistics, De Gruyter, vol. 9(1), pages 49-61, July.
    19. Diekmann Andreas, 2011. "Are Most Published Research Findings False?," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 231(5-6), pages 628-635, October.
    20. Daniele Fanelli, 2012. "Negative results are disappearing from most disciplines and countries," Scientometrics, Springer;Akadémiai Kiadó, vol. 90(3), pages 891-904, March.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1009279. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol (email available below). General contact details of provider: https://journals.plos.org/ploscompbiol/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.