
Null hypothesis significance tests. A mix-up of two different theories: the basis for widespread confusion and numerous misinterpretations

Author

Listed:
  • Jesper W. Schneider

    (Aarhus University)

Abstract

Null hypothesis statistical significance tests (NHST) are widely used in quantitative research in the empirical sciences, including scientometrics. Nevertheless, since their introduction nearly a century ago, significance tests have been controversial. Many researchers are not aware of the numerous criticisms raised against NHST. As practiced, NHST has been characterized as a ‘null ritual’ that is overused and too often misapplied and misinterpreted. NHST is in fact a patchwork of two fundamentally different classical statistical testing models, often blended with some wishful quasi-Bayesian interpretations. This is undoubtedly a major reason why NHST is very often misunderstood. But NHST also has intrinsic logical problems, and the epistemic range of the information provided by such tests is much more limited than most researchers recognize. In this article we introduce to the scientometric community the theoretical origins of NHST, which are mostly absent from standard statistical textbooks; we then discuss some of the most prevalent problems relating to the practice of NHST and trace these problems back to the mix-up of the two different theoretical origins. Finally, we illustrate some of the misunderstandings with examples from the scientometric literature and bring forward some modest recommendations for a sounder practice in quantitative data analysis.

Suggested Citation

  • Jesper W. Schneider, 2015. "Null hypothesis significance tests. A mix-up of two different theories: the basis for widespread confusion and numerous misinterpretations," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(1), pages 411-432, January.
  • Handle: RePEc:spr:scient:v:102:y:2015:i:1:d:10.1007_s11192-014-1251-5
    DOI: 10.1007/s11192-014-1251-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-014-1251-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Armstrong, J. Scott, 2007. "Significance tests harm progress in forecasting," International Journal of Forecasting, Elsevier, vol. 23(2), pages 321-327.
    2. Schrodt, Philip A., 2006. "Beyond the Linear Frequentist Orthodoxy," Political Analysis, Cambridge University Press, vol. 14(3), pages 335-339, July.
    3. Schneider, Jesper W., 2013. "Caveats for using statistical significance tests in research assessments," Journal of Informetrics, Elsevier, vol. 7(1), pages 50-62.
    4. John P A Ioannidis, 2005. "Why Most Published Research Findings Are False," PLOS Medicine, Public Library of Science, vol. 2(8), pages 1-1, August.
    5. Gelman, Andrew & Stern, Hal, 2006. "The Difference Between," The American Statistician, American Statistical Association, vol. 60, pages 328-331, November.
    6. Hubbard R. & Bayarri M.J., 2003. "Confusion Over Measures of Evidence (ps) Versus Errors (alphas) in Classical Statistical Testing," The American Statistician, American Statistical Association, vol. 57, pages 171-178, August.
    7. Sellke T. & Bayarri M. J. & Berger J. O., 2001. "Calibration of p Values for Testing Precise Null Hypotheses," The American Statistician, American Statistical Association, vol. 55, pages 62-71, February.
    8. Andreas Schwab & Eric Abrahamson & William H. Starbuck & Fiona Fidler, 2011. "PERSPECTIVE---Researchers Should Make Thoughtful Assessments Instead of Null-Hypothesis Significance Tests," Organization Science, INFORMS, vol. 22(4), pages 1105-1120, August.
    9. Lutz Bornmann & Loet Leydesdorff, 2013. "Statistical tests and research assessments: A comment on Schneider (2012)," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 64(6), pages 1306-1308, June.
    10. Anne L. Schneider & Robert E. Darcy, 1984. "Policy Implications of Using Significance Tests in Evaluation Research," Evaluation Review, vol. 8(4), pages 573-582, August.
    11. Steven Goodman & Sander Greenland, 2007. "Why Most Published Research Findings Are False: Problems in the Analysis," PLOS Medicine, Public Library of Science, vol. 4(4), pages 1-1, April.

    Citations

    Cited by:

    1. Jesper W. Schneider, 2018. "Response to commentary on “Is NHST logically flawed”," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 2193-2194, September.
    2. Alexandre Galvão Patriota, 2018. "Is NHST logically flawed? Commentary on: “NHST is still logically flawed”," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 2189-2191, September.
    3. Bonaccorsi, Andrea & Cicero, Tindaro, 2016. "Nondeterministic ranking of university departments," Journal of Informetrics, Elsevier, vol. 10(1), pages 224-237.
    4. Jinshan Wu, 2018. "Is there an intrinsic logical error in null hypothesis significance tests? Commentary on: “Null hypothesis significance tests. A mix-up of two different theories: the basis for widespread confusion an," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(1), pages 621-625, April.
    5. Zhiqi Wang & Ronald Rousseau, 2021. "COVID-19, the Yule-Simpson paradox and research evaluation," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(4), pages 3501-3511, April.
    6. Williams, Richard & Bornmann, Lutz, 2016. "Sampling issues in bibliometric analysis," Journal of Informetrics, Elsevier, vol. 10(4), pages 1225-1232.
    7. Marko Hofmann & Silja Meyer-Nieberg, 2018. "Time to dispense with the p-value in OR?," Central European Journal of Operations Research, Springer;Slovak Society for Operations Research;Hungarian Operational Research Society;Czech Society for Operations Research;Österr. Gesellschaft für Operations Research (ÖGOR);Slovenian Society Informatika - Section for Operational Research;Croatian Operational Research Society, vol. 26(1), pages 193-214, March.
    8. Lorna Wildgaard, 2015. "A comparison of 17 author-level bibliometric indicators for researchers in Astronomy, Environmental Science, Philosophy and Public Health in Web of Science and Google Scholar," Scientometrics, Springer;Akadémiai Kiadó, vol. 104(3), pages 873-906, September.
    9. Jesper W. Schneider, 2018. "NHST is still logically flawed," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(1), pages 627-635, April.
    10. Tom Engsted, 2024. "What Is the False Discovery Rate in Empirical Research?," Econ Journal Watch, Econ Journal Watch, vol. 21(1), pages 92-112, March.
    11. Peter Ingwersen & Soeren Holm & Birger Larsen & Thomas Ploug, 2021. "Do journals and corporate sponsors back certain views in topics where disagreement prevails?," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(1), pages 389-415, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Blakeley B. McShane & David Gal, 2016. "Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence," Management Science, INFORMS, vol. 62(6), pages 1707-1718, June.
    2. Hirschauer Norbert & Mußhoff Oliver & Grüner Sven & Frey Ulrich & Theesfeld Insa & Wagner Peter, 2016. "Die Interpretation des p-Wertes – Grundsätzliche Missverständnisse," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 236(5), pages 557-575, October.
    3. Hirschauer Norbert & Grüner Sven & Mußhoff Oliver & Becker Claudia, 2019. "Twenty Steps Towards an Adequate Inferential Interpretation of p-Values in Econometrics," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 239(4), pages 703-721, August.
    4. Mayo, Deborah & Morey, Richard Donald, 2017. "A Poor Prognosis for the Diagnostic Screening Critique of Statistical Tests," OSF Preprints ps38b, Center for Open Science.
    5. Jyotirmoy Sarkar, 2018. "Will P-Value Triumph over Abuses and Attacks?," Biostatistics and Biometrics Open Access Journal, Juniper Publishers Inc., vol. 7(4), pages 66-71, July.
    6. Colin F. Camerer & Anna Dreber & Felix Holzmeister & Teck-Hua Ho & Jürgen Huber & Magnus Johannesson & Michael Kirchler & Gideon Nave & Brian A. Nosek & Thomas Pfeiffer & Adam Altmejd & Nick Buttrick, 2018. "Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015," Nature Human Behaviour, Nature, vol. 2(9), pages 637-644, September.
    7. David Spiegelhalter, 2017. "Trust in numbers," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 180(4), pages 948-965, October.
    8. Kim, Jae H. & Ji, Philip Inyeob, 2015. "Significance testing in empirical finance: A critical review and assessment," Journal of Empirical Finance, Elsevier, vol. 34(C), pages 1-14.
    9. Camerer, Colin & Dreber, Anna & Forsell, Eskil & Ho, Teck-Hua & Huber, Jurgen & Johannesson, Magnus & Kirchler, Michael & Almenberg, Johan & Altmejd, Adam & Chan, Taizan & Heikensten, Emma & Holzmeist, 2016. "Evaluating replicability of laboratory experiments in Economics," MPRA Paper 75461, University Library of Munich, Germany.
    10. Lars Ole Schwen & Sabrina Rueschenbaum, 2018. "Ten quick tips for getting the most scientific value out of numerical data," PLOS Computational Biology, Public Library of Science, vol. 14(10), pages 1-21, October.
    11. Arjen Witteloostuijn, 2020. "New-day statistical thinking: A bold proposal for a radical change in practices," Journal of International Business Studies, Palgrave Macmillan;Academy of International Business, vol. 51(2), pages 274-278, March.
    12. Pérez, María-Eglée & Pericchi, Luis Raúl, 2014. "Changing statistical significance with the amount of information: The adaptive α significance level," Statistics & Probability Letters, Elsevier, vol. 85(C), pages 20-24.
    13. Hannah Fraser & Tim Parker & Shinichi Nakagawa & Ashley Barnett & Fiona Fidler, 2018. "Questionable research practices in ecology and evolution," PLOS ONE, Public Library of Science, vol. 13(7), pages 1-16, July.
    14. Andreas Schwab, 2018. "Investigating and Communicating the Uncertainty of Effects: The Power of Graphs," Entrepreneurship Theory and Practice, vol. 42(6), pages 823-834, November.
    15. Jesper W. Schneider, 2018. "NHST is still logically flawed," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(1), pages 627-635, April.
    16. Nosek, Brian A. & Ebersole, Charles R. & DeHaven, Alexander Carl & Mellor, David Thomas, 2018. "The Preregistration Revolution," OSF Preprints 2dxu5, Center for Open Science.
    17. Denes Szucs & John P A Ioannidis, 2017. "Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature," PLOS Biology, Public Library of Science, vol. 15(3), pages 1-18, March.
    18. Bornmann, Lutz, 2013. "The problem of citation impact assessments for recent publication years in institutional evaluations," Journal of Informetrics, Elsevier, vol. 7(3), pages 722-729.
    19. Andreas Schwab & Eric Abrahamson & William H. Starbuck & Fiona Fidler, 2011. "PERSPECTIVE---Researchers Should Make Thoughtful Assessments Instead of Null-Hypothesis Significance Tests," Organization Science, INFORMS, vol. 22(4), pages 1105-1120, August.
    20. Verleysen, Frederik T. & Engels, Tim C.E., 2014. "Barycenter representation of book publishing internationalization in the Social Sciences and Humanities," Journal of Informetrics, Elsevier, vol. 8(1), pages 234-240.

    More about this item

    Keywords

    Null hypothesis significance test; Fisher’s significance test; Neyman–Pearson’s hypothesis test; Statistical inference; Scientometrics;

    JEL classification:

    • C12 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Hypothesis Testing: General
