
A non-parametric significance test to compare corpora

Author

Listed:
  • Alexander Koplenig

Abstract

Classical null hypothesis significance tests are not appropriate in corpus linguistics, because the randomness assumption underlying these testing procedures is not fulfilled. Nevertheless, there are numerous scenarios where it would be beneficial to have some kind of test in order to judge the relevance of a result (e.g. a difference between two corpora) by answering the question whether the attribute of interest is pronounced enough to warrant the conclusion that it is substantial and not due to chance. In this paper, I outline such a test.

Suggested Citation

  • Alexander Koplenig, 2019. "A non-parametric significance test to compare corpora," PLOS ONE, Public Library of Science, vol. 14(9), pages 1-18, September.
  • Handle: RePEc:plo:pone00:0222703
    DOI: 10.1371/journal.pone.0222703

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0222703
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0222703&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jyotirmoy Sarkar, 2018. "Will P-Value Triumph over Abuses and Attacks?," Biostatistics and Biometrics Open Access Journal, Juniper Publishers Inc., vol. 7(4), pages 66-71, July.
    2. Gunnarsson, Björn Rafn & vanden Broucke, Seppe & Baesens, Bart & Óskarsdóttir, María & Lemahieu, Wilfried, 2021. "Deep learning for credit scoring: Do or don’t?," European Journal of Operational Research, Elsevier, vol. 295(1), pages 292-305.
    3. Arthur Matsuo Yamashita Rios de Sousa & Hideki Takayasu & Misako Takayasu, 2017. "Detection of statistical asymmetries in non-stationary sign time series: Analysis of foreign exchange data," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-18, May.
    4. Maurizio Canavari & Andreas C. Drichoutis & Jayson L. Lusk & Rodolfo M. Nayga, Jr., 2018. "How to run an experimental auction: A review of recent advances," Working Papers 2018-5, Agricultural University of Athens, Department Of Agricultural Economics.
    5. Andrew Gelman & Christian Hennig, 2017. "Beyond subjective and objective in statistics," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 180(4), pages 967-1033, October.
    6. Felipe Campelo & Fernanda Takahashi, 2019. "Sample size estimation for power and accuracy in the experimental comparison of algorithms," Journal of Heuristics, Springer, vol. 25(2), pages 305-338, April.
    7. Martin E Héroux & Janet L Taylor & Simon C Gandevia, 2015. "The Use and Abuse of Transcranial Magnetic Stimulation to Modulate Corticospinal Excitability in Humans," PLOS ONE, Public Library of Science, vol. 10(12), pages 1-10, December.
    8. Roger Beecham & Nick Williams & Alexis Comber, 2020. "Regionally-structured explanations behind area-level populism: An update to recent ecological analyses," PLOS ONE, Public Library of Science, vol. 15(3), pages 1-20, March.
    9. Tagiew, Rustam & Ignatov, Dmitry, 2016. "Gift Ratios in Laboratory Experiments," MPRA Paper 77603, University Library of Munich, Germany.
    10. Megan L Head & Luke Holman & Rob Lanfear & Andrew T Kahn & Michael D Jennions, 2015. "The Extent and Consequences of P-Hacking in Science," PLOS Biology, Public Library of Science, vol. 13(3), pages 1-15, March.
    11. Louis Anthony (Tony) Cox, 2015. "Overcoming Learning Aversion in Evaluating and Managing Uncertain Risks," Risk Analysis, John Wiley & Sons, vol. 35(10), pages 1892-1910, October.
    12. Bettina Budeus & Jörg Timm & Daniel Hoffmann, 2016. "SeqFeatR for the Discovery of Feature-Sequence Associations," PLOS ONE, Public Library of Science, vol. 11(1), pages 1-12, January.
    13. Shuxin Guo & Qiang Liu, 2024. "Data-generating process and time-series asset pricing," Papers 2405.10920, arXiv.org.
    14. Luo, Xing & Zhang, Dongxiao & Zhu, Xu, 2022. "Combining transfer learning and constrained long short-term memory for power generation forecasting of newly-constructed photovoltaic plants," Renewable Energy, Elsevier, vol. 185(C), pages 1062-1077.
    15. Chengwu Yang & Carlo Panlilio & Nicole Verdiglione & Erik B Lehman & Robert M Hamm & Richard Fiene & Sarah Dore & David E Bard & Breanna Grable & Benjamin Levi, 2020. "Generalizing findings from a randomized controlled trial to a real-world study of the iLookOut, an online education program to improve early childhood care and education providers’ knowledge and attit," PLOS ONE, Public Library of Science, vol. 15(1), pages 1-11, January.
    16. Michał Zdziarski & Dominika Czerniawska, 2016. "Board Homophily, Board Diversity and Network Centrality (Homofilia, zroznicowanie i centralnosc rady w sieci)," Problemy Zarzadzania, University of Warsaw, Faculty of Management, vol. 14(60), pages 117-133.
    17. Jorge Arede & Sogand Poureghbali & Tomás Freitas & John Fernandes & Wolfgang I. Schöllhorn & Nuno Leite, 2021. "The Effect of Differential Repeated Sprint Training on Physical Performance in Female Basketball Players: A Pilot Study," IJERPH, MDPI, vol. 18(23), pages 1-16, November.
    18. Juan Li & Hanzhang Xu & Wei Pan & Bei Wu, 2017. "Association between tooth loss and cognitive decline: A 13-year longitudinal study of Chinese older adults," PLOS ONE, Public Library of Science, vol. 12(2), pages 1-12, February.
    19. Felix Holzmeister & Magnus Johannesson & Robert Böhm & Anna Dreber & Jürgen Huber & Michael Kirchler, 2023. "Heterogeneity in effect size estimates: Empirical evidence and practical implications," Working Papers 2023-17, Faculty of Economics and Statistics, Universität Innsbruck.
    20. Michael E. Mann & Elisabeth A. Lloyd & Naomi Oreskes, 2017. "Assessing climate change impacts on extreme weather events: the case for an alternative (Bayesian) approach," Climatic Change, Springer, vol. 144(2), pages 131-142, September.

