IDEAS home Printed from https://ideas.repec.org/a/pal/palcom/v5y2019i1d10.1057_s41599-019-0340-8.html

Raiders of the lost HARK: a reproducible inference framework for big data science

Authors

  • Mattia Prosperi (University of Florida)
  • Jiang Bian (University of Florida)
  • Iain E. Buchan (University of Liverpool)
  • James S. Koopman (University of Michigan)
  • Matthew Sperrin (University of Manchester)
  • Mo Wang (University of Florida)

Abstract

Hypothesizing after the results are known (HARK) has been disparaged as data dredging, and safeguards including hypothesis preregistration and statistically rigorous oversight have been recommended. Despite potential drawbacks, HARK has deepened thinking about complex causal processes. Some of the HARK precautions can conflict with the modern reality of researchers’ obligations to use big, ‘organic’ data sources, from high-throughput genomics to social media streams. Here we propose a HARK-solid, reproducible inference framework suitable for big data, based on models that represent formalizations of hypotheses. Reproducibility is attained by employing two levels of model validation: internal (relative to data collated around hypotheses) and external (independent of the hypotheses used to generate data or of the data used to generate hypotheses). With a model-centered paradigm, the reproducibility focus changes from the ability of others to reproduce both data and specific inferences from a study to the ability to evaluate models as representations of reality. Validation underpins ‘natural selection’ in a knowledge base maintained by the scientific community. The community itself is thereby supported to be more productive in generating and critically evaluating theories that integrate wider, complex systems.
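
The distinction between the two validation levels named in the abstract can be illustrated with a small Python sketch. This is not the authors' framework or code: the straight-line model, the k-fold scheme, the simulated data, and every name used here (`fit_line`, `mse`, `internal_validation`, `external_validation`, `simulate`) are assumptions chosen only to make the idea concrete. Internal validation is approximated by cross-validation on the data the model was built around; external validation scores the same model on a dataset that played no part in forming it.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted (a, b) model."""
    a, b = model
    return mean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def internal_validation(xs, ys, k=5):
    """k-fold cross-validated MSE on the study's own data (assumed scheme)."""
    n = len(xs)
    errors = []
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th point forms one fold
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        test_x = [xs[i] for i in sorted(held_out)]
        test_y = [ys[i] for i in sorted(held_out)]
        errors.append(mse(fit_line(train_x, train_y), test_x, test_y))
    return mean(errors)


def external_validation(xs, ys, ext_xs, ext_ys):
    """MSE of a model fitted on the study data, scored on independent data."""
    return mse(fit_line(xs, ys), ext_xs, ext_ys)


def simulate(n, intercept, slope, noise, rng):
    """Hypothetical data source: a noisy straight line."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, noise) for x in xs]
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    # Study data, and an independent source whose true slope differs slightly.
    xs, ys = simulate(200, intercept=1.0, slope=2.0, noise=1.0, rng=rng)
    ext_xs, ext_ys = simulate(200, intercept=1.0, slope=2.3, noise=1.0, rng=rng)
    print("internal (cross-validated) MSE:", internal_validation(xs, ys))
    print("external (independent data) MSE:", external_validation(xs, ys, ext_xs, ext_ys))
```

In this toy setting a model can look adequate internally yet score worse on the independent source. That gap is the kind of evidence the abstract suggests a community could use when comparing competing models.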

Suggested Citation

  • Mattia Prosperi & Jiang Bian & Iain E. Buchan & James S. Koopman & Matthew Sperrin & Mo Wang, 2019. "Raiders of the lost HARK: a reproducible inference framework for big data science," Palgrave Communications, Palgrave Macmillan, vol. 5(1), pages 1-12, December.
  • Handle: RePEc:pal:palcom:v:5:y:2019:i:1:d:10.1057_s41599-019-0340-8
    DOI: 10.1057/s41599-019-0340-8

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jasper Brinkerink, 2023. "When Shooting for the Stars Becomes Aiming for Asterisks: P-Hacking in Family Business Research," Entrepreneurship Theory and Practice, vol. 47(2), pages 304-343, March.
    2. Eszter Czibor & David Jimenez‐Gomez & John A. List, 2019. "The Dozen Things Experimental Economists Should Do (More of)," Southern Economic Journal, John Wiley & Sons, vol. 86(2), pages 371-432, October.
    3. Rubin, Mark, 2020. "Does preregistration improve the credibility of research findings?," MetaArXiv vgr89, Center for Open Science.
    4. Christopher Allen & David M A Mehler, 2019. "Open science challenges, benefits and tips in early career and beyond," PLOS Biology, Public Library of Science, vol. 17(5), pages 1-14, May.
    5. Brinkerink, Jasper & De Massis, Alfredo & Kellermanns, Franz, 2022. "One finding is no finding: Toward a replication culture in family business research," Journal of Family Business Strategy, Elsevier, vol. 13(4).
    6. Filip Melinscak & Dominik R Bach, 2020. "Computational optimization of associative learning experiments," PLOS Computational Biology, Public Library of Science, vol. 16(1), pages 1-23, January.
    7. Cantone, Giulio Giacomo, 2023. "The multiversal methodology as a remedy of the replication crisis," MetaArXiv kuhmz, Center for Open Science.
    8. Merton S. Krause, 2019. "Replication and preregistration," Quality & Quantity: International Journal of Methodology, Springer, vol. 53(5), pages 2647-2652, September.
    9. Persson, Emil & Tinghög, Gustav, 2020. "Opportunity cost neglect in public policy," Journal of Economic Behavior & Organization, Elsevier, vol. 170(C), pages 301-312.
    10. Kraft-Todd, Gordon T. & Rand, David G., 2021. "Practice what you preach: Credibility-enhancing displays and the growth of open science," Organizational Behavior and Human Decision Processes, Elsevier, vol. 164(C), pages 1-10.
    11. Nosek, Brian A. & Errington, Timothy M., 2019. "What is replication?," MetaArXiv u4g6t, Center for Open Science.
    12. Logg, Jennifer M. & Dorison, Charles A., 2021. "Pre-registration: Weighing costs and benefits for researchers," Organizational Behavior and Human Decision Processes, Elsevier, vol. 167(C), pages 18-27.
    13. Ángel Enrique & Juana Bretón-López & Guadalupe Molinari & Rosa M. Baños & Cristina Botella, 2018. "Efficacy of an adaptation of the Best Possible Self intervention implemented through positive technology: a randomized control trial," Applied Research in Quality of Life, Springer;International Society for Quality-of-Life Studies, vol. 13(3), pages 671-689, September.
    14. Gerben ter Riet & Paula Chesley & Alan G Gross & Lara Siebeling & Patrick Muggensturm & Nadine Heller & Martin Umbehr & Daniela Vollenweider & Tsung Yu & Elie A Akl & Lizzy Brewster & Olaf M Dekkers &, 2013. "All That Glitters Isn't Gold: A Survey on Acknowledgment of Limitations in Biomedical Studies," PLOS ONE, Public Library of Science, vol. 8(11), pages 1-6, November.
    15. Spyridon N Papageorgiou & Georgios N Antonoglou & George K Sándor & Theodore Eliades, 2017. "Randomized clinical trials in orthodontics are rarely registered a priori and often published late or not at all," PLOS ONE, Public Library of Science, vol. 12(8), pages 1-13, August.
    16. Eun-Hi Kong & Myoungsuk Kim & Seonho Kim, 2021. "Effects of a Web-Based Educational Program Regarding Physical Restraint Reduction in Long-Term Care Settings on Nursing Students: A Cluster Randomized Controlled Trial," IJERPH, MDPI, vol. 18(13), pages 1-10, June.
    17. Stavros Petrou & Oliver Rivero-Arias & Helen Dakin & Louise Longworth & Mark Oppe & Robert Froud & Alastair Gray, 2015. "Preferred Reporting Items for Studies Mapping onto Preference-Based Outcome Measures: The MAPS Statement," Medical Decision Making, vol. 35(6), pages 1-8, August.
    18. Alexander P. L. Martindale & Carrie D. Llewellyn & Richard O. Visser & Benjamin Ng & Victoria Ngai & Aditya U. Kale & Lavinia Ferrante Ruffano & Robert M. Golub & Gary S. Collins & David Moher & Melis, 2024. "Concordance of randomised controlled trials for artificial intelligence interventions with the CONSORT-AI reporting guidelines," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    19. Piers Steel & Sjoerd Beugelsdijk & Herman Aguinis, 2021. "The anatomy of an award-winning meta-analysis: Recommendations for authors, reviewers, and readers of meta-analytic reviews," Journal of International Business Studies, Palgrave Macmillan;Academy of International Business, vol. 52(1), pages 23-44, February.
    20. Maria Giné-Garriga & Carme Martin-Borràs & Anna Puig-Ribera & Carlos Martín-Cantera & Mercè Solà & Antonio Cuesta-Vargas & on behalf of the PPAF Group, 2013. "The Effect of a Physical Activity Program on the Total Number of Primary Care Visits in Inactive Patients: A 15-Month Randomized Controlled Trial," PLOS ONE, Public Library of Science, vol. 8(6), pages 1-8, June.