Printed from https://ideas.repec.org/p/nbr/nberwo/25858.html

Stability of Experimental Results: Forecasts and Evidence

Authors

  • Stefano DellaVigna
  • Devin Pope

Abstract

How robust are experimental results to changes in design? And can researchers anticipate which changes matter most? We consider a specific context, a real-effort task with multiple behavioral treatments, and examine stability along six dimensions: (i) pure replication; (ii) demographics; (iii) geography and culture; (iv) the task; (v) the output measure; (vi) the presence of a consent form. We use the rank-order correlation across treatments as a measure of stability, and compare the observed correlation to the one under a benchmark of full stability (which allows for noise), and to expert forecasts. The academic experts expect that the pure replication will be close to perfect, that the results will differ sizably across demographic groups (age/gender/education), and that changes to the task and output will have a further impact. We find near-perfect replication of the experimental results, and full stability of the results across demographics, significantly higher than the experts expected. The results differ considerably under task and output changes, mostly because the task change adds noise to the findings. The results are also stable to the lack of consent. Overall, the full-stability benchmark is an excellent predictor of the observed stability, while expert forecasts are not very informative. This suggests that researchers' predictions about external validity may not be as informative as they expect. We discuss the implications of both the methods and the results for conceptual replication.
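
The stability measure described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the treatment means, and the standard errors below are all hypothetical, and the benchmark is a simplified simulation that assumes independent normal sampling noise around the same underlying treatment effects in both versions of the experiment.

```python
import random
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_order_correlation(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def full_stability_benchmark(true_effects, standard_errors, draws=2000, seed=0):
    """Average rank correlation between two noisy draws of the SAME effects.

    Assumption (not from the paper): noise is independent and normal, with
    one standard error per treatment.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        a = [rng.gauss(m, s) for m, s in zip(true_effects, standard_errors)]
        b = [rng.gauss(m, s) for m, s in zip(true_effects, standard_errors)]
        total += rank_order_correlation(a, b)
    return total / draws


# Hypothetical average effort per treatment in two versions of an experiment.
original = [1500.0, 1900.0, 2000.0, 2100.0, 1800.0]
replication = [1480.0, 2010.0, 1990.0, 2150.0, 1700.0]

observed = rank_order_correlation(original, replication)
benchmark = full_stability_benchmark(original, [50.0] * len(original))
print(f"observed rank correlation: {observed:.2f}")
print(f"full-stability benchmark:  {benchmark:.2f}")
```

Under these assumptions, an observed correlation close to the simulated benchmark would be read as full stability: the treatment ranking changes no more than sampling noise alone would produce. An observed correlation well below the benchmark would point to a genuine change in the treatment effects.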

Suggested Citation

  • Stefano DellaVigna & Devin Pope, 2019. "Stability of Experimental Results: Forecasts and Evidence," NBER Working Papers 25858, National Bureau of Economic Research, Inc.
  • Handle: RePEc:nbr:nberwo:25858
    Note: LS

    Download full text from publisher

    File URL: http://www.nber.org/papers/w25858.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Stefano DellaVigna & Devin Pope, 2018. "Predicting Experimental Results: Who Knows What?," Journal of Political Economy, University of Chicago Press, vol. 126(6), pages 2410-2456.
    2. John Horton & David Rand & Richard Zeckhauser, 2011. "The online laboratory: conducting experiments in a real labor market," Experimental Economics, Springer;Economic Science Association, vol. 14(3), pages 399-425, September.
    3. Bertrand, Marianne & Karlan, Dean S. & Mullainathan, Sendhil & Shafir, Eldar & Zinman, Jonathan, 2005. "What's Psychology Worth? A Field Experiment in the Consumer Credit Market," Center Discussion Papers 28441, Yale University, Economic Growth Center.
    4. Gerber, Alan S. & Green, Donald P., 2000. "The Effects of Canvassing, Telephone Calls, and Direct Mail on Voter Turnout: A Field Experiment," American Political Science Review, Cambridge University Press, vol. 94(3), pages 653-663, September.
    5. Ilyana Kuziemko & Michael I. Norton & Emmanuel Saez & Stefanie Stantcheva, 2015. "How Elastic Are Preferences for Redistribution? Evidence from Randomized Survey Experiments," American Economic Review, American Economic Association, vol. 105(4), pages 1478-1508, April.
    6. Saurabh Bhargava & Dayanand Manoli, 2015. "Psychological Frictions and the Incomplete Take-Up of Social Benefits: Evidence from an IRS Field Experiment," American Economic Review, American Economic Association, vol. 105(11), pages 3489-3529, November.
    7. Felipe A. Araujo & Erin Carbone & Lynn Conell-Price & Marli W. Dunietz & Ania Jaroszewicz & Rachel Landsman & Diego Lamé & Lise Vesterlund & Stephanie W. Wang & Alistair J. Wilson, 2016. "The slider task: an example of restricted inference on incentive effects," Journal of the Economic Science Association, Springer;Economic Science Association, vol. 2(1), pages 1-12, May.
    8. Marianne Bertrand & Dean Karlan & Sendhil Mullainathan & Eldar Shafir & Jonathan Zinman, 2010. "What's Advertising Content Worth? Evidence from a Consumer Credit Marketing Field Experiment," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 125(1), pages 263-306.
    9. Erik Snowberg & Leeat Yariv, 2018. "Testing the Waters: Behavior across Participant Pools," NBER Working Papers 24781, National Bureau of Economic Research, Inc.
    10. Erik Snowberg & Leeat Yariv, 2018. "Testing the Waters: Behavior across Participant Pools," CESifo Working Paper Series 7136, CESifo.
    11. Armin Falk & James J. Heckman, 2009. "Lab Experiments are a Major Source of Knowledge in the Social Sciences," CESifo Working Paper Series 2894, CESifo.
    12. Uri Gneezy & John A List, 2006. "Putting Behavioral Economics to Work: Testing for Gift Exchange in Labor Markets Using Field Experiments," Econometrica, Econometric Society, vol. 74(5), pages 1365-1384, September.
    13. David Laibson, 1997. "Golden Eggs and Hyperbolic Discounting," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 112(2), pages 443-478.
    14. Logan S. Casey & Jesse Chandler & Adam Seth Levine & Andrew Proctor & Dara Z. Strolovitch, 2017. "Intertemporal Differences Among MTurk Workers: Time-Based Sample Variations and Implications for Online Data Collection," SAGE Open, vol. 7(2), pages 21582440177, June.
    15. Jordi Brandts & Gary Charness, 2011. "The strategy versus the direct-response method: a first survey of experimental comparisons," Experimental Economics, Springer;Economic Science Association, vol. 14(3), pages 375-398, September.
    16. Eva Vivalt, 2020. "How Much Can We Generalize From Impact Evaluations?," Journal of the European Economic Association, European Economic Association, vol. 18(6), pages 3045-3089.
    17. Stefano DellaVigna & Devin Pope, 2018. "What Motivates Effort? Evidence and Expert Forecasts," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 85(2), pages 1029-1069.
    18. Imas, Alex, 2014. "Working for the “warm glow”: On the benefits and limits of prosocial incentives," Journal of Public Economics, Elsevier, vol. 114(C), pages 14-18.
    19. Yariv, Leeat & Snowberg, Erik, 2018. "Testing the Waters: Behavior across Participant Pools," CEPR Discussion Papers 13015, C.E.P.R. Discussion Papers.
    20. Hunt Allcott, 2015. "Site Selection Bias in Program Evaluation," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 130(3), pages 1117-1165.
    21. Stefano DellaVigna, 2018. "Structural Behavioral Economics," NBER Working Papers 24797, National Bureau of Economic Research, Inc.
    22. Camerer, Colin & Dreber, Anna & Forsell, Eskil & Ho, Teck-Hua & Huber, Jurgen & Johannesson, Magnus & Kirchler, Michael & Almenberg, Johan & Altmejd, Adam & Chan, Taizan & Heikensten, Emma & Holzmeist, 2016. "Evaluating replicability of laboratory experiments in Economics," MPRA Paper 75461, University Library of Munich, Germany.
    23. Alan Gerber & Donald Green, 2000. "The effects of canvassing, direct mail, and telephone contact on voter turnout: A field experiment," Natural Field Experiments 00248, The Field Experiments Website.

    Citations

    Cited by:

    1. Olivier Armantier & Charles Holt, 2024. "Can Discount Window Stigma Be Cured? An Experimental Investigation," Staff Reports 1103, Federal Reserve Bank of New York.
    2. Isaiah Andrews & Drew Fudenberg & Lihua Lei & Annie Liang & Chaofeng Wu, 2022. "The Transfer Performance of Economic Models," Papers 2202.04796, arXiv.org, revised Jul 2024.
    3. Johannes G. Jaspersen & Marc A. Ragin & Justin R. Sydnor, 2022. "Insurance demand experiments: Comparing crowdworking to the lab," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 89(4), pages 1077-1107, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Eszter Czibor & David Jimenez‐Gomez & John A. List, 2019. "The Dozen Things Experimental Economists Should Do (More of)," Southern Economic Journal, John Wiley & Sons, vol. 86(2), pages 371-432, October.
    2. Stefano DellaVigna & Devin Pope, 2018. "What Motivates Effort? Evidence and Expert Forecasts," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 85(2), pages 1029-1069.
    3. Nickolas Gagnon & Kristof Bosmans & Arno Riedl, 2020. "The Effect of Unfair Chances and Gender Discrimination on Labor Supply," CESifo Working Paper Series 8058, CESifo.
    4. Marcus Giamattei & Kyanoush Seyed Yahosseini & Simon Gächter & Lucas Molleman, 2020. "LIONESS Lab: a free web-based platform for conducting interactive experiments online," Journal of the Economic Science Association, Springer;Economic Science Association, vol. 6(1), pages 95-111, June.
    5. Stefano DellaVigna, 2009. "Psychology and Economics: Evidence from the Field," Journal of Economic Literature, American Economic Association, vol. 47(2), pages 315-372, June.
    6. Stefano DellaVigna & Elizabeth Linos, 2022. "RCTs to Scale: Comprehensive Evidence From Two Nudge Units," Econometrica, Econometric Society, vol. 90(1), pages 81-116, January.
    7. Omar Al-Ubaydli & John List, 2016. "Field Experiments in Markets," Artefactual Field Experiments j0002, The Field Experiments Website.
    8. Jörg Peters & Jörg Langbein & Gareth Roberts, 2018. "Generalization in the Tropics – Development Policy, Randomized Controlled Trials, and External Validity," The World Bank Research Observer, World Bank, vol. 33(1), pages 34-64.
    9. Jonathan Schulz & Uwe Sunde & Petra Thiemann & Christian Thoeni, 2019. "Selection into Experiments: Evidence from a Population of Students," Discussion Papers 2019-09, The Centre for Decision Research and Experimental Economics, School of Economics, University of Nottingham.
    10. Steffen Altmann & Christian Traxler & Philipp Weinschenk, 2017. "Deadlines and Cognitive Limitations," CESifo Working Paper Series 6761, CESifo.
    11. Galasso, Vincenzo & Nannicini, Tommaso, 2016. "Persuasion and Gender: Experimental Evidence from Two Political Campaigns," CEPR Discussion Papers 11238, C.E.P.R. Discussion Papers.
    12. Ortega, Daniel & Scartascini, Carlos, 2020. "Don’t blame the messenger. The Delivery method of a message matters," Journal of Economic Behavior & Organization, Elsevier, vol. 170(C), pages 286-300.
    13. Stefano DellaVigna & Devin Pope, 2018. "Predicting Experimental Results: Who Knows What?," Journal of Political Economy, University of Chicago Press, vol. 126(6), pages 2410-2456.
    14. Lauren H. Cohen & Umit G. Gurun, 2018. "Buying the Verdict," NBER Working Papers 24542, National Bureau of Economic Research, Inc.
    15. Holger Herz & Deborah Kistler & Christian Zehnder & Christian Zihlmann, 2022. "Hindsight Bias and Trust in Government," CESifo Working Paper Series 9767, CESifo.
    16. Kaisa Kotakorpi & Satu Metsälampi & Topi Miettinen & Tuomas Nurminen, 2019. "The effect of reporting institutions on tax evasion: Evidence from the lab," Discussion Papers 127, Aboa Centre for Economics.
    17. Omar Al-Ubaydli & John List & Claire Mackevicius & Min Sok Lee & Dana Suskind, 2019. "How Can Experiments Play a Greater Role in Public Policy? 12 Proposals from an Economic Model of Scaling," Artefactual Field Experiments 00679, The Field Experiments Website.
    18. Doerrenberg, Philipp & Duncan, Denvil & Löffler, Max, 2023. "Asymmetric labor-supply responses to wage changes: Experimental evidence from an online labor market," Labour Economics, Elsevier, vol. 81(C).
    19. Esterling, Kevin & Brady, David & Schwitzgebel, Eric, 2021. "The Necessity of Construct and External Validity for Generalized Causal Claims," OSF Preprints 2s8w5, Center for Open Science.

    More about this item

    JEL classification:

    • C9 - Mathematical and Quantitative Methods - - Design of Experiments
    • C91 - Mathematical and Quantitative Methods - - Design of Experiments - - - Laboratory, Individual Behavior
    • C93 - Mathematical and Quantitative Methods - - Design of Experiments - - - Field Experiments
