
Sample Fit Reliability

Author

Listed:
  • Gabriel Okasa
  • Kenneth A. Younge

Abstract

Researchers frequently test and improve model fit by holding a sample constant and varying the model. We propose methods to test and improve sample fit by holding a model constant and varying the sample. Much as the bootstrap is a well-known method to re-sample data and estimate the uncertainty of the fit of parameters in a model, we develop Sample Fit Reliability (SFR) as a set of computational methods to re-sample data and estimate the reliability of the fit of observations in a sample. SFR uses Scoring to assess the reliability of each observation in a sample, Annealing to check the sensitivity of results to removing unreliable data, and Fitting to re-weight observations for more robust analysis. We provide simulation evidence to demonstrate the advantages of using SFR, and we replicate three empirical studies with treatment effects to illustrate how SFR reveals new insights about each study.
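
The three steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a linear model estimated by least squares, a hypothetical reliability score equal to each observation's mean squared prediction error when it is left out of random subsamples, and a hypothetical down-weighting rule. All function names, parameters, and the simulated data below are illustrative assumptions.

```python
import numpy as np

def ols(X, y, w=None):
    """Least-squares coefficients; optional observation weights w."""
    if w is None:
        w = np.ones(len(y))
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def score_observations(X, y, n_draws=500, frac=0.5, seed=0):
    """Assumed score: mean squared error of each observation when it is
    held out of a random subsample (higher = fits the sample less reliably)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    m = int(frac * n)
    err_sum = np.zeros(n)
    counts = np.zeros(n)
    for _ in range(n_draws):
        idx = rng.choice(n, size=m, replace=False)  # model fixed, sample varied
        held_out = np.ones(n, dtype=bool)
        held_out[idx] = False
        beta = ols(X[idx], y[idx])
        resid = y[held_out] - X[held_out] @ beta
        err_sum[held_out] += resid ** 2
        counts[held_out] += 1
    return err_sum / np.maximum(counts, 1)

def anneal(X, y, scores, drop_fracs=(0.0, 0.05, 0.10)):
    """Re-estimate after dropping the least reliable share of observations."""
    order = np.argsort(scores)  # ascending: most reliable first
    n = len(y)
    return {f: ols(X[order[: n - int(f * n)]], y[order[: n - int(f * n)]])
            for f in drop_fracs}

def fit_reweighted(X, y, scores):
    """Assumed weighting rule: down-weight observations with high scores."""
    w = 1.0 / (1.0 + scores / np.median(scores))
    return ols(X, y, w)

# Simulated data (hypothetical): treatment effect 2.0, five contaminated rows.
rng = np.random.default_rng(1)
n = 200
d = rng.integers(0, 2, n)            # binary treatment indicator
x = rng.normal(size=n)               # one covariate
X = np.column_stack([np.ones(n), d, x])
y = 1.0 + 2.0 * d + 0.5 * x + rng.normal(scale=0.5, size=n)
y[:5] += 15.0                        # contaminate the first five outcomes

scores = score_observations(X, y)
path = anneal(X, y, scores)
beta_w = fit_reweighted(X, y, scores)
print("treatment coefficient by share dropped:",
      {f: round(float(b[1]), 3) for f, b in path.items()})
print("reweighted treatment coefficient:", round(float(beta_w[1]), 3))
```

In this sketch the contaminated rows receive the highest scores, so the annealing path shows how the treatment coefficient moves as they are removed, and the reweighted fit keeps every observation while limiting the influence of the poorly fitting ones.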

Suggested Citation

  • Gabriel Okasa & Kenneth A. Younge, 2022. "Sample Fit Reliability," Papers 2209.06631, arXiv.org.
  • Handle: RePEc:arx:papers:2209.06631

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2209.06631
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Denis Fougère & Nicolas Jacquemet, 2020. "Policy Evaluation Using Causal Inference Methods," SciencePo Working papers Main hal-03455978, HAL.
    2. Harsh Parikh & Carlos Varjao & Louise Xu & Eric Tchetgen Tchetgen, 2022. "Validating Causal Inference Methods," Papers 2202.04208, arXiv.org, revised Jul 2022.
    3. Michael C Knaus, 2022. "Double machine learning-based programme evaluation under unconfoundedness [Econometric methods for program evaluation]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 602-627.
    4. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    5. Brathwaite, Timothy & Walker, Joan L., 2018. "Causal inference in travel demand modeling (and the lack thereof)," Journal of choice modelling, Elsevier, vol. 26(C), pages 1-18.
    6. Nikolas Kuschnig & Gregor Zens & Jesús Crespo Cuaresma, 2021. "Hidden in Plain Sight: Influential Sets in Linear Models," CESifo Working Paper Series 8981, CESifo.
    7. Gabriel Okasa, 2022. "Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance," Papers 2201.12692, arXiv.org.
    8. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
    9. Arun Advani & Toru Kitagawa & Tymon Słoczyński, 2019. "Mostly harmless simulations? Using Monte Carlo studies for estimator selection," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 34(6), pages 893-910, September.
    10. Goller, Daniel & Lechner, Michael & Moczall, Andreas & Wolff, Joachim, 2020. "Does the estimation of the propensity score by machine learning improve matching estimation? The case of Germany's programmes for long term unemployed," Labour Economics, Elsevier, vol. 65(C).
    11. Advani, Arun & Sloczynski, Tymon, 2013. "Mostly Harmless Simulations? On the Internal Validity of Empirical Monte Carlo Studies," IZA Discussion Papers 7874, Institute of Labor Economics (IZA).
    12. Adeola Oyenubi & Martin Wittenberg, 2021. "Does the choice of balance-measure matter under genetic matching?," Empirical Economics, Springer, vol. 61(1), pages 489-502, July.
    13. Michael Lechner & Jana Mareckova, 2022. "Modified Causal Forest," Papers 2209.03744, arXiv.org.
    14. Athey, Susan & Imbens, Guido W. & Metzger, Jonas & Munro, Evan, 2024. "Using Wasserstein Generative Adversarial Networks for the design of Monte Carlo simulations," Journal of Econometrics, Elsevier, vol. 240(2).
    15. Masselus, Lise & Petrik, Christina & Ankel-Peters, Jörg, 2024. "Lost in the Design Space? Construct Validity in the Microfinance Literature," OSF Preprints nwp8k_v1, Center for Open Science.
    16. Daniel Goller, 2023. "Analysing a built-in advantage in asymmetric darts contests using causal machine learning," Annals of Operations Research, Springer, vol. 325(1), pages 649-679, June.
    17. Yiyi Huo & Yingying Fan & Fang Han, 2023. "On the adaptation of causal forests to manifold data," Papers 2311.16486, arXiv.org, revised Dec 2023.
    18. Guido W. Imbens, 2022. "Causality in Econometrics: Choice vs Chance," Econometrica, Econometric Society, vol. 90(6), pages 2541-2566, November.
    19. Mark Kattenberg & Bas Scheer & Jurre Thiel, 2023. "Causal forests with fixed effects for treatment effect heterogeneity in difference-in-differences," CPB Discussion Paper 452, CPB Netherlands Bureau for Economic Policy Analysis.
    20. Cockx, Bart & Lechner, Michael & Bollens, Joost, 2023. "Priority to unemployed immigrants? A causal machine learning evaluation of training in Belgium," Labour Economics, Elsevier, vol. 80(C).
