
Reproducible Aggregation of Sample-Split Statistics

Author

Listed:
  • David M. Ritzwoller
  • Joseph P. Romano

Abstract

Statistical inference is often simplified by sample-splitting. This simplification comes at the cost of the introduction of randomness not native to the data. We propose a simple procedure for sequentially aggregating statistics constructed with multiple splits of the same sample. The user specifies a bound and a nominal error rate. If the procedure is implemented twice on the same data, the nominal error rate approximates the chance that the results differ by more than the bound. We analyze the accuracy of the nominal error rate and illustrate the application of the procedure to several widely applied statistical methods.
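
The sequential idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact procedure: it assumes the aggregate is a simple average of split statistics, that a normal approximation applies, and that aggregation stops once the estimated standard deviation of the difference between two independent runs is small enough. The names `split_statistic`, `aggregate_until_reproducible`, `bound`, `error_rate`, `k_init`, and `k_max` are hypothetical and do not come from the paper.

```python
import random
from statistics import NormalDist, fmean, variance


def split_statistic(data, rng):
    """Hypothetical sample-split statistic: the mean of a random half of the data."""
    half = rng.sample(data, len(data) // 2)
    return fmean(half)


def aggregate_until_reproducible(data, bound, error_rate, rng,
                                 k_init=10, k_max=100_000):
    """Average split statistics until two independent runs would differ by
    more than `bound` with probability of roughly `error_rate`.

    Assumption: a normal approximation with a plug-in variance estimate.
    """
    z = NormalDist().inv_cdf(1 - error_rate / 2)
    draws = [split_statistic(data, rng) for _ in range(k_init)]
    while len(draws) < k_max:
        k = len(draws)
        # The difference of two independent k-split averages has
        # variance 2 * s^2 / k, where s^2 is the across-split variance.
        if z * (2 * variance(draws) / k) ** 0.5 <= bound:
            break
        draws.append(split_statistic(data, rng))
    return fmean(draws), len(draws)


# Run the procedure twice on the same data with different split randomness.
data_rng = random.Random(0)
data = [data_rng.gauss(0, 1) for _ in range(200)]
est_a, k_a = aggregate_until_reproducible(data, 0.02, 0.05, random.Random(1))
est_b, k_b = aggregate_until_reproducible(data, 0.02, 0.05, random.Random(2))
print(f"run A: {est_a:.4f} after {k_a} splits")
print(f"run B: {est_b:.4f} after {k_b} splits")
```

Under these assumptions, the two runs use different random splits of the same data, and the stopping rule is chosen so that their results differ by more than the bound in only about an `error_rate` share of repetitions. The number of splits is determined by the data rather than fixed in advance, which is what makes the aggregation sequential.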

Suggested Citation

  • David M. Ritzwoller & Joseph P. Romano, 2023. "Reproducible Aggregation of Sample-Split Statistics," Papers 2311.14204, arXiv.org, revised Dec 2023.
  • Handle: RePEc:arx:papers:2311.14204

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2311.14204
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. David M. Ritzwoller & Vasilis Syrgkanis, 2024. "Simultaneous Inference for Local Structural Parameters with Random Forests," Papers 2405.07860, arXiv.org, revised Sep 2024.
    2. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    3. Simon Calmar Andersen & Louise Beuchert & Phillip Heiler & Helena Skyt Nielsen, 2023. "A Guide to Impact Evaluation under Sample Selection and Missing Data: Teacher's Aides and Adolescent Mental Health," Papers 2308.04963, arXiv.org.
    4. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    5. Solari, Aldo & Djordjilović, Vera, 2022. "Multi split conformal prediction," Statistics & Probability Letters, Elsevier, vol. 184(C).
    6. Xu, Yang & Zhao, Shishun & Hu, Tao & Sun, Jianguo, 2021. "Variable selection for generalized odds rate mixture cure models with interval-censored failure time data," Computational Statistics & Data Analysis, Elsevier, vol. 156(C).
    7. Nicolaj N. Mühlbach, 2020. "Tree-based Synthetic Control Methods: Consequences of moving the US Embassy," CREATES Research Papers 2020-04, Department of Economics and Business Economics, Aarhus University.
    8. Dettmann, E. & Becker, C. & Schmeißer, C., 2011. "Distance functions for matching in small samples," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1942-1960, May.
    9. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    10. Alexander Hijzen & Sébastien Jean & Thierry Mayer, 2011. "The effects at home of initiating production abroad: evidence from matched French firms," Review of World Economics (Weltwirtschaftliches Archiv), Springer;Institut für Weltwirtschaft (Kiel Institute for the World Economy), vol. 147(3), pages 457-483, September.
    11. Sant’Anna, Pedro H.C. & Zhao, Jun, 2020. "Doubly robust difference-in-differences estimators," Journal of Econometrics, Elsevier, vol. 219(1), pages 101-122.
    12. Kitagawa, Toru & Muris, Chris, 2016. "Model averaging in semiparametric estimation of treatment effects," Journal of Econometrics, Elsevier, vol. 193(1), pages 271-289.
    13. M. Adam & O. Bonnet & E. Fize & T. Loisel & M. Rault & L. Wilner, 2023. "How does fuel demand respond to price changes? Quasi-experimental evidence based on high-frequency data," Documents de Travail de l'Insee - INSEE Working Papers 2023-17, Institut National de la Statistique et des Etudes Economiques.
    14. Sung Jae Jun & Sokbae Lee, 2024. "Causal Inference Under Outcome-Based Sampling with Monotonicity Assumptions," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(3), pages 998-1009, July.
    15. Abhijit Banerjee & Emily Breza & Esther Duflo & Cynthia Kinnan, 2019. "Can Microfinance Unlock a Poverty Trap for Some Entrepreneurs?," NBER Working Papers 26346, National Bureau of Economic Research, Inc.
    16. Samuel D. Lendle & Meenakshi S. Subbaraman & Mark J. van der Laan, 2013. "Identification and Efficient Estimation of the Natural Direct Effect among the Untreated," Biometrics, The International Biometric Society, vol. 69(2), pages 310-317, June.
    17. Alberto Abadie & Guido W. Imbens, 2002. "Simple and Bias-Corrected Matching Estimators for Average Treatment Effects," NBER Technical Working Papers 0283, National Bureau of Economic Research, Inc.
    18. Piasenti, Stefano & Valente, Marica & Van Veldhuizen, Roel & Pfeifer, Gregor, 2023. "Does Unfairness Hurt Women? The Effects of Losing Unfair Competitions," Working Papers 2023:7, Lund University, Department of Economics.
    19. Shi, Chengchun & Zhou, Yunzhe & Li, Lexin, 2023. "Testing directed acyclic graph via structural, supervised and generative adversarial learning," LSE Research Online Documents on Economics 119446, London School of Economics and Political Science, LSE Library.
    20. Waverly Wei & Maya Petersen & Mark J van der Laan & Zeyu Zheng & Chong Wu & Jingshen Wang, 2023. "Efficient targeted learning of heterogeneous treatment effects for multiple subgroups," Biometrics, The International Biometric Society, vol. 79(3), pages 1934-1946, September.
