
Variance reduction combining pre-experiment and in-experiment data

Authors
  • Zhexiao Lin
  • Pablo Crespo

Abstract

Online controlled experiments (A/B tests) are essential to data-driven decision-making at many companies. Increasing the sensitivity of these experiments, particularly with a fixed sample size, relies on reducing the variance of the estimator for the average treatment effect (ATE). Existing methods such as CUPED and CUPAC use pre-experiment data to reduce variance, but their effectiveness depends on the correlation between the pre-experiment data and the outcome. In contrast, in-experiment data is often more strongly correlated with the outcome and thus more informative. In this paper, we introduce a novel method that combines both pre-experiment and in-experiment data to achieve greater variance reduction than CUPED and CUPAC, without introducing bias or additional computational complexity. We also establish asymptotic theory and provide consistent variance estimators for our method. Applying this method to multiple online experiments at Etsy, we achieve substantial variance reduction over CUPAC with the inclusion of only a few in-experiment covariates. These results highlight the potential of our approach to significantly improve experiment sensitivity and accelerate decision-making.
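
To make the baseline concrete, here is a minimal sketch of a standard CUPED-style adjustment with a single pre-experiment covariate. It is not the method proposed in the paper, which additionally uses in-experiment data. The simulated data, the effect size of 0.1, and the helper names `cuped_adjust` and `diff_in_means` are all assumptions chosen for illustration.

```python
import numpy as np


def cuped_adjust(y, x):
    """Subtract the part of y that is linearly explained by the covariate x.

    theta is the pooled regression slope cov(y, x) / var(x); centering x
    keeps the mean of the adjusted outcome equal to the mean of y.
    """
    theta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())


def diff_in_means(y, treated):
    """Difference-in-means ATE estimate and its estimated variance."""
    y_t, y_c = y[treated], y[~treated]
    est = y_t.mean() - y_c.mean()
    var = y_t.var(ddof=1) / y_t.size + y_c.var(ddof=1) / y_c.size
    return est, var


# Simulated experiment (illustrative assumption, not data from the paper).
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)              # pre-experiment covariate
treated = rng.random(n) < 0.5       # randomized assignment, independent of x
y = 0.1 * treated + 0.8 * x + rng.normal(scale=0.6, size=n)

raw_est, raw_var = diff_in_means(y, treated)
adj_est, adj_var = diff_in_means(cuped_adjust(y, x), treated)
rho = np.corrcoef(y, x)[0, 1]

print(f"raw ATE estimate:   {raw_est:.4f} (variance {raw_var:.2e})")
print(f"CUPED ATE estimate: {adj_est:.4f} (variance {adj_var:.2e})")
print(f"variance ratio: {adj_var / raw_var:.3f}, 1 - rho^2: {1 - rho**2:.3f}")
```

In this setup the variance ratio should be close to 1 - rho^2, where rho is the correlation between the covariate and the outcome. This is the dependence on correlation that the abstract describes: a weakly correlated pre-experiment covariate yields little reduction.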

Suggested Citation

  • Zhexiao Lin & Pablo Crespo, 2024. "Variance reduction combining pre-experiment and in-experiment data," Papers 2410.09027, arXiv.org.
  • Handle: RePEc:arx:papers:2410.09027

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2410.09027
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Constantine E. Frangakis & Donald B. Rubin, 2002. "Principal Stratification in Causal Inference," Biometrics, The International Biometric Society, vol. 58(1), pages 21-29, March.
    2. Peng Ding & Jiannan Lu, 2017. "Principal stratification analysis using principal scores," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(3), pages 757-777, June.
    3. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    4. Zhexiao Lin & Peng Ding & Fang Han, 2023. "Estimation Based on Nearest Neighbor Matching: From Density Ratio to Average Treatment Effect," Econometrica, Econometric Society, vol. 91(6), pages 2187-2217, November.
    5. Zhichao Jiang & Shu Yang & Peng Ding, 2022. "Multiply robust estimation of causal effects under principal ignorability," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(4), pages 1423-1445, September.
    6. Susan Athey & Raj Chetty & Guido W. Imbens & Hyunseung Kang, 2019. "The Surrogate Index: Combining Short-Term Proxies to Estimate Long-Term Treatment Effects More Rapidly and Precisely," NBER Working Papers 26463, National Bureau of Economic Research, Inc.
    7. P L Cohen & C B Fogarty, 2024. "No-harm calibration for generalized Oaxaca–Blinder estimators," Biometrika, Biometrika Trust, vol. 111(1), pages 331-338.
    8. Imbens,Guido W. & Rubin,Donald B., 2015. "Causal Inference for Statistics, Social, and Biomedical Sciences," Cambridge Books, Cambridge University Press, number 9780521885881, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Park Soojin & Kürüm Esra, 2020. "A Two-Stage Joint Modeling Method for Causal Mediation Analysis in the Presence of Treatment Noncompliance," Journal of Causal Inference, De Gruyter, vol. 8(1), pages 131-149, January.
    2. Brett R. Gordon & Robert Moakler & Florian Zettelmeyer, 2023. "Predictive Incrementality by Experimentation (PIE) for Ad Measurement," Papers 2304.06828, arXiv.org.
    3. Shanshan Luo & Wei Li & Yangbo He, 2023. "Causal inference with outcomes truncated by death in multiarm studies," Biometrics, The International Biometric Society, vol. 79(1), pages 502-513, March.
    4. Choi, Jin-young & Lee, Goeun & Lee, Myoung-jae, 2023. "Endogenous treatment effect for any response conditional on control propensity score," Statistics & Probability Letters, Elsevier, vol. 196(C).
    5. Patrick M. Schnell & Richard Baumgartner & Shahrul Mt‐Isa & Vladimir Svetnik, 2022. "A principal stratification approach to estimating the effect of continuing treatment after observing early outcomes," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1065-1084, November.
    6. Zhichao Jiang & Shu Yang & Peng Ding, 2022. "Multiply robust estimation of causal effects under principal ignorability," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(4), pages 1423-1445, September.
    7. Avi Feller & Fabrizia Mealli & Luke Miratrix, 2017. "Principal Score Methods: Assumptions, Extensions, and Practical Considerations," Journal of Educational and Behavioral Statistics, vol. 42(6), pages 726-758, December.
    8. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    9. Ruoxuan Xiong & Allison Koenecke & Michael Powell & Zhu Shen & Joshua T. Vogelstein & Susan Athey, 2021. "Federated Causal Inference in Heterogeneous Observational Data," Papers 2107.11732, arXiv.org, revised Apr 2023.
    10. Susan Athey & Guido W. Imbens & Stefan Wager, 2018. "Approximate residual balancing: debiased inference of average treatment effects in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(4), pages 597-623, September.
    11. Davide Viviano & Jelena Bradic, 2019. "Synthetic learner: model-free inference on treatments over time," Papers 1904.01490, arXiv.org, revised Aug 2022.
    12. Pedro Carneiro & Sokbae Lee & Daniel Wilhelm, 2020. "Optimal data collection for randomized control trials," The Econometrics Journal, Royal Economic Society, vol. 23(1), pages 1-31.
    13. Sung Jae Jun & Sokbae Lee, 2024. "Causal Inference Under Outcome-Based Sampling with Monotonicity Assumptions," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(3), pages 998-1009, July.
    14. Rachel Axelrod & Daniel Nevo, 2023. "A sensitivity analysis approach for the causal hazard ratio in randomized and observational studies," Biometrics, The International Biometric Society, vol. 79(3), pages 2743-2756, September.
    15. Plamen Nikolov & Hongjian Wang & Kevin Acker, 2020. "Wage premium of Communist Party membership: Evidence from China," Pacific Economic Review, Wiley Blackwell, vol. 25(3), pages 309-338, August.
    16. Yiyi Huo & Yingying Fan & Fang Han, 2023. "On the adaptation of causal forests to manifold data," Papers 2311.16486, arXiv.org, revised Dec 2023.
    17. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
    18. Jinglong Zhao, 2024. "Experimental Design For Causal Inference Through An Optimization Lens," Papers 2408.09607, arXiv.org, revised Aug 2024.
    19. Miruna Oprescu & Vasilis Syrgkanis & Zhiwei Steven Wu, 2018. "Orthogonal Random Forest for Causal Inference," Papers 1806.03467, arXiv.org, revised Sep 2019.
