Printed from https://ideas.repec.org/p/arx/papers/1711.07077.html

Estimation Considerations in Contextual Bandits

Authors

  • Maria Dimakopoulou
  • Zhengyuan Zhou
  • Susan Athey
  • Guido Imbens

Abstract

Contextual bandit algorithms are sensitive both to the method used to estimate the outcome model and to the exploration method, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We study a consideration for the exploration vs. exploitation framework that does not arise in multi-armed bandits but is crucial in contextual bandits: the way exploration and exploitation are conducted in the present affects the bias and variance of the potential outcome model estimation in subsequent stages of learning. We develop parametric and non-parametric contextual bandits that integrate balancing methods from the causal inference literature into their estimation to make it less prone to estimation bias. We provide the first regret bound analyses for contextual bandits with balancing in the domain of linear contextual bandits, and these bounds match the state-of-the-art regret bounds. We demonstrate the strong practical advantage of balanced contextual bandits on a large number of supervised learning datasets and on a synthetic example that simulates model mis-specification and prejudice in the initial training data. Additionally, we develop contextual bandits with simpler assignment policies by leveraging sparse model estimation methods from the econometrics literature, and we demonstrate empirically that in the early stages they can improve the rate of learning and decrease regret.
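
As a rough illustration of what integrating a balancing method into the estimation step can look like, the sketch below fits a per-arm ridge regression in which each observation is weighted by the inverse of the probability with which its arm was chosen. This is a minimal sketch under stated assumptions, not the authors' implementation: the choice of inverse propensity weights, the names `solve` and `balanced_ridge`, and the toy data are all hypothetical and introduced here only for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def balanced_ridge(contexts, rewards, propensities, lam=1.0):
    """Ridge fit for one arm, weighting each observation by 1 / propensity.

    Assumption: `propensities` holds the probability with which the arm
    was selected for each observed context (hypothetical input).
    """
    d = len(contexts[0])
    A = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for x, r, p in zip(contexts, rewards, propensities):
        w = 1.0 / p  # rarely chosen contexts receive more weight
        for i in range(d):
            b[i] += w * x[i] * r
            for j in range(d):
                A[i][j] += w * x[i] * x[j]
    return solve(A, b)


# Toy data for a single arm; rewards are exactly 2*x1 + 3*x2.
contexts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rewards = [2.0, 3.0, 5.0]
propensities = [0.5, 0.25, 0.8]
beta = balanced_ridge(contexts, rewards, propensities, lam=0.0)
print(beta)
```

When the linear model is correctly specified, as in the toy data, the weights do not change the fitted coefficients. They matter when the model is mis-specified, because the weighted fit then gives more influence to contexts that the assignment policy under-sampled.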

Suggested Citation

  • Maria Dimakopoulou & Zhengyuan Zhou & Susan Athey & Guido Imbens, 2017. "Estimation Considerations in Contextual Bandits," Papers 1711.07077, arXiv.org, revised Dec 2018.
  • Handle: RePEc:arx:papers:1711.07077

    Download full text from publisher

    File URL: http://arxiv.org/pdf/1711.07077
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    2. Athey, Susan & Wager, Stefan, 2017. "Efficient Policy Learning," Research Papers 3506, Stanford University, Graduate School of Business.
    3. Imbens, Guido W. & Rubin, Donald B., 2015. "Causal Inference for Statistics, Social, and Biomedical Sciences," Cambridge Books, Cambridge University Press, number 9780521885881, September.

    Citations


    Cited by:

    1. Caio Waisman & Harikesh S. Nair & Carlos Carrion, 2019. "Online Causal Inference for Advertising in Real-Time Bidding Auctions," Papers 1908.08600, arXiv.org, revised Feb 2024.
    2. Yusuke Narita & Shota Yasui & Kohei Yata, 2020. "Debiased Off-Policy Evaluation for Recommendation Systems," Papers 2002.08536, arXiv.org, revised Aug 2021.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rina Friedberg & Julie Tibshirani & Susan Athey & Stefan Wager, 2018. "Local Linear Forests," Papers 1807.11408, arXiv.org, revised Sep 2020.
    2. Shinde, Nilesh N. & Do Valle, Stella Z. Schons & Maia, Alexandre Gori & Amacher, Gregory S., 2022. "Can an environmental policy contribute to the reduction of land conflict? Evidence from the Rural Environmental Registry (CAR) in the Brazilian Amazon," 2022 Annual Meeting, July 31-August 2, Anaheim, California 322584, Agricultural and Applied Economics Association.
    3. Zhengyuan Zhou & Susan Athey & Stefan Wager, 2023. "Offline Multi-Action Policy Learning: Generalization and Optimization," Operations Research, INFORMS, vol. 71(1), pages 148-183, January.
    4. Susan Athey & Raj Chetty & Guido Imbens, 2020. "Combining Experimental and Observational Data to Estimate Treatment Effects on Long Term Outcomes," Papers 2006.09676, arXiv.org.
    5. Valente, Marica, 2023. "Policy evaluation of waste pricing programs using heterogeneous causal effect estimation," Journal of Environmental Economics and Management, Elsevier, vol. 117(C).
    6. Miruna Oprescu & Vasilis Syrgkanis & Zhiwei Steven Wu, 2018. "Orthogonal Random Forest for Causal Inference," Papers 1806.03467, arXiv.org, revised Sep 2019.
    7. Michael C Knaus, 2022. "Double machine learning-based programme evaluation under unconfoundedness [Econometric methods for program evaluation]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 602-627.
    8. Newham, Melissa & Valente, Marica, 2024. "The cost of influence: How gifts to physicians shape prescriptions and drug costs," Journal of Health Economics, Elsevier, vol. 95(C).
    9. Davide Viviano, 2019. "Policy Targeting under Network Interference," Papers 1906.10258, arXiv.org, revised Apr 2024.
    10. Masahiro Kato & Masaaki Imaizumi & Takuya Ishihara & Toru Kitagawa, 2022. "Best Arm Identification with Contextual Information under a Small Gap," Papers 2209.07330, arXiv.org, revised Jan 2023.
    11. Maria Dimakopoulou & Zhimei Ren & Zhengyuan Zhou, 2021. "Online Multi-Armed Bandits with Adaptive Inference," Papers 2102.13202, arXiv.org, revised Jun 2021.
    12. Qingyuan Zhao & Dylan S. Small & Ashkan Ertefaie, 2022. "Selective inference for effect modification via the lasso," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(2), pages 382-413, April.
    13. Vishal Gupta & Brian Rongqing Han & Song-Hee Kim & Hyung Paek, 2020. "Maximizing Intervention Effectiveness," Management Science, INFORMS, vol. 66(12), pages 5576-5598, December.
    14. Nathan Kallus, 2022. "Treatment Effect Risk: Bounds and Inference," Papers 2201.05893, arXiv.org, revised Jul 2022.
    15. Mert Demirer & Vasilis Syrgkanis & Greg Lewis & Victor Chernozhukov, 2019. "Semi-Parametric Efficient Policy Learning with Continuous Actions," CeMMAP working papers CWP34/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    16. Hema Yoganarasimhan & Ebrahim Barzegary & Abhishek Pani, 2020. "Design and Evaluation of Personalized Free Trials," Papers 2006.13420, arXiv.org.
    17. Davide Viviano & Jelena Bradic, 2020. "Fair Policy Targeting," Papers 2005.12395, arXiv.org, revised Jun 2022.
    18. Gabriel Okasa, 2022. "Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance," Papers 2201.12692, arXiv.org.
    19. Bo, Hao & Galiani, Sebastian, 2021. "Assessing external validity," Research in Economics, Elsevier, vol. 75(3), pages 274-285.
    20. Sven Resnjanskij & Jens Ruhose & Simon Wiederhold & Ludger Wößmann, 2021. "Mentoring verbessert die Arbeitsmarktchancen von stark benachteiligten Jugendlichen," ifo Schnelldienst, ifo Institute - Leibniz Institute for Economic Research at the University of Munich, vol. 74(02), pages 31-38, February.
