
Best Arm Identification with Contextual Information under a Small Gap

Authors
  • Masahiro Kato
  • Masaaki Imaizumi
  • Takuya Ishihara
  • Toru Kitagawa

Abstract

We study the best-arm identification (BAI) problem with a fixed budget and contextual (covariate) information. In each round of an adaptive experiment, after observing the contextual information, we choose a treatment arm using past observations and the current context. Our goal is to identify the best treatment arm, defined as the arm with the highest expected reward marginalized over the context distribution, with the smallest possible probability of misidentification. We consider a class of nonparametric bandit models that converge to location-shift models as the gaps go to zero. First, we derive lower bounds on the misidentification probability for a certain class of strategies and bandit models (probabilistic models of potential outcomes) under a small-gap regime. A small-gap regime is one in which the gaps in expected reward between the best and the suboptimal treatment arms go to zero, which corresponds to one of the worst cases for identifying the best treatment arm. We then develop the "Random Sampling (RS)-Augmented Inverse Probability Weighting (AIPW) strategy," which is asymptotically optimal in the sense that its probability of misidentification matches the lower bound as the budget goes to infinity in the small-gap regime. The RS-AIPW strategy consists of an RS rule that tracks a target sample allocation ratio and a recommendation rule that uses the AIPW estimator.
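
To make the two components named in the abstract concrete, the following is a minimal Python sketch of a random-sampling rule combined with an AIPW-based recommendation. It is an illustration under stated assumptions, not the authors' implementation: contexts are discrete, rewards are Gaussian, the conditional standard deviations are treated as known, and the target allocation ratio is assumed to be proportional to those standard deviations. The abstract does not specify any of these details, and the function name `rs_aipw` and its parameters are hypothetical.

```python
import numpy as np


def rs_aipw(T, means, sds, ctx_probs, rng):
    """Sketch of random sampling plus an AIPW recommendation.

    means, sds: arrays of shape (K, M) giving the conditional reward mean and
    standard deviation for each of K arms and M discrete contexts.
    ctx_probs: length-M context distribution. All inputs are illustrative.
    """
    K, M = means.shape
    # ASSUMPTION: target allocation proportional to the conditional std. dev.,
    # with the std. devs. known (a real strategy would have to estimate them).
    w = sds / sds.sum(axis=0, keepdims=True)  # shape (K, M); columns sum to 1

    sums = np.zeros((K, M))    # running reward sums per (arm, context)
    counts = np.zeros((K, M))  # running pull counts per (arm, context)
    aipw_total = np.zeros(K)   # accumulated AIPW scores per arm

    for _ in range(T):
        x = rng.choice(M, p=ctx_probs)          # observe the context
        a = rng.choice(K, p=w[:, x])            # RS rule: draw arm from w(.|x)
        y = rng.normal(means[a, x], sds[a, x])  # observe the reward

        # Plug-in conditional means built from past rounds only
        # (0 when an (arm, context) cell has not been observed yet).
        f_hat = np.divide(sums[:, x], counts[:, x],
                          out=np.zeros(K), where=counts[:, x] > 0)

        # AIPW score: plug-in mean for every arm, plus an inverse-probability
        # weighted residual correction for the arm that was actually pulled.
        score = f_hat.copy()
        score[a] += (y - f_hat[a]) / w[a, x]
        aipw_total += score

        # Update the running statistics after the score is computed.
        sums[a, x] += y
        counts[a, x] += 1

    mu_hat = aipw_total / T  # AIPW estimate of each arm's marginal mean
    return int(np.argmax(mu_hat)), mu_hat


# Toy usage: two arms, two equally likely contexts, unit variances.
rng = np.random.default_rng(0)
means = np.array([[1.0, 1.2], [0.0, 0.1]])
sds = np.ones((2, 2))
best, mu_hat = rs_aipw(2000, means, sds, np.array([0.5, 0.5]), rng)
print(best, mu_hat)
```

The toy usage has a large gap between the arms so that the recommendation is easy to check; the small-gap regime studied in the paper corresponds to making the rows of `means` nearly equal, where the choice of allocation ratio matters most.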

Suggested Citation

  • Masahiro Kato & Masaaki Imaizumi & Takuya Ishihara & Toru Kitagawa, 2022. "Best Arm Identification with Contextual Information under a Small Gap," Papers 2209.07330, arXiv.org, revised Jan 2023.
  • Handle: RePEc:arx:papers:2209.07330

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2209.07330
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Masahiro Kato & Masaaki Imaizumi & Takuya Ishihara & Toru Kitagawa, 2023. "Asymptotically Optimal Fixed-Budget Best Arm Identification with Variance-Dependent Bounds," Papers 2302.02988, arXiv.org, revised Jul 2023.
    2. Davide Viviano & Jelena Bradic, 2020. "Fair Policy Targeting," Papers 2005.12395, arXiv.org, revised Jun 2022.
    3. Masahiro Kato, 2021. "Adaptive Doubly Robust Estimator from Non-stationary Logging Policy under a Convergence of Average Probability," Papers 2102.08975, arXiv.org, revised Mar 2021.
    4. Yuehao Bai & Azeem M. Shaikh & Max Tabord-Meehan, 2024. "A Primer on the Analysis of Randomized Experiments and a Survey of some Recent Advances," Papers 2405.03910, arXiv.org.
    5. Undral Byambadalai, 2022. "Identification and Inference for Welfare Gains without Unconfoundedness," Papers 2207.04314, arXiv.org.
    6. Kock, Anders Bredahl & Preinerstorfer, David & Veliyev, Bezirgen, 2023. "Treatment recommendation with distributional targets," Journal of Econometrics, Elsevier, vol. 234(2), pages 624-646.
    7. Susan Athey & Stefan Wager, 2021. "Policy Learning With Observational Data," Econometrica, Econometric Society, vol. 89(1), pages 133-161, January.
    8. Davide Viviano, 2019. "Policy Targeting under Network Interference," Papers 1906.10258, arXiv.org, revised Apr 2024.
    9. Masahiro Kato, 2023. "Worst-Case Optimal Multi-Armed Gaussian Best Arm Identification with a Fixed Budget," Papers 2310.19788, arXiv.org, revised Mar 2024.
    10. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    11. Anders Bredahl Kock & Martin Thyrsgaard, 2017. "Optimal sequential treatment allocation," Papers 1705.09952, arXiv.org, revised Aug 2018.
    12. Garbero, Alessandra & Sakos, Grayson & Cerulli, Giovanni, 2023. "Towards data-driven project design: Providing optimal treatment rules for development projects," Socio-Economic Planning Sciences, Elsevier, vol. 89(C).
    13. Susan Athey & Raj Chetty & Guido Imbens, 2020. "Combining Experimental and Observational Data to Estimate Treatment Effects on Long Term Outcomes," Papers 2006.09676, arXiv.org.
    14. Guido W. Imbens & Jeffrey M. Wooldridge, 2009. "Recent Developments in the Econometrics of Program Evaluation," Journal of Economic Literature, American Economic Association, vol. 47(1), pages 5-86, March.
    15. Michael C Knaus, 2022. "Double machine learning-based programme evaluation under unconfoundedness [Econometric methods for program evaluation]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 602-627.
    16. Kyle Colangelo & Ying-Ying Lee, 2020. "Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments," Papers 2004.03036, arXiv.org, revised Sep 2023.
    17. Keisuke Hirano & Jack R. Porter, 2016. "Panel Asymptotics and Statistical Decision Theory," The Japanese Economic Review, Japanese Economic Association, vol. 67(1), pages 33-49, March.
    18. Timothy Christensen & Hyungsik Roger Moon & Frank Schorfheide, 2020. "Robust Forecasting," Papers 2011.03153, arXiv.org, revised Dec 2020.
    19. Juliano Assunção & Robert McMillan & Joshua Murphy & Eduardo Souza-Rodrigues, 2019. "Optimal Environmental Targeting in the Amazon Rainforest," NBER Working Papers 25636, National Bureau of Economic Research, Inc.
    20. Anders Bredahl Kock & David Preinerstorfer & Bezirgen Veliyev, 2022. "Functional Sequential Treatment Allocation," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 117(539), pages 1311-1323, September.
