IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v180y2023ics0167947322001906.html
   My bibliography  Save this article

Empirical Gittins index strategies with ε-explorations for multi-armed bandit problems

Author

Listed:
  • Li, Xiao
  • Li, Yuqiang
  • Wu, Xianyi

Abstract

The machine learning/statistics literature has so far considered largely multi-armed bandit (MAB) problems in which the rewards from every arm are assumed independent and identically distributed. For more general MAB models in which every arm evolves according to a rewarded Markov process, it is well known the optimal policy is to pull an arm with the highest Gittins index. When the underlying distributions are unknown, an empirical Gittins index rule with ε-exploration (abbreviated as empirical ε-Gittinx index rule) is proposed to solve such MAB problems. This procedure is constructed by combining the idea of ε-exploration (for exploration) and empirical Gittins indices (for exploitation) computed by applying the Largest-Remaining-Index algorithm to the estimated underlying distribution. The convergence of empirical Gittins indices to the true Gittins indices and expected discounted total rewards of the empirical ε-Gittinx index rule to those of the oracle Gittins index rule is provided. A numerical simulation study is demonstrated to show the behavior of the proposed policies, and its performance over the ε-mean reward is discussed.

Suggested Citation

  • Li, Xiao & Li, Yuqiang & Wu, Xianyi, 2023. "Empirical Gittins index strategies with ε-explorations for multi-armed bandit problems," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
  • Handle: RePEc:eee:csdana:v:180:y:2023:i:c:s0167947322001906
    DOI: 10.1016/j.csda.2022.107610
    as

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947322001906
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2022.107610?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. Xiaoqiang Cai & Xianyi Wu & Xian Zhou, 2014. "Optimal Stochastic Scheduling," International Series in Operations Research and Management Science, Springer, edition 127, number 978-1-4899-7405-1, December.
    2. S. Duran & U. Ayesta & I. M. Verloop, 2022. "On the Whittle index of Markov modulated restless bandits," Queueing Systems: Theory and Applications, Springer, vol. 102(3), pages 373-430, December.
    3. Dirk Bergemann & Ulrigh Hege, 2005. "The Financing of Innovation: Learning and Stopping," RAND Journal of Economics, The RAND Corporation, vol. 36(4), pages 719-752, Winter.
    4. Bergemann, Dirk & Hege, Ulrich, 1998. "Venture capital financing, moral hazard, and learning," Journal of Banking & Finance, Elsevier, vol. 22(6-8), pages 703-735, August.
    5. Bank, Peter & Küchler, Christian, 2007. "On Gittins' index theorem in continuous time," Stochastic Processes and their Applications, Elsevier, vol. 117(9), pages 1357-1371, September.
    6. Paat Rusmevichientong & John N. Tsitsiklis, 2010. "Linearly Parameterized Bandits," Mathematics of Operations Research, INFORMS, vol. 35(2), pages 395-411, May.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alessandro Spiganti, 2022. "Wealth Inequality and the Exploration of Novel Alternatives," Working Papers 2022:02, Department of Economics, University of Venice "Ca' Foscari".
    2. Johannes Hörner & Larry Samuelson, 2013. "Incentives for experimenting agents," RAND Journal of Economics, RAND Corporation, vol. 44(4), pages 632-663, December.
    3. Dirk Bergemann & Ulrich Hege & Liang Peng, 2008. "Venture Capital and Sequential Investments," Cowles Foundation Discussion Papers 1682R2, Cowles Foundation for Research in Economics, Yale University, revised Oct 2009.
    4. Besanko, David & Tong, Jian & Wu, Jianjun, 2016. "Subsidizing research programs with "if" and "when" uncertainty in the face of severe informational constraints," Discussion Paper Series In Economics And Econometrics 1605, Economics Division, School of Social Sciences, University of Southampton.
    5. Khalil, Fahad & Lawarree, Jacques & Rodivilov, Alexander, 2020. "Learning from failures: Optimal contracts for experimentation and production," Journal of Economic Theory, Elsevier, vol. 190(C).
    6. Arthur Charpentier & Romuald Élie & Carl Remlinger, 2023. "Reinforcement Learning in Economics and Finance," Computational Economics, Springer;Society for Computational Economics, vol. 62(1), pages 425-462, June.
    7. , & ,, 2012. "A principal-agent model of sequential testing," Theoretical Economics, Econometric Society, vol. 7(3), September.
    8. Ramana Nanda & William R. Kerr, 2015. "Financing Innovation," Annual Review of Financial Economics, Annual Reviews, vol. 7(1), pages 445-462, December.
    9. Mikhail Drugov & Rocco Macchiavello, 2014. "Financing Experimentation," American Economic Journal: Microeconomics, American Economic Association, vol. 6(1), pages 315-349, February.
    10. Rodivilov, Alexander, 2022. "Monitoring innovation," Games and Economic Behavior, Elsevier, vol. 135(C), pages 297-326.
    11. Mayer, Simon, 2022. "Financing breakthroughs under failure risk," Journal of Financial Economics, Elsevier, vol. 144(3), pages 807-848.
    12. Xu Tan & Quan Wen, 2020. "Information acquisition and voting with heterogeneous experts," RAND Journal of Economics, RAND Corporation, vol. 51(4), pages 1063-1092, December.
    13. Martin Cripps & Godfrey Keller & Sven Rady, 2000. "Strategic Experimentation: The Case of the Poisson Bandits," Econometric Society World Congress 2000 Contributed Papers 0878, Econometric Society.
    14. Bruno Strulovici, 2010. "Learning While Voting: Determinants of Collective Experimentation," Econometrica, Econometric Society, vol. 78(3), pages 933-971, May.
    15. Dirk Bergemann & Ulrich Hege, 2002. "The Value of Benchmarking," Cowles Foundation Discussion Papers 1379, Cowles Foundation for Research in Economics, Yale University, revised Oct 2002.
    16. Chia-Hui Chen & Junichiro Ishida, 2015. "A Tenure-Clock Problem," ISER Discussion Paper 0919, Institute of Social and Economic Research, Osaka University.
    17. Dahiya, Sandeep & Ray, Korok, 2012. "Staged investments in entrepreneurial financing," Journal of Corporate Finance, Elsevier, vol. 18(5), pages 1193-1216.
    18. Grenadier, Steven R. & Malenko, Andrey & Strebulaev, Ilya A., 2014. "Investment busts, reputation, and the temptation to blend in with the crowd," Journal of Financial Economics, Elsevier, vol. 111(1), pages 137-157.
    19. Alessandro Spiganti, 2020. "Can Starving Start‐ups Beat Fat Labs? A Bandit Model of Innovation with Endogenous Financing Constraint," Scandinavian Journal of Economics, Wiley Blackwell, vol. 122(2), pages 702-731, April.
    20. Arthur Charpentier & Romuald Elie & Carl Remlinger, 2020. "Reinforcement Learning in Economics and Finance," Papers 2003.10014, arXiv.org.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:180:y:2023:i:c:s0167947322001906. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.elsevier.com/locate/csda .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.