Empirical Gittins index strategies with ε-explorations for multi-armed bandit problems


Author

Listed:

Li, Xiao
Li, Yuqiang
Wu, Xianyi

Abstract

The machine learning and statistics literature has so far largely considered multi-armed bandit (MAB) problems in which the rewards from every arm are assumed to be independent and identically distributed. For more general MAB models in which every arm evolves according to a rewarded Markov process, it is well known that the optimal policy is to pull an arm with the highest Gittins index. When the underlying distributions are unknown, an empirical Gittins index rule with ε-exploration (abbreviated as the empirical ε-Gittinx index rule) is proposed to solve such MAB problems. The procedure combines ε-exploration (for exploration) with empirical Gittins indices (for exploitation), which are computed by applying the Largest-Remaining-Index algorithm to the estimated underlying distribution. Convergence is established for the empirical Gittins indices to the true Gittins indices, and for the expected discounted total rewards of the empirical ε-Gittinx index rule to those of the oracle Gittins index rule. A numerical simulation study illustrates the behavior of the proposed policies, and their performance relative to the ε-mean reward is discussed.
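The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each arm is a finite-state rewarded Markov chain, that the model is estimated by plug-in transition frequencies and sample-mean rewards, and that ε is a fixed constant; the abstract states none of these details. The function names (gittins_indices, empirical_model, choose_arm) are hypothetical. The index computation follows the usual largest-remaining-index recursion: states are ranked from the highest index downward, and each new index is the ratio of expected discounted reward to expected discounted time while the chain stays among the states already ranked.

```python
import numpy as np

def gittins_indices(P, r, beta):
    """Gittins indices of a finite-state rewarded Markov chain.

    P: (n, n) transition matrix, r: (n,) rewards, beta: discount in (0, 1).
    States are ranked one at a time, largest index first.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)
    idx = np.zeros(n)
    ranked = np.zeros(n, dtype=bool)  # states whose index is already known
    for _ in range(n):
        # Keep only transitions INTO ranked states (the continuation set).
        Q = P * ranked
        A = np.eye(n) - beta * Q
        reward = np.linalg.solve(A, r)          # discounted reward until exit
        time = np.linalg.solve(A, np.ones(n))   # discounted time until exit
        ratio = np.where(ranked, -np.inf, reward / time)
        k = int(np.argmax(ratio))
        idx[k] = ratio[k]
        ranked[k] = True
    return idx

def empirical_model(trans_counts, reward_sums):
    """Plug-in estimates (P_hat, r_hat) for one arm.

    trans_counts[s, t]: observed transitions s -> t.
    reward_sums[s]: total reward collected when the arm was pulled in state s.
    Assumption: unvisited states get a uniform row and zero reward.
    """
    trans_counts = np.asarray(trans_counts, dtype=float)
    visits = trans_counts.sum(axis=1)
    n = len(visits)
    safe = np.maximum(visits, 1.0)
    P_hat = np.where(visits[:, None] > 0, trans_counts / safe[:, None], 1.0 / n)
    r_hat = np.asarray(reward_sums, dtype=float) / safe
    return P_hat, r_hat

def choose_arm(states, models, beta, eps, rng):
    """With probability eps explore uniformly, otherwise exploit.

    states[a] is the current state of arm a; models[a] is (P_hat, r_hat).
    """
    if rng.random() < eps:
        return int(rng.integers(len(states)))
    scores = [gittins_indices(P, r, beta)[s] for (P, r), s in zip(models, states)]
    return int(np.argmax(scores))
```

In a simulation loop one would update trans_counts and reward_sums after every pull, re-estimate the model of the pulled arm, and call choose_arm again. Recomputing all indices at every step is the simplest choice but not the cheapest; caching the indices per arm until its estimate changes would avoid repeated linear solves.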

Suggested Citation

Li, Xiao & Li, Yuqiang & Wu, Xianyi, 2023. "Empirical Gittins index strategies with ε-explorations for multi-armed bandit problems," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).

Handle: RePEc:eee:csdana:v:180:y:2023:i:c:s0167947322001906
DOI: 10.1016/j.csda.2022.107610

References listed on IDEAS

Xiaoqiang Cai & Xianyi Wu & Xian Zhou, 2014. "Optimal Stochastic Scheduling," International Series in Operations Research and Management Science, Springer, edition 127, number 978-1-4899-7405-1, April.
S. Duran & U. Ayesta & I. M. Verloop, 2022. "On the Whittle index of Markov modulated restless bandits," Queueing Systems: Theory and Applications, Springer, vol. 102(3), pages 373-430, December.
Dirk Bergemann & Ulrich Hege, 2005. "The Financing of Innovation: Learning and Stopping," RAND Journal of Economics, The RAND Corporation, vol. 36(4), pages 719-752, Winter.
- Bergemann, D. & Hege, U., 2001. "The Financing of Innovation : Learning and Stopping," Discussion Paper 2001-16, Tilburg University, Center for Economic Research.
- Bergemann, D. & Hege, U., 2001. "The Financing of Innovation : Learning and Stopping," Other publications TiSEM 85bb8c47-af02-4c41-88b4-0, Tilburg University, School of Economics and Management.
- Hege, Ulrich & Bergemann, Dirk, 2001. "The Financing of Innovation: Learning and Stopping," CEPR Discussion Papers 2763, C.E.P.R. Discussion Papers.
- Ulrich Hege & D. Bergemann, 2012. "The Financing of Innovation: Learning and Stopping," Working Papers hal-00759793, HAL.
- Ulrich Hege & Dirk Bergemann, 2005. "The Financing of Innovation: Learning and Stopping," Post-Print hal-00459926, HAL.
- Dirk Bergemann & Ulrich Hege, 2001. "The Financing of Innovation: Learning and Stopping," Cowles Foundation Discussion Papers 1292, Cowles Foundation for Research in Economics, Yale University.
- Dirk Bergemann & Ulrich Hege, 2001. "The Financing of Innovation: Learning and Stopping," Cowles Foundation Discussion Papers 1292R, Cowles Foundation for Research in Economics, Yale University, revised Oct 2004.
Bergemann, Dirk & Hege, Ulrich, 1998. "Venture capital financing, moral hazard, and learning," Journal of Banking & Finance, Elsevier, vol. 22(6-8), pages 703-735, August.
- Bergemann, Dirk & Hege, Ulrich, 1997. "Venture Capital Financing, Moral Hazard and Learning," CEPR Discussion Papers 1738, C.E.P.R. Discussion Papers.
- Ulrich Hege & Dirk Bergemann, 1998. "Venture capital financing, moral hazard, and learning," Post-Print hal-00481696, HAL.
- Bergemann, D. & Hege, U., 1997. "Venture Capital Financing, Moral Hazard and Learning," Other publications TiSEM d70119dd-1d85-4dde-9d59-1, Tilburg University, School of Economics and Management.
Bank, Peter & Küchler, Christian, 2007. "On Gittins' index theorem in continuous time," Stochastic Processes and their Applications, Elsevier, vol. 117(9), pages 1357-1371, September.
Paat Rusmevichientong & John N. Tsitsiklis, 2010. "Linearly Parameterized Bandits," Mathematics of Operations Research, INFORMS, vol. 35(2), pages 395-411, May.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Alessandro Spiganti, 2022. "Wealth Inequality and the Exploration of Novel Alternatives," Working Papers 2022:02, Department of Economics, University of Venice "Ca' Foscari".
Johannes Hörner & Larry Samuelson, 2013. "Incentives for experimenting agents," RAND Journal of Economics, RAND Corporation, vol. 44(4), pages 632-663, December.
- Johannes Horner & Larry Samuelson, 2009. "Incentives for Experimenting Agents," Cowles Foundation Discussion Papers 1726R, Cowles Foundation for Research in Economics, Yale University, revised Feb 2012.
- Johannes Horner & Larry Samuelson, 2009. "Incentives for Experimenting Agents," Cowles Foundation Discussion Papers 1726R3, Cowles Foundation for Research in Economics, Yale University, revised Jun 2013.
- Johannes Horner & Larry Samuelson, 2009. "Incentives for Experimenting Agents," Cowles Foundation Discussion Papers 1726, Cowles Foundation for Research in Economics, Yale University.
- Johannes Horner & Larry Samuelson, 2012. "Incentives for Experimenting Agents," Levine's Working Paper Archive 786969000000000418, David K. Levine.
- Johannes Horner & Larry Samuelson, 2013. "Incentives for Experimenting Agents," Levine's Working Paper Archive 786969000000000671, David K. Levine.
- Johannes Horner & Larry Samuelson, 2009. "Incentives for Experimenting Agents," Cowles Foundation Discussion Papers 1726R2, Cowles Foundation for Research in Economics, Yale University, revised Mar 2013.
Dirk Bergemann & Ulrich Hege & Liang Peng, 2008. "Venture Capital and Sequential Investments," Cowles Foundation Discussion Papers 1682R2, Cowles Foundation for Research in Economics, Yale University, revised Oct 2009.
- Ulrich Hege, 2011. "Venture Capital and Sequential Investments," Post-Print hal-00577880, HAL.
- Dirk Bergemann & Ulrich Hege & Liang Peng, 2009. "Venture Capital and Sequential Investments," Levine's Working Paper Archive 814577000000000046, David K. Levine.
- Ulrich Hege, 2009. "Venture Capital and Sequential Investments," Post-Print hal-00496178, HAL.
- Ulrich Hege, 2010. "Venture Capital and Sequential Investments," Post-Print hal-00554148, HAL.
- Ulrich Hege, 2011. "Venture Capital and Sequential Investments," Post-Print hal-00577896, HAL.
- Dirk Bergemann & Ulrich Hege & Liang Peng, 2008. "Venture Capital and Sequential Investments," Cowles Foundation Discussion Papers 1682R, Cowles Foundation for Research in Economics, Yale University, revised Mar 2009.
- Dirk Bergemann & Ulrich Hege & Liang Peng, 2008. "Venture Capital and Sequential Investments," Cowles Foundation Discussion Papers 1682, Cowles Foundation for Research in Economics, Yale University, revised Nov 2008.
- Ulrich Hege & Dirk Bergemann & Liang Peng, 2012. "Venture Capital and Sequential Investments," Working Papers hal-00759784, HAL.
- Ulrich Hege, 2011. "Venture Capital and Sequential Investments," Post-Print hal-00577892, HAL.
Sarah Armitage & Noël Bakhtian & Adam Jaffe, 2024. "Innovation Market Failures and the Design of New Climate Policy Instruments," Environmental and Energy Policy and the Economy, University of Chicago Press, vol. 5(1), pages 4-48.
- Sarah Armitage & Noël Bakhtian & Adam B. Jaffe, 2023. "Innovation Market Failures and the Design of New Climate Policy Instruments," NBER Chapters, in: Environmental and Energy Policy and the Economy, volume 5, pages 4-48, National Bureau of Economic Research, Inc.
- Sarah C. Armitage & Noël Bakhtian & Adam B. Jaffe, 2023. "Innovation Market Failures and the Design of New Climate Policy Instruments," NBER Working Papers 31622, National Bureau of Economic Research, Inc.
Nicolas Klein & Tymofiy Mylovanov, 2011. "Should the Flatterers be Avoided?," 2011 Meeting Papers 1273, Society for Economic Dynamics.
Mikhail Drugov & Rocco Macchiavello, 2014. "Financing Experimentation," American Economic Journal: Microeconomics, American Economic Association, vol. 6(1), pages 315-349, February.
- Macchiavello, Rocco, 2013. "Financing Experimentation," The Warwick Economics Research Paper Series (TWERPS) 1025, University of Warwick, Department of Economics.
- Mikhail Drugov & Rocco Macchiavello, 2014. "Financing experimentation," LSE Research Online Documents on Economics 68219, London School of Economics and Political Science, LSE Library.
Besanko, David & Tong, Jian & Wu, Jianjun, 2016. "Subsidizing research programs with "if" and "when" uncertainty in the face of severe informational constraints," Discussion Paper Series In Economics And Econometrics 1605, Economics Division, School of Social Sciences, University of Southampton.
Heidhues, Paul & Rady, Sven & Strack, Philipp, 2015. "Strategic experimentation with private payoffs," Journal of Economic Theory, Elsevier, vol. 159(PA), pages 531-551.
- Heidhues, Paul & Rady, Sven & Strack, Philipp, 2012. "Strategic Experimentation with Private Payoffs," Discussion Paper Series of SFB/TR 15 Governance and the Efficiency of Economic Systems 387, Free University of Berlin, Humboldt University of Berlin, University of Bonn, University of Mannheim, University of Munich.
- Rady, Sven & Heidhues, Paul & Strack, Philipp, 2015. "Strategic Experimentation with Private Payoffs," CEPR Discussion Papers 10634, C.E.P.R. Discussion Papers.
Khalil, Fahad & Lawarree, Jacques & Rodivilov, Alexander, 2020. "Learning from failures: Optimal contracts for experimentation and production," Journal of Economic Theory, Elsevier, vol. 190(C).
- Fahad Khalil & Jacques Lawarree & Alexander Rodivilov, 2018. "Learning from Failures: Optimal Contract for Experimentation and Production," CESifo Working Paper Series 7310, CESifo.
Klein, Nicolas, 2016. "The importance of being honest," Theoretical Economics, Econometric Society, vol. 11(3), September.
Arthur Charpentier & Romuald Élie & Carl Remlinger, 2023. "Reinforcement Learning in Economics and Finance," Computational Economics, Springer;Society for Computational Economics, vol. 62(1), pages 425-462, June.
Dino Gerardi & Lucas Maestri, 2012. "A principal-agent model of sequential testing," Theoretical Economics, Econometric Society, vol. 7(3), September.
- Dino Gerardi & Lucas Maestri, 2008. "A Principal-Agent Model of Sequential Testing," Cowles Foundation Discussion Papers 1680, Cowles Foundation for Research in Economics, Yale University.
- Dino Gerardi & Lucas Maestri, 2009. "A Principal-Agent Model of Sequential Testing," Levine's Working Paper Archive 814577000000000076, David K. Levine.
- Dino Gerardi & Lucas Maestri, 2009. "A Principal-Agent Model of Sequential Testing," Carlo Alberto Notebooks 115, Collegio Carlo Alberto.
Ramana Nanda & William R. Kerr, 2015. "Financing Innovation," Annual Review of Financial Economics, Annual Reviews, vol. 7(1), pages 445-462, December.
- William R. Kerr & Ramana Nanda, 2014. "Financing Innovation," NBER Working Papers 20676, National Bureau of Economic Research, Inc.
- Kerr, William R. & Nanda, Ramana, 2015. "Financing innovation," Research Discussion Papers 28/2015, Bank of Finland.
Mikhail Drugov & Rocco Macchiavello, 2014. "Financing Experimentation," American Economic Journal: Microeconomics, American Economic Association, vol. 6(1), pages 315-349, February.
- Macchiavello, Rocco, 2013. "Financing Experimentation," Economic Research Papers 270434, University of Warwick - Department of Economics.
- Drugov, Mikhail & Macchiavello, Rocco, 2014. "Financing experimentation," LSE Research Online Documents on Economics 68219, London School of Economics and Political Science, LSE Library.
Sorensen, Morten, 2007. "Learning by Investing: Evidence from Venture Capital," SIFR Research Report Series 53, Institute for Financial Research.
Rodivilov, Alexander, 2022. "Monitoring innovation," Games and Economic Behavior, Elsevier, vol. 135(C), pages 297-326.
Makoto WATANABE & Yu Awaya & Hiroki Fukai, 2024. "Transparency vs Privacy in Credit Markets," CIGS Working Paper Series 24-022E, The Canon Institute for Global Studies.
Mayer, Simon, 2022. "Financing breakthroughs under failure risk," Journal of Financial Economics, Elsevier, vol. 144(3), pages 807-848.
Kanatas George & Stefanadis Christodoulos, 2010. "Can Venture Capital Be a Curse?," The B.E. Journal of Economic Analysis & Policy, De Gruyter, vol. 10(1), pages 1-28, July.
Godfrey Keller & Sven Rady & Martin Cripps, 2005. "Strategic Experimentation with Exponential Bandits," Econometrica, Econometric Society, vol. 73(1), pages 39-68, January.
- Rady, Sven & Cripps, Martin William & Keller, R Godfrey, 2003. "Strategic Experimentation with Exponential Bandits," CEPR Discussion Papers 3814, C.E.P.R. Discussion Papers.
- Godfrey Keller & Martin Cripps & Sven Rady, 2003. "Strategic Experimentation with Exponential Bandits," Economics Series Working Papers 143, University of Oxford, Department of Economics.
- Cripps, Martin & Keller, Godfrey & Rady, Sven, 2003. "Strategic Experimentation with Exponential Bandits," Discussion Papers in Economics 4, University of Munich, Department of Economics.

More about this item

Keywords

Multi-armed bandit problem; Reinforcement learning; Rewarded Markov process; Gittins index; Empirical Gittins index;
