Printed from https://ideas.repec.org/a/eee/ejores/v312y2024i2p627-640.html

Index policy for multiarmed bandit problem with dynamic risk measures

Author

Listed:
  • Malekipirbazari, Milad
  • Çavuş, Özlem

Abstract

The multiarmed bandit problem (MAB) is a classic problem in which a finite amount of resources must be allocated among competing choices, with the aim of identifying a policy that maximizes the expected total reward. MAB has a wide range of applications, including clinical trials, portfolio design, parameter tuning, internet advertising, auction mechanisms, adaptive routing in networks, and project management. The classical MAB makes the strong assumption that the decision maker is risk-neutral and indifferent to the variability of the outcome. However, in many real-life applications this assumption does not hold, and decision makers are risk-averse. To address this, we study risk-averse control of the multiarmed bandit problem using dynamic coherent risk measures, seeking a policy with the best risk-adjusted total discounted return. For this setting, we present a theoretical analysis based on Whittle’s retirement problem and propose a priority-index policy that reduces to the Gittins index when the level of risk-aversion converges to zero. We generalize the restart formulation of the Gittins index to effectively compute these risk-averse allocation indices. Numerical results show the excellent performance of this heuristic approach for two well-known coherent risk measures: first-order mean-semideviation and mean-AVaR. Our experimental studies suggest that there is no guarantee that an index-based optimal policy exists for the risk-averse problem. Nonetheless, our risk-averse allocation indices can achieve optimal or near-optimal policies, which in some instances are easier to interpret than the exact optimal policy.
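The restart formulation mentioned in the abstract can be illustrated with a short sketch. This is not the authors' algorithm. It is the classical risk-neutral restart computation of the Gittins index (value iteration on the "restart in state i" problem), with the expectation optionally replaced by a one-step first-order mean-semideviation of rewards. The nested one-step form of the risk measure, the reward (rather than cost) orientation, and the names `mean_semideviation`, `restart_index`, and `kappa` are all assumptions made for illustration. Setting `kappa = 0` recovers the ordinary Gittins index.

```python
import numpy as np


def mean_semideviation(p, v, kappa):
    """One-step risk-adjusted value of next-state values v under probabilities p.

    Assumed form for rewards: E[v] - kappa * E[(E[v] - v)_+], with 0 <= kappa <= 1.
    kappa = 0 gives the plain expectation (risk-neutral case).
    """
    mean = p @ v
    shortfall = p @ np.maximum(mean - v, 0.0)
    return mean - kappa * shortfall


def restart_index(P, r, state, beta=0.9, kappa=0.0, tol=1e-10, max_iter=100_000):
    """Index of `state` for one arm via the restart-in-`state` problem.

    P: transition matrix of the arm, r: reward per state, beta: discount factor.
    In every state j the controller either continues from j or restarts from
    `state`; the index is (1 - beta) times the optimal value at `state`.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)
    V = np.zeros(n)
    for _ in range(max_iter):
        # Value of continuing from each state j under the one-step risk mapping.
        cont = np.array(
            [r[j] + beta * mean_semideviation(P[j], V, kappa) for j in range(n)]
        )
        # Restarting from any state yields the continuation value of `state`.
        V_new = np.maximum(cont, cont[state])
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return (1.0 - beta) * V[state]


# A risky arm: state 0 pays nothing, then moves to an absorbing state that
# pays 2 forever or to one that pays 0 forever, each with probability 0.5.
P_risky = [[0.0, 0.5, 0.5], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
r_risky = [0.0, 2.0, 0.0]
neutral = restart_index(P_risky, r_risky, state=0, kappa=0.0)
averse = restart_index(P_risky, r_risky, state=0, kappa=0.5)
print(neutral, averse)
```

In this sketch the risk-averse index of the risky state is no larger than its risk-neutral index, because the semideviation term can only lower the one-step evaluation. An arm whose reward is the same in every state gets that reward as its index for any `kappa`.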

Suggested Citation

  • Malekipirbazari, Milad & Çavuş, Özlem, 2024. "Index policy for multiarmed bandit problem with dynamic risk measures," European Journal of Operational Research, Elsevier, vol. 312(2), pages 627-640.
  • Handle: RePEc:eee:ejores:v:312:y:2024:i:2:p:627-640
    DOI: 10.1016/j.ejor.2023.08.004

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221723006082
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Riedel, Frank, 2004. "Dynamic coherent risk measures," Stochastic Processes and their Applications, Elsevier, vol. 112(2), pages 185-200, August.
    2. Eric V. Denardo & Haechurl Park & Uriel G. Rothblum, 2007. "Risk-Sensitive and Risk-Neutral Multiarmed Bandits," Mathematics of Operations Research, INFORMS, vol. 32(2), pages 374-394, May.
    3. Andrzej Ruszczyński & Alexander Shapiro, 2006. "Conditional Risk Mappings," Mathematics of Operations Research, INFORMS, vol. 31(3), pages 544-561, August.
    4. Özlem Çavuş & Andrzej Ruszczyński, 2014. "Computational Methods for Risk-Averse Undiscounted Transient Markov Models," Operations Research, INFORMS, vol. 62(2), pages 401-417, April.
    5. Grechuk, Bogdan & Zabarankin, Michael, 2016. "Inverse portfolio problem with coherent risk measures," European Journal of Operational Research, Elsevier, vol. 249(2), pages 740-750.
    6. Jean-Philippe Chancelier & Michel De Lara & André de Palma, 2007. "Risk Aversion, Road Choice, and the One-Armed Bandit Problem," Transportation Science, INFORMS, vol. 41(1), pages 1-14, February.
    7. Sonin, Isaac M., 2008. "A generalized Gittins index for a Markov chain and its recursive calculation," Statistics & Probability Letters, Elsevier, vol. 78(12), pages 1526-1533, September.
    8. Eric Denardo & Eugene Feinberg & Uriel Rothblum, 2013. "The multi-armed bandit, with constraints," Annals of Operations Research, Springer, vol. 208(1), pages 37-62, September.
    9. Talias, Michael A., 2007. "Optimal decision indices for R&D project evaluation in the pharmaceutical industry: Pearson index versus Gittins index," European Journal of Operational Research, Elsevier, vol. 177(2), pages 1105-1112, March.
    10. Philippe Artzner & Freddy Delbaen & Jean‐Marc Eber & David Heath, 1999. "Coherent Measures of Risk," Mathematical Finance, Wiley Blackwell, vol. 9(3), pages 203-228, July.
    11. Powell, Warren B., 2019. "A unified framework for stochastic optimization," European Journal of Operational Research, Elsevier, vol. 275(3), pages 795-821.
    12. Ogryczak, Wlodzimierz & Ruszczynski, Andrzej, 1999. "From stochastic dominance to mean-risk models: Semideviations as risk measures," European Journal of Operational Research, Elsevier, vol. 116(1), pages 33-50, July.
    13. Michael Jong Kim & Andrew E.B. Lim, 2016. "Robust Multiarmed Bandit Problems," Management Science, INFORMS, vol. 62(1), pages 264-285, January.
    14. Andrzej Ruszczyński & Alexander Shapiro, 2006. "Optimization of Convex Risk Functions," Mathematics of Operations Research, INFORMS, vol. 31(3), pages 433-452, August.
    15. Dimitris Bertsimas & José Niño-Mora, 1996. "Conservation Laws, Extended Polymatroids and Multiarmed Bandit Problems; A Polyhedral Approach to Indexable Systems," Mathematics of Operations Research, INFORMS, vol. 21(2), pages 257-306, May.
    16. Michael N. Katehakis & Arthur F. Veinott, 1987. "The Multi-Armed Bandit Problem: Decomposition and Computation," Mathematics of Operations Research, INFORMS, vol. 12(2), pages 262-268, May.
    17. Xiaoguang Huo & Feng Fu, 2017. "Risk-Aware Multi-Armed Bandit Problem with Application to Portfolio Selection," Papers 1709.04415, arXiv.org.
    18. Glazebrook, K. D. & Greatrix, S., 1993. "On scheduling influential stochastic tasks on a single machine," European Journal of Operational Research, Elsevier, vol. 70(3), pages 405-424, November.
    19. Ricardo Collado & Dávid Papp & Andrzej Ruszczyński, 2012. "Scenario decomposition of risk-averse multistage stochastic programming problems," Annals of Operations Research, Springer, vol. 200(1), pages 147-170, November.
    20. Philippe Artzner & Freddy Delbaen & Jean-Marc Eber & David Heath & Hyejin Ku, 2007. "Coherent multiperiod risk adjusted values and Bellman’s principle," Annals of Operations Research, Springer, vol. 152(1), pages 5-22, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sıtkı Gülten & Andrzej Ruszczyński, 2015. "Two-stage portfolio optimization with higher-order conditional measures of risk," Annals of Operations Research, Springer, vol. 229(1), pages 409-427, June.
    2. Mahmutoğulları, Ali İrfan & Çavuş, Özlem & Aktürk, M. Selim, 2018. "Bounds on risk-averse mixed-integer multi-stage stochastic programming problems with mean-CVaR," European Journal of Operational Research, Elsevier, vol. 266(2), pages 595-608.
    3. Ricardo Collado & Dávid Papp & Andrzej Ruszczyński, 2012. "Scenario decomposition of risk-averse multistage stochastic programming problems," Annals of Operations Research, Springer, vol. 200(1), pages 147-170, November.
    4. Schur, Rouven & Gönsch, Jochen & Hassler, Michael, 2019. "Time-consistent, risk-averse dynamic pricing," European Journal of Operational Research, Elsevier, vol. 277(2), pages 587-603.
    5. Zachary Feinstein & Birgit Rudloff, 2018. "Scalar multivariate risk measures with a single eligible asset," Papers 1807.10694, arXiv.org, revised Feb 2021.
    6. Zachary Feinstein & Birgit Rudloff, 2018. "Time consistency for scalar multivariate risk measures," Papers 1810.04978, arXiv.org, revised Nov 2021.
    7. Naomi Miller & Andrzej Ruszczyński, 2011. "Risk-Averse Two-Stage Stochastic Linear Programming: Modeling and Decomposition," Operations Research, INFORMS, vol. 59(1), pages 125-132, February.
    8. Collado, Ricardo & Meisel, Stephan & Priekule, Laura, 2017. "Risk-averse stochastic path detection," European Journal of Operational Research, Elsevier, vol. 260(1), pages 195-211.
    9. Andreas H Hamel, 2018. "Monetary Measures of Risk," Papers 1812.04354, arXiv.org.
    10. Samuel N. Cohen & Tanut Treetanthiploet, 2019. "Gittins' theorem under uncertainty," Papers 1907.05689, arXiv.org, revised Jun 2021.
    11. Özlem Çavuş & Andrzej Ruszczyński, 2014. "Computational Methods for Risk-Averse Undiscounted Transient Markov Models," Operations Research, INFORMS, vol. 62(2), pages 401-417, April.
    12. Esther Frostig & Gideon Weiss, 2016. "Four proofs of Gittins’ multiarmed bandit theorem," Annals of Operations Research, Springer, vol. 241(1), pages 127-165, June.
    13. Zachary Feinstein & Birgit Rudloff, 2012. "Multiportfolio time consistency for set-valued convex and coherent risk measures," Papers 1212.5563, arXiv.org, revised Oct 2014.
    14. Christopher W. Miller & Insoon Yang, 2015. "Optimal Control of Conditional Value-at-Risk in Continuous Time," Papers 1512.05015, arXiv.org, revised Jan 2017.
    15. Henri Gérard & Michel Lara & Jean-Philippe Chancelier, 2020. "Equivalence between time consistency and nested formula," Annals of Operations Research, Springer, vol. 292(2), pages 627-647, September.
    16. Kovacevic Raimund M., 2012. "Conditional risk and acceptability mappings as Banach-lattice valued mappings," Statistics & Risk Modeling, De Gruyter, vol. 29(1), pages 1-18, March.
    17. repec:hum:wpaper:sfb649dp2007-010 is not listed on IDEAS
    18. Qinyu Wu & Fan Yang & Ping Zhang, 2023. "Conditional generalized quantiles based on expected utility model and equivalent characterization of properties," Papers 2301.12420, arXiv.org.
    19. Alois Pichler & Ruben Schlotter, 2020. "Quantification of Risk in Classical Models of Finance," Papers 2004.04397, arXiv.org, revised Feb 2021.
    20. Miller, Naomi & Ruszczynski, Andrzej, 2008. "Risk-adjusted probability measures in portfolio optimization with coherent measures of risk," European Journal of Operational Research, Elsevier, vol. 191(1), pages 193-206, November.
    21. Leippold, Markus & Schärer, Steven, 2017. "Discrete-time option pricing with stochastic liquidity," Journal of Banking & Finance, Elsevier, vol. 75(C), pages 1-16.
