Optimizing sequential decision-making under risk: Strategic allocation with switching penalties

Author

Malekipirbazari, Milad

Abstract

This paper considers the multiarmed bandit (MAB) problem augmented with a critical real-world consideration: the cost of switching between arms. Our work addresses the largely unexplored domain of risk-averse MAB problems with switching penalties. Such scenarios are not merely theoretical constructs; they reflect numerous practical applications. Our contribution is threefold. First, we explore how switching costs and risk aversion influence decision-making in MAB problems. Second, we present novel theoretical results, including the Risk-Averse Switching Index (RASI), a heuristic that addresses the dual challenges of risk aversion and switching costs and that we show to be near-optimal. The heuristic is grounded in dynamic coherent risk measures, which enable a time-consistent evaluation of risk and reward. Third, through numerical experiments, we validate the algorithm's effectiveness and practical applicability, giving decision-makers insights and tools for risk-averse environments with inherent switching costs.
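To make the interaction between risk aversion and switching penalties concrete, the following is a minimal illustrative sketch in Python. It is not the paper's RASI and makes no claim about how RASI is computed. It assumes a simple myopic rule: each arm is scored by the first-order mean-upper-semideviation of its observed costs (a coherent risk measure when the weight kappa lies in [0, 1]), and a fixed penalty is added to every arm other than the one currently in use. The names `mean_semideviation`, `choose_arm`, `kappa`, and `switch_cost` are hypothetical and introduced only for this example.

```python
from statistics import mean


def mean_semideviation(costs, kappa=0.5):
    """Risk of a cost sample: E[Z] + kappa * E[max(Z - E[Z], 0)].

    Assumption for this sketch: costs are to be minimized and
    0 <= kappa <= 1, so the measure is coherent.
    """
    m = mean(costs)
    upper_dev = mean([max(c - m, 0.0) for c in costs])
    return m + kappa * upper_dev


def choose_arm(history, current, switch_cost, kappa=0.5):
    """Myopic rule (illustrative only, not RASI).

    history maps each arm to its list of observed costs. Every arm
    except `current` is charged `switch_cost` on top of its risk score;
    the arm with the lowest total is returned.
    """
    def score(arm):
        penalty = 0.0 if arm == current else switch_cost
        return mean_semideviation(history[arm], kappa) + penalty

    return min(history, key=score)


# Arm 0 has a steady cost; arm 1 has the same mean but is volatile.
observed = {0: [1.0, 1.0, 1.0], 1: [0.0, 2.0, 0.0, 2.0]}
print(choose_arm(observed, current=1, switch_cost=0.1))

# Arm 1 is cheaper, but a large penalty keeps the player on arm 0.
cheaper = {0: [2.0, 2.0], 1: [1.5, 1.5]}
print(choose_arm(cheaper, current=0, switch_cost=1.0))
```

In the first call the volatile arm is abandoned because its risk-adjusted cost exceeds the steady arm's cost plus the small penalty; in the second the penalty outweighs the gain, so the rule stays put. A myopic rule of this kind ignores future consequences, which is the gap an index policy is meant to address.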

Suggested Citation

Malekipirbazari, Milad, 2025. "Optimizing sequential decision-making under risk: Strategic allocation with switching penalties," European Journal of Operational Research, Elsevier, vol. 321(1), pages 160-176.

Handle: RePEc:eee:ejores:v:321:y:2025:i:1:p:160-176
DOI: 10.1016/j.ejor.2024.09.023

Keywords

Stochastic programming; Multiarmed bandit problem; Switching penalties; Risk-averse decision-making; Dynamic coherent risk measures
