Integrating Risk-Averse and Constrained Reinforcement Learning for Robust Decision-Making in High-Stakes Scenarios

Author

Listed:

Moiz Ahmad
(Department of Industrial and Manufacturing Engineering, University of Engineering and Technology, Lahore 54700, Pakistan)
Muhammad Babar Ramzan
(School of Engineering and Technology, National Textile University, Faisalabad 37610, Pakistan)
Muhammad Omair
(Department of Materials and Production, Aalborg University, 9220 Aalborg Øst, Denmark)
Muhammad Salman Habib
(Institute of Knowledge Services, Center for Creative Convergence Education, Hanyang University ERICA Campus, Ansan-si 15588, Gyeonggi-do, Republic of Korea)

Abstract

This paper considers a risk-averse Markov decision process (MDP) with non-risk constraints as a dynamic optimization framework to ensure robustness against unfavorable outcomes in high-stakes sequential decision-making, such as disaster response. We prove strong duality without assuming that the problem is convex. This matters for real-world problems in which convexity cannot be ensured, e.g., deprivation costs in disaster relief. Our theoretical results imply that the problem can be solved exactly in the dual domain, where it becomes convex. Building on these duality results, we develop an augmented Lagrangian-based constraint-handling mechanism for risk-averse reinforcement learning algorithms and prove that it converges. Finally, we empirically establish the convergence of the mechanism on a multi-stage disaster response relief allocation problem, using a fixed negative reward scheme as a benchmark.
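The idea of handling a constraint through an augmented Lagrangian can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a one-stage toy allocation problem in which a grid search stands in for the policy optimization a reinforcement learning agent would perform. The demand scenarios, the budget of 0.6, the squared-shortfall cost, the use of CVaR as the risk measure, and the names `cvar` and `solve` are all assumptions made for illustration.

```python
import numpy as np

def cvar(costs, alpha):
    """Mean of the worst (1 - alpha) share of equally likely costs (last axis).

    Simple tail average; exact when (1 - alpha) * n is an integer.
    """
    n = costs.shape[-1]
    k = max(1, int(np.ceil((1.0 - alpha) * n)))
    return np.sort(costs, axis=-1)[..., -k:].mean(axis=-1)

def solve(rho=5.0, iters=20):
    # Hypothetical data: four equally likely demand scenarios and a supply budget.
    demand = np.array([0.2, 0.5, 0.9, 1.0])
    budget = 0.6
    grid = np.linspace(0.0, 1.0, 101)  # candidate allocations x

    # Risk-averse objective: CVaR of the squared unmet demand for each x.
    shortfall = np.maximum(demand[None, :] - grid[:, None], 0.0)
    risk = cvar(shortfall ** 2, alpha=0.75)

    # Non-risk constraint g(x) = x - budget <= 0.
    g = grid - budget

    lam = 0.0  # Lagrange multiplier estimate
    for _ in range(iters):
        # Augmented Lagrangian for an inequality constraint.
        aug = risk + (np.maximum(0.0, lam + rho * g) ** 2 - lam ** 2) / (2.0 * rho)
        i = int(np.argmin(aug))  # inner minimization (grid search, no convexity needed)
        lam = max(0.0, lam + rho * g[i])  # multiplier update
    return grid[i], lam

if __name__ == "__main__":
    x, lam = solve()
    print(f"allocation = {x:.2f}, multiplier = {lam:.2f}")
```

Unlike a fixed negative reward for violations, the multiplier here is adjusted after every inner solve, so the penalty strength is learned rather than hand-tuned. In this toy the iterates settle near the budget limit, where the constraint is active.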

Suggested Citation

Moiz Ahmad & Muhammad Babar Ramzan & Muhammad Omair & Muhammad Salman Habib, 2024. "Integrating Risk-Averse and Constrained Reinforcement Learning for Robust Decision-Making in High-Stakes Scenarios," Mathematics, MDPI, vol. 12(13), pages 1-32, June.

Handle: RePEc:gam:jmathe:v:12:y:2024:i:13:p:1954-:d:1420914

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Liu, Kanglin & Zhang, Hengliang & Zhang, Zhi-Hai, 2021. "The efficiency, equity and effectiveness of location strategies in humanitarian logistics: A robust chance-constrained approach," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 156(C).
Tanzid Hasnain & Irem Sengul Orgut & Julie Simmons Ivy, 2021. "Elicitation of Preference among Multiple Criteria in Food Distribution by Food Banks," Production and Operations Management, Production and Operations Management Society, vol. 30(12), pages 4475-4500, December.
Joakim Dimoski & Stein-Erik Fleten & Nils Löhndorf & Sveinung Nersten, 2023. "Dynamic hedging for the real option management of hydropower production with exchange rate risks," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 45(2), pages 525-554, June.
Rudloff, Birgit & Street, Alexandre & Valladão, Davi M., 2014. "Time consistency and risk averse dynamic decision models: Definition, interpretation and practical consequences," European Journal of Operational Research, Elsevier, vol. 234(3), pages 743-750.
Daniel R. Jiang & Warren B. Powell, 2018. "Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures," Mathematics of Operations Research, INFORMS, vol. 43(2), pages 554-579, May.
CHEN, Helen S.Y., 2020. "Designing Sustainable Humanitarian Supply Chains," OSF Preprints m82ar, Center for Open Science.
Eunae Yoo & Elliot Rabinovich & Bin Gu, 2020. "The Growth of Follower Networks on Social Media Platforms for Humanitarian Operations," Production and Operations Management, Production and Operations Management Society, vol. 29(12), pages 2696-2715, December.
Xiangyu Cui & Xun Li & Duan Li & Yun Shi, 2014. "Time Consistent Behavior Portfolio Policy for Dynamic Mean-Variance Formulation," Papers 1408.6070, arXiv.org, revised Aug 2015.
Abdul Khabir Rahmat, 2024. "Comparative Analysis of Humanitarian Logistics in Food Redistribution and Volunteer Programs: A Case Study of “Siswa Care for the Street” and “Dapur Raya – Dapur Jalanan Kuala Lumpur”," International Journal of Research and Innovation in Social Science, International Journal of Research and Innovation in Social Science (IJRISS), vol. 8(14), pages 112-118, November.
Firas Rifai, 2018. "Transfer of Knowhow and Experiences from Commercial Logistics into Humanitarian Logistics to Improve Rescue Missions in Disaster Areas," Journal of Management and Sustainability, Canadian Center of Science and Education, vol. 8(3), pages 1-63, August.
Dilsu Binnaz Ozkapici & Mustafa Alp Ertem & Haluk Aygüneş, 2016. "Intermodal humanitarian logistics model based on maritime transportation in Istanbul," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 83(1), pages 345-364, August.
Rafiei, Rezvan & Huang, Kai & Verma, Manish, 2022. "Cash versus in-kind transfer programs in humanitarian operations: An optimization program and a case study," Socio-Economic Planning Sciences, Elsevier, vol. 82(PA).
Yagci Sokat, Kezban & Dolinskaya, Irina S. & Smilowitz, Karen & Bank, Ryan, 2018. "Incomplete information imputation in limited data environments with application to disaster response," European Journal of Operational Research, Elsevier, vol. 269(2), pages 466-485.
Lea Stadtler & Luk N. Wassenhove, 2023. "Between Intensity and Diversity: Leveraging the Role of Place in Cross-Sector Partnerships," Journal of Business Ethics, Springer, vol. 184(4), pages 773-791, May.
Zhang, Tianhao & Dong, Zhe & Huang, Xiaojin, 2024. "Multi-objective optimization of thermal power and outlet steam temperature for a nuclear steam supply system with deep reinforcement learning," Energy, Elsevier, vol. 286(C).
Hu, Shaolong & Han, Chuanfeng & Dong, Zhijie Sasha & Meng, Lingpeng, 2019. "A multi-stage stochastic programming model for relief distribution considering the state of road network," Transportation Research Part B: Methodological, Elsevier, vol. 123(C), pages 64-87.
Davis, Lauren B. & Samanlioglu, Funda & Qu, Xiuli & Root, Sarah, 2013. "Inventory planning and coordination in disaster relief efforts," International Journal of Production Economics, Elsevier, vol. 141(2), pages 561-573.
Hu, Yuzhen & Wang, Min & Guo, Xinghai & Lukinykh, Valery F., 2025. "Pre-occurrence location-allocation-configuration of maritime emergency resources considering shipborne unmanned aerial vehicle (UAV)," Omega, Elsevier, vol. 131(C).
Oloruntoba, Richard, 2010. "An analysis of the Cyclone Larry emergency relief chain: Some key success factors," International Journal of Production Economics, Elsevier, vol. 126(1), pages 85-101, July.
Davi Valladão & Thuener Silva & Marcus Poggi, 2019. "Time-consistent risk-constrained dynamic portfolio optimization with transactional costs and time-dependent returns," Annals of Operations Research, Springer, vol. 282(1), pages 379-405, November.

More about this item

Keywords

robust decision-making; dynamic decision-making; non-convexities; constrained reinforcement learning; augmented Lagrangian; Markov risk;