Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i13p1954-d1420914.html

Integrating Risk-Averse and Constrained Reinforcement Learning for Robust Decision-Making in High-Stakes Scenarios

Authors
  • Moiz Ahmad

    (Department of Industrial and Manufacturing Engineering, University of Engineering and Technology, Lahore 54700, Pakistan)

  • Muhammad Babar Ramzan

    (School of Engineering and Technology, National Textile University, Faisalabad 37610, Pakistan)

  • Muhammad Omair

    (Department of Materials and Production, Aalborg University, 9220 Aalborg Øst, Denmark)

  • Muhammad Salman Habib

    (Institute of Knowledge Services, Center for Creative Convergence Education, Hanyang University ERICA Campus, Ansan-si 15588, Gyeonggi-do, Republic of Korea)

Abstract

This paper considers a risk-averse Markov decision process (MDP) with non-risk constraints as a dynamic optimization framework for ensuring robustness against unfavorable outcomes in high-stakes sequential decision-making situations such as disaster response. In this regard, strong duality is proved without any assumption on the problem's convexity. Dropping the convexity assumption is necessary for some real-world problems, e.g., those involving deprivation costs in disaster relief, where convexity cannot be ensured. Our theoretical results imply that the problem can be solved exactly in the dual domain, where it becomes convex. Based on these duality results, an augmented Lagrangian-based constraint-handling mechanism is also developed for risk-averse reinforcement learning algorithms, and the mechanism is proved to be theoretically convergent. Finally, we also establish the convergence of the mechanism empirically on a multi-stage disaster response relief allocation problem, using a fixed negative reward scheme as a benchmark.
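For readers unfamiliar with augmented Lagrangian constraint handling, the general idea can be sketched on a toy problem. The following is a minimal, hypothetical illustration, not the authors' algorithm. It replaces the risk-averse RL objective with an assumed deterministic one-dimensional problem: minimize (x - 3)^2 subject to x <= 1. This problem was chosen because its solution is known: x = 1 with multiplier lambda = 4. The sketch runs gradient descent on the standard augmented Lagrangian for an inequality constraint, then applies a projected multiplier update. All function names and parameter values are illustrative assumptions.

```python
"""Hypothetical sketch of augmented-Lagrangian constraint handling.

Assumed toy problem (not from the paper): minimise f(x) = (x - 3)^2
subject to g(x) = x - 1 <= 0. The constrained optimum is x* = 1 with
Lagrange multiplier lambda* = 4.
"""


def f_grad(x: float) -> float:
    # Gradient of the toy objective f(x) = (x - 3)^2.
    return 2.0 * (x - 3.0)


def g(x: float) -> float:
    # Toy constraint function; x is feasible when g(x) <= 0.
    return x - 1.0


def g_grad(x: float) -> float:
    # Gradient of g(x) = x - 1.
    return 1.0


def solve_augmented_lagrangian(rho: float = 1.0, outer_iters: int = 100,
                               inner_iters: int = 200, lr: float = 0.1):
    x, lam = 0.0, 0.0
    for _ in range(outer_iters):
        # Inner loop: gradient descent on the augmented Lagrangian
        # L(x, lam) = f(x) + (max(0, lam + rho*g(x))**2 - lam**2) / (2*rho),
        # whose x-gradient is f'(x) + max(0, lam + rho*g(x)) * g'(x).
        for _ in range(inner_iters):
            shifted = max(0.0, lam + rho * g(x))
            x -= lr * (f_grad(x) + shifted * g_grad(x))
        # Outer loop: projected multiplier update keeps lam >= 0.
        lam = max(0.0, lam + rho * g(x))
    return x, lam


if __name__ == "__main__":
    x_opt, lam_opt = solve_augmented_lagrangian()
    print(f"x = {x_opt:.4f}, lambda = {lam_opt:.4f}")
```

If `lam` is held fixed instead of being updated, this reduces to a fixed-penalty scheme. That is loosely analogous to the fixed negative reward benchmark mentioned in the abstract. The multiplier update is what lets the penalty adapt until the constraint is satisfied.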

Suggested Citation

  • Moiz Ahmad & Muhammad Babar Ramzan & Muhammad Omair & Muhammad Salman Habib, 2024. "Integrating Risk-Averse and Constrained Reinforcement Learning for Robust Decision-Making in High-Stakes Scenarios," Mathematics, MDPI, vol. 12(13), pages 1-32, June.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:13:p:1954-:d:1420914

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/13/1954/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/13/1954/
    Download Restriction: no

    References listed on IDEAS

    1. Basso, Rafael & Kulcsár, Balázs & Sanchez-Diaz, Ivan & Qu, Xiaobo, 2022. "Dynamic stochastic electric vehicle routing with safe reinforcement learning," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 157(C).
    2. Shapiro, Alexander & Tekaya, Wajdi & da Costa, Joari Paulo & Soares, Murilo Pereira, 2013. "Risk neutral and risk averse Stochastic Dual Dynamic Programming method," European Journal of Operational Research, Elsevier, vol. 224(2), pages 375-391.
    3. Kevin Dowd & John Cotter, 2011. "Spectral Risk Measures and the Choice of Risk Aversion Function," Papers 1103.5668, arXiv.org.
    4. Kang Boda & Jerzy Filar, 2006. "Time Consistent Dynamic Risk Measures," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 63(1), pages 169-186, February.
    5. Guodong Yu & Aijun Liu & Huiping Sun, 2021. "Risk-averse flexible policy on ambulance allocation in humanitarian operations under uncertainty," International Journal of Production Research, Taylor & Francis Journals, vol. 59(9), pages 2588-2610, May.
    6. Lina Yu & Huasheng Yang & Lixin Miao & Canrong Zhang, 2019. "Rollout algorithms for resource allocation in humanitarian logistics," IISE Transactions, Taylor & Francis Journals, vol. 51(8), pages 887-909, August.
    7. L N Van Wassenhove, 2006. "Humanitarian aid logistics: supply chain management in high gear," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 57(5), pages 475-489, May.
    8. Shi, Tao & Xu, Chang & Dong, Wenhao & Zhou, Hangyu & Bokhari, Awais & Klemeš, Jiří Jaromír & Han, Ning, 2023. "Research on energy management of hydrogen electric coupling system based on deep reinforcement learning," Energy, Elsevier, vol. 282(C).
    9. Anthony Coache & Sebastian Jaimungal & Álvaro Cartea, 2022. "Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement Learning," Papers 2206.14666, arXiv.org, revised May 2023.
    10. Duo Wang & Kai Yang & Lixing Yang, 2023. "Risk-averse two-stage distributionally robust optimisation for logistics planning in disaster relief management," International Journal of Production Research, Taylor & Francis Journals, vol. 61(2), pages 668-691, January.
    11. Constantin Waubert de Puiseau & Richard Meyes & Tobias Meisen, 2022. "On reliability of reinforcement learning based production scheduling systems: a comparative survey," Journal of Intelligent Manufacturing, Springer, vol. 33(4), pages 911-927, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rudloff, Birgit & Street, Alexandre & Valladão, Davi M., 2014. "Time consistency and risk averse dynamic decision models: Definition, interpretation and practical consequences," European Journal of Operational Research, Elsevier, vol. 234(3), pages 743-750.
    2. Daniel R. Jiang & Warren B. Powell, 2018. "Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures," Mathematics of Operations Research, INFORMS, vol. 43(2), pages 554-579, May.
    3. Liu, Kanglin & Zhang, Hengliang & Zhang, Zhi-Hai, 2021. "The efficiency, equity and effectiveness of location strategies in humanitarian logistics: A robust chance-constrained approach," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 156(C).
    4. Tanzid Hasnain & Irem Sengul Orgut & Julie Simmons Ivy, 2021. "Elicitation of Preference among Multiple Criteria in Food Distribution by Food Banks," Production and Operations Management, Production and Operations Management Society, vol. 30(12), pages 4475-4500, December.
    5. Joakim Dimoski & Stein-Erik Fleten & Nils Löhndorf & Sveinung Nersten, 2023. "Dynamic hedging for the real option management of hydropower production with exchange rate risks," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 45(2), pages 525-554, June.
    6. Chen, Helen S.Y., 2020. "Designing Sustainable Humanitarian Supply Chains," OSF Preprints m82ar, Center for Open Science.
    7. de Queiroz, Anderson Rodrigo, 2016. "Stochastic hydro-thermal scheduling optimization: An overview," Renewable and Sustainable Energy Reviews, Elsevier, vol. 62(C), pages 382-395.
    8. Eunae Yoo & Elliot Rabinovich & Bin Gu, 2020. "The Growth of Follower Networks on Social Media Platforms for Humanitarian Operations," Production and Operations Management, Production and Operations Management Society, vol. 29(12), pages 2696-2715, December.
    9. Xiangyu Cui & Xun Li & Duan Li & Yun Shi, 2014. "Time Consistent Behavior Portfolio Policy for Dynamic Mean-Variance Formulation," Papers 1408.6070, arXiv.org, revised Aug 2015.
    10. Rameshwar Dubey & Nezih Altay & Constantin Blome, 2019. "Swift trust and commitment: The missing links for humanitarian supply chain coordination?," Annals of Operations Research, Springer, vol. 283(1), pages 159-177, December.
    11. A. Anaya-Arenas & J. Renaud & A. Ruiz, 2014. "Relief distribution networks: a systematic review," Annals of Operations Research, Springer, vol. 223(1), pages 53-79, December.
    12. Firas Rifai, 2018. "Transfer of Knowhow and Experiences from Commercial Logistics into Humanitarian Logistics to Improve Rescue Missions in Disaster Areas," Journal of Management and Sustainability, Canadian Center of Science and Education, vol. 8(3), pages 1-63, August.
    13. Dilsu Binnaz Ozkapici & Mustafa Alp Ertem & Haluk Aygüneş, 2016. "Intermodal humanitarian logistics model based on maritime transportation in Istanbul," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 83(1), pages 345-364, August.
    14. Gauvin, Charles & Delage, Erick & Gendreau, Michel, 2017. "Decision rule approximations for the risk averse reservoir management problem," European Journal of Operational Research, Elsevier, vol. 261(1), pages 317-336.
    15. Félicia Saïah & Diego Vega & Harwin de Vries & Joakim Kembro, 2023. "Process modularity, supply chain responsiveness, and moderators: The Médecins Sans Frontières response to the Covid‐19 pandemic," Production and Operations Management, Production and Operations Management Society, vol. 32(5), pages 1490-1511, May.
    16. Rafiei, Rezvan & Huang, Kai & Verma, Manish, 2022. "Cash versus in-kind transfer programs in humanitarian operations: An optimization program and a case study," Socio-Economic Planning Sciences, Elsevier, vol. 82(PA).
    17. Loïc Cohen, 2016. "The outsourcing decision process in humanitarian supply chain management evaluated through the TCE and RBV principles," Post-Print hal-01471643, HAL.
    18. Carland, Corinne & Goentzel, Jarrod & Montibeller, Gilberto, 2018. "Modeling the values of private sector agents in multi-echelon humanitarian supply chains," European Journal of Operational Research, Elsevier, vol. 269(2), pages 532-543.
    19. Yagci Sokat, Kezban & Dolinskaya, Irina S. & Smilowitz, Karen & Bank, Ryan, 2018. "Incomplete information imputation in limited data environments with application to disaster response," European Journal of Operational Research, Elsevier, vol. 269(2), pages 466-485.
    20. Lea Stadtler & Luk N. Wassenhove, 2023. "Between Intensity and Diversity: Leveraging the Role of Place in Cross-Sector Partnerships," Journal of Business Ethics, Springer, vol. 184(4), pages 773-791, May.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.