Deep Reinforcement Learning for Inventory Optimization with Non-Stationary Uncertain Demand



Authors

Dehaybe, Henri
(Université catholique de Louvain, LIDAM/CORE, Belgium)
Catanzaro, Daniele
(Université catholique de Louvain, LIDAM/CORE, Belgium)
Chevalier, Philippe
(Université catholique de Louvain, LIDAM/CORE, Belgium)

Registered with RePEc: Philippe Chevalier

Abstract

We consider a single-item lot-sizing problem with fixed costs, lead time, and both backorders and lost sales. We show that, after appropriate training in randomly generated environments, Deep Reinforcement Learning (DRL) agents can interpolate near-optimal dynamic policies in real time on rolling-horizon instances, given a previously unseen demand forecast and without the need to periodically re-solve the problem. Extensive computational experiments show that the policies provided by these agents compete with, and in some circumstances outperform by several percentage points of gap, those provided by heuristics based on dynamic programming. These results confirm the importance of DRL in the context of inventory control problems and support its use in solving practical instances featuring realistic assumptions.
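The abstract does not spell out the model's dynamics. The following is a minimal, hypothetical Python simulator of the kind of single-item lot-sizing environment it describes (fixed ordering cost, lead time, unmet demand split between backorders and lost sales), paired with a classic (s, S) rule as a benchmark policy. It is a sketch under stated assumptions, not the authors' environment: the names `LotSizingEnv`, `s_S_policy`, and `simulate`, all cost values, and the rule that a fixed share `backorder_fraction` of unmet demand is backordered while the rest is lost are illustrative choices. A trained DRL agent would take the place of `policy`, observing the state (and, in the paper's setting, a demand forecast).

```python
"""Hypothetical single-item lot-sizing simulator (illustrative sketch only)."""
import random
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LotSizingEnv:
    # All parameter names and default values are illustrative assumptions.
    fixed_cost: float = 50.0          # K: charged whenever a positive order is placed
    holding_cost: float = 1.0         # h: per unit on hand at the end of a period
    backorder_cost: float = 5.0       # b: per unit backordered at the end of a period
    lost_sale_cost: float = 10.0      # p: per unit of demand that is lost
    backorder_fraction: float = 0.5   # assumed share of unmet demand that waits
    lead_time: int = 2                # L: periods between ordering and receipt
    inventory: float = 0.0            # net inventory; negative values are backorders
    pipeline: deque = field(init=False, default_factory=deque)

    def __post_init__(self):
        # Outstanding orders, oldest first; one slot per period of lead time.
        self.pipeline = deque([0.0] * self.lead_time)

    def step(self, order_qty: float, demand: float) -> float:
        """Order, receive the order due this period, serve demand; return the period cost."""
        cost = self.fixed_cost if order_qty > 0 else 0.0
        self.pipeline.append(order_qty)
        self.inventory += self.pipeline.popleft()  # with L = 0 the new order arrives at once
        served = min(max(self.inventory, 0.0), demand)
        shortage = demand - served
        # Part of the shortage is backordered (carried as negative inventory), the rest is lost.
        self.inventory -= served + self.backorder_fraction * shortage
        lost = (1.0 - self.backorder_fraction) * shortage
        cost += self.holding_cost * max(self.inventory, 0.0)
        cost += self.backorder_cost * max(-self.inventory, 0.0)
        cost += self.lost_sale_cost * lost
        return cost


def s_S_policy(s: float, S: float):
    """Classic (s, S) rule on the inventory position: a benchmark, not the paper's DRL agent."""
    def policy(env: LotSizingEnv) -> float:
        position = env.inventory + sum(env.pipeline)  # net inventory plus orders in transit
        return S - position if position < s else 0.0
    return policy


def simulate(env: LotSizingEnv, policy, demands) -> float:
    """Total cost of running `policy` against a demand sequence; a trained agent could replace it."""
    return sum(env.step(policy(env), d) for d in demands)


if __name__ == "__main__":
    rng = random.Random(0)
    # Non-stationary demand whose mean drifts upward (an assumed pattern for illustration).
    demands = [max(0.0, rng.gauss(10 + 0.5 * t, 3)) for t in range(52)]
    env = LotSizingEnv()
    print(f"total cost: {simulate(env, s_S_policy(s=30, S=80), demands):.2f}")
```

Under non-stationary demand the best (s, S) parameters change from period to period (see Bollapragada & Morton, 1999, and Xiang et al., 2018, among the references), so a constant (s, S) pair is only a crude baseline; the appeal described in the abstract is a policy that adapts to each new forecast without re-solving.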

Suggested Citation

Dehaybe, Henri & Catanzaro, Daniele & Chevalier, Philippe, 2023. "Deep Reinforcement Learning for Inventory Optimization with Non-Stationary Uncertain Demand," LIDAM Reprints CORE 3270, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).

Handle: RePEc:cor:louvrp:3270
DOI: https://doi.org/10.1016/j.ejor.2023.10.007
Note: In: European Journal of Operational Research, 2023


Other versions of this item:

Dehaybe, Henri & Catanzaro, Daniele & Chevalier, Philippe, 2024. "Deep Reinforcement Learning for inventory optimization with non-stationary uncertain demand," European Journal of Operational Research, Elsevier, vol. 314(2), pages 433-445.

References listed on IDEAS

De Moor, Bram J. & Gijsbrechts, Joren & Boute, Robert N., 2022. "Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management," European Journal of Operational Research, Elsevier, vol. 301(2), pages 535-545.
Evan L. Porteus, 1971. "On the Optimality of Generalized (s, S) Policies," Management Science, INFORMS, vol. 17(7), pages 411-426, March.
Donald L. Iglehart, 1963. "Optimality of (s, S) Policies in the Infinite Horizon Dynamic Inventory Problem," Management Science, INFORMS, vol. 9(2), pages 259-267, January.
Andrew J. Clark & Herbert Scarf, 2004. "Optimal Policies for a Multi-Echelon Inventory Problem," Management Science, INFORMS, vol. 50(12_supple), pages 1782-1790, December.
Originally published as: Andrew J. Clark & Herbert Scarf, 1960. "Optimal Policies for a Multi-Echelon Inventory Problem," Management Science, INFORMS, vol. 6(4), pages 475-490, July.
James H. Bookbinder & Jin-Yan Tan, 1988. "Strategies for the Probabilistic Lot-Sizing Problem with Service-Level Constraints," Management Science, INFORMS, vol. 34(9), pages 1096-1108, September.
Srinivas Bollapragada & Thomas E. Morton, 1999. "A Simple Heuristic for Computing Nonstationary (s, S) Policies," Operations Research, INFORMS, vol. 47(4), pages 576-584, August.
Boute, Robert N. & Gijsbrechts, Joren & van Jaarsveld, Willem & Vanvuchelen, Nathalie, 2022. "Deep reinforcement learning for inventory control: A roadmap," European Journal of Operational Research, Elsevier, vol. 298(2), pages 401-412.
Amirhosein Norouzi & Reha Uzsoy, 2014. "Modeling the evolution of dependency between demands, with application to inventory planning," IISE Transactions, Taylor & Francis Journals, vol. 46(1), pages 55-66.
Lingxiu Dong & Hau L. Lee, 2003. "Optimal Policies and Approximations for a Serial Multiechelon Inventory System with Time-Correlated Demand," Operations Research, INFORMS, vol. 51(6), pages 969-980, December.
Dural-Selcuk, Gozdem & Rossi, Roberto & Kilic, Onur A. & Tarim, S. Armagan, 2020. "The benefit of receding horizon control: Near-optimal policies for stochastic inventory control," Omega, Elsevier, vol. 97(C).
Steven Nahmias, 1979. "Simple Approximations for a Variety of Dynamic Leadtime Lost-Sales Inventory Models," Operations Research, INFORMS, vol. 27(5), pages 904-924, October.
Stephen C. Graves, 1999. "A Single-Item Inventory Model for a Nonstationary Demand Process," Manufacturing & Service Operations Management, INFORMS, vol. 1(1), pages 50-61.
Stephen C. Graves, 1999. "Addendum to 'A Single-Item Inventory Model for a Nonstationary Demand Process'," Manufacturing & Service Operations Management, INFORMS, vol. 1(2), pages 174-174.
Tetsuo Iida & Paul H. Zipkin, 2006. "Approximate Solutions of a Dynamic Forecast-Inventory Model," Manufacturing & Service Operations Management, INFORMS, vol. 8(4), pages 407-425, October.
Xiang, Mengyuan & Rossi, Roberto & Martin-Barragan, Belen & Tarim, S. Armagan, 2018. "Computing non-stationary (s, S) policies using mixed integer linear programming," European Journal of Operational Research, Elsevier, vol. 271(2), pages 490-500.
Hill, Roger M. & Johansen, Soren Glud, 2006. "Optimal and near-optimal policies for lost sales inventory models with at most one replenishment order outstanding," European Journal of Operational Research, Elsevier, vol. 169(1), pages 111-132, February.
Rossi, Roberto & Kilic, Onur A. & Tarim, S. Armagan, 2015. "Piecewise linear approximations for the static–dynamic uncertainty strategy in stochastic lot-sizing," Omega, Elsevier, vol. 50(C), pages 126-140.


Citations

Citations are extracted by the CitEc Project.

Cited by:

Sarkar, Puja & Khanapuri, Vivekanand B. & Tiwari, Manoj Kumar, 2025. "Integration of prediction and optimization for smart stock portfolio selection," European Journal of Operational Research, Elsevier, vol. 321(1), pages 243-256.
Abada, Ibrahim & Lambin, Xavier & Tchakarov, Nikolay, 2024. "Collusion by mistake: Does algorithmic sophistication drive supra-competitive profits?," European Journal of Operational Research, Elsevier, vol. 318(3), pages 927-953.
Bo Zhang & Wen Jun Tan & Wentong Cai & Allan N. Zhang, 2024. "Leveraging Multi-Agent Reinforcement Learning for Digital Transformation in Supply Chain Inventory Optimization," Sustainability, MDPI, vol. 16(22), pages 1-17, November.
Akkerman, Fabian & Prak, Dennis & Mes, Martijn, 2025. "Dynamic reordering and inspection for the multi-item Inventory Record Inaccuracy problem," European Journal of Operational Research, Elsevier, vol. 321(2), pages 428-444.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Xiang, Mengyuan & Rossi, Roberto & Martin-Barragan, Belen & Tarim, S. Armagan, 2023. "A mathematical programming-based solution method for the nonstationary inventory problem under correlated demand," European Journal of Operational Research, Elsevier, vol. 304(2), pages 515-524.
Xiang, Mengyuan & Rossi, Roberto & Martin-Barragan, Belen & Tarim, S. Armagan, 2018. "Computing non-stationary (s, S) policies using mixed integer linear programming," European Journal of Operational Research, Elsevier, vol. 271(2), pages 490-500.
Chen, Zhen & Rossi, Roberto, 2021. "A dynamic ordering policy for a stochastic inventory problem with cash constraints," Omega, Elsevier, vol. 102(C).
Ma, Xiyuan & Rossi, Roberto & Archibald, Thomas Welsh, 2022. "Approximations for non-stationary stochastic lot-sizing under (s,Q)-type policy," European Journal of Operational Research, Elsevier, vol. 298(2), pages 573-584.
Ren, Ke & Bidkhori, Hoda & Shen, Zuo-Jun Max, 2024. "Data-driven inventory policy: Learning from sequentially observed non-stationary data," Omega, Elsevier, vol. 123(C).
Visentin, Andrea & Prestwich, Steven & Rossi, Roberto & Tarim, S. Armagan, 2021. "Computing optimal (R,s,S) policy parameters by a hybrid of branch-and-bound and stochastic dynamic programming," European Journal of Operational Research, Elsevier, vol. 294(1), pages 91-99.
Wang, Zhaodong & Wang, Xin & Ouyang, Yanfeng, 2015. "Bounded growth of the bullwhip effect under a class of nonlinear ordering policies," European Journal of Operational Research, Elsevier, vol. 247(1), pages 72-82.
Alexandre Forel & Martin Grunow, 2023. "Dynamic stochastic lot sizing with forecast evolution in rolling‐horizon planning," Production and Operations Management, Production and Operations Management Society, vol. 32(2), pages 449-468, February.
Van-Anh Truong, 2014. "Approximation Algorithm for the Stochastic Multiperiod Inventory Problem via a Look-Ahead Optimization Approach," Mathematics of Operations Research, INFORMS, vol. 39(4), pages 1039-1056, November.
Gérard P. Cachon & Marshall Fisher, 2000. "Supply Chain Inventory Management and the Value of Shared Information," Management Science, INFORMS, vol. 46(8), pages 1032-1048, August.
Amar Sapra & Van-Anh Truong & Rachel Q. Zhang, 2010. "How Much Demand Should Be Fulfilled?," Operations Research, INFORMS, vol. 58(3), pages 719-733, June.
Tarim, S. Armagan & Smith, Barbara M., 2008. "Constraint programming for computing non-stationary (R, S) inventory policies," European Journal of Operational Research, Elsevier, vol. 189(3), pages 1004-1021, September.
Stephen C. Graves & Sean P. Willems, 2008. "Strategic Inventory Placement in Supply Chains: Nonstationary Demand," Manufacturing & Service Operations Management, INFORMS, vol. 10(2), pages 278-287, March.
Gah-Yi Ban, 2020. "Confidence Intervals for Data-Driven Inventory Policies with Demand Censoring," Operations Research, INFORMS, vol. 68(2), pages 309-326, March.
Emilio Carrizosa & Alba V. Olivares-Nadal & Pepa Ramírez-Cobo, 2020. "Embedding the production policy in location-allocation decisions," 4OR, Springer, vol. 18(3), pages 357-380, September.
Rachel Croson & Karen Donohue, 2006. "Behavioral Causes of the Bullwhip Effect and the Observed Value of Inventory Information," Management Science, INFORMS, vol. 52(3), pages 323-336, March.
John J. Neale & Sean P. Willems, 2009. "Managing Inventory in Supply Chains with Nonstationary Demand," Interfaces, INFORMS, vol. 39(5), pages 388-399, October.
Z Hua & J Yang & F Huang & X Xu, 2009. "A static-dynamic strategy for spare part inventory systems with nonstationary stochastic demand," Journal of the Operational Research Society, Palgrave Macmillan; The OR Society, vol. 60(9), pages 1254-1263, September.
Kilic, Onur A. & Tarim, S. Armagan, 2024. "A simple heuristic for computing non-stationary inventory policies based on function approximation," European Journal of Operational Research, Elsevier, vol. 316(3), pages 899-905.
Dural-Selcuk, Gozdem & Rossi, Roberto & Kilic, Onur A. & Tarim, S. Armagan, 2020. "The benefit of receding horizon control: Near-optimal policies for stochastic inventory control," Omega, Elsevier, vol. 97(C).

More about this item

Keywords

Inventory; Lot Sizing; Forecast Evolution; Deep Reinforcement Learning; Non-Stationary Demand

