Printed from https://ideas.repec.org/a/eee/proeco/v266y2023ics092552732300261x.html

Adaptive inventory replenishment using structured reinforcement learning by exploiting a policy structure

Author

Listed:
  • Park, Hyungjun
  • Choi, Dong Gu
  • Min, Daiki

Abstract

We consider an inventory replenishment problem with unknown and non-stationary demand. We design a structured reinforcement learning algorithm that efficiently adapts the replenishment policy to changing demand without any prior knowledge. Our proposed method integrates the known structural properties of a well-performing inventory replenishment policy with reinforcement learning. By exploiting the policy structure, we tune reinforcement learning to characterize the inventory replenishment policy and approximate the value function. In particular, we propose two methods for stochastic approximation of the gradient of the objective function. These novel reinforcement learning algorithms ensure an efficient convergence rate and lower algorithmic complexity for solving practical problems. The numerical results demonstrate that the proposed algorithms adapt the policy to changing demand and achieve lower inventory costs than various benchmarks. We also conduct a numerical study for a South Korean retail shop to validate the practical feasibility of the proposed method. Understanding the policy structure is beneficial for designing reinforcement learning algorithms that can address the inventory replenishment problem. These well-designed reinforcement learning algorithms are particularly promising when policy updates must be based on observations without precise knowledge of non-stationary demand. These research findings could be extended to other inventory decisions for which policy structures are available.
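
The abstract does not spell out the algorithms, so the following is only an illustrative sketch of the general idea of exploiting a policy structure, not the authors' method. It assumes a single-item problem in which the policy is a base-stock (order-up-to) level, a per-period cost of holding plus backlog, and a constant step size so that the level can track demand shifts. The function names, cost parameters, and demand sequence are all hypothetical.

```python
import random


def cost_gradient(level, demand, holding, backlog):
    """Subgradient of holding*(level-demand)^+ + backlog*(demand-level)^+ w.r.t. level."""
    # Assumed cost model: overage is charged `holding`, underage `backlog`.
    return holding if level > demand else -backlog


def adapt_base_stock(demands, holding=1.0, backlog=4.0, step=0.5, start=0.0):
    """Update a base-stock level by stochastic (sub)gradient descent.

    Only one parameter is learned because the policy is assumed to have
    base-stock structure. The constant step size (an assumption of this
    sketch) lets the level keep moving when demand changes.
    """
    level = start
    history = []
    for demand in demands:
        grad = cost_gradient(level, demand, holding, backlog)
        level = max(0.0, level - step * grad)  # keep the level non-negative
        history.append(level)
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical non-stationary demand: the mean jumps from 20 to 40.
    demands = [rng.gauss(20, 3) for _ in range(500)]
    demands += [rng.gauss(40, 3) for _ in range(500)]
    levels = adapt_base_stock(demands)
    print("level before shift:", levels[499])
    print("level after shift:", levels[-1])
```

Learning a single structured parameter instead of a generic state-action table is what keeps such a scheme cheap; the paper's actual gradient approximations and value-function treatment may differ substantially from this sketch.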

Suggested Citation

  • Park, Hyungjun & Choi, Dong Gu & Min, Daiki, 2023. "Adaptive inventory replenishment using structured reinforcement learning by exploiting a policy structure," International Journal of Production Economics, Elsevier, vol. 266(C).
  • Handle: RePEc:eee:proeco:v:266:y:2023:i:c:s092552732300261x
    DOI: 10.1016/j.ijpe.2023.109029

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S092552732300261X
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ijpe.2023.109029?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. De Moor, Bram J. & Gijsbrechts, Joren & Boute, Robert N., 2022. "Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management," European Journal of Operational Research, Elsevier, vol. 301(2), pages 535-545.
    2. S G Johansen & P Melchiors, 2003. "Can-order policy for the periodic-review joint replenishment problem," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 54(3), pages 283-290, March.
    3. Deniz Preil & Michael Krapp, 2023. "Genetic multi-armed bandits: a reinforcement learning approach for discrete optimization via simulation," Papers 2302.07695, arXiv.org.
    4. Edward Ignall, 1969. "Optimal Continuous Review Policies for Two Product Inventory Systems with Joint Setup Costs," Management Science, INFORMS, vol. 15(5), pages 278-283, January.
    5. Erhan Bayraktar & Michael Ludkovski, 2010. "Inventory management with partially observed nonstationary demand," Annals of Operations Research, Springer, vol. 176(1), pages 7-39, April.
    6. Girlich, Hans-Joachim & Barche, Volker, 1991. "On optimal strategies in inventory systems with Wiener demand process," International Journal of Production Economics, Elsevier, vol. 23(1-3), pages 105-110, October.
    7. Jing-Sheng Song & Paul Zipkin, 1993. "Inventory Control in a Fluctuating Demand Environment," Operations Research, INFORMS, vol. 41(2), pages 351-370, April.
    8. Boxiao Chen & Xiuli Chao, 2020. "Dynamic Inventory Control with Stockout Substitution and Demand Learning," Management Science, INFORMS, vol. 66(11), pages 5108-5127, November.
    9. Joseph L. Balintfy, 1964. "On a Basic Class of Multi-Item Inventory Problems," Management Science, INFORMS, vol. 10(2), pages 287-297, January.
    10. Giannoccaro, Ilaria & Pontrandolfo, Pierpaolo, 2002. "Inventory management in supply chains: a reinforcement learning approach," International Journal of Production Economics, Elsevier, vol. 78(2), pages 153-161, July.
    11. N. Bora Keskin & Yuexing Li & Jing-Sheng Song, 2022. "Data-Driven Dynamic Pricing and Ordering with Perishable Inventory in a Changing Environment," Management Science, INFORMS, vol. 68(3), pages 1938-1958, March.
    12. Creemers, Stefan & Boute, Robert, 2022. "The joint replenishment problem: Optimal policy and exact evaluation method," European Journal of Operational Research, Elsevier, vol. 302(3), pages 1175-1188.
    13. Boxiao Chen, 2021. "Data‐Driven Inventory Control with Shifting Demand," Production and Operations Management, Production and Operations Management Society, vol. 30(5), pages 1365-1385, May.
    14. Deniz Preil & Michael Krapp, 2022. "Artificial intelligence-based inventory management: a Monte Carlo tree search approach," Annals of Operations Research, Springer, vol. 308(1), pages 415-439, January.
    15. Tamar Cohen-Hillel & Liron Yedidsion, 2018. "The Periodic Joint Replenishment Problem Is Strongly 𝒩𝒫-Hard," Mathematics of Operations Research, INFORMS, vol. 43(4), pages 1269-1289, November.
    16. Sumit Kunnumkal & Huseyin Topaloglu, 2008. "Exploiting the Structural Properties of the Underlying Markov Decision Problem in the Q-Learning Algorithm," INFORMS Journal on Computing, INFORMS, vol. 20(2), pages 288-301, May.
    17. Woonghee Tim Huh & Paat Rusmevichientong, 2009. "A Nonparametric Asymptotic Analysis of Inventory Planning with Censored Demand," Mathematics of Operations Research, INFORMS, vol. 34(1), pages 103-123, February.
    18. Preil, Deniz & Krapp, Michael, 2022. "Bandit-based inventory optimisation: Reinforcement learning in multi-echelon supply chains," International Journal of Production Economics, Elsevier, vol. 252(C).

    Citations



    Cited by:

    1. Jiachen Li & Xingfeng Duan & Zhennan Xiong & Peng Yao, 2024. "Tugboat Scheduling Method Based on the NRPER-DDPG Algorithm: An Integrated DDPG Algorithm with Prioritized Experience Replay and Noise Reduction," Sustainability, MDPI, vol. 16(8), pages 1-27, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. De Moor, Bram J. & Creemers, Stefan & Boute, Robert N., 2023. "Breaking truck dominance in supply chains: Proactive freight consolidation and modal split transport," International Journal of Production Economics, Elsevier, vol. 257(C).
    2. Boxiao Chen, 2021. "Data‐Driven Inventory Control with Shifting Demand," Production and Operations Management, Production and Operations Management Society, vol. 30(5), pages 1365-1385, May.
    3. Larsen, Christian, 2009. "The Q(s,S) control policy for the joint replenishment problem extended to the case of correlation among item-demands," International Journal of Production Economics, Elsevier, vol. 118(1), pages 292-297, March.
    4. Tsai, Chieh-Yuan & Tsai, Chi-Yang & Huang, Po-Wen, 2009. "An association clustering algorithm for can-order policies in the joint replenishment problem," International Journal of Production Economics, Elsevier, vol. 117(1), pages 30-41, January.
    5. Satya S. Malladi & Alan L. Erera & Chelsea C. White, 2023. "Inventory control with modulated demand and a partially observed modulation process," Annals of Operations Research, Springer, vol. 321(1), pages 343-369, February.
    6. Padilla Tinoco, Silvia Valeria & Creemers, Stefan & Boute, Robert N., 2017. "Collaborative shipping under different cost-sharing agreements," European Journal of Operational Research, Elsevier, vol. 263(3), pages 827-837.
    7. Kouki, Chaaben & Babai, M. Zied & Jemai, Zied & Minner, Stefan, 2016. "A coordinated multi-item inventory system for perishables with random lifetime," International Journal of Production Economics, Elsevier, vol. 181(PA), pages 226-237.
    8. Dellaert, Nico & van de Poel, Erik, 1996. "Global inventory control in an academic hospital," International Journal of Production Economics, Elsevier, vol. 46(1), pages 277-284, December.
    9. Ricardo Montoya & Carlos Gonzalez, 2019. "A Hidden Markov Model to Detect On-Shelf Out-of-Stocks Using Point-of-Sale Data," Manufacturing & Service Operations Management, INFORMS, vol. 21(4), pages 932-948, October.
    10. Jung, Seung Hwan & Yang, Yunsi, 2023. "On the value of operational flexibility in the trailer shipment and assignment problem: Data-driven approaches and reinforcement learning," International Journal of Production Economics, Elsevier, vol. 264(C).
    11. Larsen, Christian, 2007. "The Q(s,S) control policy for the joint replenishment problem extended to the case of correlation among item-demands," CORAL Working Papers L-2007-01, University of Aarhus, Aarhus School of Business, Department of Business Studies.
    12. Manafzadeh Dizbin, Nima & Tan, Barış, 2020. "Optimal control of production-inventory systems with correlated demand inter-arrival and processing times," International Journal of Production Economics, Elsevier, vol. 228(C).
    13. Yee, Hannah & van Staden, Heletjé E. & Boute, Robert N., 2024. "Dual sourcing under non-stationary demand and partial observability," European Journal of Operational Research, Elsevier, vol. 314(1), pages 94-110.
    14. Kiesmüller, G.P., 2010. "Multi-item inventory control with full truckloads: A comparison of aggregate and individual order triggering," European Journal of Operational Research, Elsevier, vol. 200(1), pages 54-62, January.
    15. Xiangyu Gao & Huanan Zhang, 2022. "An efficient learning framework for multiproduct inventory systems with customer choices," Production and Operations Management, Production and Operations Management Society, vol. 31(6), pages 2492-2516, June.
    16. Stefanny Ramirez & Laurence H. Brandenburg & Dario Bauso, 2023. "Coordinated Replenishment Game and Learning Under Time Dependency and Uncertainty of the Parameters," Dynamic Games and Applications, Springer, vol. 13(1), pages 326-352, March.
    17. Khayyati, Siamak & Tan, Barış, 2020. "Data-driven control of a production system by using marking-dependent threshold policy," International Journal of Production Economics, Elsevier, vol. 226(C).
    18. Rong Li & Jing‐Sheng Jeannette Song & Shuxiao Sun & Xiaona Zheng, 2022. "Fight inventory shrinkage: Simultaneous learning of inventory level and shrinkage rate," Production and Operations Management, Production and Operations Management Society, vol. 31(6), pages 2477-2491, June.
    19. Lee, Loo Hay & Chew, Ek Peng, 2005. "A dynamic joint replenishment policy with auto-correlated demand," European Journal of Operational Research, Elsevier, vol. 165(3), pages 729-747, September.
    20. Erkip, Nesim Kohen, 2023. "Can accessing much data reshape the theory? Inventory theory under the challenge of data-driven systems," European Journal of Operational Research, Elsevier, vol. 308(3), pages 949-959.
