IDEAS home https://ideas.repec.org/a/inm/ormsom/v27y2025i2p369-388.html

Deep Policy Iteration with Integer Programming for Inventory Management

Author

Listed:
  • Pavithra Harsha

    (Thomas J. Watson Research Center, IBM Research, Yorktown Heights, New York 10598)

  • Ashish Jagmohan

    (Merlin Mind, New York, New York 10018)

  • Jayant Kalagnanam

    (Thomas J. Watson Research Center, IBM Research, Yorktown Heights, New York 10598)

  • Brian Quanz

    (Thomas J. Watson Research Center, IBM Research, Yorktown Heights, New York 10598)

  • Divya Singhvi

    (Leonard N. Stern School of Business, New York University, New York, New York 10012)

Abstract

Problem definition: In this paper, we present a reinforcement learning (RL)-based framework for optimizing long-term discounted-reward problems with large combinatorial action spaces and state-dependent constraints. These characteristics are common to many operations management problems, for example, network inventory replenishment, where managers must contend with uncertain demand, lost sales, and capacity constraints that result in complex feasible action spaces. Our proposed programmable actor RL (PARL) uses a deep policy iteration method that leverages neural networks to approximate the value function and combines it with mathematical programming and sample average approximation to solve the per-step action optimization optimally while accounting for combinatorial action spaces and state-dependent constraint sets.

Methodology/results: We show how the proposed methodology can be applied to complex inventory replenishment problems where analytical solutions are intractable. We also benchmark the proposed algorithm against state-of-the-art RL algorithms and commonly used replenishment heuristics and find that it considerably outperforms existing methods, by as much as 14.7% on average, in various complex supply chain settings.

Managerial implications: We find that the performance improvement of PARL over benchmark algorithms can be directly attributed to better inventory cost management, especially in inventory-constrained settings. Furthermore, in simpler settings where the optimal replenishment policy is tractable or near-optimal heuristics are known, we find that RL-based policies can learn near-optimal policies. Finally, to make RL algorithms more accessible to inventory management researchers, we also discuss the development of a modular Python library that can be used to test the performance of RL algorithms across various supply chain structures. This library can spur future research on practical and near-optimal algorithms for inventory management problems.
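The per-step action selection described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `v_hat` is a placeholder for the neural-network value function, the lost-sales transition and reward are simplified single-product stand-ins, and brute-force enumeration over a small discrete order set stands in for the integer program that the paper solves; all function names and parameters here are hypothetical. The sample average approximation over demand scenarios is the part carried over most directly.

```python
import random

def v_hat(state):
    # Placeholder for the neural-network value-function approximation:
    # penalize deviation from a nominal inventory level (illustrative only).
    target = 10.0
    return -abs(state - target)

def step(state, action, demand):
    # One-period lost-sales transition: order `action` units, demand is
    # realized, and unmet demand is lost (inventory never goes negative).
    return max(state + action - demand, 0.0)

def reward(state, action, demand, price=2.0, cost=1.0, holding=0.1):
    # Revenue from satisfied demand minus ordering and holding costs.
    sales = min(state + action, demand)
    leftover = max(state + action - demand, 0.0)
    return price * sales - cost * action - holding * leftover

def parl_action(state, demand_samples, actions, gamma=0.9):
    """Pick the order quantity maximizing the sample-average estimate of
    immediate reward plus discounted approximate value-to-go. The argmax
    over a small discrete action set stands in for the integer program."""
    def saa_value(a):
        return sum(
            reward(state, a, d) + gamma * v_hat(step(state, a, d))
            for d in demand_samples
        ) / len(demand_samples)
    return max(actions, key=saa_value)

random.seed(0)
demands = [random.uniform(0.0, 12.0) for _ in range(200)]
best = parl_action(state=3.0, demand_samples=demands, actions=range(0, 15))
print(best)
```

In the paper's setting the inner maximization is a mixed-integer program over a state-dependent feasible set (for example, capacity constraints across a network), which is what makes the mathematical-programming actor necessary; enumeration is only viable for the tiny action set used here.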

Suggested Citation

  • Pavithra Harsha & Ashish Jagmohan & Jayant Kalagnanam & Brian Quanz & Divya Singhvi, 2025. "Deep Policy Iteration with Integer Programming for Inventory Management," Manufacturing & Service Operations Management, INFORMS, vol. 27(2), pages 369-388, March.
  • Handle: RePEc:inm:ormsom:v:27:y:2025:i:2:p:369-388
    DOI: 10.1287/msom.2022.0617

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/msom.2022.0617
    Download Restriction: no

    File URL: https://libkey.io/10.1287/msom.2022.0617?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.