IDEAS home https://ideas.repec.org/a/inm/ormsom/v27y2025i2p369-388.html

Deep Policy Iteration with Integer Programming for Inventory Management

Author

Listed:
  • Pavithra Harsha

    (Thomas J. Watson Research Center, IBM Research, Yorktown Heights, New York 10598)

  • Ashish Jagmohan

    (Merlin Mind, New York, New York 10018)

  • Jayant Kalagnanam

    (Thomas J. Watson Research Center, IBM Research, Yorktown Heights, New York 10598)

  • Brian Quanz

    (Thomas J. Watson Research Center, IBM Research, Yorktown Heights, New York 10598)

  • Divya Singhvi

    (Leonard N. Stern School of Business, New York University, New York, New York 10012)

Abstract

Problem definition: In this paper, we present a reinforcement learning (RL)-based framework for optimizing long-term discounted-reward problems with large combinatorial action spaces and state-dependent constraints. These characteristics are common to many operations management problems, for example, network inventory replenishment, where managers must contend with uncertain demand, lost sales, and capacity constraints that result in complex feasible action spaces. Our proposed programmable actor RL (PARL) uses a deep policy iteration method that leverages neural networks to approximate the value function and combines it with mathematical programming and sample average approximation to solve the per-step action optimization optimally while accounting for combinatorial action spaces and state-dependent constraint sets.

Methodology/results: We show how the proposed methodology can be applied to complex inventory replenishment problems where analytical solutions are intractable. We also benchmark the proposed algorithm against state-of-the-art RL algorithms and commonly used replenishment heuristics and find that it considerably outperforms existing methods, by as much as 14.7% on average, in various complex supply chain settings.

Managerial implications: We find that the performance improvement of PARL over benchmark algorithms can be directly attributed to better inventory cost management, especially in inventory-constrained settings. Furthermore, in simpler settings where the optimal replenishment policy is tractable or near-optimal heuristics are known, we find that RL-based policies can learn near-optimal policies. Finally, to make RL algorithms more accessible to inventory management researchers, we also discuss the development of a modular Python library that can be used to test the performance of RL algorithms across various supply chain structures. This library can spur future research on practical and near-optimal algorithms for inventory management problems.
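The per-step action selection described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `v_hat` is a placeholder for the neural-network value function, the lost-sales transition and reward are simplified single-product stand-ins, and brute-force enumeration over a small discrete order set stands in for the integer program that the paper solves; all function names and parameters here are hypothetical. The sample average approximation over demand scenarios is the part carried over most directly.

```python
import random

def v_hat(state):
    # Placeholder for the neural-network value-function approximation:
    # penalize deviation from a nominal inventory level (illustrative only).
    target = 10.0
    return -abs(state - target)

def step(state, action, demand):
    # One-period lost-sales transition: order `action` units, demand is
    # realized, and unmet demand is lost (inventory never goes negative).
    return max(state + action - demand, 0.0)

def reward(state, action, demand, price=2.0, cost=1.0, holding=0.1):
    # Revenue from satisfied demand minus ordering and holding costs.
    sales = min(state + action, demand)
    leftover = max(state + action - demand, 0.0)
    return price * sales - cost * action - holding * leftover

def parl_action(state, demand_samples, actions, gamma=0.9):
    """Pick the order quantity maximizing the sample-average estimate of
    immediate reward plus discounted approximate value-to-go. The argmax
    over a small discrete action set stands in for the integer program."""
    def saa_value(a):
        return sum(
            reward(state, a, d) + gamma * v_hat(step(state, a, d))
            for d in demand_samples
        ) / len(demand_samples)
    return max(actions, key=saa_value)

random.seed(0)
demands = [random.uniform(0.0, 12.0) for _ in range(200)]
best = parl_action(state=3.0, demand_samples=demands, actions=range(0, 15))
print(best)
```

In the paper's setting the inner maximization is a mixed-integer program over a state-dependent feasible set (for example, capacity constraints across a network), which is what makes the mathematical-programming actor necessary; enumeration is only viable for the tiny action set used here.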

Suggested Citation

  • Pavithra Harsha & Ashish Jagmohan & Jayant Kalagnanam & Brian Quanz & Divya Singhvi, 2025. "Deep Policy Iteration with Integer Programming for Inventory Management," Manufacturing & Service Operations Management, INFORMS, vol. 27(2), pages 369-388, March.
  • Handle: RePEc:inm:ormsom:v:27:y:2025:i:2:p:369-388
    DOI: 10.1287/msom.2022.0617

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/msom.2022.0617
    Download Restriction: no

    File URL: https://libkey.io/10.1287/msom.2022.0617?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.