Author
Listed:
- Boxiao Chen
(College of Business Administration, University of Illinois Chicago, Chicago, Illinois 60607)
- Jiashuo Jiang
(Department of Industrial Engineering and Decision Analytics, Hong Kong University of Science and Technology, Hong Kong 99907)
- Jiawei Zhang
(Stern School of Business, New York University, New York, New York 10012)
- Zhengyuan Zhou
(Stern School of Business, New York University, New York, New York 10012)
Abstract
We consider a stochastic lost-sales inventory control system with lead time L over a planning horizon T. Supply is uncertain and is a function of the order quantity (because of random yield/capacity, etc.). We aim to minimize the T-period cost, a problem that is known to be computationally intractable even under known distributions of demand and supply. In this paper, we assume that both the demand and supply distributions are unknown and develop a computationally efficient online learning algorithm. We show that our algorithm achieves a regret (i.e., the performance gap between the cost of our algorithm and that of an optimal policy over T periods) of Õ(L + √T) when L ≥ Ω(log T). We do so by (1) showing that our algorithm's cost is higher by at most Õ(L + √T) for any L ≥ 0 compared with that of an optimal constant-order policy under complete information (a widely used algorithm) and (2) leveraging the latter's known performance guarantee from the existing literature. To the best of our knowledge, a finite-sample Õ(√T) (and polynomial in L) regret bound benchmarked against an optimal policy was not previously known in the online inventory control literature. A key challenge in this learning problem is that both demand and supply data can be censored; hence, only truncated values are observable. We circumvent this challenge by showing that the data generated under an order quantity q₂ allow us to simulate the performance not only of q₂ but also of every q₁ < q₂, a key observation that yields sufficient information even under data censoring. By establishing a high-probability coupling argument, we are able to evaluate and compare the performance of different order policies at their steady states within a finite time horizon.
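The censoring observation above can be illustrated with a toy supply model (an assumption for illustration only; the paper's supply function is more general): if the quantity delivered under an order q is min(q, C) for a random capacity C, then the possibly censored delivery observed under a larger order q₂ fully determines what any smaller order q₁ would have received.

```python
import random

def received(q, capacity):
    """Delivery under order q when supply is capped by a random capacity."""
    return min(q, capacity)

def infer_received(q1, obs, q2):
    """Recover the delivery a smaller order q1 would have received from the
    (possibly censored) delivery `obs` observed under a larger order q2."""
    assert q1 <= q2
    if obs >= q2:        # censored observation: capacity >= q2 >= q1
        return q1
    return min(q1, obs)  # uncensored: obs equals the capacity itself

random.seed(0)
for _ in range(1000):
    c = random.randint(0, 10)
    assert infer_received(3, received(7, c), 7) == received(3, c)
```

The same truncation logic applies to censored demand (observed sales are min(demand, stock)), which is why ordering high yields enough data to evaluate lower order quantities as well.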
Because the problem lacks convexity, commonly used learning algorithms, such as stochastic gradient descent and bisection, cannot be applied; instead, we develop an active elimination method that adaptively rules out suboptimal solutions.
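The active elimination idea can be sketched with a generic successive-elimination routine (a textbook variant, not the paper's exact algorithm): keep sampling every surviving candidate order quantity and drop any whose optimistic cost estimate is still worse than the best pessimistic one.

```python
import math
import random

def successive_elimination(candidates, cost_oracle, rounds, delta=0.05):
    """Generic successive elimination over a finite candidate set:
    sample each surviving candidate once per round and eliminate those
    whose lower-confidence cost exceeds the best upper-confidence cost."""
    alive = list(candidates)
    sums = {q: 0.0 for q in alive}
    for n in range(1, rounds + 1):
        for q in alive:
            sums[q] += cost_oracle(q)
        # Hoeffding-style confidence radius (sub-Gaussian noise assumed).
        radius = math.sqrt(math.log(2 * len(sums) * rounds / delta) / (2 * n))
        means = {q: sums[q] / n for q in alive}
        best_ucb = min(means[q] + radius for q in alive)
        alive = [q for q in alive if means[q] - radius <= best_ucb]
    return min(alive, key=lambda q: sums[q] / rounds)

random.seed(1)
# Hypothetical noisy per-period cost, minimized at q = 4.
noisy_cost = lambda q: (q - 4) ** 2 + random.gauss(0, 1)
best = successive_elimination(range(10), noisy_cost, rounds=400)
assert best == 4
```

Unlike gradient descent or bisection, this approach needs no convexity: it only requires that each candidate's long-run cost can be estimated to within a shrinking confidence radius, which is what the paper's coupling argument supplies.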
Suggested Citation
Boxiao Chen & Jiashuo Jiang & Jiawei Zhang & Zhengyuan Zhou, 2024.
"Learning to Order for Inventory Systems with Lost Sales and Uncertain Supplies,"
Management Science, INFORMS, vol. 70(12), pages 8631-8646, December.
Handle:
RePEc:inm:ormnsc:v:70:y:2024:i:12:p:8631-8646
DOI: 10.1287/mnsc.2022.02476