Authors
Listed:
- David B. Brown
(Fuqua School of Business, Duke University, Durham, North Carolina 27708)
- Jingwei Zhang
(Fuqua School of Business, Duke University, Durham, North Carolina 27708)
Abstract
We consider a sequential decision problem involving shared resources and signals in which a decision maker repeatedly observes some exogenous information (the signal), modeled as a finite-state Markov process, then allocates a limited amount of a shared resource across a set of projects. The framework includes a number of applications and generalizes Markovian multiarmed bandit problems by (a) incorporating exogenous information through the signal and (b) allowing for more general resource allocation decisions. Such problems are naturally formulated as stochastic dynamic programs (DPs), but solving the DP is impractical unless the number of projects is small. In this paper, we develop a Lagrangian relaxation and a DP formulation of the corresponding fluid relaxation—a dynamic fluid relaxation—that provide upper bounds on the optimal value function as well as a feasible policy. We develop an iterative primal-dual algorithm for solving the dynamic fluid relaxation and analyze the performance of the feasible dynamic fluid policy. Our performance analysis implies, under mild conditions, that the dynamic fluid relaxation bound and feasible policy are asymptotically optimal as the number of projects grows large. Our Lagrangian relaxation uses Lagrange multipliers that depend on the history of past signals in each period: we show that the bounds and analogous policies using restricted forms of Lagrange multipliers (e.g., only depending on the current signal state in each period) in general lead to a performance gap that is linear in the number of projects and thus are not asymptotically optimal in the regime of many projects. We demonstrate the model and results in two applications: (i) a dynamic capital budgeting problem and (ii) a multilocation inventory management problem with limited production capacity and demands that are correlated across locations by a changing market state.
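To fix ideas, the following is a rough sketch, in illustrative notation not taken from the paper, of the kind of weakly coupled DP and signal-history-dependent Lagrangian relaxation the abstract describes: N projects with states x_n, a shared per-period budget B, an exogenous Markov signal s_t with history h_t = (s_1, ..., s_t), and nonnegative multipliers \lambda_t(h_t) that price the shared resource. (By the Markov property, the original value function depends on h_t only through s_t, but the multipliers may use the full history.)

\[
V_t(x, h_t) \;=\; \max_{a:\ \sum_{n} a_n \le B} \;\sum_{n=1}^{N} r_n(x_n, s_t, a_n) \;+\; \mathbb{E}\bigl[\, V_{t+1}(x', h_{t+1}) \,\big|\, x, h_t, a \,\bigr].
\]

Dualizing the shared-resource constraint with \lambda_t(h_t) \ge 0 decouples the projects and yields an upper bound,

\[
V_t(x, h_t) \;\le\; B\, \mathbb{E}\Bigl[\, \sum_{\tau=t}^{T} \lambda_\tau(h_\tau) \,\Big|\, h_t \Bigr] \;+\; \sum_{n=1}^{N} V^{\lambda}_{n,t}(x_n, h_t),
\]

where each project-level value function solves a single-project DP:

\[
V^{\lambda}_{n,t}(x_n, h_t) \;=\; \max_{a_n} \Bigl\{\, r_n(x_n, s_t, a_n) - \lambda_t(h_t)\, a_n + \mathbb{E}\bigl[\, V^{\lambda}_{n,t+1}(x_n', h_{t+1}) \,\big|\, x_n, h_t, a_n \,\bigr] \Bigr\}.
\]

As the abstract notes, restricting the multipliers to depend only on the current signal state s_t rather than the full history h_t generally leaves a performance gap that grows linearly in the number of projects.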
Suggested Citation
David B. Brown & Jingwei Zhang, 2022.
"Dynamic Programs with Shared Resources and Signals: Dynamic Fluid Policies and Asymptotic Optimality,"
Operations Research, INFORMS, vol. 70(5), pages 3015-3033, September.
Handle: RePEc:inm:oropre:v:70:y:2022:i:5:p:3015-3033
DOI: 10.1287/opre.2021.2181