Multi-Gear Bandits, Partial Conservation Laws, and Indexability

Author

José Niño-Mora
(Department of Statistics, Carlos III University of Madrid, 28903 Getafe, Spain)

Abstract

This paper considers what we propose to call multi-gear bandits, which are Markov decision processes modeling a generic dynamic and stochastic project fueled by a single resource and which admit multiple actions representing gears of operation naturally ordered by their increasing resource consumption. The optimal operation of a multi-gear bandit aims to strike a balance between project performance costs or rewards and resource usage costs, which depend on the resource price. A computationally convenient and intuitive optimal solution is available when such a model is indexable, meaning that its optimal policies are characterized by a dynamic allocation index (DAI), a function of state–action pairs representing critical resource prices. Motivated by the lack of general indexability conditions and efficient index-computing schemes, and focusing on the infinite-horizon finite-state and -action discounted case, we present a verification theorem. It ensures that, if a model satisfies two proposed PCL-indexability (partial conservation law) conditions with respect to a postulated family of structured policies, then it is indexable and such policies are optimal. Its DAI is then given by a marginal productivity index, computed by a downshift adaptive-greedy algorithm in AN steps, where A + 1 is the number of actions and N the number of states. The DAI is further used as the basis of a new index policy for the multi-armed multi-gear bandit problem.
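The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's downshift adaptive-greedy algorithm: it solves the price-penalized problem by plain value iteration for one fixed resource price at a time. The function name `solve_priced_project` and the toy data (`TOY_R`, `TOY_Q`, `TOY_P`, `TOY_BETA`) are assumptions made up for this illustration, not taken from the article.

```python
# Hypothetical toy instance (not from the paper): 3 states, 3 gears (A = 2).
# TOY_R[s][a]: reward; here it depends only on the state.
TOY_R = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
# TOY_Q[s][a]: resource consumption, increasing in the gear a.
TOY_Q = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
# TOY_P[a][s][t]: transition probabilities; higher gears push the state upward.
TOY_P = [
    [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.0, 0.8, 0.2]],  # gear 0 (idle)
    [[0.5, 0.5, 0.0], [0.3, 0.4, 0.3], [0.0, 0.3, 0.7]],  # gear 1
    [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.0, 0.1, 0.9]],  # gear 2
]
TOY_BETA = 0.9  # discount factor


def solve_priced_project(r, q, P, beta, lam, tol=1e-10, max_iter=100_000):
    """Value iteration for a single project at a fixed resource price lam.

    Maximizes the expected discounted sum of r[s][a] - lam * q[s][a].
    Returns (V, policy), where policy[s] is a greedy gear for state s
    (ties are broken in favor of the lowest gear).
    """
    n_states, n_gears = len(r), len(r[0])
    V = [0.0] * n_states
    for _ in range(max_iter):
        # State-action values under the current value estimate.
        Q = [[r[s][a] - lam * q[s][a]
              + beta * sum(P[a][s][t] * V[t] for t in range(n_states))
              for a in range(n_gears)]
             for s in range(n_states)]
        V_new = [max(row) for row in Q]
        gap = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if gap < tol:
            break
    policy = [max(range(n_gears), key=lambda a: Q[s][a])
              for s in range(n_states)]
    return V, policy


# Sweep a few prices to see how the chosen gears change with the price.
for price in (0.0, 1.0, 3.0, 100.0):
    _, gears = solve_priced_project(TOY_R, TOY_Q, TOY_P, TOY_BETA, price)
    print(f"price={price:6.1f}  gear per state={gears}")
```

In this toy instance the top gear is chosen in every state when the resource is free, and the idle gear is chosen everywhere once the price is high enough. For an indexable model, the DAI described in the abstract gives the critical prices at which the optimal gear changes in each state, so such a sweep over prices would not be needed.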

Suggested Citation

José Niño-Mora, 2022. "Multi-Gear Bandits, Partial Conservation Laws, and Indexability," Mathematics, MDPI, vol. 10(14), pages 1-31, July.

Handle: RePEc:gam:jmathe:v:10:y:2022:i:14:p:2497-:d:865645

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

José Niño-Mora, 2006. "Restless Bandit Marginal Productivity Indices, Diminishing Returns, and Optimal Control of Make-to-Order/Make-to-Stock M/G/1 Queues," Mathematics of Operations Research, INFORMS, vol. 31(1), pages 50-84, February.
José Niño-Mora, 2020. "Fast Two-Stage Computation of an Index Policy for Multi-Armed Bandits with Setup Delays," Mathematics, MDPI, vol. 9(1), pages 1-36, December.
Dimitris Bertsimas & Velibor V. Mišić, 2016. "Decomposable Markov Decision Processes: A Fluid Optimization Approach," Operations Research, INFORMS, vol. 64(6), pages 1537-1555, December.
José Niño-Mora, 2020. "A Verification Theorem for Threshold-Indexability of Real-State Discounted Restless Bandits," Mathematics of Operations Research, INFORMS, vol. 45(2), pages 465-496, May.
Shaler Stidham, 2002. "Analysis, Design, and Control of Queueing Systems," Operations Research, INFORMS, vol. 50(1), pages 197-216, February.
Dimitris Bertsimas & José Niño-Mora, 1999. "Optimization of Multiclass Queueing Networks with Changeover Times Via the Achievable Region Approach: Part II, The Multi-Station Case," Mathematics of Operations Research, INFORMS, vol. 24(2), pages 331-361, May.
Dimitris Bertsimas & José Niño-Mora, 2000. "Restless Bandits, Linear Programming Relaxations, and a Primal-Dual Index Heuristic," Operations Research, INFORMS, vol. 48(1), pages 80-90, February.
José Niño-Mora, 2020. "A Fast-Pivoting Algorithm for Whittle’s Restless Bandit Index," Mathematics, MDPI, vol. 8(12), pages 1-21, December.
Dimitris Bertsimas & José Niño-Mora, 1996. "Optimization of multiclass queueing networks with changeover times via the achievable region method: Part II, the multi-station case," Economics Working Papers 314, Department of Economics and Business, Universitat Pompeu Fabra, revised Aug 1998.
Dimitris Bertsimas, 1995. "The achievable region method in the optimal control of queueing systems: formulations, bounds and policies," Working papers 3837-95, Massachusetts Institute of Technology (MIT), Sloan School of Management.
José Niño-Mora, 2000. "On certain greedoid polyhedra, partially indexable scheduling problems and extended restless bandit allocation indices," Economics Working Papers 456, Department of Economics and Business, Universitat Pompeu Fabra.
R. Garbe & K. D. Glazebrook, 1998. "Submodular Returns and Greedy Heuristics for Queueing Scheduling Problems," Operations Research, INFORMS, vol. 46(3), pages 336-346, June.
Santiago R. Balseiro & Ozan Candogan, 2017. "Optimal Contracts for Intermediaries in Online Advertising," Operations Research, INFORMS, vol. 65(4), pages 878-896, August.
Anupam Gupta & Ravishankar Krishnaswamy & Viswanath Nagarajan & R. Ravi, 2015. "Running Errands in Time: Approximation Algorithms for Stochastic Orienteering," Mathematics of Operations Research, INFORMS, vol. 40(1), pages 56-79, February.
Dimitris Bertsimas & José Niño-Mora, 1996. "Optimization of multiclass queueing networks with changeover times via the achievable region approach: Part I, the single-station case," Economics Working Papers 302, Department of Economics and Business, Universitat Pompeu Fabra, revised Jul 1998.
Barış Ata & Deishin Lee & Erkut Sönmez, 2019. "Dynamic Volunteer Staffing in Multicrop Gleaning Operations," Operations Research, INFORMS, vol. 67(2), pages 295-314, March.
Kevin D. Glazebrook & José Niño-Mora, 2001. "Parallel Scheduling of Multiclass M/M/m Queues: Approximate and Heavy-Traffic Optimization of Achievable Performance," Operations Research, INFORMS, vol. 49(4), pages 609-623, August.
Eugene A. Feinberg & Jefferson Huang, 2019. "On the reduction of total‐cost and average‐cost MDPs to discounted MDPs," Naval Research Logistics (NRL), John Wiley & Sons, vol. 66(1), pages 38-56, February.
Baris Ata & Deishin Lee & Mustafa Hayri Tongarlak, 2024. "A diffusion model of dynamic participant inflow management," Queueing Systems: Theory and Applications, Springer, vol. 108(3), pages 383-414, December.
Dimitris Bertsimas & José Niño-Mora, 1994. "Restless bandit, linear programming relaxations and a primal-dual heuristic," Working papers 3727-94, Massachusetts Institute of Technology (MIT), Sloan School of Management.

More about this item

Keywords

Markov decision process; multi-gear bandits; index policies; indexability; index algorithm
