Printed from https://ideas.repec.org/a/gam/jmathe/v11y2023i7p1639-d1109797.html

Markovian Restless Bandits and Index Policies: A Review

Author

Listed:
  • José Niño-Mora

    (Department of Statistics, Carlos III University of Madrid, 28903 Getafe, Madrid, Spain)

Abstract

The restless multi-armed bandit problem is a paradigmatic modeling framework for optimal dynamic priority allocation in stochastic models of wide-ranging applications that has been widely investigated and applied since its inception in a seminal paper by Whittle in the late 1980s. The problem has generated a vast and fast-growing literature from which a significant sample is thematically organized and reviewed in this paper. While the main focus is on priority-index policies due to their intuitive appeal, tractability, asymptotic optimality properties, and often strong empirical performance, other lines of work are also reviewed. Theoretical and algorithmic developments are discussed, along with diverse applications. The main goals are to highlight the remarkable breadth of work that has been carried out on the topic and to stimulate further research in the field.

Suggested Citation

  • José Niño-Mora, 2023. "Markovian Restless Bandits and Index Policies: A Review," Mathematics, MDPI, vol. 11(7), pages 1-27, March.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:7:p:1639-:d:1109797

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/7/1639/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/7/1639/
    Download Restriction: no

    References listed on IDEAS

    1. Vivek S. Borkar & Sarath Pattathil, 2022. "Whittle indexability in egalitarian processor sharing systems," Annals of Operations Research, Springer, vol. 317(2), pages 417-437, October.
    2. Dimitris Bertsimas & José Niño-Mora, 2000. "Restless Bandits, Linear Programming Relaxations, and a Primal-Dual Index Heuristic," Operations Research, INFORMS, vol. 48(1), pages 80-90, February.
    3. Banks, Jeffrey S & Sundaram, Rangarajan K, 1994. "Switching Costs and the Gittins Index," Econometrica, Econometric Society, vol. 62(3), pages 687-694, May.
    4. Roland Fryer & Philipp Harms, 2018. "Two-Armed Restless Bandits with Imperfect Information: Stochastic Control and Indexability," Mathematics of Operations Research, INFORMS, vol. 43(2), pages 399-427, May.
    5. Daniel Adelman & Adam J. Mersereau, 2008. "Relaxations of Weakly Coupled Stochastic Dynamic Programs," Operations Research, INFORMS, vol. 56(3), pages 712-727, June.
    6. Christos H. Papadimitriou & John N. Tsitsiklis, 1999. "The Complexity of Optimal Queuing Network Control," Mathematics of Operations Research, INFORMS, vol. 24(2), pages 293-305, May.
    7. David B. Brown & James E. Smith, 2013. "Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats," Operations Research, INFORMS, vol. 61(3), pages 644-665, June.
    8. Richard Weber, 2007. "Comments on: Dynamic priority allocation via restless bandit marginal productivity indices," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 15(2), pages 211-216, December.
    9. David B. Brown & James E. Smith, 2020. "Index Policies and Performance Bounds for Dynamic Selection Problems," Management Science, INFORMS, vol. 66(7), pages 3029-3050, July.
    10. P. S. Ansell & K. D. Glazebrook & J. Niño-Mora & M. O'Keeffe, 2003. "Whittle's index policy for a multi-class queueing system with convex holding costs," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 57(1), pages 21-39, April.
    11. Xu, Jianyu & Chen, Lujie & Tang, Ou, 2021. "An online algorithm for the risk-aware restless bandit," European Journal of Operational Research, Elsevier, vol. 290(2), pages 622-639.
    12. Felipe Caro & Jérémie Gallien, 2007. "Dynamic Assortment with Demand Learning for Seasonal Consumer Goods," Management Science, INFORMS, vol. 53(2), pages 276-292, February.
    13. Meilijson, Isaac & Weiss, Gideon, 1977. "Multiple feedback at a single-server station," Stochastic Processes and their Applications, Elsevier, vol. 5(2), pages 195-205, May.
    14. Dimitris Bertsimas & José Niño-Mora, 1996. "Conservation Laws, Extended Polymatroids and Multiarmed Bandit Problems; A Polyhedral Approach to Indexable Systems," Mathematics of Operations Research, INFORMS, vol. 21(2), pages 257-306, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. José Niño-Mora, 2006. "Restless Bandit Marginal Productivity Indices, Diminishing Returns, and Optimal Control of Make-to-Order/Make-to-Stock M/G/1 Queues," Mathematics of Operations Research, INFORMS, vol. 31(1), pages 50-84, February.
    2. K. D. Glazebrook & R. Minty, 2009. "A Generalized Gittins Index for a Class of Multiarmed Bandits with General Resource Requirements," Mathematics of Operations Research, INFORMS, vol. 34(1), pages 26-44, February.
    3. Santiago R. Balseiro & David B. Brown & Chen Chen, 2021. "Dynamic Pricing of Relocating Resources in Large Networks," Management Science, INFORMS, vol. 67(7), pages 4075-4094, July.
    4. José Niño-Mora, 2020. "Fast Two-Stage Computation of an Index Policy for Multi-Armed Bandits with Setup Delays," Mathematics, MDPI, vol. 9(1), pages 1-36, December.
    5. José Niño-Mora, 2020. "A Verification Theorem for Threshold-Indexability of Real-State Discounted Restless Bandits," Mathematics of Operations Research, INFORMS, vol. 45(2), pages 465-496, May.
    6. Song Lin & Juanjuan Zhang & John R. Hauser, 2015. "Learning from Experience, Simply," Marketing Science, INFORMS, vol. 34(1), pages 1-19, January.
    7. Turgay Ayer & Can Zhang & Anthony Bonifonte & Anne C. Spaulding & Jagpreet Chhatwal, 2019. "Prioritizing Hepatitis C Treatment in U.S. Prisons," Operations Research, INFORMS, vol. 67(3), pages 853-873, May.
    8. Dong Li & Li Ding & Stephen Connor, 2020. "When to Switch? Index Policies for Resource Scheduling in Emergency Response," Production and Operations Management, Production and Operations Management Society, vol. 29(2), pages 241-262, February.
    9. Dimitris Bertsimas & Velibor V. Mišić, 2016. "Decomposable Markov Decision Processes: A Fluid Optimization Approach," Operations Research, INFORMS, vol. 64(6), pages 1537-1555, December.
    10. Vishal Ahuja & John R. Birge, 2020. "An Approximation Approach for Response-Adaptive Clinical Trial Design," INFORMS Journal on Computing, INFORMS, vol. 32(4), pages 877-894, October.
    11. Michael Jong Kim & Andrew E.B. Lim, 2016. "Robust Multiarmed Bandit Problems," Management Science, INFORMS, vol. 62(1), pages 264-285, January.
    12. David B. Brown & James E. Smith, 2020. "Index Policies and Performance Bounds for Dynamic Selection Problems," Management Science, INFORMS, vol. 66(7), pages 3029-3050, July.
    13. Abderrahmane Abbou & Viliam Makis, 2019. "Group Maintenance: A Restless Bandits Approach," INFORMS Journal on Computing, INFORMS, vol. 31(4), pages 719-731, October.
    14. Urtzi Ayesta & Manu K. Gupta & Ina Maria Verloop, 2021. "On the computation of Whittle’s index for Markovian restless bandits," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 93(1), pages 179-208, February.
    15. Andrei Sleptchenko & M. Eric Johnson, 2015. "Maintaining Secure and Reliable Distributed Control Systems," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 103-117, February.
    16. Deligiannis, Michalis & Liberopoulos, George, 2023. "Dynamic ordering and buyer selection policies when service affects future demand," Omega, Elsevier, vol. 118(C).
    17. Glazebrook, K. D. & Mitchell, H. M. & Ansell, P. S., 2005. "Index policies for the maintenance of a collection of machines by a set of repairmen," European Journal of Operational Research, Elsevier, vol. 165(1), pages 267-284, August.
    18. Ya‐Tang Chuang & Manaf Zargoush & Somayeh Ghazalbash & Saied Samiedaluie & Kerry Kuluski & Sara Guilcher, 2023. "From prediction to decision: Optimizing long‐term care placements among older delayed discharge patients," Production and Operations Management, Production and Operations Management Society, vol. 32(4), pages 1041-1058, April.
    19. Ankur Goel & Genaro J. Gutierrez, 2011. "Multiechelon Procurement and Distribution Policies for Traded Commodities," Management Science, INFORMS, vol. 57(12), pages 2228-2244, December.
    20. Haijian Si & Stylianos Kavadias & Christoph Loch, 2022. "Managing innovation portfolios: From project selection to portfolio design," Production and Operations Management, Production and Operations Management Society, vol. 31(12), pages 4572-4588, December.
