Authors:
- Subhashini Krishnasamy
(Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Texas 78712)
- Rajat Sen
(Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Texas 78712)
- Ramesh Johari
(Department of Management Science and Engineering, Stanford University, Stanford, California 94305)
- Sanjay Shakkottai
(Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Texas 78712)
Abstract
Consider a queueing system consisting of multiple servers. Jobs arrive over time and enter a queue for service; the goal is to minimize the size of this queue. At each opportunity for service, at most one server can be chosen, and at most one job can be served. Service is successful with a probability (the service probability) that is a priori unknown for each server. An algorithm that knows the service probabilities (the "genie") can always choose the server of highest service probability. We study algorithms that learn the unknown service probabilities. Our goal is to minimize queue regret: the (expected) difference between the queue lengths obtained by the algorithm and those obtained by the "genie." Because queue regret cannot be larger than classical regret, results for the standard multiarmed bandit problem give algorithms for which queue regret increases no more than logarithmically in time. Our paper shows surprisingly more complex behavior. In particular, as long as the bandit algorithm's queues have relatively long regenerative cycles, queue regret is similar to cumulative regret and scales (essentially) logarithmically. However, we show that this "early stage" of the queueing bandit eventually gives way to a "late stage," where the optimal queue-regret scaling is O(1/t). We demonstrate an algorithm that (orderwise) achieves this asymptotic queue regret in the late stage. Our results are developed in a more general model that allows for multiple job classes as well.
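As a rough illustration of the queue-regret definition in the abstract, the following Python sketch simulates a single discrete-time queue served by a learner and by the genie on the same sample path. It is not the algorithm from the paper: standard UCB1 is swapped in as the learner. The function names (`simulate`, `queue_regret`), the Bernoulli arrival rate `lam`, the arrival-before-service ordering within a slot, and the rule that a server is tried only when a job is waiting are all assumptions made for illustration.

```python
import math
import random


def simulate(mu, lam, horizon, seed=0):
    """Run one sample path; return (learner_queue, genie_queue) per slot.

    mu      -- service success probability of each server (hidden from the learner)
    lam     -- assumed Bernoulli arrival probability per slot
    horizon -- number of time slots
    """
    rng = random.Random(seed)
    k = len(mu)
    best = max(range(k), key=lambda i: mu[i])  # the genie's fixed choice
    pulls = [0] * k   # times each server was tried by the learner
    wins = [0] * k    # successful services per server
    q_learn = q_genie = 0
    learn_path, genie_path = [], []

    for t in range(1, horizon + 1):
        arrival = 1 if rng.random() < lam else 0
        # One uniform draw per slot couples outcomes: server i succeeds iff u < mu[i].
        u = rng.random()
        q_learn += arrival
        q_genie += arrival

        # Learner: UCB1 (a stand-in), used only when a job is waiting.
        if q_learn > 0:
            untried = [i for i in range(k) if pulls[i] == 0]
            if untried:
                arm = untried[0]
            else:
                arm = max(
                    range(k),
                    key=lambda i: wins[i] / pulls[i]
                    + math.sqrt(2.0 * math.log(t) / pulls[i]),
                )
            success = u < mu[arm]
            pulls[arm] += 1
            wins[arm] += int(success)
            if success:
                q_learn -= 1

        # Genie: always uses the server with the highest service probability.
        if q_genie > 0 and u < mu[best]:
            q_genie -= 1

        learn_path.append(q_learn)
        genie_path.append(q_genie)

    return learn_path, genie_path


def queue_regret(mu, lam, horizon, runs=50):
    """Monte Carlo estimate of E[Q(t) - Q*(t)] for each slot t."""
    total = [0.0] * horizon
    for seed in range(runs):
        learn, genie = simulate(mu, lam, horizon, seed=seed)
        for t in range(horizon):
            total[t] += learn[t] - genie[t]
    return [x / runs for x in total]
```

Because the service outcomes are coupled through one uniform draw per slot, the learner's queue is never shorter than the genie's on any sample path, so the estimated queue regret is nonnegative. A call such as `queue_regret([0.3, 0.5, 0.8], lam=0.5, horizon=2000)` returns the averaged gap per slot, which can then be plotted against time.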
Suggested Citation
Subhashini Krishnasamy & Rajat Sen & Ramesh Johari & Sanjay Shakkottai, 2021.
"Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach,"
Operations Research, INFORMS, vol. 69(1), pages 315-330, January.
Handle:
RePEc:inm:oropre:v:69:y:2021:i:1:p:315-330
DOI: 10.1287/opre.2020.1995