Authors:
- Wei-Kang Hsu
(School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana 47907)
- Jiaming Xu
(The Fuqua School of Business, Duke University, Durham, North Carolina 27708)
- Xiaojun Lin
(School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana 47907)
- Mark R. Bell
(School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana 47907)
Abstract
We study task assignment in online service platforms, where unlabeled clients arrive according to a stochastic process and each client brings a random number of tasks. As tasks are assigned to servers, they produce client/server-dependent random payoffs. The goal of the system operator is to maximize the expected payoff per unit time subject to the servers’ capacity constraints. However, both the statistics of the dynamic client population and the client-specific payoff vectors are unknown to the operator. Thus, the operator must design task-assignment policies that integrate adaptive control (of the queueing system) with online learning (of the clients’ payoff vectors). A key challenge in such integration is how to account for the nontrivial closed-loop interactions between the queueing process and the learning process, which may significantly degrade system performance. We propose a new utility-guided online learning and task assignment algorithm that seamlessly integrates learning with control to address this difficulty. Our analysis shows that, compared with an oracle that knows all client dynamics and payoff vectors beforehand, the gap in expected payoff per unit time achieved by our proposed algorithm can be analytically bounded by three terms, which separately capture the impact of client-dynamic uncertainty, client-server payoff uncertainty, and the loss incurred by backlogged clients in the system. Further, our bound holds for any finite time horizon. Through simulations, we show that our proposed algorithm significantly outperforms a myopic-matching policy and a standard queue-length-based policy that does not explicitly address the closed-loop interactions between queueing and learning.
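To make the problem setting concrete, the following is a minimal, self-contained sketch of learning unknown payoffs while assigning tasks under server capacity constraints. It is *not* the paper's utility-guided algorithm: the Bernoulli payoff model, the UCB-style exploration bonus, and the greedy capacity check are all illustrative assumptions introduced here.

```python
import math
import random

def ucb_assign(num_servers, capacities, num_tasks, true_payoffs, rng):
    """Assign one client's tasks to servers with unknown payoffs (illustrative).

    For each task, pick the server with the highest upper-confidence
    estimate of its unknown mean payoff, skipping servers at capacity.
    Payoffs are assumed Bernoulli(true_payoffs[s]) for server s.
    """
    counts = [0] * num_servers    # tasks assigned to each server so far
    sums = [0.0] * num_servers    # observed payoff totals per server
    load = [0] * num_servers      # current server load
    total_payoff = 0.0
    for t in range(1, num_tasks + 1):
        best, best_score = None, float("-inf")
        for s in range(num_servers):
            if load[s] >= capacities[s]:
                continue          # capacity constraint
            if counts[s] == 0:
                score = float("inf")   # try each server at least once
            else:
                mean = sums[s] / counts[s]
                score = mean + math.sqrt(2 * math.log(t) / counts[s])
            if score > best_score:
                best, best_score = s, score
        if best is None:
            break                 # all servers full; tasks are backlogged
        payoff = 1.0 if rng.random() < true_payoffs[best] else 0.0
        counts[best] += 1
        sums[best] += payoff
        load[best] += 1
        total_payoff += payoff
    return total_payoff, counts

# Example: 3 servers, ample capacity, one client with 60 tasks.
rng = random.Random(0)
total, counts = ucb_assign(3, [50, 50, 50], 60, [0.2, 0.8, 0.5], rng)
```

The closed-loop interaction studied in the paper arises because the assignment decisions above both generate the payoff observations used for learning and determine the queue backlog; this sketch ignores client arrivals and departures, which the paper's analysis explicitly accounts for.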
Suggested Citation
Wei-Kang Hsu & Jiaming Xu & Xiaojun Lin & Mark R. Bell, 2022.
"Integrated Online Learning and Adaptive Control in Queueing Systems with Uncertain Payoffs,"
Operations Research, INFORMS, vol. 70(2), pages 1166-1181, March.
Handle:
RePEc:inm:oropre:v:70:y:2022:i:2:p:1166-1181
DOI: 10.1287/opre.2021.2100