
Bandits with Global Convex Constraints and Objective

Author

Listed:
  • Shipra Agrawal

    (Industrial Engineering and Operations Research, Columbia University, New York, New York 10027)

  • Nikhil R. Devanur

    (Microsoft Research, Redmond, Washington 98052)

Abstract

We consider a very general model for managing the exploration–exploitation trade-off, which allows global convex constraints and a concave objective on the aggregate decisions over time in addition to the customary limitation on the time horizon. This model provides a natural framework to study many sequential decision-making problems with long-term convex constraints and concave utility and subsumes the classic multiarmed bandit (MAB) model and the bandits with knapsacks problem as special cases. We demonstrate that a natural extension of the upper confidence bound family of algorithms for MAB provides a polynomial time algorithm with near-optimal regret guarantees for this substantially more general model. We also provide computationally more efficient algorithms by establishing interesting connections between this problem and other well-studied problems and algorithms, such as the Blackwell approachability problem, online convex optimization, and the Frank–Wolfe technique for convex optimization. We give several concrete examples of applications, particularly in risk-sensitive revenue management under unknown demand distributions, in which this more general bandit model of sequential decision making allows for richer formulations and more efficient solutions of the problem.
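
As a point of reference for the classic MAB special case mentioned in the abstract, the upper confidence bound idea can be sketched as follows. This is a minimal illustration of standard UCB1 on Bernoulli arms, not the authors' algorithm for the general convex model. The function name `ucb1`, the arm means, the horizon, and the seed are all assumptions chosen for the example.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run standard UCB1 on Bernoulli arms (illustrative special case only).

    arm_means: hypothetical true success probabilities, unknown to the learner.
    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of pulls of each arm
    sums = [0.0] * n_arms   # cumulative reward of each arm
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Optimistic index: empirical mean plus a confidence radius.
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Simulated Bernoulli feedback for the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward

    return counts, total_reward


# Hypothetical instance: three arms, the last one is the best.
counts, total_reward = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

In this special case the only constraint is the time horizon and the objective is the linear total reward. The model described in the abstract replaces these with global convex constraints and a concave objective on the aggregate decisions.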

Suggested Citation

  • Shipra Agrawal & Nikhil R. Devanur, 2019. "Bandits with Global Convex Constraints and Objective," Operations Research, INFORMS, vol. 67(5), pages 1486-1502, September.
  • Handle: RePEc:inm:oropre:v:67:y:2019:i:5:p:1486-1502
    DOI: opre.2019.1840

    Download full text from publisher

    File URL: https://doi.org/opre.2019.1840
    Download Restriction: no


    References listed on IDEAS

    1. Omar Besbes & Assaf Zeevi, 2012. "Blind Network Revenue Management," Operations Research, INFORMS, vol. 60(6), pages 1537-1550, December.
    2. Shipra Agrawal & Zizhuo Wang & Yinyu Ye, 2014. "A Dynamic Near-Optimal Algorithm for Online Linear Programming," Operations Research, INFORMS, vol. 62(4), pages 876-890, August.
    3. H. Xiong & J. Xie & X. Deng, 2011. "Risk-averse decision making in overbooking problem," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 62(9), pages 1655-1665, September.
    4. Omar Besbes & Assaf Zeevi, 2009. "Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms," Operations Research, INFORMS, vol. 57(6), pages 1407-1420, December.
    5. Yu. Nesterov, 2005. "Smooth minimization of non-smooth functions," LIDAM Reprints CORE 1819, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    6. C. Barz & K. Waldmann, 2007. "Risk-sensitive capacity control in revenue management," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 65(3), pages 565-579, June.
    7. Marguerite Frank & Philip Wolfe, 1956. "An algorithm for quadratic programming," Naval Research Logistics Quarterly, John Wiley & Sons, vol. 3(1‐2), pages 95-110, March.

    Citations



    Cited by:

    1. Ramesh Johari & Vijay Kamble & Yash Kanoria, 2021. "Matching While Learning," Operations Research, INFORMS, vol. 69(2), pages 655-681, March.
    2. Maxime C. Cohen & Ilan Lobel & Renato Paes Leme, 2020. "Feature-Based Dynamic Pricing," Management Science, INFORMS, vol. 66(11), pages 4921-4943, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. David Simchi-Levi & Rui Sun & Huanan Zhang, 2022. "Online Learning and Optimization for Revenue Management Problems with Add-on Discounts," Management Science, INFORMS, vol. 68(10), pages 7402-7421, October.
    2. Qi (George) Chen & Stefanus Jasin & Izak Duenyas, 2021. "Technical Note—Joint Learning and Optimization of Multi-Product Pricing with Finite Resource Capacity and Unknown Demand Parameters," Operations Research, INFORMS, vol. 69(2), pages 560-573, March.
    3. Boxiao Chen & Xiuli Chao & Cong Shi, 2021. "Nonparametric Learning Algorithms for Joint Pricing and Inventory Control with Lost Sales and Censored Demand," Mathematics of Operations Research, INFORMS, vol. 46(2), pages 726-756, May.
    4. Athanassios N. Avramidis & Arnoud V. Boer, 2021. "Dynamic pricing with finite price sets: a non-parametric approach," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 94(1), pages 1-34, August.
    5. Yash Kanoria & Hamid Nazerzadeh, 2020. "Dynamic Reserve Prices for Repeated Auctions: Learning from Bids," Papers 2002.07331, arXiv.org.
    6. Yining Wang & Boxiao Chen & David Simchi-Levi, 2021. "Multimodal Dynamic Pricing," Management Science, INFORMS, vol. 67(10), pages 6136-6152, October.
    7. Dipankar Das, 2023. "A Model of Competitive Assortment Planning Algorithm," Papers 2307.09479, arXiv.org.
    8. Athanassios N. Avramidis, 2020. "A pricing problem with unknown arrival rate and price sensitivity," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 92(1), pages 77-106, August.
    9. Yuqing Zhang & Neil Walton, 2019. "Adaptive Pricing in Insurance: Generalized Linear Models and Gaussian Process Regression Approaches," Papers 1907.05381, arXiv.org.
    10. Ya-Feng Liu & Xin Liu & Shiqian Ma, 2019. "On the Nonergodic Convergence Rate of an Inexact Augmented Lagrangian Framework for Composite Convex Programming," Mathematics of Operations Research, INFORMS, vol. 44(2), pages 632-650, May.
    11. Chaolin Yang & Yi Xiong, 2020. "Nonparametric advertising budget allocation with inventory constraint," European Journal of Operational Research, Elsevier, vol. 285(2), pages 631-641.
    12. Jianyu Xu & Yu-Xiang Wang, 2022. "Towards Agnostic Feature-based Dynamic Pricing: Linear Policies vs Linear Valuation with Unknown Noise," Papers 2201.11341, arXiv.org, revised Apr 2022.
    13. Zeqi Ye & Hansheng Jiang, 2023. "Smoothness-Adaptive Dynamic Pricing with Nonparametric Demand Learning," Papers 2310.07558, arXiv.org, revised Oct 2023.
    14. Qi (George) Chen & Stefanus Jasin & Izak Duenyas, 2019. "Nonparametric Self-Adjusting Control for Joint Learning and Optimization of Multiproduct Pricing with Finite Resource Capacity," Mathematics of Operations Research, INFORMS, vol. 44(2), pages 601-631, May.
    15. Mila Nambiar & David Simchi-Levi & He Wang, 2019. "Dynamic Learning and Pricing with Model Misspecification," Management Science, INFORMS, vol. 65(11), pages 4980-5000, November.
    16. Ningyuan Chen & Guillermo Gallego, 2021. "Nonparametric Pricing Analytics with Customer Covariates," Operations Research, INFORMS, vol. 69(3), pages 974-984, May.
    17. Dawsen Hwang & Patrick Jaillet & Vahideh Manshadi, 2021. "Online Resource Allocation Under Partially Predictable Demand," Operations Research, INFORMS, vol. 69(3), pages 895-915, May.
    18. J. Pi & Honggang Wang & Panos M. Pardalos, 2021. "A dual reformulation and solution framework for regularized convex clustering problems," European Journal of Operational Research, Elsevier, vol. 290(3), pages 844-856.
    19. Yiwei Chen & Cong Shi, 2023. "Network revenue management with online inverse batch gradient descent method," Production and Operations Management, Production and Operations Management Society, vol. 32(7), pages 2123-2137, July.
    20. Ming Chen & Zhi-Long Chen, 2018. "Robust Dynamic Pricing with Two Substitutable Products," Manufacturing & Service Operations Management, INFORMS, vol. 20(2), pages 249-268, May.
