IDEAS home Printed from https://ideas.repec.org/a/inm/ormnsc/v68y2022i3p1696-1713.html
   My bibliography  Save this article

Hedging the Drift: Learning to Optimize Under Nonstationarity

Author

Listed:
  • Wang Chi Cheung

    (Department of Industrial Systems Engineering and Management, National University of Singapore, Singapore 119077)

  • David Simchi-Levi

    (Institute for Data, Systems, and Society, Department of Civil and Environmental Engineering, and Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139)

  • Ruihao Zhu

    (Supply Chain and Operations Management, Purdue Krannert School of Management, West Lafayette, Indiana 47907)

Abstract

We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for a collection of nonstationary stochastic bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and traffic network routing in changing environments. We show how the difficulty posed by the (unknown a priori and possibly adversarial) nonstationarity can be overcome by an unconventional marriage between stochastic and adversarial bandit learning algorithms. Beginning with the linear bandit setting, we design and analyze a sliding window-upper confidence bound algorithm that achieves the optimal dynamic regret bound when the underlying variation budget is known. This budget quantifies the total amount of temporal variation of the latent environments. Boosted by the novel bandit-over-bandit framework that adapts to the latent changes, our algorithm can further enjoy nearly optimal dynamic regret bounds in a (surprisingly) parameter-free manner. We extend our results to other related bandit problems, namely the multiarmed bandit, generalized linear bandit, and combinatorial semibandit settings, which model a variety of operations research applications. In addition to the classical exploration-exploitation trade-off, our algorithms leverage the power of the “forgetting principle” in the learning processes, which is vital in changing environments. Extensive numerical experiments with synthetic datasets and a dataset of an online auto-loan company during the severe acute respiratory syndrome (SARS) epidemic period demonstrate that our proposed algorithms achieve superior performance compared with existing algorithms.

Suggested Citation

  • Wang Chi Cheung & David Simchi-Levi & Ruihao Zhu, 2022. "Hedging the Drift: Learning to Optimize Under Nonstationarity," Management Science, INFORMS, vol. 68(3), pages 1696-1713, March.
  • Handle: RePEc:inm:ormnsc:v:68:y:2022:i:3:p:1696-1713
    DOI: 10.1287/mnsc.2021.4024
    as

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mnsc.2021.4024
    Download Restriction: no

    File URL: https://libkey.io/10.1287/mnsc.2021.4024?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Paat Rusmevichientong & John N. Tsitsiklis, 2010. "Linearly Parameterized Bandits," Mathematics of Operations Research, INFORMS, vol. 35(2), pages 395-411, May.
    2. Robert Phillips & A. Serdar Şimşek & Garrett van Ryzin, 2015. "The Effectiveness of Field Price Discretion: Empirical Evidence from Auto Lending," Management Science, INFORMS, vol. 61(8), pages 1741-1759, August.
    3. Daniel Russo & Benjamin Van Roy, 2014. "Learning to Optimize via Posterior Sampling," Mathematics of Operations Research, INFORMS, vol. 39(4), pages 1221-1243, November.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Yingfei Wang & Inbal Yahav & Balaji Padmanabhan, 2024. "Smart Testing with Vaccination: A Bandit Algorithm for Active Sampling for Managing COVID-19," Information Systems Research, INFORMS, vol. 35(1), pages 120-144, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hamsa Bastani & David Simchi-Levi & Ruihao Zhu, 2022. "Meta Dynamic Pricing: Transfer Learning Across Experiments," Management Science, INFORMS, vol. 68(3), pages 1865-1881, March.
    2. David Simchi-Levi & Rui Sun & Huanan Zhang, 2022. "Online Learning and Optimization for Revenue Management Problems with Add-on Discounts," Management Science, INFORMS, vol. 68(10), pages 7402-7421, October.
    3. Rong Jin & David Simchi-Levi & Li Wang & Xinshang Wang & Sen Yang, 2021. "Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses," Management Science, INFORMS, vol. 67(8), pages 4756-4771, August.
    4. Bin Han & Ilya O. Ryzhov & Boris Defourny, 2016. "Optimal Learning in Linear Regression with Combinatorial Feature Selection," INFORMS Journal on Computing, INFORMS, vol. 28(4), pages 721-735, November.
    5. Eric M. Schwartz & Eric T. Bradlow & Peter S. Fader, 2017. "Customer Acquisition via Display Advertising Using Multi-Armed Bandit Experiments," Marketing Science, INFORMS, vol. 36(4), pages 500-522, July.
    6. Zhengyuan Zhou & Susan Athey & Stefan Wager, 2023. "Offline Multi-Action Policy Learning: Generalization and Optimization," Operations Research, INFORMS, vol. 71(1), pages 148-183, January.
    7. Mark Egan & Tomas Philipson, 2016. "Health Care Adherence and Personalized Medicine," Working Papers 2016-H01, Becker Friedman Institute for Research In Economics.
    8. Jan-Peter Kucklick & Jennifer Priefer & Daniel Beverungen & Oliver Müller, 2023. "Elucidating the Predictive Power of Search and Experience Qualities for Pricing of Complex Goods – A Machine Learning-based Study on Real Estate Appraisal," Working Papers Dissertations 112, Paderborn University, Faculty of Business Administration and Economics.
    9. Özalp Özer & Upender Subramanian & Yu Wang, 2018. "Information Sharing, Advice Provision, or Delegation: What Leads to Higher Trust and Trustworthiness?," Management Science, INFORMS, vol. 64(1), pages 474-493, January.
    10. Xiangyu Gao & Stefanus Jasin & Sajjad Najafi & Huanan Zhang, 2022. "Joint Learning and Optimization for Multi-Product Pricing (and Ranking) Under a General Cascade Click Model," Management Science, INFORMS, vol. 68(10), pages 7362-7382, October.
    11. Agrawal, Priyank & Tulabandhula, Theja & Avadhanula, Vashist, 2023. "A tractable online learning algorithm for the multinomial logit contextual bandit," European Journal of Operational Research, Elsevier, vol. 310(2), pages 737-750.
    12. Mengying Zhu & Xiaolin Zheng & Yan Wang & Yuyuan Li & Qianqiao Liang, 2019. "Adaptive Portfolio by Solving Multi-armed Bandit via Thompson Sampling," Papers 1911.05309, arXiv.org, revised Nov 2019.
    13. Haihui Shen & L. Jeff Hong & Xiaowei Zhang, 2021. "Ranking and Selection with Covariates for Personalized Decision Making," INFORMS Journal on Computing, INFORMS, vol. 33(4), pages 1500-1519, October.
    14. Mark Egan & Tomas J. Philipson, 2014. "Health Care Adherence and Personalized Medicine," NBER Working Papers 20330, National Bureau of Economic Research, Inc.
    15. Yining Wang & Boxiao Chen & David Simchi-Levi, 2021. "Multimodal Dynamic Pricing," Management Science, INFORMS, vol. 67(10), pages 6136-6152, October.
    16. T. Law & J. Shawe-Taylor, 2017. "Practical Bayesian support vector regression for financial time series prediction and market condition change detection," Quantitative Finance, Taylor & Francis Journals, vol. 17(9), pages 1403-1416, September.
    17. David B. Brown & James E. Smith, 2013. "Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats," Operations Research, INFORMS, vol. 61(3), pages 644-665, June.
    18. Lukas, Moritz, 2017. "Estimating interest rate elasticities in consumer credit," Economics Letters, Elsevier, vol. 156(C), pages 155-158.
    19. Yuqing Zhang & Neil Walton, 2019. "Adaptive Pricing in Insurance: Generalized Linear Models and Gaussian Process Regression Approaches," Papers 1907.05381, arXiv.org.
    20. Lukas, M., 2019. "Relative prices and product substitution: Evidence from shocks to consumer credit interest rates," Journal of Behavioral and Experimental Finance, Elsevier, vol. 21(C), pages 39-49.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:ormnsc:v:68:y:2022:i:3:p:1696-1713. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.