The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

My bibliography Save this article

The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

Author

Listed:

Ilya O. Ryzhov
(Robert H. Smith School of Business, University of Maryland, College Park, Maryland 20742)
Warren B. Powell
(Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544)
Peter I. Frazier
(Department of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853)

Registered:

Abstract

We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach is able to handle the case where our prior beliefs about the rewards are correlated, which is not handled by traditional multiarmed bandit methods. Experiments show that our KG policy performs competitively against the best-known approximation to the optimal policy in the classic bandit problem, and it outperforms many learning policies in the correlated case.

Suggested Citation

Ilya O. Ryzhov & Warren B. Powell & Peter I. Frazier, 2012. "The Knowledge Gradient Algorithm for a General Class of Online Learning Problems," Operations Research, INFORMS, vol. 60(1), pages 180-195, February.

Handle: RePEc:inm:oropre:v:60:y:2012:i:1:p:180-195
DOI: 10.1287/opre.1110.0999

Download full text from publisher

References listed on IDEAS

Stephen E. Chick & Noah Gans, 2009. "Economic Analysis of Simulation Selection Problems," Management Science, INFORMS, vol. 55(3), pages 421-437, March.
Ilya O. Ryzhov & Warren B. Powell, 2011. "Information Collection on a Graph," Operations Research, INFORMS, vol. 59(1), pages 188-201, February.
Dimitris Bertsimas & José Niño-Mora, 2000. "Restless Bandits, Linear Programming Relaxations, and a Primal-Dual Index Heuristic," Operations Research, INFORMS, vol. 48(1), pages 80-90, February.
Stephen E. Chick & Jürgen Branke & Christian Schmidt, 2010. "Sequential Sampling to Myopically Maximize the Expected Value of Information," INFORMS Journal on Computing, INFORMS, vol. 22(1), pages 71-80, February.
Monica Brezzi & Tze Leung Lai, 2000. "Incomplete Learning from Endogenous Data in Dynamic Allocation," Econometrica, Econometric Society, vol. 68(6), pages 1511-1516, November.
Brezzi, Monica & Lai, Tze Leung, 2002. "Optimal learning and experimentation in bandit problems," Journal of Economic Dynamics and Control, Elsevier, vol. 27(1), pages 87-108, November.
Peter Frazier & Warren Powell & Savas Dayanik, 2009. "The Knowledge-Gradient Policy for Correlated Normal Beliefs," INFORMS Journal on Computing, INFORMS, vol. 21(4), pages 599-613, November.
Michael N. Katehakis & Arthur F. Veinott, 1987. "The Multi-Armed Bandit Problem: Decomposition and Computation," Mathematics of Operations Research, INFORMS, vol. 12(2), pages 262-268, May.

Full references (including those not matched with items on IDEAS)

Citations

Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.

Cited by:

Juergen Branke & Wen Zhang, 2019. "Identifying efficient solutions via simulation: myopic multi-objective budget allocation for the bi-objective case," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 41(3), pages 831-865, September.
David B. Brown & James E. Smith, 2013. "Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats," Operations Research, INFORMS, vol. 61(3), pages 644-665, June.
Jing Xie & Peter I. Frazier, 2013. "Sequential Bayes-Optimal Policies for Multiple Comparisons with a Known Standard," Operations Research, INFORMS, vol. 61(5), pages 1174-1189, October.
Ilya O. Ryzhov & Martijn R. K. Mes & Warren B. Powell & Gerald van den Berg, 2019. "Bayesian Exploration for Approximate Dynamic Programming," Operations Research, INFORMS, vol. 67(1), pages 198-214, January.
Michael Jong Kim & Andrew E.B. Lim, 2016. "Robust Multiarmed Bandit Problems," Management Science, INFORMS, vol. 62(1), pages 264-285, January.
Ilya O. Ryzhov, 2016. "On the Convergence Rates of Expected Improvement Methods," Operations Research, INFORMS, vol. 64(6), pages 1515-1528, December.
Sarang Deo & Seyed Iravani & Tingting Jiang & Karen Smilowitz & Stephen Samuelson, 2013. "Improving Health Outcomes Through Better Capacity Allocation in a Community-Based Chronic Care Model," Operations Research, INFORMS, vol. 61(6), pages 1277-1294, December.
Daniel Russo & Benjamin Van Roy, 2018. "Learning to Optimize via Information-Directed Sampling," Operations Research, INFORMS, vol. 66(1), pages 230-252, January.
Michael Jong Kim, 2020. "Variance Regularization in Sequential Bayesian Optimization," Mathematics of Operations Research, INFORMS, vol. 45(3), pages 966-992, August.
Bolong Cheng & Arta Jamshidi & Warren Powell, 2015. "Optimal learning with a local parametric belief model," Journal of Global Optimization, Springer, vol. 63(2), pages 401-425, October.
Samuel Cohen & Tanut Treetanthiploet, 2021. "Generalised correlated batched bandits via the ARC algorithm with application to dynamic pricing," Papers 2102.04263, arXiv.org, revised Oct 2022.
L. Jeff Hong & Guangxin Jiang, 2019. "Offline Simulation Online Application: A New Framework of Simulation-Based Decision Making," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 36(06), pages 1-22, December.
Shervin Shahrokhi Tehrani & Andrew T. Ching, 2024. "A Heuristic Approach to Explore: The Value of Perfect Information," Management Science, INFORMS, vol. 70(5), pages 3200-3224, May.
Martinelli, Gabriele & Eidsvik, Jo & Hauge, Ragnar, 2013. "Dynamic decision making for graphical models applied to oil exploration," European Journal of Operational Research, Elsevier, vol. 230(3), pages 688-702.
Satya S. Malladi & Alan L. Erera & Chelsea C. White, 2021. "Managing mobile production-inventory systems influenced by a modulation process," Annals of Operations Research, Springer, vol. 304(1), pages 299-330, September.
Seokhyun Chung & Raed Al Kontar & Zhenke Wu, 2022. "Weakly Supervised Multi-output Regression via Correlated Gaussian Processes," INFORMS Joural on Data Science, INFORMS, vol. 1(2), pages 115-137, October.
Yunxiao Deng & Suvrajeet Sen, 2022. "Predictive stochastic programming," Computational Management Science, Springer, vol. 19(1), pages 65-98, January.
Daniel Russo & Benjamin Van Roy, 2014. "Learning to Optimize via Posterior Sampling," Mathematics of Operations Research, INFORMS, vol. 39(4), pages 1221-1243, November.
Warren B. Powell, 2016. "Perspectives of approximate dynamic programming," Annals of Operations Research, Springer, vol. 241(1), pages 319-356, June.
Yan Li & Kristofer G. Reyes & Jorge Vazquez-Anderson & Yingfei Wang & Lydia M. Contreras & Warren B. Powell, 2018. "A Knowledge Gradient Policy for Sequencing Experiments to Identify the Structure of RNA Molecules Using a Sparse Additive Belief Model," INFORMS Journal on Computing, INFORMS, vol. 30(4), pages 750-767, November.
Yoon, Gyugeun & Chow, Joseph Y.J., 2024. "A sequential transit network design algorithm with optimal learning under correlated beliefs," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 191(C).
Daniel Russo, 2020. "Simple Bayesian Algorithms for Best-Arm Identification," Operations Research, INFORMS, vol. 68(6), pages 1625-1647, November.
Andres Alban & Stephen E. Chick & Martin Forster, 2023. "Value-Based Clinical Trials: Selecting Recruitment Rates and Trial Lengths in Different Regulatory Contexts," Management Science, INFORMS, vol. 69(6), pages 3516-3535, June.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Eric M. Schwartz & Eric T. Bradlow & Peter S. Fader, 2017. "Customer Acquisition via Display Advertising Using Multi-Armed Bandit Experiments," Marketing Science, INFORMS, vol. 36(4), pages 500-522, July.
Stephen E. Chick & Peter Frazier, 2012. "Sequential Sampling with Economics of Selection Procedures," Management Science, INFORMS, vol. 58(3), pages 550-569, March.
Jing Xie & Peter I. Frazier, 2013. "Sequential Bayes-Optimal Policies for Multiple Comparisons with a Known Standard," Operations Research, INFORMS, vol. 61(5), pages 1174-1189, October.
Powell, Warren B., 2019. "A unified framework for stochastic optimization," European Journal of Operational Research, Elsevier, vol. 275(3), pages 795-821.
Ilya O. Ryzhov & Warren B. Powell, 2011. "Information Collection on a Graph," Operations Research, INFORMS, vol. 59(1), pages 188-201, February.
Warren B. Powell, 2016. "Perspectives of approximate dynamic programming," Annals of Operations Research, Springer, vol. 241(1), pages 319-356, June.
Konon, Alexander, 2016. "Career choice under uncertainty," VfS Annual Conference 2016 (Augsburg): Demographic Change 145583, Verein für Socialpolitik / German Economic Association.
Raluca M. Ursu & Qingliang Wang & Pradeep K. Chintagunta, 2020. "Search Duration," Marketing Science, INFORMS, vol. 39(5), pages 849-871, September.
Victor F. Araman & René A. Caldentey, 2022. "Diffusion Approximations for a Class of Sequential Experimentation Problems," Management Science, INFORMS, vol. 68(8), pages 5958-5979, August.
Weiwei Fan & L. Jeff Hong & Xiaowei Zhang, 2020. "Distributionally Robust Selection of the Best," Management Science, INFORMS, vol. 66(1), pages 190-208, January.
Stephen Chick & Martin Forster & Paolo Pertile, 2017. "A Bayesian decision theoretic model of sequential experimentation with delayed response," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(5), pages 1439-1462, November.
- Stephen Chick & Martin Forster & Paolo Pertile, 2015. "A Bayesian Decision-Theoretic Model of Sequential Experimentation with Delayed Response," Discussion Papers 15/09, Department of Economics, University of York.
Emre Barut & Warren Powell, 2014. "Optimal learning for sequential sampling with non-parametric beliefs," Journal of Global Optimization, Springer, vol. 58(3), pages 517-543, March.
Gongbo Zhang & Yijie Peng & Jianghua Zhang & Enlu Zhou, 2023. "Asymptotically Optimal Sampling Policy for Selecting Top- m Alternatives," INFORMS Journal on Computing, INFORMS, vol. 35(6), pages 1261-1285, November.
Diana M. Negoescu & Peter I. Frazier & Warren B. Powell, 2011. "The Knowledge-Gradient Algorithm for Sequencing Experiments in Drug Discovery," INFORMS Journal on Computing, INFORMS, vol. 23(3), pages 346-363, August.
Huashuai Qu & Ilya O. Ryzhov & Michael C. Fu & Zi Ding, 2015. "Sequential Selection with Unknown Correlation Structures," Operations Research, INFORMS, vol. 63(4), pages 931-948, August.
Ilya O. Ryzhov & Martijn R. K. Mes & Warren B. Powell & Gerald van den Berg, 2019. "Bayesian Exploration for Approximate Dynamic Programming," Operations Research, INFORMS, vol. 67(1), pages 198-214, January.
Ilya O. Ryzhov, 2016. "On the Convergence Rates of Expected Improvement Methods," Operations Research, INFORMS, vol. 64(6), pages 1515-1528, December.
Yijie Peng & Chun-Hung Chen & Michael C. Fu & Jian-Qiang Hu, 2016. "Dynamic Sampling Allocation and Design Selection," INFORMS Journal on Computing, INFORMS, vol. 28(2), pages 195-208, May.
David J. Eckman & Shane G. Henderson, 2022. "Posterior-Based Stopping Rules for Bayesian Ranking-and-Selection Procedures," INFORMS Journal on Computing, INFORMS, vol. 34(3), pages 1711-1728, May.
Bin Han & Ilya O. Ryzhov & Boris Defourny, 2016. "Optimal Learning in Linear Regression with Combinatorial Feature Selection," INFORMS Journal on Computing, INFORMS, vol. 28(4), pages 721-735, November.

More about this item

Keywords

multiarmed bandit; optimal learning; online learning; knowledge gradient; Gittins index; index policy;
All these keywords.

Statistics

Access and download statistics

Corrections

All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:oropre:v:60:y:2012:i:1:p:180-195. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

Please note that corrections may take a couple of weeks to filter through the various RePEc services.

IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.

Browse Econ Literature

More features

The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

Author

Abstract

Suggested Citation

Download full text from publisher

References listed on IDEAS

Citations

Most related items

More about this item

Keywords

Statistics

Corrections

More services and features

MyIDEAS

Author registration

Rankings

RePEc Genealogy

RePEc Biblio

MPRA

New papers by email

EconAcademics

Plagiarism

About RePEc

RePEc home

Blog

Help/FAQ

RePEc team

Participating archives

Privacy statement

Help us

Corrections

Volunteers

Get papers listed

Open a RePEc archive

Get RePEc data