A modern Bayesian look at the multi‐armed bandit

Author

Steven L. Scott

Abstract

A multi‐armed bandit is an experiment with the goal of accumulating rewards from a payoff distribution with unknown parameters that are to be learned sequentially. This article describes a heuristic for managing multi‐armed bandits called randomized probability matching, which randomly allocates observations to arms according to the Bayesian posterior probability that each arm is optimal. Advances in Bayesian computation have made randomized probability matching easy to apply to virtually any payoff distribution. This flexibility frees the experimenter to work with payoff distributions that correspond to certain classical experimental designs that have the potential to outperform methods that are ‘optimal’ in simpler contexts. I summarize the relationships between randomized probability matching and several related heuristics that have been used in the reinforcement learning literature. Copyright © 2010 John Wiley & Sons, Ltd.
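
As an illustration of the allocation rule described in the abstract, the following is a minimal sketch rather than the article's own implementation. It assumes binary (Bernoulli) rewards with independent Beta(1, 1) priors on each arm, a choice made here only because the posterior is then available in closed form; the abstract itself does not fix a payoff distribution. The function names `optimal_arm_probabilities` and `simulate` and all parameter values are hypothetical.

```python
import numpy as np


def optimal_arm_probabilities(successes, failures, n_draws=10_000, rng=None):
    """Monte Carlo estimate of P(arm k is optimal | data).

    Assumes Bernoulli rewards and independent Beta(1, 1) priors (an
    illustrative assumption, not taken from the article).
    """
    rng = np.random.default_rng() if rng is None else rng
    successes = np.asarray(successes)
    failures = np.asarray(failures)
    # Each row is one joint posterior draw of every arm's success rate.
    draws = rng.beta(1 + successes, 1 + failures,
                     size=(n_draws, successes.size))
    # Fraction of draws in which each arm has the largest success rate.
    best = draws.argmax(axis=1)
    return np.bincount(best, minlength=successes.size) / n_draws


def simulate(true_rates, n_rounds=2000, seed=0):
    """Run a simulated bandit with randomized probability matching."""
    rng = np.random.default_rng(seed)
    k = len(true_rates)
    successes = np.zeros(k, dtype=int)
    failures = np.zeros(k, dtype=int)
    for _ in range(n_rounds):
        # One posterior draw followed by argmax picks each arm with
        # probability equal to its posterior probability of being optimal.
        theta = rng.beta(1 + successes, 1 + failures)
        arm = int(theta.argmax())
        reward = int(rng.random() < true_rates[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
    return successes, failures


if __name__ == "__main__":
    s, f = simulate([0.04, 0.05, 0.07])
    print("pulls per arm:", s + f)
    print("P(optimal):", optimal_arm_probabilities(s, f))
```

Drawing once from the posterior and playing the arm with the largest draw avoids computing the optimality probabilities at every step; `optimal_arm_probabilities` is only needed when the probabilities themselves are to be reported.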

Suggested Citation

Steven L. Scott, 2010. "A modern Bayesian look at the multi‐armed bandit," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 26(6), pages 639-658, November.

Handle: RePEc:wly:apsmbi:v:26:y:2010:i:6:p:639-658
DOI: 10.1002/asmb.874

Citations

Cited by:

T. Law & J. Shawe-Taylor, 2017. "Practical Bayesian support vector regression for financial time series prediction and market condition change detection," Quantitative Finance, Taylor & Francis Journals, vol. 17(9), pages 1403-1416, September.
Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
- Susan Athey & Guido Imbens, 2019. "Machine Learning Methods Economists Should Know About," Papers 1903.10075, arXiv.org.
Ben Vinod & Richard Ratliff & Vikram Jayaram, 2018. "An approach to offer management: maximizing sales with fare products and ancillaries," Journal of Revenue and Pricing Management, Palgrave Macmillan, vol. 17(2), pages 91-101, April.
A. Stefano Caria & Grant Gordon & Maximilian Kasy & Simon Quinn & Soha Osman Shami & Alexander Teytelboym, 2024. "An Adaptive Targeted Field Experiment: Job Search Assistance for Refugees in Jordan," Journal of the European Economic Association, European Economic Association, vol. 22(2), pages 781-836.
- A. Stefano Caria & Grant Gordon & Maximilian Kasy & Simon Quinn & Soha Shami & Alexander Teytelboym, 2020. "An Adaptive Targeted Field Experiment: Job Search Assistance for Refugees in Jordan," CSAE Working Paper Series 2020-20, Centre for the Study of African Economies, University of Oxford.
- Caria, Stefano & Gordon, Grant & Kasy, Maximilian & Quinn, Simon & Shami, Soha & Teytelboym, Alexander, 2021. "An Adaptive Targeted Field Experiment: Job Search Assistance for Refugees in Jordan," CAGE Online Working Paper Series 547, Competitive Advantage in the Global Economy (CAGE).
- Caria, Stefano & Gordon, Grant & Kasy, Maximilian & Quinn, Simon & Shami, Soha & Teytelboym, Alexander, 2021. "An Adaptive Targeted Field Experiment: Job Search Assistance for Refugees in Jordan," The Warwick Economics Research Paper Series (TWERPS) 1335, University of Warwick, Department of Economics.
- Quinn, Simon & Caria, Stefano & Gordon, Grant & Kasy, Maximilian & Shami, Soha & Teytelboym, Alexander, 2020. "An Adaptive Targeted Field Experiment: Job Search Assistance for Refugees in Jordan," CEPR Discussion Papers 15359, C.E.P.R. Discussion Papers.
- Stefano Caria & Grant Gordon & Maximilian Kasy & Simon Quinn & Soha Shami & Alexander Teytelboym, 2020. "An Adaptive Targeted Field Experiment: Job Search Assistance for Refugees in Jordan," CESifo Working Paper Series 8535, CESifo.
Hana Choi & Carl F. Mela & Santiago R. Balseiro & Adam Leary, 2020. "Online Display Advertising Markets: A Literature Review and Future Directions," Information Systems Research, INFORMS, vol. 31(2), pages 556-575, June.
Duflo, Esther & Banerjee, Abhijit & Keniston, Daniel, 2019. "The Efficient Deployment of Police Resources: Theory and New Evidence from a Randomized Drunk Driving Crackdown in India," CEPR Discussion Papers 13981, C.E.P.R. Discussion Papers.
- Abhijit Banerjee & Esther Duflo & Daniel Keniston & Nina Singh, 2019. "The Efficient Deployment of Police Resources: Theory and New Evidence from a Randomized Drunk Driving Crackdown in India," NBER Working Papers 26224, National Bureau of Economic Research, Inc.
Eric M. Schwartz & Eric T. Bradlow & Peter S. Fader, 2017. "Customer Acquisition via Display Advertising Using Multi-Armed Bandit Experiments," Marketing Science, INFORMS, vol. 36(4), pages 500-522, July.
Gui Liberali & Alina Ferecatu, 2022. "Morphing for Consumer Dynamics: Bandits Meet Hidden Markov Models," Marketing Science, INFORMS, vol. 41(4), pages 769-794, July.
Chao Qin & Daniel Russo, 2024. "Optimizing Adaptive Experiments: A Unified Approach to Regret Minimization and Best-Arm Identification," Papers 2402.10592, arXiv.org, revised Jul 2024.
Dean Eckles & Maurits Kaptein, 2019. "Bootstrap Thompson Sampling and Sequential Decision Problems in the Behavioral Sciences," SAGE Open, vol. 9(2), pages 21582440198, June.
Alina Ferecatu & Arnaud De Bruyn, 2022. "Understanding Managers’ Trade-Offs Between Exploration and Exploitation," Marketing Science, INFORMS, vol. 41(1), pages 139-165, January.
Sareh Nabi & Houssam Nassif & Joseph Hong & Hamed Mamani & Guido Imbens, 2022. "Bayesian Meta-Prior Learning Using Empirical Bayes," Management Science, INFORMS, vol. 68(3), pages 1737-1755, March.
Maria Dimakopoulou & Zhimei Ren & Zhengyuan Zhou, 2021. "Online Multi-Armed Bandits with Adaptive Inference," Papers 2102.13202, arXiv.org, revised Jun 2021.
Guido W. Imbens, 2020. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 1129-1179, December.
- Guido Imbens, 2019. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," NBER Working Papers 26104, National Bureau of Economic Research, Inc.
Manini Madireddy & Ramasubramanian Sundararajan & Goda Doreswamy & Meisam Hejazi Nia & Amod Mital, 2017. "Constructing bundled offers for airline customers," Journal of Revenue and Pricing Management, Palgrave Macmillan, vol. 16(6), pages 532-552, December.
Mauersberger, Felix, 2021. "Monetary policy rules in a non-rational world: A macroeconomic experiment," Journal of Economic Theory, Elsevier, vol. 197(C).
Daniel Russo & Benjamin Van Roy, 2018. "Learning to Optimize via Information-Directed Sampling," Operations Research, INFORMS, vol. 66(1), pages 230-252, January.
Kohei Kawaguchi, 2021. "When Will Workers Follow an Algorithm? A Field Experiment with a Retail Business," Management Science, INFORMS, vol. 67(3), pages 1670-1695, March.
Mingyu Joo & Michael L. Thompson & Greg M. Allenby, 2019. "Optimal Product Design by Sequential Experiments in High Dimensions," Management Science, INFORMS, vol. 65(7), pages 3235-3254, July.
Elea McDonnell Feit & Ron Berman, 2019. "Test & Roll: Profit-Maximizing A/B Tests," Marketing Science, INFORMS, vol. 38(6), pages 1038-1058, November.
Po-Yi Liu & Chi-Hua Wang & Henghsiu Tsai, 2022. "Non-Stationary Dynamic Pricing Via Actor-Critic Information-Directed Pricing," Papers 2208.09372, arXiv.org, revised Sep 2022.
Yixin Tang & Yicong Lin & Navdeep S. Sahni, 2023. "Business Policy Experiments using Fractional Factorial Designs: Consumer Retention on DoorDash," Papers 2311.14698, arXiv.org, revised Nov 2023.
