Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds

Author

Daniel R. Jiang
(Department of Industrial Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15213)
Lina Al-Kanj
(Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544)
Warren B. Powell
(Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544)

Abstract

Monte Carlo tree search (MCTS), most famously used in game-play artificial intelligence (e.g., the game of Go), is a well-known strategy for constructing approximate solutions to sequential decision problems. Its primary innovation is the use of a heuristic, known as a default policy, to obtain Monte Carlo estimates of downstream values for states in a decision tree. This information is used to iteratively expand the tree toward regions of states and actions that an optimal policy might visit. However, to guarantee convergence to the optimal action, MCTS requires the entire tree to be expanded asymptotically. In this paper, we propose a new “optimistic” tree search technique called primal-dual MCTS that uses sampled information relaxation upper bounds on potential actions to make tree expansion decisions, creating the possibility of ignoring parts of the tree that stem from highly suboptimal choices. The core contribution of this paper is to prove that despite converging to a partial decision tree in the limit, the recommended action from primal-dual MCTS is optimal. The new approach shows promise when used to optimize the behavior of a single driver navigating a graph while operating on a ride-sharing platform. Numerical experiments on a real data set of taxi trips in New Jersey suggest that primal-dual MCTS improves on standard MCTS (upper confidence trees) and other policies while exhibiting a reduced sensitivity to the size of the action space.
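
The bounding idea in the abstract can be illustrated with a minimal Python sketch on a hypothetical toy problem. This is not the paper's primal-dual MCTS algorithm. It only shows how a default-policy rollout gives a sampled lower estimate for each first action, how a perfect-information relaxation (with zero penalty, an assumption made here for simplicity) gives a sampled upper estimate, and how an action whose upper estimate falls below the best lower estimate could be left unexpanded. The reward model, the function names, and the parameters `means`, `horizon`, and `n_samples` are all illustrative assumptions.

```python
import random


def simulate_rewards(rng, horizon, means, noise=1.0):
    """Sample one full scenario: a reward for every (period, action) pair."""
    return [[rng.gauss(m, noise) for m in means] for _ in range(horizon)]


def default_policy_value(path, first_action, default_action=0):
    """Primal (lower) sample: take first_action, then follow a fixed heuristic
    that always plays default_action without seeing future rewards."""
    return path[0][first_action] + sum(step[default_action] for step in path[1:])


def clairvoyant_value(path, first_action):
    """Dual (upper) sample with zero penalty: take first_action, then pick the
    best action in hindsight in every later period of this scenario."""
    return path[0][first_action] + sum(max(step) for step in path[1:])


def bounds_per_action(means, horizon=5, n_samples=2000, seed=0):
    """Average the primal and dual samples over common scenarios."""
    rng = random.Random(seed)
    k = len(means)
    lower = [0.0] * k
    upper = [0.0] * k
    for _ in range(n_samples):
        path = simulate_rewards(rng, horizon, means)
        for a in range(k):
            lower[a] += default_policy_value(path, a) / n_samples
            upper[a] += clairvoyant_value(path, a) / n_samples
    return lower, upper


def actions_worth_expanding(lower, upper):
    """Keep an action only if its optimistic value reaches the best
    pessimistic value among all actions."""
    best_lower = max(lower)
    return [a for a in range(len(upper)) if upper[a] >= best_lower]


if __name__ == "__main__":
    lower, upper = bounds_per_action(means=[1.0, 0.9, -5.0])
    print("lower estimates:", [round(v, 2) for v in lower])
    print("upper estimates:", [round(v, 2) for v in upper])
    print("actions kept:", actions_worth_expanding(lower, upper))
```

In this toy setting the third action has such a poor immediate reward that even a clairvoyant continuation cannot make up for it, so it is dropped without ever being explored. The zero-penalty relaxation is typically loose; the information relaxation literature tightens it by subtracting a penalty for using future information.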

Suggested Citation

Daniel R. Jiang & Lina Al-Kanj & Warren B. Powell, 2020. "Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds," Operations Research, INFORMS, vol. 68(6), pages 1678-1697, November.

Handle: RePEc:inm:oropre:v:68:y:2020:i:6:p:1678-1697
DOI: 10.1287/opre.2019.1939

Citations

Cited by:

Deniz Preil & Michael Krapp, 2022. "Artificial intelligence-based inventory management: a Monte Carlo tree search approach," Annals of Operations Research, Springer, vol. 308(1), pages 415-439, January.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Santiago R. Balseiro & David B. Brown, 2019. "Approximations to Stochastic Dynamic Programs via Information Relaxation Duality," Operations Research, INFORMS, vol. 67(2), pages 577-597, March.
David B. Brown & Martin B. Haugh, 2017. "Information Relaxation Bounds for Infinite Horizon Markov Decision Processes," Operations Research, INFORMS, vol. 65(5), pages 1355-1379, October.
Alessio Trivella & Danial Mohseni-Taheri & Selvaprabu Nadarajah, 2023. "Meeting Corporate Renewable Power Targets," Management Science, INFORMS, vol. 69(1), pages 491-512, January.
Helin Zhu & Fan Ye & Enlu Zhou, 2013. "Fast Estimation of True Bounds on Bermudan Option Prices under Jump-diffusion Processes," Papers 1305.4321, arXiv.org.
Cosma, Antonio & Galluccio, Stefano & Pederzoli, Paola & Scaillet, Olivier, 2020. "Early Exercise Decision in American Options with Dividends, Stochastic Volatility, and Jumps," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 55(1), pages 331-356, February.
- Antonio Cosma & Stefano Galluccio & Paola Pederzoli & Olivier Scaillet, 2016. "Early exercise decision in American options with dividends, stochastic volatility and jumps," Papers 1612.03031, arXiv.org.
- Antonio Cosma & Stefano Galluccio & Paola Pederzoli & O. Scaillet, 2016. "Early Exercise Decision in American Options with Dividends, Stochastic Volatility and Jumps," Swiss Finance Institute Research Paper Series 16-73, Swiss Finance Institute.
Antonio Cosma & Stefano Galluccio & Paola Pederzoli & O. Scaillet, 2012. "Valuing American Options Using Fast Recursive Projections," Swiss Finance Institute Research Paper Series 12-26, Swiss Finance Institute.
- Cosma, Antonio & Galluccio, Stefano & Pederzoli, Paola & Scaillet, Olivier, 2016. "Valuing American options using fast recursive projections," Working Papers unige:82087, University of Geneva, Geneva School of Economics and Management.
- Antonio Cosma & Stefano Galluccio & Paola Pederzoli & Olivier Scaillet, 2015. "Valuing American options using fast recursive projections," DEM Discussion Paper Series 15-20, Department of Economics at the University of Luxembourg.
- Cosma, Antonio & Galluccio, Stefano & Scaillet, Olivier, 2012. "Valuing American options using fast recursive projections," Working Papers unige:41856, University of Geneva, Geneva School of Economics and Management.
Mark Broadie & Weiwei Shen, 2016. "High-Dimensional Portfolio Optimization With Transaction Costs," International Journal of Theoretical and Applied Finance (IJTAF), World Scientific Publishing Co. Pte. Ltd., vol. 19(04), pages 1-49, June.
David B. Brown & James E. Smith, 2013. "Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats," Operations Research, INFORMS, vol. 61(3), pages 644-665, June.
Vijay V. Desai & Vivek F. Farias & Ciamac C. Moallemi, 2012. "Pathwise Optimization for Optimal Stopping Problems," Management Science, INFORMS, vol. 58(12), pages 2292-2308, December.
Dragos Florin Ciocan & Velibor V. Mišić, 2022. "Interpretable Optimal Stopping," Management Science, INFORMS, vol. 68(3), pages 1616-1638, March.
Bradley Sturt, 2021. "A nonparametric algorithm for optimal stopping based on robust optimization," Papers 2103.03300, arXiv.org, revised Mar 2023.
Helin Zhu & Fan Ye & Enlu Zhou, 2015. "Fast estimation of true bounds on Bermudan option prices under jump-diffusion processes," Quantitative Finance, Taylor & Francis Journals, vol. 15(11), pages 1885-1900, November.
Christian Bender & Christian Gaertner & Nikolaus Schweizer, 2016. "Pathwise Iteration for Backward SDEs," Papers 1605.07500, arXiv.org, revised Jun 2016.
David B. Brown & James E. Smith, 2014. "Information Relaxations, Duality, and Convex Stochastic Dynamic Programs," Operations Research, INFORMS, vol. 62(6), pages 1394-1415, December.
Christian Bender & Christian Gärtner & Nikolaus Schweizer, 2018. "Pathwise Dynamic Programming," Mathematics of Operations Research, INFORMS, vol. 43(3), pages 965-965, August.
Sebastian Becker & Patrick Cheridito & Arnulf Jentzen & Timo Welti, 2019. "Solving high-dimensional optimal stopping problems using deep learning," Papers 1908.01602, arXiv.org, revised Aug 2021.
Anna Maria Gambaro & Nicola Secomandi, 2021. "A Discussion of Non‐Gaussian Price Processes for Energy and Commodity Operations," Production and Operations Management, Production and Operations Management Society, vol. 30(1), pages 47-67, January.
Secomandi, Nicola & Seppi, Duane J., 2014. "Real Options and Merchant Operations of Energy and Other Commodities," Foundations and Trends(R) in Technology, Information and Operations Management, now publishers, vol. 6(3-4), pages 161-331, July.
Denis Belomestny & Grigori Milstein & Vladimir Spokoiny, 2009. "Regression methods in pricing American and Bermudan options using consumption processes," Quantitative Finance, Taylor & Francis Journals, vol. 9(3), pages 315-327.
- Belomestny, Denis & Milstein, Grigori N. & Spokoiny, Vladimir, 2006. "Regression methods in pricing American and Bermudan options using consumption processes," SFB 649 Discussion Papers 2006-051, Humboldt University Berlin, Collaborative Research Center 649: Economic Risk.
Christian Bender & Nikolaus Schweizer & Jia Zhuo, 2013. "A primal-dual algorithm for BSDEs," Papers 1310.3694, arXiv.org, revised Sep 2014.

More about this item

Keywords

Monte Carlo tree search; dynamic programming; information relaxation

