IDEAS home Printed from https://ideas.repec.org/a/inm/oropre/v53y2005i1p126-139.html
   My bibliography  Save this article

An Adaptive Sampling Algorithm for Solving Markov Decision Processes

Author

Listed:
  • Hyeong Soo Chang

    (Department of Computer Science and Engineering, Sogang University, Seoul, Korea)

  • Michael C. Fu

    (Robert H. Smith School of Business, and Institute for Systems Research, University of Maryland, College Park, Maryland 20742)

  • Jiaqiao Hu

    (Department of Electrical and Computer Engineering, and Institute for Systems Research, University of Maryland, College Park, Maryland 20742)

  • Steven I. Marcus

    (Department of Electrical and Computer Engineering, and Institute for Systems Research, University of Maryland, College Park, Maryland 20742)

Abstract

Based on recent results for multiarmed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite-horizon Markov decision process (MDP) with finite state and action spaces. The algorithm adaptively chooses which action to sample as the sampling process proceeds and generates an asymptotically unbiased estimator, whose bias is bounded by a quantity that converges to zero at rate (ln N )/ N , where N is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is O (( |A|N ) H ), independent of the size of the state space, where | A | is the size of the action space and H is the horizon length. The algorithm can be used to create an approximate receding horizon control to solve infinite-horizon MDPs. To illustrate the algorithm, computational results are reported on simple examples from inventory control.

Suggested Citation

  • Hyeong Soo Chang & Michael C. Fu & Jiaqiao Hu & Steven I. Marcus, 2005. "An Adaptive Sampling Algorithm for Solving Markov Decision Processes," Operations Research, INFORMS, vol. 53(1), pages 126-139, February.
  • Handle: RePEc:inm:oropre:v:53:y:2005:i:1:p:126-139
    DOI: 10.1287/opre.1040.0145
    as

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/opre.1040.0145
    Download Restriction: no

    File URL: https://libkey.io/10.1287/opre.1040.0145?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Broadie, Mark & Glasserman, Paul, 1997. "Pricing American-style securities using simulation," Journal of Economic Dynamics and Control, Elsevier, vol. 21(8-9), pages 1323-1352, June.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Woonghee Tim Huh & Paat Rusmevichientong, 2014. "Online Sequential Optimization with Biased Gradients: Theory and Applications to Censored Demand," INFORMS Journal on Computing, INFORMS, vol. 26(1), pages 150-159, February.
    2. Mohammed Shahid Abdulla & Shalabh Bhatnagar, 2016. "Multi-armed bandits based on a variant of Simulated Annealing," Indian Journal of Pure and Applied Mathematics, Springer, vol. 47(2), pages 195-212, June.
    3. Arnoud V. den Boer & Bert Zwart, 2015. "Dynamic Pricing and Learning with Finite Inventories," Operations Research, INFORMS, vol. 63(4), pages 965-978, August.
    4. Michael C. Fu, 2019. "Simulation-Based Algorithms for Markov Decision Processes: Monte Carlo Tree Search from AlphaGo to AlphaZero," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 36(06), pages 1-25, December.
    5. Rishabh Gupta & Qi Zhang, 2022. "Decomposition and Adaptive Sampling for Data-Driven Inverse Linear Optimization," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2720-2735, September.
    6. William L. Cooper & Bharath Rangarajan, 2012. "Performance Guarantees for Empirical Markov Decision Processes with Applications to Multiperiod Inventory Models," Operations Research, INFORMS, vol. 60(5), pages 1267-1281, October.
    7. Oleg Szehr, 2021. "Hedging of Financial Derivative Contracts via Monte Carlo Tree Search," Papers 2102.06274, arXiv.org, revised Apr 2021.
    8. Ronald Ortner, 2013. "Adaptive aggregation for reinforcement learning in average reward Markov decision processes," Annals of Operations Research, Springer, vol. 208(1), pages 321-336, September.
    9. Daniel R. Jiang & Lina Al-Kanj & Warren B. Powell, 2020. "Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds," Operations Research, INFORMS, vol. 68(6), pages 1678-1697, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. O. Samimi & Z. Mardani & S. Sharafpour & F. Mehrdoust, 2017. "LSM Algorithm for Pricing American Option Under Heston–Hull–White’s Stochastic Volatility Model," Computational Economics, Springer;Society for Computational Economics, vol. 50(2), pages 173-187, August.
    2. Grosen, Anders & Lochte Jorgensen, Peter, 2000. "Fair valuation of life insurance liabilities: The impact of interest rate guarantees, surrender options, and bonus policies," Insurance: Mathematics and Economics, Elsevier, vol. 26(1), pages 37-57, February.
    3. E. Nasakkala & J. Keppo, 2008. "Hydropower with Financial Information," Applied Mathematical Finance, Taylor & Francis Journals, vol. 15(5-6), pages 503-529.
    4. Chi H. Truong, 2014. "A Two Factor Model for Water Prices and Its Implications for Evaluating Real Options and Other Water Price Derivatives," Canadian Journal of Agricultural Economics/Revue canadienne d'agroeconomie, Canadian Agricultural Economics Society/Societe canadienne d'agroeconomie, vol. 62(1), pages 23-45, March.
    5. Augusto Castillo, 2004. "Firm and Corporate Bond Valuation: A Simulation Dynamic Programming Approach," Latin American Journal of Economics-formerly Cuadernos de Economía, Instituto de Economía. Pontificia Universidad Católica de Chile., vol. 41(124), pages 345-360.
    6. Semih Yon & Cafer Erhan Bozdag, 2014. "Test of Log-Normal Process with Importance Sampling for Options Pricing," Proceedings of Economics and Finance Conferences 0401571, International Institute of Social and Economic Sciences.
    7. repec:hum:wpaper:sfb649dp2006-051 is not listed on IDEAS
    8. Anne Laure Bronstein & Gilles Pagès & Jacques Portès, 2013. "Multi-asset American Options and Parallel Quantization," Methodology and Computing in Applied Probability, Springer, vol. 15(3), pages 547-561, September.
    9. Beveridge, Christopher & Joshi, Mark & Tang, Robert, 2013. "Practical policy iteration: Generic methods for obtaining rapid and tight bounds for Bermudan exotic derivatives using Monte Carlo simulation," Journal of Economic Dynamics and Control, Elsevier, vol. 37(7), pages 1342-1361.
    10. Denis Belomestny & Grigori Milstein & Vladimir Spokoiny, 2009. "Regression methods in pricing American and Bermudan options using consumption processes," Quantitative Finance, Taylor & Francis Journals, vol. 9(3), pages 315-327.
    11. David Heath & Eckhard Platen, 2002. "A variance reduction technique based on integral representations," Quantitative Finance, Taylor & Francis Journals, vol. 2(5), pages 362-369.
    12. Lim, Terence & Lo, Andrew W. & Merton, Robert C. & Scholes, Myron S., 2006. "The Derivatives Sourcebook," Foundations and Trends(R) in Finance, now publishers, vol. 1(5–6), pages 365-572, April.
    13. N. Hilber & N. Reich & C. Schwab & C. Winter, 2009. "Numerical methods for Lévy processes," Finance and Stochastics, Springer, vol. 13(4), pages 471-500, September.
    14. Andreas Thomann, 2021. "Multi-asset scenario building for trend-following trading strategies," Annals of Operations Research, Springer, vol. 299(1), pages 293-315, April.
    15. Belomestny, Denis & Kolodko, Anastasia & Schoenmakers, John G. M., 2009. "Regression methods for stochastic control problems and their convergence analysis," SFB 649 Discussion Papers 2009-026, Humboldt University Berlin, Collaborative Research Center 649: Economic Risk.
    16. Jérôme Detemple, 2014. "Optimal Exercise for Derivative Securities," Annual Review of Financial Economics, Annual Reviews, vol. 6(1), pages 459-487, December.
    17. Duy Nguyen, 2018. "A hybrid Markov chain-tree valuation framework for stochastic volatility jump diffusion models," International Journal of Financial Engineering (IJFE), World Scientific Publishing Co. Pte. Ltd., vol. 5(04), pages 1-30, December.
    18. Cosma, Antonio & Galluccio, Stefano & Pederzoli, Paola & Scaillet, Olivier, 2020. "Early Exercise Decision in American Options with Dividends, Stochastic Volatility, and Jumps," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 55(1), pages 331-356, February.
    19. Lars Stentoft, 2004. "Convergence of the Least Squares Monte Carlo Approach to American Option Valuation," Management Science, INFORMS, vol. 50(9), pages 1193-1203, September.
    20. Suresh M. Sundaresan, 2000. "Continuous‐Time Methods in Finance: A Review and an Assessment," Journal of Finance, American Finance Association, vol. 55(4), pages 1569-1622, August.
    21. Li, Chenxu & Ye, Yongxin, 2019. "Pricing and Exercising American Options: an Asymptotic Expansion Approach," Journal of Economic Dynamics and Control, Elsevier, vol. 107(C), pages 1-1.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:oropre:v:53:y:2005:i:1:p:126-139. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.