On the Asymptotic Optimality of Finite Approximations to Markov Decision Processes with Borel Spaces

My bibliography Save this article

On the Asymptotic Optimality of Finite Approximations to Markov Decision Processes with Borel Spaces

Author

Listed:

Naci Saldi
(Coordinated Science Laboratory, University of Illinois, Urbana, Illinois 61801, USA)
Serdar Yüksel
(Department of Mathematics and Statistics, Queen’s University, Kingston, Ontario, Canada, K7L 3N6)
Tamás Linder
(Department of Mathematics and Statistics, Queen’s University, Kingston, Ontario, Canada, K7L 3N6)

Registered:

Abstract

Calculating optimal policies is known to be computationally difficult for Markov decision processes (MDPs) with Borel state and action spaces. This paper studies finite-state approximations of discrete time Markov decision processes with Borel state and action spaces, for both discounted and average costs criteria. The stationary policies thus obtained are shown to approximate the optimal stationary policy with arbitrary precision under quite general conditions for discounted cost and more restrictive conditions for average cost. For compact-state MDPs, we obtain explicit rate of convergence bounds quantifying how the approximation improves as the size of the approximating finite state space increases. Using information theoretic arguments, the order optimality of the obtained convergence rates is established for a large class of problems. We also show that as a pre-processing step, the action space can also be finitely approximated with sufficiently large number points; thereby, well known algorithms, such as value or policy iteration, Q -learning, etc., can be used to calculate near optimal policies.

Suggested Citation

Naci Saldi & Serdar Yüksel & Tamás Linder, 2017. "On the Asymptotic Optimality of Finite Approximations to Markov Decision Processes with Borel Spaces," Mathematics of Operations Research, INFORMS, vol. 42(4), pages 945-978, November.

Handle: RePEc:inm:ormoor:v:42:y:2017:i:4:p:945-978
DOI: 10.1287/moor.2016.0832

Download full text from publisher

References listed on IDEAS

Steven E. Shreve & Dimitri P. Bertsekas, 1979. "Universally Measurable Policies in Dynamic Programming," Mathematics of Operations Research, INFORMS, vol. 4(1), pages 15-30, February.
Eugene A. Feinberg & Pavlo O. Kasyanov & Nina V. Zadoianchuk, 2012. "Average Cost Markov Decision Processes with Weakly Continuous Transition Probabilities," Mathematics of Operations Research, INFORMS, vol. 37(4), pages 591-607, November.
Benjamin Van Roy, 2006. "Performance Loss Bounds for Approximate Value Iteration with State Aggregation," Mathematics of Operations Research, INFORMS, vol. 31(2), pages 234-244, May.
Hans-Joachim Langen, 1981. "Convergence of Dynamic Programming Models," Mathematics of Operations Research, INFORMS, vol. 6(4), pages 493-512, November.
Ward Whitt, 1979. "Approximations of Dynamic Programs, II," Mathematics of Operations Research, INFORMS, vol. 4(2), pages 179-185, May.
K. Hinderer, 2005. "Lipschitz Continuity of Value Functions in Markovian Decision Processes," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 62(1), pages 3-22, September.
Charalambos D. Aliprantis & Kim C. Border, 2006. "Infinite Dimensional Analysis," Springer Books, Springer, edition 0, number 978-3-540-29587-7, December.
Ward Whitt, 1978. "Approximations of Dynamic Programs, I," Mathematics of Operations Research, INFORMS, vol. 3(3), pages 231-243, August.

Full references (including those not matched with items on IDEAS)

Citations

Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.

Cited by:

Insoon Yang, 2020. "A Convex Optimization Approach to Dynamic Programming in Continuous State and Action Spaces," Journal of Optimization Theory and Applications, Springer, vol. 187(1), pages 133-157, October.
Harun Avci & Kagan Gokbayrak & Emre Nadar, 2020. "Structural Results for Average‐Cost Inventory Models with Markov‐Modulated Demand and Partial Information," Production and Operations Management, Production and Operations Management Society, vol. 29(1), pages 156-173, January.
Yi Xiong & Ningyuan Chen & Xuefeng Gao & Xiang Zhou, 2022. "Sublinear regret for learning POMDPs," Production and Operations Management, Production and Operations Management Society, vol. 31(9), pages 3491-3504, September.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Naci Saldi & Tamer Başar & Maxim Raginsky, 2019. "Approximate Nash Equilibria in Partially Observed Stochastic Games with Mean-Field Interactions," Mathematics of Operations Research, INFORMS, vol. 44(3), pages 1006-1033, August.
Thomas M. Russell, 2020. "Policy Transforms and Learning Optimal Policies," Papers 2012.11046, arXiv.org.
Robert Kirkby Author-Email: robertkirkby@gmail.com|, 2017. "Convergence of Discretized Value Function Iteration," Computational Economics, Springer;Society for Computational Economics, vol. 49(1), pages 117-153, January.
Naci Saldi & Tamer Bas¸ ar & Maxim Raginsky, 2020. "Approximate Markov-Nash Equilibria for Discrete-Time Risk-Sensitive Mean-Field Games," Mathematics of Operations Research, INFORMS, vol. 45(4), pages 1596-1620, November.
Ma, Qingyin & Stachurski, John & Toda, Alexis Akira, 2022. "Unbounded dynamic programming via the Q-transform," Journal of Mathematical Economics, Elsevier, vol. 100(C).
- Qingyin Ma & John Stachurski & Alexis Akira Toda, 2020. "Unbounded Dynamic Programming via the Q-Transform," Papers 2012.00219, arXiv.org, revised Mar 2021.
Runggaldier, Wolfgang J., 1998. "Concepts and methods for discrete and continuous time control under uncertainty," Insurance: Mathematics and Economics, Elsevier, vol. 22(1), pages 25-39, May.
Huizhen Yu & Dimitri P. Bertsekas, 2015. "A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies," Mathematics of Operations Research, INFORMS, vol. 40(4), pages 926-968, October.
Anton J. Kleywegt & Vijay S. Nori & Martin W. P. Savelsbergh, 2004. "Dynamic Programming Approximations for a Stochastic Inventory Routing Problem," Transportation Science, INFORMS, vol. 38(1), pages 42-70, February.
Insoon Yang, 2020. "A Convex Optimization Approach to Dynamic Programming in Continuous State and Action Spaces," Journal of Optimization Theory and Applications, Springer, vol. 187(1), pages 133-157, October.
So, Wai Ping & Wan, Yat-wah, 2000. "A multi-grid size dynamic programming approach for the production control of a random-speed machine," International Journal of Production Economics, Elsevier, vol. 63(3), pages 267-275, January.
William B. Haskell & Rahul Jain & Dileep Kalathil, 2016. "Empirical Dynamic Programming," Mathematics of Operations Research, INFORMS, vol. 41(2), pages 402-429, May.
Jie Ning & Matthew J. Sobel, 2019. "Easy Affine Markov Decision Processes," Operations Research, INFORMS, vol. 67(6), pages 1719-1737, November.
Nicola Secomandi & François Margot, 2009. "Reoptimization Approaches for the Vehicle-Routing Problem with Stochastic Demands," Operations Research, INFORMS, vol. 57(1), pages 214-230, February.
Knolmayer, Gerhard, 1981. "A simulation study of simplification strategies in the development of optimization models," Manuskripte aus den Instituten für Betriebswirtschaftslehre der Universität Kiel 96, Christian-Albrechts-Universität zu Kiel, Institut für Betriebswirtschaftslehre.
Ali Fattahi & Sriram Dasu & Reza Ahmadi, 2023. "Peak-Load Energy Management by Direct Load Control Contracts," Management Science, INFORMS, vol. 69(5), pages 2788-2813, May.
Michael Z. Spivey & Warren B. Powell, 2004. "The Dynamic Assignment Problem," Transportation Science, INFORMS, vol. 38(4), pages 399-419, November.
Campi, Luciano & Zabaljauregui, Diego, 2020. "Optimal market making under partial information with general intensities," LSE Research Online Documents on Economics 104612, London School of Economics and Political Science, LSE Library.
Kaido, Hiroaki, 2017. "Asymptotically Efficient Estimation Of Weighted Average Derivatives With An Interval Censored Variable," Econometric Theory, Cambridge University Press, vol. 33(5), pages 1218-1241, October.
- Hiroaki Kaido, 2013. "Asymptotically Efficient Estimation of Weighted Average Derivatives with an Inverval Censored Variable," Boston University - Department of Economics - Working Papers Series 2013-022, Boston University - Department of Economics.
- Hiroaki Kaido, 2014. "Asymptotically efficient estimation of weighted average derivatives with an interval censored variable," CeMMAP working papers 03/14, Institute for Fiscal Studies.
- Hiroaki Kaido, 2014. "Asymptotically efficient estimation of weighted average derivatives with an interval censored variable," CeMMAP working papers CWP03/14, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Andrea Attar & Thomas Mariotti & François Salanié, 2021. "Entry-Proofness and Discriminatory Pricing under Adverse Selection," American Economic Review, American Economic Association, vol. 111(8), pages 2623-2659, August.
- Attar, Andrea & Mariotti, Thomas & Salanié, François, 2017. "Entry-Proofness and Discriminatory Pricing under Adverse Selection," TSE Working Papers 17-788, Toulouse School of Economics (TSE), revised Jan 2021.
- Andrea Attar & Thomas Mariotti & François Salanié, 2021. "Entry-proofness and discriminatory pricing under adverse selection," Post-Print hal-03353054, HAL.
- Andrea Attar & Thomas Mariotti & François Salanié, 2021. "Entry-proofness and discriminatory pricing under adverse selection," Working Papers hal-03485384, HAL.
Askoura, Youcef & Billot, Antoine, 2021. "Social decision for a measure society," Journal of Mathematical Economics, Elsevier, vol. 94(C).
- Youcef Askoura & Antoine Billot, 2021. "Social decision for a measure society," Post-Print hal-04120433, HAL.

More about this item

Keywords

Markov decision processes; stochastic control; finite state approximation; quantization;
All these keywords.

Statistics

Access and download statistics

Corrections

All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:ormoor:v:42:y:2017:i:4:p:945-978. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

Please note that corrections may take a couple of weeks to filter through the various RePEc services.

IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.

Browse Econ Literature

More features

On the Asymptotic Optimality of Finite Approximations to Markov Decision Processes with Borel Spaces

Author

Abstract

Suggested Citation

Download full text from publisher

References listed on IDEAS

Citations

Most related items

More about this item

Keywords

Statistics

Corrections

More services and features

MyIDEAS

Author registration

Rankings

RePEc Genealogy

RePEc Biblio

MPRA

New papers by email

EconAcademics

Plagiarism

About RePEc

RePEc home

Blog

Help/FAQ

RePEc team

Participating archives

Privacy statement

Help us

Corrections

Volunteers

Get papers listed

Open a RePEc archive

Get RePEc data