Polynomial-Time Computation of Strong and n -Present-Value Optimal Policies in Markov Decision Chains

My bibliography Save this article

Polynomial-Time Computation of Strong and n -Present-Value Optimal Policies in Markov Decision Chains

Author

Listed:

Michael O’Sullivan
(Department of Engineering Science, University of Auckland, Auckland 1142, New Zealand)
Arthur F. Veinott, Jr.
(Formerly at the Department of Management Science and Engineering, Stanford University, Stanford California)

Registered:

Abstract

This paper studies the problem of finding a stationary strong present-value optimal and, more generally, an n -present-value optimal, policy for an infinite-horizon stationary finite-state-action substochastic Markov decision chain. A policy is strong present-value optimal if it has maximum present value for all small positive interest rates ρ . A policy is n - present-value optimal if its present value falls below the maximum present value for all small positive ρ by O ( ρ n +1 ). The importance of stationary n -present-value optimal policies arises from the following known facts. The set of such policies diminishes with n and, for n ≥ S , is precisely the set of stationary strong present-value optimal policies. For 0 ≤ n < S , stationary n -present-value optimal policies are nearly strong present-value optimal and are of independent interest. The best algorithms for finding stationary strong present-value optimal policies find stationary n -present-value policies for n = −1,…, S in that order. This paper decomposes the problem of finding a stationary n -present-value optimal policy given a stationary ( n − 1)-present-value optimal policy into a sequence of three subproblems, each entailing either maximizing transient value or reward rate. It is known that both subproblem types can be represented as linear programs and solved in polynomial time, e.g., by interior-point and ellipsoid methods. This paper also establishes the following results. The size of the input to each subproblem is polynomial in the size of the original problem data. A stationary strong present-value (respectively, n -present-value) optimal policy can be found in polynomial time. For the case of unique-transition systems, i.e., each action in a state sends the system to at most one state, a stationary strong present-value (respectively, n -present-value) optimal policy can be found in strongly polynomial time using combinatorial methods on the subproblems. The last case includes standard deterministic dynamic programs.

Suggested Citation

Michael O’Sullivan & Arthur F. Veinott, Jr., 2017. "Polynomial-Time Computation of Strong and n -Present-Value Optimal Policies in Markov Decision Chains," Mathematics of Operations Research, INFORMS, vol. 42(3), pages 577-598, August.

Handle: RePEc:inm:ormoor:v:42:y:2017:i:3:p:577-598
DOI: 10.1287/moor.2016.0812

Download full text from publisher

References listed on IDEAS

B. Curtis Eaves & Arthur F. Veinott, 2014. "Maximum-Stopping-Value Policies in Finite Markov Population Decision Chains," Mathematics of Operations Research, INFORMS, vol. 39(3), pages 597-606, August.
Dimitri P. Bertsekas & John N. Tsitsiklis, 1991. "An Analysis of Stochastic Shortest Path Problems," Mathematics of Operations Research, INFORMS, vol. 16(3), pages 580-595, August.
Eric V. Denardo, 1970. "On Linear Programming in a Markov Decision Problem," Management Science, INFORMS, vol. 16(5), pages 281-288, January.
Alan S. Manne, 1960. "Linear Programming and Sequential Decisions," Management Science, INFORMS, vol. 6(3), pages 259-267, April.
Yinyu Ye & Michael J. Todd & Shinji Mizuno, 1994. "An O(√nL)-Iteration Homogeneous and Self-Dual Linear Programming Algorithm," Mathematics of Operations Research, INFORMS, vol. 19(1), pages 53-67, February.
A. Hordijk & L. C. M. Kallenberg, 1979. "Linear Programming and Markov Decision Chains," Management Science, INFORMS, vol. 25(4), pages 352-362, April.
Cyrus Derman, 1962. "On Sequential Decisions and Markov Chains," Management Science, INFORMS, vol. 9(1), pages 16-24, October.
Mark Hartmann & Cristina Arguelles, 1999. "Transience Bounds for Long Walks," Mathematics of Operations Research, INFORMS, vol. 24(2), pages 414-439, May.
Christos H. Papadimitriou & John N. Tsitsiklis, 1987. "The Complexity of Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 12(3), pages 441-450, August.
Keith W. Ross & Ravi Varadarajan, 1991. "Multichain Markov Decision Processes with a Sample Path Constraint: A Decomposition Approach," Mathematics of Operations Research, INFORMS, vol. 16(1), pages 195-207, February.
Erling D. Andersen & Yinyu Ye, 1996. "Combining Interior-Point and Pivoting Algorithms for Linear Programming," Management Science, INFORMS, vol. 42(12), pages 1719-1731, December.
Robert G. Jeroslow, 1973. "Asymptotic Linear Programming," Operations Research, INFORMS, vol. 21(5), pages 1128-1141, October.

Full references (including those not matched with items on IDEAS)

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

B. Curtis Eaves & Arthur F. Veinott, 2014. "Maximum-Stopping-Value Policies in Finite Markov Population Decision Chains," Mathematics of Operations Research, INFORMS, vol. 39(3), pages 597-606, August.
Lodewijk Kallenberg, 2013. "Derman’s book as inspiration: some results on LP for MDPs," Annals of Operations Research, Springer, vol. 208(1), pages 63-94, September.
Dmitry Krass & O. J. Vrieze, 2002. "Achieving Target State-Action Frequencies in Multichain Average-Reward Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 27(3), pages 545-566, August.
Guillot, Matthieu & Stauffer, Gautier, 2020. "The Stochastic Shortest Path Problem: A polyhedral combinatorics perspective," European Journal of Operational Research, Elsevier, vol. 285(1), pages 148-158.
D. P. de Farias & B. Van Roy, 2003. "The Linear Programming Approach to Approximate Dynamic Programming," Operations Research, INFORMS, vol. 51(6), pages 850-865, December.
K. Helmes & R. H. Stockbridge, 2000. "Numerical Comparison of Controls and Verification of Optimality for Stochastic Control Problems," Journal of Optimization Theory and Applications, Springer, vol. 106(1), pages 107-127, July.
Alexander Zadorojniy & Guy Even & Adam Shwartz, 2009. "A Strongly Polynomial Algorithm for Controlled Queues," Mathematics of Operations Research, INFORMS, vol. 34(4), pages 992-1007, November.
Prasenjit Mondal, 2020. "Computing semi-stationary optimal policies for multichain semi-Markov decision processes," Annals of Operations Research, Springer, vol. 287(2), pages 843-865, April.
Guy Even & Alexander Zadorojniy, 2012. "Strong polynomiality of the Gass-Saaty shadow-vertex pivoting rule for controlled random walks," Annals of Operations Research, Springer, vol. 201(1), pages 159-167, December.
Ian Post & Yinyu Ye, 2015. "The Simplex Method is Strongly Polynomial for Deterministic Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 40(4), pages 859-868, October.
Yinyu Ye, 2011. "The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate," Mathematics of Operations Research, INFORMS, vol. 36(4), pages 593-603, November.
José Niño-Mora, 2006. "Restless Bandit Marginal Productivity Indices, Diminishing Returns, and Optimal Control of Make-to-Order/Make-to-Stock M/G/1 Queues," Mathematics of Operations Research, INFORMS, vol. 31(1), pages 50-84, February.
Oguzhan Alagoz & Lisa M. Maillart & Andrew J. Schaefer & Mark S. Roberts, 2007. "Determining the Acceptance of Cadaveric Livers Using an Implicit Model of the Waiting List," Operations Research, INFORMS, vol. 55(1), pages 24-36, February.
Raymond K. Cheung & B. Muralidharan, 2000. "Dynamic Routing for Priority Shipments in LTL Service Networks," Transportation Science, INFORMS, vol. 34(1), pages 86-98, February.
Jing Wu & Lijie Su & Gongshu Wang & Yang Yang, 2024. "Approximated Dynamic Programming for Production and Inventory Planning Problem in Cold Rolling Process of Steel Production," Mathematics, MDPI, vol. 12(24), pages 1-17, December.
E. Nikolova & N. E. Stier-Moses, 2014. "A Mean-Risk Model for the Traffic Assignment Problem with Stochastic Travel Times," Operations Research, INFORMS, vol. 62(2), pages 366-382, April.
Jérôme Renault & Xavier Venel, 2017. "Long-Term Values in Markov Decision Processes and Repeated Games, and a New Distance for Probability Spaces," Mathematics of Operations Research, INFORMS, vol. 42(2), pages 349-376, May.
- Jérôme Renault & Xavier Venel, 2017. "Long-term values in Markov Decision Processes and Repeated Games, and a new distance for probability spaces," PSE-Ecole d'économie de Paris (Postprint) hal-01396680, HAL.
- Jérôme Renault & Xavier Venel, 2017. "Long-term values in Markov Decision Processes and Repeated Games, and a new distance for probability spaces," Post-Print hal-01396680, HAL.
- Jérôme Renault & Xavier Venel, 2017. "Long-term values in Markov Decision Processes and Repeated Games, and a new distance for probability spaces," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-01396680, HAL.
Enzo Busseti, 2019. "Derivative of a Conic Problem with a Unique Solution," Papers 1903.05753, arXiv.org, revised Mar 2019.
Luo, Z-Q. & Sturm, J.F. & Zhang, S., 1998. "Conic convex programming and self-dual embedding," Econometric Institute Research Papers EI 9815, Erasmus University Rotterdam, Erasmus School of Economics (ESE), Econometric Institute.
Höfferl, F. & Steinschorn, D., 2009. "A dynamic programming extension to the steady state refinery-LP," European Journal of Operational Research, Elsevier, vol. 197(2), pages 465-474, September.

More about this item

Keywords

dynamic programming; computational complexity; infinite horizon;
All these keywords.

Statistics

Access and download statistics

Corrections

All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:ormoor:v:42:y:2017:i:3:p:577-598. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

Please note that corrections may take a couple of weeks to filter through the various RePEc services.

IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.

Browse Econ Literature

More features

Polynomial-Time Computation of Strong and n -Present-Value Optimal Policies in Markov Decision Chains

Author

Abstract

Suggested Citation

Download full text from publisher

References listed on IDEAS

Most related items

More about this item

Keywords

Statistics

Corrections

More services and features

MyIDEAS

Author registration

Rankings

RePEc Genealogy

RePEc Biblio

MPRA

New papers by email

EconAcademics

Plagiarism

About RePEc

RePEc home

Blog

Help/FAQ

RePEc team

Participating archives

Privacy statement

Help us

Corrections

Volunteers

Get papers listed

Open a RePEc archive

Get RePEc data