
Polynomial-Time Computation of Strong and n-Present-Value Optimal Policies in Markov Decision Chains

Author

Listed:
  • Michael O’Sullivan

    (Department of Engineering Science, University of Auckland, Auckland 1142, New Zealand)

  • Arthur F. Veinott, Jr.

    (Formerly at the Department of Management Science and Engineering, Stanford University, Stanford, California)

Abstract

This paper studies the problem of finding a stationary strong present-value optimal policy and, more generally, a stationary n-present-value optimal policy for an infinite-horizon stationary finite-state-action substochastic Markov decision chain. A policy is strong present-value optimal if it has maximum present value for all small positive interest rates ρ. A policy is n-present-value optimal if its present value falls below the maximum present value for all small positive ρ by O(ρ^{n+1}). The importance of stationary n-present-value optimal policies arises from the following known facts. The set of such policies diminishes with n and, for n ≥ S (the number of states), is precisely the set of stationary strong present-value optimal policies. For 0 ≤ n < S, stationary n-present-value optimal policies are nearly strong present-value optimal and are of independent interest. The best algorithms for finding stationary strong present-value optimal policies find stationary n-present-value optimal policies for n = −1, …, S in that order. This paper decomposes the problem of finding a stationary n-present-value optimal policy, given a stationary (n − 1)-present-value optimal policy, into a sequence of three subproblems, each entailing maximizing either transient value or reward rate. It is known that both subproblem types can be represented as linear programs and solved in polynomial time, e.g., by interior-point and ellipsoid methods. This paper also establishes the following results. The size of the input to each subproblem is polynomial in the size of the original problem data. A stationary strong present-value (respectively, n-present-value) optimal policy can be found in polynomial time. For unique-transition systems, i.e., those in which each action in a state sends the system to at most one state, a stationary strong present-value (respectively, n-present-value) optimal policy can be found in strongly polynomial time using combinatorial methods on the subproblems. The last case includes standard deterministic dynamic programs.
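To fix notation (ours, not necessarily the paper's): write V_δ(ρ) for the present value of a stationary policy δ at interest rate ρ > 0 and V*(ρ) for the maximum present value over all policies. Then strong present-value optimality means V_δ(ρ) = V*(ρ) for all sufficiently small ρ > 0, while n-present-value optimality means V*(ρ) − V_δ(ρ) = O(ρ^{n+1}) as ρ ↓ 0.

Both subproblem types in the decomposition, maximizing transient value and maximizing reward rate, have classical linear-programming formulations; see Denardo (1970) and Hordijk and Kallenberg (1979) in the references below. As an illustrative sketch only, not the authors' algorithm, the following Python fragment solves the transient-value LP, minimize 1ᵀv subject to v_s ≥ r(s, a) + P(s, a)·v for every state-action pair, on an invented two-state substochastic example; all data and names here are assumptions.

    # Transient-value LP sketch (toy data; illustrative, not from the paper).
    import numpy as np
    from scipy.optimize import linprog

    # Two states, two actions per state. P[s][a] is a substochastic transition
    # row (row sums < 1, so every stationary policy is transient); r[s][a] is
    # the one-period reward for taking action a in state s.
    P = {0: {0: np.array([0.5, 0.3]), 1: np.array([0.0, 0.8])},
         1: {0: np.array([0.2, 0.4]), 1: np.array([0.6, 0.1])}}
    r = {0: {0: 1.0, 1: 2.0}, 1: {0: 0.5, 1: 1.5}}
    S = 2

    # Rewrite each constraint v_s >= r(s,a) + P(s,a).v as (P(s,a) - e_s).v <= -r(s,a).
    A_ub, b_ub = [], []
    for s in P:
        for a in P[s]:
            row = P[s][a].copy()
            row[s] -= 1.0
            A_ub.append(row)
            b_ub.append(-r[s][a])

    # Minimize 1'v over free variables (override linprog's default x >= 0 bounds);
    # the optimal v is the maximum transient value vector.
    res = linprog(c=np.ones(S), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * S)
    print("maximum transient value vector:", res.x)

The reward-rate (gain) subproblem admits an analogous multichain LP (Hordijk and Kallenberg 1979). What the paper adds is that the input to each subproblem stays polynomial in the size of the original data, so the whole sequence of solves runs in polynomial time.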

Suggested Citation

  • Michael O’Sullivan & Arthur F. Veinott, Jr., 2017. "Polynomial-Time Computation of Strong and n-Present-Value Optimal Policies in Markov Decision Chains," Mathematics of Operations Research, INFORMS, vol. 42(3), pages 577-598, August.
  • Handle: RePEc:inm:ormoor:v:42:y:2017:i:3:p:577-598
    DOI: 10.1287/moor.2016.0812

    Download full text from publisher

    File URL: https://doi.org/10.1287/moor.2016.0812
    Download Restriction: no

    File URL: https://libkey.io/10.1287/moor.2016.0812?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Dimitri P. Bertsekas & John N. Tsitsiklis, 1991. "An Analysis of Stochastic Shortest Path Problems," Mathematics of Operations Research, INFORMS, vol. 16(3), pages 580-595, August.
    2. Christos H. Papadimitriou & John N. Tsitsiklis, 1987. "The Complexity of Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 12(3), pages 441-450, August.
    3. Keith W. Ross & Ravi Varadarajan, 1991. "Multichain Markov Decision Processes with a Sample Path Constraint: A Decomposition Approach," Mathematics of Operations Research, INFORMS, vol. 16(1), pages 195-207, February.
    4. B. Curtis Eaves & Arthur F. Veinott, 2014. "Maximum-Stopping-Value Policies in Finite Markov Population Decision Chains," Mathematics of Operations Research, INFORMS, vol. 39(3), pages 597-606, August.
    5. Eric V. Denardo, 1970. "On Linear Programming in a Markov Decision Problem," Management Science, INFORMS, vol. 16(5), pages 281-288, January.
    6. Alan S. Manne, 1960. "Linear Programming and Sequential Decisions," Management Science, INFORMS, vol. 6(3), pages 259-267, April.
    7. Yinyu Ye & Michael J. Todd & Shinji Mizuno, 1994. "An O(√nL)-Iteration Homogeneous and Self-Dual Linear Programming Algorithm," Mathematics of Operations Research, INFORMS, vol. 19(1), pages 53-67, February.
    8. A. Hordijk & L. C. M. Kallenberg, 1979. "Linear Programming and Markov Decision Chains," Management Science, INFORMS, vol. 25(4), pages 352-362, April.
    9. Cyrus Derman, 1962. "On Sequential Decisions and Markov Chains," Management Science, INFORMS, vol. 9(1), pages 16-24, October.
    10. Mark Hartmann & Cristina Arguelles, 1999. "Transience Bounds for Long Walks," Mathematics of Operations Research, INFORMS, vol. 24(2), pages 414-439, May.
    11. Erling D. Andersen & Yinyu Ye, 1996. "Combining Interior-Point and Pivoting Algorithms for Linear Programming," Management Science, INFORMS, vol. 42(12), pages 1719-1731, December.
    12. Robert G. Jeroslow, 1973. "Asymptotic Linear Programming," Operations Research, INFORMS, vol. 21(5), pages 1128-1141, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. B. Curtis Eaves & Arthur F. Veinott, 2014. "Maximum-Stopping-Value Policies in Finite Markov Population Decision Chains," Mathematics of Operations Research, INFORMS, vol. 39(3), pages 597-606, August.
    2. Lodewijk Kallenberg, 2013. "Derman’s book as inspiration: some results on LP for MDPs," Annals of Operations Research, Springer, vol. 208(1), pages 63-94, September.
    3. Dmitry Krass & O. J. Vrieze, 2002. "Achieving Target State-Action Frequencies in Multichain Average-Reward Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 27(3), pages 545-566, August.
    4. Guillot, Matthieu & Stauffer, Gautier, 2020. "The Stochastic Shortest Path Problem: A polyhedral combinatorics perspective," European Journal of Operational Research, Elsevier, vol. 285(1), pages 148-158.
    5. D. P. de Farias & B. Van Roy, 2003. "The Linear Programming Approach to Approximate Dynamic Programming," Operations Research, INFORMS, vol. 51(6), pages 850-865, December.
    6. K. Helmes & R. H. Stockbridge, 2000. "Numerical Comparison of Controls and Verification of Optimality for Stochastic Control Problems," Journal of Optimization Theory and Applications, Springer, vol. 106(1), pages 107-127, July.
    7. Alexander Zadorojniy & Guy Even & Adam Shwartz, 2009. "A Strongly Polynomial Algorithm for Controlled Queues," Mathematics of Operations Research, INFORMS, vol. 34(4), pages 992-1007, November.
    8. Prasenjit Mondal, 2020. "Computing semi-stationary optimal policies for multichain semi-Markov decision processes," Annals of Operations Research, Springer, vol. 287(2), pages 843-865, April.
    9. Guy Even & Alexander Zadorojniy, 2012. "Strong polynomiality of the Gass-Saaty shadow-vertex pivoting rule for controlled random walks," Annals of Operations Research, Springer, vol. 201(1), pages 159-167, December.
    10. Ian Post & Yinyu Ye, 2015. "The Simplex Method is Strongly Polynomial for Deterministic Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 40(4), pages 859-868, October.
    11. Yinyu Ye, 2011. "The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate," Mathematics of Operations Research, INFORMS, vol. 36(4), pages 593-603, November.
    12. José Niño-Mora, 2006. "Restless Bandit Marginal Productivity Indices, Diminishing Returns, and Optimal Control of Make-to-Order/Make-to-Stock M/G/1 Queues," Mathematics of Operations Research, INFORMS, vol. 31(1), pages 50-84, February.
    13. Oguzhan Alagoz & Lisa M. Maillart & Andrew J. Schaefer & Mark S. Roberts, 2007. "Determining the Acceptance of Cadaveric Livers Using an Implicit Model of the Waiting List," Operations Research, INFORMS, vol. 55(1), pages 24-36, February.
    14. Raymond K. Cheung & B. Muralidharan, 2000. "Dynamic Routing for Priority Shipments in LTL Service Networks," Transportation Science, INFORMS, vol. 34(1), pages 86-98, February.
    15. E. Nikolova & N. E. Stier-Moses, 2014. "A Mean-Risk Model for the Traffic Assignment Problem with Stochastic Travel Times," Operations Research, INFORMS, vol. 62(2), pages 366-382, April.
    16. Jérôme Renault & Xavier Venel, 2017. "Long-Term Values in Markov Decision Processes and Repeated Games, and a New Distance for Probability Spaces," Mathematics of Operations Research, INFORMS, vol. 42(2), pages 349-376, May.
    17. Enzo Busseti, 2019. "Derivative of a Conic Problem with a Unique Solution," Papers 1903.05753, arXiv.org, revised Mar 2019.
    18. Luo, Z-Q. & Sturm, J.F. & Zhang, S., 1998. "Conic convex programming and self-dual embedding," Econometric Institute Research Papers EI 9815, Erasmus University Rotterdam, Erasmus School of Economics (ESE), Econometric Institute.
    19. Höfferl, F. & Steinschorn, D., 2009. "A dynamic programming extension to the steady state refinery-LP," European Journal of Operational Research, Elsevier, vol. 197(2), pages 465-474, September.
    20. Eric A. Hansen, 2017. "Error bounds for stochastic shortest path problems," Mathematical Methods of Operations Research, Springer; Gesellschaft für Operations Research (GOR); Nederlands Genootschap voor Besliskunde (NGB), vol. 86(1), pages 1-27, August.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:ormoor:v:42:y:2017:i:3:p:577-598. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.