IDEAS home Printed from https://ideas.repec.org/a/inm/oropre/v19y1971i1p244-248.html
   My bibliography  Save this article

Technical Note—On the Asymptotic Convergence Rate of Cost Differences for Markovian Decision Processes

Author

Listed:
  • Thomas E. Morton

    (Carnegie-Mellon University, Pittsburgh, Pennsylvania)

Abstract

The modified method of successive approximations for solving Markovian decision problems as formulated by White, Schweitzer, MacQueen, and Odoni, concentrates attention on cost differences either between successive stages in the same state, or relative to a base state in the same stage, rather than on the total cost function. The former bound the (discounted) gain of the optimal policy, while the latter relative-cost function determines the policy to be chosen at each stage. While these authors have demonstrated that these modified constructs converge to the gain and the optimal relative-cost function under rather general circumstances (undiscounted, single-chain, aperiodic processes), little is known about the rates of convergence. [Note that convergence of the relative-cost function guarantees optimality of a currently repeating policy, as noted by Howard.) A great deal of insight into this mathematically difficult question may be gained by working out the actual asymptotic convergence rates of these constructs for the special case of a single fixed policy. This is an easy exercise via Howward's methods, but very suggestive, since the policy will be asymptotically constant for a well-behaved problem. (In particular, if there is a unique optimal policy it will eventually repeat.) Convergence for both constructs for the fixed-policy case is very powerful even for discount rates greater than 1.0, depending principally on the dominant eigenvalue of the transition matrix. This note discusses the intuitive implications of this fact for the relative efficiencies of modified value iteration, policy iteration, policy iteration via successive approximations, or possible hybrids.

Suggested Citation

  • Thomas E. Morton, 1971. "Technical Note—On the Asymptotic Convergence Rate of Cost Differences for Markovian Decision Processes," Operations Research, INFORMS, vol. 19(1), pages 244-248, February.
  • Handle: RePEc:inm:oropre:v:19:y:1971:i:1:p:244-248
    DOI: 10.1287/opre.19.1.244
    as

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/opre.19.1.244
    Download Restriction: no

    File URL: https://libkey.io/10.1287/opre.19.1.244?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Tianji Yang & Chao Fu & Xinbao Liu & Jun Pei & Lin Liu & Panos M. Pardalos, 2018. "Closed-loop supply chain inventory management with recovery information of reusable containers," Journal of Combinatorial Optimization, Springer, vol. 35(1), pages 266-292, January.
    2. Ahiska, S. Sebnem & Appaji, Samyuktha R. & King, Russell E. & Warsing, Donald P., 2013. "A Markov decision process-based policy characterization approach for a stochastic inventory control problem with unreliable sourcing," International Journal of Production Economics, Elsevier, vol. 144(2), pages 485-496.
    3. Robert L. Bray, 2019. "Markov Decision Processes with Exogenous Variables," Management Science, INFORMS, vol. 65(10), pages 4598-4606, October.
    4. Sebnem Ahiska, S. & King, Russell E., 2010. "Inventory optimization in a one product recoverable manufacturing system," International Journal of Production Economics, Elsevier, vol. 124(1), pages 11-19, March.
    5. Ahiska, S. Sebnem & King, Russell E., 2010. "Life cycle inventory policy characterizations for a single-product recoverable system," International Journal of Production Economics, Elsevier, vol. 124(1), pages 51-61, March.
    6. Srinivas Bollapragada & Thomas E. Morton, 1999. "Myopic Heuristics for the Random Yield Problem," Operations Research, INFORMS, vol. 47(5), pages 713-722, October.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:oropre:v:19:y:1971:i:1:p:244-248. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.