Printed from https://ideas.repec.org/a/inm/oropre/v37y1989i5p780-790.html

Markov Decision Processes with Sample Path Constraints: The Communicating Case

Author

Listed:
  • Keith W. Ross

    (University of Pennsylvania, Philadelphia, Pennsylvania)

  • Ravi Varadarajan

    (University of Florida, Gainesville, Florida)

Abstract

We consider time-average Markov Decision Processes (MDPs), which accumulate a reward and cost at each decision epoch. A policy meets the sample-path constraint if the time-average cost is below a specified value with probability one. The optimization problem is to maximize the expected average reward over all policies that meet the sample-path constraint. The sample-path constraint is compared with the more commonly studied constraint of requiring the average expected cost to be less than a specified value. Although the two criteria are equivalent for certain classes of MDPs, their feasible and optimal policies differ for many nontrivial problems. In general, there do not exist optimal or nearly optimal stationary policies when the expected average-cost constraint is employed. Assuming that a policy exists that meets the sample-path constraint, we establish that there exist nearly optimal stationary policies for communicating MDPs. A parametric linear programming algorithm is given to construct nearly optimal stationary policies. The discussion relies on well-known results from the theory of stochastic processes and linear programming. The techniques lead to simple proofs of the existence of optimal and nearly optimal stationary policies for unichain and deterministic MDPs, respectively.
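The paper's algorithm is a parametric linear program for the sample-path constraint on communicating MDPs. As a rough illustration of the objects involved, the following is a minimal sketch of the classical state-action-frequency LP for the simpler expected-average-cost constraint mentioned in the abstract; the transition probabilities, rewards, costs, and cost bound are made-up toy data, not from the paper.

```python
# Hedged sketch: the classical LP over long-run state-action frequencies
# x(s, a) for a constrained average-reward MDP. This is NOT the paper's
# parametric LP for the sample-path constraint; all numbers are toy data.
import numpy as np
from scipy.optimize import linprog

S, A = 2, 2                      # number of states and actions (assumed)
P = np.zeros((S, A, S))          # P[s, a, s'] = transition probability
P[0, 0] = [0.9, 0.1]; P[0, 1] = [0.2, 0.8]
P[1, 0] = [0.5, 0.5]; P[1, 1] = [0.1, 0.9]
r = np.array([[1.0, 4.0], [0.0, 3.0]])   # rewards r(s, a)
c = np.array([[0.0, 2.0], [0.0, 2.0]])   # costs   c(s, a)
alpha = 1.0                              # expected average-cost bound

n = S * A  # decision variables: x(s, a), flattened row-major

# Balance constraints: for each state s', outflow equals inflow,
#   sum_a x(s', a) - sum_{s, a} P(s'|s, a) x(s, a) = 0,
# plus the normalization sum_{s, a} x(s, a) = 1.
A_eq = np.zeros((S + 1, n))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[sp, s * A + a] = (1.0 if s == sp else 0.0) - P[s, a, sp]
A_eq[S, :] = 1.0
b_eq = np.zeros(S + 1); b_eq[S] = 1.0

# Cost constraint: expected average cost at most alpha.
A_ub = c.reshape(1, n); b_ub = [alpha]

# linprog minimizes, so negate the rewards to maximize average reward.
res = linprog(-r.reshape(n), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
x = res.x.reshape(S, A)
# Recover a randomized stationary policy from the frequencies.
policy = x / np.maximum(x.sum(axis=1, keepdims=True), 1e-12)
print("average reward:", -res.fun)
print("average cost:", float((x * c).sum()))
```

The recovered policy is stationary but possibly randomized; the abstract's point is that for sample-path constraints such stationary policies are only *nearly* optimal in general, which is where the parametric LP comes in.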

Suggested Citation

  • Keith W. Ross & Ravi Varadarajan, 1989. "Markov Decision Processes with Sample Path Constraints: The Communicating Case," Operations Research, INFORMS, vol. 37(5), pages 780-790, October.
  • Handle: RePEc:inm:oropre:v:37:y:1989:i:5:p:780-790
    DOI: 10.1287/opre.37.5.780

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/opre.37.5.780
    Download Restriction: no

    File URL: https://libkey.io/10.1287/opre.37.5.780?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Q. Zhu, 2007. "Sample-path optimality and variance-maximization for Markov decision processes," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 65(3), pages 519-538, June.
    2. Yasemin Serin & Zeynep Muge Avsar, 1997. "Markov decision processes with restricted observations: Finite horizon case," Naval Research Logistics (NRL), John Wiley & Sons, vol. 44(5), pages 439-456, August.
    3. Golan, Michal & Shimkin, Nahum, 2024. "Markov decision processes with burstiness constraints," European Journal of Operational Research, Elsevier, vol. 312(3), pages 877-889.
    4. Ohlmann, Jeffrey W. & Bean, James C., 2009. "Resource-constrained management of heterogeneous assets with stochastic deterioration," European Journal of Operational Research, Elsevier, vol. 199(1), pages 198-208, November.
    5. Dmitry Krass & O. J. Vrieze, 2002. "Achieving Target State-Action Frequencies in Multichain Average-Reward Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 27(3), pages 545-566, August.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:oropre:v:37:y:1989:i:5:p:780-790. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

We have no bibliographic references for this item. You can help add them by using this form.

If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.