Partially Observable Markov Decision Processes: A Geometric Technique and Analysis

Author

Hao Zhang
(Marshall School of Business, University of Southern California, Los Angeles, California 90089)

Abstract

This paper presents a novel framework for studying partially observable Markov decision processes (POMDPs) with finite state, action, and observation sets and discounted rewards. The new framework is based solely on future-reward vectors associated with future policies, which makes it more parsimonious than the traditional framework based on belief vectors. It reveals the connection between the POMDP problem and two computational geometry problems, namely finding the vertices of a convex hull and finding the Minkowski sum of convex polytopes, which can help solve the POMDP problem more efficiently. The new framework can clarify some existing algorithms over both finite and infinite horizons and shed new light on them. As a useful structural result, it also facilitates the comparison of POMDPs with respect to their degree of observability.
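
The two geometric operations named in the abstract can be illustrated with a small sketch. This is not the paper's algorithm. It assumes the common representation in which each candidate policy contributes one reward vector (one entry per state), the value at a belief is the best inner product over those vectors, and vector sets for different observations are combined by a Minkowski (cross) sum. The function names `value`, `cross_sum`, and `prune_pointwise`, and the two-state numbers, are hypothetical and chosen only for illustration.

```python
from itertools import product


def value(belief, vectors):
    """Value at a belief: the best expected reward over candidate reward vectors."""
    return max(sum(b * v for b, v in zip(belief, vec)) for vec in vectors)


def cross_sum(set_a, set_b):
    """Minkowski (cross) sum of two finite vector sets: every pairwise sum."""
    return [tuple(x + y for x, y in zip(a, b)) for a, b in product(set_a, set_b)]


def prune_pointwise(vectors):
    """Drop vectors that are pointwise dominated by another vector.

    This is only a cheap filter. Keeping exactly the vectors that are optimal
    at some belief generally requires linear programs or a convex hull routine.
    """
    unique = list(dict.fromkeys(vectors))  # remove duplicates, keep order
    kept = []
    for v in unique:
        dominated = any(
            w != v and all(wi >= vi for wi, vi in zip(w, v)) for w in unique
        )
        if not dominated:
            kept.append(v)
    return kept


# Hypothetical two-state example: one vector set per observation.
vectors_obs1 = [(1.0, 0.0), (0.0, 1.0)]
vectors_obs2 = [(0.5, 0.5), (0.25, 0.125)]

all_sums = cross_sum(vectors_obs1, vectors_obs2)
combined = prune_pointwise(all_sums)

print(combined)                      # surviving reward vectors
print(value((1.0, 0.0), combined))   # value when state 0 is certain
```

The cross sum grows multiplicatively with the number of sets being combined, which is why discarding vectors that can never be optimal matters in practice. The pointwise filter here is the simplest such step, not a complete one.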

Suggested Citation

Hao Zhang, 2010. "Partially Observable Markov Decision Processes: A Geometric Technique and Analysis," Operations Research, INFORMS, vol. 58(1), pages 214-228, February.

Handle: RePEc:inm:oropre:v:58:y:2010:i:1:p:214-228
DOI: 10.1287/opre.1090.0697

References listed on IDEAS

William S. Lovejoy, 1991. "Computationally Feasible Bounds for Partially Observed Markov Decision Processes," Operations Research, INFORMS, vol. 39(1), pages 162-175, February.
Christos H. Papadimitriou & John N. Tsitsiklis, 1987. "The Complexity of Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 12(3), pages 441-450, August.
Richard D. Smallwood & Edward J. Sondik, 1973. "The Optimal Control of Partially Observable Markov Processes over a Finite Horizon," Operations Research, INFORMS, vol. 21(5), pages 1071-1088, October.
James T. Treharne & Charles R. Sox, 2002. "Adaptive Inventory Control for Nonstationary Demand and Partial Information," Management Science, INFORMS, vol. 48(5), pages 607-624, May.
George E. Monahan, 1982. "State of the Art---A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms," Management Science, INFORMS, vol. 28(1), pages 1-16, January.
William S. Lovejoy, 1987. "Some Monotonicity Results for Partially Observed Markov Decision Processes," Operations Research, INFORMS, vol. 35(5), pages 736-743, October.
Edward J. Sondik, 1978. "The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs," Operations Research, INFORMS, vol. 26(2), pages 282-304, April.
Daniel E. Lane, 1989. "A Partially Observable Model of Decision Making by Fishermen," Operations Research, INFORMS, vol. 37(2), pages 240-254, April.
White, Chelsea C., 1980. "Monotone control laws for noisy, countable-state Markov chains," European Journal of Operational Research, Elsevier, vol. 5(2), pages 124-132, August.
J. K. Satia & R. E. Lave, 1973. "Markovian Decision Processes with Probabilistic Observation of States," Management Science, INFORMS, vol. 20(1), pages 1-13, September.
Chelsea C. White & William T. Scherer, 1994. "Finite-Memory Suboptimal Design for Partially Observed Markov Decision Processes," Operations Research, INFORMS, vol. 42(3), pages 439-455, June.
Sulganik, Eyal, 1995. "On the structure of Blackwell's equivalence classes of information systems," Mathematical Social Sciences, Elsevier, vol. 29(3), pages 213-223, June.
White, Chelsea C. & White, Douglas J., 1989. "Markov decision processes," European Journal of Operational Research, Elsevier, vol. 39(1), pages 1-16, March.
Shoshana Anily & Abraham Grosfeld-Nir, 2006. "An Optimal Lot-Sizing and Offline Inspection Policy in the Case of Nonrigid Demand," Operations Research, INFORMS, vol. 54(2), pages 311-323, April.
Chelsea C. White & William T. Scherer, 1989. "Solution Procedures for Partially Observed Markov Decision Processes," Operations Research, INFORMS, vol. 37(5), pages 791-797, October.
James E. Eckles, 1968. "Optimum Maintenance with Incomplete Information," Operations Research, INFORMS, vol. 16(5), pages 1058-1067, October.
Sheldon M. Ross, 1971. "Quality Control under Markovian Deterioration," Management Science, INFORMS, vol. 17(9), pages 587-596, May.
Chelsea C. White, 1977. "A Markov Quality Control Process Subject to Partial Observation," Management Science, INFORMS, vol. 23(8), pages 843-852, April.
Abraham Grosfeld-Nir, 1996. "A Two-State Partially Observable Markov Decision Process with Uniformly Distributed Observations," Operations Research, INFORMS, vol. 44(3), pages 458-463, June.
Grosfeld-Nir, Abraham, 2007. "Control limits for two-state partially observable Markov decision processes," European Journal of Operational Research, Elsevier, vol. 182(1), pages 300-304, October.
Huizhen Yu & Dimitri P. Bertsekas, 2008. "On Near Optimality of the Set of Finite-State Controllers for Average Cost POMDP," Mathematics of Operations Research, INFORMS, vol. 33(1), pages 1-11, February.

Citations

Cited by:

Tianhu Deng & Zuo-Jun Max Shen & J. George Shanthikumar, 2014. "Statistical Learning of Service-Dependent Demand in a Multiperiod Newsvendor Setting," Operations Research, INFORMS, vol. 62(5), pages 1064-1076, October.
Hongmin Li & Hao Zhang & Charles H. Fine, 2013. "Dynamic Business Share Allocation in a Supply Chain with Competing Suppliers," Operations Research, INFORMS, vol. 61(2), pages 280-297, April.
Hao Zhang, 2012. "Solving an Infinite Horizon Adverse Selection Model Through Finite Policy Graphs," Operations Research, INFORMS, vol. 60(4), pages 850-864, August.
Hao Zhang, 2022. "Analytical Solution to a Discrete-Time Model for Dynamic Learning and Decision Making," Management Science, INFORMS, vol. 68(8), pages 5924-5957, August.
Yanling Chang & Alan Erera & Chelsea White, 2015. "A leader–follower partially observed, multiobjective Markov game," Annals of Operations Research, Springer, vol. 235(1), pages 103-128, December.
Yanling Chang & Alan Erera & Chelsea White, 2015. "Value of information for a leader–follower partially observed Markov game," Annals of Operations Research, Springer, vol. 235(1), pages 129-153, December.
Bren, Austin & Saghafian, Soroush, 2018. "Data-Driven Percentile Optimization for Multi-Class Queueing Systems with Model Ambiguity: Theory and Application," Working Paper Series rwp18-008, Harvard University, John F. Kennedy School of Government.
Chiel van Oosterom & Lisa M. Maillart & Jeffrey P. Kharoufeh, 2017. "Optimal maintenance policies for a safety‐critical system and its deteriorating sensor," Naval Research Logistics (NRL), John Wiley & Sons, vol. 64(5), pages 399-417, August.
Hao Zhang & Weihua Zhang, 2023. "Analytical Solution to a Partially Observable Machine Maintenance Problem with Obvious Failures," Management Science, INFORMS, vol. 69(7), pages 3993-4015, July.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Ben-Zvi, Tal & Chernonog, Tatyana & Avinadav, Tal, 2016. "A two-state partially observable Markov decision process with three actions," European Journal of Operational Research, Elsevier, vol. 254(3), pages 957-967.
Hao Zhang & Weihua Zhang, 2023. "Analytical Solution to a Partially Observable Machine Maintenance Problem with Obvious Failures," Management Science, INFORMS, vol. 69(7), pages 3993-4015, July.
Abraham Grosfeld‐Nir & Eyal Cohen & Yigal Gerchak, 2007. "Production to order and off‐line inspection when the production process is partially observable," Naval Research Logistics (NRL), John Wiley & Sons, vol. 54(8), pages 845-858, December.
Yanling Chang & Alan Erera & Chelsea White, 2015. "Value of information for a leader–follower partially observed Markov game," Annals of Operations Research, Springer, vol. 235(1), pages 129-153, December.
Yanling Chang & Alan Erera & Chelsea White, 2015. "A leader–follower partially observed, multiobjective Markov game," Annals of Operations Research, Springer, vol. 235(1), pages 103-128, December.
Gong, Linguo & Tang, Kwei, 1997. "Monitoring machine operations using on-line sensors," European Journal of Operational Research, Elsevier, vol. 96(3), pages 479-492, February.
Yossi Aviv & Amit Pazgal, 2005. "A Partially Observed Markov Decision Process for Dynamic Pricing," Management Science, INFORMS, vol. 51(9), pages 1400-1416, September.
Shoshana Anily & Abraham Grosfeld-Nir, 2006. "An Optimal Lot-Sizing and Offline Inspection Policy in the Case of Nonrigid Demand," Operations Research, INFORMS, vol. 54(2), pages 311-323, April.
James T. Treharne & Charles R. Sox, 2002. "Adaptive Inventory Control for Nonstationary Demand and Partial Information," Management Science, INFORMS, vol. 48(5), pages 607-624, May.
Chiel van Oosterom & Lisa M. Maillart & Jeffrey P. Kharoufeh, 2017. "Optimal maintenance policies for a safety‐critical system and its deteriorating sensor," Naval Research Logistics (NRL), John Wiley & Sons, vol. 64(5), pages 399-417, August.
Saghafian, Soroush, 2018. "Ambiguous partially observable Markov decision processes: Structural results and applications," Journal of Economic Theory, Elsevier, vol. 178(C), pages 1-35.
Williams, Byron K., 2011. "Resolving structural uncertainty in natural resources management using POMDP approaches," Ecological Modelling, Elsevier, vol. 222(5), pages 1092-1102.
Givon, Moshe & Grosfeld-Nir, Abraham, 2008. "Using partially observed Markov processes to select optimal termination time of TV shows," Omega, Elsevier, vol. 36(3), pages 477-485, June.
V. Makis & X. Jiang, 2003. "Optimal Replacement Under Partial Observations," Mathematics of Operations Research, INFORMS, vol. 28(2), pages 382-394, May.
Jue Wang & Chi-Guhn Lee, 2015. "Multistate Bayesian Control Chart Over a Finite Horizon," Operations Research, INFORMS, vol. 63(4), pages 949-964, August.
Abhijit Gosavi, 2009. "Reinforcement Learning: A Tutorial Survey and Recent Advances," INFORMS Journal on Computing, INFORMS, vol. 21(2), pages 178-192, May.
Williams, Byron K., 2009. "Markov decision processes in natural resources management: Observability and uncertainty," Ecological Modelling, Elsevier, vol. 220(6), pages 830-840.
Grosfeld-Nir, Abraham, 2007. "Control limits for two-state partially observable Markov decision processes," European Journal of Operational Research, Elsevier, vol. 182(1), pages 300-304, October.
Jue Wang, 2016. "Minimizing the false alarm rate in systems with transient abnormality," Naval Research Logistics (NRL), John Wiley & Sons, vol. 63(4), pages 320-334, June.
Serin, Yasemin, 1995. "A nonlinear programming model for partially observable Markov decision processes: Finite horizon case," European Journal of Operational Research, Elsevier, vol. 86(3), pages 549-564, November.

More about this item

Keywords

dynamic programming; Markov; analysis of algorithms; computational complexity; mathematics; combinatorics; computers/computer science; artificial intelligence