HSVI Can Solve Zero-Sum Partially Observable Stochastic Games

Author

Aurélien Delage (Université de Lyon)
Olivier Buffet (Université de Lorraine)
Jilles S. Dibangoye (University of Groningen)
Abdallah Saffidine (University of New South Wales)

Abstract

State-of-the-art methods for solving 2-player zero-sum imperfect information games rely on linear programming or regret minimization, though not on dynamic programming (DP) or heuristic search (HS), while the latter are often at the core of state-of-the-art solvers for other sequential decision-making problems. In partially observable or collaborative settings (e.g., POMDPs and Dec-POMDPs), DP and HS require introducing an appropriate statistic that induces a fully observable problem as well as bounding (convex) approximators of the optimal value function. This approach has succeeded in some subclasses of 2-player zero-sum partially observable stochastic games (zs-POSGs) as well, but how to apply it in the general case remains an open question. We answer it by (i) rigorously defining an equivalent game to work with, (ii) proving mathematical properties of the optimal value function that allow deriving bounds that come with solution strategies, (iii) proposing for the first time an HSVI-like solver that provably converges to an $\epsilon$-optimal solution in finite time, and (iv) empirically analyzing it. This opens the door to a novel family of promising approaches complementing those relying on linear programming or iterative methods.
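
To make the bound-based stopping rule concrete, the following minimal Python sketch shows the generic idea behind HSVI-style search: keep a lower and an upper bound on the optimal value, tighten both along heuristic trials, and stop once their gap at the initial state is at most $\epsilon$. It is not the authors' zs-POSG solver. As a simplifying assumption it works on a tiny, fully observable, single-agent discounted MDP with tabular (not convex) bounds; the function name `bounded_search`, the dictionary layout, and `toy_mdp` are hypothetical and chosen for illustration only.

```python
def bounded_search(mdp, s0, gamma=0.9, eps=1e-3):
    """Trial-based search with lower/upper value bounds (illustrative only).

    Assumed layout: mdp[s][a] = (reward, {next_state: probability > 0}).
    Returns (lower, upper) bounds on the optimal value of s0, with a gap
    of at most eps.
    """
    rewards = [r for actions in mdp.values() for r, _ in actions.values()]
    # Trivial initial bounds from the best and worst possible reward streams.
    lower = {s: min(rewards) / (1 - gamma) for s in mdp}
    upper = {s: max(rewards) / (1 - gamma) for s in mdp}

    def q(bound, s, a):
        reward, successors = mdp[s][a]
        return reward + gamma * sum(p * bound[t] for t, p in successors.items())

    def backup(s):
        # One Bellman backup applied to each bound at state s.
        lower[s] = max(q(lower, s, a) for a in mdp[s])
        upper[s] = max(q(upper, s, a) for a in mdp[s])

    def excess(s, depth):
        # Bound gap beyond what is tolerated at this depth of the trial.
        return upper[s] - lower[s] - eps * gamma ** (-depth)

    while excess(s0, 0) > 0:
        s, depth, path = s0, 0, []
        while excess(s, depth) > 0:
            backup(s)
            path.append(s)
            # Act optimistically: pick the action that looks best under the upper bound.
            a = max(mdp[s], key=lambda act: q(upper, s, act))
            successors = mdp[s][a][1]
            # Move to the successor contributing the most remaining uncertainty.
            s = max(successors, key=lambda t: successors[t] * excess(t, depth + 1))
            depth += 1
        # Propagate the tightened bounds back toward the initial state.
        for visited in reversed(path):
            backup(visited)
    return lower[s0], upper[s0]


# Hypothetical three-state model used only to exercise the sketch.
toy_mdp = {
    "a": {"stay": (0.0, {"a": 1.0}), "go": (1.0, {"b": 0.5, "c": 0.5})},
    "b": {"stay": (2.0, {"b": 1.0})},
    "c": {"back": (0.0, {"a": 1.0})},
}

if __name__ == "__main__":
    lo, hi = bounded_search(toy_mdp, "a", gamma=0.9, eps=1e-3)
    print(f"optimal value of 'a' lies in [{lo:.4f}, {hi:.4f}]")
```

In this toy model the optimal value of state "a" is 10/0.595 (about 16.81), and the returned pair brackets it with a gap of at most `eps`. The zs-POSG setting of the article additionally requires the equivalent game and the value-function properties mentioned in items (i) and (ii), which this sketch does not attempt to reproduce.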

Suggested Citation

Aurélien Delage & Olivier Buffet & Jilles S. Dibangoye & Abdallah Saffidine, 2024. "HSVI Can Solve Zero-Sum Partially Observable Stochastic Games," Dynamic Games and Applications, Springer, vol. 14(4), pages 751-805, September.

Handle: RePEc:spr:dyngam:v:14:y:2024:i:4:d:10.1007_s13235-023-00519-6
DOI: 10.1007/s13235-023-00519-6

Keywords

Game theory; zs-POSGs; Multi-agent systems; Heuristic search; Dynamic programming