
HSVI Can Solve Zero-Sum Partially Observable Stochastic Games

Authors

  • Aurélien Delage (Universite de Lyon)
  • Olivier Buffet (Universite de Lorraine)
  • Jilles S. Dibangoye (University of Groningen)
  • Abdallah Saffidine (University of New South Wales)

Abstract

State-of-the-art methods for solving 2-player zero-sum imperfect information games rely on linear programming or regret minimization, though not on dynamic programming (DP) or heuristic search (HS), while the latter are often at the core of state-of-the-art solvers for other sequential decision-making problems. In partially observable or collaborative settings (e.g., POMDPs and Dec-POMDPs), DP and HS require introducing an appropriate statistic that induces a fully observable problem as well as bounding (convex) approximators of the optimal value function. This approach has succeeded in some subclasses of 2-player zero-sum partially observable stochastic games (zs-POSGs) as well, but how to apply it in the general case still remains an open question. We answer it by (i) rigorously defining an equivalent game to work with, (ii) proving mathematical properties of the optimal value function that allow deriving bounds that come with solution strategies, (iii) proposing for the first time an HSVI-like solver that provably converges to an $\epsilon$-optimal solution in finite time, and (iv) empirically analyzing it. This opens the door to a novel family of promising approaches complementing those relying on linear programming or iterative methods.

Suggested Citation

  • Aurélien Delage & Olivier Buffet & Jilles S. Dibangoye & Abdallah Saffidine, 2024. "HSVI Can Solve Zero-Sum Partially Observable Stochastic Games," Dynamic Games and Applications, Springer, vol. 14(4), pages 751-805, September.
  • Handle: RePEc:spr:dyngam:v:14:y:2024:i:4:d:10.1007_s13235-023-00519-6
    DOI: 10.1007/s13235-023-00519-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s13235-023-00519-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Cole, Harold L. & Kocherlakota, Narayana, 2001. "Dynamic Games with Hidden Actions and Hidden States," Journal of Economic Theory, Elsevier, vol. 98(1), pages 114-126, May.
    2. Samid Hoda & Andrew Gilpin & Javier Peña & Tuomas Sandholm, 2010. "Smoothing Techniques for Computing Nash Equilibria of Sequential Games," Mathematics of Operations Research, INFORMS, vol. 35(2), pages 494-512, May.
    3. von Stengel, Bernhard, 1996. "Efficient Computation of Behavior Strategies," Games and Economic Behavior, Elsevier, vol. 14(2), pages 220-246, June.
    4. Daniel S. Bernstein & Robert Givan & Neil Immerman & Shlomo Zilberstein, 2002. "The Complexity of Decentralized Control of Markov Decision Processes," Mathematics of Operations Research, INFORMS, vol. 27(4), pages 819-840, November.
    5. Koller, Daphne & Megiddo, Nimrod & von Stengel, Bernhard, 1996. "Efficient Computation of Equilibria for Extensive Two-Person Games," Games and Economic Behavior, Elsevier, vol. 14(2), pages 247-259, June.
    6. M. K. Ghosh & D. McDonald & S. Sinha, 2004. "Zero-Sum Stochastic Games with Partial Information," Journal of Optimization Theory and Applications, Springer, vol. 121(1), pages 99-118, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Corine M. Laan & Ana Isabel Barros & Richard J. Boucherie & Herman Monsuur & Judith Timmer, 2019. "Solving partially observable agent‐intruder games with an application to border security problems," Naval Research Logistics (NRL), John Wiley & Sons, vol. 66(2), pages 174-190, March.
    2. Bernhard von Stengel & Antoon van den Elzen & Dolf Talman, 2002. "Computing Normal Form Perfect Equilibria for Extensive Two-Person Games," Econometrica, Econometric Society, vol. 70(2), pages 693-715, March.
    3. Yanling Chang & Alan Erera & Chelsea White, 2015. "A leader–follower partially observed, multiobjective Markov game," Annals of Operations Research, Springer, vol. 235(1), pages 103-128, December.
    4. Pahl, Lucas, 2023. "Polytope-form games and index/degree theories for extensive-form games," Games and Economic Behavior, Elsevier, vol. 141(C), pages 444-471.
    5. Srihari Govindan & Robert Wilson, 2008. "Metastable Equilibria," Mathematics of Operations Research, INFORMS, vol. 33(4), pages 787-820, November.
    6. Samid Hoda & Andrew Gilpin & Javier Peña & Tuomas Sandholm, 2010. "Smoothing Techniques for Computing Nash Equilibria of Sequential Games," Mathematics of Operations Research, INFORMS, vol. 35(2), pages 494-512, May.
    7. Govindan, Srihari & Wilson, Robert B., 2007. "Stable Outcomes of Generic Games in Extensive Form," Research Papers 1933r, Stanford University, Graduate School of Business.
    8. Etessami, Kousha, 2021. "The complexity of computing a (quasi-)perfect equilibrium for an n-player extensive form game," Games and Economic Behavior, Elsevier, vol. 125(C), pages 107-140.
    9. Bernhard von Stengel & Françoise Forges, 2008. "Extensive-Form Correlated Equilibrium: Definition and Computational Complexity," Mathematics of Operations Research, INFORMS, vol. 33(4), pages 1002-1022, November.
    10. Shimoji, Makoto & Watson, Joel, 1998. "Conditional Dominance, Rationalizability, and Game Forms," Journal of Economic Theory, Elsevier, vol. 83(2), pages 161-195, December.
    11. Conitzer, Vincent & Sandholm, Tuomas, 2008. "New complexity results about Nash equilibria," Games and Economic Behavior, Elsevier, vol. 63(2), pages 621-641, July.
    12. F. Forges & B. von Stengel, 2002. "Computationally Efficient Coordination in Game Trees," THEMA Working Papers 2002-05, THEMA (THéorie Economique, Modélisation et Applications), Université de Cergy-Pontoise.
    13. Sung, Shao-Chin & Dimitrov, Dinko, 2010. "Computational complexity in additive hedonic games," European Journal of Operational Research, Elsevier, vol. 203(3), pages 635-639, June.
    14. Mitri Kitti, 2013. "Conditional Markov equilibria in discounted dynamic games," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 78(1), pages 77-100, August.
    15. Arpad Abraham & Nicola Pavoni, 2008. "Efficient Allocations with Moral Hazard and Hidden Borrowing and Lending: A Recursive Formulation," Review of Economic Dynamics, Elsevier for the Society for Economic Dynamics, vol. 11(4), pages 781-803, October.
    16. Subhamay Saha, 2014. "Zero-Sum Stochastic Games with Partial Information and Average Payoff," Journal of Optimization Theory and Applications, Springer, vol. 160(1), pages 344-354, January.
    17. Yanling Chang & Alan Erera & Chelsea White, 2015. "Value of information for a leader–follower partially observed Markov game," Annals of Operations Research, Springer, vol. 235(1), pages 129-153, December.
    18. Fershtman, Chaim & Pakes, Ariel, 2005. "Finite State Dynamic Games with Asymmetric Information: A Framework for Applied Work," CEPR Discussion Papers 5024, C.E.P.R. Discussion Papers.
    19. Gatti, Nicola & Gilli, Mario & Marchesi, Alberto, 2020. "A characterization of quasi-perfect equilibria," Games and Economic Behavior, Elsevier, vol. 122(C), pages 240-255.
    20. Peter Godfrey-Smith & Manolo Martínez, 2013. "Communication and Common Interest," PLOS Computational Biology, Public Library of Science, vol. 9(11), pages 1-6, November.
