Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games

Author

Listed:
  • Yuling Yan

    (Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142)

  • Gen Li

    (Department of Statistics, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China)

  • Yuxin Chen

    (Department of Statistics and Data Science, Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104)

  • Jianqing Fan

    (Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544)

Abstract

This paper makes progress toward learning Nash equilibria in two-player, zero-sum Markov games from offline data. Specifically, consider a γ-discounted, infinite-horizon Markov game with S states, in which the max-player has A actions and the min-player has B actions. We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds (termed value iteration with lower confidence bounds for zero-sum Markov games) that provably finds an ε-approximate Nash equilibrium with a sample complexity no larger than C⋆_clipped S(A + B) / ((1 − γ)³ ε²) (up to some log factor). Here, C⋆_clipped is a unilateral clipped concentrability coefficient that reflects the coverage and distribution shift of the available data (vis-à-vis the target data), and the target accuracy ε can be any value within (0, 1/(1 − γ)]. Our sample complexity bound strengthens prior art by a factor of min{A, B}, achieving minimax optimality for a broad regime of interest. An appealing feature of our result lies in its algorithmic simplicity, which shows that neither variance reduction nor sample splitting is needed to achieve sample optimality.
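
The abstract only sketches the algorithm at a high level. The following minimal Python sketch illustrates the general shape of pessimistic model-based value iteration with Bernstein-style lower confidence bounds for the max-player in a zero-sum Markov game; it is not the paper's exact pseudocode. The function names, the penalty constant c_b, the assumption that rewards lie in [0, 1], and the handling of unvisited state-action pairs are illustrative assumptions, and the symmetric upper-confidence update for the min-player is omitted.

```python
# Illustrative sketch (not the paper's exact algorithm): pessimistic model-based
# value iteration with a Bernstein-style lower confidence bound for the max-player
# in a two-player, zero-sum Markov game learned from offline data.
import numpy as np
from scipy.optimize import linprog


def solve_matrix_game(M):
    """Value of the zero-sum matrix game max_x min_y x^T M y (row player maximizes)."""
    A, B = M.shape
    # Variables z = (v, x_1, ..., x_A): maximize v s.t. (M^T x)_b >= v, sum(x) = 1, x >= 0.
    c = np.zeros(A + 1)
    c[0] = -1.0                                             # linprog minimizes, so minimize -v
    A_ub = np.hstack([np.ones((B, 1)), -M.T])               # v - (M^T x)_b <= 0 for every column b
    b_ub = np.zeros(B)
    A_eq = np.hstack([[[0.0]], np.ones((1, A))])            # probabilities of the row player sum to 1
    b_eq = np.array([1.0])
    bounds = [(None, None)] + [(0.0, None)] * A
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[0]


def pessimistic_value_iteration(N, r, gamma, n_iter=200, c_b=1.0):
    """
    N[s, a, b, s'] : empirical transition counts from the offline dataset.
    r[s, a, b]     : rewards, assumed here to lie in [0, 1].
    Returns a pessimistic (lower-bound) value estimate for the max-player.
    """
    S, A, B, _ = N.shape
    counts = N.sum(axis=-1)                                 # visits of each (s, a, b) in the data
    P_hat = N / np.maximum(counts[..., None], 1)            # empirical (plug-in) transition model
    V = np.zeros(S)
    for _ in range(n_iter):
        EV = P_hat @ V                                      # E_{s' ~ P_hat}[V(s')], shape (S, A, B)
        VarV = P_hat @ (V ** 2) - EV ** 2                   # empirical next-state variance of V
        # Bernstein-style penalty: larger when the variance is large or (s, a, b) is rarely visited.
        bonus = c_b * (np.sqrt(VarV / np.maximum(counts, 1))
                       + 1.0 / ((1 - gamma) * np.maximum(counts, 1)))
        Q_lower = np.clip(r + gamma * EV - bonus, 0.0, 1.0 / (1 - gamma))
        # Back up each state by solving the induced A-by-B zero-sum matrix game.
        V = np.array([solve_matrix_game(Q_lower[s]) for s in range(S)])
    return V
```

The key design point reflected in the sketch is that pessimism is injected only through the data-driven penalty subtracted from the plug-in Q-estimates; no variance-reduction machinery or sample splitting appears, in line with the abstract's claim about algorithmic simplicity.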

Suggested Citation

  • Yuling Yan & Gen Li & Yuxin Chen & Jianqing Fan, 2024. "Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games," Operations Research, INFORMS, vol. 72(6), pages 2430-2445, November.
  • Handle: RePEc:inm:oropre:v:72:y:2024:i:6:p:2430-2445
    DOI: 10.1287/opre.2022.0342

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/opre.2022.0342
    Download Restriction: no

    File URL: https://libkey.io/10.1287/opre.2022.0342?utm_source=ideas