Printed from https://ideas.repec.org/a/inm/ormoor/v50y2025i1p633-655.html

Fast Rates for the Regret of Offline Reinforcement Learning

Authors

Listed:
  • Yichun Hu

    (Cornell University, New York, New York 10044)

  • Nathan Kallus

    (Cornell University, New York, New York 10044)

  • Masatoshi Uehara

    (Cornell University, New York, New York 10044)

Abstract

We study the regret of offline reinforcement learning in an infinite-horizon discounted Markov decision process (MDP). While existing analyses of common approaches, such as fitted Q-iteration (FQI), suggest root-n convergence for regret, empirical behavior exhibits much faster convergence. In this paper, we present a finer regret analysis that exactly characterizes this phenomenon by providing fast rates for regret convergence. First, we show that given any estimate for the optimal quality function, the regret of the policy it defines converges at a rate given by the exponentiation of the estimate's pointwise convergence rate, thus speeding up the rate. The level of exponentiation depends on the level of noise in the decision-making problem, rather than the estimation problem. We establish such noise levels for linear and tabular MDPs as examples. Second, we provide new analyses of FQI and Bellman residual minimization to establish the correct pointwise convergence guarantees. As specific cases, our results imply one-over-n rates in linear cases and exponential-in-n rates in tabular cases. We extend our findings to general function approximation by extending our results to regret guarantees based on L_p-convergence rates for estimating the optimal quality function rather than pointwise rates, where L_2 guarantees for nonparametric estimation can be ensured under mild conditions.
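For readers unfamiliar with fitted Q-iteration, the method analyzed in the abstract, the following is a minimal illustrative sketch on a tabular MDP: repeatedly regress Bellman optimality targets computed from a fixed offline dataset (in the tabular case, "regression" is per-state-action averaging), then act greedily with respect to the resulting Q estimate. The toy two-state MDP, function names, and parameters below are assumptions for illustration only, not the authors' setup.

```python
import numpy as np

def fitted_q_iteration(transitions, n_states, n_actions, gamma=0.9, n_iters=200):
    """Estimate the optimal Q-function from a fixed offline dataset of
    (s, a, r, s') tuples by repeatedly applying the empirical Bellman
    optimality backup; in the tabular case the regression step reduces
    to averaging targets within each (s, a) cell."""
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        targets = np.zeros((n_states, n_actions))
        counts = np.zeros((n_states, n_actions))
        for s, a, r, s_next in transitions:
            # Bellman optimality target: immediate reward plus discounted
            # value of the best action at the observed next state.
            targets[s, a] += r + gamma * q[s_next].max()
            counts[s, a] += 1
        # Tabular "regression": average the targets observed in each cell.
        mask = counts > 0
        q[mask] = targets[mask] / counts[mask]
    return q

# Toy offline dataset from a deterministic two-state, two-action MDP:
# action 1 in state 0 moves to state 1; action 0 in state 1 pays reward 1
# and is absorbing; the other transitions pay nothing.
data = [(0, 0, 0.0, 0), (0, 1, 0.0, 1), (1, 0, 1.0, 1), (1, 1, 0.0, 0)]
q_hat = fitted_q_iteration(data, n_states=2, n_actions=2)
policy = q_hat.argmax(axis=1)  # greedy policy: go to state 1, then stay
```

Because the dataset covers every state-action pair exactly once and transitions are deterministic, the iterates here converge to the true optimal Q-function (e.g., Q*(1, 0) = 1/(1 - 0.9) = 10); the paper's fast-rate results concern how the regret of the greedy policy shrinks with the sample size n in noisy settings.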

Suggested Citation

  • Yichun Hu & Nathan Kallus & Masatoshi Uehara, 2025. "Fast Rates for the Regret of Offline Reinforcement Learning," Mathematics of Operations Research, INFORMS, vol. 50(1), pages 633-655, February.
  • Handle: RePEc:inm:ormoor:v:50:y:2025:i:1:p:633-655
    DOI: 10.1287/moor.2021.0167

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/moor.2021.0167
    Download Restriction: no

    File URL: https://libkey.io/10.1287/moor.2021.0167?utm_source=ideas


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.