
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning

Author

Listed:
  • Nathan Kallus

    (Cornell University, New York, New York 10044)

  • Masatoshi Uehara

    (Cornell University, New York, New York 10044)

Abstract

Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and infinite-horizon settings due to diminishing overlap between behavior and target policies. In this paper, we study the role of Markovian and time-invariant structure in efficient OPE. We first derive the efficiency bounds and efficient influence functions for OPE when one assumes each of these structures. This precisely characterizes the curse of horizon: in time-variant processes, OPE is only feasible in the near-on-policy setting, where behavior and target policies are sufficiently similar. But, in time-invariant Markov decision processes, our bounds show that truly off-policy evaluation is feasible, even with just one dependent trajectory, and provide the limits of how well we could hope to do. We develop a new estimator based on double reinforcement learning (DRL) that leverages this structure for OPE. Our DRL estimator simultaneously uses estimated stationary density ratios and q-functions; it remains efficient when both are estimated at slow, nonparametric rates and remains consistent when either is estimated consistently. We investigate these properties and the performance benefits of leveraging the problem structure for more efficient OPE.
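The doubly robust structure described in the abstract — combining a stationary density ratio with a q-function so that the estimate stays consistent if either nuisance is correct — can be sketched as follows. This is an illustrative simplification for the discounted setting, not the paper's exact estimator; `w_hat`, `q_hat`, and `pi` are hypothetical plug-in nuisances supplied by the user.

```python
import numpy as np


def drl_estimate(trajectories, w_hat, q_hat, pi, gamma):
    """Doubly robust off-policy value estimate (illustrative sketch).

    trajectories: list of trajectories, each a list of (s, a, r, s_next)
                  transitions collected under the behavior policy.
    w_hat(s, a):  estimated stationary density ratio d_pi / d_b (assumption:
                  normalized discounted occupancy ratio).
    q_hat(s, a):  estimated q-function of the target policy.
    pi(s):        dict mapping action -> probability under the target policy.
    gamma:        discount factor in (0, 1).
    """
    init_term = 0.0   # plug-in value of the initial state under q_hat
    correction = 0.0  # weighted temporal-difference correction term
    n_traj = len(trajectories)
    n_trans = sum(len(tr) for tr in trajectories)
    for tr in trajectories:
        s0 = tr[0][0]
        init_term += sum(p * q_hat(s0, a) for a, p in pi(s0).items())
        for (s, a, r, s_next) in tr:
            # Expected next-state value under the target policy.
            v_next = sum(p * q_hat(s_next, a2) for a2, p in pi(s_next).items())
            td_error = r + gamma * v_next - q_hat(s, a)
            correction += w_hat(s, a) * td_error
    # 1/(1-gamma) rescales the per-transition correction, since the
    # occupancy measure is normalized by (1-gamma) in this convention.
    return init_term / n_traj + correction / ((1.0 - gamma) * n_trans)


# Toy check: single-state, single-action MDP with reward 1 and gamma = 0.5,
# so the true value is 1 / (1 - 0.5) = 2. With either nuisance exact
# (w = 1 on-policy, or q = 2), the estimate recovers the true value.
trajs = [[(0, 0, 1.0, 0)] * 5 for _ in range(3)]
pi = lambda s: {0: 1.0}
est_q = drl_estimate(trajs, lambda s, a: 1.0, lambda s, a: 2.0, pi, 0.5)
est_w = drl_estimate(trajs, lambda s, a: 1.0, lambda s, a: 0.0, pi, 0.5)
```

In the toy check, the first call uses a correct q-function (the TD correction vanishes) and the second uses a correct density ratio with a wrong q-function (the correction alone recovers the value), illustrating the double robustness property the abstract describes.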

Suggested Citation

  • Nathan Kallus & Masatoshi Uehara, 2022. "Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning," Operations Research, INFORMS, vol. 70(6), pages 3282-3302, November.
  • Handle: RePEc:inm:oropre:v:70:y:2022:i:6:p:3282-3302
    DOI: 10.1287/opre.2021.2249

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/opre.2021.2249
    Download Restriction: no

    File URL: https://libkey.io/10.1287/opre.2021.2249?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

