
Uncertainty Quantification and Exploration for Reinforcement Learning

Author

Listed:
  • Yi Zhu

    (Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208)

  • Jing Dong

    (Decision, Risk, and Operations Division, Columbia Business School, New York, New York 10027)

  • Henry Lam

    (Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027)

Abstract

We investigate statistical uncertainty quantification for reinforcement learning (RL) and its implications for exploration policies. Despite the ever-growing literature on RL applications, fundamental questions about inference and error quantification, such as large-sample behavior, remain largely open. In this paper, we fill this gap by studying the central limit theorem behavior of estimated Q-values and value functions under various RL settings. In particular, we explicitly identify closed-form expressions for the asymptotic variances, which allow us to efficiently construct asymptotically valid confidence regions for key RL quantities. We further use these asymptotic expressions to design an effective exploration strategy, which we call Q-value-based Optimal Computing Budget Allocation (Q-OCBA). The policy relies on maximizing the relative discrepancies among the Q-value estimates. Numerical experiments show that our exploration strategy outperforms benchmark policies.
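The abstract's program (plug-in Q-value estimation, asymptotic variances, confidence regions) can be illustrated with a rough sketch. The paper's closed-form variance expressions are not reproduced here; the toy MDP, the uniform behavior policy, and the multinomial bootstrap below are assumptions introduced purely for illustration, with the bootstrap serving as a generic stand-in for the closed-form asymptotic variances.

# Hypothetical illustration only -- not the authors' algorithm. Plug-in Q-value
# estimation for a toy tabular MDP, with a multinomial bootstrap standing in for
# closed-form asymptotic variances when forming confidence intervals.
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP (assumed for illustration): 3 states, 2 actions, known mean rewards.
n_states, n_actions, gamma = 3, 2, 0.9
true_P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))                  # mean rewards

def solve_q(P, iters=1000):
    # Value iteration on Q(s, a) = R(s, a) + gamma * E[max_a' Q(s', a')].
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q

def collect_counts(n_samples):
    # Sample (s, a) uniformly (a simple behavior policy) and record transitions.
    counts = np.zeros((n_states, n_actions, n_states))
    for _ in range(n_samples):
        s, a = rng.integers(n_states), rng.integers(n_actions)
        counts[s, a, rng.choice(n_states, p=true_P[s, a])] += 1
    return counts

def plug_in_q(counts):
    # Model-based ("plug-in") estimate: empirical frequencies into the Bellman equation.
    totals = counts.sum(axis=2, keepdims=True)
    P_hat = np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_states)
    return solve_q(P_hat)

counts = collect_counts(5000)
Q_hat = plug_in_q(counts)

# Bootstrap the pooled transition counts to approximate sampling variability;
# the paper instead characterizes this variability in closed form.
B, n_total = 200, int(counts.sum())
flat = counts.reshape(-1) / n_total
boot = np.stack([plug_in_q(rng.multinomial(n_total, flat).reshape(counts.shape))
                 for _ in range(B)])
se = boot.std(axis=0)
print("Q_hat:\n", np.round(Q_hat, 3))
print("~95% CI half-widths (1.96 * bootstrap SE):\n", np.round(1.96 * se, 3))

In the spirit of Q-OCBA, interval widths of this kind could guide where further sampling is allocated; per the abstract, the paper's actual rule is based on maximizing the relative discrepancies among the Q-value estimates.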

Suggested Citation

  • Yi Zhu & Jing Dong & Henry Lam, 2024. "Uncertainty Quantification and Exploration for Reinforcement Learning," Operations Research, INFORMS, vol. 72(4), pages 1689-1709, July.
  • Handle: RePEc:inm:oropre:v:72:y:2024:i:4:p:1689-1709
    DOI: 10.1287/opre.2023.2436

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/opre.2023.2436
    Download Restriction: no

    File URL: https://libkey.io/10.1287/opre.2023.2436?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to a copy you can access through your library subscription.
    ---><---
