IDEAS home Printed from https://ideas.repec.org/p/ehl/lserod/122740.html
   My bibliography  Save this paper

Robust offline reinforcement learning with heavy-tailed rewards

Author

Listed:
  • Zhu, Jin
  • Wan, Runzhe
  • Qi, Zhengling
  • Luo, Shikai
  • Shi, Chengchun

Abstract

This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly manages heavytailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods on the logged dataset exhibits heavytailed reward distributions. The implementation of the proposal is available at https://github.com/Mamba413/ROOM.

Suggested Citation

  • Zhu, Jin & Wan, Runzhe & Qi, Zhengling & Luo, Shikai & Shi, Chengchun, 2024. "Robust offline reinforcement learning with heavy-tailed rewards," LSE Research Online Documents on Economics 122740, London School of Economics and Political Science, LSE Library.
  • Handle: RePEc:ehl:lserod:122740
    as

    Download full text from publisher

    File URL: http://eprints.lse.ac.uk/122740/
    File Function: Open access version.
    Download Restriction: no
    ---><---

    More about this item

    Keywords

    Rights Retention;

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:ehl:lserod:122740. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: LSERO Manager (email available below). General contact details of provider: https://edirc.repec.org/data/lsepsuk.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.