
Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

Authors

Listed:
  • Shicong Cen

    (Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213)

  • Chen Cheng

    (Department of Statistics, Stanford University, Stanford, California 94305)

  • Yuxin Chen

    (Department of Electrical and Computer Engineering, Princeton University, Princeton, New Jersey 08544)

  • Yuting Wei

    (Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104)

  • Yuejie Chi

    (Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213)

Abstract

Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms in contemporary reinforcement learning. This class of methods is often applied in conjunction with entropy regularization—an algorithmic scheme that encourages exploration—and is closely related to soft policy iteration and trust region policy optimization. Despite the empirical success, the theoretical underpinnings for NPG methods remain limited even for the tabular setting. This paper develops nonasymptotic convergence guarantees for entropy-regularized NPG methods under softmax parameterization, focusing on discounted Markov decision processes (MDPs). Assuming access to exact policy evaluation, we demonstrate that the algorithm converges linearly—even quadratically, once it enters a local region around the optimal policy—when computing optimal value functions of the regularized MDP. Moreover, the algorithm is provably stable vis-à-vis inexactness of policy evaluation. Our convergence results accommodate a wide range of learning rates and shed light upon the role of entropy regularization in enabling fast convergence.
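In the tabular setting the algorithm analyzed here admits a simple closed-form update. The sketch below is a minimal Python illustration, assuming exact soft policy evaluation and the multiplicative update π⁽ᵗ⁺¹⁾(a|s) ∝ π⁽ᵗ⁾(a|s)^{1−ητ/(1−γ)} · exp(η Q_τ⁽ᵗ⁾(s,a)/(1−γ)) described in the paper; the function names and the toy MDP are illustrative, not the authors' code.

```python
import numpy as np

def soft_policy_evaluation(P, r, pi, gamma, tau):
    """Exactly solve the entropy-regularized Bellman equations.

    P  : (S, A, S) transition probabilities P[s, a, s']
    r  : (S, A) rewards
    pi : (S, A) policy; each row pi[s] sums to 1
    Returns the soft Q-function Q_tau^pi of shape (S, A).
    """
    S = P.shape[0]
    # V(s) = sum_a pi(a|s) * (Q(s,a) - tau * log pi(a|s)); substituting the
    # Bellman equation for Q yields a linear system in V.
    r_pi = np.sum(pi * r, axis=1) - tau * np.sum(pi * np.log(pi), axis=1)
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * np.einsum("sat,t->sa", P, V)

def npg_step(pi, Q, eta, gamma, tau):
    """One entropy-regularized NPG update in multiplicative form:
    pi'(a|s) ∝ pi(a|s)^(1 - eta*tau/(1-gamma)) * exp(eta*Q(s,a)/(1-gamma))."""
    logits = (
        (1.0 - eta * tau / (1.0 - gamma)) * np.log(pi)
        + eta * Q / (1.0 - gamma)
    )
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    pi_new = np.exp(logits)
    return pi_new / pi_new.sum(axis=1, keepdims=True)

# Toy run on a random MDP (all quantities hypothetical): with exact
# evaluation, the iterates converge linearly to the optimal policy of
# the regularized MDP for any step size 0 < eta <= (1-gamma)/tau.
rng = np.random.default_rng(0)
S, A, gamma, tau = 5, 3, 0.9, 0.1
P = rng.dirichlet(np.ones(S), size=(S, A))  # valid transition tensor
r = rng.uniform(size=(S, A))
pi = np.full((S, A), 1.0 / A)               # uniform initialization
eta = (1.0 - gamma) / tau                   # largest admissible step size
for t in range(200):
    Q = soft_policy_evaluation(P, r, pi, gamma, tau)
    pi = npg_step(pi, Q, eta, gamma, tau)
```

Note that with the largest step size η = (1 − γ)/τ the exponent on the old policy vanishes and the update reduces to π⁽ᵗ⁺¹⁾(a|s) ∝ exp(Q_τ⁽ᵗ⁾(s,a)/τ), i.e., soft policy iteration, which is the connection the abstract draws.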

Suggested Citation

  • Shicong Cen & Chen Cheng & Yuxin Chen & Yuting Wei & Yuejie Chi, 2022. "Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization," Operations Research, INFORMS, vol. 70(4), pages 2563-2578, July.
  • Handle: RePEc:inm:oropre:v:70:y:2022:i:4:p:2563-2578
    DOI: 10.1287/opre.2021.2151

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/opre.2021.2151
    Download Restriction: no

    File URL: https://libkey.io/10.1287/opre.2021.2151?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to a page where you can use your library subscription to access this item

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:oropre:v:70:y:2022:i:4:p:2563-2578. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.
