Printed from https://ideas.repec.org/a/spr/joptap/v203y2024i3d10.1007_s10957-024-02513-3.html

Gradient Descent Provably Escapes Saddle Points in the Training of Shallow ReLU Networks

Authors

Listed:
  • Patrick Cheridito

    (ETH Zurich)

  • Arnulf Jentzen

    (The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen); University of Münster)

  • Florian Rossmannek

    (ETH Zurich; Nanyang Technological University)

Abstract

Dynamical systems theory has recently been applied in optimization to prove that gradient descent algorithms bypass so-called strict saddle points of the loss function. However, in many modern machine learning applications, the required regularity conditions are not satisfied. In this paper, we prove a variant of the relevant dynamical systems result, a center-stable manifold theorem, in which we relax some of the regularity requirements. We explore its relevance for various machine learning tasks, with a particular focus on shallow rectified linear unit (ReLU) and leaky ReLU networks with scalar input. Building on a detailed examination of critical points of the square integral loss function for shallow ReLU and leaky ReLU networks relative to an affine target function, we show that gradient descent circumvents most saddle points. Furthermore, we prove convergence to global minima under favourable initialization conditions, quantified by an explicit threshold on the limiting loss.
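The strict-saddle phenomenon mentioned in the abstract can be illustrated with a toy example. The sketch below is not taken from the paper: the function f(x, y) = x²/2 + y⁴/4 − y²/2, the step size, the iteration count, and the names `grad` and `gradient_descent` are all assumptions chosen for illustration. The function is smooth, so it does not exhibit the reduced regularity the paper addresses. It has a strict saddle at the origin (Hessian eigenvalues 1 and −1) and global minima at (0, ±1). Gradient descent started exactly on the line y = 0 converges to the saddle, while a start perturbed slightly off that line escapes to a minimum.

```python
def grad(x, y):
    """Gradient of the toy function f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2."""
    return x, y**3 - y


def gradient_descent(x, y, step=0.1, iters=500):
    """Plain gradient descent with a fixed step size (illustrative values)."""
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


# Start on the line y = 0, which gradient descent maps into itself:
# the iterates converge to the saddle point at the origin.
on_manifold = gradient_descent(1.0, 0.0)

# Start slightly off that line: the y-coordinate grows away from the saddle
# and the iterates converge to the minimum at (0, 1).
perturbed = gradient_descent(1.0, 1e-6)

print("start on y = 0:   ", on_manifold)
print("perturbed start:  ", perturbed)
```

In this toy case the set of starting points attracted to the saddle is the line y = 0, which has measure zero in the plane. This is the kind of conclusion that center-stable manifold arguments are used to establish.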

Suggested Citation

  • Patrick Cheridito & Arnulf Jentzen & Florian Rossmannek, 2024. "Gradient Descent Provably Escapes Saddle Points in the Training of Shallow ReLU Networks," Journal of Optimization Theory and Applications, Springer, vol. 203(3), pages 2617-2648, December.
  • Handle: RePEc:spr:joptap:v:203:y:2024:i:3:d:10.1007_s10957-024-02513-3
    DOI: 10.1007/s10957-024-02513-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10957-024-02513-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. J. X. Cruz Neto & P. R. Oliveira & A. Soubeyran & J. C. O. Souza, 2020. "A generalized proximal linearized algorithm for DC functions with application to the optimal size of the firm problem," Annals of Operations Research, Springer, vol. 289(2), pages 313-339, June.
    2. Masoud Ahookhosh & Le Thi Khanh Hien & Nicolas Gillis & Panagiotis Patrinos, 2021. "A Block Inertial Bregman Proximal Algorithm for Nonsmooth Nonconvex Problems with Application to Symmetric Nonnegative Matrix Tri-Factorization," Journal of Optimization Theory and Applications, Springer, vol. 190(1), pages 234-258, July.
    3. Franck Iutzeler & Jérôme Malick, 2018. "On the Proximal Gradient Algorithm with Alternated Inertia," Journal of Optimization Theory and Applications, Springer, vol. 176(3), pages 688-710, March.
    4. Yaohua Hu & Chong Li & Kaiwen Meng & Xiaoqi Yang, 2021. "Linear convergence of inexact descent method and inexact proximal gradient algorithms for lower-order regularization problems," Journal of Global Optimization, Springer, vol. 79(4), pages 853-883, April.
    5. Radu Ioan Boţ & Ernö Robert Csetnek & Szilárd Csaba László, 2016. "An inertial forward–backward algorithm for the minimization of the sum of two nonconvex functions," EURO Journal on Computational Optimization, Springer;EURO - The Association of European Operational Research Societies, vol. 4(1), pages 3-25, February.
    6. Thomas Kerdreux & Alexandre d’Aspremont & Sebastian Pokutta, 2022. "Restarting Frank–Wolfe: Faster Rates under Hölderian Error Bounds," Journal of Optimization Theory and Applications, Springer, vol. 192(3), pages 799-829, March.
    7. Masaru Ito & Bruno F. Lourenço, 2024. "Eigenvalue programming beyond matrices," Computational Optimization and Applications, Springer, vol. 89(2), pages 361-384, November.
    8. Maryam Yashtini, 2021. "Multi-block Nonconvex Nonsmooth Proximal ADMM: Convergence and Rates Under Kurdyka–Łojasiewicz Property," Journal of Optimization Theory and Applications, Springer, vol. 190(3), pages 966-998, September.
    9. Hao Wang & Hao Zeng & Jiashan Wang, 2022. "An extrapolated iteratively reweighted $\ell_1$ method with complexity analysis," Computational Optimization and Applications, Springer, vol. 83(3), pages 967-997, December.
    10. Maryam Yashtini, 2022. "Convergence and rate analysis of a proximal linearized ADMM for nonconvex nonsmooth optimization," Journal of Global Optimization, Springer, vol. 84(4), pages 913-939, December.
    11. Silvia Bonettini & Peter Ochs & Marco Prato & Simone Rebegoldi, 2023. "An abstract convergence framework with application to inertial inexact forward–backward methods," Computational Optimization and Applications, Springer, vol. 84(2), pages 319-362, March.
    12. Emilie Chouzenoux & Jean-Christophe Pesquet & Audrey Repetti, 2016. "A block coordinate variable metric forward–backward algorithm," Journal of Global Optimization, Springer, vol. 66(3), pages 457-485, November.
    13. Radu Ioan Bot & Dang-Khoa Nguyen, 2020. "The Proximal Alternating Direction Method of Multipliers in the Nonconvex Setting: Convergence Analysis and Rates," Mathematics of Operations Research, INFORMS, vol. 45(2), pages 682-712, May.
    14. S. Bonettini & M. Prato & S. Rebegoldi, 2018. "A block coordinate variable metric linesearch based proximal gradient method," Computational Optimization and Applications, Springer, vol. 71(1), pages 5-52, September.
    15. Lei Yang, 2024. "Proximal Gradient Method with Extrapolation and Line Search for a Class of Non-convex and Non-smooth Problems," Journal of Optimization Theory and Applications, Springer, vol. 200(1), pages 68-103, January.
    16. Szilárd Csaba László, 2023. "A Forward–Backward Algorithm With Different Inertial Terms for Structured Non-Convex Minimization Problems," Journal of Optimization Theory and Applications, Springer, vol. 198(1), pages 387-427, July.
    17. Zehui Jia & Xue Gao & Xingju Cai & Deren Han, 2021. "Local Linear Convergence of the Alternating Direction Method of Multipliers for Nonconvex Separable Optimization Problems," Journal of Optimization Theory and Applications, Springer, vol. 188(1), pages 1-25, January.
    18. Bonettini, S. & Prato, M. & Rebegoldi, S., 2021. "New convergence results for the inexact variable metric forward–backward method," Applied Mathematics and Computation, Elsevier, vol. 392(C).
    19. Daoli Zhu & Sien Deng & Minghua Li & Lei Zhao, 2021. "Level-Set Subdifferential Error Bounds and Linear Convergence of Bregman Proximal Gradient Method," Journal of Optimization Theory and Applications, Springer, vol. 189(3), pages 889-918, June.
