
aSGD: Stochastic Gradient Descent with Adaptive Batch Size for Every Parameter

Authors

Listed:
  • Haoze Shi

    (College of Global Change and Earth System Science, Beijing Normal University, Beijing 100875, China)

  • Naisen Yang

    (Environment Research Institute, Shandong University, Qingdao 266237, China)

  • Hong Tang

    (State Key Laboratory of Remote Sensing Science, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China)

  • Xin Yang

    (College of Global Change and Earth System Science, Beijing Normal University, Beijing 100875, China)

Abstract

In recent years, deep neural networks (DNNs) have been widely used in many fields. Because a deep network contains numerous parameters, considerable effort has been devoted to training it. Complex optimizers with many hyperparameters have been employed to accelerate network training and improve its generalization ability, but tuning these hyperparameters is often a trial-and-error process. In this paper, we visually analyze the roles that training samples play in a parameter update and find that different samples contribute differently to the update. Building on this observation, we present a variant of batch stochastic gradient descent for neural networks that use the ReLU as the activation function in the hidden layers, called adaptive stochastic gradient descent (aSGD). Unlike existing methods, it computes an adaptive batch size for each parameter in the model and uses the mean effective gradient as the actual gradient for parameter updates. Experimental results on MNIST show that aSGD speeds up the optimization of DNNs and achieves higher accuracy without extra hyperparameters. Experimental results on synthetic datasets show that it can find redundant nodes effectively, which is helpful for model compression.
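
The abstract describes the update rule only at a high level. As an illustration, here is a minimal NumPy sketch of one plausible reading: with ReLU hidden units, a sample whose unit is inactive back-propagates an exactly zero gradient to that unit's weights, so each parameter counts only the samples with a nonzero gradient (its adaptive batch size) and averages the gradient over those samples (the mean effective gradient). The function name asgd_update, the eps threshold, and the toy data are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def asgd_update(param, per_sample_grads, lr=0.01, eps=1e-12):
        # per_sample_grads has shape (batch, *param.shape) and holds the
        # gradient of each sample's loss w.r.t. this parameter. With ReLU
        # hidden units, samples whose unit was inactive contribute an
        # exact zero and are excluded from the effective batch below.
        active = np.abs(per_sample_grads) > eps   # which samples contribute
        eff_batch = active.sum(axis=0)            # adaptive batch size per parameter
        grad_sum = per_sample_grads.sum(axis=0)
        # Mean effective gradient: average over contributing samples only;
        # parameters with no contributing sample are left unchanged.
        mean_eff_grad = np.where(eff_batch > 0,
                                 grad_sum / np.maximum(eff_batch, 1), 0.0)
        return param - lr * mean_eff_grad

    # Toy usage: 4 samples, 3 weights. The second weight receives a nonzero
    # gradient from only two samples, so its effective batch size is 2.
    w = np.zeros(3)
    g = np.array([[0.2, 0.0, 0.1],
                  [0.4, 0.0, 0.0],
                  [0.0, 0.6, 0.0],
                  [0.2, 0.2, 0.3]])
    w = asgd_update(w, g)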

Suggested Citation

  • Haoze Shi & Naisen Yang & Hong Tang & Xin Yang, 2022. "aSGD: Stochastic Gradient Descent with Adaptive Batch Size for Every Parameter," Mathematics, MDPI, vol. 10(6), pages 1-15, March.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:6:p:863-:d:766964

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/6/863/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/6/863/
    Download Restriction: no

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Xiaojun Zhou & Chunna Zhao & Yaqun Huang, 2023. "A Deep Learning Optimizer Based on Grünwald–Letnikov Fractional Order Definition," Mathematics, MDPI, vol. 11(2), pages 1-15, January.
