Automatic Setting of Learning Rate and Mini-batch Size in Momentum and AdaM Stochastic Gradient Methods / Franchini, G.; Porta, F. - 3094:1 (2024). (Paper presented at the International Conference of Numerical Analysis and Applied Mathematics 2022, ICNAAM 2022, held in Heraklion, Greece, in 2022) [10.1063/5.0210874].

Automatic Setting of Learning Rate and Mini-batch Size in Momentum and AdaM Stochastic Gradient Methods

Franchini G.; Porta F.
2024

Abstract

The effectiveness of stochastic gradient methods strongly depends on a suitable selection of the hyperparameters that define them. In particular, in the large-scale optimization problems that often arise in machine learning applications, properly setting both the learning rate and the mini-batch size used to build the stochastic directions is crucial for obtaining fast and efficient learning procedures. In a recent paper [1], the authors propose to define these hyperparameters by combining an adaptive subsampling strategy with a line search scheme. The aim of this work is to adapt this idea to both the stochastic gradient algorithm with momentum and the AdaM method, in order to exploit the good numerical behaviour of momentum-like stochastic gradient methods together with the automatic hyperparameter selection technique discussed in [1]. Extensive numerical experiments on convex functions, with different data sets, show that this combined technique makes hyperparameter tuning computationally less expensive than searching for suitable constant values of the learning rate and mini-batch size, which is significant from the perspective of GreenAI. Furthermore, the proposed versions of the stochastic gradient method with momentum and of AdaM show promising convergence behaviour compared to their original counterparts.
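To make the general idea concrete, the following is a minimal illustrative sketch (not the authors' algorithm from [1], whose specific tests and safeguards are not reproduced here) of a momentum-type stochastic gradient step in which the learning rate is chosen by a backtracking Armijo-like line search on the current mini-batch loss, and the mini-batch size is enlarged when a simple variance-based heuristic suggests the sampled gradient is too noisy. The toy problem, the momentum parameter, the noise threshold, and all helper names are assumptions made only for illustration.

```python
# Illustrative sketch only: generic momentum SGD with a mini-batch line search
# and an adaptive mini-batch size, not the specific method proposed in [1].
import numpy as np

rng = np.random.default_rng(0)

# Toy convex problem: l2-regularized logistic regression on synthetic data.
n, d = 2000, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = (X @ w_true + 0.1 * rng.standard_normal(n) > 0).astype(float)
lam = 1e-3

def loss(w, idx):
    z = X[idx] @ w
    return np.mean(np.log1p(np.exp(-np.where(y[idx] > 0, z, -z)))) + 0.5 * lam * w @ w

def grad(w, idx):
    p = 1.0 / (1.0 + np.exp(-(X[idx] @ w)))
    return X[idx].T @ (p - y[idx]) / len(idx) + lam * w

def per_sample_grads(w, idx):
    p = 1.0 / (1.0 + np.exp(-(X[idx] @ w)))
    return X[idx] * (p - y[idx])[:, None] + lam * w  # one gradient per sample

w = np.zeros(d)
v = np.zeros(d)      # momentum buffer
beta = 0.9           # momentum parameter (assumed value)
batch = 16           # initial mini-batch size
theta = 0.5          # noise threshold for the adaptive subsampling heuristic

for it in range(200):
    idx = rng.choice(n, size=min(batch, n), replace=False)
    g = grad(w, idx)

    # Adaptive subsampling heuristic: if the estimated variance of the
    # mini-batch gradient is large relative to its squared norm, grow the batch.
    G = per_sample_grads(w, idx)
    var = np.sum(np.var(G, axis=0)) / len(idx)
    if var > theta * np.linalg.norm(g) ** 2:
        batch = min(2 * batch, n)

    # Backtracking (Armijo-like) line search on the mini-batch loss along -g.
    alpha, c, f0 = 1.0, 1e-4, loss(w, idx)
    while loss(w - alpha * g, idx) > f0 - c * alpha * np.linalg.norm(g) ** 2 and alpha > 1e-8:
        alpha *= 0.5

    # Heavy-ball / momentum update using the step size found by the line search.
    v = beta * v - alpha * g
    w = w + v

print("final full-sample loss:", loss(w, np.arange(n)))
```

The same learning rate and batch size produced by such tests could in principle be fed into an AdaM-style update instead of the heavy-ball one; the sketch above only shows the momentum variant to keep the example short.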

Creative Commons License
Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright violation, contact Iris Support.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1362249