Automatic Setting of Learning Rate and Mini-batch Size in Momentum and AdaM Stochastic Gradient Methods / Franchini, G.; Porta, F. - 3094:1 (2024). (Paper presented at the International Conference of Numerical Analysis and Applied Mathematics 2022, ICNAAM 2022, held in Heraklion, Greece, 2022) [10.1063/5.0210874].
Automatic Setting of Learning Rate and Mini-batch Size in Momentum and AdaM Stochastic Gradient Methods
Franchini, G.; Porta, F.
2024
Abstract
The effectiveness of stochastic gradient methods strongly depends on a suitable selection of the hyperparameters that define them. In particular, for the large-scale optimization problems that often arise in machine learning applications, properly setting both the learning rate and the mini-batch size used to build the stochastic directions is crucial for obtaining fast and efficient learning procedures. In a recent paper [1], the authors propose to set these hyperparameters by combining an adaptive subsampling strategy with a line search scheme. The aim of this work is to adapt that idea to both the stochastic gradient algorithm with momentum and the AdaM method, so as to exploit the good numerical behaviour of momentum-like stochastic gradient methods together with the automatic hyperparameter selection technique discussed in [1]. An extensive numerical experimentation carried out on convex functions, over different data sets, shows that the combined technique makes hyperparameter tuning computationally less expensive than searching for a suitable constant learning rate and mini-batch size, which is significant from the perspective of GreenAI. Furthermore, the proposed versions of the stochastic gradient method with momentum and of AdaM exhibit promising convergence behaviour compared with their original counterparts.
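As a rough illustration of the general idea summarized in the abstract, the sketch below combines a variance-based test that enlarges the mini-batch with an Armijo backtracking line search that selects the learning rate, inside SGD with momentum on a least-squares problem. This is a hypothetical sketch, not the algorithm of this paper or of [1]: the function name sgd_momentum_adaptive, the parameters var_theta, growth, armijo_c, and the specific variance test are illustrative assumptions.

```python
# Hypothetical sketch: mini-batch SGD with momentum where
#  - the mini-batch size grows via a simple variance ("norm"-type) test, and
#  - the learning rate is chosen by Armijo backtracking on the sampled loss.
# Illustrative only; not the exact algorithm of the paper or of reference [1].
import numpy as np


def sgd_momentum_adaptive(X, y, n_epochs=10, b0=8, beta=0.9,
                          lr_max=1.0, armijo_c=1e-4, shrink=0.5,
                          var_theta=0.5, growth=2.0, seed=0):
    """Least-squares example: minimize 0.5/n * ||X w - y||^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    m = np.zeros(d)          # momentum buffer
    b = b0                   # current mini-batch size

    def loss(w, idx):
        r = X[idx] @ w - y[idx]
        return 0.5 * np.mean(r ** 2)

    def per_sample_grads(w, idx):
        r = X[idx] @ w - y[idx]        # residuals on the mini-batch
        return X[idx] * r[:, None]     # one gradient row per sample

    for _epoch in range(n_epochs):
        for _step in range(max(1, n // b)):
            idx = rng.choice(n, size=min(b, n), replace=False)
            G = per_sample_grads(w, idx)
            g = G.mean(axis=0)

            # Variance test: enlarge the batch when the variance of the
            # stochastic gradient estimate dominates its squared norm.
            var = G.var(axis=0).sum() / len(idx)
            if var > var_theta * np.dot(g, g) and b < n:
                b = min(int(growth * b), n)

            # Momentum direction, then Armijo backtracking on the sampled loss.
            # The momentum direction may not be a descent direction, so the
            # backtracking is capped by a minimum step size.
            m = beta * m + g
            d_k = -m
            lr, f0, slope = lr_max, loss(w, idx), np.dot(g, d_k)
            while loss(w + lr * d_k, idx) > f0 + armijo_c * lr * slope and lr > 1e-8:
                lr *= shrink
            w = w + lr * d_k
    return w, b


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((2000, 20))
    w_true = rng.standard_normal(20)
    y = X @ w_true + 0.01 * rng.standard_normal(2000)
    w_hat, final_b = sgd_momentum_adaptive(X, y)
    print("final batch size:", final_b,
          "relative error:", np.linalg.norm(w_hat - w_true) / np.linalg.norm(w_true))
```

Note that, in this sketch, the line search is performed on the sampled loss only, so its cost scales with the current mini-batch rather than with the full data set; this is the kind of saving that makes an automatic selection of learning rate and mini-batch size attractive compared with a grid search over constant values.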