Automatic Setting of Learning Rate and Mini-batch Size in Momentum and AdaM Stochastic Gradient Methods / Franchini, G.; Porta, F. - 3094:1 (2024). (Paper presented at the International Conference of Numerical Analysis and Applied Mathematics 2022, ICNAAM 2022, held in Heraklion, Greece, in 2022) [10.1063/5.0210874].

Automatic Setting of Learning Rate and Mini-batch Size in Momentum and AdaM Stochastic Gradient Methods

Franchini G.; Porta F.
2024

Abstract

The effectiveness of stochastic gradient methods strongly depends on a suitable selection of the hyperparameters that define them. In particular, in the large-scale optimization problems that often arise in machine learning applications, properly setting both the learning rate and the mini-batch size used to build the stochastic directions is crucial for obtaining fast and efficient learning procedures. In a recent paper [1], the authors propose to define these hyperparameters by combining an adaptive subsampling strategy with a line search scheme. The aim of this work is to adapt this idea to both the stochastic gradient algorithm with momentum and the AdaM method, in order to exploit the good numerical behaviour of momentum-like stochastic gradient methods together with the automatic hyperparameter selection technique discussed in [1]. Extensive numerical experiments on convex functions, with different data sets, show that this combined technique makes hyperparameter tuning computationally less expensive than searching for suitable constant values of the learning rate and mini-batch size, which is significant from the perspective of GreenAI. Furthermore, the proposed versions of the stochastic gradient method with momentum and of AdaM show promising convergence behaviour compared to their original counterparts.
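To make the general idea concrete, the following is a minimal illustrative sketch (not the authors' algorithm from [1], whose specific tests and safeguards are not reproduced here) of a momentum-type stochastic gradient step in which the learning rate is chosen by a backtracking Armijo-like line search on the current mini-batch loss, and the mini-batch size is enlarged when a simple variance-based heuristic suggests the sampled gradient is too noisy. The toy problem, the momentum parameter, the noise threshold, and all helper names are assumptions made only for illustration.

```python
# Illustrative sketch only: generic momentum SGD with a mini-batch line search
# and an adaptive mini-batch size, not the specific method proposed in [1].
import numpy as np

rng = np.random.default_rng(0)

# Toy convex problem: l2-regularized logistic regression on synthetic data.
n, d = 2000, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = (X @ w_true + 0.1 * rng.standard_normal(n) > 0).astype(float)
lam = 1e-3

def loss(w, idx):
    z = X[idx] @ w
    return np.mean(np.log1p(np.exp(-np.where(y[idx] > 0, z, -z)))) + 0.5 * lam * w @ w

def grad(w, idx):
    p = 1.0 / (1.0 + np.exp(-(X[idx] @ w)))
    return X[idx].T @ (p - y[idx]) / len(idx) + lam * w

def per_sample_grads(w, idx):
    p = 1.0 / (1.0 + np.exp(-(X[idx] @ w)))
    return X[idx] * (p - y[idx])[:, None] + lam * w  # one gradient per sample

w = np.zeros(d)
v = np.zeros(d)      # momentum buffer
beta = 0.9           # momentum parameter (assumed value)
batch = 16           # initial mini-batch size
theta = 0.5          # noise threshold for the adaptive subsampling heuristic

for it in range(200):
    idx = rng.choice(n, size=min(batch, n), replace=False)
    g = grad(w, idx)

    # Adaptive subsampling heuristic: if the estimated variance of the
    # mini-batch gradient is large relative to its squared norm, grow the batch.
    G = per_sample_grads(w, idx)
    var = np.sum(np.var(G, axis=0)) / len(idx)
    if var > theta * np.linalg.norm(g) ** 2:
        batch = min(2 * batch, n)

    # Backtracking (Armijo-like) line search on the mini-batch loss along -g.
    alpha, c, f0 = 1.0, 1e-4, loss(w, idx)
    while loss(w - alpha * g, idx) > f0 - c * alpha * np.linalg.norm(g) ** 2 and alpha > 1e-8:
        alpha *= 0.5

    # Heavy-ball / momentum update using the step size found by the line search.
    v = beta * v - alpha * g
    w = w + v

print("final full-sample loss:", loss(w, np.arange(n)))
```

The same learning rate and batch size produced by such tests could in principle be fed into an AdaM-style update instead of the heavy-ball one; the sketch above only shows the momentum variant to keep the example short.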

Creative Commons License
Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright violation, contact Iris Support.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1362249