Leveraging Automated Mixed-Low-Precision Quantization for Tiny Edge Microcontrollers / Rusci, M.; Fariselli, M.; Capotondi, A.; Benini, L. - 1325 (2020), pp. 296-308. (Paper presented at the 2nd International Workshop on IoT Streams for Data-Driven Predictive Maintenance, IoT Streams 2020, and the 1st International Workshop on IoT, Edge, and Mobile for Embedded Machine Learning, ITEM 2020, co-located with ECML/PKDD 2020, held in Ghent, Belgium, 14-18 September 2020) [10.1007/978-3-030-66770-2_22].

Leveraging Automated Mixed-Low-Precision Quantization for Tiny Edge Microcontrollers

Capotondi, A. (2020)

Abstract

Severe on-chip memory limitations currently prevent the deployment of the most accurate Deep Neural Network (DNN) models on tiny MicroController Units (MCUs), even when leveraging an effective 8-bit quantization scheme. To tackle this issue, in this paper we present an automated mixed-precision quantization flow based on the HAQ framework but tailored to the memory and computational characteristics of MCU devices. Specifically, a Reinforcement Learning agent searches for the best uniform quantization levels, among 2, 4, and 8 bits, of individual weight and activation tensors, under tight constraints on embedded RAM and FLASH memory sizes. We conduct an experimental analysis on MobileNetV1, MobileNetV2, and MNasNet models for ImageNet classification. During the quantization policy search, the RL agent selects policies that maximize memory utilization. Given an MCU-class memory bound of 2 MB for weight-only quantization, the compressed models produced by the mixed-precision engine are as accurate as state-of-the-art solutions quantized with a non-uniform function, which is not tailored to CPUs featuring integer-only arithmetic. This demonstrates the viability of uniform quantization, required for MCU deployments, for deep-weight compression. When the activation memory budget is also limited to 512 kB, the best MobileNetV1 model scores up to 68.4% on ImageNet with the discovered quantization policy, making it 4% more accurate than other 8-bit networks fitting the same memory constraints.
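The two building blocks the abstract describes can be sketched in a few lines: per-tensor symmetric uniform quantization at 2, 4, or 8 bits (the integer-friendly scheme the paper targets), and a FLASH-budget check that a per-tensor bit-width policy must satisfy. This is a minimal illustrative sketch, not the paper's implementation; the function names (`uniform_quantize`, `flash_footprint_bits`), the example layer shapes, and the example policy are hypothetical.

```python
import numpy as np

def uniform_quantize(tensor, num_bits):
    """Symmetric uniform quantization of a tensor to num_bits (2, 4, or 8).

    Returns integer codes plus the scale needed to dequantize, mimicking
    the integer-only arithmetic available on MCU-class CPUs.
    Illustrative sketch only; names and details are not from the paper.
    """
    qmax = 2 ** (num_bits - 1) - 1                 # e.g. 127 for 8 bits
    max_abs = np.max(np.abs(tensor))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(tensor / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def flash_footprint_bits(weight_shapes, bit_policy):
    """Total FLASH footprint (in bits) of a per-tensor bit-width policy."""
    return sum(int(np.prod(shape)) * bits
               for shape, bits in zip(weight_shapes, bit_policy))

# Example: check a mixed 2/4/8-bit policy against a 2 MB FLASH budget
# (hypothetical layer shapes, not taken from MobileNetV1).
shapes = [(32, 3, 3, 3), (64, 32, 3, 3), (128, 64, 3, 3)]
policy = [8, 4, 2]                                  # bits per weight tensor
budget_bits = 2 * 1024 * 1024 * 8                   # 2 MB in bits
fits = flash_footprint_bits(shapes, policy) <= budget_bits
```

In the paper's flow, the RL agent searches over `policy` (one bit-width per tensor) and is constrained to policies for which such a footprint check passes for both weights (FLASH) and activations (RAM).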
Conference: 2nd International Workshop on IoT Streams for Data-Driven Predictive Maintenance, IoT Streams 2020, and 1st International Workshop on IoT, Edge, and Mobile for Embedded Machine Learning, ITEM 2020, co-located with ECML/PKDD 2020
Location: Ghent, Belgium
Dates: 14-18 September 2020
Volume: 1325
Pages: 296-308
Authors: Rusci, M.; Fariselli, M.; Capotondi, A.; Benini, L.
Files in this item:
Leveraging Automated Mixed-Low-Precision_PostPrint.pdf
Type: Author's post-print (post-refereeing draft)
Access: open access
Size: 813.22 kB
Format: Adobe PDF
Creative Commons License
Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11380/1280825
Citations
  • Scopus: 5