Unleashing Fine-Grained Parallelism on Embedded Many-Core Accelerators with Lightweight OpenMP Tasking

In recent years, programmable many-core accelerators (PMCAs) have been introduced in embedded systems to satisfy stringent performance-per-watt requirements. This has increased the need for programming models capable of effectively leveraging hundreds to thousands of processors. Task-based parallelism has the potential to provide such capabilities, offering high-level abstractions to express the abundant and irregular parallelism found in embedded applications. However, efficiently supporting this programming paradigm on embedded PMCAs is challenging because of the large time and space overheads it introduces. In this paper we describe a lightweight OpenMP tasking runtime environment (RTE) design for a state-of-the-art embedded PMCA, the Kalray MPPA 256. We provide an exhaustive characterization of the costs of our RTE, considering both synthetic workloads and real programs, and we compare it to several other tasking RTEs. Experimental results confirm that our solution achieves near-ideal parallelization speedups for tasks as small as 5K cycles, and an average speedup of 12× for real benchmarks, which is 60% higher than what we observe with the original Kalray OpenMP implementation.

Unleashing Fine-Grained Parallelism on Embedded Many-Core Accelerators with Lightweight OpenMP Tasking / Tagliavini, G; Cesarini, D; Marongiu, A. - In: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS. - ISSN 1045-9219. - Print. - 29:9(2018), pp. 2150-2163. [10.1109/TPDS.2018.2814602]

TAGLIAVINI G;CESARINI D;MARONGIU A

2018

Year of publication: 2018
Journal: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
Volume: 29
Issue: 9
First page: 2150
Last page: 2163
DOI: https://dx.doi.org/10.1109/TPDS.2018.2814602
WoS ID: WOS:000441445500017
Scopus ID: 2-s2.0-85043454916
All authors: Tagliavini, G; Cesarini, D; Marongiu, A
Type: Journal article

Files in this record:

File	Access	Type	Size	Format
tagliavini_TPDS2018.pdf	Open access	AAM - Author's version, revised and accepted for publication	4 MB	Adobe PDF
VQR_Unleashing.pdf	Restricted access	VOR - Version published by the publisher	2 MB	Adobe PDF

The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1171846
