Manycore accelerators have recently proven to be a promising solution for increasingly powerful and energy-efficient computing systems. This raises the need for parallel programming models capable of effectively leveraging hundreds to thousands of processors. Task-based parallelism has the potential to provide such capabilities, offering flexible support for fine-grained and irregular parallelism. However, efficiently supporting this programming paradigm on resource-constrained parallel accelerators is a challenging task. In this paper, we present an optimized implementation of the OpenMP tasking model for embedded parallel accelerators, discussing the key design solutions that guarantee a small memory footprint and minimize performance overheads. We validate our design by comparing it with several state-of-the-art tasking implementations, using the most representative parallelization patterns. The experimental results confirm that our solution achieves near-ideal speedups for tasks as small as 5K cycles.

An optimized task-based runtime system for resource-constrained parallel accelerators / Cesarini, D; Marongiu, A; Benini, L. - Electronic. - (2016), pp. 1261-1266. (Paper presented at the 19th Design, Automation and Test in Europe Conference and Exhibition, DATE 2016, held in Dresden, 14-18 March 2016) [10.3850/9783981537079_0607].

An optimized task-based runtime system for resource-constrained parallel accelerators

MARONGIU A;
2016

Year: 2016
Conference: 19th Design, Automation and Test in Europe Conference and Exhibition, DATE 2016
Location: Dresden
Dates: 14-18 March 2016
First page: 1261
Last page: 1266
Authors: Cesarini, D; Marongiu, A; Benini, L


Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1171844