Several recent many-core accelerators have been architected as fabrics of tightly-coupled shared-memory clusters. A hierarchical interconnection system is used, with a crossbar-like medium inside each cluster and a network-on-chip (NoC) at the global level, which makes memory operations non-uniform (NUMA). Nested parallelism represents a powerful programming abstraction for these architectures: a first level of parallelism can be used to distribute coarse-grained tasks to clusters, and additional levels of fine-grained parallelism can be distributed to processors within a cluster. This paper presents lightweight and highly optimized support for nested parallelism on cluster-based embedded many-cores. We assess the costs of enabling multi-level parallelization and demonstrate that our techniques allow high degrees of parallelism to be extracted.
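
The two-level decomposition described in the abstract can be pictured with a small host-side sketch. This is only an illustration of the structure (an outer level handing coarse-grained blocks to clusters, an inner level splitting each block among the processors of that cluster); it is not the runtime presented in the paper. The constants `NUM_CLUSTERS` and `CORES_PER_CLUSTER`, the helper names, and the sum-of-squares workload are all hypothetical, and Python threads are used purely to show the nesting, not to model the performance of an embedded many-core.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_CLUSTERS = 4        # hypothetical number of clusters (assumption)
CORES_PER_CLUSTER = 4   # hypothetical number of processors per cluster (assumption)


def split(items, parts):
    """Split a list into `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def fine_grained_work(chunk):
    """Inner level: the work one processor performs on its share."""
    return sum(x * x for x in chunk)


def coarse_grained_task(block):
    """Outer level: one cluster splits its block among its own processors."""
    with ThreadPoolExecutor(max_workers=CORES_PER_CLUSTER) as cores:
        return sum(cores.map(fine_grained_work, split(block, CORES_PER_CLUSTER)))


def nested_sum_of_squares(data):
    """Distribute blocks to clusters, then let each cluster parallelize again."""
    with ThreadPoolExecutor(max_workers=NUM_CLUSTERS) as clusters:
        return sum(clusters.map(coarse_grained_task, split(data, NUM_CLUSTERS)))
```

Calling `nested_sum_of_squares(list(range(100)))` gives the same result as the serial `sum(x * x for x in range(100))`; only the way the work is partitioned differs. Each outer worker creates its own inner pool, which mirrors the idea that the second level of parallelism stays local to a cluster.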

Fast and lightweight support for nested parallelism on cluster-based embedded many-cores / Marongiu, Andrea; Burgio, Paolo; Benini, Luca. - PRINT. - (2012), pp. 105-110. (Paper presented at the Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012, held in Dresden, 12-16 March 2012) [10.1109/DATE.2012.6176441].

Fast and lightweight support for nested parallelism on cluster-based embedded many-cores

MARONGIU, ANDREA; BURGIO, PAOLO
2012

Year: 2012
Conference: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012
Location: Dresden
Dates: 12-16 March 2012
First page: 105
Last page: 110
Authors: Marongiu, Andrea; Burgio, Paolo; Benini, Luca

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1171852