Several recent many-core accelerators have been architected as fabrics of tightly-coupled shared-memory clusters. A hierarchical interconnection system is used, with a crossbar-like medium inside each cluster and a network-on-chip (NoC) at the global level, which makes memory access non-uniform (NUMA). Nested parallelism is a powerful programming abstraction for these architectures: a first level of parallelism distributes coarse-grained tasks to clusters, and additional levels of fine-grained parallelism are distributed to the processors within a cluster. This paper presents lightweight and highly optimized support for nested parallelism on cluster-based embedded many-cores. We assess the cost of enabling multi-level parallelization and demonstrate that our techniques make it possible to extract high degrees of parallelism.
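As an illustration of the two-level scheme described in the abstract, the following is a minimal sketch using standard OpenMP nested parallel regions in C. It is not the runtime presented in the paper; the function name `nested_sum` and its parameters are hypothetical, and the outer and inner thread teams merely stand in for clusters and for the cores of one cluster. Without OpenMP enabled at compile time, the pragmas are ignored and the code runs sequentially.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical example: sum an array using two levels of parallelism.
 * The outer team models the clusters; each inner team models the
 * cores inside one cluster. */
long nested_sum(const int *data, size_t n, int clusters, int cores_per_cluster)
{
    assert(clusters > 0 && cores_per_cluster > 0);
    long total = 0;

#ifdef _OPENMP
    /* Allow two nested parallel regions to be active at once. */
    omp_set_max_active_levels(2);
#endif

    /* Level 1: one coarse-grained chunk of the array per "cluster". */
    #pragma omp parallel for num_threads(clusters) reduction(+:total)
    for (int c = 0; c < clusters; c++) {
        size_t begin = n * (size_t)c / (size_t)clusters;
        size_t end   = n * (size_t)(c + 1) / (size_t)clusters;
        long partial = 0;

        /* Level 2: fine-grained iterations shared among the "cores". */
        #pragma omp parallel for num_threads(cores_per_cluster) reduction(+:partial)
        for (size_t i = begin; i < end; i++)
            partial += data[i];

        total += partial;
    }
    return total;
}
```

Calling `nested_sum(data, 10, 4, 2)` on the integers 1 to 10 would request four outer threads with two inner threads each; the number of threads actually created depends on the OpenMP implementation and its settings.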

Fast and lightweight support for nested parallelism on cluster-based embedded many-cores / Marongiu, A., Burgio, P., Benini, L. - PRINT. - (2012), pp. 105-110. (15th Design, Automation and Test in Europe Conference and Exhibition, DATE 2012, Dresden, Germany, 12-16 March 2012) [10.1109/DATE.2012.6176441].

Fast and lightweight support for nested parallelism on cluster-based embedded many-cores

MARONGIU, ANDREA; BURGIO, PAOLO
2012

2012
English
15th Design, Automation and Test in Europe Conference and Exhibition, DATE 2012
Dresden, Germany
12-16 March 2012
Proceedings of Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012
105
110
5
9781457721458
IEEE Press
UNITED STATES OF AMERICA
345 E 47TH ST, NEW YORK, NY 10017 USA
MANY-CORE EMBEDDED SYSTEMS; SHARED MEMORY EMBEDDED SYSTEMS; OPENMP; PROGRAMMING MODELS; SYNCHRONIZATION; NESTED PARALLELISM; CLUSTERED ARCHITECTURES
Marongiu, Andrea; Burgio, Paolo; Benini, Luca
Conference proceedings::Paper in conference proceedings
273
3
reserved
info:eu-repo/semantics/conferenceObject
Files in this item:
date2012_nesting_CR.pdf (restricted access, 1.04 MB, Adobe PDF)
Creative Commons license
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, contact Iris Support.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1171852
Citations
  • PMC: N/A
  • Scopus: 22
  • ISI: 12