Several chip-multiprocessor (CMP) designs today use tightly coupled computing clusters as a building block. These clusters consist of a fairly large number N of simple cores, with fast communication through a shared multibanked L1 data memory and approximately 1 instruction per cycle (IPC) per core. The aggregate instruction-fetch (I-fetch) bandwidth therefore approaches ƒ * N, where ƒ is the cluster clock frequency. An effective instruction cache architecture is key to supporting this I-fetch bandwidth. In this paper we compare two main instruction caching architectures for tightly coupled CMP clusters: (i) private instruction caches per core and (ii) a shared instruction cache per cluster. We developed a cycle-accurate model of the tightly coupled cluster with several configurable architectural parameters for exploration, plus a programming environment targeted at efficient data-parallel computing. We conduct an in-depth study of the two architectural templates using both synthetic microbenchmarks and real program workloads. Our results provide useful insights and guidelines for designers.
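As a rough illustration of the bandwidth estimate in the abstract, the minimal sketch below computes the aggregate instruction-fetch demand as ƒ * N * IPC. The function name and the example values (16 cores, 500 MHz) are assumptions chosen for illustration only; they do not come from the paper.

```python
def aggregate_fetch_rate(num_cores: int, clock_hz: float, ipc: float = 1.0) -> float:
    """Return the aggregate instruction-fetch demand in instructions per second.

    With an IPC close to 1 per core, the result approaches clock_hz * num_cores.
    """
    if num_cores <= 0 or clock_hz <= 0 or ipc < 0:
        raise ValueError("num_cores and clock_hz must be positive; ipc must be non-negative")
    return num_cores * clock_hz * ipc


# Hypothetical example values, not taken from the paper.
rate = aggregate_fetch_rate(num_cores=16, clock_hz=500e6)
print(f"{rate / 1e9:.1f} G instructions/s")
```

Under these assumptions the fetch demand grows linearly with the number of cores, which is why the choice between private and shared instruction caches matters at the cluster level.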
Exploring Instruction caching strategies for tightly-coupled shared-memory clusters / Bortolotti, D.; Paterna, F.; Pinto, C.; Marongiu, A.; Ruggiero, M.; Benini, L. - Electronic. - (2011), pp. 34-41. (Paper presented at the 2011 International Symposium on System on Chip (SoC), held in Tampere, Finland, Oct. 31 - Nov. 2, 2011) [10.1109/ISSOC.2011.6089691].
Exploring Instruction caching strategies for tightly-coupled shared-memory clusters
A. Marongiu
2011
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.