High performance and extreme energy efficiency are strong requirements for a fast-growing number of edge-node Internet of Things (IoT) applications. While traditional Ultra-Low-Power designs rely on single-core micro-controllers (MCU), a new generation of architectures leveraging fully programmable tightly-coupled clusters of near-threshold processors is emerging, joining the performance gain of parallel execution over multiple cores with the energy efficiency of low-voltage operation. In this work we tackle one of the most critical energy-efficiency bottlenecks for these architectures: instruction memory hierarchy. Exploiting the instruction locality typical of data-parallel applications, we explore two different shared instruction cache architectures, based on energy-efficient latch-based memory banks: one leveraging a crossbar between processors and single-port banks (SP), and one leveraging banks with multiple read ports (MP). We evaluate the proposed architectures on a set of signal processing applications with different executable sizes and working-sets. The results show that the shared cache architectures are able to efficiently execute a much wider set of applications (including those featuring large memory footprint and irregular access patterns) with a much smaller area and with much better energy efficiency with respect to the private cache. The multi-port cache is suitable for sizes up to a few kB, improving performance by up to 40%, energy efficiency by up to 20%, and energy × area efficiency by up to 30% with respect to the private cache. The single-port solution is more suitable for larger cache sizes (up to 16 kB), providing up to 20% better energy × area efficiency than the multi-port, and up to 30% better energy efficiency than private cache.

The Quest for Energy-Efficient I$ Design in Ultra-Low-Power Clustered Many-Cores / Loi, Igor; Capotondi, Alessandro; Rossi, Davide; Marongiu, Andrea; Benini, Luca. - In: IEEE TRANSACTIONS ON MULTI-SCALE COMPUTING SYSTEMS. - ISSN 2332-7766. - ELETTRONICO. - 4:2(2018), pp. 99-112. [10.1109/TMSCS.2017.2769046]

The Quest for Energy-Efficient I$ Design in Ultra-Low-Power Clustered Many-Cores

Capotondi, Alessandro;Marongiu, Andrea;
2018

Abstract

High performance and extreme energy efficiency are strong requirements for a fast-growing number of edge-node Internet of Things (IoT) applications. While traditional Ultra-Low-Power designs rely on single-core micro-controllers (MCU), a new generation of architectures leveraging fully programmable tightly-coupled clusters of near-threshold processors is emerging, joining the performance gain of parallel execution over multiple cores with the energy efficiency of low-voltage operation. In this work we tackle one of the most critical energy-efficiency bottlenecks for these architectures: instruction memory hierarchy. Exploiting the instruction locality typical of data-parallel applications, we explore two different shared instruction cache architectures, based on energy-efficient latch-based memory banks: one leveraging a crossbar between processors and single-port banks (SP), and one leveraging banks with multiple read ports (MP). We evaluate the proposed architectures on a set of signal processing applications with different executable sizes and working-sets. The results show that the shared cache architectures are able to efficiently execute a much wider set of applications (including those featuring large memory footprint and irregular access patterns) with a much smaller area and with much better energy efficiency with respect to the private cache. The multi-port cache is suitable for sizes up to a few kB, improving performance by up to 40%, energy efficiency by up to 20%, and energy × area efficiency by up to 30% with respect to the private cache. The single-port solution is more suitable for larger cache sizes (up to 16 kB), providing up to 20% better energy × area efficiency than the multi-port, and up to 30% better energy efficiency than private cache.
2018
4
2
99
112
The Quest for Energy-Efficient I$ Design in Ultra-Low-Power Clustered Many-Cores / Loi, Igor; Capotondi, Alessandro; Rossi, Davide; Marongiu, Andrea; Benini, Luca. - In: IEEE TRANSACTIONS ON MULTI-SCALE COMPUTING SYSTEMS. - ISSN 2332-7766. - ELETTRONICO. - 4:2(2018), pp. 99-112. [10.1109/TMSCS.2017.2769046]
Loi, Igor; Capotondi, Alessandro; Rossi, Davide; Marongiu, Andrea; Benini, Luca
File in questo prodotto:
File Dimensione Formato  
08094020.pdf

Accesso riservato

Tipologia: Versione pubblicata dall'editore
Dimensione 3.39 MB
Formato Adobe PDF
3.39 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
POSTPRINT_capotondi_TMSCS18.pdf

Open access

Dimensione 4.97 MB
Formato Adobe PDF
4.97 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

Licenza Creative Commons
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11380/1171834
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 13
  • ???jsp.display-item.citation.isi??? 10
social impact