Modern embedded platforms are known to be constrained by size, weight and power (SWaP) requirements. In such contexts, achieving the desired performance-per-watt target calls for increasing the number of processors rather than ramping up their voltage and frequency. Hence, generation after generation, modern heterogeneous System on Chips (SoC) present a higher number of cores within their CPU complexes as well as a wider variety of accelerators that leverages massively parallel compute architectures. Previous literature demonstrated that while increasing parallelism is theoretically optimal for improving on average performance, shared memory hierarchies (i.e. caches and system DRAM) act as a bottleneck by exposing the platform processors to severe contention on memory accesses, hence dramatically impacting performance and timing predictability. In this work we characterize how subsequent generations of embedded platforms from the NVIDIA Tegra family balanced the increasing parallelism of each platform's processors with the consequent higher potential on memory interference. We also present an open-source software for generating test scenarios aimed at measuring memory contention in highly heterogeneous SoCs.
Contending memory in heterogeneous SoCs: Evolution in NVIDIA Tegra embedded platforms / Capodieci, N.; Cavicchioli, R.; Olmedo, I. S.; Solieri, M.; Bertogna, M.. - (2020), pp. 1-10. (Intervento presentato al convegno 26th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2020 tenutosi a kor nel 2020) [10.1109/RTCSA50079.2020.9203722].
Contending memory in heterogeneous SoCs: Evolution in NVIDIA Tegra embedded platforms
Capodieci N.;Cavicchioli R.;Solieri M.;Bertogna M.
2020
Abstract
Modern embedded platforms are known to be constrained by size, weight and power (SWaP) requirements. In such contexts, achieving the desired performance-per-watt target calls for increasing the number of processors rather than ramping up their voltage and frequency. Hence, generation after generation, modern heterogeneous System on Chips (SoC) present a higher number of cores within their CPU complexes as well as a wider variety of accelerators that leverages massively parallel compute architectures. Previous literature demonstrated that while increasing parallelism is theoretically optimal for improving on average performance, shared memory hierarchies (i.e. caches and system DRAM) act as a bottleneck by exposing the platform processors to severe contention on memory accesses, hence dramatically impacting performance and timing predictability. In this work we characterize how subsequent generations of embedded platforms from the NVIDIA Tegra family balanced the increasing parallelism of each platform's processors with the consequent higher potential on memory interference. We also present an open-source software for generating test scenarios aimed at measuring memory contention in highly heterogeneous SoCs.File | Dimensione | Formato | |
---|---|---|---|
rtcsapreprintdraft.pdf
Open access
Descrizione: Articolo principale
Tipologia:
Versione dell'autore revisionata e accettata per la pubblicazione
Dimensione
771.79 kB
Formato
Adobe PDF
|
771.79 kB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris