The ever-increasing need for computational power in embedded devices has led to the adoption of heterogeneous SoCs combining a general-purpose CPU with a data-parallel accelerator. These systems rely on a shared main memory (DRAM), which makes them highly susceptible to memory interference. A promising software technique to counter such effects is the Predictable Execution Model (PREM). PREM ensures robustness to interference by separating programs into a sequence of memory and compute phases, and by enforcing a platform-level schedule where only a single processing subsystem is permitted to execute a memory phase at a time. This article demonstrates for the first time how PREM can be applied to heterogeneous SoCs, based on a synchronization technique for memory isolation between CPU and GPU plus a compiler to transform GPU kernels into PREM-compliant codes. For compute-bound GPU workloads sharing the DRAM bandwidth 50/50 with the CPU, we guarantee near-zero timing variability at a performance loss of just 59 percent, which is one to two orders of magnitude smaller than the worst case we see for unmodified programs under memory interference.
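The phase structure described in the abstract can be illustrated with a small simulation. The sketch below is not the article's compiler output or its CPU/GPU synchronization mechanism; it is a hypothetical Python model in which two threads stand in for the CPU and GPU subsystems, a lock stands in for the platform-level memory token, and `TILE`, `memory_phase`, `prem_task`, and `run_demo` are names invented for illustration. Each task alternates between a memory phase (copying a tile between shared and local storage while holding the token) and a compute phase (working on local data only).

```python
import threading

TILE = 4  # elements per tile; stands in for local-memory capacity (assumed value)

memory_token = threading.Lock()  # only one memory phase may run at a time
stats_lock = threading.Lock()    # protects the bookkeeping counters below
active_memory_phases = 0
max_concurrent_memory_phases = 0


def memory_phase(action):
    """Run `action` (a shared-memory transfer) while holding the token."""
    global active_memory_phases, max_concurrent_memory_phases
    with memory_token:
        with stats_lock:
            active_memory_phases += 1
            max_concurrent_memory_phases = max(
                max_concurrent_memory_phases, active_memory_phases)
        result = action()
        with stats_lock:
            active_memory_phases -= 1
    return result


def prem_task(shared_in, shared_out, compute):
    """Process shared_in tile by tile as alternating memory and compute phases."""
    for start in range(0, len(shared_in), TILE):
        # Memory phase: prefetch one tile from shared memory into a local buffer.
        local = memory_phase(lambda: list(shared_in[start:start + TILE]))
        # Compute phase: touches only the local buffer, so it needs no token.
        local = [compute(x) for x in local]

        # Memory phase: write the results back to shared memory.
        def write_back(start=start, local=local):
            shared_out[start:start + len(local)] = local

        memory_phase(write_back)


def run_demo():
    """Run a 'CPU' task and a 'GPU' task concurrently under the shared token."""
    cpu_in, gpu_in = list(range(8)), list(range(100, 112))
    cpu_out, gpu_out = [0] * len(cpu_in), [0] * len(gpu_in)
    workers = [
        threading.Thread(target=prem_task, args=(cpu_in, cpu_out, lambda x: x + 1)),
        threading.Thread(target=prem_task, args=(gpu_in, gpu_out, lambda x: x * 2)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return cpu_out, gpu_out, max_concurrent_memory_phases
```

Calling `run_demo()` returns both output arrays together with the peak number of simultaneously active memory phases, which stays at one because every shared-memory transfer is serialized by the token while compute phases are free to overlap.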
HePREM: A Predictable Execution Model for GPU-based Heterogeneous SoCs / Forsberg, B.; Benini, L.; Marongiu, A.. - In: IEEE TRANSACTIONS ON COMPUTERS. - ISSN 0018-9340. - 70:1(2021), pp. 17-29. [10.1109/TC.2020.2980520]
HePREM: A Predictable Execution Model for GPU-based Heterogeneous SoCs
Marongiu A.
2021
File | Access | Type | Size | Format
---|---|---|---|---
HePREM_A_Predictable_Execution_Model_for_GPU-based_Heterogeneous_SoCs.pdf | Restricted access | VOR - Version published by the publisher | 1.14 MB | Adobe PDF
tc.2020.2980520.pdf | Open access | AAM - Author's version, revised and accepted for publication | 1.8 MB | Adobe PDF
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, contact IRIS Support.