Manufacturing and environmental variations cause timing errors in microelectronic processors. These errors are typically avoided with ultra-conservative multi-corner design margins or corrected at the circuit level by error detection and recovery mechanisms. In contrast, we present runtime software support for cost-effective countermeasures against hardware timing failures during system operation. We propose a variability-aware OpenMP (VOMP) programming environment, suitable for tightly-coupled shared-memory processor clusters, that relies on modeling across the hardware/software interface. VOMP is implemented as an extension to the OpenMP v3.0 programming model and covers various parallel constructs, including the task, sections, and for constructs. Using the notion of work-unit vulnerability (WUV) proposed here, we capture timing errors caused by circuit-level variability as high-level software knowledge. WUV consists of descriptive metadata that characterize the impact of variability on different work-unit types running on different cores. WUV thus provides a useful abstraction of hardware variability for efficiently allocating a given work-unit to a suitable core for execution. VOMP enables hardware/software collaboration between online variability monitors in hardware and runtime scheduling in software. The hardware provides online per-core characterization of the WUV metadata, which is made available to the VOMP schedulers by carefully placing key data structures in a shared L1 memory. Our results show that VOMP greatly reduces the cost of timing error recovery compared to the baseline OpenMP schedulers, yielding speedups of 3%–36% for tasks and 26%–49% for sections. Further, VOMP achieves energy savings of 2%–46% for tasks and 15%–50% for sections.
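The allocation idea behind WUV can be illustrated with a small sketch. It assumes, purely for illustration, that the WUV metadata is reduced to one number per (core, work-unit type) pair, for example the expected timing-error recovery cost, and that the scheduler simply sends each work-unit to the idle core with the lowest value. The table values and the names `WUV`, `pick_core`, and `dispatch` are hypothetical; VOMP's actual metadata layout and scheduling policy are described in the full paper and are not reproduced here.

```python
from typing import List, Optional

# Hypothetical WUV table: WUV[core][work_type] is an illustrative cost
# (e.g. expected recovery cycles) of running that work-unit type on that core.
# In VOMP this metadata is characterized online by hardware monitors and kept
# in shared L1 memory; here it is just a static list for demonstration.
WUV: List[List[float]] = [
    [10, 200, 40],   # core 0
    [50, 20, 40],    # core 1
    [5, 300, 100],   # core 2
    [80, 60, 10],    # core 3
]


def pick_core(wuv: List[List[float]], work_type: int, idle: List[bool]) -> Optional[int]:
    """Return the idle core with the lowest WUV for work_type, or None if all are busy."""
    candidates = [c for c in range(len(wuv)) if idle[c]]
    if not candidates:
        return None
    # min() keeps the first core on ties, so the choice is deterministic.
    return min(candidates, key=lambda c: wuv[c][work_type])


def dispatch(wuv: List[List[float]], queue: List[int]) -> List[Optional[int]]:
    """Greedily map each queued work-unit type to a core; None means it must wait."""
    idle = [True] * len(wuv)
    assignment: List[Optional[int]] = []
    for work_type in queue:
        core = pick_core(wuv, work_type, idle)
        if core is not None:
            idle[core] = False  # the chosen core is now busy
        assignment.append(core)
    return assignment


if __name__ == "__main__":
    print(dispatch(WUV, [1, 0, 2]))
```

With the table above, `dispatch(WUV, [1, 0, 2])` places the type-1 work-unit on core 1, the type-0 work-unit on core 2, and the type-2 work-unit on core 3. A greedy, lowest-cost-first choice is only one possible policy; a real runtime would also weigh load balance and queueing delay.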
|Publication date:||2014|
|Title:||Improving resilience to timing errors by exposing variability effects to software in tightly-coupled processor clusters|
|Author(s):||Rahimi, Abbas; Cesarini, Daniele; Marongiu, Andrea; Gupta Rajesh, K.; Benini, Luca|
|Digital Object Identifier (DOI):||10.1109/JETCAS.2014.2315883|
|Scopus identifier:||2-s2.0-84903268562|
|Citation:||Improving resilience to timing errors by exposing variability effects to software in tightly-coupled processor clusters / Rahimi, Abbas; Cesarini, Daniele; Marongiu, Andrea; Gupta Rajesh, K.; Benini, Luca. - In: IEEE JOURNAL OF EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS. - ISSN 2156-3357. - PRINT. - 4:2(2014), pp. 216-229.|
|Type:||Journal article|
Documents in Iris Unimore are released under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Italy license, unless otherwise indicated.