Manufacturing and environmental variations cause timing errors in microelectronic processors that are typically avoided by ultra-conservative multi-corner design margins or corrected by error detection and recovery mechanisms at the circuit-level. In contrast, we present here runtime software support for cost-effective countermeasures against hardware timing failures during system operation. We propose a variability-aware OpenMP (VOMP) programming environment, suitable for tightly-coupled shared memory processor clusters, that relies upon modeling across the hardware/software interface. VOMP is implemented as an extension to the OpenMP v3.0 programming model that covers various parallel constructs, including , , and . Using the notion of work-unit vulnerability (WUV) proposed here, we capture timing errors caused by circuit-level variability as high-level software knowledge. WUV consists of descriptive metadata to characterize the impact of variability on different work-unit types running on various cores. As such, WUV provides a useful abstraction of hardware variability to efficiently allocate a given work-unit to a suitable core for execution. VOMP enables hardware/software collaboration with online variability monitors in hardware and runtime scheduling in software. The hardware provides online per-core characterization of WUV metadata. This metadata is made available by carefully placing key data structures in a shared L1 memory and is used by VOMP schedulerss. Our results show that VOMP greatly reduces the cost of timing error recovery compared to the baseline schedulers of OpenMP, yielding speedup of 3%–36% for tasks, and 26%–49% for sections. Further, VOMP reaches energy saving of 2%–46% and 15%–50% for tasks, and sections, respectively.

Improving resilience to timing errors by exposing variability effects to software in tightly-coupled processor clusters / Rahimi, Abbas; Cesarini, Daniele; Marongiu, Andrea; Gupta Rajesh, K.; Benini, Luca. - In: IEEE JOURNAL OF EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS. - ISSN 2156-3357. - STAMPA. - 4:2(2014), pp. 216-229. [10.1109/JETCAS.2014.2315883]

Improving resilience to timing errors by exposing variability effects to software in tightly-coupled processor clusters

Marongiu Andrea;
2014

Abstract

Manufacturing and environmental variations cause timing errors in microelectronic processors that are typically avoided by ultra-conservative multi-corner design margins or corrected by error detection and recovery mechanisms at the circuit-level. In contrast, we present here runtime software support for cost-effective countermeasures against hardware timing failures during system operation. We propose a variability-aware OpenMP (VOMP) programming environment, suitable for tightly-coupled shared memory processor clusters, that relies upon modeling across the hardware/software interface. VOMP is implemented as an extension to the OpenMP v3.0 programming model that covers various parallel constructs, including , , and . Using the notion of work-unit vulnerability (WUV) proposed here, we capture timing errors caused by circuit-level variability as high-level software knowledge. WUV consists of descriptive metadata to characterize the impact of variability on different work-unit types running on various cores. As such, WUV provides a useful abstraction of hardware variability to efficiently allocate a given work-unit to a suitable core for execution. VOMP enables hardware/software collaboration with online variability monitors in hardware and runtime scheduling in software. The hardware provides online per-core characterization of WUV metadata. This metadata is made available by carefully placing key data structures in a shared L1 memory and is used by VOMP schedulerss. Our results show that VOMP greatly reduces the cost of timing error recovery compared to the baseline schedulers of OpenMP, yielding speedup of 3%–36% for tasks, and 26%–49% for sections. Further, VOMP reaches energy saving of 2%–46% and 15%–50% for tasks, and sections, respectively.
2014
4
2
216
229
Improving resilience to timing errors by exposing variability effects to software in tightly-coupled processor clusters / Rahimi, Abbas; Cesarini, Daniele; Marongiu, Andrea; Gupta Rajesh, K.; Benini, Luca. - In: IEEE JOURNAL OF EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS. - ISSN 2156-3357. - STAMPA. - 4:2(2014), pp. 216-229. [10.1109/JETCAS.2014.2315883]
Rahimi, Abbas; Cesarini, Daniele; Marongiu, Andrea; Gupta Rajesh, K.; Benini, Luca
File in questo prodotto:
File Dimensione Formato  
Improving Resilience to Timing Errors by Exposing Variability Effects to Software in Tightly-Coupled Processor Clusters.pdf

Accesso riservato

Dimensione 2.19 MB
Formato Adobe PDF
2.19 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

Licenza Creative Commons
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11380/1171866
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 9
  • ???jsp.display-item.citation.isi??? 6
social impact