Reconciling QoS and Concurrency in NVIDIA GPUs via Warp-Level Scheduling

Capodieci N.; Marongiu A.
2022

Abstract

The widespread deployment of NVIDIA GPUs in latency-sensitive systems today requires predictable GPU multi-tasking, which cannot be trivially achieved. The NVIDIA CUDA API allows programmers to easily exploit the processing power provided by these massively parallel accelerators and is one of the major reasons behind their ubiquity. However, NVIDIA GPUs and the CUDA programming model favor throughput instead of latency and timing predictability. Hence, providing real-time and quality-of-service (QoS) properties to GPU applications presents an interesting research challenge. Such a challenge is paramount when considering simultaneous multikernel (SMK) scenarios, wherein kernels are executed concurrently within each streaming multiprocessor (SM). In this work, we explore QoS-based fine-grained multitasking in SMK via job arbitration at the lowest level of the GPU scheduling hierarchy, i.e., between warps. We present QoS-aware warp scheduling (QAWS) and evaluate it against state-of-the-art, kernel-agnostic policies seen in NVIDIA hardware today. Since the NVIDIA ecosystem lacks a mechanism to specify and enforce kernel priority at the warp granularity, we implement and evaluate our proposed warp scheduling policy on GPGPU-Sim. QAWS not only improves the response time of the higher priority tasks but also has comparable or better throughput than the state-of-the-art policies.
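Publication year: 2022
Conference: 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022
Conference location: bel
Conference year: 2022
First page: 1275
Last page: 1280
Authors: Singh, J.; Olmedo, I. S.; Capodieci, N.; Marongiu, A.; Caccamo, M.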
Reconciling QoS and Concurrency in NVIDIA GPUs via Warp-Level Scheduling / Singh, J.; Olmedo, I. S.; Capodieci, N.; Marongiu, A.; Caccamo, M. - (2022), pp. 1275-1280. (Paper presented at the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022, held in bel in 2022) [10.23919/DATE54114.2022.9774761].
Files in this item:

Reconciling_QoS_and_Concurrency_in_NVIDIA_GPUs_via_Warp-Level_Scheduling.pdf
Access: Restricted
Type: Publisher's published version
Size: 517.18 kB
Format: Adobe PDF

QoS_Warp_scheduling_designDRAFT.pdf
Access: Open access
Description: Draft pre-print
Type: Author's version, revised and accepted for publication
Size: 754.56 kB
Format: Adobe PDF
Creative Commons License
Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1281886