Reconciling QoS and Concurrency in NVIDIA GPUs via Warp-Level Scheduling

The widespread deployment of NVIDIA GPUs in latency-sensitive systems today requires predictable GPU multitasking, which cannot be trivially achieved. The NVIDIA CUDA API allows programmers to easily exploit the processing power of these massively parallel accelerators and is one of the major reasons behind their ubiquity. However, NVIDIA GPUs and the CUDA programming model favor throughput over latency and timing predictability. Hence, providing real-time and quality-of-service (QoS) properties to GPU applications presents an interesting research challenge. The challenge is most pressing in simultaneous multikernel (SMK) scenarios, wherein kernels execute concurrently within each streaming multiprocessor (SM). In this work, we explore QoS-based fine-grained multitasking in SMK via job arbitration at the lowest level of the GPU scheduling hierarchy, i.e., between warps. We present QoS-aware warp scheduling (QAWS) and evaluate it against state-of-the-art, kernel-agnostic policies seen in NVIDIA hardware today. Since the NVIDIA ecosystem lacks a mechanism to specify and enforce kernel priority at warp granularity, we implement and evaluate our proposed warp scheduling policy on GPGPU-Sim. QAWS not only improves the response time of higher-priority tasks but also achieves throughput comparable to or better than that of the state-of-the-art policies.
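The abstract contrasts kernel-agnostic warp scheduling with priority-aware arbitration between warps of concurrently resident kernels. The Python sketch below is a toy illustration of that contrast only; it is not the QAWS policy from the paper, whose details are not given in this record. It assumes a hypothetical model in which every warp is always ready and issues one instruction per cycle, it uses loose round-robin (LRR) as an example of a kernel-agnostic baseline (an assumption, not necessarily one of the policies the authors evaluated), and all names (`Warp`, `pick_lrr`, `pick_priority`, `simulate`, `PRIORITY`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Warp:
    kernel: str     # kernel this warp belongs to
    remaining: int  # instructions left to issue (hypothetical unit-latency model)


def pick_lrr(warps, last):
    """Kernel-agnostic loose round-robin: first unfinished warp after `last`."""
    n = len(warps)
    for step in range(1, n + 1):
        i = (last + step) % n
        if warps[i].remaining > 0:
            return i
    return None


def pick_priority(warps, last, priority):
    """Hypothetical priority-first selection: only warps of the highest-priority
    kernel that still has work are eligible; round-robin among them."""
    live = [w.kernel for w in warps if w.remaining > 0]
    if not live:
        return None
    top = max(priority[k] for k in live)
    n = len(warps)
    for step in range(1, n + 1):
        i = (last + step) % n
        w = warps[i]
        if w.remaining > 0 and priority[w.kernel] == top:
            return i
    return None


def simulate(warps, pick):
    """Issue one instruction per cycle; return the cycle at which each kernel completes."""
    last, cycle, done = -1, 0, {}
    while True:
        i = pick(warps, last)
        if i is None:
            return done
        cycle += 1
        warps[i].remaining -= 1
        last = i
        k = warps[i].kernel
        if all(w.remaining == 0 for w in warps if w.kernel == k):
            done[k] = cycle


def make_warps():
    # Hypothetical workload: 6 low-priority warps resident before 2 high-priority warps.
    return [Warp("lp", 4) for _ in range(6)] + [Warp("hp", 4) for _ in range(2)]


PRIORITY = {"hp": 1, "lp": 0}  # hypothetical per-kernel priorities

lrr = simulate(make_warps(), pick_lrr)
qos = simulate(make_warps(), lambda ws, last: pick_priority(ws, last, PRIORITY))
print(lrr)  # {'lp': 30, 'hp': 32}
print(qos)  # {'hp': 8, 'lp': 32}
```

Because every warp in this toy model is always ready, both policies finish all 32 instructions at cycle 32; the only difference is when the high-priority kernel completes (cycle 32 under LRR versus cycle 8 under priority-first selection). Real warp schedulers must also hide memory and pipeline latency, which this sketch deliberately ignores.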

Reconciling QoS and Concurrency in NVIDIA GPUs via Warp-Level Scheduling / Singh, J.; Olmedo, I. S.; Capodieci, N.; Marongiu, A.; Caccamo, M. - (2022), pp. 1275-1280. (Paper presented at the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022, held in Belgium in 2022) [10.23919/DATE54114.2022.9774761].


Singh J.; Olmedo I. S.; Capodieci N.; Marongiu A.; Caccamo M.

2022



Year of publication: 2022
Conference title: 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022
Conference location: Belgium
Conference date: 2022
DOI: https://dx.doi.org/10.23919/DATE54114.2022.9774761
WoS ID: WOS:000819484300240
Scopus ID: 2-s2.0-85130828719
Series: PROCEEDINGS - DESIGN, AUTOMATION, AND TEST IN EUROPE CONFERENCE AND EXHIBITION
First page: 1275
Last page: 1280
All authors: Singh, J.; Olmedo, I. S.; Capodieci, N.; Marongiu, A.; Caccamo, M.
Type: Paper in conference proceedings

Files in this item:

File	Size	Format
Reconciling_QoS_and_Concurrency_in_NVIDIA_GPUs_via_Warp-Level_Scheduling.pdf (restricted access; type: VOR, version of record published by the publisher)	517.18 kB	Adobe PDF
QoS_Warp_scheduling_designDRAFT.pdf (open access; description: draft preprint; type: AAM, author's version revised and accepted for publication)	754.56 kB	Adobe PDF


The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1281886
