Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine

Shared virtual memory (SVM) is key in heterogeneous systems on chip (SoCs), which combine a general-purpose host processor with a many-core accelerator, both for programmability and to avoid data duplication. However, SVM can incur a significant run-time overhead when translation lookaside buffer (TLB) entries are missing. Moreover, allowing DMA burst transfers to write to SVM traditionally requires buffers to absorb transfers that miss in the TLB. These buffers have to be overprovisioned for the maximum burst size, wasting precious on-chip memory, and they stall all SVM accesses once they are full, hampering the scalability of parallel accelerators. In this work, we present our SVM solution that avoids the majority of TLB misses with prefetching, supports parallel burst DMA transfers without additional buffers, and can be scaled with the workload and the number of parallel processors. Our solution is based on three novel concepts. To minimize the rate of TLB misses, the TLB is proactively filled by compiler-generated Prefetching Helper Threads, which use run-time information to issue timely prefetches. To reduce the latency of TLB misses, misses are handled by a variable number of parallel Miss Handling Helper Threads. To support parallel burst DMA transfers to SVM without additional buffers, we add lightweight hardware to a standard DMA engine to detect and react to TLB misses. Compared to the state of the art, our work improves accelerator performance for memory-intensive kernels by up to 4× for irregular memory access patterns and by up to 60% for regular ones.

Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine / Kurth, A.; Vogel, P.; Marongiu, A.; Benini, L. - (2018), pp. 292-300. (Paper presented at the 36th International Conference on Computer Design, ICCD 2018, held at Holiday Inn Orlando - Disney Springs Area, USA, in 2018) [10.1109/ICCD.2018.00052].

Kurth A.; Vogel P.; Marongiu A.; Benini L.

2018

Year of publication: 2018
Date of first publication: 2018
Conference title: 36th International Conference on Computer Design, ICCD 2018
Conference location: Holiday Inn Orlando - Disney Springs Area, USA
Conference date: 2018
DOI: https://dx.doi.org/10.1109/ICCD.2018.00052
WoS code: WOS:000458293200041
Scopus code: 2-s2.0-85062215151
First page: 292
Last page: 300
All authors: Kurth, A.; Vogel, P.; Marongiu, A.; Benini, L.
Type: Conference proceedings paper

Files in this item:

File	Access	Version	Size	Format
kurth_ICCD2018.pdf	Restricted access	Main article, version published by the publisher	366.81 kB	Adobe PDF
svm-prefetch-dma-paper.pdf	Open access	Author's version, revised and accepted for publication	2.66 MB	Adobe PDF

The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, contact IRIS Support.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1179001
