Modern embedded Systems-on-Chip (SoCs) integrate a large number of parallel processing units -- such as multicore CPUs and massively parallel accelerators -- that share certain hardware resources. While this high level of parallelism enhances peak performance and energy efficiency, it also introduces contention, as multiple compute units compete for these shared resources. Shared memory, in particular, becomes a major bottleneck due to its limited bandwidth, leading to severe contention that degrades performance and undermines predictability. These effects are especially problematic in mixed-criticality systems and time-sensitive applications, where bounded latency and timing guarantees are essential. Understanding the sources of memory interference and developing effective mitigation mechanisms are therefore key to improving isolation and timing predictability. This thesis tackles the problem of memory bandwidth contention in multicore and heterogeneous embedded platforms by combining a detailed analysis of interference phenomena with the design of novel mitigation strategies. It begins with a systematic characterization of memory contention in multicore platforms, analyzing how interference arises across different levels of the shared memory hierarchy -- including DRAM, caches, and microarchitectural components. Experiments on widely adopted commercial platforms such as the Xilinx UltraScale+ and the NVIDIA Xavier AGX reveal how interference at each level impacts performance differently, highlighting the main contention points that limit scalability. Building on these insights, the thesis examines Memory Bandwidth Management Schemes (MBMSs) -- software mechanisms designed to mitigate memory interference among parallel compute units. In dynamic mixed-criticality systems, where tasks frequently alternate across compute units, MBMSs mitigate interference by periodically reconfiguring per-unit memory bandwidth thresholds and regulating memory accesses accordingly.
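The regulate-and-replenish mechanism described above can be pictured as a token-bucket model: each compute unit receives a per-period budget of memory accesses and stalls once the budget is exhausted, while the MBMS periodically adjusts the thresholds. This is a minimal illustrative sketch, not the thesis's implementation -- all names and numbers are assumptions (real schemes typically program PMU overflow interrupts on the hardware counters):

```python
# Token-bucket model of per-core memory bandwidth regulation.
# Each core may issue at most `budget` memory accesses per regulation
# period; further accesses stall until the budget is replenished.

def periods_to_finish(total_accesses: int, budget_per_period: int) -> int:
    """Regulation periods a core needs to issue all of its accesses."""
    # Ceiling division: a final partial period still counts as one.
    return -(-total_accesses // budget_per_period)

def reconfigure(budgets: dict, unit: str, new_budget: int) -> dict:
    """Model one reconfiguration step: return an updated threshold map."""
    updated = dict(budgets)
    updated[unit] = new_budget
    return updated

budgets = {"core0": 400, "core1": 100}   # accesses allowed per period
# A critical task migrates to core1, so the MBMS raises its threshold.
budgets = reconfigure(budgets, "core1", 300)

print(periods_to_finish(1200, budgets["core0"]))  # core0 -> 3 periods
print(periods_to_finish(1200, budgets["core1"]))  # core1 -> 4 periods
```

The same workload finishes sooner or later purely as a function of the threshold it is granted, which is why threshold reconfiguration is the central lever of an MBMS.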
The effectiveness of an MBMS strongly depends on the granularity of reconfiguration, a factor often overlooked in prior work. This thesis provides an in-depth analysis of the bandwidth reconfiguration process, comparing two common approaches: synchronous and asynchronous. Experiments on a Xilinx UltraScale+ platform show that asynchronous reconfiguration significantly improves system responsiveness over the synchronous method. The analysis is then extended to heterogeneous NVIDIA platforms, which integrate GPUs as their primary hardware accelerators. The thesis proposes a novel MBMS to protect CPU clusters from GPU-induced memory interference. The approach leverages CUDA Green Contexts -- a built-in NVIDIA feature -- making it applicable across all supported architectures and enabling fully software-driven, dynamic control of GPU memory bandwidth. Experiments on an NVIDIA Orin AGX platform demonstrate that this method effectively reduces GPU interference on CPU cores. Overall, this thesis advances the understanding of memory interference phenomena in both homogeneous and heterogeneous SoCs and proposes adaptive bandwidth control techniques that enhance performance predictability. The results provide a foundation for designing more efficient and QoS-aware embedded architectures through coordinated hardware-software memory management.
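One way to picture the synchronous/asynchronous distinction studied here: under synchronous reconfiguration a new threshold takes effect only at the next period boundary, whereas asynchronous reconfiguration notifies the regulator immediately (e.g., via an inter-processor interrupt). The toy latency model below is an assumption for illustration only -- the period length and interrupt cost are made-up numbers, not the thesis's measurements:

```python
# Toy latency model for applying a new bandwidth threshold.
# - synchronous: the update waits out the remainder of the current period;
# - asynchronous: the update costs only the notification itself.

PERIOD_US = 1000  # regulation period in microseconds (illustrative)

def sync_latency(request_time_us: int) -> int:
    """Delay until a synchronous reconfiguration takes effect."""
    elapsed_in_period = request_time_us % PERIOD_US
    # Wait for the next period boundary (zero if already on one).
    return (PERIOD_US - elapsed_in_period) % PERIOD_US

def async_latency(ipi_cost_us: int = 5) -> int:
    """Asynchronous reconfiguration pays a fixed notification cost."""
    return ipi_cost_us

print(sync_latency(250))  # request 250 us into a period -> waits 750 us
print(async_latency())    # -> 5 us, independent of arrival time
```

In this model the synchronous delay grows with how early in the period the request arrives, while the asynchronous delay is constant, which matches the intuition behind the responsiveness gap the abstract reports.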


Demystifying Memory Interference: Analysis and Mitigation of the Memory Contention Problem in Heterogeneous Systems-on-Chip / Andrea Serafini, 2026 Mar 31. 38th cycle, Academic Year 2024/2025.


SERAFINI, ANDREA
2026

Demystifying Memory Interference: Analysis and Mitigation of the Memory Contention Problem in Heterogeneous Systems-on-Chip
31-Mar-2026
VALENTE, Paolo
MARONGIU, ANDREA
Files in this item:
Serafini.pdf (doctoral thesis, open access, Adobe PDF, 3.23 MB)

Creative Commons License
Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, contact Iris Support.

Use this identifier to cite or link to this item: https://hdl.handle.net/11380/1403093