In recent years, image processing has been a key application area for mobile and embedded computing platforms. In this context, many-core accelerators are a viable solution for efficiently executing highly parallel kernels. However, architectural constraints impose hard limits on main memory bandwidth and call for software techniques that optimize the memory usage of complex multi-kernel applications. In this work, we propose a set of techniques, mainly based on graph analysis and image tiling, to accelerate the execution of image processing applications expressed as standard OpenVX graphs on cluster-based many-core accelerators. We have developed a run-time framework that implements these techniques using a front-end compliant with the OpenVX standard and based on an OpenCL extension that enables more explicit control and efficient reuse of on-chip memory, and greatly reduces the recourse to off-chip memory for storing intermediate results. Experiments performed on the STHORM many-core accelerator demonstrate that our approach leads to a massive reduction in execution time and bandwidth usage, even when the main memory bandwidth available to the accelerator is severely constrained.

Optimizing memory bandwidth exploitation for OpenVX applications on embedded many-core accelerators / Tagliavini, Giuseppe; Haugou, Germain; Marongiu, Andrea; Benini, Luca. - In: JOURNAL OF REAL-TIME IMAGE PROCESSING. - ISSN 1861-8200. - PRINT. - 15:1(2018), pp. 73-92. [10.1007/s11554-015-0544-0]

Optimizing memory bandwidth exploitation for OpenVX applications on embedded many-core accelerators

MARONGIU, ANDREA;
2018

Year: 2018
Date on record: 20 November 2015
Volume: 15
Issue: 1
Pages: 73-92
Tagliavini, Giuseppe; Haugou, Germain; Marongiu, Andrea; Benini, Luca
Files in this record:
tagliavini_JRTIP2015.pdf (restricted access, 2.94 MB, Adobe PDF)


Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1171835
Citations
  • PubMed Central: not available
  • Scopus: 12
  • Web of Science (ISI): 8