Recent applications in low-power (1-20 mW) near-sensor computing require the adoption of floating-point arithmetic to reconcile high precision results with a wide dynamic range. In this article, we propose a low-power multi-core computing cluster that leverages the fined-grained tunable principles of transprecision computing to provide support to near-sensor applications at a minimum power budget. Our solution - based on the open-source RISC-V architecture - combines parallelization and sub-word vectorization with a dedicated interconnect design capable of sharing floating-point units (FPUs) among the cores. On top of this architecture, we provide a full-fledged software stack support, including a parallel low-level runtime, a compilation toolchain, and a high-level programming model, with the aim to support the development of end-to-end applications. We performed an exhaustive exploration of the design space of the transprecision cluster on a cycle-accurate FPGA emulator, varying the number of cores and FPUs to maximize performance. Orthogonally, we performed a vertical exploration to identify the most efficient solutions in terms of non-functional requirements (operating frequency, power, and area). We conducted an experimental assessment on a set of benchmarks representative of the near-sensor processing domain, complementing the timing results with a post place - route analysis of the power consumption. A comparison with the state-of-the-art shows that our solution outperforms the competitors in energy efficiency, reaching a peak of 97 Gflop/s/W on single-precision scalars and 162 Gflop/s/W on half-precision vectors. Finally, a real-life use case demonstrates the effectiveness of our approach in fulfilling accuracy constraints.

A Low-Power Transprecision Floating-Point Cluster for Efficient Near-Sensor Data Analytics / Montagna, F.; Mach, S.; Benatti, S.; Garofalo, A.; Ottavi, G.; Benini, L.; Rossi, D.; Tagliavini, G.. - In: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS. - ISSN 1045-9219. - 33:5(2022), pp. 1038-1053. [10.1109/TPDS.2021.3101764]

A Low-Power Transprecision Floating-Point Cluster for Efficient Near-Sensor Data Analytics

Benatti S.;
2022

Abstract

Recent applications in low-power (1-20 mW) near-sensor computing require the adoption of floating-point arithmetic to reconcile high precision results with a wide dynamic range. In this article, we propose a low-power multi-core computing cluster that leverages the fined-grained tunable principles of transprecision computing to provide support to near-sensor applications at a minimum power budget. Our solution - based on the open-source RISC-V architecture - combines parallelization and sub-word vectorization with a dedicated interconnect design capable of sharing floating-point units (FPUs) among the cores. On top of this architecture, we provide a full-fledged software stack support, including a parallel low-level runtime, a compilation toolchain, and a high-level programming model, with the aim to support the development of end-to-end applications. We performed an exhaustive exploration of the design space of the transprecision cluster on a cycle-accurate FPGA emulator, varying the number of cores and FPUs to maximize performance. Orthogonally, we performed a vertical exploration to identify the most efficient solutions in terms of non-functional requirements (operating frequency, power, and area). We conducted an experimental assessment on a set of benchmarks representative of the near-sensor processing domain, complementing the timing results with a post place - route analysis of the power consumption. A comparison with the state-of-the-art shows that our solution outperforms the competitors in energy efficiency, reaching a peak of 97 Gflop/s/W on single-precision scalars and 162 Gflop/s/W on half-precision vectors. Finally, a real-life use case demonstrates the effectiveness of our approach in fulfilling accuracy constraints.
2022
33
5
1038
1053
A Low-Power Transprecision Floating-Point Cluster for Efficient Near-Sensor Data Analytics / Montagna, F.; Mach, S.; Benatti, S.; Garofalo, A.; Ottavi, G.; Benini, L.; Rossi, D.; Tagliavini, G.. - In: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS. - ISSN 1045-9219. - 33:5(2022), pp. 1038-1053. [10.1109/TPDS.2021.3101764]
Montagna, F.; Mach, S.; Benatti, S.; Garofalo, A.; Ottavi, G.; Benini, L.; Rossi, D.; Tagliavini, G.
File in questo prodotto:
File Dimensione Formato  
2008.12243.pdf

Open access

Tipologia: Versione dell'autore revisionata e accettata per la pubblicazione
Dimensione 10.91 MB
Formato Adobe PDF
10.91 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

Licenza Creative Commons
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11380/1264922
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 12
  • ???jsp.display-item.citation.isi??? 10
social impact