We consider the numerical solution on modern multicore architectures of large-scale optimization problems arising in image restoration. An efficient solution of these optimization problems is important in several areas, such as medical imaging, microscopy and astronomy, where large-scale imaging is a basic task. To face these challenging problems, a lot of effort has been put in designing effective algorithms, that have largely improved the classical optimization strategies usually applied in image processing. Nevertheless, in many large-scale applications also these improved algorithms do not provide the expected reconstruction in a reasonable time. In these cases, the modern multiprocessor architectures represent an important resource for reducing the reconstruction time. Actually, one can consider different possibilities for a parallel computational scenario. One is the use of Graphics Processing Units (GPUs): they were originally designed to perform many simple operations on matrices and vectors with high efficiency and low accuracy (single precision arithmetic), but they have recently seen a huge development of both computational power and accuracy (double precision arithmetic), while still retaining compactness and low price. Another possibility is the use of last-generation multi-core CPUs, where general-purpose, very powerful computational cores are integrated inside the same CPU and a bunch of CPUs can be hosted by the same motherboard, sharing a central memory: they can perform completely dierent and asynchronous tasks, as well as cooperate by suitably distributing the workload of a complex task. Additional opportunities are offered by the more classical clusters of nodes, usually connected in dierent distributed-memory topologies to form large-scale high-performance machines with tens to hundred-thousands of processors. Needless to say, various mix of these architectures (such as clusters of GPUs) are also possible and sold, indeed. It should be noticed, however, that all the mentioned scenarios can exist even in very small-sized and cheap configurations. This is particularly relevant for GPUs: initially targeted at 3D graphics applications, they have been employed in many other scientific computing areas, such as signal and image reconstruction. Recent applications show that in many cases GPU performances are comparable to those of a medium-sized cluster, at a fraction of its cost. Thus, also small laboratories, which cannot afford a cluster, can benet from a substantial reduction of computing time compared to a standard CPU system. Nevertheless, for very large problems, as 3D imaging in confocal microscopy, the size of GPU's on-devices dedicated memory can become a limit to performance. For this reason, the ability to exploit the scalability of clusters by means of standard MPI implementations is still crucial for facing very large-scale applications. Here, we deal with both the GPU and the MPI implementation of an optimization algorithm, called Scaled Gradient Projection (SGP) method, that applies to several imaging problems. GPU versions of this method have been recently evaluated, while an MPI version is presented in this work in the cases of both deblurring and denoising problems. A computational study of the different implementations is reported, to show the enhancements provided by these parallel approaches in solving both 2D and 3D imaging problems.

Optimization methods for digital image restoration on MPP multicore architectures / Cavicchioli, Roberto; Prearo, A; Zanella, R; Zanghirati, G; Zanni, Luca. - STAMPA. - 27:(2012), pp. 93-116.

### Optimization methods for digital image restoration on MPP multicore architectures

#####
*CAVICCHIOLI, ROBERTO;ZANNI, Luca*

##### 2012

#### Abstract

We consider the numerical solution on modern multicore architectures of large-scale optimization problems arising in image restoration. An efficient solution of these optimization problems is important in several areas, such as medical imaging, microscopy and astronomy, where large-scale imaging is a basic task. To face these challenging problems, a lot of effort has been put in designing effective algorithms, that have largely improved the classical optimization strategies usually applied in image processing. Nevertheless, in many large-scale applications also these improved algorithms do not provide the expected reconstruction in a reasonable time. In these cases, the modern multiprocessor architectures represent an important resource for reducing the reconstruction time. Actually, one can consider different possibilities for a parallel computational scenario. One is the use of Graphics Processing Units (GPUs): they were originally designed to perform many simple operations on matrices and vectors with high efficiency and low accuracy (single precision arithmetic), but they have recently seen a huge development of both computational power and accuracy (double precision arithmetic), while still retaining compactness and low price. Another possibility is the use of last-generation multi-core CPUs, where general-purpose, very powerful computational cores are integrated inside the same CPU and a bunch of CPUs can be hosted by the same motherboard, sharing a central memory: they can perform completely dierent and asynchronous tasks, as well as cooperate by suitably distributing the workload of a complex task. Additional opportunities are offered by the more classical clusters of nodes, usually connected in dierent distributed-memory topologies to form large-scale high-performance machines with tens to hundred-thousands of processors. Needless to say, various mix of these architectures (such as clusters of GPUs) are also possible and sold, indeed. It should be noticed, however, that all the mentioned scenarios can exist even in very small-sized and cheap configurations. This is particularly relevant for GPUs: initially targeted at 3D graphics applications, they have been employed in many other scientific computing areas, such as signal and image reconstruction. Recent applications show that in many cases GPU performances are comparable to those of a medium-sized cluster, at a fraction of its cost. Thus, also small laboratories, which cannot afford a cluster, can benet from a substantial reduction of computing time compared to a standard CPU system. Nevertheless, for very large problems, as 3D imaging in confocal microscopy, the size of GPU's on-devices dedicated memory can become a limit to performance. For this reason, the ability to exploit the scalability of clusters by means of standard MPI implementations is still crucial for facing very large-scale applications. Here, we deal with both the GPU and the MPI implementation of an optimization algorithm, called Scaled Gradient Projection (SGP) method, that applies to several imaging problems. GPU versions of this method have been recently evaluated, while an MPI version is presented in this work in the cases of both deblurring and denoising problems. A computational study of the different implementations is reported, to show the enhancements provided by these parallel approaches in solving both 2D and 3D imaging problems.##### Pubblicazioni consigliate

I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.

In caso di violazione di copyright, contattare Supporto Iris