A convergent least-squares regularized blind deconvolution approach

The aim of this work is to present a new and efficient optimization method for the solution of blind deconvolution problems with data corrupted by Gaussian noise, which can be reformulated as a constrained minimization problem whose unknowns are the point spread function (PSF) of the acquisition system and the true image. The objective function we consider is the weighted sum of the least-squares fit-to-data discrepancy and possible regularization terms accounting for specific features to be preserved in both the image and the PSF. The solution of the corresponding minimization problem is addressed by means of a proximal alternating linearized minimization (PALM) algorithm, in which the updating procedure consists of one step of a gradient projection method along the arc, and the parameter identifying the steplength in the descent direction is chosen automatically by exploiting the optimality conditions of the problem. The resulting approach is a particular case of a general scheme whose convergence to stationary points of the constrained minimization problem has recently been proved. The effectiveness of the iterative method is validated in several numerical simulations in image reconstruction problems.


Introduction
Image deconvolution is an extremely prolific field which on one hand finds applications in a large variety of areas (physics, medicine, engineering,...) and on the other hand draws together the efforts of a large community of mathematicians working on inverse problems and optimization methods. Most of the resulting works deal with the ill-conditioned discrete problem in which the blurring matrix is assumed to be known and the goal is to find a good approximation of the unknown image by means of some regularization approach [1]. However, in many real applications the blurring matrix is not completely known, due to a lack of information on the acquisition model and/or to external agents which corrupt the measured image (atmospheric turbulence, thermal blooming,...). This situation is known as blind deconvolution, and most strategies to approach this problem are based on a simultaneous recovery of both the image approximation and the point spread function (PSF) of the acquisition system.

Blind deconvolution is a very active field and a much more challenging problem than standard image deconvolution, due to the severe ill-posedness caused by the non-uniqueness of the solution. A review of some recent results can be found e.g. in two recent papers by Almeida & Figueiredo and Oliveira et al. [2,3], although many other approaches have been developed. If the noise corrupting the measured data is assumed to have a Gaussian nature, blind deconvolution can be addressed by means of the constrained minimization of the squared Euclidean norm of the residuals plus some regularization terms suitably chosen according to the object and PSF to be reconstructed. An alternative formulation involves the unconstrained minimization of the same objective function with the addition of the indicator functions of the feasible sets, which can be numerically solved with forward-backward splitting methods [4,5]. The starting point of this paper is a proximal alternating method recently proposed by Bolte et al. [6] for a more general class of nonconvex and nonsmooth minimization problems, in which the parameters defining the method are fixed by using the Lipschitz constants of the partial gradients of the least-squares plus regularization part of the objective function. Convergence of the resulting sequence to a stationary point is ensured by the Kurdyka-Łojasiewicz property [7,8,9]. In particular, for the specific case of blind deconvolution, we extend the convergence results proved in [6] to a wider range of parameters, which allows the corresponding scheme to converge much faster toward the limit point. Moreover, we introduce a practical adaptive choice of the parameters based on a measure of the violation of the optimality conditions, and we test the proposed algorithm in some simulated numerical experiments.

The paper is organized as follows: in Section 2 we introduce blind deconvolution from Gaussian data and we recall the specific formulation of the proximal alternating linearized minimization proposed in [6] for this problem. Our proposed extension of the scheme is described in Section 3, together with the analogous convergence results and some hints for the choice of the parameters. Section 4 is devoted to some numerical experiments on synthetic datasets, while our conclusions are given in Section 5.

Problem setting
When dealing with Gaussian noise, blind deconvolution can be modeled as the minimization problem

$$\min_{x \in X,\, h \in H} F(x,h) \equiv \frac{1}{2}\|h * x - g\|^2 + \lambda_1 R_1(x) + \lambda_2 R_2(h), \qquad (1)$$

where $X \subseteq \mathbb{R}^n$ and $H \subseteq \mathbb{R}^m$ are the nonempty, closed and convex feasible sets of the unknown image $x$ and PSF $h$, respectively, $*$ denotes the convolution operator, $g$ is the measured image, $R_1$, $R_2$ are differentiable regularization terms and $\lambda_1$, $\lambda_2$ are positive regularization parameters. As usual, the computation of the convolution product is performed by means of a matrix-vector multiplication $h * x = Hx = Xh$, where $X$ and $H$ are suitable structured matrices depending on the choice of the boundary conditions [10]. Examples of regularization terms frequently used in the applications are the following:
• $R_{T0}(z) = \|z\|^2$ (Tikhonov regularization of order 0);
• $R_{T1}(z) = \|\nabla z\|^2$ (Tikhonov regularization of order 1);
• $R_{HS}(z) = \sum_i \sqrt{\|(\nabla z)_i\|^2 + \beta^2}$ (hypersurface regularization), where $(\nabla z)_i$ is the 2-vector representing the discretization of the gradient of $z$ at pixel $i$ in the horizontal and vertical directions.
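To make the objective concrete, the following minimal sketch (not the authors' code) evaluates $F(x,h)$ for the combination $R_{HS}$ on the image and $R_{T0}$ on the PSF, assuming periodic boundary conditions so that $h * x$ reduces to a circular convolution computable with FFTs; all function names are illustrative.

```python
# Minimal numerical sketch (not the authors' code) of the objective in (1), assuming
# periodic boundary conditions so that h * x is a circular convolution computed via FFTs.
# All function names are illustrative; here R_1 = R_HS and R_2 = R_T0.
import numpy as np

def conv2d_periodic(h, x):
    """Circular 2D convolution h * x computed in the Fourier domain."""
    return np.real(np.fft.ifft2(np.fft.fft2(h) * np.fft.fft2(x)))

def forward_differences(z):
    """Horizontal and vertical forward differences (a discretization of the gradient of z)."""
    return np.roll(z, -1, axis=1) - z, np.roll(z, -1, axis=0) - z

def R_T0(z):
    """Tikhonov regularization of order 0: ||z||^2."""
    return np.sum(z ** 2)

def R_T1(z):
    """Tikhonov regularization of order 1: ||grad z||^2."""
    dzh, dzv = forward_differences(z)
    return np.sum(dzh ** 2 + dzv ** 2)

def R_HS(z, beta):
    """Hypersurface regularization: sum_i sqrt(||(grad z)_i||^2 + beta^2)."""
    dzh, dzv = forward_differences(z)
    return np.sum(np.sqrt(dzh ** 2 + dzv ** 2 + beta ** 2))

def objective(x, h, g, lam1, lam2, beta):
    """F(x, h) = 0.5 * ||h * x - g||^2 + lam1 * R_HS(x) + lam2 * R_T0(h)."""
    residual = conv2d_periodic(h, x) - g
    return 0.5 * np.sum(residual ** 2) + lam1 * R_HS(x, beta) + lam2 * R_T0(h)
```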
As concerns the feasible sets, we consider non-negative images and non-negative and normalized PSFs, therefore we assume

$$X = \{x \in \mathbb{R}^n : x \ge 0\}, \qquad H = \Big\{h \in \mathbb{R}^m : h \ge 0,\ \sum_{i=1}^m h_i = 1\Big\}.$$

If we denote with $I_X$ and $I_H$ the indicator functions of $X$ and $H$, respectively, then problem (1) can be rewritten as

$$\min_{x \in \mathbb{R}^n,\, h \in \mathbb{R}^m} \Psi(x,h) \equiv F(x,h) + I_X(x) + I_H(h). \qquad (2)$$

Problem (2) can be addressed by means of recently proposed optimization methods, provided that the function $F$ and its gradient satisfy some properties that we recall in the following definition.
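Both projections have simple closed forms for these sets. The sketch below is an illustration (not the paper's implementation): componentwise clipping for $P_X$ and the classical sorting-based projection onto the probability simplex for $P_H$.

```python
# Illustrative sketch of the projections P_X and P_H, assuming X is the nonnegative
# orthant and H the standard simplex as defined above.
import numpy as np

def project_X(x):
    """P_X: componentwise clipping onto the nonnegative orthant."""
    return np.maximum(x, 0.0)

def project_H(h):
    """P_H: Euclidean projection onto {h >= 0, sum(h) = 1} (sorting-based algorithm)."""
    v = h.ravel()
    u = np.sort(v)[::-1]                       # sort in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0).reshape(h.shape)
```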

Definition 1
We define the set $\mathcal{F}$ of functions $F : \mathbb{R}^n \times \mathbb{R}^m \longrightarrow \mathbb{R}$ such that:
(i) $F$ is bounded from below;
(ii) for any $x \in \mathbb{R}^n$ and $h \in \mathbb{R}^m$, the partial gradients $\nabla_x F(\cdot,h)$ and $\nabla_h F(x,\cdot)$ are globally Lipschitz continuous with moduli $L_x(h)$ and $L_h(x)$, respectively, and the moduli $L_x(h)$, $L_h(x)$ are bounded from above on bounded sets;
(iii) the gradient $\nabla F$ is locally Lipschitz continuous, i.e., for each bounded subset $B$ of $\mathbb{R}^n \times \mathbb{R}^m$ there exists $M_B > 0$ such that $\|\nabla F(z_1) - \nabla F(z_2)\| \le M_B \|z_1 - z_2\|$ for all $z_1, z_2 \in B$.

The function $F$ defined in problem (1), when $R_1$, $R_2$ are the Tikhonov or HS regularizations, belongs to the set $\mathcal{F}$. Indeed, in this case $F$ is readily lower bounded. Moreover, for Tikhonov regularization of order 0 we have

$$\nabla_x F(x,h) = H^T(Hx - g) + 2\lambda_1 x,$$

thus giving $L_x(h) = \|H\|^2 + 2\lambda_1$ (and in a similar way $L_h(x) = \|X\|^2 + 2\lambda_2$). As for Tikhonov regularization of order 1 and HS, we start by defining the $2d \times d$ matrix (where $d = n$ for the image and $d = m$ for the PSF)

$$D = \begin{pmatrix} D_1 \\ \vdots \\ D_d \end{pmatrix},$$

where $D_i$ is the $2 \times d$ matrix defined such that $D_i z$ is the forward difference approximation of the gradient at the $i$-th component of a vector $z \in \mathbb{R}^d$. Each row of $D_i$ ($i = 1, \ldots, d$) contains two nonzero elements and it results that $\|D\|^2 \le 8$ [11]. The regularization operators and their gradients can be rewritten by means of $D$:

$$R_{T1}(z) = \|Dz\|^2, \quad \nabla R_{T1}(z) = 2D^T D z; \qquad R_{HS}(z) = \sum_{i=1}^d \sqrt{\|D_i z\|^2 + \beta^2}, \quad \nabla R_{HS}(z) = \sum_{i=1}^d \frac{D_i^T D_i z}{\sqrt{\|D_i z\|^2 + \beta^2}}.$$

It follows that for Tikhonov regularization of order 1 we have $L_x(h) = \|H\|^2 + 2\lambda_1\|D\|^2$, while for HS regularization [12] $L_x(h) = \|H\|^2 + \lambda_1\|D\|^2/\beta$. Finally, the last part of condition (ii) and condition (iii) directly follow from the fact that $F$ is a $C^2$ function (see Remark 3(iv) of [6]).

Under these considerations, the proximal version of the usual Gauss-Seidel scheme, being $c_k$, $d_k$ positive real numbers, can be exploited for the solution of (2) [13,5,14]. In particular, in a recent paper [6] the authors proposed a proximal alternating linearized minimization (PALM) algorithm which, starting from a given initial point $(x^{(0)}, h^{(0)}) \in X \times H$, generates a sequence $(x^{(k+1)}, h^{(k+1)})$ ($k = 0, 1, \ldots$) in the following way:
(i) compute $\nabla_x F(x^{(k)}, h^{(k)})$ and the Lipschitz constant $L_x(h^{(k)})$;
(ii) set $x^{(k+1)} = \mathrm{prox}^{I_X}_{c_k}\big(x^{(k)} - \tfrac{1}{c_k}\nabla_x F(x^{(k)}, h^{(k)})\big)$;
(iii) compute $\nabla_h F(x^{(k+1)}, h^{(k)})$ and the Lipschitz constant $L_h(x^{(k+1)})$;
(iv) set $h^{(k+1)} = \mathrm{prox}^{I_H}_{d_k}\big(h^{(k)} - \tfrac{1}{d_k}\nabla_h F(x^{(k+1)}, h^{(k)})\big)$.

We recall that the proximal operator exploited in (ii) and (iv) is, in general, defined as

$$\mathrm{prox}^{\sigma}_{\theta}(z) = \mathrm{argmin}_{u \in \mathbb{R}^d}\ \sigma(u) + \frac{\theta}{2}\|u - z\|^2$$

for a given proper and lower semicontinuous function $\sigma : \mathbb{R}^d \longrightarrow (-\infty, +\infty]$, $\theta > 0$ and $z \in \mathbb{R}^d$. Since in our case $X$, $H$ are convex, the proximal operators associated to $I_X$ and $I_H$ reduce to the projections $P_X$ and $P_H$ on the sets $X$ and $H$, respectively. The inverses of the steplength parameters $c_k$, $d_k$ must be chosen in order to satisfy the sufficient decrease properties [6]

$$F(x^{(k+1)}, h^{(k)}) \le F(x^{(k)}, h^{(k)}) + \langle x^{(k+1)} - x^{(k)}, \nabla_x F(x^{(k)}, h^{(k)}) \rangle + \frac{c_k}{2}\|x^{(k+1)} - x^{(k)}\|^2, \qquad (4)$$

$$F(x^{(k+1)}, h^{(k+1)}) \le F(x^{(k+1)}, h^{(k)}) + \langle h^{(k+1)} - h^{(k)}, \nabla_h F(x^{(k+1)}, h^{(k)}) \rangle + \frac{d_k}{2}\|h^{(k+1)} - h^{(k)}\|^2, \qquad (5)$$

which, in particular, can surely be satisfied by choosing $c_k \ge L_x(h^{(k)})$ and $d_k \ge L_h(x^{(k+1)})$ [13]. These conditions, combined with the requirement that $F$ satisfies the Kurdyka-Łojasiewicz (KL) property [7,8,9], make it possible to demonstrate that the sequence generated by the PALM method converges to a critical point.
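The sketch below summarizes one sweep of the above scheme, assuming the gradient callbacks, the Lipschitz moduli and the projections $P_X$, $P_H$ (e.g. the ones sketched earlier) are available; in the original PALM of [6] the factors gamma1, gamma2 are taken strictly greater than 1.

```python
# Sketch of one PALM sweep (steps (i)-(iv) above); all callbacks are assumed available.
def palm_iteration(x, h, grad_x_F, grad_h_F, lipschitz_x, lipschitz_h,
                   project_X, project_H, gamma1=1.1, gamma2=1.1):
    c_k = gamma1 * lipschitz_x(h)                    # (i)-(ii): c_k >= L_x(h^(k))
    x_new = project_X(x - grad_x_F(x, h) / c_k)      # x^(k+1) = P_X(x^(k) - grad/c_k)
    d_k = gamma2 * lipschitz_h(x_new)                # (iii)-(iv): d_k >= L_h(x^(k+1))
    h_new = project_H(h - grad_h_F(x_new, h) / d_k)  # h^(k+1) = P_H(h^(k) - grad/d_k)
    return x_new, h_new
```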
Moreover, we denote with $\partial f(z)$ the subdifferential of $f$ at $z \in \mathbb{R}^d$ (see [15, Definition 2.1]) and with $\mathrm{dist}(z, \Omega)$ the distance between a point $z$ and a set $\Omega \subset \mathbb{R}^d$. The function $f$ is said to have the KL property at $\bar z \in \mathrm{dom}\,\partial f$ if there exist $\upsilon \in (0, +\infty]$, a neighborhood $U$ of $\bar z$ and a continuous concave function $\phi : [0, \upsilon) \longrightarrow [0, +\infty)$ such that:
(i) $\phi(0) = 0$;
(ii) $\phi$ is continuously differentiable on $(0, \upsilon)$ and continuous at $0$;
(iii) $\phi'(s) > 0$ for all $s \in (0, \upsilon)$;
(iv) for all $u \in U \cap \{u : f(\bar z) < f(u) < f(\bar z) + \upsilon\}$, the Kurdyka-Łojasiewicz inequality holds:

$$\phi'\big(f(u) - f(\bar z)\big)\,\mathrm{dist}\big(0, \partial f(u)\big) \ge 1.$$

If f satisfies the KL property at each point of dom∂ f then f is called a KL function.
The KL condition holds for a large variety of functions, whose thorough characterization is beyond the scope of this work [15,6,16]. We only recall that any p-norm is a KL function (see e.g. [6, Section 5] and [16, Section 2.2]), which includes all the possible objective functions we consider in our blind deconvolution framework. Unfortunately, parameters $c_k$, $d_k$ greater than or equal to the Lipschitz constants often lead to very small steps, resulting in very slow convergence of the related sequence toward the critical point itself. In the next sections we extend the convergence result of the PALM method to parameters $c_k$, $d_k$ which satisfy (4) and (5) but are smaller than the Lipschitz constants of the partial gradients (thus allowing larger steplengths), and we propose a linesearch strategy for their automatic selection.

Parameters selection
The rule we choose to update the constants $c_k$, $d_k$ is the one used in [17] in the context of non-negative matrix factorization and is based on the optimality conditions for each subproblem. More in detail, if we define the projected gradients $\tilde\nabla_x F(x,h)$ and $\tilde\nabla_h F(x,h)$ of the function $F$ with respect to the feasible sets $X$ and $H$ as in [17], then a point $(x^*, h^*)$ is stationary for problem (2) if and only if we have

$$\tilde\nabla_x F(x^*, h^*) = 0 \qquad \text{and} \qquad \tilde\nabla_h F(x^*, h^*) = 0.$$

Thus, the quantities $\|\tilde\nabla_x F(x,h)\|$ and $\|\tilde\nabla_h F(x,h)\|$ can be considered as a measure of the optimality of the point $(x,h)$ and exploited to estimate the steplength parameters at each iteration. Inspired by [17], we start from $c_k = L_x(h^{(k)})$, $d_k = L_h(x^{(k+1)})$ and we multiply them by a factor $\delta < 1$ until conditions (4) and (5) are violated or the projected gradient norm of the corresponding updated point falls below an adaptive tolerance $\eta_x^{(k)}$ (respectively $\eta_h^{(k)}$), as prescribed by condition (7), where the adaptive tolerances $\eta_x^{(k)}$ and $\eta_h^{(k)}$ are updated as in (8). The idea of this first step is that the Lipschitz constants often provide parameters which largely satisfy conditions (4) and (5), and therefore these parameters are reduced by looking at the optimality conditions until a sort of neighborhood of the "optimal" values is found. If the reduction step leads to a violation of (4) and (5) before satisfying (7), the parameters $c_k$, $d_k$ are slightly increased by a factor $\mu > 1$ until the former inequalities hold true again. Algorithm 1 gives the scheme of the resulting modified PALM method, which reduces to the standard PALM scheme if one neglects Steps 3, 4, 7 and 8. A graphical scheme of the proposed rule to select the $c_k$, $d_k$ parameters is shown in Figure 1. The iterative procedure is stopped when one of the following conditions has been satisfied:
• norm of the projected gradient: the optimality measures $\|\tilde\nabla_x F\|$ and $\|\tilde\nabla_h F\|$ fall below prescribed tolerances;
• maximum number of iterations: $k + 1 \ge K$.

Algorithm 1 (modified PALM method): the image update of Step 2 and the PSF update of Step 6 are the projected gradient steps of PALM, with $c_k = \gamma_x L_x(h^{(k)})$ and $d_k = \gamma_h L_h(x^{(k+1)})$; in Steps 3-4 and 7-8 the factors $\gamma_x$, $\gamma_h$ (set to 1 at the start of each loop) are reduced by $\delta$ or increased by $\mu$ according to conditions (4), (5) and (7), and the tolerances $\eta_x^{(k+1)}$, $\eta_h^{(k+1)}$ are updated according to (8).

Figure 1: Example of procedure to determine the inverse of the steplength parameters. Here $L$ denotes the initial guess, while $\ell$ represents the smallest parameter satisfying either (4) or (5).
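A schematic sketch of the reduction/increase rule for $c_k$ is given below (the rule for $d_k$ is analogous); the callbacks sufficient_decrease and optimality_reached are placeholders standing for the acceptance tests (4) and (7) of Algorithm 1, and delta, mu, c_min are the tuning parameters used later in the experiments.

```python
# Schematic sketch of the proposed selection of the inverse steplength c_k.
# 'sufficient_decrease' and 'optimality_reached' are placeholder callbacks for
# conditions (4) and (7); delta < 1 and mu > 1 are the tuning factors
# (delta = 0.1 and mu = 1.5 in the experiments reported below).
def select_inverse_steplength(L, sufficient_decrease, optimality_reached,
                              delta=0.1, mu=1.5, c_min=1e-10):
    c = L                                     # start from the Lipschitz constant
    while sufficient_decrease(c) and not optimality_reached(c) and c > c_min:
        c = delta * c                         # reduction phase: shrink by delta
    while not sufficient_decrease(c):         # increase phase: enlarge by mu
        c = mu * c                            # until condition (4) holds again
    return c
```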

Convergence properties
In this section we show the convergence property of the sequence generated by the modified PALM method to a critical point. The propositions and lemmas reported below retrace the path followed in [6], suitably restricted to the considered blind deconvolution problem and extended to account for parameters $c_k \in [c_{\min}, L_x(h^{(k)})]$, $d_k \in [d_{\min}, L_h(x^{(k+1)})]$, being $c_{\min}$, $d_{\min}$ predetermined positive constants.

Proposition 1
For any $(x,h) \in \mathbb{R}^n \times \mathbb{R}^m$ we have $\partial\Psi(x,h) = \big(\nabla_x F(x,h) + \partial I_X(x)\big) \times \big(\nabla_h F(x,h) + \partial I_H(h)\big)$.

Lemma 2 (Sufficient decrease property)
Let $f : \mathbb{R}^d \longrightarrow \mathbb{R}$ be a continuously differentiable function with $L_f$-Lipschitz continuous gradient, let $C \subseteq \mathbb{R}^d$ be a nonempty, closed and convex set and let $u \in C$. Assume that the parameter $t > 0$ satisfies

$$f(u^+) \le f(u) + \langle u^+ - u, \nabla f(u) \rangle + \frac{t}{2}\|u^+ - u\|^2, \qquad (10)$$

where $u^+$ is the projection of the vector $u - \frac{1}{t}\nabla f(u)$ onto $C$:

$$u^+ = P_C\Big(u - \frac{1}{t}\nabla f(u)\Big).$$

Then there exist $t_1 > t$, $A > 0$ such that

$$\langle u^+ - u, \nabla f(u) \rangle = -\frac{t_1}{2}\|u^+ - u\|^2 \qquad (11)$$

and a sufficient decrease in the value of $f(u^+)$ is ensured:

$$f(u^+) \le f(u) - A\,\|u^+ - u\|^2, \qquad A = \frac{t_1 - t}{2}. \qquad (12)$$

Proof. First, we notice that it is always possible to find a parameter $t$ as in (10), since the requested inequality for $t$ is satisfied, at least, by selecting $t = L_f$. From the definition of the projection onto a closed, convex and nonempty set $C$, the vector $u^+$ is characterized as

$$\Big\langle v - u^+,\ u^+ - u + \frac{1}{t}\nabla f(u) \Big\rangle \ge 0 \qquad \forall\, v \in C,$$

and in particular, by taking $v = u$, we obtain

$$\Big\langle u - u^+,\ u^+ - u + \frac{1}{t}\nabla f(u) \Big\rangle \ge 0.$$

By writing the norm as scalar products one obtains the relation

$$\langle u^+ - u, \nabla f(u) \rangle \le -t\,\|u^+ - u\|^2,$$

which implies (11) by choosing e.g. $t_1 = -2\langle u^+ - u, \nabla f(u)\rangle / \|u^+ - u\|^2 \ (\ge 2t > t)$. This condition combined with (10) leads to (12).
For the sake of simplicity, in the following we will use the notation $z^{(k)} = (x^{(k)}, h^{(k)})$ ($k \in \mathbb{N}$).

Lemma 3 (Convergence properties)
Suppose that $F \in \mathcal{F}$ and let $\{z^{(k)}\}_{k\in\mathbb{N}}$ be a sequence generated by the modified PALM method. The following assertions hold true:
- the sequence $\{F(z^{(k)})\}_{k\in\mathbb{N}}$ is nonincreasing and there exists $\rho > 0$ such that

$$F(z^{(k)}) - F(z^{(k+1)}) \ge \rho\,\|z^{(k+1)} - z^{(k)}\|^2 \qquad \forall\, k \ge 0;$$

- we have

$$\sum_{k=0}^{\infty} \|z^{(k+1)} - z^{(k)}\|^2 < +\infty \qquad \text{and hence} \qquad \lim_{k\to\infty} \|z^{(k+1)} - z^{(k)}\| = 0.$$

Proof. From condition (ii) of Definition 1, the function $x \mapsto F(x,h)$ (for $h$ fixed) is differentiable and has an $L_x(h)$-Lipschitz continuous gradient. Suppose that $x^{(k+1)}$ is computed as in Step 2 of Algorithm 1, with the parameter $c_k$ satisfying condition (4). From Lemma 2 with $f(\cdot) := F(\cdot, h^{(k)})$ there exists $c_k^1 > c_k$ such that

$$F(x^{(k+1)}, h^{(k)}) \le F(x^{(k)}, h^{(k)}) - \frac{c_k^1 - c_k}{2}\,\|x^{(k+1)} - x^{(k)}\|^2.$$

In a similar way, one can prove that there exists $d_k^1 > d_k$ such that

$$F(x^{(k+1)}, h^{(k+1)}) \le F(x^{(k+1)}, h^{(k)}) - \frac{d_k^1 - d_k}{2}\,\|h^{(k+1)} - h^{(k)}\|^2.$$

Adding the above two inequalities, we obtain for all $k \ge 0$ that

$$F(z^{(k+1)}) \le F(z^{(k)}) - \frac{c_k^1 - c_k}{2}\,\|x^{(k+1)} - x^{(k)}\|^2 - \frac{d_k^1 - d_k}{2}\,\|h^{(k+1)} - h^{(k)}\|^2.$$

It follows that $\{F(z^{(k)})\}_{k\in\mathbb{N}}$ is a nonincreasing sequence bounded from below, and therefore it converges to some real number $\bar F$. Moreover, by choosing $\rho = \min_k\{(c_k^1 - c_k)/2, (d_k^1 - d_k)/2\}$, which is bounded away from zero since $c_k^1 - c_k \ge c_k \ge c_{\min}$ and $d_k^1 - d_k \ge d_k \ge d_{\min}$ (see the choice of $t_1$ in the proof of Lemma 2), we obtain

$$F(z^{(k)}) - F(z^{(k+1)}) \ge \rho\,\|z^{(k+1)} - z^{(k)}\|^2 \qquad (13)$$

and the first assertion is proved. Let $N$ be a positive integer. Summing (13) from $k = 0$ to $N - 1$ we also get

$$\sum_{k=0}^{N-1} \|z^{(k+1)} - z^{(k)}\|^2 \le \frac{F(z^{(0)}) - F(z^{(N)})}{\rho} \le \frac{F(z^{(0)}) - \bar F}{\rho}.$$

By taking the limit as $N \to +\infty$, the second part of the lemma is established.

Lemma 4
Suppose that $F \in \mathcal{F}$ and let $\{z^{(k)}\}_{k\in\mathbb{N}}$ be a bounded sequence generated by the modified PALM method. For each positive integer $k$, define

$$A_x^{(k)} := c_{k-1}\,(x^{(k-1)} - x^{(k)}) + \nabla_x F(x^{(k)}, h^{(k)}) - \nabla_x F(x^{(k-1)}, h^{(k-1)}),$$
$$A_h^{(k)} := d_{k-1}\,(h^{(k-1)} - h^{(k)}) + \nabla_h F(x^{(k)}, h^{(k)}) - \nabla_h F(x^{(k)}, h^{(k-1)}).$$

Then $(A_x^{(k)}, A_h^{(k)}) \in \partial\Psi(z^{(k)})$ and there exists $\zeta > 0$ such that

$$\|(A_x^{(k)}, A_h^{(k)})\| \le \zeta\,\|z^{(k)} - z^{(k-1)}\|$$

for all $k \ge 1$.

Proof. Let $k$ be a positive integer. From Step 2 of Algorithm 1 and recalling that the projection onto the set $X$ can be seen as the proximal operator of the indicator function $I_X$, we have

$$x^{(k)} = \mathrm{argmin}_{u \in \mathbb{R}^n}\ I_X(u) + \frac{c_{k-1}}{2}\Big\|u - \Big(x^{(k-1)} - \frac{1}{c_{k-1}}\nabla_x F(x^{(k-1)}, h^{(k-1)})\Big)\Big\|^2.$$

Writing down the optimality condition yields

$$c_{k-1}\,(x^{(k-1)} - x^{(k)}) - \nabla_x F(x^{(k-1)}, h^{(k-1)}) = u^{(k)}, \qquad u^{(k)} \in \partial I_X(x^{(k)}).$$

In a similar way, Step 6 of Algorithm 1 leads to

$$d_{k-1}\,(h^{(k-1)} - h^{(k)}) - \nabla_h F(x^{(k)}, h^{(k-1)}) = v^{(k)},$$

with $v^{(k)} \in \partial I_H(h^{(k)})$. Since from Proposition 1 we have $\partial\Psi(z^{(k)}) = \big(\nabla_x F(z^{(k)}) + \partial I_X(x^{(k)})\big) \times \big(\nabla_h F(z^{(k)}) + \partial I_H(h^{(k)})\big)$, it follows that $(A_x^{(k)}, A_h^{(k)}) = \big(u^{(k)} + \nabla_x F(z^{(k)}),\ v^{(k)} + \nabla_h F(z^{(k)})\big) \in \partial\Psi(z^{(k)})$. Let us prove now the second part of the lemma. Since $\nabla F$ is locally Lipschitz continuous (see condition (iii) of Definition 1) and we assumed that $\{z^{(k)}\}_{k\in\mathbb{N}}$ is bounded, there exists $M > 0$ such that

$$\|\nabla_x F(x^{(k)}, h^{(k)}) - \nabla_x F(x^{(k-1)}, h^{(k-1)})\| \le M\,\|z^{(k)} - z^{(k-1)}\|.$$

By definition, the constant $c_{k-1}$ is less than or equal to $L_x(h^{(k-1)})$ and, from condition (ii) of Definition 1, the Lipschitz constants $\{L_x(h^{(k-1)})\}_{k\in\mathbb{N}}$ are bounded from above by a fixed value $\alpha_x$. As a result of this, we have

$$\|A_x^{(k)}\| \le \alpha_x\,\|x^{(k)} - x^{(k-1)}\| + M\,\|z^{(k)} - z^{(k-1)}\|.$$

On the other hand, from the Lipschitz continuity of $\nabla_h F(x, \cdot)$ and the range of values for $d_k$, we have

$$\|A_h^{(k)}\| \le d_{k-1}\,\|h^{(k)} - h^{(k-1)}\| + L_h(x^{(k)})\,\|h^{(k)} - h^{(k-1)}\| \le 2\,L_h(x^{(k)})\,\|h^{(k)} - h^{(k-1)}\|.$$

Since $\{L_h(x^{(k)})\}_{k\in\mathbb{N}}$ is bounded from above by a constant $\alpha_h$ (condition (ii) of Definition 1), the following inequality holds:

$$\|A_h^{(k)}\| \le 2\,\alpha_h\,\|h^{(k)} - h^{(k-1)}\|.$$

If we sum up the previous inequalities and define $\tau = \max(\alpha_x, \alpha_h)$, we obtain

$$\|(A_x^{(k)}, A_h^{(k)})\| \le \|A_x^{(k)}\| + \|A_h^{(k)}\| \le (M + 3\tau)\,\|z^{(k)} - z^{(k-1)}\|,$$

which leads to the thesis by choosing $\zeta = (2M + 3\tau)$.
Let $\{z^{(k)}\}_{k\in\mathbb{N}}$ be a sequence generated by the modified PALM method from a starting point $z^{(0)} \in X \times H$, and let $\Omega(z^{(0)})$ be the set of all its limit points, i.e.,

$$\Omega(z^{(0)}) = \big\{\bar z \in \mathbb{R}^n \times \mathbb{R}^m : \ \exists \text{ an increasing sequence of integers } \{k_j\}_{j\in\mathbb{N}} \text{ such that } z^{(k_j)} \to \bar z \text{ as } j \to \infty\big\}.$$

Moreover, we denote with $\mathrm{crit}(f)$ the set of critical points of a function $f$. The following lemma and the convergence property in Theorem 1 can be demonstrated by following the same proofs in [6] and by exploiting the results previously shown in this section.

Lemma 5 Suppose that $F \in \mathcal{F}$ and let $\{z^{(k)}\}_{k\in\mathbb{N}}$ be a bounded sequence generated by the modified PALM method. Then:
- $\Omega(z^{(0)})$ is a nonempty, compact and connected set, and $\mathrm{dist}(z^{(k)}, \Omega(z^{(0)})) \to 0$ as $k \to \infty$;
- $\Omega(z^{(0)}) \subseteq \mathrm{crit}(\Psi)$;
- the objective function $\Psi$ is finite and constant on $\Omega(z^{(0)})$.

Theorem 1 Suppose that $F \in \mathcal{F}$, that $\Psi$ is a KL function and let $\{z^{(k)}\}_{k\in\mathbb{N}}$ be a bounded sequence generated by the modified PALM method. Then the sequence $\{z^{(k)}\}_{k\in\mathbb{N}}$ converges to a critical point of $\Psi$.
Algorithm 2 Gradient projection alternating minimization (GPAM) method
Choose the starting point $(x^{(0)}, h^{(0)}) \in X \times H$ and the upper bounds $N_x, N_h \ge 1$ for the number of inner iterations.
For $k = 0, 1, 2, \ldots$
1. Compute $x^{(k+1)}$ by applying at most $N_x$ gradient projection iterations to the subproblem $\min_{x \in X} F(x, h^{(k)})$, starting from the point $x^{(k)}$.
2. Compute $h^{(k+1)}$ by applying at most $N_h$ gradient projection iterations to the subproblem $\min_{h \in H} F(x^{(k+1)}, h)$, starting from the point $h^{(k)}$.
end for
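A schematic sketch of the GPAM outer loop is reported below; the inner solver is represented by a plain projected gradient routine with a fixed 1/L step, which is an illustrative assumption and not necessarily the inner scheme used for the reported results.

```python
# Schematic sketch of the GPAM outer loop of Algorithm 2.  Each block subproblem is
# approximately solved with at most N_x (resp. N_h) projected-gradient iterations;
# the fixed 1/L inner step is an assumption made for illustration.
def projected_gradient(z0, grad, project, L, n_iter):
    z = z0
    for _ in range(n_iter):
        z = project(z - grad(z) / L)
    return z

def gpam(x0, h0, grad_x_F, grad_h_F, lipschitz_x, lipschitz_h,
         project_X, project_H, N_x=10, N_h=10, K=1000):
    x, h = x0, h0
    for _ in range(K):
        x = projected_gradient(x, lambda z: grad_x_F(z, h), project_X,
                               lipschitz_x(h), N_x)      # step 1: update the image
        h = projected_gradient(h, lambda z: grad_h_F(x, z), project_H,
                               lipschitz_h(x), N_h)      # step 2: update the PSF
    return x, h
```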

Alternating proximal gradient
As a further comparison, we also tested the APG method proposed in [16], whose scheme retraces the PALM method provided that the two projections of Steps 2 and 6 of Algorithm 1 are substituted with

$$x^{(k+1)} = P_X\Big(\hat x^{(k)} - \frac{1}{c_k}\nabla_x F(\hat x^{(k)}, h^{(k)})\Big), \qquad h^{(k+1)} = P_H\Big(\hat h^{(k)} - \frac{1}{d_k}\nabla_h F(x^{(k+1)}, \hat h^{(k)})\Big),$$

being $\hat x^{(k)} = x^{(k)} + \omega^{(k)}(x^{(k)} - x^{(k-1)})$ and $\hat h^{(k)} = h^{(k)} + \omega^{(k)}(h^{(k)} - h^{(k-1)})$ two suitable extrapolated points. Convergence of the sequence generated by the APG algorithm to a critical point of $\Psi$ is again ensured by the KL property and the fact that $F$ is a block-multiconvex function. As in the PALM case, we tried to exploit our backtracking strategy for the choice of $c_k$, $d_k$ also for this method, but without obtaining significant improvements. As a result of this, in our numerical tests we will use the Lipschitz constants of the partial gradients as in the PALM method. As concerns the extrapolation weights, we defined

$$\omega^{(k)} = \frac{t_{k-1} - 1}{t_k}, \qquad \text{where [25,26]} \qquad t_0 = 1, \quad t_k = \frac{1 + \sqrt{1 + 4\,t_{k-1}^2}}{2}.$$

The stopping criterion is again the one used for the other approaches.
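For illustration, the following snippet generates the extrapolation weights from the $t_k$ recursion written above (a sketch under the assumption that the standard Nesterov-type sequence is the one intended).

```python
# Sketch of the sequence t_0 = 1, t_k = (1 + sqrt(1 + 4 t_{k-1}^2)) / 2 and of the
# corresponding extrapolation weights w_k = (t_{k-1} - 1) / t_k.
import math

def extrapolation_weights(K):
    t_prev, weights = 1.0, []
    for _ in range(K):
        t = (1.0 + math.sqrt(1.0 + 4.0 * t_prev ** 2)) / 2.0
        weights.append((t_prev - 1.0) / t)
        t_prev = t
    return weights

# The extrapolated point is then x_hat = x_k + w_k * (x_k - x_km1), and the projected
# gradient step is taken from x_hat instead of x_k.
```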

Results
All the described methods have been tested on the following three simulated problems:
1. the modified Shepp-Logan phantom test image from the Matlab Image Processing toolbox, artificially blurred with a Gaussian PSF with standard deviation equal to 4. The regularization terms we chose in this case are $R_{HS}$ for the image and $R_{T0}$ for the PSF;
2. the satellite image frequently used in several papers on image deconvolution (see e.g. [27,28,29]), artificially blurred with an out-of-focus PSF with radius equal to 4. The regularization term we chose in this case is $R_{HS}$ for both the image and the PSF;
3. the Hubble Space Telescope (HST) image of the Crab Nebula NGC 1952, artificially blurred with an Airy function [30] mimicking the ideal acquisition of one mirror of the Large Binocular Telescope (LBT - http://www.lbto.org). The regularization term we chose in this case is $R_{T0}$ for both the image and the PSF.
The three blurred images have been corrupted with 5% Gaussian noise, and the resulting data are shown in Figure 2 together with the true images $x^*$ and PSFs $h^*$. All the images and the PSFs have size 256 × 256. The initialization $x^{(0)}$ for the image has been set equal to the datum $g$ for all the datasets, while for the PSF we used as $h^{(0)}$ a Gaussian function with standard deviation equal to 5.5 for the phantom dataset, an out-of-focus PSF with radius equal to 5.5 for the satellite dataset, and a Lorentzian function with half-width at half-maximum equal to 6 for the Crab Nebula dataset. The parameters for the tuning of $c_k$, $d_k$ and the thresholds defining the stopping criteria have been chosen as follows: $\delta = 0.1$, $\mu = 1.5$, $c_{\min} = d_{\min} = 10^{-10}$, $\varepsilon_1 = 10^{-14}$, $\varepsilon_2 = 10^{-6}$, $K = 1000$. Finally, for each problem we manually selected the regularization parameters $\lambda_1$, $\lambda_2$ which provided on average the best reconstructions. The numerical experiments have been performed by means of routines implemented by ourselves in Matlab R2014a and run on a PC equipped with a 1.80 GHz Intel(R) Core(TM) i7-4500U in a Windows 8 environment.
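As an illustration of the experimental setup, the sketch below generates a synthetic blurred and noisy datum with a Gaussian PSF and evaluates the relative Euclidean reconstruction errors; the functions and the reading of (14) as err = ||x − x*|| / ||x*|| are assumptions made for this example, not the authors' code.

```python
# Illustrative sketch of the data simulation and of the error metric; the error
# definition err = ||x - x_true|| / ||x_true|| (our reading of (14)) is an assumption.
import numpy as np

def gaussian_psf(n, sigma):
    """Normalized n x n Gaussian PSF centered in the image."""
    ax = np.arange(n) - n // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return psf / psf.sum()

def blur_and_corrupt(x_true, psf, noise_level=0.05, seed=0):
    """Blur with a circular convolution and add Gaussian noise at the given relative level."""
    rng = np.random.default_rng(seed)
    blurred = np.real(np.fft.ifft2(np.fft.fft2(np.fft.ifftshift(psf)) * np.fft.fft2(x_true)))
    noise = rng.standard_normal(x_true.shape)
    noise *= noise_level * np.linalg.norm(blurred) / np.linalg.norm(noise)
    return blurred + noise

def relative_error(x, x_true):
    """Relative Euclidean reconstruction error."""
    return np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```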
In Table 1 we report the relative Euclidean reconstruction errors on image and PSF obtained by the GPAM, APG, PALM and modified PALM methods, together with the number of iterations $k$ performed to satisfy the stopping criterion and the computational time in seconds. For the sake of completeness, we also add the errors (14) between the initial ($k = 0$) and the true images and PSFs, and the best error achieved in a standard deconvolution framework by using the true and the initial PSF (the first value should represent the best error achievable by a blind deconvolution approach, while the second value should be a reference value that a blind deconvolution method has to improve upon in order to justify the heavier computational cost of the blind scheme). The optimization algorithm we used to compute the nonblind reconstructions is the scaled gradient projection (SGP) method, proposed by Bonettini et al. [18] and applied in several image denoising and deblurring problems [31,32,33]. Moreover, in Figure 3 we plot the error behavior as a function of the iteration number, while in Figure 4 the deconvolved images for the satellite dataset are shown.

Table 1: Results obtained with the considered algorithms in the three simulated datasets. For each problem, we show the errors on image (err$_x$) and PSF (err$_h$), the number of iterations required to satisfy the stopping criterion (iter) and the execution time in seconds (sec). Moreover, we also report the errors between true and initial images ($e_{x^{(0)}}$) and PSFs ($e_{h^{(0)}}$), together with the reconstruction errors on the image obtained in a standard deconvolution framework by using the true ($S_t$) or the initial ($S_0$) PSFs.

From these results we can draw the following remarks:
• the backtracking procedure in the choice of the steplengths strongly improves the convergence of the PALM method, so that good reconstructions can be achieved in a computational time comparable to those of the GPAM and APG approaches. Here the GPAM method provides higher errors in a lower time, but this is mainly due to the stopping criteria we adopted, since for this method the graphs of the errors are still decreasing (see Figure 3). In further tests with different thresholds $\varepsilon_1$, $\varepsilon_2$ we verified that both the errors and the computational times of GPAM become closer to those of mPALM;
• the reconstruction errors achieved by the blind schemes are very close to the optimal ones obtained in a standard deconvolution framework with the true PSF, and significantly lower than the analogous ones provided by choosing an approximation of the PSF (in our case, $h^{(0)}$), thus confirming the potential benefit of the blind approach;
• the lower number of iterations observed in the mPALM case with respect to the PALM and APG methods, for which convergence to a critical point has been proved, suggests that a further improvement of the backtracking procedure requiring fewer steps (i.e., fewer fast Fourier transform computations) could lead to a significant reduction of the whole computational effort. We tried a kind of "hot start" of the parameters $\gamma_x$, $\gamma_h$ by using as initializations the values obtained at the previous iteration, as done e.g. in [34,35], but we did not find any significant improvement. Further tests with means over a limited number of previous values and/or different (and possibly adaptive) factors $\delta$, $\mu$ will be investigated in a future work.

Conclusions
In this paper we address the blind deconvolution problem with images perturbed by Gaussian noise and we consider a recent approach, designed for a more general class of nonconvex and nonsmooth minimization problems, whose sequence of iterates converges to a critical point. Our contribution has been to exploit the same arguments that led to the convergence proof of the method and to extend it to wider ranges of parameters, provided that the regularization terms used in the formulation of the blind deconvolution problem satisfy certain properties. The possibility of selecting steplengths larger than the inverses of the Lipschitz constants of the partial gradients allows a faster convergence of the method to the critical point, and we proposed a backtracking procedure to automatically select these parameters based on the optimality conditions of the problem. Some numerical experiments with different images, PSFs and combinations of regularization terms showed that the new variant of the method strongly reduces the number of iterations required by the PALM algorithm, and provides comparable results with respect to other state-of-the-art methods, also leaving room for further improvements. A future development of this work will be the application of the proposed approach to real image deconvolution problems in astronomy, such as the reconstruction of the X-ray emission from a solar flare [23,36], in which a more detailed reconstructed image can strongly help in detecting some crucial features of the phenomenon under investigation (see e.g. [37,38]).