A comparison of edge-preserving approaches for differential interference contrast microscopy

In this paper we address the problem of estimating the phase from color images acquired with differential-interference-contrast microscopy. In particular, we consider the nonlinear and nonconvex optimization problem obtained by regularizing a least-squares-like discrepancy term with an edge-preserving functional, given by either the hypersurface potential or the total variation one. We investigate the analytical properties of the resulting objective functions, proving the existence of minimum points, and we propose effective optimization tools able to obtain accurate reconstructions, in both the smooth and the nonsmooth case, at a reduced computational cost.


Introduction
Since their invention, microscopes have been a powerful tool in a variety of disciplines such as biology, medicine and the study of materials. In particular, the branch of optical microscopy (also referred to as light microscopy) has been successfully applied in biomedical sciences and cell biology in order to study detailed structures and understand their function in biological specimens. The optical microscope uses visible light for illuminating the object and contains lenses that magnify the image of the object and focus the light on the retina of the observer's eye [1]. Optical microscopy includes several techniques, such as bright-field, dark-field, phase contrast, differential interference contrast (DIC), fluorescence and confocal microscopy. We refer to the work of Wilson and Bacic [2] for a comparison of the advantages and limitations of these techniques.
The technique of interest in this paper is DIC microscopy, designed by Allen, David and Nomarski [3] to overcome the inability to image unstained transparent biological specimens, which is typical of bright-field microscopes, while avoiding at the same time the halo artifacts of other techniques designed for the same purpose, such as phase contrast. DIC microscopes are able to provide contrast to images by exploiting the phase shifts in light induced by the transparent specimens (also called phase objects) while passing through them. This phenomenon is not detected by the human eye, nor by an automatic visual system, and occurs because of the interaction of light with the different refractive indices of the specimen and its surrounding medium. In DIC microscopy, such phase shifts are converted into artificial black and white shadows in the image, which correspond to changes in the spatial gradient of the specimen's optical path length. Furthermore, this technique has been widely recognized for its ability to use full numerical apertures in the objective, which results in high contrast images at high lateral resolution.
One disadvantage of DIC microscopy is that the observed images cannot be easily used for topographical and morphological interpretation, because the changes in phase of the light are hidden in the intensity image. It is then of vital importance to recover the specimen's phase function from the observed DIC images. The problem of phase estimation in optical imaging has been widely studied, as shown in the review in [4]. Previous work on reconstructing the DIC phase function was done by Munster et al [5], who retrieved the phase information by deconvolution with a Wiener filter; line integration of DIC images is proposed by Kam in [6], supposing that the line integration along the shear angle yields a positive definite image, which is not always the case, since the observed intensity depends nonlinearly on the transmission function of the specimen and the point spread function of the microscope. Kou et al [7] introduced the use of the transport-of-intensity equation to retrieve the phase function; Bostan et al [8] also used this approach, including a total variation regularization term to preserve the phase transitions. Finally, in the work of Preza [9][10][11][12], the phase estimation in DIC microscopy has been addressed by considering the minimization of a Tikhonov regularized discrepancy term, which is performed by means of a modified nonlinear conjugate gradient (CG) method.
In this work, we are interested in reconstructing the phase by minimization of a penalized least-squares (LS) term as proposed in [11], suitably generalized in order to extend the single-color acquisition model to polychromatic ones. Instead of a first order Tikhonov regularization, which tends to recover oversmoothed images, we consider two different penalties, the first one being the total variation (TV) functional, which is suitable for piecewise constant images, while the second is the hypersurface (HS) potential [13], a smooth generalization of the TV able to reconstruct both sharp and smooth variations of the unknown phase. Since the latter choice leads to the minimization of a smooth functional, we consider a limited memory gradient method, in which suitable adaptive steplength parameters are chosen to improve the convergence rate of the algorithm. As concerns the TV-based model, we address the minimization problem by means of a recently proposed linesearch-based forward-backward method able to handle the nonsmoothness of the TV functional [14].
We performed different numerical tests on realistic synthetic images and we compared the proposed methods with both the original CG method proposed in [11], which exploits a gradient-free linesearch for the computation of the steplength parameter, and standard CG approaches.
The results we obtained show that the performance of the limited memory gradient method in minimizing the LS + HS functional is much better than that of the CG approaches in terms of the number of function/gradient evaluations and, therefore, computational time. Moreover, despite the difficulties due to the presence of a nondifferentiable term, the linesearch-based forward-backward method proposed in the case of the TV functional is also able to provide reconstructed images with a computational cost comparable to that of the gradient methods, thus leaving the user free to choose the desired regularizer without sacrificing efficiency.
The organization of the paper is as follows. In section 2, the DIC system for transmitted coherent light is described, together with the corresponding polychromatic image formation model. Furthermore, the nonlinear inverse problem of the phase reconstruction and its corresponding optimization problem are presented, proving some analytical properties of the objective function, such as the existence of minimum points. In section 3 the iterative optimization algorithms designed to address the phase reconstruction problem are detailed. In section 4 numerical simulations on synthetic images are presented in order to evaluate the efficiency and robustness of the considered approaches. Conclusions and perspectives are included in section 5.

The DIC system
DIC microscopy works under the principle of dual beam interference of polarized light, as depicted in figure 1. Coherent light coming from a source is passed through a polarizer lens. Every incident ray of polarized light is split by a Nomarski prism placed at the front focal plane of the condenser. This splitting produces two wave components, ordinary and extraordinary, such that their corresponding electromagnetic fields are orthogonal and separated at a fixed shear distance 2Δx along a specific shear direction, whose angle τ_k with the x-axis is called the shear angle. The specimen is sampled by the pair of waves; if they pass through a region where there is a gradient in the refractive index, the waves will be differentially shifted in phase. After this, they will reach a second Nomarski prism placed at the back focal plane of the objective lens. This prism introduces an additional phase shift, called the bias retardation and indicated with 2Δθ, which helps to improve the contrast of the observed image and to give the shadow-cast effect characteristic of DIC images (see figure 2). The interference of the two sheared and phase shifted waves occurs inside this prism and, thus, the two waves are recombined into a single beam that goes through a second polarizer lens called the analyzer. Further details on the DIC working principle can be found in the work of Murphy [15] and Mehta et al [16].
The observed images will have a uniform gray background on regions where there are no changes in the optical path, whereas they will have dark shadows and bright highlights where there are phase gradients in the direction of shear, having a 3D relief-like appearance (see figure 2). It is important to note that the shadows and highlights indicate the signs and slope of phase gradients in the specimen, and do not necessarily indicate high or low spots [3].
In this paper we consider the polychromatic rotational diversity model [17], which extends the model presented in [11] to color image acquisition. In this model, K RGB color images are acquired by rotating the specimen K times with respect to the shear axis, which results in K rotations of the amplitude point spread function. Typically K = 2 and the difference between the two angles is π/2. Indeed, for a given shear angle τ_k, the acquired image k is related to the directional derivative of the object along the direction τ_k [10]; the 2D image can then be reconstructed from two orthogonal directional derivatives [11]. In this configuration, the relation between the acquired images o_1, ..., o_K and the unknown true phase φ is given by

(o_k)_{ℓ,j} = a_1 |(h_{k,λ_ℓ} ⊗ e^{−iφ/λ_ℓ})_j|² + (η_{k,λ_ℓ})_j,   k = 1, ..., K,  ℓ = 1, 2, 3,  j ∈ χ,   (1)

where:
• k is the index of the angles τ_k that the shear direction makes with the horizontal axis [11], ℓ is the index denoting one of the three RGB channels and j = (j_1, j_2) is a 2D index varying in the set χ = {1, ..., M} × {1, ..., P}, M and P being the size of the acquired image, which is determined by the resolution of the CCD detector of the microscope, with typical value of 1388 × 1040 pixels;
• λ_ℓ is the ℓth illumination wavelength. The object is illuminated with white light, whose wavelengths range from 400 to 700 nm. The digital acquisition system of the microscope comprises a color bandpass optical filter which isolates the RGB wavelengths, acquired separately by the CCD detector [15]. Since a narrow band is selected for each color, we use the mean wavelength λ_ℓ of each band;
• φ ∈ R^{MP} is the unknown phase vector and e^{−iφ/λ_ℓ} ∈ C^{MP} stands for the vector defined by (e^{−iφ/λ_ℓ})_j = e^{−iφ_j/λ_ℓ};
• h_{k,λ_ℓ} ∈ C^{MP} is the discretization of the continuous DIC point spread function [10, 18] corresponding to the illumination wavelength λ_ℓ and rotated by the angle τ_k, i.e.

h_{k,λ_ℓ}(x, y) = (1/2) [ e^{iΔθ} p_{λ_ℓ}(R_k(x + Δx, y)) − e^{−iΔθ} p_{λ_ℓ}(R_k(x − Δx, y)) ],

where p_{λ_ℓ}(x, y) is the coherent PSF of the microscope's objective lens for the wavelength λ_ℓ, given by the inverse Fourier transform of the disk support function of amplitude 1 and radius equal to the cutoff frequency f_c = NA/λ_ℓ [10], NA being the numerical aperture of the objective lens, and R_k is the rotation matrix which rotates the coordinates according to the shear angle τ_k;
• h_1 ⊗ h_2 denotes the 2D convolution between the two M × P images h_1, h_2, extended with periodic boundary conditions;
• η_{k,λ_ℓ} ∈ R^{MP} is the noise corrupting the data, a realization of a Gaussian random vector with mean 0 ∈ R^{MP} and covariance matrix σ²I, I being the MP × MP identity matrix;
• a_1 ∈ R is a constant which corresponds to closing the condenser aperture down to a single point.
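As an illustration, the image formation model (1) can be sketched in a few lines of Python. The helper names (`dic_psf`, `dic_image`), the unit pixel size and the simplified PSF construction (shifts implemented as phase ramps in the frequency domain, rotation absorbed into the shear direction) are our own illustrative assumptions, not part of the paper's setup:

```python
import numpy as np

def dic_psf(shape, wavelength, na, shear, bias, angle):
    """Sketch of a coherent DIC PSF: difference of two sheared, phase-offset
    copies of the objective's coherent PSF p_lambda, itself the inverse FFT of
    a disk of radius f_c = NA / wavelength. Pixel size is taken as 1 (an
    illustrative assumption), and `shear` and `bias` stand for 2*dx and
    2*dtheta respectively."""
    M, P = shape
    fy = np.fft.fftfreq(M)[:, None]   # normalized frequency grids
    fx = np.fft.fftfreq(P)[None, :]
    fc = na / wavelength
    disk = (np.sqrt(fx ** 2 + fy ** 2) <= fc).astype(complex)
    # a spatial shift of +/- shear/2 along `angle` is a linear phase in frequency
    dx, dy = shear / 2 * np.cos(angle), shear / 2 * np.sin(angle)
    ramp = np.exp(2j * np.pi * (fx * dx + fy * dy))
    return 0.5 * (np.exp(1j * bias / 2) * np.fft.ifft2(disk * ramp)
                  - np.exp(-1j * bias / 2) * np.fft.ifft2(disk / ramp))

def dic_image(phi, h, wavelength, a1=1.0, sigma=0.0, rng=None):
    """Model (1): o = a1 |h (x) exp(-i phi / lambda)|^2 + Gaussian noise,
    with periodic boundary conditions (convolution via an FFT/IFFT pair)."""
    t = np.exp(-1j * phi / wavelength)
    conv = np.fft.ifft2(np.fft.fft2(h) * np.fft.fft2(t))
    o = a1 * np.abs(conv) ** 2
    if sigma > 0:
        rng = rng or np.random.default_rng(0)
        o = o + sigma * rng.standard_normal(o.shape)
    return o.real
```

Note that adding a constant to φ multiplies e^{−iφ/λ} by a unit-modulus factor and leaves the image unchanged, which anticipates the shift invariance discussed in the next section.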

Least-squares data fidelity
The phase reconstruction problem consists in finding an approximation of the unknown phase vector φ from the observed RGB images o_1, ..., o_K. Let us first address this problem by means of the maximum likelihood (ML) approach. Since the 3K images o_{k,λ_ℓ} are corrupted by Gaussian noise, the negative log likelihood of each image is a least-squares measure, which is nonlinear due to the presence of the exponential and the squared modulus in (1).
In the case of white Gaussian noise, statistically independent of the data, the negative log likelihood of the problem is the sum of the negative log likelihoods of the different images, namely the following fit-to-data term

J_0(φ) = Σ_{k=1}^{K} Σ_{ℓ=1}^{3} Σ_{j∈χ} J_{ℓ,k,j}(φ),   J_{ℓ,k,j}(φ) = [ (o_k)_{ℓ,j} − a_1 |(h_{k,λ_ℓ} ⊗ e^{−iφ/λ_ℓ})_j|² ]².   (2)

Then the ML approach to the phase reconstruction inverse problem consists in the minimization of the function in (2):

min_{φ ∈ R^{MP}} J_0(φ).   (3)

In the next result, we collect some properties of J_0 that will be useful hereafter.
Lemma 1. Let J_0 be defined as in (2). Then:
(i) J_0(φ + c1) = J_0(φ) for every c ∈ R, where 1 = (1, ..., 1) ∈ R^{MP};
(ii) J_0 is bounded;
(iii) J_0 is an analytic function on R^{MP};
(iv) if the ratios between the wavelengths λ_ℓ are rational numbers, then J_0 is a periodic function.

Proof. If the thesis holds for each J_{ℓ,k,j}, then it holds also for J_0, so we may argue on the single terms.
(i)-(ii) Adding a constant c to φ multiplies e^{−iφ/λ_ℓ} by the unit-modulus factor e^{−ic/λ_ℓ}, which leaves the squared modulus in (2) unchanged, and this proves (i). By applying the triangle inequality and the fact that |e^{−iφ_j/λ_ℓ}| = 1, j ∈ χ, each J_{ℓ,k,j} is bounded.
(iii) If each J_{ℓ,k,j} is an analytic function on R^{MP}, then J_0 is given by sums and compositions of analytic functions and thus it is itself analytic [19, propositions 1.6.2 and 1.6.7]. Hence we focus on J_{ℓ,k,j}. Since (h_{k,λ_ℓ})_r ∈ C, it can be expressed in its trigonometric form; then J_{ℓ,k,j} can be rewritten as a combination of the functions sin(·), cos(·) and (·)², and the thesis follows from their analyticity.

S Rebegoldi et al Inverse Problems 33 (2017) 085009
(iv) It is easy to see that J_0 is the sum of three periodic functions of the variable φ_j, whose periods are 2πλ_1, 2πλ_2 and 2πλ_3, respectively. By recalling that the sum of two periodic functions is periodic if the ratio of their periods is a rational number, we can conclude that J_0 is periodic. □
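As a numerical companion to the lemma, the fit-to-data term (2) can be evaluated with FFT-based periodic convolutions, and its invariance under constant phase shifts (part (i)) checked directly. The function names and the nested-list data layout (`data[k][l]` and `psfs[k][l]` for angle k, channel l) are our own illustrative assumptions:

```python
import numpy as np

def fft_conv(h, t):
    """2D convolution with periodic boundary conditions via an FFT/IFFT pair."""
    return np.fft.ifft2(np.fft.fft2(h) * np.fft.fft2(t))

def J0(phi, data, psfs, wavelengths, a1=1.0):
    """Fit-to-data term (2): squared residuals between the observed images and
    the model a1 |h_{k,l} (x) exp(-i phi / lambda_l)|^2, summed over the K
    shear angles and the three RGB channels."""
    val = 0.0
    for o_k, h_k in zip(data, psfs):
        for o, h, lam in zip(o_k, h_k, wavelengths):
            model = a1 * np.abs(fft_conv(h, np.exp(-1j * phi / lam))) ** 2
            val += np.sum((o - model) ** 2)
    return val
```

On noise-free data generated by the model itself, J0 vanishes, and J0(φ + c1) = J0(φ) for any constant c, as stated in part (i) of the lemma.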

Introducing regularization
When the assumption of rationality of the wavelength ratios holds, using points (iii) and (iv) of lemma 1, it is easy to see that a solution to problem (3) exists. However, points (i) and (iv) imply that the solution is not unique and may be determined only up to an unknown real constant or to multiples of the period T w.r.t. any variable φ_j. Furthermore, J_0 is a nonconvex function of the phase φ, thus it may admit several local minima as well as saddle points. In the light of these considerations, we can conclude that (3) is a severely ill-posed problem, which requires regularization in order to impose some a priori knowledge on the unknown phase. In particular, we propose to solve the following regularized optimization problem

min_{φ ∈ R^{MP}} J(φ) ≡ J_0(φ) + J_R(φ),   (4)

where J_0 is the LS distance defined in (2) and J_R is a regularization functional. In particular, we consider the HS potential defined as [20, 21]

J_R(φ) = μ Σ_{j∈χ} √( ‖(Dφ)_j‖² + δ² ),   (5)

where μ > 0 is a regularization parameter, D : R^{MP} → R^{2MP} is the discrete gradient operator, whose action (Dφ)_j ∈ R² at pixel j collects the finite differences of φ in the two coordinate directions, and the additional parameter δ ≥ 0 plays the role of a threshold for the gradient of the phase. The choice of this kind of regularization term instead of the first order Tikhonov one used e.g. in [11, 12] lies in the capability of the HS regularizer to behave both as a Tikhonov-like regularizer in regions where the gradient assumes small values (w.r.t. δ), and as an edge-preserving regularizer in regions where the gradient is very large, as happens in the neighborhood of jumps in the values of the phase. Moreover, we will also consider the case in which the regularization term is given by the standard TV functional [22], which is defined exactly as in (5) by setting δ = 0. We remark that, even if the HS term can be seen as a smoothed version of the TV one, their effect on the recovered images is quite different since, e.g., reconstructions provided by HS can be free of the cartoon effects typical of TV regularization.
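A minimal sketch of the HS potential (5) follows; the forward-difference discretization with periodic boundaries is an assumption of this sketch, chosen for consistency with the periodic convolutions used elsewhere:

```python
import numpy as np

def hs_potential(phi, mu, delta):
    """HS potential (5): mu * sum_j sqrt(||(D phi)_j||^2 + delta^2), with the
    discrete gradient D given by forward differences under periodic
    boundaries. Setting delta = 0 recovers the isotropic TV functional."""
    dx = np.roll(phi, -1, axis=0) - phi   # vertical differences
    dy = np.roll(phi, -1, axis=1) - phi   # horizontal differences
    return mu * np.sum(np.sqrt(dx ** 2 + dy ** 2 + delta ** 2))
```

For ‖(Dφ)_j‖ ≪ δ each summand behaves like δ + ‖(Dφ)_j‖²/(2δ) (Tikhonov-like), while for ‖(Dφ)_j‖ ≫ δ it behaves like ‖(Dφ)_j‖ (TV-like), which is the thresholding role of δ described above.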
Furthermore, one should be careful to adopt appropriate minimization algorithms in both cases, since using a method designed for smooth optimization to minimize the LS + HS functional with a very small δ typically leads to severe numerical instability. Problem (4) is still a difficult nonconvex optimization problem and, when δ = 0, it is also nondifferentiable. Some properties of the objective function J are now reported.

Lemma 2. Let J be defined as in (4). Then:
(i) J(φ + c1) = J(φ) for every c ∈ R;
(ii) if δ > 0, then ∇J is Lipschitz continuous on R^{MP}.

Proof.
(i) It follows from the relation J(φ + c1) = J_0(φ + c1) + J_R(φ + c1) = J_0(φ) + J_R(φ) = J(φ), which holds since J_0 is invariant under constant shifts (lemma 1, part (i)) and J_R depends on φ only through the finite differences (Dφ)_j.
(ii) Since for δ > 0 the HS potential is twice continuously differentiable with bounded second derivatives, ∇J_R is Lipschitz continuous. We prove that also ∇J_0 is Lipschitz continuous. Writing the residual images as a_1|h_{k,λ_ℓ} ⊗ e^{−iφ/λ_ℓ}|² − o_{k,λ_ℓ} and fixing s ∈ χ, the partial derivative (6) of J_0 and the entries (7) of the Hessian matrix ∇²J_0 can be expressed in terms of these residuals, where Re(·), Im(·) denote the real and imaginary parts of a complex number. The boundedness of the entries of ∇²J_0 then yields the estimate (8) of the Lipschitz constant of ∇J_0. □
Point (ii) of lemma 2 shows that an estimate of the Lipschitz constant of ∇J_0 and, in general, of ∇J can be computed. The estimate derived in (8) is far from being sharp, but this will not affect the behaviour of the algorithms described in the next section since, unlike many existing proximal gradient or forward-backward methods which exploit the value of the Lipschitz constant of ∇J_0, the methods we propose do not need it explicitly.
Point (i) of lemma 2 makes clear that, if a solution to problem (4) exists, then it is not unique and it can be determined only up to a real constant. This is a common feature shared with the unregularized problem (3). However, since the objective function J is not periodic and, in addition, neither of the two terms J_0 and J_R is coercive, we cannot prove the existence of a minimum point of J either by continuity, as can be done for the function J_0 when the wavelength ratios are rational, or by coercivity. A specific proof of existence of the solution for problem (4) is now presented.
Thanks to part (i) of lemma 2, J is constant along the direction 1 = (1, ..., 1); hence we can restrict the search of the minimum point to a hyperplane Π transversal to that direction, and we denote with J|_Π the restriction of J to Π. Since S = arg min_{φ∈R^{MP}} J_R(φ) and Π intersects S only in φ_S, J_R is a convex function with a unique minimum point on Π, which implies that J_R is coercive on Π (see [23, proposition 3.2.5, definition 3.2.6]). Furthermore, from point (ii) of lemma 1, we know that J_0 is a bounded function on Π. Then J|_Π is the sum of a coercive term and a bounded one, and is therefore itself coercive. This allows us to conclude that J admits a minimum point on Π and thus also on R^{MP}. The second part of the thesis follows from lemma 2, part (i). □ Note that the above proof of existence holds also for the regularized DIC problem proposed in [11, 12], in which the first order Tikhonov regularizer used instead of the TV functional is also noncoercive.

Optimization methods
In previous works [9,11,12], the problem of DIC phase reconstruction was addressed with a nonlinear conjugate gradient method [24]. However, as we will see in section 4, these methods require in practice several evaluations of the objective function, and possibly of its gradient, in order to compute the linesearch parameter. What we propose instead is to tackle problem (4) with a gradient descent algorithm in the differentiable case (δ > 0) and a proximal gradient method in the nondifferentiable case (δ = 0). The key ingredients of both methods are the use of an Armijo linesearch at each iteration, which ensures convergence to a stationary point of problem (4), and a clever adaptive choice of the steplength in order to improve the speed of convergence.
For the sake of simplicity, from now on each monochromatic image is treated as a vector in R^N (with N = MP) obtained by a lexicographic reordering of its pixels.

The limited memory steepest descent method
We consider first the HS regularizer. In this case the objective function is differentiable and we exploit the limited memory steepest descent (LMSD) method proposed by Fletcher [25] and outlined in algorithm 1. The LMSD method is a standard gradient method equipped with a monotone Armijo linesearch and variable steplengths approximating the inverses of some eigenvalues of the Hessian matrix ∇²J(φ^(n)) in order to improve the convergence speed. Unlike the classical Barzilai-Borwein (BB) rules [26] and their generalizations (see e.g. [27][28][29]), which try to approximate (∇²J(φ^(n)))^{−1} with a constant diagonal matrix, the idea proposed by Fletcher for quadratic objective functions is based on a Lanczos iterative process applied to approximate some eigenvalues of the Hessian matrix of the objective function. Some algebra shows that this can be practically performed without the explicit knowledge of the Hessian itself, exploiting only a set of past gradients and steplengths (see steps 6-10 of algorithm 1). Generalization to nonquadratic functions can be obtained by computing the eigenvalues of the symmetric part of the small matrix built in step 10 instead of those of the matrix itself (we remark that for quadratic J the two matrices coincide).
Some practical issues have to be addressed in the implementation of algorithm 1:
• The first loop (steps 1-5) builds a matrix G whose m columns are the most recently computed gradients. The initial values for the first m steplengths can be provided by the user (e.g. by computing the BB ones) or can be chosen with the same approach described in steps 6-10 but with smaller matrices: for example, one can fix α^(0)_0, compute G = ∇J(φ^(0)) and use steps 6-10 to compute the next steplength, and so on.
• If GᵀG in step 7 is not positive definite, then the oldest gradient of G is discarded and a new matrix GᵀG is computed. This step is repeated until GᵀG becomes positive definite.
• The stopping criterion can be chosen by the user and be related to the decrease of the objective function J or of its gradient norm, or to the distance between two successive iterates.
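The Ritz-value machinery of steps 6-11 can be sketched as follows (a sketch following Fletcher's construction for the quadratic case; the function name and the handling of nonpositive eigenvalues via a floor `eps` are our own assumptions):

```python
import numpy as np

def lmsd_steplengths(G, g_new, alphas, eps=1e-12):
    """Sketch of steps 6-11 of algorithm 1: given the m most recent gradients
    (columns of G), the next gradient g_new and the corresponding steplengths,
    recover m Ritz values of the Hessian from the Cholesky factor of G^T G and
    return their inverses as the new trial steplengths. For nonquadratic J the
    small matrix T is symmetrized before computing its eigenvalues."""
    m = G.shape[1]
    # bidiagonal matrix encoding g_{i+1} = g_i - alpha_i * A g_i (quadratic case)
    Jmat = np.zeros((m + 1, m))
    for i in range(m):
        Jmat[i, i] = 1.0 / alphas[i]
        Jmat[i + 1, i] = -1.0 / alphas[i]
    R = np.linalg.cholesky(G.T @ G).T            # G^T G = R^T R, R upper triangular
    r = np.linalg.solve(R.T, G.T @ g_new)        # r = R^{-T} G^T g_new
    T = np.hstack([R, r[:, None]]) @ Jmat @ np.linalg.inv(R)
    theta = np.linalg.eigvalsh(0.5 * (T + T.T))  # Ritz values (symmetrized)
    theta = np.maximum(theta, eps)               # guard against nonpositive values
    return 1.0 / theta
```

For m = 1 this reduces to the first BB steplength, and for quadratic J the returned values are inverses of Ritz values of the Hessian, hence contained in the inverse spectrum of A.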
Concerning the computational costs of LMSD, the heaviest tasks at each iteration are the computation of ∇J(φ^(n)) at step 1 and of J(φ^(n) − α_n∇J(φ^(n))) at step 2. Considering step 1, we focus on ∇J_0. As written in (6), due to the product between e^{−iφ_s/λ_ℓ} and (h_{k,λ_ℓ})_{j−s}, the evaluation of ∇J_0 can be performed with O(N²) complexity; this is how the gradient is computed in [11]. However, if we take the sum over j of the residuals into the argument of Im(·), then we can conveniently rewrite (6) in the form (11), whose heaviest operations are two convolutions; here h_1 .∗ h_2 denotes the componentwise product between two images h_1, h_2 and (h̃_{k,λ_ℓ})_j = (h_{k,λ_ℓ})_{−j} for all j ∈ χ. Thanks to the assumption of periodic boundary conditions, each convolution can be performed with an FFT/IFFT pair (O(N log N) complexity). Hence, since ∇J_R has O(N) complexity, we can conclude that step 1 has an overall complexity of O(N log N). Similarly, the function at step 2 is computed with complexity O(N log N), due to the presence of one convolution inside the triple sum in (2). From a practical point of view, we have already shown that the LMSD method is an effective tool for DIC imaging, especially if compared to more standard gradient methods equipped with the BB rules [17]. From a mathematical point of view, one can prove, in the same way as in [30], that every limit point of the sequence generated by algorithm 1 is a stationary point for problem (4). In addition, the convergence of algorithm 1 can be asserted whenever the objective function J satisfies the Kurdyka-Łojasiewicz (KL) property [31, 32] at each point of its domain.

For the reader's convenience, we recall the structure of algorithm 1 (LMSD): at each iteration n, step 1 computes the gradient ∇J(φ^(n)); step 2 computes the smallest non-negative integer i_n such that α_n = α^(0)_n ρ^{i_n} satisfies the Armijo condition J(φ^(n) − α_n∇J(φ^(n))) ≤ J(φ^(n)) − ωα_n‖∇J(φ^(n))‖²; step 3 sets φ^(n+1) = φ^(n) − α_n∇J(φ^(n)); steps 4-5 check the stopping criterion and update n. Every m iterations, step 7 computes the Cholesky factorization RᵀR of the m × m matrix GᵀG, steps 8-10 build from R and the stored steplengths a small matrix whose eigenvalues θ_1, ..., θ_m approximate eigenvalues of the Hessian, and step 11 defines the new trial steplengths as α^(0)_{n+i−1} = 1/θ_i, i = 1, ..., m.

More precisely, as shown in a number of recent papers [33][34][35], one can prove the convergence of a sequence {φ^(n)}_{n∈N} to a limit point (if any exists) which is stationary for J if the following three conditions are satisfied:
(H1) (sufficient decrease) there exists a > 0 such that J(φ^(n+1)) + a‖φ^(n+1) − φ^(n)‖² ≤ J(φ^(n)) for all n;
(H2) (relative error) there exists b > 0 such that ‖∇J(φ^(n+1))‖ ≤ b‖φ^(n+1) − φ^(n)‖ for all n;
(H3) J satisfies the KL property at each point of its domain.
This scheme applies to the LMSD method. First of all, condition (H3) is satisfied by the DIC functional defined in (4). Indeed J_0 is an analytic function (lemma 1, part (iii)) and J_R is a semialgebraic function, which means that its graph is defined by a finite sequence of polynomial equations and inequalities (see [36] for a definition). Hence J is the sum of an analytic function and a semialgebraic one and for this reason it satisfies the KL property on R^N (see [36, p 1769] and references therein). Conditions (H1)-(H2) follow from steps 2 and 3, combined with the fact that ∇J is Lipschitz continuous (lemma 2, part (ii)), provided that the sequence of steplengths α^(0)_n defined at step 11 is bounded from above. Therefore we can state the following result: let J be defined as in (4), let {φ^(n)}_{n∈N} be the sequence generated by algorithm 1 and assume α^(0)_n ≤ α_max for all n, for some α_max > 0; if φ* is a limit point of {φ^(n)}_{n∈N}, then φ* is a stationary point of J and φ^(n) converges to φ*.

Proof. We start by proving condition (H1).
Step 3 of algorithm 1 can be rewritten as ∇J(φ^(n)) = (φ^(n) − φ^(n+1))/α_n, from which we have ‖∇J(φ^(n))‖² = ‖φ^(n+1) − φ^(n)‖²/α_n². By substituting this relation in the Armijo condition of step 2 and using α_n ≤ α^(0)_n ≤ α_max, we obtain J(φ^(n+1)) ≤ J(φ^(n)) − (ω/α_max)‖φ^(n+1) − φ^(n)‖². Then (H1) holds with a = ω/α_max. Regarding condition (H2), recall that the Lipschitz continuity of ∇J implies that there is α_min > 0 such that the linesearch parameter satisfies α_n ≥ α_min (see [14, proposition 4]); hence, by the triangle inequality and the Lipschitz continuity of ∇J with constant L,

‖∇J(φ^(n+1))‖ ≤ ‖∇J(φ^(n+1)) − ∇J(φ^(n))‖ + ‖∇J(φ^(n))‖ ≤ (L + 1/α_min)‖φ^(n+1) − φ^(n)‖,

so (H2) holds with b = L + 1/α_min. □
Algorithm 2 (ILA), described next, computes at each iteration the smallest non-negative integer i_n such that λ_n = ρ^{i_n} satisfies an Armijo-like condition, checks the stopping criterion and updates n = n + 1.
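The backtracking loop used in step 2 of algorithm 1 (and, in a similar form, in algorithm 2 below) can be sketched as follows; the helper name and default parameter values are ours, written for the plain gradient step of LMSD:

```python
import numpy as np

def armijo_backtracking(J, gradJ, phi, alpha0, rho=0.5, omega=1e-4, max_red=50):
    """Find the smallest non-negative integer i such that alpha = alpha0*rho**i
    satisfies the Armijo condition J(phi - alpha*g) <= J(phi) - omega*alpha*||g||^2,
    and return the accepted steplength and the new iterate."""
    g = gradJ(phi)
    J_phi = J(phi)
    gnorm2 = float(np.vdot(g, g).real)
    alpha = alpha0
    for _ in range(max_red):
        trial = phi - alpha * g
        if J(trial) <= J_phi - omega * alpha * gnorm2:
            return alpha, trial
        alpha *= rho
    return alpha, phi - alpha * g  # fallback after max_red reductions
```

Each backtracking step costs one extra evaluation of J, which is why the O(N log N) FFT-based evaluation of the objective discussed above matters in practice.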
We now turn to the algorithm we used to address the nonsmooth TV functional. In particular, we considered a simplified version of a recently proposed proximal gradient method called VMILA (variable metric inexact linesearch algorithm) [14]. In its general form, this method exploits a variable metric in the (possibly inexact) computation of the proximal point at each iteration and a backtracking loop to satisfy an Armijo-like inequality. Effective variable metrics can be designed for specific objective functions by exploiting suitable decompositions of the gradient of the smooth part of the objective function itself [30,[37][38][39]. However, since in the DIC problem the gradient of J 0 does not lead to a natural decomposition in the required form, in our tests we used the standard Euclidean distance (we will denote with ILA this simplified version of VMILA).
The main steps of ILA are detailed in algorithm 2. At each iteration n, given the point φ^(n) ∈ R^N and the parameters α_n > 0, γ ∈ [0, 1], we define a convex local approximation h^(n)_γ of the objective around φ^(n); we observe that h^(n)_γ is strongly convex for any γ ∈ (0, 1]. By setting h^(n) = h^(n)_1 and z^(n) = φ^(n) − α_n∇J_0(φ^(n)), we define the unique proximal point ψ^(n) associated with J_R and z^(n). In step 2 of algorithm 2, an approximation ψ̃^(n) of the proximal point ψ^(n) is defined by means of condition (16). Such a point can be practically computed by remarking that J_R can be written in terms of the discrete gradient operator D; then, considering the dual problem (20) of (19), the dual function Γ^(n) involves the convex conjugate g* of g, namely the indicator function of the set (B²_{0,μ})^N, where B²_{0,μ} ⊂ R² is the 2-dimensional Euclidean ball centered at 0 with radius μ.
Condition (16) is fulfilled by any point for which the criterion (22) holds. Such a point can be found by applying an iterative method to problem (20) and using (22) as stopping criterion. Similarly to LMSD, any limit point of the sequence generated by ILA is stationary for problem (4) [14, theorem 4.1] and, under the assumption that a limit point exists, the convergence of ILA to such a point holds when J satisfies the Kurdyka-Łojasiewicz property, the gradient of the smooth part, ∇J_0, is Lipschitz continuous and the proximal point ψ̃^(n) is computed exactly [35]. Whether and when ILA converges if the proximal point is computed inexactly is still an open problem; therefore all we can say for algorithm 2 applied to the DIC problem is that all its limit points are stationary.
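To make the dual computation concrete, here is a minimal sketch of the proximal point of α·μ·TV evaluated through the dual problem, with a plain projected-gradient loop standing in for the FISTA solver used in the paper (the discretization, the step size based on ‖D‖² ≤ 8 and all names are our own assumptions):

```python
import numpy as np

def grad(u):  # forward differences, periodic boundaries
    return np.stack([np.roll(u, -1, 0) - u, np.roll(u, -1, 1) - u])

def div(v):   # negative adjoint of grad: div = -grad^T
    return (v[0] - np.roll(v[0], 1, 0)) + (v[1] - np.roll(v[1], 1, 1))

def tv_prox(z, alpha, mu, n_iter=100):
    """Proximal point of alpha * mu * TV at z, computed on the dual: maximize
    over a dual field v constrained to the balls ||v_j|| <= mu, then set
    psi = z + alpha * div(v)."""
    v = np.zeros((2,) + z.shape)
    tau = 1.0 / (8.0 * alpha)   # corresponds to step 1/(alpha^2 ||D||^2)
    for _ in range(n_iter):
        v = v + tau * grad(z + alpha * div(v))          # gradient ascent step
        norms = np.maximum(np.sqrt(v[0] ** 2 + v[1] ** 2) / mu, 1.0)
        v /= norms                                       # project onto B(0, mu)
    return z + alpha * div(v)
```

Constant images are fixed points (their TV is zero), and since div has zero mean the prox preserves the mean of z, consistent with the shift-invariance structure of the problem.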

Numerical experiments
In this section we test the effectiveness of the algorithms previously described on some synthetic problems. All the numerical results have been obtained on a PC equipped with an Intel Core i7 processor at 2.60 GHz with 8 GB of RAM, running Matlab R2013a with its standard settings. For each test we report the number of function evaluations, the number of gradient evaluations and the computational time needed by each algorithm to provide the reconstructed phase. With this information the reader should be able to estimate the complexity of the different approaches independently of the environment in which the algorithms are implemented and run. The LMSD and ILA routines for the DIC problem, together with an illustrative example, can be downloaded at the webpage www.oasis.unimore.it/site/home/software.html.

Comparison with state-of-the-art methods
Since in the DIC problem the evaluation of the gradient ∇J is computationally demanding and its nonlinearity w.r.t. α requires a new computation at each step of the backtracking loop, in [9,11] a heuristic version of a nonlinear conjugate gradient (CG) method is used, exploiting a gradient-free linesearch based on a polynomial approximation method. Although this formulation has practical advantages, the resulting scheme is not guaranteed to converge, and in our tests we experienced very different behaviours w.r.t. the choice of some initial parameters of the linesearch procedure. For this reason, we also implemented several standard CG methods [24,40], namely the Fletcher-Reeves (FR), Polak-Ribière (PR), PR with nonnegative values (PR+) and PR constrained by the FR values (FR-PR) strategies [41]. For these algorithms, the global convergence is ensured by computing the steplength parameter by means of the strong Wolfe conditions [24,41].
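The conjugacy parameters that distinguish these CG variants can be written compactly; the following sketch uses the standard textbook formulas (the function name is ours), with the usual direction update d_new = −g_new + β·d_old:

```python
import numpy as np

def cg_beta(g_new, g_old, variant="FR-PR"):
    """Conjugacy parameter beta for the nonlinear CG variants compared in the
    text, given the current and previous gradients."""
    fr = (g_new @ g_new) / (g_old @ g_old)            # Fletcher-Reeves
    pr = (g_new @ (g_new - g_old)) / (g_old @ g_old)  # Polak-Ribiere
    if variant == "FR":
        return fr
    if variant == "PR":
        return pr
    if variant == "PR+":
        return max(pr, 0.0)
    if variant == "FR-PR":
        return min(max(pr, -fr), fr)  # PR value constrained to [-FR, FR]
    raise ValueError(variant)
```

Note that when consecutive gradients are orthogonal the PR and FR values coincide, while stagnation (g_new ≈ g_old) drives the PR-based values to zero, which acts as an automatic restart.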
The evaluations of the optimization methods have been carried out on two phantom objects (see figure 3), which have been computed by using the formula for the phase difference between two waves travelling through two different media,

φ_s = (2π/λ)(n_1 − n_2)t_s,   s ∈ χ,   (23)

where n_1 and n_2 are the refractive indices of the object structure and the surrounding medium, respectively, and t_s is the thickness of the object at pixel s ∈ χ. The first phantom, denominated 'cone' and reported in the top row of figure 3, is a 64 × 64 phase function representing a truncated cone of radius r = 3.2 µm with n_1 = 1.33, n_2 = 1 and maximum value φ_max = 1.57 rad attained at the cone vertex. The 'cross' phantom, shown in the bottom row of figure 3, is another 64 × 64 phase function of two crossing bars, each of width 5 µm, measuring 0.114 rad inside the bars and 0 in the background. For both simulations, the DIC microscope parameters were set as follows:
• shear: 2Δx = 0.6 µm;
• bias: 2Δθ = π/2 rad;
• numerical aperture of the objective: NA = 0.9.
For each phantom, a dataset consisting of K = 2 polychromatic DIC images acquired at shear angles τ_1 = −π/4 rad and τ_2 = π/4 rad was created, as in model (1), by convolving the true phase function with the accordingly rotated DIC PSFs and then corrupting the result with white Gaussian noise at different values of the signal-to-noise ratio

SNR = 10 log_10( φ̄* / σ ),   (24)

where φ̄* is the mean value of the true object and σ is the standard deviation of the noise. The SNR values chosen in the simulations were 9 and 4.5 dB.
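A phantom of this kind and the noise model of (24) can be sketched as follows; the pixel grid, the linear thickness profile and all function names are our own illustrative assumptions:

```python
import numpy as np

def cone_phantom(size=64, radius_px=20, n1=1.33, n2=1.0,
                 wavelength=0.55, phi_max=1.57):
    """Truncated-cone phase phantom built from (23),
    phi_s = (2*pi/lambda)*(n1 - n2)*t_s, with a linearly decreasing thickness
    t_s scaled so that the peak phase equals phi_max."""
    y, x = np.mgrid[:size, :size] - size // 2
    profile = np.clip(1.0 - np.sqrt(x ** 2 + y ** 2) / radius_px, 0.0, 1.0)
    t_peak = phi_max * wavelength / (2 * np.pi * (n1 - n2))  # peak thickness
    return 2 * np.pi / wavelength * (n1 - n2) * t_peak * profile

def add_noise(img, snr_db, rng=None):
    """White Gaussian noise at the SNR of (24):
    SNR = 10 log10(mean(img)/sigma), hence sigma = mean(img) / 10**(SNR/10)."""
    rng = rng or np.random.default_rng(0)
    sigma = img.mean() / 10 ** (snr_db / 10)
    return img + sigma * rng.standard_normal(img.shape), sigma
```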
As far as the regularization parameter μ and the threshold δ in (5) are concerned, these have been manually chosen from a fixed range in order to obtain a visually satisfactory reconstruction. Note that the parameters were first set in the differentiable case (δ > 0) for the LMSD and the nonlinear CG methods, and then the same value of μ was used also in the nondifferentiable case (δ = 0) for the ILA method. The resulting values, used for every simulation presented in this section, were μ = 10^{−2}, δ = 10^{−2} for the cone and μ = 4·10^{−2}, δ = 10^{−3} for the cross.
Some details regarding the choice of the parameters involved in the optimization methods of section 3 are now provided. The linesearch parameters ρ, ω of the LMSD and ILA methods have been set to 0.5 and 10^{−4}, respectively. These are the standard choices for the Armijo parameters; moreover, it is known that the linesearch algorithm is not particularly sensitive to modifications of these values [30,42]. The parameter γ in the Armijo-like rule (17) has been fixed equal to 1, which corresponds to the mildest choice in terms of decrease of the objective function J. The parameter m in algorithm 1 is typically a small value (m = 3, 4, 5), in order to avoid a significant computational cost in the calculation of the steplengths α^(0)_n. The dual problem (20) is addressed, at each iteration of ILA, by means of the algorithm FISTA [43], which is stopped by using criterion (22) with η = 10^{−6}. This value represents a good balance between convergence speed and computational time per iteration [14]. Concerning the nonlinear CG methods equipped with the strong Wolfe conditions, we use the same parameters as in [41] and we initialize the related backtracking procedure as suggested in [24, p 59]. Regarding the CG methods endowed with the polynomial approximation, a restart of the method is performed by taking a steepest descent step whenever the search direction fails to be a descent direction. Finally, the constant phase object φ^(0) = 0 is chosen as the initial guess for all methods.
In order to evaluate the performance of the phase reconstruction methods proposed in section 3, we will make use of the following error distance

E(φ^{(n)}, φ*) = ( (1/|χ|) Σ_{j∈χ} (φ_j^{(n)} − φ*_j − c)^2 )^{1/2},   (25)

where φ* is the phase to be reconstructed and c = (1/|χ|) Σ_{j∈χ} (φ_j^{(n)} − φ*_j). Unlike the usual root mean squared error, which is recovered by setting c = 0 in (25), the error distance defined in (25) is invariant with respect to constant phase shifts, i.e. E(φ^{(n)} + s, φ*) = E(φ^{(n)}, φ*) for every s ∈ ℝ.
This makes (25) well suited for problem (4), whose solution might be recovered only up to a real constant.
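A plausible implementation of such an error distance, with c taken as the mean of φ − φ* (so that forcing c = 0 recovers the plain RMSE), might look as follows; the exact normalization used in (25) is an assumption here.

```python
import numpy as np

def shift_invariant_error(phi, phi_star):
    """RMSE between phi and phi_star after removing the constant offset c,
    chosen as the mean of (phi - phi_star); invariant to phi -> phi + s."""
    c = np.mean(phi - phi_star)  # best real constant in the least-squares sense
    return np.linalg.norm(phi - phi_star - c) / np.sqrt(phi.size)
```

Adding any real constant s to phi shifts c by exactly s, so the error is unchanged, which is the invariance property stated above.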
The methods have been run for the cone and cross phantoms with the parameter settings outlined above. Since in the unconstrained differentiable case the goal is to drive the gradient of J to zero, the iterations of the LMSD and the CG methods have been stopped when the following criterion based on the decrease of the gradient norm

||∇J(φ^{(n)})|| ≤ κ   (27)

was met, with κ = 4 · 10^{-2} for the cone and κ = 10^{-3} for the cross. On the other hand, since with the TV functional the gradient is not available, the ILA method has been stopped when the up-to-a-constant error between two successive iterates fell below a prefixed κ > 0, that is

||φ^{(n+1)} − φ^{(n)} − c^{(n)}|| ≤ κ,   (28)

where c^{(n)} is the mean value of the difference between the two objects. The tolerance κ in (28) was set equal to 5 · 10^{-5} for the cone and 10^{-4} for the cross. We remark that these values, like the ones suggested above for the stopping criterion (27), have been tuned in order to obtain sensible reconstructions with errors close to the optimal ones.

In figure 4 we show the reconstruction error provided by the different methods as a function of the computational time. Among the CG methods, we report only the results obtained by the PR algorithm combined with a polynomial-approximation-based linesearch (PR-PA) and the FR-PR one in which the linesearch parameter is computed with the SW conditions (FR-PR-SW), since they always outperformed the other possible choices. From the plots of figure 4 we can see that each method is quite stable with respect to the noise level on the DIC images. However, in terms of time efficiency, LMSD outperforms the CG methods in both tests, with a time reduction of at least 50% to satisfy the stopping criterion. Some further details on the computational cost of the different methods are reported in tables 1 and 2. Of course the numerical values in the tables depend on the tolerances chosen for the stopping criteria, but some general considerations can be drawn, e.g.
on the number of evaluations of J and ∇J (of J and ∇J_0 for ILA). For instance, in the case of the cone (table 1), LMSD evaluates the function less than two times per iteration. By contrast, the backtracking procedure exploited in the FR-PR-SW method requires an average of four evaluations per iteration of both the function and the gradient to satisfy the strong Wolfe conditions, whereas the PR-PA method, despite evaluating the gradient only once, needs on average 12 evaluations of the function before detecting the correct three-point interval (see [11]). One could reduce the number of evaluations in PR-PA by properly tuning the initial parameters of the linesearch. However, as mentioned before, this method is quite sensitive to this choice, and small variations may result in a large increase in the number of restarts and, eventually, in the divergence of the algorithm. In addition, the optimal value of these parameters appears to depend strongly on the object to be reconstructed.
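The two stopping rules (27) and (28) discussed above can be sketched as follows. This is a hypothetical rendering: the precise norms and normalizations used in the paper are assumptions here.

```python
import numpy as np

def gradient_stop(grad, kappa):
    """Criterion (27): stop when the gradient norm falls below kappa
    (used for the smooth case, i.e. LMSD and the CG methods)."""
    return np.linalg.norm(grad) <= kappa

def up_to_constant_stop(phi_new, phi_old, kappa):
    """Criterion (28): stop when the distance between successive iterates,
    after removing the mean of their difference, falls below kappa
    (used for ILA, where the gradient is unavailable)."""
    d = phi_new - phi_old
    return np.linalg.norm(d - np.mean(d)) / np.sqrt(d.size) <= kappa
```

Removing the mean of the difference in (28) mirrors the shift invariance of the error distance (25): two iterates differing only by a constant phase are treated as equal.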

Comparison between LMSD and ILA
We now compare the performance of LMSD and ILA. On the one hand, ILA reconstructs the cross object slightly better than LMSD. Indeed, ILA provides the lowest reconstruction error in table 2 for each SNR value, and the corresponding phase estimates have better preserved edges, as clearly depicted in figure 5, where we consider an 'up-to-a-constant' residual to measure the quality of the reconstructions provided by the two methods. This result was expected, since ILA addresses problem (4) with the standard TV functional (δ = 0 in (5)), which is better suited than the HS regularization (δ > 0) when the object to be reconstructed is piecewise constant. On the other hand, ILA may be computationally more expensive since, unlike LMSD, it requires the iterative solution of the inner subproblem (20) at each outer iteration. Indeed, looking at table 2 we notice that, although the number of function evaluations per iteration in LMSD and ILA is quite similar (on average around 1.4 for LMSD and 1.8 for ILA) and the ILA iterations are stopped well before the LMSD ones, the computational time of ILA is always higher. For instance, in the case SNR = 9 dB, the two methods require approximately the same time, although the number of iterations of ILA is less than half that of LMSD. This fact is explained by the cost of the inner FISTA iterations required to solve subproblem (20). The comparison is reversed for the cone object (see table 1): in this case, LMSD is able to achieve a lower reconstruction error than ILA in very few iterations, providing a remarkable gain in computational time. In order to deepen the analysis between the differentiable TV approximation and the original nondifferentiable one, we compared the LMSD and ILA methods in one further realistic simulation. In particular, we considered the 'grid' object in figure 6, which is a 1388 × 1040 image emulating the phase function of a multi-area calibration artifact [44], measuring 1.212 rad inside the black regions and 2.187 rad inside the white ones.
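For reference, the FISTA scheme used for the inner subproblem can be sketched in its generic composite form. This is a minimal sketch for minimizing f + g when the proximal operator of g is cheap; the actual subproblem (20) involves the TV term, whose proximal point is itself only computed approximately, so the example below (with an ℓ1 term for concreteness) is an illustration, not the paper's implementation.

```python
import numpy as np

def fista(grad_f, prox_g, L, x0, n_iter=200):
    """Generic FISTA for min f(x) + g(x), with f smooth (L-Lipschitz gradient)
    and g admitting an easy proximal operator."""
    x = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(n_iter):
        x_new = prox_g(y - grad_f(y) / L, 1.0 / L)   # forward-backward step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # extrapolation
        x, t = x_new, t_new
    return x

# Illustrative g: lam * ||x||_1, whose prox is soft-thresholding.
def soft(z, step, lam=0.1):
    return np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
```

The nested structure is visible here: each ILA outer iteration pays for a full run of such an inner loop, which is the source of the extra computational time noted above.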
The setup of the two methods is identical to that of the previous tests (with the exception of the numerical aperture NA of the objective, which has been set equal to 0.8), and the parameters μ (for both models) and δ (for the smooth TV functional) have been set equal to 2 · 10^{-1} and 10^{-1}, respectively. Instead of three levels of noise, here we only considered an SNR equal to 9 dB. In figure 7 we report the results obtained on this dataset. The grid dataset confirms the remarks made previously, since ILA takes almost twice as long as LMSD to provide an estimate of the phase. This is again due to the number of inner iterations, which starts to increase in an oscillatory fashion after the first 20 iterations (see figure 7). To conclude, we believe that the LMSD method is generally preferable since, unlike ILA, it does not require any inner subproblem to be solved and is thus generally less expensive from the computational point of view. However, the ILA method should be considered a valid alternative when the object to be reconstructed is piecewise constant.

Influence of color and bias retardation on phase reconstruction
Data and results for the grid object. From left to right: true object; noisy DIC color image taken at shear angle π/4 rad and corrupted with white Gaussian noise at SNR = 9 dB; phase reconstructed with the LMSD method from observations at shear angles equal to −π/4 rad and π/4 rad.

Another analysis of interest is how color information and bias retardation in the observations affect the phase reconstruction. We set up four scenarios for comparison: independent monochromatic observations with red, green, and blue light, and a polychromatic observation in which all wavelengths are combined. For each of these scenarios we used the cross object to generate 100 observations with different noise realizations, for both SNR = 4.5 dB and SNR = 9 dB, with bias retardations of 0 rad and π/2 rad, at shear angles equal to −π/4 rad and π/4 rad. We used the LMSD method to perform the reconstructions; results for SNR = 4.5 dB are shown in figure 8 and for SNR = 9 dB in figure 9, where the lines show the average error over the 100 observations. We notice that for 0 rad bias retardation the reconstruction from polychromatic observations behaves better than from the monochromatic ones, even though the error level does not suggest a good reconstruction. For π/2 rad bias retardation the algorithm stops before the maximum number of iterations (500) is reached. In this case, for both levels of noise, the performance of the reconstruction with polychromatic light is quite comparable with that obtained with monochromatic light. Another interesting finding is that, in all cases, convergence for monochromatic light occurs in the order red-green-blue; this is due to the fact that the amplitude PSF for blue light has the largest frequency support and thus provides more information for the reconstruction.
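The red-green-blue ordering is consistent with the diffraction-limited cutoff frequency scaling as NA/λ: the shorter the wavelength, the larger the frequency support of the PSF. A quick check, with illustrative wavelength values (not taken from the paper):

```python
# Coherent diffraction-limited cutoff scales as NA / wavelength, so the
# shortest wavelength (blue) has the largest frequency support.
# The wavelengths below are illustrative assumptions, not the paper's values.
NA = 0.8  # numerical aperture (the value used for the grid test)
wavelengths_nm = {"red": 650.0, "green": 550.0, "blue": 450.0}
cutoff = {color: NA / (lam * 1e-9) for color, lam in wavelengths_nm.items()}  # cycles per metre
assert cutoff["blue"] > cutoff["green"] > cutoff["red"]
```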

Conclusions and future work
In this paper we provided both theoretical and practical contributions to the inverse problem of phase estimation from polychromatic DIC images. First of all, we studied the analytical properties of the LS data fidelity function arising from a maximum likelihood approach, showing its periodicity, shift-invariance and analyticity. Secondly, we analyzed the minimization  problem of the functional given by the sum of the discrepancy term and an edge-preserving regularizer, proving the existence of minimum points. Finally, we proposed two recent optimization strategies for the case of both smooth and nonsmooth regularizers, and compared their performance with state-of-the-art conjugate gradient methods. In particular, for the HS regularizer we considered the LMSD method while in the case of the nonsmooth TV functional we proposed to exploit the ILA approach.
From the analysis we performed, we draw the conclusion that an edge-preserving regularizer combined with an effective optimization tool can rapidly provide a good reconstruction of the phase. Of course the LMSD method has a much simpler structure than ILA and, in general, it should converge faster, since ILA depends on two cycles of iterations (the outer one defining the sequence and the inner one computing the proximal point). However, in our tests the differences in time are not so significant; therefore a user might prefer to avoid the choice of a further parameter (the δ defining the HS term) and adopt the standard LS + TV model. Future work will concern the application of the proposed strategies to real acquisitions and the reformulation of the minimization problem in terms of the specimen's transmission function e^{−iφ}. This would lead to a standard regularized LS problem restricted to a nonconvex feasible set, which would require a generalization of the (VM)ILA approach able to account for nonconvex projections and to exploit the steplength selection rule proposed by Fletcher [25] in the presence of constraints [45, 46].