Tomography from highly limited data
Compressed sensing reconstruction
Pierre Paleo, ESRF
20/05/2015
Outline
1 From FBP to iterative techniques
2 Dictionary Learning : an example of convex functional
3 Optimization algorithms
4 PyHST2 : features and outlooks
1. FBP : limitations
Filtered Backprojection is fast, but...
• It needs $\sim \frac{\pi}{2} \times N_{\text{rows}}$ projections
• Subsampling leads to poor reconstruction quality (star artifacts, ...)
• It is not parametric : no room for a priori knowledge
⇒ Not suited to highly limited data
1. Iterative techniques : framework
• Tomographic reconstruction amounts to an optimization problem :
  $$\operatorname*{argmin}_x \left\{ f(x, d) + g(x) \right\}$$
  • $d$ : acquired projections
  • $x$ : slice/volume to reconstruct
  • $f$ : fidelity term
  • $g$ : regularization term
• Example : $f(x, d) = \frac{1}{2} \|Px - d\|_2^2$, $g(x) = 0$
  • $P$ : forward projection operator
  • Least squares formulation of $Px = d$
  • ART, SIRT ...
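The least squares example above can be sketched with a plain gradient descent (Landweber iteration). This is only an illustration: the small random matrix `P` is a stand-in for a real projector, and the function name `landweber` is an assumption, not part of PyHST2.

```python
import numpy as np

def landweber(P, d, n_iter=2000):
    """Minimize 0.5 * ||P x - d||_2^2 by gradient descent with step 1/L."""
    L = np.linalg.norm(P, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(P.shape[1])
    for _ in range(n_iter):
        x -= (P.T @ (P @ x - d)) / L       # gradient of the fidelity term
    return x

rng = np.random.default_rng(0)
P = rng.standard_normal((40, 10))          # toy stand-in for the projector
x_true = rng.standard_normal(10)
d = P @ x_true                             # noiseless "projections"
x_rec = landweber(P, d)
```

Here the system is overdetermined and noiseless, so the iteration recovers `x_true`; with few projections the problem becomes ill-posed and a regularization term `g` is needed.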
1. Importance of regularization
• The problem $Px = d$ is ill-posed
• Proper regularization imposes stability
• Regularization makes it possible to incorporate a priori information
• Tikhonov : penalize solutions with a large norm : $g(x) = \beta \|x\|_2^2$
• LASSO : $g(x) = \beta \|x\|_1$
• Total Variation : penalize non-zero gradients : $g(x) = \|\nabla x\|_1$
• Compressed sensing : “accurate reconstruction from highly undersampled data”
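As a small illustration, the three penalties listed above can be evaluated on a 1D piecewise-constant signal. The function names and the 1D finite-difference gradient are assumptions made for this sketch.

```python
import numpy as np

def tikhonov(x, beta=1.0):
    return beta * np.sum(x ** 2)             # beta * ||x||_2^2

def lasso(x, beta=1.0):
    return beta * np.sum(np.abs(x))          # beta * ||x||_1

def total_variation(x):
    return np.sum(np.abs(np.diff(x)))        # ||grad x||_1 with forward differences

x = np.array([0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 5.0, 5.0])
penalties = (tikhonov(x), lasso(x), total_variation(x))
```

The Total Variation only counts the two jumps of the signal, which is why it favors piecewise-constant solutions.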
1. Compressed sensing iterative techniques
• Accurately reconstruct with few projections ($\sim Q \log N^2$) [1]
  • $Q$ : number of non-zero elements in a domain
• Choice of the sampling operator (sparsifying transform)
• L1 norm as a measure of sparsity (convex relaxation of L0)
• Non-smooth functionals : “gradient” optimization fails
• Convex optimization
  • Adapted optimization algorithms for non-smooth convex functionals
  • Often rely on the proximal operator (or resolvent) [2] :
    $$\operatorname{prox}_g(y) = \operatorname*{argmin}_x \left\{ \frac{1}{2} \|x - y\|_2^2 + g(x) \right\}$$
  • Can be fast, depending on the assumptions on $f$ and $g$
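For $g = \beta \|\cdot\|_1$ the proximal operator has a closed form, the soft-thresholding. A minimal sketch (the name `prox_l1` is an assumption):

```python
import numpy as np

def prox_l1(y, beta):
    """Proximal operator of g(x) = beta * ||x||_1 (soft-thresholding)."""
    return np.sign(y) * np.maximum(np.abs(y) - beta, 0.0)

y = np.array([-3.0, -0.5, 0.0, 0.2, 2.0])
x = prox_l1(y, 1.0)    # entries with magnitude below beta are set to zero
```

Each entry is shrunk towards zero by `beta`, which is what produces sparse solutions.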
1. Example : Total Variation
• For piecewise-constant images, the gradient $\nabla x$ is sparse
• TV reconstruction :
  $$\underbrace{\|Px - d\|_2^2}_{f(x)} + \underbrace{\beta \|\nabla x\|_1}_{g(x)}$$
• Possible reconstruction of 2k×2k slices with ∼ 150 projections
[Figure: TV reconstruction of a 2k×2k slice with 150 projections, $\beta \in \{10,\ 3000,\ 1 \cdot 10^5\}$ [3]]
1. Functionals and a priori knowledge
• Functionals can also be adapted to correct artifacts
• Example : ring artifacts correction [4]
  $$F(x, r) = \frac{1}{2} \|Px + r - d\|_2^2 + \|r\|_1 + \|\nabla x\|_1$$
[Figure: example of iterative ring artifacts correction with Total Variation reconstruction]
⇒ Many functionals, many possibilities !
2. Dictionary Learning
• The images are not always piecewise constant
• Dictionary Learning (DL) : build a basis where the image is sparse
• Each part (patch) of the image is a linear combination of the basis vectors (atoms) $\varphi_k$ :
  $$\mathrm{patch}(p) = \sum_k w_{k,p}\, \varphi_k$$
2. Dictionary Learning
• For pixel $i$ of the image $x$ :
  $$x_i = \sum_k w_{k,p_i}\, \varphi_k(i - r_{p_i})$$
  $p_i$ : patch containing the pixel $i$ ; $r_{p_i}$ : center of this patch
• To avoid discontinuity effects, patches are allowed to overlap
2. Dictionary Learning
$$x_i = \underbrace{\sum_k w_{k,p_i}\, \varphi_k(i - r_{p_i})}_{\text{combination of atoms on the patch containing } i}$$
• $\mathbb{1}_p(i)$ : indicator of the patch $p$, with $\sum_p \mathbb{1}_p(i) \geq 1$
• $\mathbb{1}^c_p(i)$ : indicator of the patch center, with $\sum_p \mathbb{1}^c_p(i) = 1$
$$x_i = \sum_p \mathbb{1}^c_p(i) \underbrace{\sum_k w_{k,p}\, \varphi_k(i - r_p)}_{\text{combination of atoms on the patch whose center contains } i}$$
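The synthesis $w \mapsto x(w)$ with overlapping patches can be sketched in 1D as follows. Everything here is an illustrative assumption (1D signal, the `core` parameter for the central region of a patch, the toy atoms); it is not the PyHST2 implementation.

```python
import numpy as np

def synthesize(w, atoms, n, core):
    """Build x from patch coefficients, keeping only the center of each patch.

    w     : (n_patches, n_atoms) coefficients w_{k,p}
    atoms : (n_atoms, patch_len) dictionary phi_k
    n     : length of the signal x
    core  : length of the central region of a patch (also the patch stride)
    """
    patch_len = atoms.shape[1]
    margin = (patch_len - core) // 2           # overlap on each side of the core
    x = np.zeros(n)
    for p in range(w.shape[0]):
        patch = w[p] @ atoms                   # sum_k w_{k,p} phi_k
        start = p * core                       # first pixel of the core of patch p
        stop = min(start + core, n)
        x[start:stop] = patch[margin:margin + (stop - start)]
    return x

atoms = np.vstack([np.ones(6), np.arange(6.0)])   # two toy atoms of length 6
w = np.array([[1.0, 0.0], [2.0, 0.0]])            # one row of coefficients per patch
x = synthesize(w, atoms, n=8, core=4)
```

Each pixel is covered by several patches, but only the patch whose central region contains it contributes to `x`, which matches the indicator $\mathbb{1}^c_p(i)$.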
2. Dictionary Learning
• Functional for Dictionary Learning :
  $$F(w) = \underbrace{f_1(w) + f_2(w)}_{\text{convex, smooth}} + \underbrace{g(w)}_{\text{convex, non smooth}}$$
  $$f_1(w) = \|P \cdot x(w) - d\|_2^2 \quad \text{(fidelity term)}$$
  $$f_2(w) = \rho \cdot \sum_{p,i} \mathbb{1}_p(i) \left( x_i - \sum_k w_{k,p}\, \varphi_k(i - r_p) \right)^2 \quad (\rho : \text{overlap weight})$$
  $$g(w) = \beta \cdot \|w\|_1 \quad (\beta : \text{sparsity weight})$$
• Proximal algorithms : $\operatorname{prox}_g(\tilde{w})$ is straightforward ($g = \|\cdot\|_1$)
• Evaluating the gradient of $f_1 + f_2$ is the computationally expensive part
2. Advantages wrt. TV
• Adapted for a larger variety of images
• Easier on the optimization side
• More robust than TV when the SNR is low
[Figure: FBP (left) and DL reconstruction (right) of a 1024 × 1024 phantom with 150 projections in the presence of Gaussian noise (σ = 5% of the maximum)]
3. Optimization algorithms
• Building a functional is only one part of the work
• Other part : designing an optimization algorithm
• General purpose (few assumptions on functional properties) vs specialized (exploit the properties of the smooth/non-smooth terms)
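3. Proximal algorithms
• Problem statement :
  $$\operatorname*{argmin}_x \left\{ f(x) + g(x) \right\}$$
  $f$ : convex, differentiable with a Lipschitz gradient ; $g$ : convex, non smooth
• The first order condition at an optimum $\hat{x}$ is
  $$0 \in \nabla f(\hat{x}) + \partial g(\hat{x})$$
  $$0 \in \nabla f(\hat{x}) - \hat{x} + \hat{x} + \partial g(\hat{x})$$
  $$(\mathrm{Id} - \nabla f)(\hat{x}) \in (\mathrm{Id} + \partial g)(\hat{x})$$
  $$\hat{x} = \underbrace{(\mathrm{Id} + \partial g)^{-1}}_{\operatorname{prox}_g} (\mathrm{Id} - \nabla f)(\hat{x})$$
⇒ Fixed-point iterative scheme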
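3. ISTA, FISTA
• ISTA :
  $$x_{k+1} = \operatorname{prox}_{g/L}\left( x_k - \frac{1}{L} \nabla f(x_k) \right)$$
• $L$ : Lipschitz constant of $\nabla f$
• Accelerated version known as FISTA [5] or Nesterov [6] algorithm
• One iteration of FISTA, with an auxiliary (extrapolated) point $y_k$, is :
  $$x_k = \operatorname{prox}_{g/L}\left( y_k - \frac{1}{L} \nabla f(y_k) \right)$$
  $$t_{k+1} = \frac{1 + \sqrt{1 + 4 t_k^2}}{2}$$
  $$y_{k+1} = x_k + \frac{t_k - 1}{t_{k+1}} (x_k - x_{k-1})$$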
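A minimal sketch of both schemes on an L2+L1 problem, $f(x) = \frac{1}{2}\|Px - d\|_2^2$ and $g(x) = \beta\|x\|_1$. The random matrix `P`, the sparse test vector and the function names are assumptions chosen for illustration.

```python
import numpy as np

def soft(y, t):
    """prox of t * ||.||_1 (soft-thresholding)."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def objective(P, d, beta, x):
    return 0.5 * np.sum((P @ x - d) ** 2) + beta * np.sum(np.abs(x))

def ista(P, d, beta, n_iter):
    L = np.linalg.norm(P, 2) ** 2              # Lipschitz constant of grad f
    x = np.zeros(P.shape[1])
    for _ in range(n_iter):
        x = soft(x - P.T @ (P @ x - d) / L, beta / L)
    return x

def fista(P, d, beta, n_iter):
    L = np.linalg.norm(P, 2) ** 2
    x_prev = np.zeros(P.shape[1])
    y = x_prev.copy()                          # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        x = soft(y - P.T @ (P @ y - d) / L, beta / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)
        x_prev, t = x, t_next
    return x_prev

rng = np.random.default_rng(0)
P = rng.standard_normal((30, 100))             # fewer measurements than unknowns
x_true = np.zeros(100)
x_true[[5, 40, 77]] = [1.0, -2.0, 1.5]         # sparse signal
d = P @ x_true
x_ista = ista(P, d, 0.1, 100)
x_fista = fista(P, d, 0.1, 100)
```

The only difference between the two loops is the extrapolation step; FISTA typically reaches a lower objective value for the same number of iterations.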
3. One drawback of FISTA
• FISTA is well-suited for TV-denoising problems :
  $$\operatorname*{argmin}_x \left\{ \frac{1}{2} \|x - d\|_2^2 + \|\nabla x\|_1 \right\}$$
• But for TV-deblurring problems :
  $$\operatorname*{argmin}_x \left\{ \frac{1}{2} \|Px - d\|_2^2 + \|\nabla x\|_1 \right\}$$
  two nested loops are required (except if $P$ can be diagonalized)
• At each iteration : denoise $x - \frac{1}{L} P^T (Px - d)$
• Note : alternative approaches for Total Variation :
  • Primal smoothing : $\|\nabla x\|_1 \simeq \sum_i \sqrt{|(\nabla x)_i|^2 + \mu^2}$
  • Dual smoothing (Moreau-Yosida regularization)
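3. Chambolle-Pock algorithm
• Primal problem :
  $$\min_x \; f(x) + g(Kx)$$
• Corresponding primal-dual saddle-point problem :
  $$\min_x \max_y \left\{ \langle Kx, y \rangle + f(x) - g^*(y) \right\}$$
  $K$ : linear operator ; $g^*$ : Fenchel conjugate of $g$
• One iteration of the Chambolle-Pock algorithm, with $\bar{x}_n$ the extrapolated primal variable :
  $$y_{n+1} = \operatorname{prox}_{\sigma g^*}(y_n + \sigma K \bar{x}_n)$$
  $$x_{n+1} = \operatorname{prox}_{\tau f}(x_n - \tau K^* y_{n+1})$$
  $$\bar{x}_{n+1} = x_{n+1} + \theta \cdot (x_{n+1} - x_n)$$
• This is much more flexible than FISTA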
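3. Example for Total Variation (1/2)
• Primal problem :
  $$\min_x \; f(x) + g(Kx), \qquad f(x) = \frac{1}{2} \|Px - d\|_2^2, \quad g(v) = \beta \|v\|_1, \quad K = \nabla$$
• Using
  $$\frac{1}{2} \|Px - d\|_2^2 = \max_q \left\{ \langle Px - d, q \rangle - \frac{1}{2} \|q\|^2 \right\} = \max_q \left\{ \langle x, P^T q \rangle - \langle d, q \rangle - \frac{1}{2} \|q\|^2 \right\}$$
  the primal-dual problem is
  $$\min_x \max_{p,q} \left\{ \langle x, -\operatorname{div} p + P^T q \rangle - \langle d, q \rangle - i_{\beta B_\infty}(p) - \frac{1}{2} \|q\|^2 \right\}$$
• $f$, $g$ and $K$ become :
  $$f(x) = 0$$
  $$g^*(p, q) = i_{\beta B_\infty}(p) + \frac{1}{2} \|q\|^2 + \langle d, q \rangle$$
  $$K^* = \left( -\operatorname{div},\ P^T \right) \;\Leftrightarrow\; K = \begin{pmatrix} \nabla \\ P \end{pmatrix}$$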
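3. Example for Total Variation (2/2)
• The proximal operators are :
  $$\operatorname{prox}_{\tau f}(x) = x$$
  $$\operatorname{prox}_{\sigma g^*}(p, q) = \left( \frac{\beta p}{\max(\beta, |p|)},\ \frac{q - \sigma d}{1 + \sigma} \right)$$
• They are plugged into the Chambolle-Pock iteration :
  $$y_{n+1} = \operatorname{prox}_{\sigma g^*}(y_n + \sigma K \bar{x}_n)$$
  $$x_{n+1} = \operatorname{prox}_{\tau f}(x_n - \tau K^* y_{n+1})$$
  $$\bar{x}_{n+1} = x_{n+1} + \theta \cdot (x_{n+1} - x_n)$$
• These element-wise operations are GPU-friendly
• The convergence rate is comparable to FISTA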
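The two proximal operators and the iteration can be put together in a short 1D sketch. The assumptions here are: a 1D signal with forward differences for $\nabla$, a small random matrix `P` standing in for the projector, step sizes chosen from the bound $\|K\|^2 \leq 4 + \|P\|^2$, and $\theta = 1$.

```python
import numpy as np

def grad(x):
    return np.diff(x)                          # forward differences, length n-1

def grad_T(p):
    # adjoint of grad, i.e. minus the discrete divergence
    return np.concatenate(([-p[0]], p[:-1] - p[1:], [p[-1]]))

def objective(P, d, beta, x):
    return 0.5 * np.sum((P @ x - d) ** 2) + beta * np.sum(np.abs(grad(x)))

def tv_chambolle_pock(P, d, beta, n_iter=2000):
    n = P.shape[1]
    norm_K = np.sqrt(4.0 + np.linalg.norm(P, 2) ** 2)   # upper bound on ||K||
    sigma = tau = 0.99 / norm_K                # ensures sigma * tau * ||K||^2 < 1
    theta = 1.0
    x = np.zeros(n)
    x_bar = x.copy()
    p = np.zeros(n - 1)                        # dual variable for grad x
    q = np.zeros(P.shape[0])                   # dual variable for P x
    for _ in range(n_iter):
        # dual step: prox of sigma * g^* applied to y + sigma * K x_bar
        p = p + sigma * grad(x_bar)
        p = beta * p / np.maximum(beta, np.abs(p))
        q = (q + sigma * (P @ x_bar) - sigma * d) / (1.0 + sigma)
        # primal step: f = 0, so prox_{tau f} is the identity
        x_new = x - tau * (grad_T(p) + P.T @ q)
        x_bar = x_new + theta * (x_new - x)
        x = x_new
    return x

rng = np.random.default_rng(0)
n, m = 64, 32
P = rng.standard_normal((m, n)) / np.sqrt(m)   # undersampled toy operator
x_true = np.zeros(n)
x_true[20:40] = 1.0                            # piecewise-constant signal
d = P @ x_true
x_rec = tv_chambolle_pock(P, d, beta=0.05)
```

Apart from the applications of `P`, `P.T`, `grad` and `grad_T`, every update is element-wise, which is what makes the scheme easy to port to a GPU.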
4. The case of PyHST2
• PyHST2 : software used at ESRF for tomographic reconstruction
• Currently implements two types of iterative methods
  • Reconstruction with TV regularization
    • Optimization with FISTA
    • Optimization with Chambolle-Pock algorithm
  • Dictionary Learning reconstruction
    • Optimization with FISTA
    • Ongoing work on a Conjugate Subgradient algorithm
• A ring artifacts correction method is available in the iterative algorithms [4]
4. PyHST2 : Conjugate subgradient
• For L2+L1 minimization (Dictionary Learning !), a subgradient is easy to compute : $\operatorname{sign}(\cdot) \in \partial \|\cdot\|_1$
• This makes it possible to use subgradient algorithms
• Ongoing work : conjugate subgradient for L1 minimization
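To illustrate the use of $\operatorname{sign}(\cdot)$ as a subgradient, here is a plain (not conjugate) subgradient descent on an L2+L1 problem. The diminishing step size, the random matrix `P` and the function names are assumptions for this sketch; the conjugate variant mentioned above is not reproduced here.

```python
import numpy as np

def objective(P, d, beta, w):
    return 0.5 * np.sum((P @ w - d) ** 2) + beta * np.sum(np.abs(w))

def subgradient_descent(P, d, beta, n_iter=300):
    L = np.linalg.norm(P, 2) ** 2
    w = np.zeros(P.shape[1])
    best_w, best_val = w.copy(), objective(P, d, beta, w)
    for k in range(1, n_iter + 1):
        # gradient of the L2 term plus one element of the L1 subdifferential
        g = P.T @ (P @ w - d) + beta * np.sign(w)
        w = w - g / (L * np.sqrt(k))           # diminishing step size
        val = objective(P, d, beta, w)
        if val < best_val:                     # the method is not monotone
            best_w, best_val = w.copy(), val
    return best_w, best_val

rng = np.random.default_rng(0)
P = rng.standard_normal((30, 100))
w_true = np.zeros(100)
w_true[[5, 40, 77]] = [1.0, -2.0, 1.5]
d = P @ w_true
w_best, f_best = subgradient_descent(P, d, beta=0.1)
```

Since a subgradient step does not guarantee a decrease of the objective, the sketch keeps track of the best iterate.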
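4. Direct Fourier Projection/Inversion
• For the Chambolle-Pock algorithm, each iteration is very fast
• Bottleneck : computation of the forward/backward projections
• Use of Direct Fourier Projection (DFP) and Direct Fourier Inversion (DFI) [7]
  $$\mathcal{F}_1(\underbrace{d_\theta}_{\text{sinogram at angle } \theta})(\nu) = \underbrace{\mathcal{F}_2(x)(\nu \cos\theta, \nu \sin\theta)}_{\text{line } (\nu\cos\theta,\ \nu\sin\theta) \text{ of the 2D FT of the slice}}$$
• DFI : $P^T y = \mathcal{F}_2^{-1}\left( \mathrm{Pol2Cart}\left( \mathcal{F}_1(y_\theta) \right) \right)$
• DFP : $P x = \mathcal{F}_1^{-1}\left( \mathrm{Cart2Pol}\left( \mathcal{F}_2(x) \right) \right)$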
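The relation between the 1D and 2D Fourier transforms can be checked numerically for the two axis-aligned directions, where no polar/Cartesian interpolation is needed. This sketch uses the discrete FFT on a random image; the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((16, 16))                 # toy slice

F2 = np.fft.fft2(x)                      # 2D Fourier transform of the slice
proj_rows = x.sum(axis=0)                # projection obtained by summing over rows
proj_cols = x.sum(axis=1)                # projection obtained by summing over columns

# 1D transform of each projection vs the matching central line of F2
slice_rows = np.fft.fft(proj_rows)
slice_cols = np.fft.fft(proj_cols)
match_rows = np.allclose(slice_rows, F2[0, :])
match_cols = np.allclose(slice_cols, F2[:, 0])
```

For other angles the central line does not fall on the Cartesian frequency grid, which is where the Pol2Cart / Cart2Pol interpolation steps come in.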
4. Wavelets/Curvelets as sparsifying operators
• Dictionary Learning is a “L2+L1” optimization problem
  $$\operatorname*{argmin}_w \left\{ \frac{1}{2} \|P x(w) - d\|_2^2 + \beta \|w\|_1 \right\}$$
• The optimization problem is simple, but each iteration is rather expensive ($w \to x(w)$)
• Replace DL by a faster operator :
  $$\operatorname*{argmin}_w \left\{ \frac{1}{2} \|P A^* w - d\|_2^2 + \beta \|w\|_1 \right\}$$
  $A$ : (shift-invariant) wavelet transform, curvelet transform...
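As a toy stand-in for the operator $A$, an orthonormal Haar transform can be written as an explicit matrix. This is only an illustrative assumption: a real implementation would use a fast (possibly shift-invariant) transform rather than a dense matrix.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar analysis matrix of size n x n (n must be a power of two)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                   # coarser scales
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])  # finest-scale differences
    return np.vstack([top, bottom]) / np.sqrt(2.0)

A = haar_matrix(16)
x = np.zeros(16)
x[8:] = 1.0                    # piecewise-constant signal (one step)
w = A @ x                      # analysis: coefficients w = A x
x_back = A.T @ w               # synthesis: x = A^* w
n_nonzero = int(np.sum(np.abs(w) > 1e-12))
```

The step signal is represented by only two non-zero coefficients, and the synthesis $A^* w$ is a single cheap linear operation.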
4. Conclusion
• Tomographic reconstruction : build a functional + design an optimization algorithm
• Compressed sensing enables reconstruction from few projections
• There is room for
  • Quality improvements, modeling, artifacts reduction : new functionals
  • Speed : new algorithms
Thank you for your attention !
[1] E.J. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, Feb 2006.
[2] P. L. Combettes and J.-C. Pesquet. Proximal Splitting Methods in Signal Processing. ArXiv e-prints, December 2009.
[3] Alessandro Mirone, Emmanuel Brun, Emmanuelle Gouillart, Paul Tafforeau, and Jerome Kieffer. The PyHST2 hybrid distributed code for high speed tomographic reconstruction with iterative reconstruction and a priori knowledge capabilities. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms, 324:41–48, 2014. 1st International Conference on Tomography of Materials and Structures.
[4] Pierre Paleo and Alessandro Mirone. Ring artifacts correction in compressed sensing tomographic reconstruction. Journal of Synchrotron Radiation, forthcoming.
[5] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Img. Sci., 2(1):183–202, March 2009.
[6] Yu. Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion Papers 2007076, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), 2007.
[7] Roman Shkarin et al. GPU-optimized direct Fourier method for on-line tomography. ??, 2014.
4. FISTA: Why TV-deblurring is more difficult
• The primal problem is :
  $$\min_x \left\{ \frac{1}{2} \|x - d\|_2^2 + \|\nabla x\|_1 \right\} \qquad (1)$$
• FISTA requires computing the above proximal operator of TV
• The dual of (1) is :
  $$\min_x \max_{\|z\|_\infty \leq 1} \left\{ \frac{1}{2} \|x - d\|_2^2 + \langle \nabla x, z \rangle \right\}$$
  which can be rewritten
  $$\max_{\|z\|_\infty \leq 1} \min_x \left\{ \frac{1}{2} \|x - d\|_2^2 - \langle x, \operatorname{div} z \rangle \right\} \qquad (2)$$
• Differentiating wrt $x$ gives $x^* = d + \operatorname{div} z$
• Now if the fidelity term is $\|Ax - d\|_2^2$, one would have to invert $A^T A$ !
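For the denoising case, problem (2) can be solved by a projected gradient ascent on the dual variable $z$, using $x^* = d + \operatorname{div} z$ at each step. This is a 1D sketch under assumptions: forward differences for $\nabla$, a fixed step of 1/4 (valid since $\|\nabla\|^2 \leq 4$ in 1D), and an optional weight `lam` on the TV term (`lam = 1` corresponds to the formulas above).

```python
import numpy as np

def grad_T(z):
    # adjoint of the forward difference operator, i.e. minus the divergence
    return np.concatenate(([-z[0]], z[:-1] - z[1:], [z[-1]]))

def tv_denoise_dual(d, lam=1.0, n_iter=500):
    """Solve min_x 0.5 * ||x - d||^2 + lam * ||grad x||_1 through its dual."""
    z = np.zeros(d.size - 1)
    for _ in range(n_iter):
        x = d - grad_T(z)                               # x* = d + div z
        z = np.clip(z + 0.25 * np.diff(x), -lam, lam)   # ascent step + projection
    return d - grad_T(z)

d = np.array([0.0, 0.0, 10.0, 10.0])
x = tv_denoise_dual(d)    # the plateaus are kept, the jump is slightly reduced
```

The primal variable is obtained in closed form from `z` only because the fidelity term is $\frac{1}{2}\|x - d\|_2^2$; with an operator inside the norm this step would require a linear solve.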
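4. Total Variation : Moreau-Yosida regularization
Approximate $J(x) = \|\nabla x\|_1 = \max_{\|z\|_\infty \leq 1} \langle \nabla x, z \rangle$ by
$$J_\mu(x) = \max_{\|z\|_\infty \leq 1} \left\{ \langle \nabla x, z \rangle - \frac{\mu}{2} \|z\|_2^2 \right\} + \frac{n\mu}{2}$$
Then
$$J_\mu(x) = \sum_i \psi_\mu(|(\nabla x)_i|) \quad \text{with} \quad \psi_\mu(v) = \begin{cases} |v| & \text{if } |v| \geq \mu \\ \dfrac{v^2}{2\mu} + \dfrac{\mu}{2} & \text{otherwise} \end{cases}$$
and
$$\nabla J_\mu(x) = -\operatorname{div} \Phi \quad \text{with} \quad \Phi_i = \begin{cases} \dfrac{(\nabla x)_i}{|(\nabla x)_i|} & \text{if } |(\nabla x)_i| \geq \mu \\ \dfrac{(\nabla x)_i}{\mu} & \text{otherwise} \end{cases}$$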
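The smoothed functional and its gradient can be written directly from these formulas. This is a 1D sketch with forward differences; the function names are assumptions.

```python
import numpy as np

def grad_T(p):
    # adjoint of the forward difference operator, i.e. minus the divergence
    return np.concatenate(([-p[0]], p[:-1] - p[1:], [p[-1]]))

def J_mu(x, mu):
    """Smoothed Total Variation: sum of psi_mu(|(grad x)_i|)."""
    v = np.abs(np.diff(x))
    return np.sum(np.where(v >= mu, v, v ** 2 / (2.0 * mu) + mu / 2.0))

def grad_J_mu(x, mu):
    """Gradient of J_mu, equal to -div(Phi)."""
    g = np.diff(x)
    phi = np.where(np.abs(g) >= mu, np.sign(g), g / mu)
    return grad_T(phi)

rng = np.random.default_rng(0)
x = rng.standard_normal(20)
mu = 0.5
value = J_mu(x, mu)
gradient = grad_J_mu(x, mu)
```

Since $J_\mu$ is differentiable with a Lipschitz gradient, it can be handled by plain gradient-based schemes, at the price of an approximation controlled by $\mu$.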