Tomography from highly limited data

Compressed sensing reconstruction

Pierre Paleo – [email protected] ESRF

20/05/2015

Outline

1 From FBP to iterative techniques

2 Dictionary Learning : an example of convex functional

3 Optimization algorithms

4 PyHST2 : features and outlooks


1. FBP : limitations

Filtered Backprojection is fast, but...
• It needs $\sim \frac{\pi}{2} \times N_{\text{rows}}$ projections
• Subsampling leads to a poor reconstruction quality (star artifacts, ...)
• It is not parametric : no room for a priori knowledge

⇒ Not adapted to highly limited data


1. Iterative techniques : framework

• The tomographic reconstruction problem amounts to an optimization problem :
$$\operatorname{argmin}_x \{ f(x, d) + g(x) \}$$
  • d : acquired projections
  • x : slice/volume to reconstruct
  • f : fidelity term
  • g : regularization term
• Example : $f(x, d) = \frac{1}{2} \|Px - d\|_2^2$, $g(x) = 0$
  • P : forward projection operator
  • Least squares formulation of $Px = d$
  • ART, SIRT ...
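As a minimal sketch of the unregularized case ($g = 0$), the least-squares problem can be minimized by plain gradient descent (a Landweber-type iteration). The dense random matrix `P` below is only a toy stand-in for a real projector, and the function name `landweber` is an illustrative choice, not something taken from the slides.

```python
import numpy as np

def landweber(P, d, n_iter=2000):
    """Gradient descent on f(x) = 0.5 * ||P x - d||_2^2 (no regularization)."""
    # Step size 1/L, where L = ||P||_2^2 is the Lipschitz constant of grad f.
    L = np.linalg.norm(P, 2) ** 2
    x = np.zeros(P.shape[1])
    for _ in range(n_iter):
        grad = P.T @ (P @ x - d)   # gradient of the fidelity term
        x = x - grad / L
    return x

rng = np.random.default_rng(0)
P = rng.standard_normal((30, 10))   # toy, well-determined stand-in for the projector
x_true = rng.standard_normal(10)
d = P @ x_true                      # noiseless "projections"
x_rec = landweber(P, d)
```

With enough (well-conditioned) data this recovers `x_true`; the following slides address what happens when the data are too scarce for that.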


1. Importance of regularization

• Problem $Px = d$ is ill-posed
• Proper regularization imposes stability
• Regularization makes it possible to incorporate a priori information
• Tikhonov : penalize large-norm solutions : $g(x) = \beta \|x\|_2^2$
• LASSO : $g(x) = \beta \|x\|_1$
• Total Variation : penalize non-zero gradients : $g(x) = \|\nabla x\|_1$
• Compressed sensing : “accurate reconstruction from highly undersampled data”


1. Compressed sensing iterative techniques

• Accurately reconstruct with few projections ($\sim Q \log N^2$) [1]
• Q : number of non-zero elements in a domain
• Choice of the sampling operator (sparsifying transform)
• L1 norm as a measure of sparsity (convex relaxation of L0)
• Non-smooth functionals : “gradient” optimization fails
• Convex optimization
• Adapted optimization algorithms for non-smooth convex functionals
• Often rely on the proximal operator (or resolvent) [2] :
$$\operatorname{prox}_g(y) = \operatorname{argmin}_x \left\{ \frac{1}{2} \|x - y\|_2^2 + g(x) \right\}$$
• Can be fast, depending on the assumptions on f and g
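For the L1 norm, the proximal operator has a well-known closed form, soft-thresholding. The sketch below illustrates it for $g = \beta \|\cdot\|_1$; the function name `prox_l1` is an illustrative choice.

```python
import numpy as np

def prox_l1(y, beta):
    """Proximal operator of g(x) = beta * ||x||_1 (element-wise soft-thresholding)."""
    return np.sign(y) * np.maximum(np.abs(y) - beta, 0.0)

y = np.array([-3.0, -0.5, 0.0, 0.2, 2.0])
x = prox_l1(y, beta=1.0)   # entries with |y_i| <= beta are set to zero
```

Entries smaller than the threshold are zeroed and the others are shrunk towards zero, which is how the L1 term promotes sparsity.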


1. Example : Total Variation

• For piecewise-constant images, the gradient $\nabla x$ is sparse
• TV reconstruction :
$$\underbrace{\|Px - d\|_2^2}_{f(x)} + \underbrace{\beta \|\nabla x\|_1}_{g(x)}$$
• Possible reconstruction of 2k×2k slices with ∼ 150 projections

[Figure: TV reconstruction of a 2k×2k slice with 150 projections, $\beta \in \{10,\ 3000,\ 1 \cdot 10^5\}$ [3]]
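To illustrate why the gradient of a piecewise-constant image is sparse, here is a small sketch using a forward-difference gradient and an anisotropic TV norm. The discretization (forward differences, zero on the last row/column) is an assumption made for the example; the slides do not specify one.

```python
import numpy as np

def gradient(x):
    """Forward-difference gradient of a 2D image (zero on the last row/column)."""
    gx = np.zeros_like(x)
    gy = np.zeros_like(x)
    gx[:-1, :] = x[1:, :] - x[:-1, :]
    gy[:, :-1] = x[:, 1:] - x[:, :-1]
    return np.stack([gx, gy])

def tv_norm(x):
    """Anisotropic total variation: L1 norm of the discrete gradient."""
    return np.abs(gradient(x)).sum()

img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0                  # piecewise-constant square
grad = gradient(img)
nonzero_fraction = np.count_nonzero(grad) / grad.size
```

Only the edges of the square contribute to the gradient, so the fraction of non-zero gradient entries is small even though the image itself has many non-zero pixels.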

1. Functionals and a priori knowledge

• Functionals can also be adapted to correct artifacts
• Example : ring artifacts correction [4]
$$F(x, r) = \frac{1}{2} \|Px + r - d\|_2^2 + \|r\|_1 + \|\nabla x\|_1$$

[Figure: example of iterative ring artifacts correction with Total Variation reconstruction]

⇒ Many functionals, many possibilities !


2. Dictionary Learning

• The images are not always piecewise constant
• Dictionary Learning (DL) : build a basis where the image is sparse
• Each part (patch) of the image is a linear combination of the basis vectors (atoms) $\varphi_k$ :
$$\text{patch}(p) = \sum_k w_{k,p} \, \varphi_k$$
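The patch model can be sketched as a matrix-vector product between a dictionary and a sparse coefficient vector. The random dictionary `Phi`, the patch size, and the chosen coefficients below are purely hypothetical; a real dictionary would be learned from training images.

```python
import numpy as np

rng = np.random.default_rng(0)
patch_size, n_atoms = 8, 32

# Hypothetical dictionary: each column is one flattened, normalized atom phi_k.
Phi = rng.standard_normal((patch_size * patch_size, n_atoms))
Phi /= np.linalg.norm(Phi, axis=0)

# Sparse coefficients w_{k,p}: only 3 of the 32 atoms are active for this patch.
w = np.zeros(n_atoms)
w[[2, 7, 19]] = [1.5, -0.8, 0.3]

# patch(p) = sum_k w_{k,p} * phi_k
patch = (Phi @ w).reshape(patch_size, patch_size)
```

The 64 pixel values of the patch are thus described by three numbers, which is the sparsity that the reconstruction exploits.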

2. Dictionary Learning

• For pixel i of the image x :
$$x_i = \sum_k w_{k,p_i} \, \varphi_k(i - r_{p_i})$$
  $p_i$ : patch containing the pixel i
  $r_{p_i}$ : center of this patch
• To avoid discontinuity effects, patches are allowed to overlap


2. Dictionary Learning

$$x_i = \underbrace{\sum_k w_{k,p_i} \, \varphi_k(i - r_{p_i})}_{\text{combination of atoms on the patch containing } i}$$

• $1_p(i)$ : indicator of the patch, $\sum_p 1_p(i) \ge 1$
• $1^c_p(i)$ : indicator of the patch center, $\sum_p 1^c_p(i) = 1$

$$x_i = \sum_p 1^c_p(i) \underbrace{\sum_k w_{k,p} \, \varphi_k(i - r_p)}_{\text{combination of atoms on the patch whose center contains } i}$$

2. Dictionary Learning

• Functional for Dictionary Learning :
$$F(w) = \underbrace{f_1(w) + f_2(w)}_{\text{convex, smooth}} + \underbrace{g(w)}_{\text{convex, non-smooth}}$$
$$f_1(w) = \|P \cdot x(w) - d\|_2^2 \quad \text{(fidelity term)}$$
$$f_2(w) = \rho \cdot \sum_{p,i} 1_p(i) \left( x_i - \sum_k w_{k,p} \, \varphi_k(i - r_p) \right)^2 \quad \text{(overlap weight)}$$
$$g(w) = \beta \cdot \|w\|_1 \quad \text{(sparsity weight)}$$
• Proximal algorithms : $\operatorname{prox}_g(\tilde{w})$ is straightforward ($g = \|\cdot\|_1$)
• Evaluating the gradient of $f_1 + f_2$ is the computationally expensive part


2. Advantages wrt. TV

• Adapted to a larger variety of images
• Easier on the optimization side
• More robust than TV when the SNR is low

[Figure: FBP (left) and DL reconstruction (right) of a 1024 × 1024 phantom with 150 projections in the presence of Gaussian noise (σ = 5% of the maximum value).]


3. Optimization algorithms

• Building a functional is only one part of the work
• Other part : designing an optimization algorithm
• General purpose (few assumptions on functional properties) vs specialized (exploit properties of the smooth/non-smooth terms)


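3. Proximal algorithms

• Problem statement :
$$\operatorname{argmin}_x \{ f(x) + g(x) \}$$
  f : convex, differentiable with Lipschitz-continuous gradient
  g : convex, non-smooth
• The first-order condition at an optimum $\hat{x}$ is
$$0 \in \nabla f(\hat{x}) + \partial g(\hat{x})$$
$$0 \in \nabla f(\hat{x}) - \hat{x} + \hat{x} + \partial g(\hat{x})$$
$$(\mathrm{Id} - \nabla f)(\hat{x}) \in (\mathrm{Id} + \partial g)(\hat{x})$$
$$\hat{x} = \underbrace{(\mathrm{Id} + \partial g)^{-1}}_{\operatorname{prox}_g} (\mathrm{Id} - \nabla f)(\hat{x})$$

⇒ Fixed-point iterative scheme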
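The derivation above implicitly uses a unit step. As a sketch, the same argument with an explicit step size $\gamma > 0$ (a symbol introduced here for illustration, not present on the original slide) reads:

```latex
\begin{aligned}
0 \in \nabla f(\hat{x}) + \partial g(\hat{x})
  &\iff 0 \in \gamma \nabla f(\hat{x}) + \gamma\, \partial g(\hat{x}) \\
  &\iff (\mathrm{Id} - \gamma \nabla f)(\hat{x}) \in (\mathrm{Id} + \gamma\, \partial g)(\hat{x}) \\
  &\iff \hat{x} = \operatorname{prox}_{\gamma g}\bigl(\hat{x} - \gamma \nabla f(\hat{x})\bigr)
\end{aligned}
```

Taking $\gamma = 1/L$, with $L$ the Lipschitz constant of $\nabla f$, gives the fixed-point equation that ISTA iterates.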


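3. ISTA, FISTA

• ISTA :
$$x_{k+1} = \operatorname{prox}_{g/L}\left( x_k - \frac{1}{L} \nabla f(x_k) \right)$$
• L : Lipschitz constant of $\nabla f$
• Accelerated version known as FISTA [5] or Nesterov [6] algorithm
• One iteration of FISTA is :
$$x_k = \operatorname{prox}_{g/L}\left( y_k - \frac{1}{L} \nabla f(y_k) \right)$$
$$t_{k+1} = \frac{1 + \sqrt{1 + 4 t_k^2}}{2}$$
$$y_{k+1} = x_k + \frac{t_k - 1}{t_{k+1}} (x_k - x_{k-1})$$
  where $y_k$ is the extrapolated point, with $y_1 = x_0$ and $t_1 = 1$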
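The FISTA iteration can be sketched for a small L2+L1 problem, $\min_x \frac{1}{2}\|Ax - d\|_2^2 + \beta \|x\|_1$. The matrix `A`, the data, and the value of `beta` are toy assumptions; the functions `fista`, `soft_threshold` and `objective` are illustrative names.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(A, d, beta, x):
    return 0.5 * np.sum((A @ x - d) ** 2) + beta * np.abs(x).sum()

def fista(A, d, beta, n_iter=500):
    """Minimize 0.5*||A x - d||_2^2 + beta*||x||_1 with FISTA."""
    L = np.linalg.norm(A, 2) ** 2                    # Lipschitz constant of grad f
    x_prev = np.zeros(A.shape[1])
    y = x_prev.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - d)
        x = soft_threshold(y - grad / L, beta / L)   # prox_{g/L} step at y
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)  # extrapolation
        x_prev, t = x, t_next
    return x_prev

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))                   # underdetermined toy operator
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [2.0, -1.5, 1.0]               # sparse ground truth
d = A @ x_true
beta = 0.1
x_rec = fista(A, d, beta)
```

Dropping the extrapolation step (setting `y = x`) reduces this to ISTA, which makes the two easy to compare on the same problem.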


3. One drawback of FISTA

• FISTA is well-suited for TV-denoising problems :
$$\operatorname{argmin}_x \frac{1}{2} \|x - d\|_2^2 + \|\nabla x\|_1$$
• But for TV-deblurring problems :
$$\operatorname{argmin}_x \frac{1}{2} \|Px - d\|_2^2 + \|\nabla x\|_1$$
  two nested loops are required (except if P can be diagonalized)
• At each iteration : denoise $x - \frac{1}{L} P^T (Px - d)$
• Note : alternative approaches for Total Variation :
  • Primal smoothing : $\|\nabla x\|_1 \simeq \sum_i \sqrt{|(\nabla x)_i|^2 + \mu^2}$
  • Dual smoothing (Moreau-Yosida regularization)


3. Chambolle-Pock algorithm

• Primal problem :
$$\min_x f(x) + g(Kx)$$
• Corresponding primal-dual saddle-point problem :
$$\min_x \max_y \{ \langle Kx, y \rangle + f(x) - g^*(y) \}$$
  K : linear operator ; $g^*$ : Fenchel conjugate of g
• One iteration of the Chambolle-Pock algorithm :
$$y_{n+1} = \operatorname{prox}_{\sigma g^*}(y_n + \sigma K \bar{x}_n)$$
$$x_{n+1} = \operatorname{prox}_{\tau f}(x_n - \tau K^* y_{n+1})$$
$$\bar{x}_{n+1} = x_{n+1} + \theta \cdot (x_{n+1} - x_n)$$
• This is much more flexible than FISTA


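3. Example for Total Variation (1/2)

• Primal problem :
$$\min_x f(x) + g(Kx)$$
$$f(x) = \frac{1}{2} \|Px - d\|_2^2, \quad g(v) = \beta \|v\|_1, \quad K = \nabla$$
• Using
$$\frac{1}{2} \|Px - d\|_2^2 = \max_q \left\{ \langle Px - d, q \rangle - \frac{1}{2} \|q\|_2^2 \right\} = \max_q \left\{ \langle x, P^T q \rangle - \langle d, q \rangle - \frac{1}{2} \|q\|_2^2 \right\}$$
  the primal-dual problem is
$$\min_x \max_{p,q} \left\{ \langle x, -\operatorname{div} p + P^T q \rangle - \langle d, q \rangle - i_{\beta B_\infty}(p) - \frac{1}{2} \|q\|_2^2 \right\}$$
  where $i_{\beta B_\infty}$ is the indicator function of the $\ell_\infty$ ball of radius β
• f, g and K become :
$$f(x) = 0$$
$$g^*(p, q) = i_{\beta B_\infty}(p) + \frac{1}{2} \|q\|_2^2 + \langle d, q \rangle$$
$$K^* = \left( -\operatorname{div},\ P^T \right) \Leftrightarrow K = \begin{pmatrix} \nabla \\ P \end{pmatrix}$$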


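3. Example for Total Variation (2/2)

• The proximal operators are (since $f = 0$, $\operatorname{prox}_{\tau f}$ is the identity) :
$$\operatorname{prox}_{\tau f}(x) = x$$
$$\operatorname{prox}_{\sigma g^*}(p, q) = \left( \frac{\beta p}{\max(\beta, |p|)},\ \frac{q - \sigma d}{1 + \sigma} \right)$$
• The iteration is :
$$y_{n+1} = \operatorname{prox}_{\sigma g^*}(y_n + \sigma K \bar{x}_n)$$
$$x_{n+1} = \operatorname{prox}_{\tau f}(x_n - \tau K^* y_{n+1})$$
$$\bar{x}_{n+1} = x_{n+1} + \theta \cdot (x_{n+1} - x_n)$$
• These element-wise operations are GPU-friendly
• The convergence rate is comparable to FISTA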
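The iteration above can be sketched on a 1D toy problem, $\min_x \frac{1}{2}\|Px - d\|_2^2 + \beta \|Dx\|_1$, where a forward-difference matrix `D` plays the role of $\nabla$ and a dense random matrix `P` stands in for the projector. All names, sizes and parameter values are assumptions made for illustration; this is not the PyHST2 implementation.

```python
import numpy as np

def chambolle_pock_tv(P, d, beta, n_iter=3000, theta=1.0):
    """Chambolle-Pock iterations for 0.5*||P x - d||^2 + beta*||D x||_1 (1D signal)."""
    n = P.shape[1]
    D = np.diff(np.eye(n), axis=0)                # forward differences: (D x)_i = x_{i+1} - x_i
    K = np.vstack([D, P])                         # K = (grad ; P)
    tau = sigma = 0.99 / np.linalg.norm(K, 2)     # ensures tau * sigma * ||K||^2 < 1
    x = np.zeros(n)
    x_bar = x.copy()
    p = np.zeros(n - 1)                           # dual variable for the gradient part
    q = np.zeros(P.shape[0])                      # dual variable for the data part
    for _ in range(n_iter):
        # Dual step: (p, q) <- prox_{sigma g*}((p, q) + sigma K x_bar)
        p = p + sigma * (D @ x_bar)
        q = q + sigma * (P @ x_bar)
        p = beta * p / np.maximum(beta, np.abs(p))   # projection onto the l_inf ball of radius beta
        q = (q - sigma * d) / (1.0 + sigma)
        # Primal step: f = 0, so prox_{tau f} is the identity; K^* y = D^T p + P^T q
        x_new = x - tau * (D.T @ p + P.T @ q)
        # Over-relaxation
        x_bar = x_new + theta * (x_new - x)
        x = x_new
    return x

def tv_objective(P, d, beta, x):
    return 0.5 * np.sum((P @ x - d) ** 2) + beta * np.abs(np.diff(x)).sum()

rng = np.random.default_rng(0)
n, m = 64, 32                                     # fewer measurements than unknowns
x_true = np.zeros(n)
x_true[20:40] = 1.0                               # piecewise-constant signal
P = rng.standard_normal((m, n)) / np.sqrt(m)
d = P @ x_true
beta = 0.01
x_rec = chambolle_pock_tv(P, d, beta)
```

Each iteration costs one application of `K` and one of its adjoint plus element-wise operations, which is the property that makes the scheme attractive on GPUs.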


4. The case of PyHST2

• PyHST2 : software used at ESRF for tomographic reconstruction
• Currently implements two types of iterative methods
  • Reconstruction with TV regularization
    • Optimization with FISTA
    • Optimization with Chambolle-Pock algorithm
  • Dictionary Learning reconstruction
    • Optimization with FISTA
    • Ongoing work on a Conjugate Subgradient algorithm
• A ring artifacts correction method is available in the iterative algorithms [4]


4. PyHST2 : Conjugate subgradient

• For L2+L1 minimization (Dictionary Learning !), the subgradient is easy to compute : $\partial \|\cdot\|_1 = \operatorname{sign}(\cdot)$
• This makes it possible to use subgradient algorithms
• Ongoing work : conjugate subgradient for L1 minimization
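As an illustration of the subgradient idea only, here is a plain subgradient descent on an L2+L1 functional. It is not the conjugate subgradient method mentioned above, whose details are not given in the slides; the diminishing step-size rule, the toy data and the function names are assumptions.

```python
import numpy as np

def l2l1_objective(A, d, beta, w):
    return 0.5 * np.sum((A @ w - d) ** 2) + beta * np.abs(w).sum()

def subgradient_descent(A, d, beta, n_iter=2000):
    """Plain (not conjugate) subgradient descent on 0.5*||A w - d||^2 + beta*||w||_1."""
    L = np.linalg.norm(A, 2) ** 2
    w = np.zeros(A.shape[1])
    best_w, best_val = w.copy(), l2l1_objective(A, d, beta, w)
    for k in range(1, n_iter + 1):
        # np.sign(0) = 0 is a valid choice inside the subdifferential of |.| at 0
        sub = A.T @ (A @ w - d) + beta * np.sign(w)
        w = w - sub / (L * np.sqrt(k))            # diminishing step size
        val = l2l1_objective(A, d, beta, w)
        if val < best_val:                        # subgradient steps are not monotone
            best_w, best_val = w.copy(), val
    return best_w, best_val

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
w_true = np.zeros(100)
w_true[[5, 37, 80]] = [2.0, -1.5, 1.0]
d = A @ w_true
beta = 0.1
w_rec, val_rec = subgradient_descent(A, d, beta)
```

Because the objective may increase from one iteration to the next, the sketch keeps track of the best iterate seen so far.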


4. Direct Fourier Projection/Inversion

• For the Chambolle-Pock algorithm, each iteration is very fast
• Bottleneck : computation of the forward/backward projections
• Use of Direct Fourier Projection (DFP) and Direct Fourier Inversion (DFI) [7]
$$F_1(\underbrace{d_\theta}_{\text{sinogram at angle } \theta})(\nu) = \underbrace{F_2(x)(\nu \cos\theta, \nu \sin\theta)}_{\text{line } (\nu \cos\theta,\ \nu \sin\theta) \text{ of the 2D FT of the slice}}$$
• DFI : $P^T y = F_2^{-1}(\text{Pol2Cart}(F_1(y_\theta)))$
• DFP : $Px = F_1^{-1}(\text{Cart2Pol}(F_2(x)))$
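The Fourier slice relation can be checked numerically in the simplest case, θ = 0, where the projection is a sum along one axis and no polar/Cartesian interpolation is needed. The random array below is a toy slice, and the axis convention is an assumption of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((32, 32))            # toy slice

# Projection at theta = 0: integrate the slice along one axis (here axis 0).
d_theta0 = x.sum(axis=0)

# 1D FT of the projection vs. the matching central line of the 2D FT.
lhs = np.fft.fft(d_theta0)
rhs = np.fft.fft2(x)[0, :]
max_diff = np.max(np.abs(lhs - rhs))
```

For other angles the central line does not fall on the Cartesian FFT grid, which is where the Pol2Cart and Cart2Pol resampling steps come in.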


4. Wavelets/Curvelets as sparsifying operators

• Dictionary Learning is an “L2+L1” optimization problem
$$\operatorname{argmin}_w \left\{ \frac{1}{2} \|P x(w) - d\|_2^2 + \beta \|w\|_1 \right\}$$
• The optimization problem is simple, but each iteration is rather expensive ($w \to x(w)$)
• Replace DL by a faster operator :
$$\operatorname{argmin}_w \left\{ \frac{1}{2} \|P A^* w - d\|_2^2 + \beta \|w\|_1 \right\}$$
  A : (shift-invariant) wavelet transform, curvelet transform...


4. Conclusion

• Tomographic reconstruction : build a functional + design an optimization algorithm
• Compressed sensing enables reconstruction from few projections
• There is room for
  • Quality improvements, modeling, artifacts reduction : new functionals
  • Speed : new algorithms


Thank you for your attention !


[1] E.J. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, Feb 2006.
[2] P. L. Combettes and J.-C. Pesquet. Proximal Splitting Methods in Signal Processing. ArXiv e-prints, December 2009.
[3] Alessandro Mirone, Emmanuel Brun, Emmanuelle Gouillart, Paul Tafforeau, and Jerome Kieffer. The PyHST2 hybrid distributed code for high speed tomographic reconstruction with iterative reconstruction and a priori knowledge capabilities. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms, 324:41–48, 2014. 1st International Conference on Tomography of Materials and Structures.
[4] Pierre Paleo and Alessandro Mirone. Ring artifacts correction in compressed sensing tomographic reconstruction. Journal of Synchrotron Radiation, forthcoming.
[5] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Img. Sci., 2(1):183–202, March 2009.
[6] Yu. Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion Papers 2007076, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), 2007.
[7] Roman Shkarin et al. GPU-optimized direct Fourier method for on-line tomography. ??, 2014.


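4. FISTA: Why TV-deblurring is more difficult

• The primal problem is :
$$\min_x \left\{ \frac{1}{2} \|x - d\|_2^2 + \|\nabla x\|_1 \right\} \quad (1)$$
• FISTA requires computing the above proximal operator of TV
• The dual of (1) is :
$$\min_x \max_{\|z\|_\infty \le 1} \left\{ \frac{1}{2} \|x - d\|_2^2 + \langle \nabla x, z \rangle \right\}$$
  which can be rewritten
$$\max_{\|z\|_\infty \le 1} \min_x \left\{ \frac{1}{2} \|x - d\|_2^2 - \langle x, \operatorname{div} z \rangle \right\} \quad (2)$$
• Differentiating wrt x gives $x^* = d + \operatorname{div} z$
• Now if the fidelity term is $\|Ax - d\|_2^2$, one would have to invert $A^T A$ !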


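4. Total Variation : Moreau-Yosida regularization

Approximate $J(x) = \|\nabla x\|_1 = \max_{\|z\|_\infty \le 1} \langle \nabla x, z \rangle$ by
$$J_\mu(x) = \max_{\|z\|_\infty \le 1} \left\{ \langle \nabla x, z \rangle - \frac{\mu}{2} \|z\|_2^2 \right\} + \frac{n \mu}{2}$$
(n : number of components of $\nabla x$). Then
$$J_\mu(x) = \sum_i \psi_\mu(|(\nabla x)_i|) \quad \text{with} \quad \psi_\mu(v) = \begin{cases} |v| & \text{if } |v| \ge \mu \\ \dfrac{v^2}{2\mu} + \dfrac{\mu}{2} & \text{otherwise} \end{cases}$$
and
$$\nabla J_\mu(x) = -\operatorname{div} \Phi \quad \text{with} \quad \Phi_i = \begin{cases} \dfrac{(\nabla x)_i}{|(\nabla x)_i|} & \text{if } |(\nabla x)_i| \ge \mu \\ \dfrac{(\nabla x)_i}{\mu} & \text{otherwise} \end{cases}$$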
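The smoothed absolute value $\psi_\mu$ and its derivative (the scalar form of $\Phi$) can be sketched and checked against finite differences. The function names `psi_mu` and `phi_mu` and the value of `mu` are illustrative assumptions.

```python
import numpy as np

def psi_mu(v, mu):
    """Smoothed absolute value: |v| if |v| >= mu, else v^2/(2 mu) + mu/2."""
    a = np.abs(v)
    return np.where(a >= mu, a, v ** 2 / (2.0 * mu) + mu / 2.0)

def phi_mu(v, mu):
    """Derivative of psi_mu: v/|v| if |v| >= mu, else v/mu."""
    # max(|v|, mu) selects the right denominator in both cases and avoids 0/0.
    return v / np.maximum(np.abs(v), mu)

mu = 0.5
v = np.linspace(-2.0, 2.0, 41)
h = 1e-6
numeric = (psi_mu(v + h, mu) - psi_mu(v - h, mu)) / (2.0 * h)   # central differences
analytic = phi_mu(v, mu)
```

Unlike the absolute value, `psi_mu` is differentiable at zero, so gradient-based solvers can be applied to the smoothed TV at the price of the approximation controlled by `mu`.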