A Student-t based sparsity enforcing hierarchical prior for linear

A Student-t based sparsity enforcing hierarchical prior for linear inverse problems and its efficient Bayesian computation for 2D and 3D Computed Tomography

Ali Mohammad-Djafari, Li Wang, Nicolas Gac and Folker Bleichrodt
Laboratoire des Signaux et Systèmes (L2S), UMR8506 CNRS-CentraleSupélec-UNIV PARIS SUD
SUPELEC, 91192 Gif-sur-Yvette, France
http://lss.centralesupelec.fr
Email: [email protected]
http://djafari.free.fr
http://publicationslist.org/djafari

iTwist2016, Aug. 24-26, 2016, Aalborg, Denmark

A. Mohammad-Djafari, Bayesian Discrete Tomography from a few number of projections, Mars 21-23, 2016, Polytechnico de Milan, Italy. 1

Contents
1. Computed Tomography in 2D and 3D
2. Main classical methods
3. Basic Bayesian approach
4. Sparsity enforcing models through Student-t and IGSM
5. Computational tools: JMAP, EM, VBA
6. Implementation issues
   - Main GPU implementation steps: Forward and Back Projections
   - Multi-Resolution implementation
7. Some results
8. Conclusions


Computed Tomography: Seeing inside of a body
- $f(x, y)$: a section of a real 3D body $f(x, y, z)$
- $g_\phi(r)$: a line of observed radiography $g_\phi(r, z)$
- Forward model: line integrals or Radon Transform
  $$g_\phi(r) = \int_{L_{r,\phi}} f(x, y)\, dl + \epsilon_\phi(r) = \iint f(x, y)\, \delta(r - x \cos\phi - y \sin\phi)\, dx\, dy + \epsilon_\phi(r)$$
- Inverse problem: image reconstruction. Given the forward model $H$ (Radon Transform) and a set of data $g_{\phi_i}(r)$, $i = 1, \dots, M$, find $f(x, y)$.


2D and 3D Computed Tomography

3D: $$g_\phi(r_1, r_2) = \int_{L_{r_1, r_2, \phi}} f(x, y, z)\, dl$$

2D: $$g_\phi(r) = \int_{L_{r, \phi}} f(x, y)\, dl$$

Forward problem: $f(x, y)$ or $f(x, y, z)$ $\longrightarrow$ $g_\phi(r)$ or $g_\phi(r_1, r_2)$
Inverse problem: $g_\phi(r)$ or $g_\phi(r_1, r_2)$ $\longrightarrow$ $f(x, y)$ or $f(x, y, z)$


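Algebraic methods: Discretization

[Figure: discretization geometry in the $(x, y)$ plane: source S, detector D, pixels $f_1, \dots, f_j, \dots, f_N$ of $f(x, y)$, and ray $i$ at angle $\phi$ and offset $r$ with weight $H_{ij}$ and measurement $g_i$]

$$f(x, y) = \sum_j f_j\, b_j(x, y), \qquad b_j(x, y) = \begin{cases} 1 & \text{if } (x, y) \in \text{pixel } j \\ 0 & \text{else} \end{cases}$$

$$g(r, \phi) = \int_L f(x, y)\, dl \quad\longrightarrow\quad g_i = \sum_{j=1}^{N} H_{ij} f_j + \epsilon_i$$

$$g_k = H_k f + \epsilon_k, \quad k = 1, \dots, K \quad\longrightarrow\quad g = H f + \epsilon$$

$g_k$: projection at angle $\phi_k$; $g$: all the projections.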
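As a rough illustration of the discretized model $g = Hf + \epsilon$, the following sketch builds a small dense system matrix for a parallel-beam geometry. It rests on a strong simplification that is not in the slides: each pixel contributes weight 1 to the detector bin that its centre projects onto. The function name `build_system_matrix` and all sizes are hypothetical.

```python
import numpy as np

def build_system_matrix(n, angles, n_det=None):
    """Crude parallel-beam system matrix H for an n x n image.

    Assumption (not from the slides): nearest-neighbour, pixel-driven model,
    i.e. H_ij = 1 when the centre of pixel j projects onto detector bin i.
    """
    n_det = n_det or int(np.ceil(np.sqrt(2) * n)) + 1
    # Pixel-centre coordinates with the origin at the image centre.
    coords = np.arange(n) - (n - 1) / 2.0
    x, y = np.meshgrid(coords, coords)
    x, y = x.ravel(), y.ravel()
    blocks = []
    for phi in angles:
        r = x * np.cos(phi) + y * np.sin(phi)            # r = x cos(phi) + y sin(phi)
        bins = np.round(r + (n_det - 1) / 2.0).astype(int)
        H_k = np.zeros((n_det, n * n))
        H_k[bins, np.arange(n * n)] = 1.0                # one detector bin per pixel
        blocks.append(H_k)
    return np.vstack(blocks)                             # H = [H_1; ...; H_K]

n = 8
angles = np.linspace(0.0, np.pi, 6, endpoint=False)
H = build_system_matrix(n, angles)

f = np.zeros((n, n))
f[2:6, 3:5] = 1.0                                        # small rectangular object
g = H @ f.ravel()                                        # noiseless projections g = H f
```

Real projectors use ray lengths or footprint weights rather than 0/1 entries, and they never store $H$ explicitly at realistic sizes.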


Algebraic methods

$$g = \begin{bmatrix} g_1 \\ \vdots \\ g_K \end{bmatrix}, \quad H = \begin{bmatrix} H_1 \\ \vdots \\ H_K \end{bmatrix} \quad\longrightarrow\quad g_k = H_k f + \epsilon_k \quad\longrightarrow\quad g = H f + \epsilon$$

- $H$ is huge dimensional: 2D: $10^6 \times 10^6$, 3D: $10^9 \times 10^9$.
- $Hf$ corresponds to forward projection.
- $H^t g$ corresponds to back projection (BP).
- $H$ may not be invertible and may not even be square.
- $H$ is, in general, ill-conditioned.
- In limited angle tomography $H$ is underdetermined, so the problem has an infinite number of solutions.
- Minimum Norm Solution
  $$\hat f = H^t (H H^t)^{-1} g = \sum_k H_k^t (H_k H_k^t)^{-1} g_k$$
  can be interpreted as the Filtered Back Projection solution.


Prior information or constraints
- Positivity: $f_j > 0$ or $f_j \in \mathbb{R}^+$
- Boundedness: $1 > f_j > 0$ or $f_j \in [0, 1]$
- Smoothness: $f_j$ depends on its neighbors.
- Sparsity: many $f_j$ are zeros.
- Sparsity in a transform domain: $f = Dz$ and many $z_j$ are zeros.
- Discrete valued (DV): $f_j \in \{0, 1, \dots, K\}$
- Binary valued (BV): $f_j \in \{0, 1\}$
- Compactness: $f(r)$ is non zero in one or a few non-overlapping compact regions
- Combination of the above mentioned constraints

Main mathematical questions:
- Which combination results in a unique solution?
- How to apply them?


Algebraic methods: Regularization
- Minimum Norm Solution: minimize $\|f\|_2^2$ s.t. $Hf = g$ $\longrightarrow$ $\hat f = H^t (H H^t)^{-1} g$
- Least Squares Solution: $\hat f = \arg\min_f J(f)$ with $J(f) = \|g - Hf\|^2$ $\longrightarrow$ $\hat f = (H^t H)^{-1} H^t g$
- Quadratic Regularization: $J(f) = \|g - Hf\|^2 + \lambda \|f\|_2^2$ $\longrightarrow$ $\hat f = (H^t H + \lambda I)^{-1} H^t g$
- L1 Regularization: $J(f) = \|g - Hf\|^2 + \lambda \|f\|_1$
- Lpq Regularization: $J(f) = \|g - Hf\|_p^p + \lambda \|Df\|_q^q$
- More general Regularization:
  $$J(f) = \sum_i \phi(g_i - [Hf]_i) + \lambda \sum_j \psi([Df]_j)$$
  with convex potential functions $\phi$ and $\psi$, or
  $$J(f) = \Delta_1(g, Hf) + \lambda \Delta_2(f, f_0)$$
  with $\Delta_1$ and $\Delta_2$ any distances (L2, L1, ...) or divergence (KL).
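The closed-form solutions listed above can be checked numerically on a toy problem. The sketch below uses a small random matrix as a stand-in for $H$ (an assumption made only for illustration; real projection matrices are far too large for dense solves) and compares the minimum norm solution with the quadratically regularized one in an underdetermined setting, where $(H^t H)^{-1}$ does not exist.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 20, 50                          # fewer measurements than unknowns
H = rng.standard_normal((M, N))        # toy stand-in for the projection matrix
f_true = rng.standard_normal(N)
g = H @ f_true                         # noiseless data

# Minimum norm solution: f = H^t (H H^t)^{-1} g
f_mn = H.T @ np.linalg.solve(H @ H.T, g)

# Quadratic regularization: f = (H^t H + lam I)^{-1} H^t g
lam = 1e-2
f_qr = np.linalg.solve(H.T @ H + lam * np.eye(N), H.T @ g)

print("data residual (min norm):", np.linalg.norm(g - H @ f_mn))
print("norms: true, min norm, regularized:",
      np.linalg.norm(f_true), np.linalg.norm(f_mn), np.linalg.norm(f_qr))
```

Both estimates fit the data well but neither recovers `f_true`: with $M < N$ the data alone cannot select among the infinitely many solutions, which is the motivation for the priors introduced later.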


Deterministic approaches
- Iterative methods: SIRT, ART, Quadratic or L1 regularization, Block Coordinate Descent, Multiplicative ART, ...
- Criterion: $J(f) = \|g - Hf\|^2 + \lambda \|Df\|^2$
- Gradient based algorithms: $\nabla J(f) = -2 H^t (g - Hf) + 2 \lambda D^t D f$
- Simplest algorithm:
  $$\hat f^{(k+1)} = \hat f^{(k)} - \alpha^{(k)} \nabla J(\hat f^{(k)}) = \hat f^{(k)} + 2 \alpha^{(k)} \left[ H^t (g - H \hat f^{(k)}) - \lambda D^t D \hat f^{(k)} \right]$$
- More criteria: $J(f) = \sum_i \phi(g_i - [Hf]_i) + \lambda \sum_j \psi([Df]_j)$ with $\phi(t)$ and $\psi(t)$ chosen among $\{t^2, |t|, |t|^p, \dots\}$ or non convex ones.
- Imposing constraints in each iteration (example: DART)
- Mathematical studies of uniqueness and convergence of these algorithms are necessary.
- Many specialized algorithms (ISTA, FISTA, ADMM, AMP, GAMP, ...) have been developed for L1 regularization.


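Bayesian estimation approach

$\mathcal{M}$: $g = Hf + \epsilon$
- Observation model $\mathcal{M}$ + hypothesis on the noise $\epsilon$ $\longrightarrow$ $p(g | f; \mathcal{M}) = p_\epsilon(g - Hf)$
- A priori information: $p(f | \mathcal{M})$
- Bayes: $$p(f | g; \mathcal{M}) = \frac{p(g | f; \mathcal{M})\, p(f | \mathcal{M})}{p(g | \mathcal{M})}$$

Gaussian priors:
- Prior knowledge on the noise: $\epsilon \sim \mathcal{N}(0, v_\epsilon^2 I)$
- Prior knowledge on $f$: $f \sim \mathcal{N}(0, v_f^2 (D'D)^{-1})$
- A posteriori: $$p(f | g) \propto \exp\left[ -\frac{1}{2 v_\epsilon^2} \|g - Hf\|^2 - \frac{1}{2 v_f^2} \|Df\|^2 \right]$$
- MAP: $\hat f = \arg\max_f \{p(f | g)\} = \arg\min_f \{J(f)\}$ with $J(f) = \|g - Hf\|^2 + \lambda \|Df\|^2$, $\lambda = v_\epsilon^2 / v_f^2$
- Advantage: characterization of the solution, $p(f | g) = \mathcal{N}(\hat f, \hat\Sigma)$ with
  $$\hat f = (H'H + \lambda D'D)^{-1} H' g, \qquad \hat\Sigma = v_\epsilon^2 (H'H + \lambda D'D)^{-1}$$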


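Sparsity enforcing models
- 3 classes of models:
  1. Generalized Gaussian (Double Exponential as particular case)
  2. Mixture models (mixture of 2 Gaussians, BG, ...)
  3. Heavy tailed (Student-t and Cauchy)
- Student-t model:
  $$St(f | \nu) \propto \exp\left[ -\frac{\nu + 1}{2} \log\left(1 + f^2 / \nu\right) \right]$$
- Infinite Gaussian Scaled Mixture (IGSM) equivalence:
  $$St(f | \nu) \propto \int_0^\infty \mathcal{N}(f | 0, 1/z)\, \mathcal{G}(z | \alpha, \beta)\, dz, \quad \text{with } \alpha = \beta = \nu / 2$$
  $$p(f | z) = \prod_j p(f_j | z_j) = \prod_j \mathcal{N}(f_j | 0, 1/z_j) \propto \prod_j z_j^{1/2} \exp\left[ -\tfrac{1}{2} z_j f_j^2 \right]$$
  $$p(z | \alpha, \beta) = \prod_j \mathcal{G}(z_j | \alpha, \beta) \propto \prod_j z_j^{\alpha - 1} \exp[-\beta z_j] \propto \exp\left[ \sum_j (\alpha - 1) \ln z_j - \beta z_j \right]$$
  $$p(f, z | \alpha, \beta) \propto \exp\left[ \sum_j \left( -\tfrac{1}{2} z_j f_j^2 + (\alpha - \tfrac{1}{2}) \ln z_j - \beta z_j \right) \right]$$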
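The scale-mixture identity can be verified numerically. The sketch below integrates $\mathcal{N}(f | 0, 1/z)\, \mathcal{G}(z | \nu/2, \nu/2)$ over $z$ with a simple trapezoid rule and compares the result with the normalized Student-t density. The grid size and upper limit `z_max` are arbitrary choices for this illustration, suitable for moderate $\nu$ (here $\nu = 3$).

```python
import math
import numpy as np

def student_t_pdf(f, nu):
    """Normalized Student-t density with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + f**2 / nu) ** (-(nu + 1) / 2)

def igsm_pdf(f, nu, n_grid=200_000, z_max=200.0):
    """Integral over z of N(f | 0, 1/z) G(z | alpha, beta), alpha = beta = nu/2."""
    alpha = beta = nu / 2
    z = np.linspace(1e-8, z_max, n_grid)
    normal = np.sqrt(z / (2 * np.pi)) * np.exp(-0.5 * z * f**2)      # precision z
    gamma = beta**alpha / math.gamma(alpha) * z ** (alpha - 1) * np.exp(-beta * z)
    y = normal * gamma
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(z)))        # trapezoid rule

nu = 3.0
for f in (0.0, 0.5, 2.0):
    print(f, student_t_pdf(f, nu), igsm_pdf(f, nu))
```

The two columns agree to several digits, which is what allows the Student-t prior to be replaced by a Gaussian prior with a hidden Gamma-distributed precision $z_j$ per coefficient.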


Sparse model in a Transform domain

[Figure panels: $f$ piecewise continuous; gradient of $f$: sparse; $z$ = Haar Transform of $f$: sparse]

- Analysis: $J(f) = \|g - Hf\|_2^2 + \lambda \|Df\|_1$
- Synthesis: $f = Dz$ $\longrightarrow$ $J(z) = \|g - HDz\|^2 + \lambda \|z\|_1$
- Explicit modelling: $f = Dz + \xi$, $z$ sparse, $\xi$ Gaussian with unknown variance
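To make the synthesis formulation concrete, here is a minimal 1D sketch, assuming an orthonormal Haar dictionary and the ISTA iteration (one of the L1 algorithms named earlier in the slides, not necessarily the one the authors used). The random matrix `H`, the signal length, and the helper names `haar_matrix`, `soft`, `ista` and `objective` are all hypothetical.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar analysis matrix W (n a power of 2); synthesis is D = W.T."""
    W = np.array([[1.0]])
    while W.shape[0] < n:
        m = W.shape[0]
        top = np.kron(W, [1.0, 1.0])                  # coarser-scale rows
        bottom = np.kron(np.eye(m), [1.0, -1.0])      # finest-scale differences
        W = np.vstack([top, bottom]) / np.sqrt(2.0)
    return W

def soft(x, t):
    """Soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def objective(A, g, z, lam):
    return np.sum((g - A @ z) ** 2) + lam * np.sum(np.abs(z))

def ista(A, g, lam, n_iter=500):
    """Minimize ||g - A z||^2 + lam ||z||_1 with step 1/(2L), L = ||A||_2^2."""
    L = np.linalg.norm(A, 2) ** 2
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = soft(z + A.T @ (g - A @ z) / L, lam / (2 * L))
    return z

n = 64
W = haar_matrix(n)
D = W.T                                               # synthesis dictionary, f = D z
f = np.concatenate([np.ones(n // 2), 3 * np.ones(n // 2)])   # piecewise constant
print("non-zero Haar coefficients:", np.count_nonzero(np.abs(W @ f) > 1e-12))

rng = np.random.default_rng(1)
H = rng.standard_normal((32, n))                      # toy underdetermined operator
g = H @ f
lam = 0.1
z_hat = ista(H @ D, g, lam)
f_hat = D @ z_hat
```

The piecewise constant signal has only two non-zero Haar coefficients, which is the kind of sparsity the synthesis prior on $z$ exploits.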


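Sparse model in a Transform domain

$g = Hf + \epsilon$, $f = Dz + \xi$, $z$ sparse

$$p(g | f, v_\epsilon) = \mathcal{N}(g | Hf, V_\epsilon), \quad V_\epsilon = \mathrm{diag}[v_\epsilon]$$
$$p(f | z, v_\xi) = \mathcal{N}(f | Dz, V_\xi), \quad V_\xi = \mathrm{diag}[v_\xi]$$
$$p(z | v_z) = \mathcal{N}(z | 0, V_z), \quad V_z = \mathrm{diag}[v_z]$$
$$p(v_\epsilon) = \prod_i \mathcal{IG}(v_{\epsilon_i} | \alpha_{\epsilon_0}, \beta_{\epsilon_0}), \quad p(v_z) = \prod_j \mathcal{IG}(v_{z_j} | \alpha_{z_0}, \beta_{z_0}), \quad p(v_\xi) = \prod_j \mathcal{IG}(v_{\xi_j} | \alpha_{\xi_0}, \beta_{\xi_0})$$
$$p(f, z, v_\epsilon, v_z, v_\xi | g) \propto p(g | f, v_\epsilon)\, p(f | z, v_\xi)\, p(z | v_z)\, p(v_\epsilon)\, p(v_z)\, p(v_\xi)$$

[Figure: graphical model: $(\alpha_{z_0}, \beta_{z_0}) \to v_z \to z$; $(\alpha_{\xi_0}, \beta_{\xi_0}) \to v_\xi$; $z$ and $v_\xi \to f$ through $D$; $(\alpha_{\epsilon_0}, \beta_{\epsilon_0}) \to v_\epsilon$; $f$ and $v_\epsilon \to g$ through $H$]

- JMAP: $(\hat f, \hat z, \hat v_\epsilon, \hat v_z, \hat v_\xi) = \arg\max_{(f, z, v_\epsilon, v_z, v_\xi)} \{p(f, z, v_\epsilon, v_z, v_\xi | g)\}$. Alternate optimization.
- VBA: Approximate $p(f, z, v_\epsilon, v_z, v_\xi | g)$ by $q_1(f)\, q_2(z)\, q_3(v_\epsilon)\, q_4(v_z)\, q_5(v_\xi)$. Alternate optimization.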


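JMAP Algorithm

$$[\hat f, \hat z, \hat v_z, \hat v_\epsilon, \hat v_\xi] = \arg\max_{(f, z, v_z, v_\epsilon, v_\xi)} \{p(f, z, v_z, v_\epsilon, v_\xi | g)\}$$

iter: $\hat f^{(k+1)} = \hat f^{(k)} - \hat\gamma_f^{(k)} \nabla J(\hat f^{(k)})$
iter: $\hat z^{(k+1)} = \hat z^{(k)} - \hat\gamma_z^{(k)} \nabla J(\hat z^{(k)})$

$$\hat v_{z_j} = \frac{\beta_{z_0} + \frac{1}{2} \hat z_j^2}{\alpha_{z_0} + 3/2}, \qquad \hat v_{\epsilon_i} = \frac{\beta_{\epsilon_0} + \frac{1}{2} \left( g_i - [H \hat f]_i \right)^2}{\alpha_{\epsilon_0} + 3/2}, \qquad \hat v_{\xi_j} = \frac{\beta_{\xi_0} + \frac{1}{2} \left( \hat f_j - [D \hat z]_j \right)^2}{\alpha_{\xi_0} + 3/2}$$

where
$$J(f) = \tfrac{1}{2} (g - Hf)' V_\epsilon^{-1} (g - Hf) + \tfrac{1}{2} (f - Dz)' V_\xi^{-1} (f - Dz)$$
$$J(z) = \tfrac{1}{2} (f - Dz)' V_\xi^{-1} (f - Dz) + \tfrac{1}{2} z' V_z^{-1} z$$

$$\hat\gamma_f^{(k)} = \frac{\|\nabla J(\hat f^{(k)})\|^2}{\|\hat Y_\epsilon H \nabla J(\hat f^{(k)})\|^2 + \|\hat Y_\xi \nabla J(\hat f^{(k)})\|^2}, \qquad \hat\gamma_z^{(k)} = \frac{\|\nabla J(\hat z^{(k)})\|^2}{\|\hat Y_\xi D \nabla J(\hat z^{(k)})\|^2 + \|\hat Y_z \nabla J(\hat z^{(k)})\|^2}$$

where $\hat Y_\epsilon = \hat V_\epsilon^{-1/2}$, $\hat Y_\xi = \hat V_\xi^{-1/2}$, $\hat Y_z = \hat V_z^{-1/2}$, and $\nabla J(\cdot)$ is the gradient of $J(\cdot)$.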
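The alternating structure of the JMAP iteration can be sketched on a small dense problem. This is an illustration under several assumptions that are not in the slides: a single pair of hyperparameters `(a0, b0)` is shared by the three inverse-gamma priors, the iteration starts from the back projection $H^t g$, one gradient step per variable is taken per outer iteration, and the matrices are small enough to store. The function name `jmap` and the toy data are hypothetical.

```python
import numpy as np

def jmap(H, D, g, a0=2.0, b0=1.0, n_iter=50):
    """Alternate gradient steps on f and z with closed-form variance updates."""
    M, N = H.shape
    K = D.shape[1]
    f = H.T @ g                                  # assumed initialization
    z = np.zeros(K)
    v_eps, v_xi, v_z = np.ones(M), np.ones(N), np.ones(K)
    for _ in range(n_iter):
        # f-step: gradient of J(f) and the exact line-search step gamma_f
        grad = -H.T @ ((g - H @ f) / v_eps) + (f - D @ z) / v_xi
        den = np.sum((H @ grad) ** 2 / v_eps) + np.sum(grad**2 / v_xi)
        if den > 0:
            f = f - (grad @ grad) / den * grad
        # z-step: gradient of J(z) and the exact line-search step gamma_z
        grad = -D.T @ ((f - D @ z) / v_xi) + z / v_z
        den = np.sum((D @ grad) ** 2 / v_xi) + np.sum(grad**2 / v_z)
        if den > 0:
            z = z - (grad @ grad) / den * grad
        # variance updates (modes of the conditional inverse-gamma laws)
        v_z = (b0 + 0.5 * z**2) / (a0 + 1.5)
        v_eps = (b0 + 0.5 * (g - H @ f) ** 2) / (a0 + 1.5)
        v_xi = (b0 + 0.5 * (f - D @ z) ** 2) / (a0 + 1.5)
    return f, z, v_eps, v_xi, v_z

rng = np.random.default_rng(2)
M, N = 30, 16
H = rng.standard_normal((M, N))                  # toy forward operator
D = np.eye(N)                                    # placeholder dictionary
f_true = np.zeros(N)
f_true[[3, 9]] = [2.0, -1.0]
g = H @ f_true + 0.05 * rng.standard_normal(M)
f_hat, z_hat, v_eps, v_xi, v_z = jmap(H, D, g)
```

In a tomographic setting the products with `H` and `H.T` would be replaced by forward and back projection operators.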


Implementation issues
- In almost all the algorithms, the computation of $\hat f$ requires an optimization algorithm. The criterion to optimize is often of the form
  $J(f) = \|g - Hf\|_2^2 + \lambda \|Df\|_2^2$ or $J(z) = \|g - HDz\|^2 + \lambda \|z\|_1$
- Very often, we use gradient based algorithms, which need to compute
  $\nabla J(f) = -2 H^t (g - Hf) + 2 \lambda D^t D f$
  so that, in the simplest case, each step is
  $$\hat f^{(k+1)} = \hat f^{(k)} + 2 \alpha^{(k)} \left[ H^t (g - H \hat f^{(k)}) - \lambda D^t D \hat f^{(k)} \right]$$
  1. Compute $\hat g = H \hat f^{(k)}$ (forward projection)
  2. Compute $\delta g = g - \hat g$ (error or residual)
  3. Compute $\delta f_1 = H^t \delta g$ (back projection of the error)
  4. Compute $\delta f_2 = -\lambda D^t D \hat f^{(k)}$ and update $\hat f^{(k+1)} = \hat f^{(k)} + 2 \alpha^{(k)} [\delta f_1 + \delta f_2]$
- Steps 1 and 3 have a high computational cost and have been implemented on GPU. In this work, we used the ASTRA Toolbox.
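The four steps translate directly into a matrix-free loop. In the sketch below, `forward` and `backward` are hypothetical callables standing in for GPU projectors (no ASTRA call is shown or assumed); here they wrap a small dense matrix so that the result can be compared with the closed-form minimizer. The fixed step size rule is also an assumption chosen to guarantee convergence on this toy problem.

```python
import numpy as np

def gradient_reconstruction(forward, backward, DtD, g, f0, lam, alpha, n_iter):
    """Fixed-step gradient descent on J(f) = ||g - Hf||^2 + lam ||Df||^2."""
    f = f0.copy()
    for _ in range(n_iter):
        g_hat = forward(f)                 # 1. forward projection
        dg = g - g_hat                     # 2. error (residual)
        df1 = backward(dg)                 # 3. back projection of the error
        df2 = -lam * DtD(f)                # 4. regularization term
        f = f + 2 * alpha * (df1 + df2)    #    update
    return f

rng = np.random.default_rng(3)
M, N = 40, 20
H = rng.standard_normal((M, N))            # toy stand-in for the projector
D = np.diff(np.eye(N), axis=0)             # first-difference operator
g = H @ rng.standard_normal(N)
lam = 0.5

A = H.T @ H + lam * D.T @ D
alpha = 0.5 / np.linalg.eigvalsh(A).max()  # safe fixed step for this quadratic

f_gd = gradient_reconstruction(
    forward=lambda f: H @ f,
    backward=lambda r: H.T @ r,
    DtD=lambda f: D.T @ (D @ f),
    g=g, f0=np.zeros(N), lam=lam, alpha=alpha, n_iter=3000,
)
f_closed = np.linalg.solve(A, H.T @ g)     # closed-form minimizer for comparison
print(np.max(np.abs(f_gd - f_closed)))
```

Keeping the projectors behind callables means the same loop works whether they are dense matrices, sparse matrices, or GPU kernels.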


Multi-Resolution Implementation
- Scale 1 (black): $g^{(1)} = H^{(1)} f^{(1)}$ ($N \times N$)
- Scale 2 (green): $g^{(2)} = H^{(2)} f^{(2)}$ ($N/2 \times N/2$)
- Scale 3 (red): $g^{(3)} = H^{(3)} f^{(3)}$ ($N/4 \times N/4$)
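The slide gives only the three grid sizes. One common way to move between such scales, assumed here purely for illustration, is 2 x 2 block averaging to go to a coarser grid and pixel replication to bring a coarse estimate back to a finer grid as an initialization. The helper names are hypothetical.

```python
import numpy as np

def downsample2(f):
    """Halve the resolution by averaging 2 x 2 blocks."""
    n = f.shape[0] // 2
    return f[: 2 * n, : 2 * n].reshape(n, 2, n, 2).mean(axis=(1, 3))

def upsample2(f):
    """Double the resolution by replicating each pixel into a 2 x 2 block."""
    return np.kron(f, np.ones((2, 2)))

def pyramid(f, n_scales=3):
    """Return [scale 1 (N x N), scale 2 (N/2 x N/2), scale 3 (N/4 x N/4)]."""
    levels = [f]
    for _ in range(n_scales - 1):
        levels.append(downsample2(levels[-1]))
    return levels

N = 16
image = np.arange(N * N, dtype=float).reshape(N, N)
levels = pyramid(image)
print([level.shape for level in levels])

# A coarse estimate can seed the next finer scale:
f_init_fine = upsample2(levels[2])          # 4 x 4 -> 8 x 8
```

A coarse-to-fine scheme of this kind solves the cheap $N/4 \times N/4$ problem first and refines the result at each finer scale.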


Results
Phantom: 128 x 128 x 128; Projections: 128, 64 and 32; SNR = 40 dB

[Figure panels: 128 projections; 64 projections; 32 projections]


Results
Phantom: 128 x 128 x 128; Projections: 128; SNR = 20 dB

[Figure panels: QR; TV; HHBM]


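Results with High SNR = 30 dB

[Figure: relative reconstruction error $\delta f = \|f - \hat f\| / \|f\|$ and relative data error $\delta g = \|H \hat f - g\| / \|g\|$]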
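The two figures of merit are simple to compute once a reconstruction is available. A minimal sketch follows, with a toy dense `H` as a stand-in for the projector and a hypothetical helper name `relative_errors`.

```python
import numpy as np

def relative_errors(f_true, f_hat, H, g):
    """Return (delta_f, delta_g) = (||f - f_hat|| / ||f||, ||H f_hat - g|| / ||g||)."""
    delta_f = np.linalg.norm(f_true - f_hat) / np.linalg.norm(f_true)
    delta_g = np.linalg.norm(H @ f_hat - g) / np.linalg.norm(g)
    return delta_f, delta_g

rng = np.random.default_rng(4)
H = rng.standard_normal((12, 8))           # toy forward operator
f_true = rng.standard_normal(8)
g = H @ f_true

perfect = relative_errors(f_true, f_true, H, g)             # exact reconstruction
trivial = relative_errors(f_true, np.zeros(8), H, g)        # all-zero reconstruction
print(perfect, trivial)
```

Note that $\delta f$ needs the ground truth and is only available for phantoms, whereas $\delta g$ can always be computed from the data.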


Results: High SNR = 30 dB and Low SNR = 20 dB

[Figure panels: $\delta f = \|f - \hat f\| / \|f\|$ at High SNR = 30 dB; $\delta f = \|f - \hat f\| / \|f\|$ at Low SNR = 20 dB]


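Variational Bayesian Approximation (VBA)

Depending on cases, we have to handle $p(f, \theta | g)$, $p(f, z, \theta | g)$, $p(f, w, \theta | g)$ or $p(f, w, z, \theta | g)$. Let us consider the simplest case:
- Approximate $p(f, \theta | g)$ by $q(f, \theta | g) = q_1(f | g)\, q_2(\theta | g)$ and then continue computations.
- Criterion: $\mathrm{KL}(q(f, \theta | g) : p(f, \theta | g))$
- $$\mathrm{KL}(q : p) = \iint q \ln \frac{q}{p} = \iint q_1 q_2 \ln \frac{q_1 q_2}{p} = \int q_1 \ln q_1 + \int q_2 \ln q_2 - \iint q \ln p = -H(q_1) - H(q_2) - \langle \ln p \rangle_q$$
- Iterative algorithm $q_1 \longrightarrow q_2 \longrightarrow q_1 \longrightarrow q_2, \dots$
  $$q_1(f) \propto \exp\left[ \langle \ln p(g, f, \theta; \mathcal{M}) \rangle_{q_2(\theta)} \right], \qquad q_2(\theta) \propto \exp\left[ \langle \ln p(g, f, \theta; \mathcal{M}) \rangle_{q_1(f)} \right]$$

$p(f, \theta | g)$ $\longrightarrow$ Variational Bayesian Approximation $\longrightarrow$ $\hat q_1(f) \longrightarrow \hat f$ and $\hat q_2(\theta) \longrightarrow \hat\theta$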
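The alternating updates of $q_1$ and $q_2$ can be illustrated on a deliberately simplified model, which is an assumption of this sketch and not the hierarchical model of the talk: $g = Hf + \epsilon$ with $\epsilon \sim \mathcal{N}(0, vI)$, a fixed Gaussian prior $f \sim \mathcal{N}(0, v_f I)$, and $\theta = v \sim \mathcal{IG}(a_0, b_0)$. With these conjugate choices $q_1(f)$ is Gaussian and $q_2(v)$ is inverse-gamma, and each update is in closed form. The function name `vba` and the toy data are hypothetical.

```python
import numpy as np

def vba(H, g, v_f=1.0, a0=1e-3, b0=1e-3, n_iter=50):
    """Mean-field VBA: q1(f) = N(mu, Sigma), q2(v) = IG(a, b)."""
    M, N = H.shape
    HtH, Htg = H.T @ H, H.T @ g
    a = a0 + 0.5 * M                       # shape of q2, fixed across iterations
    e_inv_v = 1.0                          # initial value of <1/v> under q2
    for _ in range(n_iter):
        # q1(f): Gaussian, computed with the current expectation <1/v>
        Sigma = np.linalg.inv(e_inv_v * HtH + np.eye(N) / v_f)
        mu = e_inv_v * Sigma @ Htg
        # q2(v): inverse-gamma, using <||g - Hf||^2> under q1
        b = b0 + 0.5 * (np.sum((g - H @ mu) ** 2) + np.trace(HtH @ Sigma))
        e_inv_v = a / b
    return mu, Sigma, a, b

rng = np.random.default_rng(5)
M, N = 200, 10
H = rng.standard_normal((M, N))
f_true = rng.standard_normal(N)
v_true = 0.01
g = H @ f_true + np.sqrt(v_true) * rng.standard_normal(M)

mu, Sigma, a, b = vba(H, g)
v_mean = b / (a - 1)                       # posterior mean of the noise variance
print("estimated noise variance:", v_mean)
print("posterior std of f:", np.sqrt(np.diag(Sigma)))
```

The trace term is what distinguishes VBA from JMAP in this setting: the uncertainty on $f$ (through `Sigma`) enters the update of the noise variance.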


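JMAP, Marginalization, Sampling and exploration, VBA
- JMAP: $p(f, \theta | g)$ $\longrightarrow$ optimization $\longrightarrow$ $\hat f$, $\hat\theta$
  Alternate optimization: $\hat f = \arg\max_f \{p(f | \hat\theta, g)\}$, $\hat\theta = \arg\max_\theta \{p(\theta | \hat f, g)\}$
- Marginalization: $p(f, \theta | g)$ (joint posterior) $\longrightarrow$ marginalize over $f$ $\longrightarrow$ $p(\theta | g)$ $\longrightarrow$ $\hat\theta$ $\longrightarrow$ $p(f | \hat\theta, g)$ $\longrightarrow$ $\hat f$
- Sampling and exploration:
  - Gibbs sampling: $f \sim p(f | \theta, g)$ $\longrightarrow$ $\theta \sim p(\theta | f, g)$
  - Other sampling methods: IS, MH, Slice sampling, ...
- Variational Bayesian Approximation: $p(f, \theta | g)$ $\longrightarrow$ VBA $\longrightarrow$ $q_1(f) \longrightarrow \hat f$ and $q_2(\theta) \longrightarrow \hat\theta$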


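VBA: Choice of family of laws $q_1$ and $q_2$
- Case 1: $\longrightarrow$ Joint MAP
  $$\begin{cases} \hat q_1(f | \tilde f) = \delta(f - \tilde f) \\ \hat q_2(\theta | \tilde\theta) = \delta(\theta - \tilde\theta) \end{cases} \longrightarrow \begin{cases} \tilde f = \arg\max_f \{p(f, \tilde\theta | g; \mathcal{M})\} \\ \tilde\theta = \arg\max_\theta \{p(\tilde f, \theta | g; \mathcal{M})\} \end{cases} \qquad (1)$$
- Case 2: $\longrightarrow$ EM
  $$\begin{cases} \hat q_1(f) \propto p(f | \tilde\theta, g) \\ \hat q_2(\theta | \tilde\theta) = \delta(\theta - \tilde\theta) \end{cases} \longrightarrow \begin{cases} Q(\theta, \tilde\theta) = \langle \ln p(f, \theta | g; \mathcal{M}) \rangle_{q_1(f | \tilde\theta)} \\ \tilde\theta = \arg\max_\theta \{Q(\theta, \tilde\theta)\} \end{cases} \qquad (2)$$
- Appropriate choice for inverse problems:
  $$\begin{cases} \hat q_1(f) \propto p(f | \tilde\theta, g; \mathcal{M}) \\ \hat q_2(\theta) \propto p(\theta | \tilde f, g; \mathcal{M}) \end{cases} \qquad (3)$$
  This choice accounts for the uncertainties of $\hat\theta$ for $\hat f$ and vice versa. Exponential families, conjugate priors.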


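JMAP, EM and VBA

JMAP alternate optimization algorithm:
  $\theta^{(0)} \longrightarrow \tilde\theta \longrightarrow$ [ $\tilde f = \arg\max_f \{p(f, \tilde\theta | g)\}$ ] $\longrightarrow \tilde f \longrightarrow \hat f$
  $\hat\theta \longleftarrow \tilde\theta \longleftarrow$ [ $\tilde\theta = \arg\max_\theta \{p(\tilde f, \theta | g)\}$ ] $\longleftarrow \tilde f$

EM:
  $\theta^{(0)} \longrightarrow \tilde\theta \longrightarrow$ [ $q_1(f) = p(f | \tilde\theta, g)$ ] $\longrightarrow q_1(f) \longrightarrow \hat f$
  $\hat\theta \longleftarrow \tilde\theta \longleftarrow$ [ $Q(\theta, \tilde\theta) = \langle \ln p(f, \theta | g) \rangle_{q_1(f)}$, $\tilde\theta = \arg\max_\theta \{Q(\theta, \tilde\theta)\}$ ] $\longleftarrow q_1(f)$

VBA:
  $\theta^{(0)} \longrightarrow q_2(\theta) \longrightarrow$ [ $q_1(f) \propto \exp\left[ \langle \ln p(f, \theta | g) \rangle_{q_2(\theta)} \right]$ ] $\longrightarrow q_1(f) \longrightarrow \hat f$
  $\hat\theta \longleftarrow q_2(\theta) \longleftarrow$ [ $q_2(\theta) \propto \exp\left[ \langle \ln p(f, \theta | g) \rangle_{q_1(f)} \right]$ ] $\longleftarrow q_1(f)$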


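VBA

Approximate $p(f, z, v_z, v_\xi, v_\epsilon | g)$ by $q_1(f)\, q_2(z)\, q_3(v_z)\, q_4(v_\xi)\, q_5(v_\epsilon)$. Alternate optimization:

$$\begin{cases} q_1(f) = \mathcal{N}(f | \tilde\mu_f, \tilde\Sigma_f) \\ q_2(z) = \mathcal{N}(z | \tilde\mu_z, \tilde\Sigma_z) \\ q_3(v_z) = \prod_j \mathcal{IG}(v_{z_j} | \tilde\alpha_{z_j}, \tilde\beta_{z_j}) \\ q_4(v_\xi) = \prod_j \mathcal{IG}(v_{\xi_j} | \tilde\alpha_{\xi_j}, \tilde\beta_{\xi_j}) \\ q_5(v_\epsilon) = \prod_i \mathcal{IG}(v_{\epsilon_i} | \tilde\alpha_{\epsilon_i}, \tilde\beta_{\epsilon_i}) \end{cases}$$

$$\begin{cases} \tilde\mu_f = \left( H' V_\epsilon^{-1} H + V_\xi^{-1} \right)^{-1} \left( H' V_\epsilon^{-1} g + V_\xi^{-1} D \hat z \right) \\ \tilde\Sigma_f = \left( H' V_\epsilon^{-1} H + V_\xi^{-1} \right)^{-1} \\ \tilde\mu_z = \left( D' V_\xi^{-1} D + V_z^{-1} \right)^{-1} D' V_\xi^{-1} \hat f \\ \tilde\Sigma_z = \left( D' V_\xi^{-1} D + V_z^{-1} \right)^{-1} \end{cases}$$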


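JMAP Algorithm

$$[\hat f, \hat z, \hat v_z, \hat v_\epsilon, \hat v_\xi] = \arg\max_{(f, z, v_z, v_\epsilon, v_\xi)} \{p(f, z, v_z, v_\epsilon, v_\xi | g)\}$$

iter: $\hat f^{(k+1)} = \hat f^{(k)} - \hat\gamma_f^{(k)} \nabla J(\hat f^{(k)})$
iter: $\hat z^{(k+1)} = \hat z^{(k)} - \hat\gamma_z^{(k)} \nabla J(\hat z^{(k)})$

$$\hat v_{z_j} = \frac{\beta_{z_0} + \frac{1}{2} \hat z_j^2}{\alpha_{z_0} + 3/2}, \qquad \hat v_{\epsilon_i} = \frac{\beta_{\epsilon_0} + \frac{1}{2} \left( g_i - [H \hat f]_i \right)^2}{\alpha_{\epsilon_0} + 3/2}, \qquad \hat v_{\xi_j} = \frac{\beta_{\xi_0} + \frac{1}{2} \left( \hat f_j - [D \hat z]_j \right)^2}{\alpha_{\xi_0} + 3/2}$$

For VBA, the expressions of $\hat v_{z_j}$, $\hat v_{\epsilon_i}$ and $\hat v_{\xi_j}$ are more complex and need the computation of the diagonal elements of the posterior covariances. Specific techniques are needed to compute them efficiently:

$$\hat v_{z_j} = \frac{\beta_{z_0} + \frac{1}{2} \langle z_j^2 \rangle}{\alpha_{z_0} + 3/2}, \qquad \hat v_{\epsilon_i} = \frac{\beta_{\epsilon_0} + \frac{1}{2} \left\langle \left( g_i - [H f]_i \right)^2 \right\rangle}{\alpha_{\epsilon_0} + 3/2}, \qquad \hat v_{\xi_j} = \frac{\beta_{\xi_0} + \frac{1}{2} \left\langle \left( f_j - [D z]_j \right)^2 \right\rangle}{\alpha_{\xi_0} + 3/2}$$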


Conclusions
- Computed Tomography is an ill-posed inverse problem.
- Algebraic methods and regularization methods push the limitations further.
- The Bayesian approach has great potential for real applications where we need to estimate the hyperparameters (semi-supervised) and to quantify the uncertainties.
- Hierarchical prior models with hidden variables are very powerful tools for the Bayesian approach to inverse problems.
- Main Bayesian computation tools: JMAP, VBA, AMP and MCMC
- Applications in other imaging systems: Microwaves, PET, Ultrasound, Optical Diffusion Tomography (ODT), ...
- Current projects: efficient implementation in 3D cases to reconstruct 1024 x 1024 x 1024 volumes using GPU.
