A Student-t based sparsity enforcing hierarchical prior for linear inverse problems and its efficient Bayesian computation for 2D and 3D Computed Tomography
Ali Mohammad-Djafari, Li Wang, Nicolas Gac and Folker Bleichrodt
Laboratoire des Signaux et Systèmes (L2S), UMR8506 CNRS-CentraleSupélec-Univ Paris-Sud, 91192 Gif-sur-Yvette, France
http://lss.centralesupelec.fr
Email: [email protected]
http://djafari.free.fr http://publicationslist.org/djafari
iTwist2016, Aug. 24-26, 2016, Aalborg, Denmark
A. Mohammad-Djafari, Bayesian Discrete Tomography from a few number of projections, March 21-23, 2016, Politecnico di Milano, Italy.
Contents
1. Computed Tomography in 2D and 3D
2. Main classical methods
3. Basic Bayesian approach
4. Sparsity enforcing models through Student-t and IGSM
5. Computational tools: JMAP, EM, VBA
6. Implementation issues
   - Main GPU implementation steps: Forward and Back Projections
   - Multi-Resolution implementation
7. Some results
8. Conclusions
Computed Tomography: Seeing inside of a body
- $f(x, y)$: a section of a real 3D body $f(x, y, z)$
- $g_\phi(r)$: a line of the observed radiography $g_\phi(r, z)$
- Forward model: line integrals (Radon Transform)
  $g_\phi(r) = \int_{L_{r,\phi}} f(x, y)\, dl + \epsilon_\phi(r) = \iint f(x, y)\, \delta(r - x\cos\phi - y\sin\phi)\, dx\, dy + \epsilon_\phi(r)$
- Inverse problem (image reconstruction): given the forward model H (Radon Transform) and a set of data $g_{\phi_i}(r)$, $i = 1, \dots, M$, find $f(x, y)$.
2D and 3D Computed Tomography

2D: $g_\phi(r) = \int_{L_{r,\phi}} f(x, y)\, dl$
3D: $g_\phi(r_1, r_2) = \int_{L_{r_1,r_2,\phi}} f(x, y, z)\, dl$

Forward problem: $f(x, y)$ or $f(x, y, z) \longrightarrow g_\phi(r)$ or $g_\phi(r_1, r_2)$
Inverse problem: $g_\phi(r)$ or $g_\phi(r_1, r_2) \longrightarrow f(x, y)$ or $f(x, y, z)$
Algebraic methods: Discretization

[Figure: source S, detector D and the discretized image $f(x, y)$; $H_{ij}$ is the contribution of pixel j to detector cell i along the ray at angle $\phi$.]

$g(r, \phi) = \int_{L} f(x, y)\, dl$

$f(x, y) = \sum_j f_j\, b_j(x, y)$, with $b_j(x, y) = 1$ if $(x, y) \in$ pixel $j$, $0$ else

$g_i = \sum_{j=1}^{N} H_{ij} f_j + \epsilon_i$

$g_k = H_k f + \epsilon_k$, $k = 1, \dots, K \longrightarrow g = Hf + \epsilon$

where $g_k$ is the projection at angle $\phi_k$ and $g$ gathers all the projections.
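The discretization above can be made concrete with a toy example: a 4x4 image and only two parallel-beam views (the row sums and the column sums), far smaller than any real CT geometry; all sizes here are illustrative.

```python
import numpy as np

# Toy discretization: a 4x4 image f and two parallel-beam views
# (phi = 0: the 4 row sums; phi = 90 deg: the 4 column sums),
# so H is 8 x 16. Here H[i, j] = 1 if pixel j lies on ray i
# (unit intersection length), 0 otherwise.
N = 4
H = np.zeros((2 * N, N * N))
for r in range(N):
    for c in range(N):
        H[r, r * N + c] = 1.0        # ray i = r sums row r
        H[N + c, r * N + c] = 1.0    # ray i = N + c sums column c

f = np.zeros((N, N))
f[1:3, 1:3] = 1.0                    # a small square object
g = H @ f.ravel()                    # noiseless forward projection g = Hf
print(g)                             # rows then columns: [0. 2. 2. 0. 0. 2. 2. 0.]
```

With only two views the system is heavily underdetermined, which is exactly the situation the rest of the talk addresses.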
Algebraic methods

$g = \begin{bmatrix} g_1 \\ \vdots \\ g_K \end{bmatrix}$, $H = \begin{bmatrix} H_1 \\ \vdots \\ H_K \end{bmatrix}$, so that $g_k = H_k f + \epsilon_k$ and $g = Hf + \epsilon$.

- H has huge dimensions: 2D: $10^6 \times 10^6$; 3D: $10^9 \times 10^9$.
- $Hf$ corresponds to forward projection.
- $H^t g$ corresponds to back projection (BP).
- H may not be invertible, and may not even be square.
- H is, in general, ill-conditioned.
- In limited-angle tomography H is underdetermined, so the problem has an infinite number of solutions.
- Minimum norm solution:
  $\hat{f} = H^t (H H^t)^{-1} g = \sum_k H_k^t (H_k H_k^t)^{-1} g_k$
  can be interpreted as the Filtered Back Projection solution.
Prior information or constraints
- Positivity: $f_j \geq 0$ or $f_j \in \mathbb{R}^+$
- Boundedness: $0 \leq f_j \leq 1$ or $f_j \in [0, 1]$
- Smoothness: $f_j$ depends on its neighborhood.
- Sparsity: many $f_j$ are zeros.
- Sparsity in a transform domain: $f = Dz$ and many $z_j$ are zeros.
- Discrete valued (DV): $f_j \in \{0, 1, \dots, K\}$
- Binary valued (BV): $f_j \in \{0, 1\}$
- Compactness: $f(r)$ is non-zero in one or a few non-overlapping compact regions.
- Combinations of the above-mentioned constraints.

Main mathematical questions:
- Which combination results in a unique solution?
- How to apply them?
Algebraic methods: Regularization
- Minimum norm solution: minimize $\|f\|_2^2$ s.t. $Hf = g \longrightarrow \hat{f} = H^t (H H^t)^{-1} g$
- Least squares solution: $\hat{f} = \arg\min_f J(f) = \|g - Hf\|^2 \longrightarrow \hat{f} = (H^t H)^{-1} H^t g$
- Quadratic regularization: $J(f) = \|g - Hf\|^2 + \lambda \|f\|_2^2 \longrightarrow \hat{f} = (H^t H + \lambda I)^{-1} H^t g$
- L1 regularization: $J(f) = \|g - Hf\|^2 + \lambda \|f\|_1$
- Lpq regularization: $J(f) = \|g - Hf\|_p^p + \lambda \|Df\|_q^q$
- More general regularization:
  $J(f) = \sum_i \phi(g_i - [Hf]_i) + \lambda \sum_j \psi([Df]_j)$
  with convex potential functions $\phi$ and $\psi$, or
  $J(f) = \Delta_1(g, Hf) + \lambda \Delta_2(f, f_0)$
  with $\Delta_1$ and $\Delta_2$ any distances (L2, L1, ...) or divergences (KL).
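A minimal numerical sketch of the quadratic regularized solution (random H and illustrative sizes and lambda, not the CT operator itself):

```python
import numpy as np

# Small dense sketch of quadratic (Tikhonov) regularization:
# f_hat = (H^t H + lambda I)^{-1} H^t g
rng = np.random.default_rng(0)
M, N = 8, 16                            # underdetermined: fewer rays than pixels
H = rng.normal(size=(M, N))
f_true = np.zeros(N); f_true[3] = 1.0
g = H @ f_true + 0.01 * rng.normal(size=M)

lam = 0.1
f_hat = np.linalg.solve(H.T @ H + lam * np.eye(N), H.T @ g)

# H^t H alone is singular (rank <= M < N), so the plain least squares
# solution does not exist; the lambda*I term restores solvability.
print(np.linalg.matrix_rank(H.T @ H))   # 8
```

The same closed form appears again later as the MAP estimate under Gaussian priors.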
Deterministic approaches
- Iterative methods: SIRT, ART, quadratic or L1 regularization, block coordinate descent, multiplicative ART, ...
- Criterion: $J(f) = \|g - Hf\|^2 + \lambda \|Df\|^2$
- Gradient-based algorithms: $\nabla J(f) = -2 H^t (g - Hf) + 2 \lambda D^t D f$
- Simplest algorithm:
  $\hat{f}^{(k+1)} = \hat{f}^{(k)} + \alpha^{(k)} \left[ H^t (g - H \hat{f}^{(k)}) - \lambda D^t D \hat{f}^{(k)} \right]$
- More criteria: $J(f) = \sum_i \phi(g_i - [Hf]_i) + \lambda \sum_j \psi([Df]_j)$ with $\phi(t)$ and $\psi(t) \in \{t^2, |t|, |t|^p, \dots\}$ or non-convex ones.
- Imposing constraints at each iteration (example: DART).
- Mathematical studies of uniqueness and convergence of these algorithms are necessary.
- Many specialized algorithms (ISTA, FISTA, ADMM, AMP, GAMP, ...) have been developed for L1 regularization.
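For the L1 case, a generic textbook ISTA iteration looks like the following; this is a sketch of the standard algorithm, not the authors' implementation, and problem sizes and lambda are illustrative.

```python
import numpy as np

def ista(H, g, lam, n_iter=500):
    """Basic ISTA for J(f) = ||g - Hf||^2 + lam * ||f||_1
    (standard textbook form)."""
    L = 2 * np.linalg.norm(H, 2) ** 2        # Lipschitz constant of grad ||g-Hf||^2
    f = np.zeros(H.shape[1])
    for _ in range(n_iter):
        u = f - 2 * H.T @ (H @ f - g) / L    # gradient step on the quadratic part
        f = np.sign(u) * np.maximum(np.abs(u) - lam / L, 0.0)  # soft-thresholding
    return f

rng = np.random.default_rng(1)
H = rng.normal(size=(20, 10))
f_true = np.zeros(10); f_true[2] = 3.0; f_true[7] = -2.0
g = H @ f_true                               # noiseless data, sparse truth
f_hat = ista(H, g, lam=0.1)
```

On this well-conditioned toy problem ISTA recovers the two nonzero components up to the small bias introduced by the L1 penalty.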
Bayesian estimation approach
- Observation model: $g = Hf + \epsilon$; a hypothesis on the noise gives $p(g|f; M) = p_\epsilon(g - Hf)$.
- A priori information: $p(f|M)$.
- Bayes: $p(f|g; M) = \dfrac{p(g|f; M)\, p(f|M)}{p(g|M)}$
- Gaussian priors:
  - Prior knowledge on the noise: $\epsilon \sim N(0, v_\epsilon^2 I)$
  - Prior knowledge on f: $f \sim N(0, v_f^2 (D^t D)^{-1})$
  - A posteriori: $p(f|g) \propto \exp\left[ -\frac{1}{2 v_\epsilon^2} \|g - Hf\|^2 - \frac{1}{2 v_f^2} \|Df\|^2 \right]$
  - MAP: $\hat{f} = \arg\max_f p(f|g) = \arg\min_f J(f)$ with $J(f) = \|g - Hf\|^2 + \lambda \|Df\|^2$, $\lambda = \frac{v_\epsilon^2}{v_f^2}$
- Advantage: characterization of the solution
  $p(f|g) = N(\hat{f}, \hat{\Sigma})$ with $\hat{f} = (H^t H + \lambda D^t D)^{-1} H^t g$, $\hat{\Sigma} = v_\epsilon^2 (H^t H + \lambda D^t D)^{-1}$
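The MAP estimate and its posterior covariance can be computed in closed form on a small dense problem; H, D, lambda and the sizes below are illustrative choices, not the talk's setup.

```python
import numpy as np

# Gaussian-prior MAP with its posterior covariance, on a tiny problem.
rng = np.random.default_rng(2)
M, N = 10, 16
H = rng.normal(size=(M, N))
D = np.eye(N) - np.eye(N, k=1)            # first-difference-like operator (toy choice)
f_true = np.repeat([0.0, 1.0], N // 2)    # piecewise-constant object
v_eps = 0.01                              # noise variance
g = H @ f_true + np.sqrt(v_eps) * rng.normal(size=M)

lam = 0.5                                 # lambda = v_eps / v_f^2
A = H.T @ H + lam * (D.T @ D)
f_hat = np.linalg.solve(A, H.T @ g)       # MAP estimate = posterior mean
Sigma = v_eps * np.linalg.inv(A)          # posterior covariance
sigma = np.sqrt(np.diag(Sigma))           # per-pixel uncertainty bars
```

This is the "characterization of the solution" advantage: the same matrix that gives the estimate also gives its uncertainty.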
Sparsity enforcing models
Three classes of models:
1. Generalized Gaussian (double exponential as a particular case)
2. Mixture models (two-Gaussian mixture, Bernoulli-Gaussian, ...)
3. Heavy-tailed (Student-t and Cauchy)

Student-t model:
$St(f|\nu) \propto \exp\left[ -\frac{\nu+1}{2} \log(1 + f^2/\nu) \right]$

Infinite Gaussian Scaled Mixture (IGSM) equivalence:
$St(f|\nu) = \int_0^\infty N(f|0, 1/z)\, G(z|\alpha, \beta)\, dz$, with $\alpha = \beta = \nu/2$

$p(f|z) = \prod_j p(f_j|z_j) = \prod_j N(f_j|0, 1/z_j) \propto \exp\left[ -\frac{1}{2} \sum_j z_j f_j^2 \right]$
$p(z|\alpha, \beta) = \prod_j G(z_j|\alpha, \beta) \propto \prod_j z_j^{(\alpha-1)} \exp[-\beta z_j] \propto \exp\left[ \sum_j \left( (\alpha-1) \ln z_j - \beta z_j \right) \right]$
$p(f, z|\alpha, \beta) \propto \exp\left[ -\frac{1}{2} \sum_j z_j f_j^2 + \sum_j \left( (\alpha-1) \ln z_j - \beta z_j \right) \right]$
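The IGSM equivalence can be checked numerically by integrating the Gaussian-Gamma mixture over z and comparing with the Student-t density (the integration grid parameters are illustrative):

```python
import numpy as np
from math import gamma, pi, sqrt

def student_t_pdf(f, nu):
    # Standard Student-t density with nu degrees of freedom
    return gamma((nu + 1) / 2) / (gamma(nu / 2) * sqrt(nu * pi)) \
        * (1.0 + f ** 2 / nu) ** (-(nu + 1) / 2)

def igsm_pdf(f, nu, zmax=200.0, n=200001):
    # Numerically integrate N(f|0, 1/z) G(z|nu/2, nu/2) dz over z
    z = np.linspace(1e-8, zmax, n)
    a = b = nu / 2
    gauss = np.sqrt(z / (2 * pi)) * np.exp(-0.5 * z * f ** 2)
    gam = b ** a / gamma(a) * z ** (a - 1) * np.exp(-b * z)
    vals = gauss * gam
    return float(((vals[:-1] + vals[1:]) / 2 * np.diff(z)).sum())  # trapezoid rule

for f in [0.0, 0.5, 2.0]:
    print(f, student_t_pdf(f, 3.0), igsm_pdf(f, 3.0))
```

The two densities agree to numerical precision, which is what makes the hierarchical Gaussian-plus-Gamma parametrization usable in place of the Student-t prior.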
Sparse model in a Transform domain

[Figure: a piecewise-continuous f; the gradient of f is sparse; z = Haar transform of f is sparse.]

- Analysis: $J(f) = \|g - Hf\|_2^2 + \lambda \|Df\|_1$
- Synthesis: $f = Dz \longrightarrow J(z) = \|g - HDz\|^2 + \lambda \|z\|_1$
- Explicit modelling: $f = Dz + \xi$, with $z$ sparse and $\xi$ Gaussian with unknown variance.
Sparse model in a Transform domain
$g = Hf + \epsilon$, $f = Dz + \xi$, $z$ sparse

$p(g|f, v_\epsilon) = N(g|Hf, V_\epsilon)$, $V_\epsilon = \mathrm{diag}[v_\epsilon]$
$p(f|z, v_\xi) = N(f|Dz, V_\xi)$, $V_\xi = \mathrm{diag}[v_\xi]$
$p(z|v_z) = N(z|0, V_z)$, $V_z = \mathrm{diag}[v_z]$
$p(v_\epsilon) = \prod_i IG(v_{\epsilon_i}|\alpha_{\epsilon 0}, \beta_{\epsilon 0})$
$p(v_z) = \prod_j IG(v_{z_j}|\alpha_{z0}, \beta_{z0})$
$p(v_\xi) = \prod_j IG(v_{\xi_j}|\alpha_{\xi 0}, \beta_{\xi 0})$

$p(f, z, v_\epsilon, v_z, v_\xi|g) \propto p(g|f, v_\epsilon)\, p(f|z, v_\xi)\, p(z|v_z)\, p(v_\epsilon)\, p(v_z)\, p(v_\xi)$

- JMAP: $(\hat{f}, \hat{z}, \hat{v}_\epsilon, \hat{v}_z, \hat{v}_\xi) = \arg\max p(f, z, v_\epsilon, v_z, v_\xi|g)$, by alternate optimization.
- VBA: approximate $p(f, z, v_\epsilon, v_z, v_\xi|g)$ by $q_1(f)\, q_2(z)\, q_3(v_\epsilon)\, q_4(v_z)\, q_5(v_\xi)$, by alternate optimization.
JMAP Algorithm
$(\hat{f}, \hat{z}, \hat{v}_z, \hat{v}_\epsilon, \hat{v}_\xi) = \arg\max_{(f, z, v_z, v_\epsilon, v_\xi)} p(f, z, v_z, v_\epsilon, v_\xi | g)$

Iterations:
$\hat{f}^{(k+1)} = \hat{f}^{(k)} - \gamma_f^{(k)} \nabla J(\hat{f}^{(k)})$
$\hat{z}^{(k+1)} = \hat{z}^{(k)} - \gamma_z^{(k)} \nabla J(\hat{z}^{(k)})$

$\hat{v}_{z_j} = \frac{\beta_{z0} + \frac{1}{2} \hat{z}_j^2}{\alpha_{z0} + 3/2}$, $\hat{v}_{\epsilon_i} = \frac{\beta_{\epsilon 0} + \frac{1}{2} (g_i - [H\hat{f}]_i)^2}{\alpha_{\epsilon 0} + 3/2}$, $\hat{v}_{\xi_j} = \frac{\beta_{\xi 0} + \frac{1}{2} (\hat{f}_j - [D\hat{z}]_j)^2}{\alpha_{\xi 0} + 3/2}$

where
$J(f) = \frac{1}{2} (g - Hf)^t V_\epsilon^{-1} (g - Hf) + \frac{1}{2} (f - Dz)^t V_\xi^{-1} (f - Dz)$
$J(z) = \frac{1}{2} (f - Dz)^t V_\xi^{-1} (f - Dz) + \frac{1}{2} z^t V_z^{-1} z$

and the step sizes are
$\gamma_f^{(k)} = \dfrac{\|\nabla J(\hat{f}^{(k)})\|^2}{\|Y_\epsilon H \nabla J(\hat{f}^{(k)})\|^2 + \|Y_\xi \nabla J(\hat{f}^{(k)})\|^2}$
$\gamma_z^{(k)} = \dfrac{\|\nabla J(\hat{z}^{(k)})\|^2}{\|Y_\xi D \nabla J(\hat{z}^{(k)})\|^2 + \|Y_z \nabla J(\hat{z}^{(k)})\|^2}$

with $Y_\epsilon = V_\epsilon^{-1/2}$, $Y_\xi = V_\xi^{-1/2}$, $Y_z = V_z^{-1/2}$, and $\nabla J(\cdot)$ the gradient of $J(\cdot)$.
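The alternating JMAP scheme can be sketched on a toy problem. In this sketch the f- and z-steps use direct solves instead of the gradient iterations (feasible only at small scale), D is the identity, and the hyperprior values (a0, b0) are illustrative assumptions.

```python
import numpy as np

# Toy JMAP alternating scheme for g = Hf + eps, f = Dz + xi, z sparse.
rng = np.random.default_rng(3)
M, N = 30, 20
H = rng.normal(size=(M, N))
D = np.eye(N)                             # identity dictionary for this sketch
z_true = np.zeros(N); z_true[3] = 2.0; z_true[11] = -1.5
g = H @ (D @ z_true) + 0.05 * rng.normal(size=M)

a0, b0 = 1e-3, 1e-3                       # illustrative hyperprior parameters
v_eps = np.ones(M); v_z = np.ones(N); v_xi = np.ones(N)
f = np.zeros(N); z = np.zeros(N)
for _ in range(50):
    # f-step: minimize J(f) = (g-Hf)' Veps^-1 (g-Hf)/2 + (f-Dz)' Vxi^-1 (f-Dz)/2
    A = H.T @ (H / v_eps[:, None]) + np.diag(1 / v_xi)
    f = np.linalg.solve(A, H.T @ (g / v_eps) + (D @ z) / v_xi)
    # z-step: minimize J(z) = (f-Dz)' Vxi^-1 (f-Dz)/2 + z' Vz^-1 z/2
    B = D.T @ (D / v_xi[:, None]) + np.diag(1 / v_z)
    z = np.linalg.solve(B, D.T @ (f / v_xi))
    # point updates of the hyperparameters (the JMAP formulas above)
    v_z = (b0 + 0.5 * z ** 2) / (a0 + 1.5)
    v_eps = (b0 + 0.5 * (g - H @ f) ** 2) / (a0 + 1.5)
    v_xi = (b0 + 0.5 * (f - D @ z) ** 2) / (a0 + 1.5)
```

Components of z near zero get small variances v_z, which shrinks them further at the next z-step; this is the sparsity-enforcing mechanism of the hierarchical prior.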
Implementation issues
- In almost all the algorithms, the computation of $\hat{f}$ needs an optimization algorithm. The criterion to optimize is often of the form
  $J(f) = \|g - Hf\|_2^2 + \lambda \|Df\|_2^2$ or $J(z) = \|g - HDz\|^2 + \lambda \|z\|_1$
- Very often, we use gradient-based algorithms, which need to compute
  $\nabla J(f) = -2 H^t (g - Hf) + 2 \lambda D^t D f$
  so, in the simplest case, at each step we have
  $\hat{f}^{(k+1)} = \hat{f}^{(k)} + \alpha^{(k)} \left[ H^t (g - H \hat{f}^{(k)}) - \lambda D^t D \hat{f}^{(k)} \right]$
  1. Compute $\hat{g} = H \hat{f}$ (forward projection)
  2. Compute $\delta g = g - \hat{g}$ (error or residual)
  3. Compute $\delta f_1 = H^t \delta g$ (backprojection of the error)
  4. Compute $\delta f_2 = -D^t D \hat{f}^{(k)}$ and update $\hat{f}^{(k+1)} = \hat{f}^{(k)} + \alpha^{(k)} [\delta f_1 + \lambda \delta f_2]$
- Steps 1 and 3 have a high computational cost and have been implemented on GPU. In this work, we used the ASTRA Toolbox.
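The four steps can be sketched as follows. Here forward_project and back_project are toy matrix-product stand-ins for the GPU projectors that ASTRA provides; all sizes, lambda and the step size alpha are illustrative.

```python
import numpy as np

# One gradient loop built from the two expensive kernels.
rng = np.random.default_rng(4)
M, N = 48, 64
H = rng.normal(size=(M, N)) / np.sqrt(N)   # explicit matrix: tiny problems only

def forward_project(f):          # step 1: g_hat = H f  (GPU kernel in practice)
    return H @ f

def back_project(dg):            # step 3: df1 = H^t dg (GPU kernel in practice)
    return H.T @ dg

f_true = rng.normal(size=N)
g = forward_project(f_true)

lam, alpha = 0.01, 0.1           # illustrative regularization and step size
f = np.zeros(N)
for _ in range(300):
    dg = g - forward_project(f)  # step 2: residual
    df1 = back_project(dg)       # step 3: backprojection of the error
    df2 = -f                     # step 4, with D = I for simplicity
    f = f + alpha * (df1 + lam * df2)
```

Only the two projector calls touch the full geometry, which is why porting exactly those two kernels to GPU accelerates the whole family of algorithms.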
Multi-Resolution Implementation
Scale 1 (black): $g^{(1)} = H^{(1)} f^{(1)}$ ($N \times N$)
Scale 2 (green): $g^{(2)} = H^{(2)} f^{(2)}$ ($N/2 \times N/2$)
Scale 3 (red): $g^{(3)} = H^{(3)} f^{(3)}$ ($N/4 \times N/4$)
Results
Phantom: 128 x 128 x 128. Projections: 128, 64 and 32. SNR = 40 dB.
[Figure: reconstructions from 128, 64 and 32 projections.]
Results
Phantom: 128 x 128 x 128. Projections: 128. SNR = 20 dB.
[Figure: reconstructions with QR (quadratic regularization), TV and HHBM.]
Results with High SNR = 30 dB
[Figure: relative errors $\delta f = \|f - \hat{f}\| / \|f\|$ and $\delta g = \|H\hat{f} - g\| / \|g\|$.]
Results: High SNR = 30 dB and Low SNR = 20 dB
[Figure: relative error $\delta f = \|f - \hat{f}\| / \|f\|$ for high SNR (30 dB) and low SNR (20 dB).]
Variational Bayesian Approximation (VBA)
Depending on the case, we have to handle $p(f, \theta|g)$, $p(f, z, \theta|g)$, $p(f, w, \theta|g)$ or $p(f, w, z, \theta|g)$. Let us consider the simplest case:
- Approximate $p(f, \theta|g)$ by $q(f, \theta|g) = q_1(f|g)\, q_2(\theta|g)$ and then continue the computations.
- Criterion: $KL(q(f, \theta|g) : p(f, \theta|g))$
- $KL(q : p) = \iint q \ln \frac{q}{p} = \iint q_1 q_2 \ln \frac{q_1 q_2}{p} = \int q_1 \ln q_1 + \int q_2 \ln q_2 - \iint q \ln p = -H(q_1) - H(q_2) - \langle \ln p \rangle_q$
- Iterative algorithm $q_1 \rightarrow q_2 \rightarrow q_1 \rightarrow q_2 \rightarrow \cdots$:
  $q_1(f) \propto \exp\left[ \langle \ln p(g, f, \theta; M) \rangle_{q_2(\theta)} \right]$
  $q_2(\theta) \propto \exp\left[ \langle \ln p(g, f, \theta; M) \rangle_{q_1(f)} \right]$

$p(f, \theta|g) \longrightarrow$ [Variational Bayesian Approximation] $\longrightarrow \hat{q}_1(f) \rightarrow \hat{f}$ and $\hat{q}_2(\theta) \rightarrow \hat{\theta}$
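As a toy illustration of the alternating q1/q2 updates: the mean-field approximation of a correlated bivariate Gaussian, where both updates have closed form (the values of mu and Sigma are arbitrary choices for illustration).

```python
import numpy as np

# Mean-field q1(x1) q2(x2) for p(x1, x2) = N(mu, Sigma). For a Gaussian
# target each factor update is Gaussian with precision Lam[i, i] and a
# mean that conditions on the current mean of the other factor.
mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
Lam = np.linalg.inv(Sigma)               # precision matrix

m = np.zeros(2)                          # initial factor means
for _ in range(100):
    m[0] = mu[0] - Lam[0, 1] / Lam[0, 0] * (m[1] - mu[1])
    m[1] = mu[1] - Lam[1, 0] / Lam[1, 1] * (m[0] - mu[0])

# Mean-field recovers the exact posterior means but underestimates the
# marginal variances: 1/Lam[i, i] = 0.36 here versus Sigma[i, i] = 1.
print(m)                                 # -> approximately [1.0, -2.0]
```

This is the classic caveat of factorized VBA: the means converge to the exact posterior means, while the factorization discards the correlation and shrinks the marginal variances.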
JMAP, Marginalization, Sampling and exploration, VBA
- JMAP: optimize $p(f, \theta|g)$ alternately:
  $\hat{f} = \arg\max_f p(f|\hat{\theta}, g)$
  $\hat{\theta} = \arg\max_\theta p(\theta|\hat{f}, g)$
- Marginalization: from the joint posterior $p(f, \theta|g)$, marginalize over f to get $p(\theta|g)$, then
  $\hat{\theta} \longrightarrow p(f|\hat{\theta}, g) \longrightarrow \hat{f}$
- Sampling and exploration:
  - Gibbs sampling: $f \sim p(f|\theta, g) \rightarrow \theta \sim p(\theta|f, g)$
  - Other sampling methods: IS, MH, slice sampling, ...
- Variational Bayesian Approximation: $p(f, \theta|g) \longrightarrow q_1(f) \rightarrow \hat{f}$ and $q_2(\theta) \rightarrow \hat{\theta}$
VBA: Choice of the family of laws q1 and q2
- Case 1: $\hat{q}_1(f) = \delta(f - \tilde{f})$, $\hat{q}_2(\theta) = \delta(\theta - \tilde{\theta})$ $\longrightarrow$ Joint MAP:
  $\tilde{f} = \arg\max_f p(f, \tilde{\theta}|g; M)$
  $\tilde{\theta} = \arg\max_\theta p(\tilde{f}, \theta|g; M)$
- Case 2: $\hat{q}_1(f) = p(f|\tilde{\theta}, g)$, $\hat{q}_2(\theta) = \delta(\theta - \tilde{\theta})$ $\longrightarrow$ EM:
  $Q(\theta, \tilde{\theta}) = \langle \ln p(f, \theta|g; M) \rangle_{q_1(f|\tilde{\theta})}$
  $\tilde{\theta} = \arg\max_\theta Q(\theta, \tilde{\theta})$
- Case 3 (appropriate choice for inverse problems):
  $\hat{q}_1(f) \propto p(f|\tilde{\theta}, g; M)$
  $\hat{q}_2(\theta) \propto p(\theta|\tilde{f}, g; M)$
  accounts for the uncertainties of $\hat{\theta}$ in $\hat{f}$ and vice versa. Exponential families and conjugate priors make these updates tractable.
JMAP, EM and VBA
JMAP alternate optimization: starting from $\theta^{(0)}$, iterate
$\tilde{f} = \arg\max_f p(f, \tilde{\theta}|g)$ and $\tilde{\theta} = \arg\max_\theta p(\tilde{f}, \theta|g)$,
then output $\hat{f} = \tilde{f}$ and $\hat{\theta} = \tilde{\theta}$.

EM: starting from $\theta^{(0)}$, iterate
$q_1(f) = p(f|\tilde{\theta}, g)$, $Q(\theta, \tilde{\theta}) = \langle \ln p(f, \theta|g) \rangle_{q_1(f)}$ and $\tilde{\theta} = \arg\max_\theta Q(\theta, \tilde{\theta})$,
then output $\hat{f}$ from $q_1(f)$ and $\hat{\theta} = \tilde{\theta}$.

VBA: starting from an initial $q_2(\theta)$, iterate
$q_1(f) \propto \exp\left[ \langle \ln p(f, \theta|g) \rangle_{q_2(\theta)} \right]$ and $q_2(\theta) \propto \exp\left[ \langle \ln p(f, \theta|g) \rangle_{q_1(f)} \right]$,
then output $\hat{f}$ from $q_1(f)$ and $\hat{\theta}$ from $q_2(\theta)$.
VBA
Approximate $p(f, z, v_z, v_\xi, v_\epsilon|g)$ by $q_1(f)\, q_2(z)\, q_3(v_z)\, q_4(v_\xi)\, q_5(v_\epsilon)$.
Alternate optimization:
$q_1(f) = N(f|\tilde{\mu}_f, \tilde{\Sigma}_f)$
$q_2(z) = N(z|\tilde{\mu}_z, \tilde{\Sigma}_z)$
$q_3(v_z) = \prod_j IG(v_{z_j}|\tilde{\alpha}_{z_j}, \tilde{\beta}_{z_j})$
$q_4(v_\xi) = \prod_j IG(v_{\xi_j}|\tilde{\alpha}_{\xi_j}, \tilde{\beta}_{\xi_j})$
$q_5(v_\epsilon) = \prod_i IG(v_{\epsilon_i}|\tilde{\alpha}_{\epsilon_i}, \tilde{\beta}_{\epsilon_i})$

$\tilde{\Sigma}_f = (H^t V_\epsilon^{-1} H + V_\xi^{-1})^{-1}$, $\tilde{\mu}_f = \tilde{\Sigma}_f (H^t V_\epsilon^{-1} g + V_\xi^{-1} D \tilde{\mu}_z)$
$\tilde{\Sigma}_z = (D^t V_\xi^{-1} D + V_z^{-1})^{-1}$, $\tilde{\mu}_z = \tilde{\Sigma}_z D^t V_\xi^{-1} \tilde{\mu}_f$
JMAP Algorithm (JMAP vs. VBA hyperparameter updates)
JMAP:
$(\hat{f}, \hat{z}, \hat{v}_z, \hat{v}_\epsilon, \hat{v}_\xi) = \arg\max_{(f, z, v_z, v_\epsilon, v_\xi)} p(f, z, v_z, v_\epsilon, v_\xi|g)$
with gradient iterations $\hat{f}^{(k+1)} = \hat{f}^{(k)} - \gamma_f^{(k)} \nabla J(\hat{f}^{(k)})$, $\hat{z}^{(k+1)} = \hat{z}^{(k)} - \gamma_z^{(k)} \nabla J(\hat{z}^{(k)})$ and point updates
$\hat{v}_{z_j} = \frac{\beta_{z0} + \frac{1}{2} \hat{z}_j^2}{\alpha_{z0} + 3/2}$, $\hat{v}_{\epsilon_i} = \frac{\beta_{\epsilon 0} + \frac{1}{2} (g_i - [H\hat{f}]_i)^2}{\alpha_{\epsilon 0} + 3/2}$, $\hat{v}_{\xi_j} = \frac{\beta_{\xi 0} + \frac{1}{2} (\hat{f}_j - [D\hat{z}]_j)^2}{\alpha_{\xi 0} + 3/2}$

In VBA, the corresponding expressions replace the squared point estimates by posterior expectations:
$\hat{v}_{z_j} = \frac{\beta_{z0} + \frac{1}{2} \langle z_j^2 \rangle}{\alpha_{z0} + 3/2}$, $\hat{v}_{\epsilon_i} = \frac{\beta_{\epsilon 0} + \frac{1}{2} \langle (g_i - [Hf]_i)^2 \rangle}{\alpha_{\epsilon 0} + 3/2}$, $\hat{v}_{\xi_j} = \frac{\beta_{\xi 0} + \frac{1}{2} \langle (f_j - [Dz]_j)^2 \rangle}{\alpha_{\xi 0} + 3/2}$
These are more complex and need the computation of the diagonal elements of the posterior covariances; specific techniques are needed to compute them efficiently.
Conclusions
- Computed Tomography is an ill-posed inverse problem.
- Algebraic and regularization methods push the limitations further.
- The Bayesian approach has great potential for real applications where we need to estimate the hyperparameters (semi-supervised) and to quantify the uncertainties.
- Hierarchical prior models with hidden variables are very powerful tools for the Bayesian approach to inverse problems.
- Main Bayesian computation tools: JMAP, VBA, AMP and MCMC.
- Applications in other imaging systems: microwaves, PET, ultrasound, Optical Diffusion Tomography (ODT), ...
- Current projects: efficient implementation in 3D cases to reconstruct 1024 x 1024 x 1024 volumes using GPU.