Bayesian Discrete Tomography from a few number of projections
Ali Mohammad-Djafari
Laboratoire des Signaux et Systèmes (L2S), UMR8506 CNRS-CentraleSupélec-UNIV PARIS SUD
SUPELEC, 91192 Gif-sur-Yvette, France
http://lss.centralesupelec.fr
http://djafari.free.fr
http://publicationslist.org/djafari
Invited talk at Workshop on Discrete Tomography, Polytechnico de Milan, Italy
A. Mohammad-Djafari, Bayesian Discrete Tomography from a few number of projections, March 21-23, 2016, Polytechnico de Milan, Italy.
Contents
1. Limited angle Tomography
2. Basic Bayesian approach
3. Two main steps:
   • Choosing an appropriate prior model
   • Doing the computation efficiently
4. Hierarchical prior modelling
   • Sparsity enforcing models through Student-t and IGSM
   • Gauss-Markov-Potts models
5. Computational tools: JMAP, Gibbs Sampling MCMC, VBA
6. Case study: Image Reconstruction with only two projections
7. Implementation issues
   • Main GPU implementation steps: Forward and Back Projections
   • Multi-Resolution implementation
8. Conclusions
Limited angle Tomography: Limitations of analytical methods
[Figure: four panels: Original, Data, Backprojection, Filtered Backprojection]
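Algebraic methods: Discretization
[Figure: a ray L at distance r and angle φ crosses the object f(x, y) between source S and detector D; the image is discretized into pixels f_1, ..., f_j, ..., f_N, with matrix element H_ij and measurement g_i]
$g(r, \phi) = \int_L f(x, y) \, dl$
$f(x, y) = \sum_j f_j \, b_j(x, y)$, with $b_j(x, y) = 1$ if $(x, y) \in$ pixel $j$, and $0$ otherwise
$g_i = \sum_{j=1}^{N} H_{ij} f_j + \epsilon_i$
$g_k = H_k f + \epsilon_k, \; k = 1, \cdots, K \;\rightarrow\; g = Hf + \epsilon$
$g_k$: projection at angle $\phi_k$; $g$: all the projections.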
Algebraic methods
$g = \begin{bmatrix} g_1 \\ \vdots \\ g_K \end{bmatrix}, \quad H = \begin{bmatrix} H_1 \\ \vdots \\ H_K \end{bmatrix} \;\rightarrow\; g_k = H_k f + \epsilon_k \;\rightarrow\; g = Hf + \epsilon$
• H is of huge dimensions: 2D: $10^6 \times 10^6$, 3D: $10^9 \times 10^9$.
• $Hf$ corresponds to forward projection.
• $H^t g$ corresponds to Back projection (BP).
• H may not be invertible and may not even be square.
• H is, in general, ill-conditioned.
• In limited angle tomography H is underdetermined, so the problem has an infinite number of solutions.
• Minimum Norm Solution: $\hat f = H^t (HH^t)^{-1} g = \sum_k H_k^t (H_k H_k^t)^{-1} g_k$ (the second equality assumes $H_k H_l^t = 0$ for $k \neq l$) can be interpreted as the Filtered Back Projection solution.
Algebraic methods
• Minimum Norm Solution: minimize $\|f\|_2^2$ s.t. $Hf = g$ → $\hat f = H^t (HH^t)^{-1} g$
• Least squares solution: $\hat f = \arg\min_f J(f)$ with $J(f) = \|g - Hf\|^2$ → $\hat f = (H^t H)^{-1} H^t g$
• Quadratic Regularization: $J(f) = \|g - Hf\|^2 + \lambda \|f\|_2^2$ → $\hat f = (H^t H + \lambda I)^{-1} H^t g$
• L1 Regularization: $J(f) = \|g - Hf\|^2 + \lambda \|f\|_1$
• Lpq Regularization: $J(f) = \|g - Hf\|_p^p + \lambda \|Df\|_q^q$
• More general Regularization: $J(f) = \sum_i \phi(g_i - [Hf]_i) + \lambda \sum_j \psi([Df]_j)$
• $J(f) = \Delta_1(g, Hf) + \lambda \Delta_2(f, f_0)$ with $\Delta_1$ and $\Delta_2$ any distances (L2, L1, ...) or divergences (KL)
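As a minimal sketch of the closed-form solutions listed above, the following NumPy snippet compares the minimum norm solution with the quadratically regularized one on a small underdetermined system. The matrix sizes, the random test data and the function names `min_norm_solution` and `quadratic_reg_solution` are illustrative assumptions, not part of the talk.

```python
import numpy as np

def min_norm_solution(H, g):
    """Minimum norm solution f = H^t (H H^t)^{-1} g (H needs full row rank)."""
    return H.T @ np.linalg.solve(H @ H.T, g)

def quadratic_reg_solution(H, g, lam):
    """Quadratic regularization f = (H^t H + lam I)^{-1} H^t g."""
    n = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ g)

# Hypothetical toy problem: 3 measurements, 8 unknowns.
rng = np.random.default_rng(0)
H = rng.standard_normal((3, 8))
f_true = rng.standard_normal(8)
g = H @ f_true

f_mn = min_norm_solution(H, g)
f_reg = quadratic_reg_solution(H, g, lam=1e-8)
```

Both estimates reproduce the data, but neither needs to equal `f_true`: with fewer equations than unknowns, the data alone cannot single out one image. For a very small λ the regularized solution approaches the minimum norm one.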
Computed Tomography with only two projections
[Figure: discretized object f(x, y) with pixels f_1, ..., f_j, ..., f_N, matrix element H_ij and measurement g_i]
$g(r, \phi) = \int_L f(x, y) \, dl$
$f(x, y) = \sum_j f_j \, b_j(x, y)$, with $b_j(x, y) = 1$ if $(x, y) \in$ pixel $j$, and $0$ otherwise
$g_i = \sum_{j=1}^{N} H_{ij} f_j + \epsilon_i$
$g = Hf + \epsilon$
Case study: Reconstruction from 2 projections
$g_1(x) = \int f(x, y) \, dy, \quad g_2(y) = \int f(x, y) \, dx$
Very ill-posed inverse problem
$f(x, y) = g_1(x) \, g_2(y) \, \Omega(x, y)$
$\Omega(x, y)$ is a Copula: $\int \Omega(x, y) \, dx = 1, \quad \int \Omega(x, y) \, dy = 1$
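Simple example
[Figure: a 2 × 2 image with unknown pixels ("?"), row sums 4 and 6 and column sums 3 and 7; and a 3 × 3 image with pixels f_1, ..., f_9 and projections g_1, ..., g_6]
Pixel layout of the 2 × 2 case, with column sums g_1, g_2 and row sums g_3, g_4:
  f_1  f_3 | g_3
  f_2  f_4 | g_4
  g_1  g_2
$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \end{bmatrix} = \begin{bmatrix} g_1 \\ g_2 \\ g_3 \\ g_4 \end{bmatrix}$
$Hf = g \;\rightarrow\; \hat f = H^{-1} g$ if H is invertible.
• H is rank deficient: rank(H) = 3
• The problem has an infinite number of solutions: adding any multiple of the pattern [1 -1; -1 1] to the image leaves all the projections unchanged.
• How to find all those solutions?
• Which one is the good one? Needs prior information.
• To find a unique solution, one needs either more data or prior information.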
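The rank deficiency of the 2 × 2 example can be checked numerically. The sketch below assumes the layout in which g_1, g_2 are the column sums and g_3, g_4 the row sums of the image [[f_1, f_3], [f_2, f_4]], with data values 3, 7, 4, 6; the variable names are illustrative.

```python
import numpy as np

# Pixels are ordered f = [f1, f2, f3, f4] for the image [[f1, f3], [f2, f4]].
H = np.array([
    [1, 1, 0, 0],   # g1 = f1 + f2 (first column sum)
    [0, 0, 1, 1],   # g2 = f3 + f4 (second column sum)
    [1, 0, 1, 0],   # g3 = f1 + f3 (first row sum)
    [0, 1, 0, 1],   # g4 = f2 + f4 (second row sum)
], dtype=float)
g = np.array([3.0, 7.0, 4.0, 6.0])

rank = np.linalg.matrix_rank(H)

# One particular solution: the minimum norm one, via the pseudo-inverse.
f_min_norm = np.linalg.pinv(H) @ g

# Null-space direction: adding any multiple of it keeps all projections unchanged.
null_vec = np.array([1.0, -1.0, -1.0, 1.0])
f_other = f_min_norm + 0.5 * null_vec
```

Here `f_min_norm` and `f_other` are two different images with exactly the same row and column sums, which is why extra data or prior information is needed to choose between them.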
Prior information or constraints
• Positivity: $f_j > 0$ or $f_j \in \mathbb{R}_+$
• Boundedness: $1 > f_j > 0$ or $f_j \in [0, 1]$
• Smoothness: $f_j$ depends on its neighbors.
• Sparsity: many $f_j$ are zero.
• Sparsity in a transform domain: $f = Dz$ and many $z_j$ are zero.
• Discrete valued (DV): $f_j \in \{0, 1, ..., K\}$
• Binary valued (BV): $f_j \in \{0, 1\}$
• Compactness: $f(r)$ is nonzero in one or a few non-overlapping compact regions
• Combination of the above mentioned constraints
Main mathematical questions:
• Which combination results in a unique solution?
• How to apply them?
Deterministic approaches
• Iterative methods: SIRT, ART, Quadratic or L1 regularization, Block Coordinate Descent, Multiplicative ART, ...
• Criterion: $J(f) = \|g - Hf\|^2 + \lambda \|Df\|^2$
• Gradient based algorithms: $\nabla J(f) = -2H^t (g - Hf) + 2\lambda D^t D f$
• Simplest algorithm: $\hat f^{(k+1)} = \hat f^{(k)} + \alpha^{(k)} \left[ H^t (g - H \hat f^{(k)}) - \lambda D^t D \hat f^{(k)} \right]$, a step along $-\nabla J$ with the factor 2 absorbed in $\alpha^{(k)}$
• More criteria: $J(f) = \sum_i \phi(g_i - [Hf]_i) + \lambda \sum_j \psi([Df]_j)$ with $\phi(t)$ and $\psi(t) = \{t^2, |t|, |t|^p, ...\}$, or $J(f) = \Delta_1(g, Hf) + \lambda \Delta_2(f, f_0)$
• Imposing constraints in each iteration (example: DART)
• Mathematical studies of uniqueness and convergence of these algorithms are necessary
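To illustrate one of the iterative methods named above, here is a minimal sketch of a Kaczmarz-type ART with an optional positivity constraint imposed after every row update. It is not DART and not the author's implementation; the function name `art`, the relaxation parameter and the toy 2 × 2 system (column and row sums of a 2 × 2 image) are assumptions made for the example.

```python
import numpy as np

def art(H, g, n_sweeps=50, relax=1.0, positivity=False):
    """Kaczmarz-type ART: project the estimate onto one equation at a time."""
    f = np.zeros(H.shape[1])
    row_norms = np.sum(H * H, axis=1)
    for _ in range(n_sweeps):
        for i in range(H.shape[0]):
            if row_norms[i] == 0.0:
                continue  # skip rays that cross no pixel
            residual = g[i] - H[i] @ f
            f = f + relax * residual / row_norms[i] * H[i]
            if positivity:
                f = np.maximum(f, 0.0)  # constraint imposed at each iteration
    return f

# Hypothetical toy system: column sums and row sums of a 2 x 2 image.
H = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)
g = np.array([3.0, 7.0, 4.0, 6.0])
f_art = art(H, g, positivity=True)
```

Started from zero on a consistent system, the unconstrained iteration converges to the minimum norm solution; the clipping step is the simplest way to inject prior information at every iteration.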
Bayesian estimation approach
• Observation model $M$: $g = Hf + \epsilon$
• Observation model $M$ + hypothesis on the noise → $p(g|f; M) = p_\epsilon(g - Hf)$
• A priori information: $p(f|M)$
• Bayes: $p(f|g; M) = \dfrac{p(g|f; M) \, p(f|M)}{p(g|M)}$
• Maximum A Posteriori (MAP): $\hat f = \arg\max_f \{p(f|g)\} = \arg\max_f \{p(g|f) \, p(f)\} = \arg\min_f \{J(f) = -\ln p(g|f) - \ln p(f)\}$
• Link with Regularization: $\hat f = \arg\min_f \{J(f) = \Delta_1(g, Hf) + \lambda R(f)\}$ with $\Delta_1(g, Hf) = -\ln p(g|f)$ and $\lambda R(f) = -\ln p(f)$
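Case of linear models and Gaussian priors
• $g = Hf + \epsilon$
• Prior knowledge on the noise: $\epsilon \sim N(0, v_\epsilon I)$ → $p(g|f) \propto \exp\left[ -\frac{1}{2 v_\epsilon} \|g - Hf\|^2 \right]$
• Prior knowledge on f: $f \sim N(0, v_f (D'D)^{-1})$ → $p(f) \propto \exp\left[ -\frac{1}{2 v_f} \|Df\|^2 \right]$
• A posteriori: $p(f|g) \propto \exp\left[ -\frac{1}{2 v_\epsilon} \|g - Hf\|^2 - \frac{1}{2 v_f} \|Df\|^2 \right]$
• MAP: $\hat f = \arg\max_f \{p(f|g)\} = \arg\min_f \{J(f)\}$ with $J(f) = \|g - Hf\|^2 + \lambda \|Df\|^2$, $\lambda = v_\epsilon / v_f$
• Advantage: characterization of the solution, $p(f|g) = N(\hat f, \hat\Sigma)$ with $\hat f = (H'H + \lambda D'D)^{-1} H'g$, $\hat\Sigma = v_\epsilon (H'H + \lambda D'D)^{-1}$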
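A short derivation may help to see where the posterior mean and covariance come from. It is a sketch under the Gaussian assumptions of this slide (noise variance $v_\epsilon$, prior variance $v_f$, $\lambda = v_\epsilon / v_f$); the only step used is completing the square in $f$, and $c$, $c'$, $c''$ denote constants that do not depend on $f$.

```latex
\begin{aligned}
-\ln p(f|g)
  &= \frac{1}{2 v_\epsilon} \|g - Hf\|^2 + \frac{1}{2 v_f} \|Df\|^2 + c \\
  &= \frac{1}{2 v_\epsilon} \left[ f' (H'H + \lambda D'D) f - 2 f' H' g \right] + c' \\
  &= \frac{1}{2} (f - \hat f)' \, \hat\Sigma^{-1} (f - \hat f) + c'' ,
\end{aligned}
\qquad
\hat\Sigma = v_\epsilon (H'H + \lambda D'D)^{-1}, \quad
\hat f = (H'H + \lambda D'D)^{-1} H' g .
```

Matching the quadratic term gives $\hat\Sigma^{-1} = (H'H + \lambda D'D) / v_\epsilon$, and matching the linear term gives $\hat\Sigma^{-1} \hat f = H'g / v_\epsilon$, so the MAP estimate is also the posterior mean and $\hat\Sigma$ quantifies the remaining uncertainty.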
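MAP estimation with other priors:
$\hat f = \arg\min_f \{J(f)\}$ with $J(f) = \frac{1}{v_\epsilon} \|g - Hf\|^2 + \Omega(f)$
Separable priors:
• Gaussian: $p(f_j) \propto \exp\left[ -\alpha |f_j|^2 \right]$ → $\Omega(f) = \alpha \sum_j |f_j|^2 = \alpha \|f\|_2^2$
• Generalized Gaussian: $p(f_j) \propto \exp\left[ -\alpha |f_j|^p \right], \; 1 < p < 2$ → $\Omega(f) = \alpha \sum_j |f_j|^p = \alpha \|f\|_p^p$
• Gamma: $p(f_j) \propto f_j^\alpha \exp\left[ -\beta f_j \right]$ → $\Omega(f) = \sum_j \left( -\alpha \ln f_j + \beta f_j \right)$
• Beta: $p(f_j) \propto f_j^\alpha (1 - f_j)^\beta$ → $\Omega(f) = -\alpha \sum_j \ln f_j - \beta \sum_j \ln(1 - f_j)$
Markovian models:
$p(f_j|f) \propto \exp\left[ -\alpha \sum_{i \in N_j} \phi(f_j, f_i) \right]$ → $\Omega(f) = \alpha \sum_j \sum_{i \in N_j} \phi(f_j, f_i)$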
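Sparsity enforcing models
• 3 classes of models: 1- Generalized Gaussian, 2- Mixture models and 3- Heavy tailed (Cauchy and Student-t)
• Student-t model: $St(f|\nu) \propto \exp\left[ -\frac{\nu + 1}{2} \log\left( 1 + f^2/\nu \right) \right]$
• Infinite Gaussian Scale Mixture (IGSM) equivalence: $St(f|\nu) \propto \int_0^\infty N(f|0, 1/z) \, G(z|\alpha, \beta) \, dz$, with $\alpha = \beta = \nu/2$
$p(f|z) = \prod_j p(f_j|z_j) = \prod_j N(f_j|0, 1/z_j) \propto \exp\left[ -\frac{1}{2} \sum_j z_j f_j^2 \right]$
$p(z|\alpha, \beta) = \prod_j G(z_j|\alpha, \beta) \propto \prod_j z_j^{(\alpha - 1)} \exp\left[ -\beta z_j \right] \propto \exp\left[ \sum_j (\alpha - 1) \ln z_j - \beta z_j \right]$
$p(f, z|\alpha, \beta) \propto \exp\left[ -\frac{1}{2} \sum_j z_j f_j^2 + \sum_j \left( (\alpha - 1) \ln z_j - \beta z_j \right) \right]$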
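The IGSM equivalence can be checked numerically. The sketch below integrates the Gaussian with precision $z$ against a Gamma density with $\alpha = \beta = \nu/2$ using a plain trapezoidal rule, and compares the result with the normalized Student-t density. The grid bounds, grid size and function names are assumptions chosen for the illustration.

```python
import math
import numpy as np

def student_t_pdf(f, nu):
    """Normalized Student-t density with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1.0 + f * f / nu) ** (-(nu + 1) / 2)

def igsm_pdf(f, nu, z_max=80.0, n=400_001):
    """Integrate N(f | 0, 1/z) G(z | nu/2, nu/2) over z with the trapezoidal rule."""
    a = b = nu / 2.0
    z = np.linspace(1e-12, z_max, n)
    gauss = np.sqrt(z / (2.0 * np.pi)) * np.exp(-0.5 * z * f * f)
    gamma_pdf = b ** a / math.gamma(a) * z ** (a - 1.0) * np.exp(-b * z)
    y = gauss * gamma_pdf
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(z)))

nu = 3.0
points = [0.0, 0.5, 2.0, 5.0]
exact = [student_t_pdf(f, nu) for f in points]
mixture = [igsm_pdf(f, nu) for f in points]
```

The two lists agree up to the quadrature error. This equivalence is what lets a heavy-tailed prior be handled with Gaussian computations once the hidden precisions $z_j$ are introduced.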
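Non stationary noise and sparsity enforcing model
• Non stationary noise: $g = Hf + \epsilon$, $\epsilon_i \sim N(\epsilon_i|0, v_{\epsilon_i})$ → $\epsilon \sim N(\epsilon|0, V_\epsilon = \mathrm{diag}[v_{\epsilon_1}, \cdots, v_{\epsilon_M}])$
• Student-t prior model and its equivalent IGSM: $f_j|v_{f_j} \sim N(f_j|0, v_{f_j})$ and $v_{f_j} \sim IG(v_{f_j}|\alpha_{f_0}, \beta_{f_0})$ → $f_j \sim St(f_j|\alpha_{f_0}, \beta_{f_0})$
$p(g|f, v_\epsilon) = N(g|Hf, V_\epsilon), \quad V_\epsilon = \mathrm{diag}[v_\epsilon]$
$p(f|v_f) = N(f|0, V_f), \quad V_f = \mathrm{diag}[v_f]$
$p(v_\epsilon) = \prod_i IG(v_{\epsilon_i}|\alpha_{\epsilon_0}, \beta_{\epsilon_0})$
$p(v_f) = \prod_j IG(v_{f_j}|\alpha_{f_0}, \beta_{f_0})$
$p(f, v_\epsilon, v_f|g) \propto p(g|f, v_\epsilon) \, p(f|v_f) \, p(v_\epsilon) \, p(v_f)$
[Graphical model: $(\alpha_{f_0}, \beta_{f_0})$ → $v_f$ → $f$ → $H$ → $g$ ← $v_\epsilon$ ← $(\alpha_{\epsilon_0}, \beta_{\epsilon_0})$]
• JMAP: $(\hat f, \hat v_\epsilon, \hat v_f) = \arg\max_{(f, v_\epsilon, v_f)} \{p(f, v_\epsilon, v_f|g)\}$
• VBA: Approximate $p(f, v_\epsilon, v_f|g)$ by $q_1(f) \, q_2(v_\epsilon) \, q_3(v_f)$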
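Sparse model in a Transform domain 1
$g = Hf + \epsilon$, $f = Dz$, $z$ sparse
$p(g|z, v_\epsilon) = N(g|HDz, v_\epsilon I)$
$p(z|v_z) = N(z|0, V_z), \quad V_z = \mathrm{diag}[v_z]$
$p(v_\epsilon) = IG(v_\epsilon|\alpha_{\epsilon_0}, \beta_{\epsilon_0})$
$p(v_z) = \prod_j IG(v_{z_j}|\alpha_{z_0}, \beta_{z_0})$
$p(z, v_\epsilon, v_z|g) \propto p(g|z, v_\epsilon) \, p(z|v_z) \, p(v_\epsilon) \, p(v_z)$
[Graphical model: $(\alpha_{z_0}, \beta_{z_0})$ → $v_z$ → $z$ → $D$ → $f$ → $H$ → $g$ ← $v_\epsilon$ ← $(\alpha_{\epsilon_0}, \beta_{\epsilon_0})$]
• JMAP: $(\hat z, \hat v_\epsilon, \hat v_z) = \arg\max_{(z, v_\epsilon, v_z)} \{p(z, v_\epsilon, v_z|g)\}$
Alternate optimization:
$\hat z = \arg\min_z \{J(z)\}$ with $J(z) = \frac{1}{2 \hat v_\epsilon} \|g - HDz\|^2 + \|\hat V_z^{-1/2} z\|^2$
$\hat v_{z_j} = \dfrac{\beta_{z_0} + \hat z_j^2}{\alpha_{z_0} + 1/2}$
$\hat v_\epsilon = \dfrac{\beta_{\epsilon_0} + \|g - HD\hat z\|^2}{\alpha_{\epsilon_0} + M/2}$
• VBA: Approximate $p(z, v_\epsilon, v_z|g)$ by $q_1(z) \, q_2(v_\epsilon) \, q_3(v_z)$. Alternate optimization.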
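The JMAP alternate optimization of this slide can be sketched as a short loop. The code below follows the three update formulas as they appear on the slide, with A standing for the product HD. The hyperparameter values, the initialization, the number of iterations and the random test problem are assumptions made only for the illustration.

```python
import numpy as np

def jmap_sparse(A, g, alpha_e=2.0, beta_e=1e-2, alpha_z=2.0, beta_z=1e-3, n_iter=30):
    """Alternate the three JMAP updates for g = A z + eps, with A = H D."""
    M, N = A.shape
    v_eps = 1.0          # assumed initial noise variance
    v_z = np.ones(N)     # assumed initial prior variances
    z = np.zeros(N)
    for _ in range(n_iter):
        # z-step: minimize J(z) = ||g - A z||^2 / (2 v_eps) + ||V_z^{-1/2} z||^2
        lhs = A.T @ A / v_eps + 2.0 * np.diag(1.0 / v_z)
        z = np.linalg.solve(lhs, A.T @ g / v_eps)
        # variance updates, in the form given on the slide
        v_z = (beta_z + z ** 2) / (alpha_z + 0.5)
        v_eps = (beta_e + np.sum((g - A @ z) ** 2)) / (alpha_e + M / 2.0)
    return z, v_z, v_eps

# Hypothetical test problem: 20 measurements, 40 coefficients, 3 of them nonzero.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 40))
z_true = np.zeros(40)
z_true[[3, 17, 29]] = [2.0, -1.5, 1.0]
g = A @ z_true + 0.01 * rng.standard_normal(20)
z_hat, v_z_hat, v_eps_hat = jmap_sparse(A, g)
```

Each pass solves a weighted quadratic problem in z, then re-estimates the variances: coefficients with small estimated variance are penalized more strongly at the next pass, which is how the hierarchical prior promotes sparsity.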
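Sparse model in a Transform domain 2
$g = Hf + \epsilon$, $f = Dz + \xi$, $z$ sparse
$p(g|f, v_\epsilon) = N(g|Hf, v_\epsilon I)$
$p(f|z, v_\xi) = N(f|Dz, v_\xi I)$
$p(z|v_z) = N(z|0, V_z), \quad V_z = \mathrm{diag}[v_z]$
$p(v_\epsilon) = IG(v_\epsilon|\alpha_{\epsilon_0}, \beta_{\epsilon_0})$
$p(v_z) = \prod_j IG(v_{z_j}|\alpha_{z_0}, \beta_{z_0})$
$p(v_\xi) = IG(v_\xi|\alpha_{\xi_0}, \beta_{\xi_0})$
$p(f, z, v_\epsilon, v_z, v_\xi|g) \propto p(g|f, v_\epsilon) \, p(f|z, v_\xi) \, p(z|v_z) \, p(v_\epsilon) \, p(v_z) \, p(v_\xi)$
[Graphical model: $(\alpha_{z_0}, \beta_{z_0})$ → $v_z$ → $z$ → $D$ → $f$ ← $v_\xi$ ← $(\alpha_{\xi_0}, \beta_{\xi_0})$; $f$ → $H$ → $g$ ← $v_\epsilon$ ← $(\alpha_{\epsilon_0}, \beta_{\epsilon_0})$]
• JMAP: $(\hat f, \hat z, \hat v_\epsilon, \hat v_z, \hat v_\xi) = \arg\max_{(f, z, v_\epsilon, v_z, v_\xi)} \{p(f, z, v_\epsilon, v_z, v_\xi|g)\}$. Alternate optimization.
• VBA: Approximate $p(f, z, v_\epsilon, v_z, v_\xi|g)$ by $q_1(f) \, q_2(z) \, q_3(v_\epsilon) \, q_4(v_z) \, q_5(v_\xi)$. Alternate optimization.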
Gauss-Markov-Potts prior models for images
[Figure: image $f(r)$, labels $z(r)$, contours $c(r) = 1 - \delta(z(r) - z(r'))$]
$p(f(r)|z(r) = k, m_k, v_k) = N(m_k, v_k)$
$p(f(r)) = \sum_k P(z(r) = k) \, N(m_k, v_k)$ (Mixture of Gaussians)
• Separable iid hidden variables: $p(z) = \prod_r p(z(r))$
• Markovian hidden variables: $p(z)$ Potts-Markov:
$p(z(r)|z(r'), r' \in V(r)) \propto \exp\left[ \gamma \sum_{r' \in V(r)} \delta(z(r) - z(r')) \right]$
$p(z) \propto \exp\left[ \gamma \sum_{r \in R} \sum_{r' \in V(r)} \delta(z(r) - z(r')) \right]$
Four different cases
Two variables, $f(r)$ and $z(r)$, are associated with each pixel of the image.
• f|z Gaussian iid, z iid: Mixture of Gaussians
• f|z Gauss-Markov, z iid: Mixture of Gauss-Markov
• f|z Gaussian iid, z Potts-Markov: Mixture of Independent Gaussians (MIG with hidden Potts)
• f|z Markov, z Potts-Markov: Mixture of Gauss-Markov (MGM with hidden Potts)
[Figure: image $f(r)$ and labels $z(r)$]
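Gauss-Markov-Potts prior models for images
[Figure: image $f(r)$, labels $z(r)$, contours $c(r) = 1 - \delta(z(r) - z(r'))$]
$g = Hf + \epsilon$
$p(g|f, v_\epsilon) = N(g|Hf, v_\epsilon I)$
$p(v_\epsilon) = IG(v_\epsilon|\alpha_{\epsilon_0}, \beta_{\epsilon_0})$
$p(f(r)|z(r) = k, m_k, v_k) = N(f(r)|m_k, v_k)$
$p(f|z, \theta) = \sum_k \prod_{r \in R_k} a_k N(f(r)|m_k, v_k), \quad \theta = \{(a_k, m_k, v_k), k = 1, \cdots, K\}$
$p(\theta) = D(a|a_0) \, N(m|m_0, v_0) \, IG(v|\alpha_0, \beta_0)$
$p(z|\gamma) \propto \exp\left[ \gamma \sum_r \sum_{r' \in N(r)} \delta(z(r) - z(r')) \right]$ (Potts MRF)
$p(f, z, \theta|g) \propto p(g|f, v_\epsilon) \, p(f|z, \theta) \, p(z|\gamma)$
[Graphical model: $\gamma$ → $z$ → $f$ ← $\theta$ ← $(a_0, m_0, v_0, \alpha_0, \beta_0)$; $f$ → $H$ → $g$ ← $v_\epsilon$ ← $(\alpha_{\epsilon_0}, \beta_{\epsilon_0})$]
MCMC: Gibbs Sampling
VBA: Alternate optimization.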
Bayesian Computation and Algorithms
• Joint posterior probability law of all the unknowns $f, z, \theta$: $p(f, z, \theta|g) \propto p(g|f, \theta_1) \, p(f|z, \theta_2) \, p(z|\theta_3) \, p(\theta)$
• Often, the expression of $p(f, z, \theta|g)$ is complex.
• Its optimization (for Joint MAP) or its marginalization or integration (for Marginal MAP or PM) is not easy.
• Two main techniques:
   • MCMC: Needs the expressions of the conditionals $p(f|z, \theta, g)$, $p(z|f, \theta, g)$, and $p(\theta|f, z, g)$
   • VBA: Approximate $p(f, z, \theta|g)$ by a separable one, $q(f, z, \theta|g) = q_1(f) \, q_2(z) \, q_3(\theta)$, and do all computations with these separable laws.
MCMC based algorithm
$p(f, z, \theta|g) \propto p(g|f, z, \theta_1) \, p(f|z, \theta_2) \, p(z|\theta_3) \, p(\theta)$
General Gibbs sampling scheme:
$\hat f \sim p(f|\hat z, \hat\theta, g) \;\rightarrow\; \hat z \sim p(z|\hat f, \hat\theta, g) \;\rightarrow\; \hat\theta \sim p(\theta|\hat f, \hat z, g)$
• Generate samples $f$ using $p(f|\hat z, \hat\theta, g) \propto p(g|f, \theta) \, p(f|\hat z, \hat\theta)$. When Gaussian, this can be done via optimization of a quadratic criterion.
• Generate samples $z$ using $p(z|\hat f, \hat\theta, g) \propto p(g|\hat f, \hat z, \hat\theta) \, p(z)$. Often needs sampling (hidden discrete variable).
• Generate samples $\theta$ using $p(\theta|\hat f, \hat z, g) \propto p(g|\hat f, \sigma_\epsilon^2 I) \, p(\hat f|\hat z, (m_k, v_k)) \, p(\theta)$. Use of conjugate priors → analytical expressions.
• After convergence, use the samples to compute means and variances.
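The label-sampling step is the one that usually has no shortcut, so a small sketch may be useful. The code below performs one Gibbs sweep over the labels $z(r)$ given an image $f$, combining a Gaussian class likelihood with a 4-neighbour Potts prior. It is a simplified illustration, not the author's sampler: the neighbourhood, the raster scan order, the function name and the noise-free test image are assumptions.

```python
import numpy as np

def gibbs_sweep_labels(z, f, means, variances, gamma, rng):
    """One Gibbs sweep over the labels z(r) given the image f (4-neighbour Potts prior)."""
    K = len(means)
    rows, cols = f.shape
    z = z.copy()
    for i in range(rows):
        for j in range(cols):
            # Gaussian log-likelihood of f(r) under each class k
            log_p = -0.5 * np.log(variances) - 0.5 * (f[i, j] - means) ** 2 / variances
            # Potts term: gamma times the number of neighbours sharing label k
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    log_p[z[ni, nj]] += gamma
            p = np.exp(log_p - log_p.max())
            z[i, j] = rng.choice(K, p=p / p.sum())
    return z

# Hypothetical two-region image, noise-free, with well separated class means.
rng = np.random.default_rng(0)
truth = np.zeros((6, 6), dtype=int)
truth[:, 3:] = 1
means = np.array([0.0, 1.0])
variances = np.array([1e-4, 1e-4])
f = means[truth]
z0 = rng.integers(0, 2, size=truth.shape)
z1 = gibbs_sweep_labels(z0, f, means, variances, gamma=1.0, rng=rng)
```

With such well separated classes, a single sweep recovers the true segmentation from a random start. With noisy or overlapping classes, many sweeps are needed and the Potts parameter γ controls how strongly neighbouring labels are pulled together.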
Application in CT: Reconstruction from 2 projections
• g|f: $g = Hf + \epsilon$, $g|f \sim N(Hf, \sigma_\epsilon^2 I)$, Gaussian
• f|z: iid Gaussian or Gauss-Markov
• z: iid or Potts
• c: $q(r) \in \{0, 1\}$, $1 - \delta(z(r) - z(r'))$, binary
$p(f, z, \theta|g) \propto p(g|f, \theta_1) \, p(f|z, \theta_2) \, p(z|\theta_3) \, p(\theta)$
Proposed algorithms
$p(f, z, \theta|g) \propto p(g|f, \theta_1) \, p(f|z, \theta_2) \, p(z|\theta_3) \, p(\theta)$
• MCMC based general scheme:
$\hat f \sim p(f|\hat z, \hat\theta, g) \;\rightarrow\; \hat z \sim p(z|\hat f, \hat\theta, g) \;\rightarrow\; \hat\theta \sim p(\theta|\hat f, \hat z, g)$
Iterative algorithm:
   • Estimate $f$ using $p(f|\hat z, \hat\theta, g) \propto p(g|f, \theta) \, p(f|\hat z, \hat\theta)$. Needs optimization of a quadratic criterion.
   • Estimate $z$ using $p(z|\hat f, \hat\theta, g) \propto p(g|\hat f, \hat z, \hat\theta) \, p(z)$. Needs sampling of a Potts Markov field.
   • Estimate $\theta$ using $p(\theta|\hat f, \hat z, g) \propto p(g|\hat f, \sigma_\epsilon^2 I) \, p(\hat f|\hat z, (m_k, v_k)) \, p(\theta)$. Conjugate priors → analytical expressions.
• Variational Bayesian Approximation
   • Approximate $p(f, z, \theta|g)$ by $q_1(f) \, q_2(z) \, q_3(\theta)$
Results with two projections
[Figure: panels Original, Backprojection, Filtered BP, LS, Gauss-Markov+pos, GM+Line process, GM+Label process, together with the estimated contours c and labels z]
Implementation issues
• In almost all the algorithms, the step computing $\hat f$ needs an optimization algorithm.
• The criterion to optimize is often of the form $J(f) = \|g - Hf\|^2 + \lambda \|Df\|^2$
• Very often, we use gradient based algorithms, which need to compute $\nabla J(f) = -2H^t (g - Hf) + 2\lambda D^t D f$
• So, for the simplest case, each step is $\hat f^{(k+1)} = \hat f^{(k)} + \alpha^{(k)} \left[ H^t (g - H \hat f^{(k)}) - \lambda D^t D \hat f^{(k)} \right]$, a step along $-\nabla J$ with the factor 2 absorbed in $\alpha^{(k)}$.
Gradient based algorithms
$\hat f^{(k+1)} = \hat f^{(k)} + \alpha \left[ H' (g - H \hat f^{(k)}) - \lambda D'D \hat f^{(k)} \right]$
1. Compute $\hat g = H \hat f$ (Forward projection)
2. Compute $\delta g = g - \hat g$ (Error or residual)
3. Compute $\delta f_1 = H' \delta g$ (Backprojection of error)
4. Compute $\delta f_2 = -\lambda D'D \hat f$ (Correction due to regularization)
5. Update $\hat f^{(k+1)} = \hat f^{(k)} + \alpha \left[ \delta f_1 + \delta f_2 \right]$
[Diagram: initial guess $f^{(0)}$ → estimated image $f^{(k)}$ → forward projection $H$ → projections of the estimated image $\hat g = H f^{(k)}$ → compare with the measured projections $g$ → correction term in projection space $\delta g = g - \hat g$ → backprojection $H'$ → correction term in image space $\delta f = H' \delta g - \lambda D'D f^{(k)}$ → update of $f^{(k)}$]
Steps 1 and 3 carry most of the computational cost and have been implemented on GPU.
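The five steps can be sketched in a few lines of NumPy. This is a CPU toy version under stated assumptions, not the GPU implementation mentioned on the slide: H and D are small dense matrices, the step size is fixed from the largest eigenvalue of $H'H + \lambda D'D$ (affordable only at toy sizes), the regularization correction includes the factor λ, and the function name and test system are illustrative.

```python
import numpy as np

def gradient_reconstruction(H, g, D, lam, n_iter=500):
    """Gradient iterations for J(f) = ||g - H f||^2 + lam ||D f||^2."""
    A = H.T @ H + lam * D.T @ D
    alpha = 1.0 / np.linalg.eigvalsh(A).max()  # fixed step, small enough to converge
    f = np.zeros(H.shape[1])
    for _ in range(n_iter):
        g_hat = H @ f                 # 1. forward projection
        dg = g - g_hat                # 2. error (residual) in projection space
        df1 = H.T @ dg                # 3. backprojection of the error
        df2 = -lam * (D.T @ (D @ f))  # 4. correction due to regularization
        f = f + alpha * (df1 + df2)   # 5. update
    return f

# Hypothetical toy system: column and row sums of a 2 x 2 image, D = identity.
H = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)
g = np.array([3.0, 7.0, 4.0, 6.0])
D = np.eye(4)
lam = 0.1
f_gd = gradient_reconstruction(H, g, D, lam)
f_closed = np.linalg.solve(H.T @ H + lam * D.T @ D, H.T @ g)
```

On this toy problem the iterations reach the closed-form regularized solution. In a real tomography setting H is never stored as a matrix: the products `H @ f` and `H.T @ dg` are replaced by projector and backprojector routines, which is where the computational cost sits.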
Multi-Resolution Implementation
Scale 1 (black): $g^{(1)} = H^{(1)} f^{(1)}$ ($N \times N$)
Scale 2 (green): $g^{(2)} = H^{(2)} f^{(2)}$ ($N/2 \times N/2$)
Scale 3 (red): $g^{(3)} = H^{(3)} f^{(3)}$ ($N/4 \times N/4$)
Results with 4 projections
[Figure: panels Original, Projections, Initialization, Final result]
Conclusions
• Limited angle Computed Tomography is a very ill-posed inverse problem.
• Analytical methods have many limitations.
• Algebraic methods push these limitations further.
• Deterministic regularization methods push the limitations of ill-conditioning still further.
• Probabilistic approaches, and in particular the Bayesian approach, have great potential.
• Hierarchical prior models with hidden variables are very powerful tools for the Bayesian approach to inverse problems.
• Gauss-Markov-Potts models for images incorporate hidden regions and contours.
• Main Bayesian computation tools: JMAP, MCMC and VBA
• Applications in different imaging systems (X ray CT, Microwaves, PET, Ultrasound, Optical Diffusion Tomography (ODT), Acoustic source localization, ...)
• Current projects: efficient implementation in 2D and 3D cases