Inverse problems in Finance and Human sciences

Ali Mohammad-Djafari
Laboratoire des Signaux et Systèmes (L2S), UMR 8506 CNRS - CentraleSupélec - Univ Paris-Sud
Supélec, 91192 Gif-sur-Yvette, France
http://lss.centralesupelec.fr
Email: [email protected]
http://djafari.free.fr
http://publicationslist.org/djafari

Workshop "Inverse problems in Finance and Human sciences", ATU, Tehran, Iran, September 24-25, 2016

A. Mohammad-Djafari, Inverse problems in financial and human sciences, Workshop at AUT, Tehran, Iran, September 23-24, 2016.
Contents

1. Examples of inverse problems
   - Low dimensional case
   - High dimensional case
2. Basics of Bayesian inference
3. Bayes for Inverse Problems and Machine Learning (Estimation, Prediction, Model Evaluation and Selection)
4. Approximate Bayesian Computation (ABC)
   - Laplace approximation
   - Bayesian Information Criterion (BIC)
   - Variational Bayesian Approximation
   - Expectation Propagation (EP), Message Passing, MCMC, Exact Sampling, ...
5. Bayes for inverse problems
   - Traffic Management and Computed Tomography: a linear problem
   - Differential Equations in Finance and Microwave imaging: a bilinear or nonlinear problem
6. Some canonical problems in Machine Learning
   - Regression, Classification and Model selection for classical and Big Data cases
Examples of inverse problems

1. Discrete case examples
2. Continuous case examples
Traffic management

- I residential places, each containing r_i cars, i = 1, ..., I
- J working places, each containing c_j parking lots, j = 1, ..., J
- We want to estimate the numbers f_{i,j} of cars going from residential place i to working place j
- We know the marginal sums:

  ∑_{j=1}^{J} f_{i,j} = r_i,  i = 1, ..., I
  ∑_{i=1}^{I} f_{i,j} = c_j,  j = 1, ..., J

- Find f_{i,j}.
Traffic management: a very low dimensional and simple example

- I = 2, J = 2:

  ∑_{j=1}^{2} f_{i,j} = r_i,  i = 1, 2
  ∑_{i=1}^{2} f_{i,j} = c_j,  j = 1, 2

- r_1 = 4, r_2 = 6, c_1 = 3, c_2 = 7. Writing it differently: find the f_{i,j} in the table

  f_{1,1}  f_{1,2} | 4
  f_{2,1}  f_{2,2} | 6
  -----------------
     3        7

- A second example:

  f_{1,1}  f_{1,2}  f_{1,3} |  9
  f_{2,1}  f_{2,2}  f_{2,3} | 10
  --------------------------
     3        7       11

- Then we extend this to higher dimensions.
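The 2×2 example can be checked numerically. A minimal sketch (not from the slides): the constraint system is rank deficient, so there are infinitely many solutions, and the Moore-Penrose pseudo-inverse picks out the minimum-norm one.

```python
import numpy as np

# The I = J = 2 example with r = (4, 6), c = (3, 7);
# unknowns ordered as f = [f11, f12, f21, f22].
A = np.array([[1., 1, 0, 0],   # f11 + f12 = r1 = 4
              [0., 0, 1, 1],   # f21 + f22 = r2 = 6
              [1., 0, 1, 0],   # f11 + f21 = c1 = 3
              [0., 1, 0, 1]])  # f12 + f22 = c2 = 7
b = np.array([4., 6, 3, 7])

print(np.linalg.matrix_rank(A))   # 3: one equation is redundant
f = np.linalg.pinv(A) @ b         # minimum-norm solution
print(f.reshape(2, 2))            # [[1. 3.], [2. 4.]]
```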
Traffic management

Think about:
- Does this problem have a solution?
- Is the solution unique?
- How can we find all the possible solutions?
- How can we select one of these solutions? Minimum norm solution: min ∑_{i,j} f_{i,j}^2 subject to the data constraints. What about minimizing:
  - the l1 norm: ∑_{i,j} |f_{i,j}|
  - the lα norm: ∑_{i,j} |f_{i,j}|^α
  - the entropy: −∑_{i,j} f_{i,j} log f_{i,j}
- What if there are uncertainties in the data?
- If you had to choose where to construct a fast road, which route would you propose?
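Among these criteria, the entropy has a classical closed-form answer: with only the marginal constraints, the maximum-entropy table is the "independence" solution f_{i,j} = r_i c_j / N. A small check on the 2×2 example (a sketch, not from the slides):

```python
import numpy as np

# Maximum-entropy table under marginal constraints: the classical
# closed-form solution is the independence table f_ij = r_i * c_j / N.
r = np.array([4., 6.])
c = np.array([3., 7.])
N = r.sum()                  # total number of cars (= c.sum())
f_ent = np.outer(r, c) / N
print(f_ent)                 # [[1.2 2.8], [1.8 4.2]]
print(f_ent.sum(axis=1))     # row sums recover r
print(f_ent.sum(axis=0))     # column sums recover c
```

Note that this differs from the minimum-norm solution: different criteria select different tables from the same feasible set.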
Continuous equivalent problem

- Given two functions r(y) and c(x), find a function f(x, y) of the two variables x and y such that:

  ∫ f(x, y) dx = r(y)
  ∫ f(x, y) dy = c(x)

- When r(y) and c(x) are the marginal probability distributions of a joint probability distribution f(x, y), this leads to Copula theory.
Prediction of gain

- Three days ago you gained 200 Euros, two days ago 100 Euros, yesterday 160 Euros, today 200 Euros. How much do you expect to gain tomorrow? And the day after tomorrow?

  t_i | −3   −2   −1    0    1    2
  x_i | 200  100  160  200   ?    ?

- Think about the following models:
  - x(t_i) = θ_0 + θ_1 t_i + ε_i
  - x(t_i) = θ_0 + θ_1 t_i + θ_2 t_i^2 + ε_i
  - x(t_i) = θ_0 + θ_1 t_i + θ_2 t_i^2 + θ_3 t_i^3 + ε_i
  - ...
  - x_i = θ_1 x_{i−1} + ε_i
  - x_i = θ_1 x_{i−1} + θ_2 x_{i−2} + ε_i
  - x_i = ∑_{k=1}^{K} θ_k x_{i−k} + ε_i
  - ...
  - x_i = θ_0 + θ_1 sin(π t_i) + ε_i
Prediction

- Write the equation x(t_i) = θ_0 + θ_1 t_i + θ_2 t_i^2 + θ_3 t_i^3 + ε_i in matrix form:

  [x_1]   [1  t_1  t_1^2  t_1^3] [θ_0]   [ε_1]
  [x_2] = [1  t_2  t_2^2  t_2^3] [θ_1] + [ε_2]
  [ ⋮ ]   [         ⋮          ] [θ_2]   [ ⋮ ]
  [x_N]   [1  t_N  t_N^2  t_N^3] [θ_3]   [ε_N]

- Use x = Hθ + ε and solve for θ:  θ̂ = (H'H)^{−1} H'x
- Do the prediction for any t_i:  x̂(t_i) = θ̂_0 + θ̂_1 t_i + θ̂_2 t_i^2 + θ̂_3 t_i^3
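A quick numerical run of this recipe on the four observed gains (a sketch): with four data points the cubic model fits the data exactly but extrapolates wildly, which already motivates model comparison.

```python
import numpy as np

# Fit x(t) = θ0 + θ1 t + θ2 t² + θ3 t³ to the four observed gains,
# then predict t = 1 (tomorrow) and t = 2 (the day after).
t = np.array([-3., -2., -1., 0.])
x = np.array([200., 100., 160., 200.])

H = np.vander(t, 4, increasing=True)       # columns 1, t, t², t³
theta = np.linalg.solve(H.T @ H, H.T @ x)  # θ̂ = (H'H)^{-1} H'x
pred = np.vander(np.array([1., 2.]), 4, increasing=True) @ theta
print(pred)   # [40., -500.]: exact fit of 4 points, wild extrapolation
```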
Prediction

- Use any dictionary {h_k(t), k = 1, ..., K}
- Write the equation x(t_i) = ∑_{k=1}^{K} θ_k h_k(t_i) + ε_i in matrix form:

  [x_1]   [h_1(t_1)  h_2(t_1)  ...  h_K(t_1)] [θ_1]   [ε_1]
  [x_2]   [h_1(t_2)  h_2(t_2)  ...  h_K(t_2)] [θ_2]   [ε_2]
  [ ⋮ ] = [              ⋮                  ] [ ⋮ ] + [ ⋮ ]
  [x_N]   [h_1(t_N)  h_2(t_N)  ...  h_K(t_N)] [θ_K]   [ε_N]

- Use x = Hθ + ε and solve for θ:  θ̂ = (H'H)^{−1} H'x
- Do the prediction for any t_i:  x̂(t_i) = ∑_{k=1}^{K} θ̂_k h_k(t_i)
Prediction examples (figures)
Prediction of gain

The same problem, but this time you have much more data:
- regular daily data over the last 2 years
- regular daily data over the last 2 years, but with some missing values
- regular daily data over the last 2 years, but with some outliers
- regular daily data of yourself and of your colleagues with the same rank and position
- regular daily data of yourself and of your colleagues with the same rank and position, but also of other colleagues in your company
- regular daily data of yourself and of your colleagues with the same rank and position, but also of other colleagues in your company and in many other companies
Prediction problems examples (figures)

Prediction examples (figures)
Prediction with direct or indirect observation

The same problem, but this time you want to make daily predictions while you only have the data weekly or monthly.
- Each week, you have the mean value of the week
- Each month, you have the mean value of the month

If we denote by f(n) the daily data and by g(m) the observed data, then we have:
- last day of the week:  g(m) = f(n = 7m)
- last day of the month:  g(m) = f(n = 30m)
- mean value of the week:  g(m) = ∑_{k=1}^{7} f(7m − k + 1)
- mean value of the month:  g(m) = ∑_{k=1}^{30} f(30m − k + 1)
- uncertain data:

  g(m) = ∑_{k=1}^{K} f(Km − k + 1) + ε(m)
Prediction problems examples (figures)

Prediction examples (figures)
Population observation, modelling and evolution

We know the approximate population of some of the cities, g(x_i, y_i), i = 1, ..., M (probably every 4 years over the last 40 years: g_i(t_n), n = −40:4:0), and we want to know:
- The distribution of the population over the whole country, {f_{i,j}, i = 1, ..., I, j = 1, ..., J}
- The evolution of this distribution year by year
- The prediction of this distribution in future years

But also:
- To model the evolution of this distribution and its correlation with some other external events
Population observation, modelling and evolution

Think about modelling:
- Discrete writing:

  g(x_i, y_i) = ∑_{(k,l) : (x_k, y_l) ∈ R_r} f(x_i − x_k, y_i − y_l) + ε(x_i, y_i)

- Continuous writing:

  g(x_i, y_i) = ∫∫_{R_r} f(x_i − x, y_i − y) dx dy + ε(x_i, y_i)

- Differential forms:

  ∂f(x, y)/∂x + f(x, y) = 0,  with initial condition f(x, y) = g(x, y)
Inverse problems scientific communities

Two communities work on inverse problems:
- Mathematics departments: analytical methods: existence and uniqueness; differential equations, PDEs
- Engineering and computer sciences: algebraic methods: discretization, uniqueness and stability; integral equations, discretization using moment methods, Galerkin methods, ...

Two examples:
- Deconvolution: inverse filtering and Wiener filtering
- X-ray Computed Tomography: Radon transform: direct inversion or Filtered Backprojection methods
Differential Equation, State Space and Input-Output

A simple electric system: an RC circuit with input voltage f(t), capacitor voltage x(t) and output g(t) (circuit diagram):

  f(t) = R i(t) + v_c(t) = RC ∂x(t)/∂t + x(t),  with RC = 1

- Differential Equation Modelling:  ∂x(t)/∂t + x(t) = f(t),  g(t) = x(t)
- State Space Modelling:  ∂x(t)/∂t = −x(t) + f(t),  g(t) = x(t)
- Input-Output Modelling:  pX(p) = −X(p) + F(p)  →  X(p) = F(p)/(p + 1),
  so  g(t) = x(t) = h(t) ∗ f(t)  with  h(t) = exp[−t]
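The equivalence of these three descriptions can be checked numerically. A minimal sketch (assuming a unit step input and x(0) = 0, values not from the slides): Euler integration of the state equation reproduces the convolution output g(t) = (h ∗ f)(t) = 1 − exp(−t).

```python
import numpy as np

# dx/dt = -x + f with RC = 1, unit step input f(t) = 1, x(0) = 0.
# Impulse response h(t) = exp(-t), so the exact output is 1 - exp(-t).
dt, T = 1e-3, 5.0
t = np.arange(0.0, T, dt)
x = np.zeros_like(t)
for k in range(len(t) - 1):                  # explicit Euler step
    x[k + 1] = x[k] + dt * (-x[k] + 1.0)
err = np.max(np.abs(x - (1.0 - np.exp(-t))))
print(err)                                   # O(dt) discretization error
```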
A more complex electric system example

Two RC stages in cascade (with R_1 C_1 = R_2 C_2 = 1): input f(t), intermediate voltage x_2(t), output g(t) = x_1(t) (circuit diagram):

  f(t) = ∂x_2(t)/∂t + x_2(t),  x_2(t) = ∂x_1(t)/∂t + x_1(t)

- Differential equation model:  ∂²x_1(t)/∂t² + 2 ∂x_1(t)/∂t + x_1(t) = f(t)
- State space model:

  [∂x_1/∂t]   [−1   1] [x_1(t)]   [0]
  [∂x_2/∂t] = [ 0  −1] [x_2(t)] + [1] f(t)

  g(t) = [1  0] [x_1(t); x_2(t)] = x_1(t)

- Input-Output model:  g(t) = h(t) ∗ f(t)
Design/Control Inverse problems examples

Simple electrical system:

  a ∂x(t)/∂t + x(t) = f(t),  x(0) = x_0,  g(t) = x(t)

- Design: θ = a = RC
  - Forward: given θ = a and f(t), t > 0, find x(t), t > 0
  - Inverse: given x(t) and f(t), find θ = a
- Control: f(t)
  - Forward: given θ = a and f(t), t > 0, find x(t), t > 0
  - Inverse: given θ = a and x(t), t > 0, find f(t)

More complex electrical system:

  f(t) = b ∂x_2(t)/∂t + x_2(t),  x_2(t) = a ∂x_1(t)/∂t + x_1(t),  g(t) = x_1(t),
  θ = (a = R_1 C_1, b = R_2 C_2)
Design/Control Inverse problems examples

Mass-spring-dashpot system:

  m ∂²x(t)/∂t² + c ∂x(t)/∂t + k x(t) = F(t),  x(0) = x_0,  ∂x/∂t(0) = v_0

- Design: θ = (m, c, k)
  - Forward: given θ = (m, c, k), x_0, v_0 and F(t), t > 0, find x(t), t > 0
  - Inverse: given x(t) for t > 0, x_0, v_0 and F(t), find θ = (m, c, k)
- Control: F(t)
  - Forward: given θ = (m, c, k), x_0, v_0 and F(t), t > 0, find x(t), t > 0
  - Inverse: given θ = (m, c, k), x_0, v_0 and x(t), t > 0, find F(t)
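A tiny numerical illustration of a "Design inverse" problem (a sketch with synthetic values, not from the slides): sample the step response of a ∂x/∂t + x = f, discretize the derivative, and estimate θ = a by least squares.

```python
import numpy as np

# Simulate the step response x(t) = 1 - exp(-t/a) with true a = 2,
# then recover a from the samples: a·(dx/dt) ≈ f - x is linear in a.
dt = 1e-2
t = np.arange(0.0, 10.0, dt)
a_true = 2.0
x = 1.0 - np.exp(-t / a_true)

dx = np.diff(x) / dt              # finite-difference derivative
rhs = 1.0 - x[:-1]                # f - x, with f = 1
a_hat = (dx @ rhs) / (dx @ dx)    # least-squares fit of rhs ≈ a·dx
print(a_hat)                      # close to the true value 2
```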
Input-Output model

- Linear systems:
  - Single Input Single Output (SISO) systems:  y(t) = ∫ h(t, τ) u(τ) dτ
  - Multi Input Multi Output (MIMO) systems:  y(t) = ∫ H(t, τ) u(τ) dτ
- Linear Time Invariant systems:
  - SISO convolution:  y(t) = h(t) ∗ u(t) = ∫ h(t − τ) u(τ) dτ
  - MIMO convolution:  y(t) = ∫ H(t − τ) u(τ) dτ
- Impulse response h(t), or impulse response matrix H(t) = [h_ij(t)]
State space model: Continuous case

Dynamic systems:
- Single Input Single Output (SISO) system:

  ẋ(t) = A x(t) + B u(t)   (state equation)
  y(t) = C x(t) + D v(t)   (observation equation)

- Multiple Input Multiple Output (MIMO) system:

  ẋ(t) = H x(t) + B u(t)   (state equation)
  y(t) = C x(t) + D v(t)   (observation equation)

  H, B, C and D are the matrices of the system.
Modelling with Partial Differential Equations

- Different PDEs:

  ∂²f(x, y)/∂x² + ∂²f(x, y)/∂y² + f(x, y) = 0

  ∂²f(x, y)/∂x² + ∂²f(x, y)/∂y² + ∂f(x, y)/∂x + ∂f(x, y)/∂y + f(x, y) = 0

  with initial condition f(x, y) = g(x, y)
Prediction with indirect observation

- Data available every K days, with uncertainty:

  g(m) = ∑_{k=1}^{K} f(Km − k + 1) + ε(m)

- More generally, a convolution:

  g(n) = ∑_{k=1}^{K} h(k) f(n − k + 1) + ε(n)

- One can easily show that both can be written as g = Hf + ε
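A minimal sketch (synthetic sizes, not from the slides) of how the "sum over each block of K days" observation becomes a matrix H in g = Hf + ε:

```python
import numpy as np

# g(m) = sum_{k=1}^{K} f(K·m - k + 1): each row of H sums one block of K days.
K, M = 7, 4                   # 4 weekly observations of N = 28 daily values
N = K * M
H = np.zeros((M, N))
for m in range(M):
    H[m, m * K:(m + 1) * K] = 1.0

f = np.arange(1.0, N + 1.0)   # synthetic daily series 1, 2, ..., 28
g = H @ f
print(g)                      # [ 28.  77. 126. 175.]
```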
Prediction examples (figures)
Simple examples

A 4×4 image f = {f_1, ..., f_16} (also indexed as f_{11}, ..., f_{44}) is observed through its column sums g_1, ..., g_4 (also noted g_{11}, ..., g_{14}) and its row sums g_5, ..., g_8 (also noted g_{21}, ..., g_{24}).

Noting g_1 = [g_1, ..., g_4]^t = [g_{11}, ..., g_{14}]^t, g_2 = [g_5, ..., g_8]^t = [g_{21}, ..., g_{24}]^t, and the matrices H_1, H_2 and H such that:

  g_1 = H_1 f,  g_2 = H_2 f,  g = Hf = [H_1; H_2] f
Inversion, Generalized inversion

- Forward problem: given f, compute g:

  f = [0 0 0 0; 0 1 1 0; 0 1 1 0; 0 0 0 0]  →  g_1 = H_1 f = [0 2 2 0]^t,  g_2 = H_2 f = [0 2 2 0]^t,
  g = Hf = [0 2 2 0 0 2 2 0]^t

- Inverse problem: given g, find f. There are many possible solutions with exactly the same row and column sums, for example:

  [0 0 0 0; 0 2 0 0; 0 0 2 0; 0 0 0 0],  [0 0 0 0; 0 0 2 0; 0 2 0 0; 0 0 0 0],

  and, more generally, f plus any pattern whose row and column sums all vanish, including solutions with negative or fractional entries (e.g. perturbations by ±0.5).
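The forward computation and the non-uniqueness can be checked directly. A sketch (f is flattened row by row; the operators below are one way to build H_1 and H_2, not taken verbatim from the slides):

```python
import numpy as np

H1 = np.tile(np.eye(4), (1, 4))             # column sums: g1 = H1 f
H2 = np.kron(np.eye(4), np.ones((1, 4)))    # row sums:    g2 = H2 f
H = np.vstack([H1, H2])

f_true = np.zeros((4, 4))
f_true[1:3, 1:3] = 1.0                      # the image used above
g = H @ f_true.ravel()
print(g)                                    # [0. 2. 2. 0. 0. 2. 2. 0.]

# A zero-marginal perturbation gives a different image with the same data
p = np.zeros((4, 4))
p[1, 1] = p[2, 2] = 1.0
p[1, 2] = p[2, 1] = -1.0
print(np.allclose(H @ (f_true + p).ravel(), g))  # True: indistinguishable
print(np.linalg.matrix_rank(H))                  # 7 < 8: H has a null space
```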
MN, LS and MNLS solutions

- Minimum Norm (MN) solution:  f̂ = arg min_{Hf = g} ‖f‖². If HH^t were invertible, we would have f̂ = H^t (HH^t)^{−1} g. But svd(HH^t) = [8 4 4 4 4 4 4 0].
- Least Squares (LS) solution:  f̂ = arg min_f ‖g − Hf‖². If H^t H were invertible, we would have f̂ = (H^t H)^{−1} H^t g. But svd(H^t H) = [8 4 4 4 4 4 4 0 0 0 0 0 0 0 0 0].
SVD and MNLS solutions

- Truncation of the singular values defines a unique generalized inverse solution:

  f̂ = ∑_{k=1}^{K} (⟨g, u_k⟩ / λ_k) v_k

  where u_k and v_k are, respectively, the eigenvectors of HH^t and H^t H, and λ_k the corresponding singular values. Here:

  f̂ = [−0.25 0.25 0.25 −0.25; 0.25 0.75 0.75 0.25; 0.25 0.75 0.75 0.25; −0.25 0.25 0.25 −0.25]

- MNLS:  f̂ = arg min_f ‖g − Hf‖² + λ‖f‖²
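The truncated-SVD (generalized inverse) solution can be reproduced numerically. A sketch (H_1 sums the columns and H_2 the rows of the row-flattened 4×4 image):

```python
import numpy as np

H1 = np.tile(np.eye(4), (1, 4))             # column sums
H2 = np.kron(np.eye(4), np.ones((1, 4)))    # row sums
H = np.vstack([H1, H2])
g = np.array([0., 2, 2, 0, 0, 2, 2, 0])

# pinv keeps only the nonzero singular values: the generalized inverse
f_hat = (np.linalg.pinv(H) @ g).reshape(4, 4)
print(f_hat)
# [[-0.25  0.25  0.25 -0.25]
#  [ 0.25  0.75  0.75  0.25]
#  [ 0.25  0.75  0.75  0.25]
#  [-0.25  0.25  0.25 -0.25]]
```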
Regularization theory

Inverse problems = ill-posed problems → need for prior information.
- Functional space (Tikhonov):  g = H(f) + ε  →  J(f) = ‖g − H(f)‖²₂ + λ‖Df‖²₂
- Finite dimensional space (Phillips & Twomey):  g = H(f) + ε
  - Minimum norm LS (MNLS):  J(f) = ‖g − H(f)‖² + λ‖f‖²
  - Classical regularization:  J(f) = ‖g − H(f)‖² + λ‖Df‖²
  - More general regularization:

    J(f) = Q(g − H(f)) + λΩ(Df)   or   J(f) = Δ₁(g, H(f)) + λΔ₂(f, f_∞)

Limitations:
- Errors are implicitly assumed to be white and Gaussian
- Limited prior information on the solution
- Lack of tools for determining the hyperparameters
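A regularized version of the same 4×4 toy problem (a sketch with a quadratic penalty and a small λ): the normal matrix H'H + λI is now invertible, and as λ → 0 the solution tends to the generalized-inverse solution.

```python
import numpy as np

H1 = np.tile(np.eye(4), (1, 4))             # column sums of a 4x4 image
H2 = np.kron(np.eye(4), np.ones((1, 4)))    # row sums
H = np.vstack([H1, H2])
g = np.array([0., 2, 2, 0, 0, 2, 2, 0])

lam = 1e-6
# Minimizer of J(f) = ||g - Hf||² + λ||f||²
f_reg = np.linalg.solve(H.T @ H + lam * np.eye(16), H.T @ g)
print(np.round(f_reg.reshape(4, 4), 3))     # ≈ the pseudo-inverse solution
```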
Basic Bayes

- Product rule:

  P(A, B) = P(A|B) P(B) = P(B|A) P(A)  →  P(A|B) = P(B|A) P(A) / P(B)

- Sum rule:

  P(B) = ∑_A P(B|A) P(A)

- Bayes rule (discrete events):

  P(A|B) = P(B|A) P(A) / ∑_A P(B|A) P(A)

- P(hypothesis|data) = P(data|hypothesis) P(hypothesis) / P(data)
- Bayes rule (continuous variables with finite parametric models):

  p(θ|d) = p(d|θ) p(θ) / p(d) = p(d|θ) p(θ) / ∫ p(d|θ) p(θ) dθ ∝ p(d|θ) p(θ)
Basic Bayes

- P(hypothesis|data) = P(data|hypothesis) P(hypothesis) / P(data)
- Bayes rule tells us how to do inference about hypotheses from data.
- Finite parametric models:  p(θ|d) = p(d|θ) p(θ) / p(d)
- Forward model: p(d|θ), also called the likelihood of the parameters given the data:  L(θ) = p(d|θ)
- Prior knowledge: p(θ)
- Posterior knowledge: p(θ|d)
- Evidence:  p(d) = ∫ p(d|θ) p(θ) dθ
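These quantities have closed forms in the simplest conjugate setting. A sketch (hypothetical numbers, not from the slides): data d_i ~ N(θ, v) with prior θ ~ N(0, v_0) gives a Gaussian posterior whose mean shrinks the sample mean slightly toward 0.

```python
import numpy as np

# Conjugate one-parameter example: prior N(0, v0), likelihood N(θ, v).
# Posterior: variance 1/(n/v + 1/v0), mean = post_var * sum(d)/v.
rng = np.random.default_rng(0)
theta_true, v, v0 = 2.0, 1.0, 10.0
d = rng.normal(theta_true, np.sqrt(v), size=100)

post_var = 1.0 / (len(d) / v + 1.0 / v0)
post_mean = post_var * d.sum() / v
print(post_mean, post_var)     # posterior concentrates near θ = 2
```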
Bayesian inference: simple one parameter case

  p(θ),  L(θ) = p(d|θ)  →  p(θ|d) ∝ L(θ) p(θ)

(figures: the prior p(θ); the likelihood L(θ) = p(d|θ); the posterior p(θ|d) ∝ p(d|θ) p(θ); and all three together)
Bayesian inference: simple two parameter case

  p(θ_1, θ_2),  L(θ_1, θ_2) = p(d|θ_1, θ_2)  →  p(θ_1, θ_2|d) ∝ L(θ_1, θ_2) p(θ_1, θ_2)

(figures: the prior p(θ_1, θ_2); the likelihood L(θ_1, θ_2); the posterior p(θ_1, θ_2|d) ∝ p(d|θ_1, θ_2) p(θ_1, θ_2); and all three together)
Bayes: 1D case

  p(θ|d) = p(d|θ) p(θ) / p(d) ∝ p(d|θ) p(θ)

- Maximum A Posteriori (MAP):  θ̂ = arg max_θ {p(θ|d)} = arg max_θ {p(d|θ) p(θ)}
- Posterior Mean:  θ̂ = E_{p(θ|d)}{θ} = ∫ θ p(θ|d) dθ
- Region of high probability:  [θ̂_1, θ̂_2] such that ∫_{θ̂_1}^{θ̂_2} p(θ|d) dθ = 1 − α
- Sampling and exploring:  θ ∼ p(θ|d)
Bayesian inference: high dimensional case

- Simple linear case:  d = Hθ + ε
- Gaussian priors:  p(d|θ) = N(d|Hθ, v_ε I),  p(θ) = N(θ|0, v_θ I)
- Gaussian posterior:  p(θ|d) = N(θ|θ̂, V̂) with

  θ̂ = [H'H + λI]^{−1} H'd,  V̂ = v_ε [H'H + λI]^{−1},  λ = v_ε / v_θ

- Computation of θ̂ can be done via optimization of:

  J(θ) = −ln p(θ|d) = (1/2v_ε)‖d − Hθ‖² + (1/2v_θ)‖θ‖² + c

- Computation of V̂ = v_ε [H'H + λI]^{−1} needs a high dimensional matrix inversion.
Bayesian inference: high dimensional case

- Gaussian posterior:  p(θ|d) = N(θ|θ̂, V̂),  θ̂ = [H'H + λI]^{−1} H'd,  V̂ = v_ε [H'H + λI]^{−1},  λ = v_ε / v_θ
- Computation of θ̂ can be done via optimization of:  J(θ) = −ln p(θ|d) = c + ‖d − Hθ‖² + λ‖θ‖²
- Gradient based methods:  ∇J(θ) = −2H'(d − Hθ) + 2λθ
  - Constant step, steepest descent, ...:

    θ^{(k+1)} = θ^{(k)} − α^{(k)} ∇J(θ^{(k)}) = θ^{(k)} + 2α^{(k)} [H'(d − Hθ^{(k)}) − λθ^{(k)}]

  - Conjugate Gradient, ...
- At each iteration, we need to be able to compute:
  - Forward operation:  d̂ = Hθ^{(k)}
  - Backward (adjoint) operation:  H'(d − d̂)
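A gradient-descent sketch for this quadratic criterion (synthetic H and data, not from the slides), using only the forward and adjoint operations at each iteration:

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.normal(size=(50, 20))
d = H @ rng.normal(size=20) + 0.01 * rng.normal(size=50)

lam, alpha = 0.1, 1e-3       # fixed step; needs alpha < 1/(σ_max(H)² + λ)
theta = np.zeros(20)
for _ in range(5000):
    grad = -2.0 * H.T @ (d - H @ theta) + 2.0 * lam * theta  # ∇J(θ)
    theta -= alpha * grad                                    # θ ← θ - α∇J(θ)

theta_exact = np.linalg.solve(H.T @ H + lam * np.eye(20), H.T @ d)
print(np.max(np.abs(theta - theta_exact)))   # converged to the MAP estimate
```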
Bayesian inference: high dimensional case

- Computation of V̂ = v_ε [H'H + λI]^{−1} needs a high dimensional matrix inversion.
- This is almost impossible, except in particular cases (Toeplitz, circulant, TBT, CBC, ...) where the matrix can be diagonalized via the Fast Fourier Transform (FFT).
- Recursive use of the data, with recursive updates of θ̂ and V̂, leads to Kalman filtering, which is still computationally demanding for high dimensional data.
- We also need to generate samples from this posterior: there are many special sampling tools.
- They fall mainly into two categories: those using the covariance matrix V and those using its inverse (the precision matrix) Λ = V^{−1}.
Bayesian inference: non-Gaussian priors case

- Linear forward model:  d = Hθ + ε
- Gaussian noise model:  p(d|θ) = N(d|Hθ, v_ε I) ∝ exp[−(1/2v_ε)‖d − Hθ‖²₂]
- Sparsity enforcing prior:  p(θ) ∝ exp[−α‖θ‖₁]
- Posterior:

  p(θ|d) ∝ exp[−(1/2v_ε) J(θ)]  with  J(θ) = ‖d − Hθ‖²₂ + λ‖θ‖₁,  λ = 2 v_ε α

- Computation of θ̂ can be done via optimization of J(θ)
- Other computations are much more difficult.
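One standard optimizer for this l1 criterion (a sketch of iterative soft-thresholding, ISTA; one common choice, not necessarily the one used in the talk): a gradient step on the quadratic term followed by a soft threshold recovers a sparse θ from underdetermined data.

```python
import numpy as np

rng = np.random.default_rng(2)
H = rng.normal(size=(30, 60))               # underdetermined: 30 data, 60 unknowns
theta_true = np.zeros(60)
theta_true[[3, 17, 42]] = [2.0, -1.5, 1.0]  # sparse ground truth
d = H @ theta_true

lam = 0.5
step = 0.5 / np.linalg.norm(H, 2) ** 2      # step = 1/L, with L = 2 σ_max(H)²
theta = np.zeros(60)
for _ in range(5000):                       # ISTA iterations
    z = theta + 2.0 * step * H.T @ (d - H @ theta)   # gradient step on ||d-Hθ||²
    theta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold

support = np.argsort(-np.abs(theta))[:3]    # indices of the 3 largest |θ_k|
print(np.sort(support))
```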
Bayes Rule for Machine Learning (Simple case)

- Inference on the parameters: learning from the data d:

  p(θ|d, M) = p(d|θ, M) p(θ|M) / p(d|M)

- Model comparison:

  p(M_k|d) = p(d|M_k) p(M_k) / p(d),  with  p(d|M_k) = ∫ p(d|θ, M_k) p(θ|M_k) dθ

- Prediction with the selected model:

  p(z|d, M_k) = ∫ p(z|θ, M_k) p(θ|d, M_k) dθ
Approximation methods

- Laplace approximation
- Bayesian Information Criterion (BIC)
- Variational Bayesian Approximation (VBA)
- Expectation Propagation (EP)
- Markov chain Monte Carlo methods (MCMC)
- Exact Sampling
Laplace Approximation

- Data set d, models M_1, ..., M_K, parameters θ_1, ..., θ_K
- Model comparison:

  p(θ, d|M) = p(d|θ, M) p(θ|M),  p(θ|d, M) = p(θ, d|M) / p(d|M),
  p(d|M) = ∫ p(d|θ, M) p(θ|M) dθ

- For large amounts of data (relative to the number of parameters, m), p(θ|d, M) is approximated by a Gaussian around its maximum (the MAP estimate θ̂):

  p(θ|d, M) ≈ (2π)^{−m/2} |A|^{1/2} exp[−(1/2)(θ − θ̂)' A (θ − θ̂)]

  where A_{ij} = −∂²/∂θ_i ∂θ_j ln p(θ|d, M) is the m × m (negative) Hessian matrix.
- Using p(d|M) = p(θ, d|M) / p(θ|d, M) and evaluating it at θ̂:

  ln p(d|M_k) ≈ ln p(d|θ̂, M_k) + ln p(θ̂|M_k) + (m/2) ln(2π) − (1/2) ln|A|

- Needs the computation of θ̂ and A.
Bayesian Information Criterion (BIC)

- BIC is obtained from the Laplace approximation

  ln p(d|M_k) ≈ ln p(d|θ̂, M_k) + ln p(θ̂|M_k) + (m/2) ln(2π) − (1/2) ln|A|

  by taking the large sample limit (n → ∞), where n is the number of data points:

  ln p(d|M_k) ≈ ln p(d|θ̂, M_k) − (m/2) ln(n)

- Easy to compute
- It does not depend on the prior
- It is equivalent to the MDL criterion
- It assumes that, as n → ∞, all the parameters are identifiable
- Danger: counting parameters can be deceiving (sinusoids, infinite dimensional models)
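A BIC sketch on the polynomial prediction models from earlier slides (synthetic data; the Gaussian-likelihood form BIC = n ln(RSS/n) + m ln n, up to constants, is one common way to write it):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
t = np.linspace(-3.0, 3.0, n)
x = 1.0 + 2.0 * t + 0.3 * rng.normal(size=n)    # true model: degree 1

bic = []
for deg in range(4):                             # models of degree 0..3
    Hd = np.vander(t, deg + 1, increasing=True)
    theta = np.linalg.lstsq(Hd, x, rcond=None)[0]
    rss = np.sum((x - Hd @ theta) ** 2)
    bic.append(n * np.log(rss / n) + (deg + 1) * np.log(n))

print(np.argmin(bic))    # index of the model preferred by BIC
```

The line model beats the constant by a wide margin; higher degrees barely reduce the residual and pay the (m/2) ln n penalty.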
Bayes Rule for Machine Learning with hidden variables

- Data: d, hidden variables: x, parameters: θ, model: M
- Bayes rule:

  p(x, θ|d, M) = p(d|x, θ, M) p(x|θ, M) p(θ|M) / p(d|M)

- Model comparison:

  p(M_k|d) = p(d|M_k) p(M_k) / p(d),  with
  p(d|M_k) = ∫∫ p(d|x, θ, M_k) p(x|θ, M_k) p(θ|M_k) dx dθ

- Prediction of a new data point z:

  p(z|M) = ∫∫ p(z|x, θ, M) p(x|θ, M) p(θ|M) dx dθ
Lower Bounding the Marginal Likelihood

Jensen's inequality:

  ln p(d|M_k) = ln ∫∫ p(d, x, θ|M_k) dx dθ
              = ln ∫∫ q(x, θ) [p(d, x, θ|M_k) / q(x, θ)] dx dθ
              ≥ ∫∫ q(x, θ) ln [p(d, x, θ|M_k) / q(x, θ)] dx dθ

Using a factorised approximation q(x, θ) = q_1(x) q_2(θ):

  ln p(d|M_k) ≥ ∫∫ q_1(x) q_2(θ) ln [p(d, x, θ|M_k) / (q_1(x) q_2(θ))] dx dθ = F_{M_k}(q_1(x), q_2(θ), d)

Maximising this free energy leads to VBA.
Variational Bayesian Learning

  F_M(q_1(x), q_2(θ), d) = ∫∫ q_1(x) q_2(θ) ln [p(d, x, θ|M) / (q_1(x) q_2(θ))] dx dθ
                         = H(q_1) + H(q_2) + ⟨ln p(d, x, θ|M)⟩_{q_1 q_2}

Maximising this lower bound with respect to q_1 and then q_2 leads to EM-like iterative updates:

  q_1^{(t+1)}(x) ∝ exp[⟨ln p(d, x, θ|M)⟩_{q_2^{(t)}(θ)}]      (E-like step)
  q_2^{(t+1)}(θ) ∝ exp[⟨ln p(d, x, θ|M)⟩_{q_1^{(t+1)}(x)}]    (M-like step)

which can also be written as:

  q_1^{(t+1)}(x) ∝ exp[⟨ln p(d, x|θ, M)⟩_{q_2^{(t)}(θ)}]               (E-like step)
  q_2^{(t+1)}(θ) ∝ p(θ|M) exp[⟨ln p(d, x|θ, M)⟩_{q_1^{(t+1)}(x)}]     (M-like step)
EM and VBEM algorithms

EM for marginal MAP estimation (goal: maximize p(θ|d, M) w.r.t. θ):
- E step: compute q_1^{(t+1)}(x) = p(x|d, θ^{(t)}) and Q(θ) = ⟨ln p(d, x, θ|M)⟩_{q_1^{(t+1)}(x)}
- M step: maximize:  θ^{(t+1)} = arg max_θ {Q(θ)}

Variational Bayesian EM (goal: lower bound p(d|M)):
- VB-E step: compute q_1^{(t+1)}(x) = p(x|d, φ^{(t)}) and Q(θ) = ⟨ln p(d, x, θ|M)⟩_{q_1^{(t+1)}(x)}
- VB-M step:  q_2^{(t+1)}(θ) ∝ exp[Q(θ)]

Properties:
- VBEM reduces to EM if q_2(θ) = δ(θ − θ̃)
- VBEM has the same complexity as EM
- If we choose q_2(θ) in the conjugate family of p(d, x|θ), then φ becomes the expected natural parameters
- The main computational part of both methods is the E step. We can use belief propagation, Kalman filtering, etc. to do it. In VBEM, φ replaces θ.
Computed Tomography: Seeing inside of a body

- f(x, y): a section of a real 3D body f(x, y, z)
- g_φ(r): a line of the observed radiography g_φ(r, z)
- Forward model: line integrals, or the Radon Transform:

  g_φ(r) = ∫_{L_{r,φ}} f(x, y) dl + ε_φ(r)
         = ∫∫ f(x, y) δ(r − x cos φ − y sin φ) dx dy + ε_φ(r)

- Inverse problem: image reconstruction: given the forward model H (Radon Transform) and a set of data g_{φ_i}(r), i = 1, ..., M, find f(x, y)
2D and 3D Computed Tomography

- 3D:  g_φ(r_1, r_2) = ∫_{L_{r_1,r_2,φ}} f(x, y, z) dl
- 2D:  g_φ(r) = ∫_{L_{r,φ}} f(x, y) dl

Forward problem: f(x, y) or f(x, y, z) → g_φ(r) or g_φ(r_1, r_2)
Inverse problem: g_φ(r) or g_φ(r_1, r_2) → f(x, y) or f(x, y, z)
Algebraic methods: Discretization

(figure: source S, detector D, a ray g_i at angle φ crossing an image f(x, y) with pixels f_1, ..., f_N)

  f(x, y) = ∑_j f_j b_j(x, y),  with  b_j(x, y) = 1 if (x, y) ∈ pixel j, 0 else

  g(r, φ) = ∫_L f(x, y) dl  →  g_i = ∑_{j=1}^{N} H_{ij} f_j + ε_i  →  g = Hf + ε

- H is of huge dimension: 2D: 10^6 × 10^6, 3D: 10^9 × 10^9
- Hf corresponds to forward projection
- H^t g corresponds to backprojection (BP)
Bayesian approach for linear inverse problems

M:  g = Hf + ε

- Observation model M + information on the noise ε:  p(g|f, θ_1; M) = p_ε(g − Hf|θ_1)
- A priori information:  p(f|θ_2; M)
- Basic Bayes:

  p(f|g, θ_1, θ_2; M) = p(g|f, θ_1; M) p(f|θ_2; M) / p(g|θ_1, θ_2; M)

- Unsupervised:

  p(f, θ|g, α_0) = p(g|f, θ_1) p(f|θ_2) p(θ|α_0) / p(g|α_0),  θ = (θ_1, θ_2)

- Hierarchical prior models:

  p(f, z, θ|g, α_0) = p(g|f, θ_1) p(f|z, θ_2) p(z|θ_3) p(θ|α_0) / p(g|α_0),  θ = (θ_1, θ_2, θ_3)
Bayesian inference for inverse problems

Simple case: g = Hf + ε (graphical models: θ_1 → ε, θ_2 → f, f → H → g; and v_ε → ε, v_f → f)

  p(f|g, θ) ∝ p(g|f, θ_1) p(f|θ_2)

- Objective: infer f
- MAP:  f̂ = arg max_f {p(f|g, θ)}
- Posterior Mean (PM):  f̂ = ∫ f p(f|g, θ) df

Example: Gaussian case:

  p(g|f, v_ε) = N(g|Hf, v_ε I),  p(f|v_f) = N(f|0, v_f I)  →  p(f|g, θ) = N(f|f̂, Σ̂)

- MAP:  f̂ = arg min_f {J(f)}  with  J(f) = (1/v_ε)‖g − Hf‖² + (1/v_f)‖f‖²
- Posterior Mean (PM) = MAP:  f̂ = (H^t H + λI)^{−1} H^t g  with  λ = v_ε / v_f,  Σ̂ = v_ε (H^t H + λI)^{−1}
Gaussian model: Simple separable and Markovian

g = Hf + ε (graphical models: separable Gaussian prior v_f → f; Gauss-Markov prior (v_f, D) → f)

Separable Gaussian case:

  p(g|f, θ_1) = N(g|Hf, v_ε I),  p(f|v_f) = N(f|0, v_f I)  →  p(f|g, θ) = N(f|f̂, Σ̂)

- MAP:  f̂ = arg min_f {J(f)}  with  J(f) = (1/v_ε)‖g − Hf‖² + (1/v_f)‖f‖²
- Posterior Mean (PM) = MAP:  f̂ = (H^t H + λI)^{−1} H^t g  with  λ = v_ε / v_f,  Σ̂ = v_ε (H^t H + λI)^{−1}

Markovian case:

  p(f|v_f, D) = N(f|0, v_f (DD^t)^{−1})

- MAP:  J(f) = (1/v_ε)‖g − Hf‖² + (1/v_f)‖Df‖²
- Posterior Mean (PM) = MAP:  f̂ = (H^t H + λD^t D)^{−1} H^t g  with  λ = v_ε / v_f,  Σ̂ = v_ε (H^t H + λD^t D)^{−1}
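A small Gauss-Markov MAP sketch (synthetic forward operator and data, not from the slides): a 3-tap moving-average blur is inverted with a first-difference smoothness prior D.

```python
import numpy as np

n = 50
H = sum(np.eye(n, k=k) for k in (-1, 0, 1)) / 3.0  # 3-tap moving-average blur
D = np.eye(n) - np.eye(n, k=1)                     # first-difference operator
rng = np.random.default_rng(4)
f_true = np.sin(np.linspace(0.0, 3.0 * np.pi, n))
g = H @ f_true + 0.01 * rng.normal(size=n)

lam = 0.1                                          # λ = v_eps / v_f
# MAP / posterior mean: f̂ = (H'H + λ D'D)^{-1} H'g
f_hat = np.linalg.solve(H.T @ H + lam * D.T @ D, H.T @ g)
rel_err = np.linalg.norm(f_hat - f_true) / np.linalg.norm(f_true)
print(rel_err)                                     # small relative error
```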
Bayesian inference (Unsupervised case)

Unsupervised case: hyperparameter estimation (graphical model: α_0 → θ_1 → ε, β_0 → θ_2 → f, f → H → g)

  p(f, θ|g) ∝ p(g|f, θ_1) p(f|θ_2) p(θ)

- Objective: infer (f, θ)
- JMAP:  (f̂, θ̂) = arg max_{(f,θ)} {p(f, θ|g)}
- Marginalization 1:  p(f|g) = ∫ p(f, θ|g) dθ
- Marginalization 2:  p(θ|g) = ∫ p(f, θ|g) df, followed by:

  θ̂ = arg max_θ {p(θ|g)}  →  f̂ = arg max_f {p(f|g, θ̂)}

- MCMC Gibbs sampling:  f ∼ p(f|θ, g) → θ ∼ p(θ|f, g), until convergence; then use the generated samples to compute means and variances
- VBA: approximate p(f, θ|g) by q_1(f) q_2(θ); use q_1(f) to infer f and q_2(θ) to infer θ
JMAP, Marginalization, VBA

- JMAP: optimization of p(f, θ|g)  →  f̂, θ̂
- Marginalization: p(f, θ|g)  → (marginalize over f) →  p(θ|g)  →  θ̂  →  p(f|θ̂, g)  →  f̂
- Variational Bayesian Approximation: p(f, θ|g)  → (VBA) →  q_1(f) → f̂  and  q_2(θ) → θ̂
Variational Bayesian Approximation

- Approximate p(f, θ|g) by q(f, θ) = q_1(f) q_2(θ), and then use q_1 and q_2 for any inferences on f and θ respectively.
- Criterion:  KL(q(f, θ|g) : p(f, θ|g)):

  KL(q : p) = ∫∫ q ln(q/p) = ∫∫ q_1 q_2 ln(q_1 q_2 / p)

- Iterative algorithm:  q_1 → q_2 → q_1 → q_2, ...

  q̂_1(f) ∝ exp[⟨ln p(g, f, θ; M)⟩_{q̂_2(θ)}]
  q̂_2(θ) ∝ exp[⟨ln p(g, f, θ; M)⟩_{q̂_1(f)}]
Variational Bayesian Approximation
p(g, f, θ|M) = p(g|f, θ, M) p(f|θ, M) p(θ|M)
p(f, θ|g, M) = p(g, f, θ|M) / p(g|M)
KL(q : p) = ∫∫ q(f, θ) ln[q(f, θ) / p(f, θ|g, M)] df dθ
Free energy:
F(q) = ∫∫ q(f, θ) ln[p(g, f, θ|M) / q(f, θ)] df dθ
Evidence of the model M:
ln p(g|M) = F(q) + KL(q : p) ≥ F(q)
VBA: Separable Approximation
ln p(g|M) = F(q) + KL(q : p), with q(f, θ) = q1(f) q2(θ)
Minimizing KL(q : p) = maximizing F(q):
(q̂1, q̂2) = arg min_{(q1,q2)} {KL(q1 q2 : p)} = arg max_{(q1,q2)} {F(q1 q2)}
KL(q1 q2 : p) is convex w.r.t. q1 when q2 is fixed and vice versa:
q̂1 = arg min_{q1} {KL(q1 q̂2 : p)} = arg max_{q1} {F(q1 q̂2)}
q̂2 = arg min_{q2} {KL(q̂1 q2 : p)} = arg max_{q2} {F(q̂1 q2)}
q̂1(f) ∝ exp[⟨ln p(g, f, θ; M)⟩_{q̂2(θ)}]
q̂2(θ) ∝ exp[⟨ln p(g, f, θ; M)⟩_{q̂1(f)}]
VBA: Choice of the families of laws q1 and q2
– Case 1 → Joint MAP:
q̂1(f|f̃) = δ(f − f̃), q̂2(θ|θ̃) = δ(θ − θ̃)
f̃ = arg max_f {p(f, θ̃|g; M)}, θ̃ = arg max_θ {p(f̃, θ|g; M)}
– Case 2 → EM:
q̂1(f) ∝ p(f|θ̃, g), Q(θ, θ̃) = ⟨ln p(f, θ|g; M)⟩_{q1(f|θ̃)}
q̂2(θ|θ̃) = δ(θ − θ̃), θ̃ = arg max_θ {Q(θ, θ̃)}
– Appropriate choice for inverse problems:
q̂1(f) ∝ p(f|θ̃, g; M), q̂2(θ) ∝ p(θ|f̂, g; M)
accounts for the uncertainties of θ̂ for f̂ and vice versa.
– Exponential families, conjugate priors
JMAP, EM and VBA
JMAP alternate optimization algorithm:
θ(0) → θ̃ → f̃ = arg max_f {p(f, θ̃|g)} → f̃ → f̂
θ̂ ← θ̃ = arg max_θ {p(f̃, θ|g)} ← f̃
EM:
θ(0) → θ̃ → q1(f) = p(f|θ̃, g), Q(θ, θ̃) = ⟨ln p(f, θ|g)⟩_{q1(f)} → q1(f) → f̂
θ̂ ← θ̃ = arg max_θ {Q(θ, θ̃)} ← q1(f)
VBA:
θ(0) → q2(θ) → q1(f) ∝ exp[⟨ln p(f, θ|g)⟩_{q2(θ)}] → q1(f) → f̂
θ̂ ← q2(θ) ∝ exp[⟨ln p(f, θ|g)⟩_{q1(f)}] ← q1(f)
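The EM column of the comparison above can be sketched on a toy model: g_i = f_i + ε_i with f_i ~ N(0, vf), ε_i ~ N(0, ve known), and the hyperparameter θ = vf estimated by alternating the E step q1(f) = p(f|θ̃, g) with the M step on Q(θ, θ̃). All numbers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
# Illustrative model: g_i = f_i + eps_i, f_i ~ N(0, vf) with vf unknown, eps_i ~ N(0, ve)
N, vf_true, ve = 2000, 4.0, 1.0
g = np.sqrt(vf_true) * rng.standard_normal(N) + np.sqrt(ve) * rng.standard_normal(N)

vf = 1.0                                     # initial guess theta(0)
for _ in range(100):
    # E step: q1(f_i) = p(f_i | vf, g_i) = N(m_i, s)  (Wiener-type posterior)
    s = 1.0 / (1.0 / ve + 1.0 / vf)
    m = s * g / ve
    # M step: vf = arg max Q(vf, vf_old) = mean of <f_i^2> under q1
    vf = np.mean(m ** 2 + s)
```

Note the role of s: unlike JMAP, EM propagates the posterior uncertainty of f into the hyperparameter update, which is exactly the distinction the slide draws between the three schemes.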
Non-stationary noise and sparsity enforcing model
– Non-stationary noise: g = Hf + ε, εi ∼ N(εi|0, vεi) → ε ∼ N(ε|0, Vε = diag[vε1, ..., vεM])
– Student-t prior model and its equivalent IGSM:
fj|vfj ∼ N(fj|0, vfj) and vfj ∼ IG(vfj|αf0, βf0) → fj ∼ St(fj|αf0, βf0)
[Hierarchical model: (αf0, βf0) → vf → f → H → g ← ε ← vε ← (α0, β0)]
p(g|f, vε) = N(g|Hf, Vε), Vε = diag[vε]
p(f|vf) = N(f|0, Vf), Vf = diag[vf]
p(vε) = ∏i IG(vεi|α0, β0)
p(vf) = ∏j IG(vfj|αf0, βf0)
p(f, vε, vf|g) ∝ p(g|f, vε) p(f|vf) p(vε) p(vf)
Objective: infer (f, vε, vf)
– VBA: approximate p(f, vε, vf|g) by q1(f) q2(vε) q3(vf)
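The IGSM equivalence above says that a Gaussian whose variance is itself Inverse-Gamma distributed is marginally a Student-t. A quick sampling check (the degrees of freedom and sample size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
# IGSM: f | v ~ N(0, v), v ~ IG(alpha, beta)  =>  f ~ Student-t marginally
alpha, beta, n = 3.0, 3.0, 200_000
v = 1.0 / rng.gamma(alpha, 1.0 / beta, size=n)   # Inverse-Gamma draws via 1/Gamma
f = np.sqrt(v) * rng.standard_normal(n)          # scale mixture of Gaussians

# For alpha = beta = nu/2 this is a standard Student-t with nu = 2*alpha = 6
nu = 2 * alpha
var_theory = nu / (nu - 2)                       # = 1.5 for nu = 6
var_sample = f.var()
kurt = np.mean(f ** 4) / var_sample ** 2         # heavier tails than the Gaussian value 3
```

The heavy tails (kurtosis above 3) are what make this prior sparsity enforcing: small coefficients are shrunk strongly while large ones are left almost untouched.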
Sparse model in a Transform domain 1
g = Hf + ε, f = Dz, z sparse
[Hierarchical model: (αz0, βz0) → vz → z → D → f → H → g ← ε ← vε ← (α0, β0)]
p(g|z, vε) = N(g|HDz, vε I)
p(z|vz) = N(z|0, Vz), Vz = diag[vz]
p(vε) = IG(vε|α0, β0)
p(vz) = ∏j IG(vzj|αz0, βz0)
p(z, vε, vz|g) ∝ p(g|z, vε) p(z|vz) p(vε) p(vz)
– JMAP: (ẑ, v̂ε, v̂z) = arg max_{(z,vε,vz)} {p(z, vε, vz|g)}
Alternate optimization: ẑ = arg min_z {J(z)} with:
J(z) = (1/(2v̂ε)) ||g − HDz||^2 + ||V̂z^{-1/2} z||^2
v̂zj = (βz0 + ẑj^2/2) / (αz0 + 1/2)
v̂ε = (β0 + ||g − HDẑ||^2/2) / (α0 + M/2)
– VBA: approximate p(z, vε, vz|g) by q1(z) q2(vε) q3(vz). Alternate optimization.
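The JMAP alternate optimization above can be sketched on synthetic data. This is a minimal sketch under assumptions: A = HD is a random matrix, D is taken as identity, the hyperparameters are illustrative, and the variance updates follow the posterior-mode style of the slide (one common variant):

```python
import numpy as np

rng = np.random.default_rng(5)
# Toy setup (illustrative): g = A z + eps with a sparse z, A = H D
M, Nz = 40, 60
A = rng.standard_normal((M, Nz)) / np.sqrt(M)
z_true = np.zeros(Nz)
z_true[[5, 20, 41]] = [3.0, -2.0, 4.0]
g = A @ z_true + 0.05 * rng.standard_normal(M)

a0, b0, az0, bz0 = 1.0, 1e-3, 1.0, 1e-3     # IG hyperparameters (assumed)
ve, vz = 1.0, np.ones(Nz)
for _ in range(50):
    # z step: minimizer of (1/(2 ve)) ||g - A z||^2 + (1/2) z' Vz^{-1} z
    z = np.linalg.solve(A.T @ A / ve + np.diag(1.0 / vz), A.T @ g / ve)
    # variance steps: posterior-mode style updates of the slide
    vz = (bz0 + 0.5 * z ** 2) / (az0 + 0.5)
    ve = (b0 + 0.5 * np.sum((g - A @ z) ** 2)) / (a0 + M / 2)

support = np.argsort(np.abs(z))[-3:]         # indices of the 3 largest coefficients
```

Because v̂zj grows with ẑj^2, large coefficients get weak regularization and small ones are driven toward zero, which is how this alternation enforces sparsity.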
Sparse model in a Transform domain 2
g = Hf + ε, f = Dz + ξ, z sparse
[Hierarchical model: (αξ0, βξ0) → vξ → ξ; (αz0, βz0) → vz → z → D; (α0, β0) → vε; (Dz, ξ) → f → H → g]
p(g|f, vε) = N(g|Hf, vε I)
p(f|z, vξ) = N(f|Dz, vξ I)
p(z|vz) = N(z|0, Vz), Vz = diag[vz]
p(vε) = IG(vε|α0, β0)
p(vz) = ∏j IG(vzj|αz0, βz0)
p(vξ) = IG(vξ|αξ0, βξ0)
p(f, z, vε, vz, vξ|g) ∝ p(g|f, vε) p(f|z, vξ) p(z|vz) p(vε) p(vz) p(vξ)
– JMAP: (f̂, ẑ, v̂ε, v̂z, v̂ξ) = arg max_{(f,z,vε,vz,vξ)} {p(f, z, vε, vz, vξ|g)}
Alternate optimization.
– VBA: approximate p(f, z, vε, vz, vξ|g) by q1(f) q2(z) q3(vε) q4(vz) q5(vξ). Alternate optimization.
Gauss-Markov-Potts prior models for images
g = Hf + ε
[Hierarchical model: (a0, m0, v0, α0, β0) → θ; γ → z; (α0, β0) → vε; (z, θ) → f → H → g]
z(r) labels; contours: c(r) = 1 − δ(z(r) − z(r′))
p(g|f, vε) = N(g|Hf, vε I), p(vε) = IG(vε|α0, β0)
p(f(r)|z(r) = k, mk, vk) = N(f(r)|mk, vk)
p(f(r)|θ) = Σk ak N(f(r)|mk, vk), θ = {(ak, mk, vk), k = 1, ..., K}
p(f|z, θ) = ∏k ∏_{r∈Rk} N(f(r)|mk, vk)
p(θ) = D(a|a0) N(m|m0, v0) IG(v|α0, β0)
p(z|γ) ∝ exp[γ Σr Σ_{r′∈N(r)} δ(z(r) − z(r′))]  (Potts MRF)
p(f, z, θ|g) ∝ p(g|f, vε) p(f|z, θ) p(z|γ) p(θ)
MCMC: Gibbs sampling. VBA: alternate optimization.
Mixture Models
1. Mixture models
2. Different problems related to classification and clustering
   – Training
   – Supervised classification
   – Semi-supervised classification
   – Clustering or unsupervised classification
3. Mixture of Gaussian (MoG)
4. Mixture of Student-t (MoSt)
5. Variational Bayesian Approximation (VBA)
6. VBA for Mixture of Gaussian
7. VBA for Mixture of Student-t
8. Conclusion
Mixture models
– General mixture model:
p(x|a, Θ, K) = Σ_{k=1}^K ak pk(x|θk), 0 < ak < 1, Σ_{k=1}^K ak = 1
– Same family: pk(x|θk) = p(x|θk), ∀k
– Gaussian: p(x|θk) = N(x|μk, Vk) with θk = (μk, Vk)
– Data: X = {xn, n = 1, ..., N}, where each element xn can be in one of the K classes cn.
– ak = p(cn = k), a = {ak, k = 1, ..., K}, Θ = {θk, k = 1, ..., K}, c = {cn, n = 1, ..., N}
p(X, c|a, Θ) = ∏_{n=1}^N p(xn, cn = k|ak, θk)
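The general mixture density above is straightforward to evaluate. A minimal 1-D sketch (the two-component weights, means, and variances are illustrative assumptions):

```python
import numpy as np

def mog_pdf(x, a, mus, vs):
    """1-D mixture of Gaussians density p(x) = sum_k a_k N(x | mu_k, v_k)."""
    x = np.atleast_1d(x)[:, None]
    comp = np.exp(-0.5 * (x - mus) ** 2 / vs) / np.sqrt(2 * np.pi * vs)
    return comp @ a

# Illustrative 2-component mixture
a = np.array([0.3, 0.7])
mus = np.array([-2.0, 3.0])
vs = np.array([1.0, 0.5])

# The weight constraints 0 < a_k < 1, sum a_k = 1 make p(x) integrate to 1
grid = np.linspace(-10, 10, 4001)
dx = grid[1] - grid[0]
total = mog_pdf(grid, a, mus, vs).sum() * dx     # numeric integral, should be ~1
```

The same function serves as p(x|a, Θ, K) in all the classification formulas of the following slides.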
Different problems
– Training: given a set of (training) data X and classes c, estimate the parameters a and Θ.
– Supervised classification: given a sample xm and the parameters K, a and Θ, determine its class
k* = arg max_k {p(cm = k|xm, a, Θ, K)}.
– Semi-supervised classification (proportions are not known): given a sample xm and the parameters K and Θ, determine its class
k* = arg max_k {p(cm = k|xm, Θ, K)}.
– Clustering or unsupervised classification (number of classes K is not known): given a set of data X, determine K and c.
Training
– Given a set of (training) data X and classes c, estimate the parameters a and Θ.
– Maximum Likelihood (ML):
(â, Θ̂) = arg max_{(a,Θ)} {p(X, c|a, Θ, K)}.
– Bayesian: assign priors p(a|K) and p(Θ|K) = ∏_{k=1}^K p(θk) and write the joint posterior law:
p(a, Θ|X, c, K) = p(X, c|a, Θ, K) p(a|K) p(Θ|K) / p(X, c|K)
where
p(X, c|K) = ∫∫ p(X, c|a, Θ, K) p(a|K) p(Θ|K) da dΘ
– Infer a and Θ either as the Maximum A Posteriori (MAP) or the Posterior Mean (PM).
Supervised classification
– Given a sample xm and the parameters K, a and Θ, determine
p(cm = k|xm, a, Θ, K) = p(xm, cm = k|a, Θ, K) / p(xm|a, Θ, K)
where p(xm, cm = k|a, Θ, K) = ak p(xm|θk) and
p(xm|a, Θ, K) = Σ_{k=1}^K ak p(xm|θk)
– Best class k*:
k* = arg max_k {p(cm = k|xm, a, Θ, K)}
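The Bayes rule above reduces to normalizing the joint terms ak p(xm|θk). A minimal 1-D, 3-class sketch (weights, means, variances, and the test point are illustrative assumptions):

```python
import numpy as np

def class_posteriors(x, a, mus, vs):
    """p(c = k | x, a, Theta, K) = a_k N(x|mu_k, v_k) / sum_l a_l N(x|mu_l, v_l)."""
    joint = a * np.exp(-0.5 * (x - mus) ** 2 / vs) / np.sqrt(2 * np.pi * vs)
    return joint / joint.sum()

# Illustrative 1-D example with K = 3 classes
a = np.array([0.2, 0.5, 0.3])
mus = np.array([-4.0, 0.0, 5.0])
vs = np.array([1.0, 1.0, 1.0])

post = class_posteriors(4.8, a, mus, vs)   # posterior over classes for x_m = 4.8
k_star = int(np.argmax(post))              # best class k*
```

For a well-separated sample such as x_m = 4.8, the posterior concentrates almost entirely on the nearest class, so the MAP decision k* is unambiguous.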
Semi-supervised classification
– Given a sample xm and the parameters K and Θ (not the proportions a), determine the probabilities
p(cm = k|xm, Θ, K) = p(xm, cm = k|Θ, K) / p(xm|Θ, K)
where
p(xm, cm = k|Θ, K) = ∫ p(xm, cm = k|a, Θ, K) p(a|K) da
and
p(xm|Θ, K) = Σ_{k=1}^K p(xm, cm = k|Θ, K)
– Best class k*, for example the MAP solution:
k* = arg max_k {p(cm = k|xm, Θ, K)}.
Clustering or unsupervised classification
– Given a set of data X, determine K and c.
– Determination of the number of classes:
p(K = L|X) = p(X, K = L) / p(X) = p(X|K = L) p(K = L) / p(X)
and
p(X) = Σ_{L=1}^{L0} p(K = L) p(X|K = L),
where L0 is the a priori maximum number of classes and
p(X|K = L) = ∫∫ ∏n Σ_{k=1}^L ak p(xn, cn = k|θk) p(a|K) p(Θ|K) da dΘ.
– When K and c are determined, we can also determine the characteristics a and Θ of those classes.
Mixture of Gaussian and Mixture of Student-t
p(x|a, Θ, K) = Σ_{k=1}^K ak p(x|θk), 0 < ak < 1, Σ_{k=1}^K ak = 1
– Mixture of Gaussian (MoG):
p(x|θk) = N(x|μk, Vk), θk = (μk, Vk)
N(x|μk, Vk) = (2π)^{-p/2} |Vk|^{-1/2} exp[-(1/2)(x - μk)^t Vk^{-1} (x - μk)]
– Mixture of Student-t (MoSt):
p(x|θk) = T(x|νk, μk, Vk), θk = (νk, μk, Vk)
T(x|νk, μk, Vk) = Γ((νk + p)/2) / (Γ(νk/2) νk^{p/2} π^{p/2} |Vk|^{1/2}) · [1 + (1/νk)(x - μk)^t Vk^{-1} (x - μk)]^{-(νk + p)/2}
Mixture of Student-t model
– Student-t and its Infinite Gaussian Scaled Model (IGSM):
T(x|ν, μ, V) = ∫_0^∞ N(x|μ, u^{-1} V) G(u|ν/2, ν/2) du
where
N(x|μ, V) = |2πV|^{-1/2} exp[-(1/2)(x - μ)^t V^{-1} (x - μ)] = |2πV|^{-1/2} exp[-(1/2) Tr((x - μ)(x - μ)^t V^{-1})]
and
G(u|α, β) = (β^α / Γ(α)) u^{α-1} exp[-βu].
– Mixture of generalized Student-t T(x|α, β, μ, V):
p(x|{ak, μk, Vk, αk, βk}, K) = Σ_{k=1}^K ak T(x|αk, βk, μk, Vk).
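The IGSM integral above can be verified numerically in the scalar case (p = 1, μ = 0, V = 1): quadrature over u of N(x|0, u^{-1}) G(u|ν/2, ν/2) should reproduce the standard Student-t density. The grid bounds and ν = 5 are illustrative choices:

```python
import math
import numpy as np

def student_pdf(x, nu):
    """Standard Student-t density (p = 1, mu = 0, V = 1)."""
    c = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(nu * math.pi))
    return c * (1 + x ** 2 / nu) ** (-(nu + 1) / 2)

def igsm_pdf(x, nu, n_grid=20000, u_max=60.0):
    """Numerically integrate N(x|0, u^{-1}) G(u|nu/2, nu/2) du over u."""
    u = np.linspace(1e-8, u_max, n_grid)
    du = u[1] - u[0]
    a = b = nu / 2
    gauss = np.sqrt(u / (2 * np.pi)) * np.exp(-0.5 * u * x ** 2)
    gamma_pdf = b ** a / math.gamma(a) * u ** (a - 1) * np.exp(-b * u)
    return float(np.sum(gauss * gamma_pdf) * du)  # Riemann-sum quadrature

diff = max(abs(igsm_pdf(x, 5.0) - student_pdf(x, 5.0))
           for x in [0.0, 0.5, 1.5, 3.0])
```

The agreement at several x values confirms the scale-mixture representation that the VBA construction for MoSt relies on.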
Mixture of Gaussian model
– Introducing znk ∈ {0, 1}, zk = {znk, n = 1, ..., N}, Z = {znk} with P(znk = 1) = P(cn = k) = ak, θk = {ak, μk, Vk}, Θ = {θk, k = 1, ..., K}.
– Assigning the priors p(Θ) = ∏k p(θk), we can write:
p(X, c, Z, Θ|K) = ∏n ∏k [ak N(xn|μk, Vk)]^{znk} ∏k p(θk)
– Joint posterior law:
p(c, Z, Θ|X, K) = p(X, c, Z, Θ|K) / p(X|K).
– The main task now is to propose approximations to it in such a way that we can use it easily in all the above mentioned tasks of classification or clustering.
Hierarchical graphical model for Mixture of Gaussian
[Graphical model: (γ0, V0) → Vk; (μ0, η0) → μk; k0 → a → cn, znk; (μk, Vk, znk) → xn]
p(a) = D(a|k0)
p(μk|Vk) = N(μk|μ0 1, η0^{-1} Vk)
p(Vk) = IW(Vk|γ0, V0)
P(znk = 1) = P(cn = k) = ak
p(X, c, Z, Θ|K) = ∏n ∏k [ak N(xn|μk, Vk)]^{znk} ∏k p(ak) p(μk|Vk) p(Vk)
Mixture of Student-t model
– Introducing U = {unk}, θk = {αk, βk, ak, μk, Vk}, Θ = {θk, k = 1, ..., K}.
– Assigning the priors p(Θ) = ∏k p(θk), we can write:
p(X, c, Z, U, Θ|K) = ∏n ∏k [ak N(xn|μk, unk^{-1} Vk) G(unk|αk, βk)]^{znk} ∏k p(θk)
– Joint posterior law:
p(c, Z, U, Θ|X, K) = p(X, c, Z, U, Θ|K) / p(X|K).
– The main task now is to propose approximations to it in such a way that we can use it easily in all the above mentioned tasks of classification or clustering.
Hierarchical graphical model for Mixture of Student-t
[Graphical model: ζ0 → (αk, βk) → unk; (γ0, V0) → Vk; (μ0, η0) → μk; k0 → a → cn, znk; (μk, Vk, unk, znk) → xn]
p(a) = D(a|k0)
p(μk|Vk) = N(μk|μ0 1, η0^{-1} Vk)
p(Vk) = IW(Vk|γ0, V0)
p(αk) = E(αk|ζ0) = G(αk|1, ζ0)
p(βk) = E(βk|ζ0) = G(βk|1, ζ0)
P(znk = 1) = P(cn = k) = ak
p(unk) = G(unk|αk, βk)
p(X, c, Z, U, Θ|K) = ∏n ∏k [ak N(xn|μk, unk^{-1} Vk) G(unk|αk, βk)]^{znk} ∏k p(ak) p(μk|Vk) p(Vk) p(αk) p(βk)
Variational Bayesian Approximation (VBA)
– Main idea: propose easy computational approximations:
q(c, Z, Θ) = q(c, Z) q(Θ) for p(c, Z, Θ|X, K) (MoG model), or
q(c, Z, U, Θ) = q(c, Z, U) q(Θ) for p(c, Z, U, Θ|X, K) (MoSt model).
– Criterion: KL(q : p) = −F(q) + ln p(X|K), where
F(q) = ⟨ln p(X, c, Z, Θ|K)⟩_q − ⟨ln q⟩_q or F(q) = ⟨ln p(X, c, Z, U, Θ|K)⟩_q − ⟨ln q⟩_q
– Maximizing F(q) and minimizing KL(q : p) are equivalent, and F(q) gives a lower bound on the log-evidence of the model ln p(X|K).
– When the optimum q* is obtained, F(q*) can be used as a criterion for model selection.
Proposed VBA for the Mixture of Student-t priors model
Conjugate families used:
– Dirichlet: D(a|k) = Γ(Σl kl) / ∏l Γ(kl) · ∏l al^{kl−1}
– Exponential: E(t|ζ0) = ζ0 exp[−ζ0 t]
– Gamma: G(t|a, b) = (b^a / Γ(a)) t^{a−1} exp[−bt]
– Inverse Wishart: IW(V|γ, γΔ) = |½Δ|^{γ/2} |V|^{−(γ+D+1)/2} exp[−½ Tr(ΔV^{−1})] / Γ_D(γ/2)
Expressions of q
q(c, Z, Θ) = q(c, Z) q(Θ) = ∏n ∏k [q(cn = k|znk) q(znk)] ∏k [q(αk) q(βk) q(μk|Vk) q(Vk)] q(a)
with:
q(a) = D(a|k̃), k̃ = [k̃1, ..., k̃K]
q(αk) = G(αk|ζ̃k, η̃k)
q(βk) = G(βk|ζ̃k, η̃k)
q(μk|Vk) = N(μk|μ̃, η̃^{-1} Vk)
q(Vk) = IW(Vk|γ̃, Σ̃)
With these choices, we have:
F(q(c, Z, Θ)) = ⟨ln p(X, c, Z, Θ|K)⟩_{q(c,Z,Θ)} = Σk [Σn F1kn + F2k]
F1kn = ⟨ln p(xn, cn, znk, θk)⟩_{q(cn=k|znk) q(znk)}
F2k = ⟨ln p(θk)⟩_{q(θk)}
VBA Algorithm
The updating expressions of the tilded parameters are obtained by following three steps:
– E step: optimizing F with respect to q(c, Z) while keeping q(Θ) fixed, we obtain the expressions of q(cn = k|znk) = ãk and q(znk) = G(znk|α̃k, β̃k).
– M step: optimizing F with respect to q(Θ) while keeping q(c, Z) fixed, we obtain the expressions of q(a) = D(a|k̃), k̃ = [k̃1, ..., k̃K], q(αk) = G(αk|ζ̃k, η̃k), q(βk) = G(βk|ζ̃k, η̃k), q(μk|Vk) = N(μk|μ̃, η̃^{-1} Vk), and q(Vk) = IW(Vk|γ̃, γ̃Σ̃), which gives the updating algorithm for the corresponding tilded parameters.
– F evaluation: after each E step and M step, we can also evaluate the expression of F(q), which can be used as a stopping rule for the iterative algorithm.
– The final value of F(q) for each value of K, noted FK, can be used as a criterion for model selection, i.e. the determination of the number of clusters.
VBA: choosing good families for q
– Main question: we approximate p(X) by q(X). Which quantities are conserved?
  a) Mode values: arg max_x {p(x)} = arg max_x {q(x)}?
  b) Expected values: E_p(X) = E_q(X)?
  c) Variances: V_p(X) = V_q(X)?
  d) Entropies: H_p(X) = H_q(X)?
– Recent works show that some of these hold under some conditions.
– For example, if p(x) = (1/Z) exp[−φ(x)] with φ(x) convex and symmetric, properties a) and b) are satisfied.
– Unfortunately, this is not the case for variances or other moments.
– If p is in the exponential family, then by choosing appropriate conjugate priors, the structure of q is the same and we can obtain appropriate fast optimization algorithms.
Conclusions
– Bayesian approaches with hierarchical prior models with hidden variables are very powerful tools for inverse problems and Machine Learning.
– The computational cost of all the sampling methods (MCMC and many others) is too high for practical high dimensional applications.
– We explored VBA tools for effective approximate Bayesian computation.
– Applications in different inverse problems in imaging systems (3D X ray CT, Microwaves, PET, Ultrasound, Optical Diffusion Tomography (ODT), Acoustic source localization, ...).
– Clustering and classification of a set of data are among the most important tasks in statistical research for many applications such as data mining in biology.
– Mixture models are classical models for these tasks.
– We proposed to use a mixture of generalized Student-t distributions for more robustness.
– To obtain fast algorithms and be able to handle large data sets.