Summary of multicomponent/multivariate data and time series signal analysis tools developed during C5Sys ERASysBio 2010-2013
Ali Mohammad-Djafari
Groupe Problèmes Inverses, Laboratoire des Signaux et Systèmes, UMR 8506 CNRS - SUPELEC - Univ Paris Sud 11
Supélec, Plateau de Moulon, 91192 Gif-sur-Yvette, FRANCE.
[email protected] http://djafari.free.fr http://www.lss.supelec.fr
C5Sys-ERASysBio consortium meeting, April 24-27, 2013, Florence, Italy
Summary
◮ Visualization tools
  ◮ Time domain
  ◮ Transformed domain: Fourier, Wavelets, Splines, ...
  ◮ Scatter plots, histograms, ...
◮ Modeling time series
  ◮ Parametric: Superposition of sinusoids (COSINOR), Superposition of Gaussian shapes, ...
  ◮ Non Parametric: Fourier, Wavelets, ...
  ◮ Probabilistic: Moving Average (MA), Autoregressive (AR), ARMA, Markovian models, ...
◮ Modeling the relation between data/signals
  ◮ Linear / Non linear
  ◮ Training and test data
Summary (continued)
◮ Simple Analysis: Computing spectra, Estimating periods, ...
◮ Multicomponent/Multivariate data analysis: PCA, FA, ICA, Sparse PCA for dimensional reduction and main factors extraction
◮ Multicomponent/Multivariate Discriminant Analysis with classification: LDA, EDA, RDA, Sparse LDA for finding the most discriminant factors
◮ Blind sources separation
◮ Correlation (Pearson or Spearman) computation and dependency graph visualization
◮ Modelling input-output relations
Visualization and simple analysis example: Clock genes time series and their FT
[Figure: Colon and Liver clock genes (Rev, Per2, Bmal1). Left panels: time series; right panels: Fourier transforms.]
Scatterplot of Colon clock genes time series
[Figure: scatter-plot matrix of Rev, Per2 and Bmal1 for classes 1, 2 and 3.]
Temperature and activity time series before, during and after some treatment
[Figure: three panels (BOD: before, BOD: during, BOD: after), each showing Temperature (C) and Activity (MVT) over 0 to 48 hours.]
Temperatures: Fourier domain analysis
[Figure: temperature time series (left, 0 to 48 hours) and their spectra (right, period axis with ticks at 24, 12, 8, 6, 4, 3, 2, 1 hours) for the three conditions.]
Activity: Fourier domain analysis
[Figure: activity time series (left, 0 to 48 hours) and their spectra (right, period axis with ticks at 24, 12, 8, 6, 4, 3, 2, 1 hours) for the three conditions.]
Temperatures, before, during and after changes
[Figure: scatter-plot matrices of the variables U1, U2, L1, L2 and act. Panels: AUE before, AUE during, AUE after, and a combined panel "AUE: before (blue), during (green), after (red)".]
Temperatures, before, during and after changes: LDA and SLDA components space
[Figure: two panels, LDA-Time and SLDA-Time, showing classes 1, 2 and 3 in the space of the discriminant components.]
Simple Analysis tools: Period estimation. What do we mean by period?
[Figure: three examples, each showing the signal versus time (hours, 0 to 72) and its spectrum versus period in hours: case of 3 sinusoids; case of 3 sinusoids + noise; case of a few sinusoids + noise.]
How to define a period from Spectra S(ω)?
◮ Consider $S(\omega)$ as a distribution (normalise such that $\int S(\omega)\, d\omega = 1$)
[Figure: two spectra plotted against period in hours.]
◮ principal harmonic: $\omega_{mod} = \arg\max_{\omega} \{S(\omega)\}$
◮ mean harmonic: $\omega_{mean} = \int \omega\, S(\omega)\, d\omega$
◮ Lower and upper limits: $\omega_L, \omega_U$
◮ principal period: $p_{mod} = \arg\max_{p} \{S(p)\}$
◮ mean period: $p_{mean} = \int p\, S(p)\, dp$
◮ Lower and upper limits: $p_L, p_U$
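The definitions above can be sketched numerically as follows. This is a minimal illustration, not the toolbox code: the function name `period_summary` is hypothetical, the spectrum is assumed to be sampled on a uniform, increasing grid of periods, and, since the slides do not say how the lower and upper limits are defined, they are assumed here to bound the central interval holding 90% of the normalised spectrum.

```python
import math

def period_summary(periods, spectrum, mass=0.9):
    """Summarise a spectrum sampled on an increasing, uniform grid of periods."""
    total = sum(spectrum)
    weights = [s / total for s in spectrum]      # normalise: weights sum to 1
    i_mod = max(range(len(weights)), key=weights.__getitem__)
    p_mod = periods[i_mod]                       # principal period: arg max S(p)
    p_mean = sum(p * w for p, w in zip(periods, weights))  # mean period
    # Assumption: the limits bound the central interval containing `mass`.
    tail = (1.0 - mass) / 2.0
    p_low, p_up, cum = None, None, 0.0
    for p, w in zip(periods, weights):
        cum += w
        if p_low is None and cum >= tail:
            p_low = p
        if p_up is None and cum >= 1.0 - tail:
            p_up = p
    if p_up is None:                             # guard against rounding
        p_up = periods[-1]
    return p_mod, p_mean, p_low, p_up

# Synthetic spectrum: a bump centred on 24 hours (illustrative data only).
periods = list(range(12, 37))
spectrum = [math.exp(-0.5 * ((p - 24) / 2.0) ** 2) for p in periods]
p_mod, p_mean, p_low, p_up = period_summary(periods, spectrum)
print(p_mod, round(p_mean, 3), p_low, p_up)
```

For a symmetric spectrum the principal and mean periods coincide; for a skewed or multi-peaked one they differ, which is why both are reported.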
Fourier Transform, Autocorrelation and Spectra
◮ Monovariate time series:
  ◮ Time series: $g(t)$
  ◮ Autocorrelation function: $\gamma(\tau)$
  ◮ Fourier transform: $f(\omega)$
◮ Spectral density function definitions: $S(\omega)$
  ◮ Deterministic: $f(\omega) = \int g(t) \exp[-j\omega t]\, dt \to S(\omega) = |f(\omega)|^2$   (1)
  ◮ Probabilistic: $\gamma(\tau) = E\{g(t)\, g(t+\tau)\} \to S(\omega) = \int \gamma(\tau) \exp[-j\omega\tau]\, d\tau$   (2)
  ◮ $S(\omega) = S(2\pi\nu) = S(2\pi/p)$
◮ Multivariate time series
  ◮ $g_1(t), \cdots, g_N(t)$
  ◮ Estimating common factors spectra
How to estimate Spectra S(ω)?
◮ Fast Fourier Transform (FFT): $g(t) \to \text{FFT} \to f(\omega) \to S(\omega) = |f(\omega)|^2$
  ◮ Advantages: Well-known and understood, fast
  ◮ Drawbacks: linear in frequencies $\nu$, but not equidistant in periods: $\nu = [0, \cdots, N-1] \to p = [\infty, N, N/2, \cdots, N/(N-1)]$
[Figure: four spectra plotted against period in hours (ticks at Inf, 36, 24, 14.4).]
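The drawback can be made concrete with a small sketch. It assumes hourly sampling over 72 hours and uses a plain DFT written out explicitly (slow, but self-contained) on a synthetic 24-hour cosine; the helper names `dft_periods` and `dft_amplitudes` are hypothetical. Only the one-sided part of the frequency axis is kept.

```python
import cmath
import math

def dft_periods(n, dt=1.0):
    """Periods attached to DFT bins k = 0 .. n//2: infinity, then n*dt/k."""
    return [math.inf] + [n * dt / k for k in range(1, n // 2 + 1)]

def dft_amplitudes(g):
    """One-sided DFT amplitudes |f_k| / n, written out explicitly."""
    n = len(g)
    return [abs(sum(g[m] * cmath.exp(-2j * math.pi * k * m / n)
                    for m in range(n))) / n
            for k in range(n // 2 + 1)]

n = 72                                               # 72 hourly samples
g = [math.cos(2 * math.pi * t / 24) for t in range(n)]
periods = dft_periods(n)
amps = dft_amplitudes(g)
k_peak = max(range(1, len(amps)), key=amps.__getitem__)
print(periods[:6])       # the grid is dense at short periods, sparse at long ones
print(periods[k_peak])   # period of the strongest non-constant component
```

Between 36 h and 18 h the grid offers a single bin (24 h), so a 22 h or 26 h rhythm cannot be resolved from 72 hours of data with this grid. This motivates computing the spectrum on a user-chosen grid of periods.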
How to estimate Spectra S(ω)?
◮ Autocorrelation function: $\gamma(\tau)$
  ◮ If $g(t)$ is periodic, then $\gamma(\tau)$ is also periodic, but much smoother
  ◮ $\gamma(0) = 1$, $\gamma(\tau) \le \gamma(0)$, $\forall \tau$
  ◮ Distance between $\gamma(0)$ and the next maximum gives the main period
[Figure: two signals $g(t)$ versus time (hours, 0 to 72) and their autocorrelation functions $\gamma(\tau)$ versus lag (hours, 0 to 60); in both cases the first maximum after lag 0 gives period = 24.]
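A minimal sketch of this autocorrelation-based period estimate, assuming hourly samples and a noise-free synthetic 24-hour sinusoid. The function names are hypothetical, and the biased sample autocorrelation used here is one common estimator, not necessarily the one used in the toolbox.

```python
import math

def autocorrelation(g, max_lag):
    """Normalised (biased) sample autocorrelation for lags 0 .. max_lag."""
    n = len(g)
    mean = sum(g) / n
    d = [v - mean for v in g]
    c0 = sum(v * v for v in d)                   # lag-0 term, so gamma[0] == 1
    return [sum(d[t] * d[t + lag] for t in range(n - lag)) / c0
            for lag in range(max_lag + 1)]

def main_period(gamma):
    """Lag of the first local maximum after lag 0, or None if there is none."""
    for lag in range(1, len(gamma) - 1):
        if gamma[lag - 1] < gamma[lag] >= gamma[lag + 1]:
            return lag
    return None

g = [math.cos(2 * math.pi * t / 24) for t in range(96)]   # 4 days, hourly
gamma = autocorrelation(g, max_lag=60)
period = main_period(gamma)
print(period)
```

On noisy data the first local maximum can be a spurious small bump, so in practice one would smooth `gamma` or require a minimum peak height before accepting it.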
Period estimation
[Figure: three examples. Top row: autocorrelation functions versus lag (hours, 0 to 60), with the first maximum marked (period = 24, 23 and 24). Bottom row: spectra versus period in hours, annotated with period-mod, period-mean, period-low, period-high, amplitude-mod and amplitude-mean. In all three cases period-mod = period-mean = 24; period-low is 17 or 18 and period-high lies between 37 and 42.]
How to estimate Spectra S(ω)?
◮ Fast Fourier Transform (FFT): $g(t) \to \text{FFT} \to f(\omega) \to S(\omega) = |f(\omega)|^2$
  ◮ Advantages: Well-known and understood, fast
  ◮ Drawbacks: linear in frequencies $\nu$, but not equidistant in periods: $\nu = [0, \cdots, N-1] \to p = [\infty, N, N/2, \cdots, N/(N-1)]$
◮ Autocorrelation function: $\gamma(\tau)$
  ◮ If $g(t)$ is periodic, then $\gamma(\tau)$ is also periodic, but much smoother
  ◮ $\gamma(0) = 1$, $\gamma(\tau) \le \gamma(0)$, $\forall \tau$
  ◮ Distance between $\gamma(0)$ and the next maximum gives the main period
◮ Autocorrelation function and FT: $\gamma(\tau) \to \text{FFT} \to S(\omega)$
◮ Inverse problem approach: (Q: Compute spectra for given values of periods)
  ◮ $f(p) \to g(t)$ is a linear forward operation: $f \to H \to g$
  ◮ $g = Hf + \epsilon \to \hat{f}$
How to estimate Spectra S(ω)? Inverse Problem Approach
◮ $f(p) \to g(t)$ is a linear forward operation:
  $g(t_m) = \sum_n f(\omega_n) \exp[j(2\pi/p_n)\, t_m] \to g = Hf + \epsilon$,
  $H_{mn} = \exp[j(2\pi/p_n)\, t_m]$, $m = 1, \cdots, M$, $n = 1, \cdots, N$
◮ Discrete (Fast) Fourier Transform (DFT/FFT): $g = Hf$. If $M = N$ and if the $p_n$ are chosen such that $\omega_n = 2\pi/p_n = [0 : N-1]\,\omega_0$ with $\omega_0 = 2\pi/(N\,\delta t)$, where $\delta t$ is the sampling step, then $H$ is the DFT matrix and, up to normalization, $H'H = I \to \hat{f} = H^{-1} g = H' g$
◮ General case: for example, when we want to compute $f(\omega)$ for equidistant values of the periods. Then $H'H \neq I$, and even the numbers of data and unknowns can be different.
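The difference between the two cases can be checked numerically. The sketch below builds $H$ for a DFT grid and for an equidistant grid of periods (12, 14, ..., 24 hours, an illustrative choice) and looks at the largest off-diagonal entry of $H'H/M$. The helper names are hypothetical and unit sampling is assumed.

```python
import cmath
import math

def forward_matrix(times, periods):
    """H[m][n] = exp(j*2*pi*t_m/p_n); an infinite period gives a constant column."""
    return [[complex(1.0) if math.isinf(p) else cmath.exp(2j * math.pi * t / p)
             for p in periods] for t in times]

def normalised_gram(H):
    """H'H / M, where H' denotes the conjugate transpose."""
    m, n = len(H), len(H[0])
    return [[sum(H[k][i].conjugate() * H[k][j] for k in range(m)) / m
             for j in range(n)] for i in range(n)]

def max_off_diagonal(G):
    n = len(G)
    return max(abs(G[i][j]) for i in range(n) for j in range(n) if i != j)

N = 24
times = list(range(N))
dft_grid = [math.inf] + [N / k for k in range(1, N)]       # DFT periods
equidistant_grid = [12.0 + 2.0 * i for i in range(7)]      # 12, 14, ..., 24

off_dft = max_off_diagonal(normalised_gram(forward_matrix(times, dft_grid)))
off_equi = max_off_diagonal(normalised_gram(forward_matrix(times, equidistant_grid)))
print(off_dft, off_equi)
```

On the DFT grid the columns are orthogonal (off-diagonal terms at rounding level), so $H'$ inverts $H$ up to the factor $M$. On the equidistant grid neighbouring columns are strongly correlated, which is the ill-conditioning that the pseudoinverse and regularization are meant to handle.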
How to estimate Spectra S(ω)? Inverse Problem Approach
◮ General case: $g = Hf$
◮ Generalized inverse or pseudoinverse:
  ◮ $\hat{f} = \arg\min_f \{\|g - Hf\|^2\} \to \hat{f} = (H'H)^{-1} H' g$
  ◮ $\hat{f} = \arg\min_f \{\|f\|^2\}$ s.t. $Hf = g \to \hat{f} = H'(HH')^{-1} g$
◮ Better if we account for the ill-conditioning of $H$
◮ Still better if we account for errors and uncertainties: $g = Hf + \epsilon$
How to estimate Spectra S(ω)? Inverse Problem Approach
◮ Regularization: better if we account for the ill-conditioning of $H$:
  $\hat{f} = \arg\min_f \{\|g - Hf\|^2 + \lambda \|f\|^2\} \to \hat{f} = (H'H + \lambda I)^{-1} H' g$
◮ Still better if we account for errors and uncertainties: $g = Hf + \epsilon$
◮ Bayesian approach:
  ◮ Assign the likelihood: $p(g|f)$
  ◮ Assign the prior law: $p(f)$
  ◮ Use the Bayes rule: $p(f|g) = \dfrac{p(g|f)\, p(f)}{p(g)}$
  ◮ Use this posterior law to infer on $f$. For example MAP: $\hat{f} = \arg\max_f \{p(f|g)\} = \arg\min_f \{J(f)\}$
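The regularized solution $\hat{f} = (H'H + \lambda I)^{-1} H' g$ can be sketched in a few lines. This is an illustration under stated assumptions, not the toolbox implementation: hourly sampling over 6 days, a synthetic 24-hour cosine, an equidistant grid of five periods, $\lambda = 1$, and a small hand-written Gaussian elimination in place of a linear algebra library. All function names are hypothetical.

```python
import cmath
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (complex OK)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def regularized_spectrum(times, g, periods, lam):
    """Amplitudes |f_n| of the solution of (H'H + lam*I) f = H'g."""
    H = [[cmath.exp(2j * math.pi * t / p) for p in periods] for t in times]
    n = len(periods)
    A = [[sum(row[i].conjugate() * row[j] for row in H) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i].conjugate() * gm for row, gm in zip(H, g)) for i in range(n)]
    return [abs(v) for v in solve(A, b)]

times = list(range(144))                                  # 6 days, hourly
g = [math.cos(2 * math.pi * t / 24) for t in times]       # synthetic signal
periods = [12.0, 18.0, 24.0, 30.0, 36.0]                  # equidistant periods
amplitudes = regularized_spectrum(times, g, periods, lam=1.0)
p_mod = periods[max(range(len(periods)), key=amplitudes.__getitem__)]
print(p_mod, [round(a, 3) for a in amplitudes])
```

Because only positive-frequency columns are used, a real cosine of amplitude 1 appears with amplitude close to 0.5 at its period. A finer period grid makes $H'H$ more ill-conditioned, and $\lambda$ then controls the trade-off between data fit and stability.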
Bayesian estimation of spectra
◮ Bayesian approach: $p(f|g) \propto p(g|f)\, p(f)$
◮ Use this posterior law to infer on $f$, for example MAP: $\hat{f} = \arg\max_f \{p(f|g)\} = \arg\min_f \{J(f)\}$, but there are other possibilities: posterior mean, median, ...
◮ Assuming Gaussian noise and Gaussian prior: $J(f) = \|g - Hf\|^2 + \lambda \|f\|^2$
◮ Different priors (Gaussian, Generalized Gaussian, Cauchy, ...): $J(f) = \|g - Hf\|^2 + \lambda\, \Omega(f)$
  ◮ Gaussian: $\Omega(f) = \sum_j |f_j|^2$
  ◮ Generalized Gaussian: $\Omega(f) = \sum_j |f_j|^{\beta}$, $1 \le \beta \le 2$
  ◮ Cauchy: $\Omega(f) = \sum_j \ln(1 + |f_j|^2)$
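To see how the choice of $\Omega$ changes which solutions are favoured, the following sketch evaluates the three penalties on two vectors with the same energy: one concentrated on a single component and one spread evenly. The function names and the two test vectors are illustrative.

```python
import math

def omega_gaussian(f):
    return sum(abs(v) ** 2 for v in f)

def omega_generalized_gaussian(f, beta=1.0):
    return sum(abs(v) ** beta for v in f)

def omega_cauchy(f):
    return sum(math.log(1.0 + abs(v) ** 2) for v in f)

sparse = [2.0, 0.0, 0.0, 0.0]   # all the energy in one component
spread = [1.0, 1.0, 1.0, 1.0]   # same energy, evenly distributed

for name, omega in [("Gaussian", omega_gaussian),
                    ("Generalized Gaussian (beta=1)", omega_generalized_gaussian),
                    ("Cauchy", omega_cauchy)]:
    print(name, omega(sparse), omega(spread))
```

The Gaussian penalty cannot tell the two vectors apart, whereas the Generalized Gaussian with $\beta = 1$ and the Cauchy penalty both assign a lower cost to the concentrated vector. This is the sense in which heavy-tailed priors favour spectra with few non-zero components.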
Bayesian estimation of spectra with priors enforcing sparsity
◮ Sparsity: For any periodic signal, the spectrum is a set of Diracs
◮ Biological signals related to clocks: a few independent oscillators
◮ Spectrum has a few non-zero elements in any given interval
◮ How to translate this information?
◮ Use a heavy-tailed prior law like Double exponential or Cauchy
◮ Use a hierarchical prior with hidden variables
◮ See my paper in the EURASIP Journal on Advances in Signal Processing
Available tools for spectral and period estimation
◮ Spectral estimation: f = spectral_estimate(t, g, method, periods)
  methods:
  ◮ FFT (range of periods is imposed)
  ◮ Autocorr+FFT (range of periods is imposed)
  ◮ IP:Gaussian (range of periods can be provided as desired)
  ◮ IP:GG (range of periods can be provided as desired)
  ◮ IP:Cauchy (range of periods can be provided as desired)
◮ Period estimation: [p_mod, a_mod, f, p_l, p_u, p_mean, a_l, a_u, a_mean] = period_estimate(t, g, method, periods)
  methods:
  ◮ Autocorr maxima
  ◮ FFT
  ◮ Autocorr+FFT
  ◮ IP:Gaussian (range of periods can be provided as desired)
  ◮ IP:GG (range of periods can be provided as desired)
  ◮ IP:Cauchy (range of periods can be provided as desired)
Multicomponent period estimation
[Figure: three time series $g_1(t)$, $g_2(t)$, $g_3(t)$ (amplitude versus time) and their FFT amplitudes versus periods.]
$g_k = H f_k + \epsilon_k$, where the $f_k$ have some common spectra.
Dimension reduction, PCA, Factor Analysis, ICA
◮ PCA, Factor Analysis and ICA try to answer the question: how many principal components (factors, independent components) can describe the observed data?
  $g(t) = A f(t) + \epsilon(t)$, with $A$: $(M \times N)$ loading matrix, $N \le M$, and $f(t)$: factors, sources
◮ How to find both $A$ and the factors $f(t)$?
◮ Deterministic methods:
  $(\hat{A}, \hat{f}) = \arg\min_{(A,f)} \{\|g - Af\|^2\}$ s.t. constraints on $A$ and $f$
◮ Bayesian methods:
  $(\hat{A}, \hat{f}) = \arg\max_{(A,f)} \{p(A, f|g)\} = \arg\min_{(A,f)} \{-\ln p(g|A, f) - \ln p(A) - \ln p(f)\}$
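As a small illustration of the deterministic route, the sketch below extracts the first principal component of a synthetic three-variable data set driven by a single common factor. It uses power iteration on the sample covariance matrix, a standard way to obtain the leading eigenvector; the loadings (1, 0.5, -1) and all function names are assumptions made for the example.

```python
import math

def covariance(rows):
    """Sample covariance matrix; rows are observations, columns are variables."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(m)] for i in range(m)]

def leading_component(C, iterations=200):
    """Largest eigenvalue and unit eigenvector of C by power iteration."""
    m = len(C)
    v = [1.0] * m
    for _ in range(iterations):
        w = [sum(C[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(C[i][j] * v[j] for j in range(m)) for i in range(m))
    return eigenvalue, v

# One common factor f(t) observed through three channels: g(t) = a * f(t).
loadings = [1.0, 0.5, -1.0]
factor = [math.cos(2 * math.pi * t / 24) for t in range(72)]
data = [[a * f for a in loadings] for f in factor]

C = covariance(data)
eigenvalue, direction = leading_component(C)
explained = eigenvalue / sum(C[i][i] for i in range(len(C)))
print(round(explained, 6), [round(x / direction[0], 3) for x in direction])
```

With a single noise-free factor, one component explains all the variance and its direction reproduces the loadings up to scale. With noise or several factors, the fraction of explained variance per component is one of the quantities used to decide how many components to keep.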
Deterministic and Bayesian Factor Analysis
◮ Deterministic methods:
  $(\hat{A}, \hat{f}) = \arg\min_{(A,f)} \{\|g - Af\|^2\}$ s.t. constraints on $A$ and $f$: Uncorrelated (PCA), Independent (ICA)
◮ Bayesian methods: $p(A, f|g) \propto p(g(t)|A, f(t))\, p(f(t))\, p(A)$
  $(\hat{A}, \hat{f}) = \arg\max_{(A,f)} \{p(A, f|g)\} = \arg\min_{(A,f)} \{-\ln p(g|A, f) - \ln p(A) - \ln p(f)\}$
  $(\hat{A}, \hat{f}) = \arg\min_{(A,f)} \{\|g - Af\|^2 + \lambda_1 \|A\|_{\beta_1} + \lambda_2 \|f\|_{\beta_2}\}$
  $\beta_1 = 1$ and $\beta_2 = 1$ lead to sparse solutions
◮ These analyses can be done either directly on the time series or on the FT amplitudes.
How to determine the number of factors
◮ Model selection
◮ Bayesian or Maximum likelihood methods
◮ Gaussian case: $p(g|A, N) = \mathcal{N}(0, A \Sigma_f A^t + \Sigma_\epsilon)$
◮ To determine the number of factors, we run the analysis with different numbers of factors $N$ and use two criteria:
  ◮ the negative log likelihood $-\ln p(g|A, N)$ of the observations, and
  ◮ DFE: degrees of freedom error, $((N - M)^2 - (N + M))/2$, related to the AIC or BIC model selection criteria.
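The DFE criterion is easy to tabulate. The sketch below evaluates the formula above for an assumed $M = 14$ observed variables (an illustrative value) and $N = 1, \ldots, 7$ factors, and finds the largest $N$ for which the DFE stays non-negative. The function name `dfe` is hypothetical.

```python
def dfe(n_variables, n_factors):
    """Degrees of freedom error: ((N - M)^2 - (N + M)) / 2."""
    m, n = n_variables, n_factors
    return ((n - m) ** 2 - (n + m)) / 2

M = 14                                    # assumed number of observed variables
table = [dfe(M, n) for n in range(1, 8)]
max_factors = max(n for n in range(1, M) if dfe(M, n) >= 0)
print(table)
print(max_factors)
```

The DFE decreases as factors are added. Once it becomes negative, the factor model has more free parameters than the covariance matrix can determine, which bounds the number of factors worth trying.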
Factor Analysis: 2 factors: Colon
[Figure: loadings of the colon genes (P53, Bax, Mdm2, Bcl2, Ccnb2, Ccna2, Wee1, DBP, UGT, Top1, CE2, Bmal1, Per2, Rev) on Component 1 versus Component 2, for the time series and for the FT amplitudes.]
Factor Analysis: Time series, colon
[Figure: factor loadings of the colon genes (P53, Bax, Mdm2, Bcl2, Ccnb2, Ccna2, Wee1, DBP, UGT, Top1, CE2, Bmal1, Per2, Rev) for models with an increasing number of factors, and a plot of -Log L and DFE against the number of factors (1 to 7).]
Factor Analysis for each class: FT, Liver
[Figure: factor loadings of the liver genes (P53, Bax, Mdm2, Bcl2, P21, Wee1, DBP, UGT, Top1, CE2, Bmal1, Per2, Rev). Top row: two factors; bottom row: three factors. Panels in each row: Class 1, Class 2, Class 3, All Classes.]
Sparse PCA
◮ In classical PCA, FA and ICA, one looks for principal (uncorrelated or independent) components.
◮ In Sparse PCA or FA, one looks for the sparsest components. This selects the smallest number of variables.
[Figure: loadings of the genes (P53, Bax, Mdm2, Bcl2, P21, Wee1, DBP, UGT, Top1, CE2, Bmal1, Per2, Rev) obtained with PCA and with SPCA, for two and for three components.]
Discriminant Analysis
◮ When we have data and classes, the question to answer is: What are the most discriminant factors?
◮ There are many variants:
  ◮ Linear Discriminant Analysis (LDA),
  ◮ Quadratic Discriminant Analysis (QDA),
  ◮ Exponential Discriminant Analysis (EDA),
  ◮ Regularized LDA (RLDA), ...
◮ One can also ask for the Sparsest Linear Discriminant factors (SLDA)
◮ Deterministic point of view (Geometrical distances)
◮ Probabilistic point of view (Mixture densities)
◮ Mixture of Gaussians models: Each class is modelled by a Gaussian pdf
Discriminant Analysis: Time series, Colon
[Figure: classes 1, 2 and 3 in the space of the discriminant components, and the corresponding weights of the genes (P53, Bax, Mdm2, Bcl2, Ccnb2, Ccna2, Wee1, DBP, UGT, Top1, CE2, Bmal1, Per2, Rev).]
Sparse Discriminant Analysis
◮ The question to answer here is: What are the sparsest discriminant factors?
Sparse Discriminant Analysis: Time series, colon
[Figure: classes 1, 2 and 3 in the space of the sparse discriminant components, and the corresponding weights of the genes (P53, Bax, Mdm2, Bcl2, Ccnb2, Ccna2, Wee1, DBP, UGT, Top1, CE2, Bmal1, Per2, Rev).]
Sparse Discriminant Analysis: Time series, Liver
[Figure: classes 1, 2 and 3 in the space of the sparse discriminant components, and the corresponding weights of the genes (P53, Bax, Mdm2, Bcl2, P21, Wee1, DBP, UGT, Top1, CE2, Bmal1, Per2, Rev).]
LDA and SLDA study on time series: 1: before, 2: during, 3: after
[Figure: two panels, LDA-Time and SLDA-Time, showing classes 1, 2 and 3 in the space of the discriminant components.]
Dependency graphs
◮ The main objective here is to show the dependencies between variables
◮ Three different measures can be used: Pearson $\rho$, Spearman $\rho_s$ and Kendall $\tau$
◮ In this study we used $\rho_s$
◮ A table of pairwise $\rho_s$ values is computed and displayed in different forms: Hinton diagram, adjacency table and graphical network representation
[Figure: Hinton diagram, adjacency table and network graph of the pairwise correlations between the genes Rev, Per2, Bmal1, CE2, Top1, UGT, DBP, Wee1, Ccna2, Ccnb2, Bcl2, Mdm2, Bax, P53.]
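A minimal sketch of the pairwise Spearman table and of an adjacency table derived from it. Assumptions: $\rho_s$ is computed as the Pearson correlation of average ranks (the usual definition with ties), the adjacency table keeps pairs with $|\rho_s|$ above a threshold of 0.8 (an arbitrary illustrative value), and the three short series named `a`, `b`, `c` are synthetic.

```python
import math

def ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(x)), key=x.__getitem__)
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = average
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

series = {                                # synthetic data, for illustration only
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [1.0, 4.0, 9.0, 16.0, 25.0],     # monotone increasing function of a
    "c": [5.0, 4.0, 3.0, 2.0, 1.0],       # monotone decreasing function of a
}
names = sorted(series)
table = {(p, q): spearman(series[p], series[q]) for p in names for q in names}
adjacency = {pair: abs(rho) >= 0.8 for pair, rho in table.items() if pair[0] != pair[1]}
print(table[("a", "b")], table[("a", "c")])
```

Because $\rho_s$ depends only on ranks, the nonlinear but monotone pair (`a`, `b`) is reported as perfectly dependent, which Pearson $\rho$ would not do.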
Graph of Dependencies: Colon, Class 1
[Figure: Hinton diagram, adjacency table and network graph of the pairwise correlations between the genes Rev, Per2, Bmal1, CE2, Top1, UGT, DBP, Wee1, Ccna2, Ccnb2, Bcl2, Mdm2, Bax, P53. Top row: time series; bottom row: FT amplitudes.]
Graph of Dependencies: Colon, Class 3
[Figure: Hinton diagram, adjacency table and network graph of the pairwise correlations between the genes Rev, Per2, Bmal1, CE2, Top1, UGT, DBP, Wee1, Ccna2, Ccnb2, Bcl2, Mdm2, Bax, P53. Top row: time series; bottom row: FT amplitudes.]
Classification tools
◮ Supervised classification
  ◮ K nearest neighbors methods
  ◮ Needs training set data
  ◮ Must be careful to measure the performance of the classification on a different set of data (test set)
◮ Unsupervised classification
  ◮ Mixture models
  ◮ Expectation-Maximization methods
  ◮ Bayesian versions of EM
  ◮ Variational Bayesian Approximation (VBA)
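The point about measuring performance on separate data can be sketched with a tiny K nearest neighbors classifier. The two-dimensional points, the class labels and the function names below are synthetic and hypothetical; the accuracy is computed only on points that were not used for training.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, x, k=3):
    """Majority label among the k training points closest to x (Euclidean)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of held-out points whose predicted label is correct."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)

# Training set: two well-separated synthetic classes.
train_x = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
train_y = [1, 1, 1, 1, 2, 2, 2, 2]
# Test set: different points, never seen during training.
test_x = [(0.5, 0.5), (5.5, 5.5), (1, 2), (4, 5)]
test_y = [1, 2, 1, 2]

test_accuracy = accuracy(train_x, train_y, test_x, test_y, k=3)
print(test_accuracy)
```

Evaluating on the training points themselves would give an optimistic score (with k = 1 it is always perfect), which is why a separate test set is needed.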
Classification tools
[Figure: time series (time in hours, 0 to 110) shown in several panels; the panel legends list the curve indices: {15, 16, 17}, {23 to 32}, {5 to 9}, {1, 2, 3, 4, 10, 11, 12, 18, 33, 34}, {13, 14, 19, 20, 21, 22, 35, 36, 37}.]
Input-Output modeling using training data and test data
◮ Linear models
◮ Bayesian framework, MAP estimation with hyperparameter estimation
◮ Careful identification and learning conditions
◮ See the work of Mircea Dumitru et al. on weight loss prediction from the expression of two genes, Bmal1 and Rev-erb-alpha
Application to C5Sys data
◮ Classification of data
  ◮ Outputs of the cell cycle tracking: 3 curves per cell
  ◮ First classification based only on clock activity
  ◮ More general classification based on all variables
  ◮ When classification is done, we can study the relation between CC and clock
◮ Discrimination parameters between classes
◮ Analyzing data before and after the knockdown of some clock genes
◮ We are applying these techniques to temperature-activity data before, during and after some treatment for Chronotherapy
Future
◮ Application to other C5Sys data
  ◮ Some difficulties, but it will be done with Franck and Céline
◮ Modeling and model parameter estimation
  ◮ With Jean and Frédérique, we are going to re-examine the estimation of the parameters of a Gamma pdf
◮ Input-Output modeling using training and test data
  ◮ With Mircea, Xiaome and Francis, we processed some data relating the expression of two genes and toxicity
◮ Inverse problems
◮ Causalities
  ◮ Theoretical studies
Publications in relation with C5Sys
◮ J. Lapuyade-Lahorgue and A. Mohammad-Djafari, Nearest neighbors and correlation dimension for dimensionality estimation. Application to factor analysis of real biological time series data, in ESANN 2011 Proceedings, Gent, Belgium. ISBN 978-2-87419-044-5
◮ A. Mohammad-Djafari, G. Khodabandelou and J. Lapuyade-Lahorgue, A Matlab toolbox for data reduction, visualization, classification and knowledge extraction of complex biological data, BIOCOMP2011, Las Vegas, USA
◮ A. Mohammad-Djafari, Bayesian approach with prior models which enforce sparsity in signal and image processing, review paper accepted for publication in European Association for Signal, Speech, and Image Processing (EURASIP) special issue on Sparsity in signal and image processing.