multivariate data and time series signal

Summary of multicomponent/multivariate data and time series signal analysis tools developed during C5Sys EraSysBio 2010-2013

Ali Mohammad-Djafari
Groupe Problèmes Inverses
Laboratoire des Signaux et Systèmes, UMR 8506 CNRS - SUPELEC - Univ Paris Sud 11
Supélec, Plateau de Moulon, 91192 Gif-sur-Yvette, FRANCE.
[email protected]
http://djafari.free.fr
http://www.lss.supelec.fr

A. Mohammad-Djafari, C5Sys-ERASysBio consortium meeting, April 24-27, 2013, Florence, Italy

Summary

◮ Visualization tools
  ◮ Time domain
  ◮ Transformed domain: Fourier, Wavelets, Splines, ...
  ◮ Scatter plots, histograms, ...
◮ Modeling time series
  ◮ Parametric: Superposition of sinusoids (COSINOR), Superposition of Gaussian shapes, ...
  ◮ Non-parametric: Fourier, Wavelets, ...
  ◮ Probabilistic: Moving Average (MA), Autoregressive (AR), ARMA, Markovian models, ...
◮ Modeling the relation between data/signals
  ◮ Linear / Non linear
  ◮ Training and test data

Summary (continued)

◮ Simple Analysis: Computing spectra, Estimating periods, ...
◮ Multicomponent/Multivariate data analysis: Dimensional Reduction. PCA, FA, ICA, Sparse PCA for dimensional reduction and main factors extraction
◮ Multicomponent/Multivariate Discriminant Analysis with classification: LDA, EDA, RDA, Sparse LDA for finding the most discriminant factors
◮ Blind sources separation
◮ Correlation (Pearson or Spearman) computation and dependency graph visualization
◮ Modelling input-output relations

Visualization and simple analysis example: clock genes time series and their FT

[Figure: two columns of panels, Time Series (left) and Fourier Transform (right), for Colon: Clock Genes (three rows) and Liver: Clock Genes (three rows). Each panel shows Rev, Per2 and Bmal1.]

Scatterplot of Colon clock genes time series

[Figure: pairwise scatter plots of Rev, Per2 and Bmal1, with legend entries 1, 2, 3.]

Temperature and activity time series before, during and after some treatment

[Figure: three panels (BOD: before, BOD: during, BOD: after), each showing Temperature (C) and Activity (MVT) against time, with ticks at 0, 24 and 48.]

Temperatures: Fourier domain analysis

[Figure: three rows of panels, each with the temperature Time series (left, ticks at 0, 24, 48) and its Spectra (right, period ticks at 24, 12, 8, 6, 4, 3, 2, 1).]

Activity: Fourier domain analysis

[Figure: three rows of panels, each with the activity Time series (left, ticks at 0, 24, 48) and its Spectra (right, period ticks at 24, 12, 8, 6, 4, 3, 2, 1).]

Temperatures, before, during and after changes

[Figure: scatter-plot matrices of the variables U1, U2, L1, L2 and act, in four panels: AUE before, AUE during, AUE after, and the three superimposed (AUE: before (blue), during (green), after (red)).]

Temperatures, before, during and after changes: LDA and SLDA components space

[Figure: two scatter-plot matrices of the discriminant components, LDA-Time and SLDA-Time, with legend entries 1, 2, 3.]

Simple Analysis tools: Period estimation. What do we mean by period?

[Figure: three rows of panels, each with a signal against time (hours, ticks at 0, 24, 48, 72) and its spectrum against period in hours:
◮ Case of 3 sinusoids
◮ Case of 3 sinusoids + noise
◮ Case of few sinusoids + noise (period ticks at Inf, 36, 24, 14.4)]

How to define a period from Spectra $S(\omega)$?

◮ Consider $S(\omega)$ as a distribution (normalise such that $\int S(\omega)\, d\omega = 1$)

[Figure: two normalised spectra against period in hours.]

◮ principal harmonic: $\omega_{\text{mod}} = \arg\max_{\omega} \{S(\omega)\}$
◮ mean harmonic: $\omega_{\text{mean}} = \int \omega\, S(\omega)\, d\omega$
◮ Lower and upper limits: $\omega_L, \omega_U$
◮ principal period: $p_{\text{mod}} = \arg\max_{p} \{S(p)\}$
◮ mean period: $p_{\text{mean}} = \int p\, S(p)\, dp$
◮ Lower and upper limits: $p_L, p_U$
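The period summaries above can be sketched for a sampled spectrum as follows. This is a minimal illustration, not the toolbox code: the function name `period_summary` is hypothetical, the integral is replaced by a sum over a grid of periods, and the lower and upper limits are taken as the 5% and 95% quantiles of the normalised spectrum, which is an assumption since the slides do not say how $p_L$ and $p_U$ are defined.

```python
def period_summary(periods, spectrum, lower=0.05, upper=0.95):
    """Summarise a non-negative spectrum sampled on an ascending grid of periods.

    The quantile-based limits are an illustrative assumption; the slides
    do not state how p_L and p_U are defined.
    """
    total = sum(spectrum)
    weights = [s / total for s in spectrum]          # normalise so the sum is 1

    # Principal period: location of the largest spectral value.
    p_mod = periods[max(range(len(weights)), key=weights.__getitem__)]

    # Mean period: first moment of the normalised spectrum.
    p_mean = sum(p * w for p, w in zip(periods, weights))

    # Limits: periods where the cumulative weight crosses the two quantiles.
    p_low = p_high = None
    cumulative = 0.0
    for p, w in zip(periods, weights):
        cumulative += w
        if p_low is None and cumulative >= lower:
            p_low = p
        if p_high is None and cumulative >= upper:
            p_high = p
    return p_mod, p_mean, p_low, p_high


# Synthetic spectrum with a symmetric peak at 24 h (hypothetical values).
periods = [12, 16, 20, 24, 28, 32, 36]
spectrum = [0.1, 0.2, 0.6, 2.0, 0.6, 0.2, 0.1]
p_mod, p_mean, p_low, p_high = period_summary(periods, spectrum)
```

For a symmetric peak the mode and the mean coincide; for a skewed or multi-peaked spectrum they differ, which is why both are reported.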

Fourier Transform, Autocorrelation and Spectra

◮ Monovariate time series:
  ◮ Time series: $g(t)$
  ◮ Autocorrelation function: $\gamma(\tau)$
  ◮ Fourier transform: $f(\omega)$
◮ Spectral density function definitions: $S(\omega)$
  ◮ Deterministic:
    $f(\omega) = \int g(t) \exp[-j\omega t]\, dt \;\longrightarrow\; S(\omega) = |f(\omega)|^2$   (1)
  ◮ Probabilistic:
    $\gamma(\tau) = E\{g(t)\, g(t+\tau)\} \;\longrightarrow\; S(\omega) = \int \gamma(\tau) \exp[-j\omega\tau]\, d\tau$   (2)
  ◮ $S(\omega) = S(2\pi\nu) = S(2\pi/p)$
◮ Multivariate time series
  ◮ $g_1(t), \cdots, g_N(t)$
  ◮ Estimating common factors spectra
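The two spectral density definitions can be compared numerically on a short real signal. The sketch below assumes a discrete, circular setting: the expectation in (2) is replaced by the un-normalised circular sample sum $\gamma(\tau) = \sum_t g(t)\, g(t+\tau)$, and a plain DFT stands in for the integrals. Under those assumptions the two definitions agree up to rounding error. The signal is synthetic.

```python
import cmath
import math

def dft(x):
    """Plain discrete Fourier transform with the exp(-j w t) sign convention."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_autocorrelation(g):
    """Un-normalised circular autocorrelation: sum over t of g(t) g(t + lag)."""
    n = len(g)
    return [sum(g[t] * g[(t + lag) % n] for t in range(n)) for lag in range(n)]

# Synthetic signal: two sinusoids with periods 8 and 4 samples.
g = [math.cos(2 * math.pi * t / 8) + 0.5 * math.sin(2 * math.pi * t / 4)
     for t in range(16)]

periodogram = [abs(v) ** 2 for v in dft(g)]                      # definition (1)
from_gamma = [v.real for v in dft(circular_autocorrelation(g))]  # definition (2)
largest_gap = max(abs(a - b) for a, b in zip(periodogram, from_gamma))
```

With a non-circular (finite-window) autocorrelation the two estimates no longer coincide exactly, which is one reason the autocorrelation route gives smoother spectra.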

How to estimate Spectra $S(\omega)$?

◮ Fast Fourier Transform (FFT): $g(t) \longrightarrow \text{FFT} \longrightarrow f(\omega) \longrightarrow S(\omega) = |f(\omega)|^2$
  ◮ Advantages: well-known and understood, fast
  ◮ Drawbacks: linear in frequencies $\nu$, but not equidistant in periods:
    $\nu = [0, \cdots, N-1] \longrightarrow p = [\infty, N, N/2, \cdots, N/(N-1)]$

[Figure: four spectra against period in hours, with ticks at Inf, 36, 24 and 14.4.]
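The drawback above can be made concrete by listing the periods that correspond to the DFT bins. The sketch below assumes 72 samples at a one-hour sampling step, values chosen only for illustration. The helper name `dft_periods` is hypothetical.

```python
import math

def dft_periods(n_samples, dt=1.0):
    """Periods (in the units of dt) of the DFT bins k = 0 .. n_samples // 2."""
    duration = n_samples * dt
    # Bin k has frequency k / duration, hence period duration / k.
    return [math.inf if k == 0 else duration / k
            for k in range(n_samples // 2 + 1)]

periods = dft_periods(72, dt=1.0)   # 72 hourly samples (assumed)
first_bins = periods[:6]            # inf, 72, 36, 24, 18, then about 14.4
# Spacing between consecutive bin periods shrinks quickly: 36, 12, 6, ...
gaps = [a - b for a, b in zip(periods[1:5], periods[2:6])]
```

Around 24 h the neighbouring bins sit at 36 h and 18 h, so a circadian period of, say, 22 h or 26 h cannot be resolved directly on this grid.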

How to estimate Spectra $S(\omega)$?

◮ Autocorrelation function: $\gamma(\tau)$
  ◮ If $g(t)$ is periodic, then $\gamma(\tau)$ is also periodic, but much smoother
  ◮ $\gamma(0) = 1$ and $\gamma(\tau) \le \gamma(0), \; \forall \tau$
  ◮ Distance between $\gamma(0)$ and the next maximum gives the main period

[Figure: two examples, each with $g(t)$ against time (hours, ticks at 0, 24, 48, 72) and $\gamma(\tau)$ against autocorrelation lags (time in hours, ticks at 0, 12, 24, 36, 48, 60); the detected period is 24 in both.]
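A minimal sketch of this autocorrelation rule, assuming evenly sampled data and a normalised sample autocorrelation. The function names are hypothetical and the test signal is a synthetic 24 h cosine sampled hourly over four days.

```python
import math

def autocorrelation(g, max_lag):
    """Normalised sample autocorrelation gamma(0..max_lag), with gamma(0) = 1."""
    n = len(g)
    mean = sum(g) / n
    centred = [x - mean for x in g]
    energy = sum(x * x for x in centred)
    return [sum(centred[t] * centred[t + lag] for t in range(n - lag)) / energy
            for lag in range(max_lag + 1)]

def main_period(gamma, dt=1.0):
    """Lag of the first local maximum after lag 0, converted to time units."""
    peaks = [k for k in range(1, len(gamma) - 1)
             if gamma[k - 1] < gamma[k] >= gamma[k + 1]]
    return peaks[0] * dt if peaks else None

g = [math.cos(2 * math.pi * t / 24) for t in range(96)]   # synthetic signal
gamma = autocorrelation(g, max_lag=60)
period = main_period(gamma, dt=1.0)
```

The resolution of this estimate is one sampling step, and on noisy data the first local maximum may be a spurious bump, so some smoothing or a minimum-lag guard would be needed in practice.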

Period estimation

[Figure: three examples. Top row: autocorrelation against lags (time in hours), with detected periods 24, 23 and 24. Bottom row: spectrum against period in hours, each annotated with period-mod, period-mean, period-low, period-high, amplitude-mod and amplitude-mean. In all three examples period-mod = 24 and period-mean = 24; the reported period-low values are 17 or 18 and the period-high values are 37, 40 and 42.]

How to estimate Spectra $S(\omega)$?

◮ Fast Fourier Transform (FFT): $g(t) \longrightarrow \text{FFT} \longrightarrow f(\omega) \longrightarrow S(\omega) = |f(\omega)|^2$
  ◮ Advantages: well-known and understood, fast
  ◮ Drawbacks: linear in frequencies $\nu$, but not equidistant in periods:
    $\nu = [0, \cdots, N-1] \longrightarrow p = [\infty, N, N/2, \cdots, N/(N-1)]$
◮ Autocorrelation function: $\gamma(\tau)$
  ◮ If $g(t)$ is periodic, then $\gamma(\tau)$ is also periodic, but much smoother
  ◮ $\gamma(0) = 1$ and $\gamma(\tau) \le \gamma(0), \; \forall \tau$
  ◮ Distance between $\gamma(0)$ and the next maximum gives the main period
◮ Autocorrelation function and FT: $\gamma(\tau) \longrightarrow \text{FFT} \longrightarrow S(\omega)$
◮ Inverse problem approach (Q: compute spectra for given values of periods)
  ◮ $f(p) \longrightarrow g(t)$ is a linear forward operation: $f \longrightarrow H \longrightarrow g$
  ◮ $g = Hf + \epsilon \;\longrightarrow\; \hat{f}$

How to estimate Spectra $S(\omega)$? Inverse Problem Approach

◮ $f(p) \longrightarrow g(t)$ is a linear forward operation:
  $g(t_m) = \sum_n f(\omega_n) \exp[j(2\pi/p_n)\, t_m] \;\longrightarrow\; g = Hf + \epsilon$,
  $H_{mn} = \exp[j(2\pi/p_n)\, t_m], \quad m = 1, \cdots, M, \quad n = 1, \cdots, N$
◮ Discrete (Fast) Fourier Transform (DFT/FFT): $g = Hf$.
  If $M = N$ and if the $p_n$ are chosen such that $\omega_n = 2\pi/p_n = [0 : N-1]\, \omega_0$ with $\omega_0 = 2\pi/(N\, \delta t)$, then $H$ is the DFT matrix and (with $H$ normalised by $1/\sqrt{N}$) $H'H = I \;\longrightarrow\; \hat{f} = H^{-1} g = H' g$
◮ General case: for example when we want to compute $f(\omega)$ for equidistant values of the periods. Then $H'H \neq I$, and even the numbers of data and unknowns may differ.
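The difference between the DFT grid and a grid of equidistant periods can be checked by building $H$ and inspecting $H'H$. This is an illustrative sketch with assumed values (24 hourly samples, periods 2 to 25 h). The function names are hypothetical and $H$ is left unnormalised, so the DFT case gives $H'H = N I$ rather than $I$.

```python
import cmath
import math

def forward_matrix(times, periods):
    """H[m][n] = exp(j 2 pi t_m / p_n); an infinite period gives a constant column."""
    return [[cmath.exp(2j * math.pi * t / p) if math.isfinite(p) else 1 + 0j
             for p in periods] for t in times]

def gram(H):
    """H' H, where ' denotes the conjugate transpose."""
    rows, cols = len(H), len(H[0])
    return [[sum(H[m][a].conjugate() * H[m][b] for m in range(rows))
             for b in range(cols)] for a in range(cols)]

def max_off_diagonal(G):
    """Largest magnitude among the off-diagonal entries of a square matrix."""
    size = len(G)
    return max(abs(G[a][b]) for a in range(size) for b in range(size) if a != b)

N = 24
times = list(range(N))                                    # hourly samples (assumed)
dft_grid = [math.inf] + [N / k for k in range(1, N)]      # periods of the DFT bins
even_grid = [float(p) for p in range(2, 26)]              # equidistant periods, 2..25 h

off_dft = max_off_diagonal(gram(forward_matrix(times, dft_grid)))    # about 0
off_even = max_off_diagonal(gram(forward_matrix(times, even_grid)))  # far from 0
```

On the equidistant grid neighbouring columns such as 24 h and 25 h are almost parallel, so $H'$ is no longer an inverse and the estimation has to be posed as an inverse problem.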

How to estimate Spectra $S(\omega)$? Inverse Problem Approach

◮ General case: $g = Hf$
◮ Generalized inverse or pseudoinverse:
  $\hat{f} = \arg\min_f \{\|g - Hf\|^2\} \;\longrightarrow\; \hat{f} = (H'H)^{-1} H' g$
  $\hat{f} = \arg\min_f \{\|f\|^2\} \text{ s.t. } Hf = g \;\longrightarrow\; \hat{f} = H'(HH')^{-1} g$
◮ Better if we account for ill-conditioning of $H$
◮ Still better, if we account for errors and uncertainties: $g = Hf + \epsilon$

How to estimate Spectra $S(\omega)$? Inverse Problem Approach

◮ Regularization: better if we account for ill-conditioning of $H$
  $\hat{f} = \arg\min_f \{\|g - Hf\|^2 + \lambda \|f\|^2\} \;\longrightarrow\; \hat{f} = (H'H + \lambda I)^{-1} H' g$
◮ Still better, if we account for errors and uncertainties: $g = Hf + \epsilon$
◮ Bayesian approach:
  ◮ Assign the likelihood: $p(g|f)$
  ◮ Assign the prior law: $p(f)$
  ◮ Use the Bayes rule: $p(f|g) = \dfrac{p(g|f)\, p(f)}{p(g)}$
  ◮ Use this posterior law to infer on $f$. For example MAP:
    $\hat{f} = \arg\max_f \{p(f|g)\} = \arg\min_f \{J(f)\}$
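The regularized solution $\hat{f} = (H'H + \lambda I)^{-1} H' g$ can be sketched with the standard library only. Everything below is an illustrative assumption rather than the toolbox implementation: the small Gaussian-elimination solver, the function names, the 72 hourly samples, the period grid and the value of $\lambda$. The data are a synthetic, noise-free 24 h component.

```python
import cmath
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]     # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0j] * n
    for i in reversed(range(n)):                         # back substitution
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def regularised_spectrum(times, g, periods, lam):
    """f_hat = (H'H + lam I)^(-1) H'g with H[m][n] = exp(j 2 pi t_m / p_n)."""
    H = [[cmath.exp(2j * math.pi * t / p) for p in periods] for t in times]
    n, m_count = len(periods), len(times)
    A = [[sum(H[m][a].conjugate() * H[m][b] for m in range(m_count))
          + (lam if a == b else 0.0) for b in range(n)] for a in range(n)]
    rhs = [sum(H[m][a].conjugate() * g[m] for m in range(m_count))
           for a in range(n)]
    return solve(A, rhs)

times = list(range(72))                                  # 72 hourly samples (assumed)
g = [cmath.exp(2j * math.pi * t / 24) for t in times]    # one 24 h component
periods = [12.0, 18.0, 24.0, 30.0, 36.0]                 # equidistant period grid
f_hat = regularised_spectrum(times, g, periods, lam=1e-3)
amplitudes = [abs(v) for v in f_hat]                     # peak expected at 24 h
```

A larger $\lambda$ stabilises the solution on finer period grids at the price of shrinking the estimated amplitudes; choosing it is part of the hyperparameter estimation problem.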

Bayesian estimation of spectra

◮ Bayesian approach: $p(f|g) \propto p(g|f)\, p(f)$
◮ Use this posterior law to infer on $f$, for example MAP:
  $\hat{f} = \arg\max_f \{p(f|g)\} = \arg\min_f \{J(f)\}$
  but there are other possibilities: posterior mean, median, ...
◮ Assuming Gaussian noise and Gaussian prior:
  $J(f) = \|g - Hf\|^2 + \lambda \|f\|^2$
◮ Different priors (Gaussian, Generalized Gaussian, Cauchy, ...):
  $J(f) = \|g - Hf\|^2 + \lambda\, \Omega(f)$
  ◮ Gaussian: $\Omega(f) = \sum_j |f_j|^2$
  ◮ Generalized Gaussian: $\Omega(f) = \sum_j |f_j|^{\beta}, \quad 1 \le \beta \le 2$
  ◮ Cauchy: $\Omega(f) = \sum_j \ln\left(1 + |f_j|^2\right)$
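The effect of the three penalties $\Omega(f)$ can be seen by comparing a sparse vector with a spread-out vector of the same energy. The sketch below uses hypothetical function names and two hand-picked test vectors.

```python
import math

def omega_gaussian(f):
    """Gaussian prior penalty: sum of |f_j|^2."""
    return sum(abs(x) ** 2 for x in f)

def omega_generalised_gaussian(f, beta):
    """Generalized Gaussian prior penalty: sum of |f_j|^beta."""
    return sum(abs(x) ** beta for x in f)

def omega_cauchy(f):
    """Cauchy prior penalty: sum of ln(1 + |f_j|^2)."""
    return sum(math.log(1.0 + abs(x) ** 2) for x in f)

sparse = [2.0, 0.0, 0.0, 0.0]   # one active component
spread = [1.0, 1.0, 1.0, 1.0]   # same energy, spread over four components

penalties = {
    "gaussian": (omega_gaussian(sparse), omega_gaussian(spread)),
    "gg_beta_1": (omega_generalised_gaussian(sparse, 1.0),
                  omega_generalised_gaussian(spread, 1.0)),
    "cauchy": (omega_cauchy(sparse), omega_cauchy(spread)),
}
```

The Gaussian penalty does not distinguish the two vectors, whereas the Generalized Gaussian with $\beta = 1$ and the Cauchy penalty both cost less for the sparse one, so they favour spectra with few non-zero components.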

Bayesian estimation of spectra with priors enforcing sparsity

◮ Sparsity: for any periodic signal, the spectrum is a set of Diracs
◮ Biological signals related to clocks: a few independent oscillators
◮ Spectrum has a few non-zero elements in any given interval
◮ How to translate this information?
◮ Use a heavy-tailed prior law like Double exponential or Cauchy
◮ Use a hierarchical prior with hidden variables
◮ See my paper in the EURASIP Journal on Advances in Signal Processing

Available tools for spectral and period estimation

◮ Spectral estimation:
  f = spectral_estimate(t, g, method, periods)
  methods:
  ◮ FFT (range of periods is imposed)
  ◮ Autocorr+FFT (range of periods is imposed)
  ◮ IP:Gaussian (range of periods can be provided as desired)
  ◮ IP:GG (range of periods can be provided as desired)
  ◮ IP:Cauchy (range of periods can be provided as desired)
◮ Period estimation:
  [p_mod, a_mod, f, p_l, p_u, p_mean, a_l, a_u, a_mean] = period_estimate(t, g, method, periods)
  methods:
  ◮ Autocorr maxima
  ◮ FFT
  ◮ Autocorr+FFT
  ◮ IP:Gaussian (range of periods can be provided as desired)
  ◮ IP:GG (range of periods can be provided as desired)
  ◮ IP:Cauchy (range of periods can be provided as desired)

Multicomponent period estimation

[Figure: three signals $g_1(t)$, $g_2(t)$, $g_3(t)$ (amplitude against time, ticks from 0 to 84), each with its FFT (amplitude against periods, ticks at 8, 12, 18, 24, 28, 32).]

$g_k = H f_k + \epsilon_k$; the $f_k$ have some common spectra.

Dimension reduction, PCA, Factor Analysis, ICA

◮ PCA, Factor Analysis and ICA try to answer the question: how many principal components (factors, independent components) can describe the observed data?
  $g(t) = A f(t) + \epsilon(t)$, with $A$: $(M \times N)$ loading matrix, $N \le M$, and $f(t)$: factors, sources
◮ How to find both $A$ and the factors $f(t)$?
◮ Deterministic methods:
  $(\hat{A}, \hat{f}) = \arg\min_{(A,f)} \{\|g - Af\|^2\}$ s.t. constraints on $A$ and $f$
◮ Bayesian methods:
  $(\hat{A}, \hat{f}) = \arg\max_{(A,f)} \{p(A, f|g)\} = \arg\min_{(A,f)} \{-\ln p(g|A, f) - \ln p(A) - \ln p(f)\}$

Deterministic and Bayesian Factor Analysis

◮ Deterministic methods:
  $(\hat{A}, \hat{f}) = \arg\min_{(A,f)} \{\|g - Af\|^2\}$ s.t. constraints on $A$ and $f$
  Uncorrelated (PCA), Independent (ICA)
◮ Bayesian methods: $p(A, f|g) \propto p(g(t)|A, f(t))\, p(f(t))\, p(A)$
  $(\hat{A}, \hat{f}) = \arg\max_{(A,f)} \{p(A, f|g)\} = \arg\min_{(A,f)} \{-\ln p(g|A, f) - \ln p(A) - \ln p(f)\}$
  $(\hat{A}, \hat{f}) = \arg\min_{(A,f)} \{\|g - Af\|^2 + \lambda_1 \|A\|_{\beta_1} + \lambda_2 \|f\|_{\beta_2}\}$
  $\beta_1 = 1$ and $\beta_2 = 1$ lead to sparse solutions
◮ These analyses can be done either directly on the time series or on the FT amplitudes.

How to determine the number of factors

◮ Model selection
◮ Bayesian or Maximum likelihood methods
◮ Gaussian case: $p(g|A, N) = \mathcal{N}(0, A \Sigma_f A^t + \Sigma_\epsilon)$
◮ To determine the number of factors we do the analysis with different numbers $N$ of factors and use two criteria:
  ◮ the negative log-likelihood $-\ln p(g|A, N)$ of the observations, and
  ◮ DFE: degrees of freedom error $\left((N - M)^2 - (N + M)\right)/2$, related to the AIC or BIC model selection criteria.
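The DFE criterion is simple enough to tabulate directly. The sketch below assumes $M$ is the number of observed variables and $N$ the number of factors, as in the model $g(t) = A f(t) + \epsilon(t)$. The function name is hypothetical and $M = 14$ is an assumed value used only for illustration.

```python
def degrees_of_freedom_error(n_variables, n_factors):
    """DFE = ((M - N)^2 - (M + N)) / 2 for M observed variables and N factors."""
    diff = n_variables - n_factors
    return (diff * diff - (n_variables + n_factors)) / 2

# Illustration with M = 14 observed variables (an assumed value).
dfe_by_factors = {n: degrees_of_freedom_error(14, n) for n in range(1, 8)}
# 1 -> 77, 2 -> 64, 3 -> 52, 4 -> 41, 5 -> 31, 6 -> 22, 7 -> 14
```

The DFE decreases as factors are added while the negative log-likelihood improves, so the two criteria are read together when picking $N$.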

Factor Analysis: 2 factors: Colon

[Figure: loadings of the genes (P53, Bax, Mdm2, Bcl2, Ccnb2, Ccna2, Wee1, DBP, UGT, Top1, CE2, Bmal1, Per2, Rev) in the plane Component 1 against Component 2, for the Time series (left) and the FT Amplitudes (right).]

Factor Analysis: Time series, colon

[Figure: loadings of the genes (P53, Bax, Mdm2, Bcl2, Ccnb2, Ccna2, Wee1, DBP, UGT, Top1, CE2, Bmal1, Per2, Rev) for increasing numbers of factors, and a panel showing -Log L and DFE against the number of factors (1 to 7).]

Factor Analysis for each class: FT, Liver

[Figure: loadings of the genes (P53, Bax, Mdm2, Bcl2, P21, Wee1, DBP, UGT, Top1, CE2, Bmal1, Per2, Rev) for Two Factors (top row) and Three Factors (bottom row), with panels Class 1, Class 2, Class 3 and All Classes.]

Sparse PCA

◮ In classical PCA, FA and ICA, one looks for principal (uncorrelated or independent) components.
◮ In Sparse PCA or FA, one looks for the sparsest components. This leads to selecting the fewest variables.

[Figure: PCA and SPCA loadings, with two and with three components, for the genes P53, Bax, Mdm2, Bcl2, P21, Wee1, DBP, UGT, Top1, CE2, Bmal1, Per2, Rev.]

Discriminant Analysis

◮ When we have data and classes, the question to answer is: what are the most discriminant factors?
◮ There are many variants:
  ◮ Linear Discriminant Analysis (LDA),
  ◮ Quadratic Discriminant Analysis (QDA),
  ◮ Exponential Discriminant Analysis (EDA),
  ◮ Regularized LDA (RLDA), ...
◮ One can also ask for the Sparsest Linear Discriminant factors (SLDA)
◮ Deterministic point of view (geometrical distances)
◮ Probabilistic point of view (mixture densities)
◮ Mixture of Gaussians models: each class is modelled by a Gaussian pdf

Discriminant Analysis: Time series, Colon

[Figure: scatter plots of the discriminant components for classes 1, 2, 3, and discriminant coefficients for the genes P53, Bax, Mdm2, Bcl2, Ccnb2, Ccna2, Wee1, DBP, UGT, Top1, CE2, Bmal1, Per2, Rev.]

Sparse Discriminant Analysis

◮ The question to answer here is: what are the sparsest discriminant factors?

Sparse Discriminant Analysis: Time series, colon

[Figure: scatter plots of the sparse discriminant components for classes 1, 2, 3, and sparse discriminant coefficients for the genes P53, Bax, Mdm2, Bcl2, Ccnb2, Ccna2, Wee1, DBP, UGT, Top1, CE2, Bmal1, Per2, Rev.]

Sparse Discriminant Analysis: Time series, Liver

[Figure: scatter plots of the sparse discriminant components for classes 1, 2, 3, and sparse discriminant coefficients for the genes P53, Bax, Mdm2, Bcl2, P21, Wee1, DBP, UGT, Top1, CE2, Bmal1, Per2, Rev.]

LDA and SLDA study on time series: 1: before, 2: during, 3: after

[Figure: two scatter-plot matrices of the discriminant components, LDA-Time and SLDA-Time, with legend entries 1, 2, 3.]

Dependency graphs

◮ The main objective here is to show the dependencies between variables
◮ Three different measures can be used: Pearson $\rho$, Spearman $\rho_s$ and Kendall $\tau$
◮ In this study we used $\rho_s$
◮ A table of pairwise mutual $\rho_s$ values is computed and used in different forms: Hinton, Adjacency table and Graphical network representation

[Figure: Hinton, Adjacency and Network representations for the genes Rev, Per2, Bmal1, CE2, Top1, UGT, DBP, Wee1, Ccna2, Ccnb2, Bcl2, Mdm2, Bax, P53.]
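The table of pairwise Spearman coefficients and a thresholded adjacency set can be sketched as follows. This is an illustration rather than the toolbox code: all function names are hypothetical, the data are synthetic, and keeping an edge when $|\rho_s|$ reaches a threshold is an assumed rule, since the slides do not state how the adjacency table is derived.

```python
def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rho_s: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def spearman_table(series):
    """Pairwise rho_s for a dict {name: values}; returns {(a, b): rho_s}."""
    names = list(series)
    return {(a, b): spearman(series[a], series[b]) for a in names for b in names}

def adjacency(table, threshold):
    """Keep an edge a-b when |rho_s| reaches the threshold (assumed rule)."""
    return {pair for pair, rho in table.items()
            if pair[0] < pair[1] and abs(rho) >= threshold}

# Synthetic variables (hypothetical values, not the gene expression data).
series = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [1, 4, 9, 16, 25, 36],   # monotone increasing in a
    "c": [6, 5, 4, 3, 2, 1],      # monotone decreasing in a
    "d": [2, 6, 1, 5, 3, 4],      # weakly related to a
}
table = spearman_table(series)
edges = adjacency(table, threshold=0.8)
```

Because $\rho_s$ only uses ranks, the nonlinear but monotone pair (a, b) is reported as fully dependent, which Pearson $\rho$ would underestimate.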

Graph of Dependencies: Colon, Class 1

[Figure: dependency representations (correlation table, adjacency table and network) for the genes Rev, Per2, Bmal1, CE2, Top1, UGT, DBP, Wee1, Ccna2, Ccnb2, Bcl2, Mdm2, Bax, P53, computed on the Time series (top row) and on the FT amplitudes (bottom row).]

Graph of Dependencies: Colon, Class 3

[Figure: dependency representations (correlation table, adjacency table and network) for the genes Rev, Per2, Bmal1, CE2, Top1, UGT, DBP, Wee1, Ccna2, Ccnb2, Bcl2, Mdm2, Bax, P53, computed on the Time series (top row) and on the FT amplitudes (bottom row).]

Classification tools

◮ Supervised classification
  ◮ K nearest neighbors methods
  ◮ Needs training data sets
  ◮ Must be careful to measure the performance of the classification on a different set of data (test set)
◮ Unsupervised classification
  ◮ Mixture models
  ◮ Expectation-Maximization methods
  ◮ Bayesian versions of EM
  ◮ Bayesian Variational Approximation (VBA)
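The supervised route, including the separate test set, can be sketched with a small K nearest neighbors classifier. The function names are hypothetical and the two-class data are synthetic values chosen only for illustration.

```python
from collections import Counter

def knn_predict(train_x, train_y, point, k=3):
    """Majority label among the k training points closest to `point`."""
    by_distance = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], point)))
    votes = Counter(train_y[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of held-out test points that are classified correctly."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)

# Synthetic two-class data (hypothetical values, for illustration only).
train_x = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 1.1), (0.9, 0.8), (1.2, 1.0)]
train_y = [1, 1, 1, 2, 2, 2]
test_x = [(0.1, 0.1), (1.1, 0.9), (0.3, 0.2), (0.8, 1.0)]
test_y = [1, 2, 1, 2]
test_accuracy = accuracy(train_x, train_y, test_x, test_y, k=3)
```

Measuring accuracy on the training points themselves would be optimistic, which is why the test points are kept out of `train_x`.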

Classification tools

[Figure: panels of curves against Time in hours (0 to 110): all curves together, then one panel per class. The legends group the curve indices as {15, 16, 17}, {23, ..., 32}, {5, ..., 9}, {1, 2, 3, 4, 10, 11, 12, 18, 33, 34} and {13, 14, 19, 20, 21, 22, 35, 36, 37}.]

Input-Output modeling using training data and test data

◮ Linear models
◮ Bayesian framework, MAP estimation with hyperparameter estimation
◮ Careful identification and learning conditions
◮ See the work of Mircea Dumitru et al. for weight loss prediction from the expressions of the two genes Bmal1 and Rev-erb-alpha

Application to C5Sys data

◮ Classification of data
  ◮ Outputs of the cell cycle tracking: 3 curves per cell
  ◮ First classification based only on clock activity
  ◮ More general classification based on all variables
  ◮ When classification is done, we can study the relation between CC and clock
◮ Discrimination parameters between classes
◮ Analyzing data before and after some clocks knockdown
◮ We are applying these techniques to temperature-activity data before, during and after some treatment for Chronotherapy

Future

◮ Application to other C5Sys data
◮ Modeling and model parameter estimation
  ◮ With Jean, we are going to re-examine the estimation of the parameters of a Gamma pdf
◮ Input-Output modeling using training and test data
  ◮ With Jean and Frédérique, we are going to re-examine the estimation of the parameters of a Gamma pdf
◮ Inverse problems
  ◮ Some difficulties, but it will be done with Franck and Céline
  ◮ With Mircea, Xiaome and Francis, we processed some data relating two genes expressions and toxicity
◮ Causalities
  ◮ Theoretical studies

Publications in relation with C5Sys

◮ J. Lapuyade-Lahorgue and A. Mohammad-Djafari, Nearest neighbors and correlation dimension for dimensionality estimation. Application to factor analysis of real biological time series data, in ESANN 2011 Proceedings, Gent, Belgium. ISBN 978-2-87419-044-5
◮ A. Mohammad-Djafari, G. Khodabandelou and J. Lapuyade-Lahorgue, A Matlab toolbox for data reduction, visualization, classification and knowledge extraction of complex biological data, BIOCOMP2011, Las Vegas, USA
◮ A. Mohammad-Djafari, Bayesian approach with prior models which enforce sparsity in signal and image processing, review paper accepted for publication in European Association for Signal, Speech, and Image Processing (EURASIP) special issue on Sparsity in signal and image processing.