Estimating the periodic components of a biomedical signal through inverse problem modelling and Bayesian inference with sparsity enforcing prior

Mircea Dumitru∗,† and Ali-Mohammad Djafari∗

∗ Laboratoire des signaux et systèmes (L2S), UMR 8506 CNRS–SUPÉLEC–Univ. Paris-Sud, SUPÉLEC, Plateau de Moulon, 91192 Gif-sur-Yvette, France
† Rythmes Biologiques et Cancers (RBC), UMR 776 INSERM–Univ. Paris-Sud, Campus CNRS, 94801 Villejuif, France

Abstract. Recent developments in chronobiology call for an analysis of the variation of the periodic components of signals expressing biological rhythms, and therefore for a precise estimation of the periodic components vector. The classical approaches, based on FFT methods, are inefficient given the particularities of the data (short length). In this paper we propose a new method that uses the sparsity of the periodic components vector (a reduced number of non-zero components) as prior information. The considered law is the Student-t distribution, viewed as the marginal of an Infinite Gaussian Scale Mixture (IGSM) defined via a hidden variable representing the inverse variances and modelled by a Gamma distribution. The hyperparameters are modelled using conjugate priors, i.e. Inverse Gamma distributions. The expression of the joint posterior law of the unknown periodic components vector, hidden variables and hyperparameters is obtained, and the unknowns are then estimated via Joint Maximum A Posteriori (JMAP) and Posterior Mean (PM). For the PM estimator, the posterior law is approximated by a separable one via the Bayesian Variational Approximation (BVA), using the Kullback-Leibler (KL) divergence. Finally we show results on synthetic data in cancer treatment applications.

Keywords: Bayesian Parameter Estimation (BPE), Variational Bayesian Approximation (VBA), Kullback-Leibler (KL) divergence, Gaussian Scale Mixture (GSM), Non-Informative Prior Law (NIPL), Joint Maximum A Posteriori (JMAP), Posterior Mean (PM), Chronobiology.

INTRODUCTION Chronobiology is a field of biology that examines periodic (cyclic) phenomena in living organisms and their adaptation to certain exterior rhythms. These cycles are known as biological rhythms. The mammalian circadian timing system consists of a master pacemaker in the suprachiasmatic nucleus of the hypothalamus and subsidiary molecular clocks in most peripheral cell types, synchronized by the day-night cycle and generating circadian (∼ 24h) oscillations [4]. The development of in vivo bioluminescence recording technologies makes it possible to monitor circadian biomarkers in peripheral tissues over a number of consecutive days, providing time series data. One major interest is to estimate these periodic components and to study the stability of the dominant periods, which requires an analysis of the variation of the dominant periodic components. The major limitation when studying such signals is their reduced length. Such signals are often obtained in cancer treatment experiments, where the periodic components of biomedical signals observed for healthy organisms are compared to those of organisms inoculated with cancer. As the tumor grows every day until the death of the mice used in the experiments, the result is a non-stationary biomedical signal with an increasing trend and a very short length. The objective of an accurate description of the variation of the periodic components during the growth of the cancer tumor can therefore be formulated as the need for a method that gives a precise estimation of the periodic components from a limited number of data.

CLASSICAL FFT BASED METHODS Spectral analysis of time series has been a well known subject in the literature for a very long time. The most common methods are Fourier Transform (FT) based methods, widely used for many applications in the signal processing community due to several obvious advantages: they are well known, well understood and fast, via the FFT. Nevertheless, the particularities of the biomedical signals considered in chronobiology experiments show that the classical methods have certain limitations. In particular, for time series that are short relative to the periodic components of interest (in the experiment considered in this article, a 96h recorded signal relative to a 24h periodic component linked with the circadian clock), the precision given by the FFT methods is insufficient to determine the underlying periodic components. This is a consequence of the fact that with FFT based methods, even though the frequencies are equidistant, the periods are not. In particular, for a four day (96h) recorded biomedical signal, the amplitudes nearest to the 24h periodic component in the periodic components vector correspond to the 32h and 19.2h periodic components. More generally, if the prior knowledge sets the principal period around a value P, it is easy to see that in order to obtain a period vector that contains the period P together with the periods P − 1 and P + 1, the signal must be observed over a time interval that is a common multiple of P − 1, P and P + 1, i.e. up to (P − 1)P(P + 1) hours, which is beyond real experiments.
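The non-equidistant period grid of the FFT can be made explicit in a few lines (a numerical sketch of the limitation just described; the 96 h window and the circadian range are taken from the text):

```python
import numpy as np

# A signal observed over T hours, sampled hourly, has FFT bins at frequencies
# k/T, i.e. at periods T/k: equidistant in frequency, not in period.
T = 96                         # 4 days, as in the paper
k = np.arange(1, T // 2)       # positive-frequency bin indices
periods = T / k                # periods reachable by the FFT grid

# Periods available inside the circadian domain [18, 32] h:
near_24 = periods[(periods >= 18) & (periods <= 32)]
print(near_24)                 # only 32 h, 24 h and 19.2 h
```

To place 23 h, 24 h and 25 h simultaneously on this grid, T would have to be a common multiple of all three, i.e. up to 23 · 24 · 25 = 13800 hours, far beyond a real experiment; this is the (P − 1)P(P + 1) argument.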

INVERSE PROBLEM APPROACH AND GENERAL BAYESIAN INFERENCE The proposed method for improving the precision consists in formulating the problem as an inverse problem:

g(t_n) = ∑_{m=1}^{M} f(p_m) e^{2πj t_n / p_m} + ε_n,  n ∈ {1, . . . , N}   (1)

where g is the observed time series and p_m are the periods. With the notation g(t_n) = g_n and f(p_m) = f_m, and defining the vectors f = [f_1, f_2, . . . , f_M]^T, g = [g_1, g_2, . . . , g_N]^T and ε = [ε_1, ε_2, . . . , ε_N]^T, we obtain the following:

g = H f + ε   (2)
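Equation (2) can be instantiated numerically. The sketch below builds a real-valued H with one atom per candidate period, following the description given later in the simulation section ("a sum of a sine and cosine"); the exact stacking and the noise level are illustrative assumptions:

```python
import numpy as np

# Candidate periods (hours): the circadian domain [18, 32] with 1 h precision,
# as in the simulation section of the paper.
periods = np.arange(18, 33)                  # 15 candidate periods
t = np.arange(96)                            # 4 days of hourly samples
# Real-valued analogue of the e^{2*pi*j*t/p} atoms: each column of H is a sum
# of a cosine and a sine at one candidate period (the precise construction used
# by the authors is only described qualitatively, so this is an assumption).
H = np.cos(2 * np.pi * t[:, None] / periods) + np.sin(2 * np.pi * t[:, None] / periods)

f = np.zeros(periods.size)
f[periods == 24] = 1.0                       # a single, sparse 24 h component
eps = 0.1 * np.random.default_rng(0).standard_normal(t.size)
g = H @ f + eps                              # eq. (2): g = H f + eps
print(H.shape, g.shape)                      # (96, 15) (96,)
```

Note that the columns of H on this fine period grid are strongly correlated, which is exactly why a sparsity prior, rather than plain least squares, is needed.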

In this paper we adopt a Bayesian approach, using the prior knowledge to estimate f through Bayes' rule. Such an approach requires a probabilistic model that assigns particular distributions to the prior and likelihood laws. A necessary extension of Bayesian inference for real world applications is the case where the hyperparameters θ = (θ_1, θ_2) involved are not known and must also be estimated, from the joint posterior law:

p(f, θ_1, θ_2 | g) ∝ p(g | f, θ_1) p(f | θ_2) p(θ_1) p(θ_2)   (3)

In this way we obtain an unsupervised method. The main steps are then the assignment of p(g | f, θ_1), p(f | θ_2), p(θ_1), p(θ_2) and the computation of the posterior p(f, θ_1, θ_2 | g), in order to infer f and the hyperparameters θ = (θ_1, θ_2).

HIERARCHICAL MODEL AND SPARSITY ENFORCING PRIOR MODEL The first step in constructing the hierarchical model is to model the errors in equation (2). The lack of any prior information other than the zero mean and limited variance v_ε leads to a Multivariate Normal model for the error vector, p(ε) = N(ε | 0, v_ε I), from which we deduce the likelihood:

p(g | f, v_ε) = N(g | H f, v_ε I)   (4)

The next crucial step is to assign a distribution for the prior law that encodes our prior knowledge about f, namely sparsity. In the literature [1], certain classes of distributions (heavy-tailed, mixture models) are well known as sparsity inducing priors. In this work we propose a Student-t distribution. A direct assignment of a Student-t distribution for p(f) leads to a non-quadratic criterion when estimating f: f̂ = arg min_f J(f), J(f) = ‖g − H f‖² + λ ∑_{j=1}^{M} log(1 + f_j²). However, the Student-t distribution can be expressed as an Infinite Gaussian Scale Mixture via a hidden variable z, which provides the inverse variances of f:

p(f | z, v_f) = N(f | 0, v_f Z⁻¹),  Z = diag[z_1, z_2, . . . , z_M]
p(z | α_z, β_z) = ∏_{j=1}^{M} G(z_j | α_{z_j}, β_{z_j})   (5)
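The mixture representation of equation (5) can be checked numerically: drawing the hidden inverse variance z from a Gamma distribution and then f | z from a zero-mean Gaussian with variance 1/z yields Student-t samples. The sketch below uses the standard parametrization ν = 2α with rate β = ν/2; the degrees of freedom and sample size are arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu = 3.0                                   # degrees of freedom
n = 200_000

# Hidden variable: inverse variances z ~ Gamma(shape=nu/2, rate=nu/2)
# (numpy parametrizes Gamma by shape and *scale*, hence scale = 2/nu).
z = rng.gamma(shape=nu / 2, scale=2.0 / nu, size=n)
# Conditionally Gaussian samples f | z ~ N(0, 1/z)
f = rng.normal(size=n) / np.sqrt(z)

# The marginal of f is Student-t with nu degrees of freedom: its quartiles
# should match the empirical quartiles of the mixture samples.
print(np.quantile(f, [0.25, 0.75]))        # close to +/- 0.7649...
print(stats.t(df=nu).ppf([0.25, 0.75]))
```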

The Bayesian approach also requires the assignment of distributions for the hyperparameters. The computationally convenient conjugate prior concept leads to an Inverse Gamma distribution assigned to both variances:

p(v_ε | α_ε0, β_ε0) = IG(v_ε | α_ε0, β_ε0)
p(v_f | α_f0, β_f0) = IG(v_f | α_f0, β_f0)   (6)

The likelihood, the prior and the hyperparameter distributions yield the posterior law:

p(f, z, v_ε, v_f | g) ∝ p(g | f, v_ε) p(f | z, v_f) p(z | α_z, β_z) p(v_ε | α_ε0, β_ε0) p(v_f | α_f0, β_f0)   (7)

BAYESIAN COMPUTATION AND PROPOSED ALGORITHM The JMAP estimation of all the unknowns f, z, v_ε, v_f on the basis of the available data g is defined as:

(f̂, ẑ, v̂_ε, v̂_f) = arg max_{(f, z, v_ε, v_f)} p(f, z, v_ε, v_f | g) = arg min_{(f, z, v_ε, v_f)} J(f, z, v_ε, v_f),   (8)

where we defined J(f, z, v_ε, v_f) = − ln p(f, z, v_ε, v_f | g). Alternate optimization with respect to the different unknowns f, z_j, v_ε and v_f results in the following algorithm:

f̃ = (H^T H + λ Z)⁻¹ H^T g,  λ = v_ε / v_f ;   z̃_j = (α_{z_j} − 1/2) / (β_{z_j} + f_j² / (2 v_f)) ;

ṽ_ε = (β_ε0 + (1/2) ‖g − H f‖²) / (α_ε0 + 1 + N/2) ;   ṽ_f = (β_f0 + (1/2) ‖Z^{1/2} f‖²) / (α_f0 + 1 + M/2)   (9)
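The alternating updates of equation (9) translate directly into code (a minimal sketch; the hyperparameter values, the fixed iteration count and the initialization Z = I are illustrative choices, not prescribed by the paper):

```python
import numpy as np

def jmap(g, H, alpha_z=2.0, beta_z=2.0,
         alpha_e0=2.0, beta_e0=1.0, alpha_f0=2.0, beta_f0=1.0, n_iter=50):
    """Alternate optimization of eq. (9): update f, z, v_eps and v_f in turn."""
    N, M = H.shape
    z = np.ones(M)                  # start from Z = I
    v_eps, v_f = 1.0, 1.0
    for _ in range(n_iter):
        lam = v_eps / v_f
        # f-update: regularized least squares with diagonal weight Z
        f = np.linalg.solve(H.T @ H + lam * np.diag(z), H.T @ g)
        # z-update: mode of the conditional Gamma posterior of each z_j
        z = (alpha_z - 0.5) / (beta_z + f**2 / (2 * v_f))
        # variance updates: modes of the conditional Inverse Gamma posteriors
        v_eps = (beta_e0 + 0.5 * np.sum((g - H @ f)**2)) / (alpha_e0 + 1 + N / 2)
        v_f = (beta_f0 + 0.5 * np.sum(z * f**2)) / (alpha_f0 + 1 + M / 2)
    return f, z, v_eps, v_f
```

On a well-conditioned H (e.g. periods taken on the Fourier grid of the observation window) the iteration recovers a sparse f with little shrinkage; on the paper's fine period grid the z-updates are what concentrate the energy on few components.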

Another possible estimation is the Posterior Mean (PM). An advantage of this estimator is that it minimizes the Mean Square Error (MSE). However, the posterior distribution obtained from the considered hierarchical model is not separable, which complicates the analytical computation of the PM. One way to compute the PM in this case is to first approximate the posterior law p(f, z, v_ε, v_f | g) with a separable law q(f, z, v_ε, v_f):

p(f, z, v_ε, v_f | g) ≈ q(f, z, v_ε, v_f) = q_1(f) [∏_{j=1}^{M} q_{2j}(z_j)] q_3(v_ε) q_4(v_f),   (10)

in such a way that the approximation q(f, z, v_ε, v_f) is obtained by minimizing the Kullback-Leibler divergence

KL(q : p) = ∫∫ . . . ∫ q(f, z, v_ε, v_f) ln [q(f, z, v_ε, v_f) / p(f, z, v_ε, v_f | g)] df dz dv_ε dv_f   (11)

via alternate optimization. Thanks to the choice of exponential families for the priors and of conjugate priors for the hyperparameters, we obtain:

q_1(f) ∝ exp{ −(1/2) ⟨v_ε⁻¹⟩_{q_3(v_ε)} ‖g − H f‖² − (1/2) ⟨v_f⁻¹⟩_{q_4(v_f)} ‖Ỹ f‖² }

q_{2j}(z_j) ∝ exp{ (α_{z_j} − 1/2) ln z_j − β_{z_j} z_j − (1/2) ⟨v_f⁻¹⟩_{q_4(v_f)} ⟨‖Ỹ_{−j} f‖²⟩_{q_1(f)} }

q_3(v_ε) ∝ exp{ −(α_ε0 + 1 + N/2) ln v_ε − [(1/2) ⟨‖g − H f‖²⟩_{q_1(f)} + β_ε0] v_ε⁻¹ }

q_4(v_f) ∝ exp{ −(α_f0 + 1 + M/2) ln v_f − [(1/2) ⟨‖Ỹ f‖²⟩_{q_1(f) q_2(z)} + β_f0] v_f⁻¹ }   (12)

where Ỹ_{−j} = diag[⟨z_1⟩^{1/2}, . . . , ⟨z_{j−1}⟩^{1/2}, z_j^{1/2}, ⟨z_{j+1}⟩^{1/2}, . . . , ⟨z_M⟩^{1/2}] (the expectations being taken under the corresponding factors q_{2i}(z_i)), Ỹ = diag[⟨z_1⟩^{1/2}, . . . , ⟨z_M⟩^{1/2}] and q_2(z) = ∏_{j=1}^{M} q_{2j}(z_j). The argument of the exponential of q_1(f) can be written as a criterion J(f) which is quadratic in f: J(f) = ‖g − H f‖² + λ̃ ‖Ỹ f‖², where λ̃ = ⟨v_f⁻¹⟩_{q_4(v_f)} / ⟨v_ε⁻¹⟩_{q_3(v_ε)}, showing that q_1(f) is a Normal distribution. Its mean is the solution that minimizes the criterion J(f), the same criterion that appeared in the MAP estimation of f, with the only formal difference being that the regularization parameter is no longer fixed and must be estimated. The covariance matrix is obtained by identification:

q_1(f) = N(f | m̃, Σ̃),  m̃ = (H^T H + λ̃ Z̃)⁻¹ H^T g,  Σ̃ = (H^T H + λ̃ Z̃)⁻¹,  λ̃ = ⟨v_f⁻¹⟩_{q_4(v_f)} / ⟨v_ε⁻¹⟩_{q_3(v_ε)},   (13)

where Z̃ = Ỹ^T Ỹ = diag[⟨z_1⟩, . . . , ⟨z_M⟩]. For q_{2j}(z_j), the integral ⟨‖Ỹ_{−j} f‖²⟩_{q_1(f)} can be computed using the Multivariate Normal distribution parameters: ⟨‖Ỹ_{−j} f‖²⟩_{q_1(f)} = Tr(Ỹ_{−j}^T Ỹ_{−j} Σ̃) + m̃^T Ỹ_{−j}^T Ỹ_{−j} m̃. Since the relation between q_{2j}(z_j) and its analytical expression is one of proportionality, the only terms of the integral that matter are those containing z_j. Summarizing all the other terms by a constant C, the integral can be written as ⟨‖Ỹ_{−j} f‖²⟩_{q_1(f)} = C + z_j Σ̃_jj + z_j m̃_j². The expression of q_{2j}(z_j) then becomes q_{2j}(z_j) ∝ exp{ (α_{z_j} − 1/2) ln z_j − [β_{z_j} + (1/2) (Σ̃_jj + m̃_j²) ⟨v_f⁻¹⟩_{q_4(v_f)}] z_j }:

q_{2j}(z_j) = G(z_j | α̃_{z_j}, β̃_{z_j}),  α̃_{z_j} = α_{z_j} + 1/2,  β̃_{z_j} = β_{z_j} + (1/2) (Σ̃_jj + m̃_j²) ⟨v_f⁻¹⟩_{q_4(v_f)}   (14)

Both q_3(v_ε) and q_4(v_f) are Inverse Gamma distributions. In general, for a multivariate Normal distribution r(f) with parameters m and Σ, the equalities ⟨f⟩_{r(f)} = m and ⟨f^T A f⟩_{r(f)} = Tr(A Σ) + m^T A m hold for every symmetric matrix A. Since ‖g − H f‖² = ‖g‖² − 2 g^T H f + ‖H f‖², with H^T H symmetric, it is easy to show that ⟨‖g − H f‖²⟩_{q_1(f)} = ‖g − H m̃‖² + Tr(H^T H Σ̃), so the parameters of q_3(v_ε) are:

q_3(v_ε) = IG(v_ε | α̃_ε, β̃_ε),  α̃_ε = α_ε0 + N/2,  β̃_ε = β_ε0 + (1/2) [‖g − H m̃‖² + Tr(H^T H Σ̃)]   (15)

Also, ⟨‖Ỹ f‖²⟩_{q_1(f) q_2(z)} = ‖Ỹ m̃‖² + Tr(Z̃ Σ̃), so the parameters of q_4(v_f) are:

q_4(v_f) = IG(v_f | α̃_f, β̃_f),  α̃_f = α_f0 + M/2,  β̃_f = β_f0 + (1/2) [‖Ỹ m̃‖² + Tr(Z̃ Σ̃)]   (16)

Because q_3(v_ε) and q_4(v_f) are Inverse Gamma distributions with known parameters, λ̃ can be determined via equation (13). Using the definition of Ỹ, the fact that q_{2j}(z_j) is a Gamma distribution with parameters α̃_{z_j} and β̃_{z_j}, and the fact that for any Gamma distribution G(x | α, β) the equality ⟨x⟩_{G(x|α,β)} = α/β holds, Ỹ can also be determined:

λ̃ = (α̃_f β̃_ε) / (β̃_f α̃_ε) ;   Ỹ = diag[(α̃_{z_1}/β̃_{z_1})^{1/2}, . . . , (α̃_{z_j}/β̃_{z_j})^{1/2}, . . . , (α̃_{z_M}/β̃_{z_M})^{1/2}]   (17)

After the estimation, all the unknowns are expressed in analytic form, leading to the following iterative algorithm: (a) Initialization; (b) Use equations (13) and (17) to compute m̃, Σ̃; (c) Use equations (14) and (17) to compute Z̃; (d) Use equation (15) to compute α̃_ε, β̃_ε; (e) Use equation (16) to compute α̃_f, β̃_f. For initializing the parameters we proceed as follows: (a) For the parameters α̃_ε^(0) and β̃_ε^(0), corresponding to the Inverse Gamma distribution describing the probability density function q_3(v_ε), we consider the initialization α̃_ε^(0) = α_ε0, β̃_ε^(0) = β_ε0, i.e. the values of the Inverse Gamma distribution describing the noise variance in the hierarchical model, IG(v_ε | α_ε0, β_ε0). The mean and the variance of v_ε are expressed via the two Inverse Gamma parameters α_ε0 and β_ε0:

E[v_ε] = β_ε0 / (α_ε0 − 1), for α_ε0 > 1 ;   Var[v_ε] = β_ε0² / ((α_ε0 − 1)² (α_ε0 − 2)), for α_ε0 > 2   (18)
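Equation (18) can be inverted to recover (α_ε0, β_ε0) from a desired mean and variance, which is how the initialization below exploits the data. Note the squared mean: solving E = β/(α − 1) and Var = β²/((α − 1)²(α − 2)) gives α = E²/Var + 2 and β = E(E²/Var + 1). A quick round trip checks the inversion (the numerical values are arbitrary):

```python
def ig_params_from_moments(mean, var):
    """Invert eq. (18): Inverse Gamma (alpha, beta) from a desired mean and variance."""
    alpha = mean**2 / var + 2
    beta = mean * (mean**2 / var + 1)
    return alpha, beta

def ig_moments(alpha, beta):
    """Eq. (18): mean (alpha > 1) and variance (alpha > 2) of an Inverse Gamma."""
    mean = beta / (alpha - 1)
    var = beta**2 / ((alpha - 1)**2 * (alpha - 2))
    return mean, var

a, b = ig_params_from_moments(0.5, 0.1)
print(ig_moments(a, b))   # recovers (0.5, 0.1) up to rounding
```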

Both values, E[v_ε] and Var[v_ε], correspond to the noise variance (the mean of the noise variance and the variance of the noise variance), an unknown of our model. Nevertheless, the known input g contains the error vector (i.e. g represents the theoretical signal plus the noise), and in a Bayesian framework this information can be exploited to obtain numerical values for E[v_ε] and Var[v_ε]. Considering the mean E[g] of the known data, the mean of the noise variance E[v_ε] can be defined via the squared norm of the difference between the data and its mean, and the variance of the noise variance Var[v_ε] can be defined in the same manner: E[v_ε] = ‖g − E[g]‖², Var[v_ε] = ‖(g − E[g]) − E[g − E[g]]‖². Equation (18) links the parameters α_ε0 and β_ε0 with these numerical values derived from the known data:

α_ε0 = E[v_ε]² / Var[v_ε] + 2 ;   β_ε0 = E[v_ε] (E[v_ε]² / Var[v_ε] + 1)   (19)

In particular, such an approach is consistent with an unsupervised algorithm. (b) For the parameter Z^(0) we consider the initialization Z^(0) = I. This choice is motivated by the structure of the solution f̃ obtained via the proposed algorithm, presented in equation (13): with Z = I, the estimate becomes f̃ = (H^T H + λ̃ I)⁻¹ H^T g, which is the solution corresponding to three classical methods:

Quadratic Regularization (QR), Minimum Norm Least Squares (MNLS) and the Bayesian approach with a Gaussian prior. The interpretation of this choice is that we take as the step-zero value of f the solution given by the classical methods. (c) For the parameters α̃_f^(0) and β̃_f^(0), we consider the initialization α̃_f^(0) = α_f0, β̃_f^(0) = β_f0. A natural choice in this case is a Non-Informative Prior Law (NIPL): the Inverse Gamma distribution is weakly informative for α_f → 0 and β_f → 0, so one possible choice is α_f = β_f = 0.001.
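The iterative algorithm of steps (a)–(e) can be sketched end to end (an illustrative implementation; the hyperparameter values and iteration count are my own choices, and the shape parameters are set once since equations (14)–(16) only update the β's):

```python
import numpy as np

def vba(g, H, alpha_z=2.0, beta_z=2.0,
        alpha_e0=2.0, beta_e0=1.0, alpha_f0=2.0, beta_f0=1.0, n_iter=50):
    """Posterior Mean via the Variational Bayesian Approximation, eqs. (13)-(17)."""
    N, M = H.shape
    # Shape parameters are constant across iterations; only the betas change.
    a_z = alpha_z + 0.5                       # eq. (14)
    a_e = alpha_e0 + N / 2                    # eq. (15)
    a_f = alpha_f0 + M / 2                    # eq. (16)
    b_z = np.full(M, beta_z)
    b_e, b_f = beta_e0, beta_f0
    for _ in range(n_iter):
        # eq. (17): expectations under the current Gamma / Inverse Gamma factors
        lam = (b_e / a_e) * (a_f / b_f)       # <v_f^-1> / <v_eps^-1>
        z_mean = a_z / b_z                    # <z_j> under the Gamma factors
        # eq. (13): Gaussian factor q1(f)
        Sigma = np.linalg.inv(H.T @ H + lam * np.diag(z_mean))
        m = Sigma @ H.T @ g
        # eq. (14): Gamma factors q2j(z_j)
        b_z = beta_z + 0.5 * (np.diag(Sigma) + m**2) * (a_f / b_f)
        # eqs. (15)-(16): Inverse Gamma factors q3(v_eps), q4(v_f)
        b_e = beta_e0 + 0.5 * (np.sum((g - H @ m)**2) + np.trace(H.T @ H @ Sigma))
        b_f = beta_f0 + 0.5 * (np.sum(z_mean * m**2) + np.trace(np.diag(z_mean) @ Sigma))
    return m, Sigma
```

The PM estimate of f is the mean m of q1(f), and the diagonal of Sigma provides the variances reported alongside the estimates in the simulation section.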

SIMULATIONS To validate the proposed method, we analysed synthetic data. In the real case the theoretical f is unknown, so the only possible comparison is between the available g (the real data) and the estimate g̃ (obtained via reconstruction from the estimate f̃); an important step in validating the method is therefore to consider signals with a known periodic components vector, which makes it possible to compare f and the estimate f̃. We consider the following protocol: (a) Consider a sparse amplitude periodic components vector f, Figure 1, (a). For the simulations used in this article, we analysed a periodic components vector on the interval associated with the circadian domain, i.e. [18, 32], with one hour precision; (b) Compute

FIGURE 1. Periodic Components f and the corresponding theoretical signal g = H f

the corresponding signal g via the linear model without errors, representing the theoretical input (4 days long), Figure 1, (b). The matrix operator H used is a real matrix obtained in the same manner as the Fourier Transform matrix, using the considered periods, each atom being defined as a sum of a sine and a cosine; (c) Compute the

FIGURE 2. Noisy Signal g = H f + ε (a) and the corresponding FFT Spectrum (b)

noisy signal g (the input for the proposed algorithm) from the original one by adding noise (a random normally distributed vector of amplitude 0.4), Figure 2, (a), and its corresponding spectrum obtained via the Fast Fourier Transform, Figure 2, (b); (d) Compare the periodic

FIGURE 3. (IP: Student-t) Method vs. Theoretical Data: theoretical and estimated periodic components (residual error L2 norm: 0.126) (a); theoretical and reconstructed signal (residual error L2 norm: 0.0235) (b)

components vector f with the estimated one f̃, Figure 3, (a). The proposed method also provides the variances; (e) Compare the original signal g and the reconstructed one g̃, Figure 3, (b).

CONCLUSIONS In this article we have presented a new method that can estimate the spectrum of short signals. Comparing the result of the proposed method with the result of the Fast Fourier Transform, we see that, due to the length of the signal, the FFT method does not correctly estimate the spectrum, while the proposed method accurately estimates all (three) peaks of the spectrum. Future work involves a more detailed analysis of the initialization part of the algorithm, i.e. a better understanding of the impact of the hyperparameters. An implementation of the proposed algorithm on real data is currently in progress.

REFERENCES
1. Mohammad-Djafari A., “Bayesian approach with prior models which enforce sparsity in signal and image processing,” EURASIP Journal on Advances in Signal Processing, vol. 1, pp. 52–71, 2012.
2. Wainwright M.J., Simoncelli E.P., “Scale Mixtures of Gaussians and the Statistics of Natural Images,” Advances in Neural Information Processing Systems, vol. 12, 2000.
3. Bouman C.A., Sauer K.D., “A generalized Gaussian image model for edge-preserving MAP estimation,” IEEE Transactions on Image Processing, vol. 2, pp. 296–310, 1993.
4. Lévi F., Okyar A., Dulong S. et al., “Circadian Timing in Cancer Treatments,” Annual Review of Pharmacology and Toxicology, vol. 50, pp. 377–421, 2010.