Separation of Mixed Hidden Markov Model Sources

Hichem Snoussi and Ali Mohammad-Djafari

Laboratoire des Signaux et Systèmes (L2S), Supélec, Plateau de Moulon, 91192 Gif-sur-Yvette Cedex, France

Abstract. In this contribution, we consider the problem of source separation in the case of noisy instantaneous mixtures. In a previous work [1], sources were modeled by a mixture of Gaussians, leading to a hierarchical Bayesian model in which the labels of the mixture are hidden variables. However, in that work, the labels were assumed to be i.i.d. We extend this model to incorporate a Markovian structure for the labels. This extension is important for practical applications, which are abundant: unsupervised classification and segmentation, pattern recognition, speech signal processing, etc. In order to estimate the mixing matrix and the a priori model parameters, we consider the observations as incomplete data. The missing data are the sources and the labels: sources are missing data for the observations, and labels are missing data for the sources. This hierarchical model leads to specific restoration-maximization type algorithms. The restoration step can be carried out in three different ways: (i) the complete likelihood is estimated by its conditional expectation, which leads to the EM (expectation-maximization) algorithm [2]; (ii) the missing data are estimated by their maximum a posteriori, which leads to the JMAP (joint maximum a posteriori) algorithm [3]; (iii) the missing data are sampled from their a posteriori distributions, which leads to the SEM (stochastic EM) algorithm [4]. A Gibbs sampling scheme is implemented to generate the missing data.

INTRODUCTION

We consider the problem of source separation in the noisy linear instantaneous case:

$$x_t = A\, s_t + \epsilon_t$$

where $x_t$ is the $m$-vector of observations, $s_t$ the $n$-vector of sources, $\epsilon_t$ an additive Gaussian white noise with covariance $R_\epsilon$, and $A$ the $(m \times n)$ mixing matrix.
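For concreteness, here is a minimal NumPy sketch of this observation model; the dimensions, the noise level and the placeholder i.i.d. Gaussian sources are illustrative assumptions only (the hidden Markov source model introduced below would replace the placeholder sources):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, m = 1000, 2, 3                       # time samples, sources, sensors (illustrative)
A = rng.normal(size=(m, n))                # unknown mixing matrix
S = rng.normal(size=(T, n))                # placeholder sources; see the HMM model below
R_eps = 0.1 * np.eye(m)                    # noise covariance R_epsilon
noise = rng.multivariate_normal(np.zeros(m), R_eps, size=T)
X = S @ A.T + noise                        # x_t = A s_t + eps_t, stacked over t
```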

Given the observations $x_{1..T}$, our purpose is to estimate the mixing matrix $A$ and then to solve the inverse linear problem to extract the sources $s_{1..T}$. The mixing matrix is estimated by maximizing its a posteriori distribution ([5], [6], [7], [1]):

$$\hat{A} = \arg\max_{A}\; p(A \mid x_{1..T}, \mathcal{M})$$

where $\mathcal{M}$ is the assumed model of the mixture structure. The posterior distribution of $A$ is obtained by integrating over the joint distribution of the sources $s_{1..T}$ and the labels $z_{1..T}$:

$$p(A \mid x_{1..T}, \theta) \;\propto\; \sum_{z_{1..T}} \int p(x_{1..T} \mid A, s_{1..T}, \theta_\epsilon)\, p(A \mid \theta_A)\, p(s_{1..T}, z_{1..T} \mid \theta_s)\; ds_{1..T} \qquad (1)$$

The vector $\theta = (\theta_\epsilon, \theta_A, \theta_s)$ contains all the hyperparameters of the parametric distributions of the noise $p(\epsilon_{1..T} \mid \theta_\epsilon)$, the mixing matrix $p(A \mid \theta_A)$ and the sources $p(s_{1..T}, z_{1..T} \mid \theta_s)$.

CHOICE OF PRIOR DISTRIBUTIONS

Sources model

We model the component $s^j$ by a hidden Markov chain distribution. A basic presentation of this model is to consider it as a double stochastic process:

1. A continuous stochastic process $(s^j_1, s^j_2, \ldots, s^j_T)$ taking its values in $\mathbb{R}$.
2. A hidden discrete stochastic process $(z^j_1, z^j_2, \ldots, z^j_T)$ taking its values in $\{1, \ldots, K_j\}$.

The $(z^j_t)_{t=1..T}$ form a homogeneous Markov chain with initial probability vector $\pi_j = [p(z^j_1 = h)]_{h=1..K_j}$ and transition matrix $P_j = [p(z^j_{t+1} = r \mid z^j_t = h)]_{h,r=1..K_j}$. Conditionally on this chain, the source $s^j$ is time independent, $p(s^j_{1..T} \mid z^j_{1..T}) = \prod_{t=1}^{T} p(s^j_t \mid z^j_t)$, and has a Gaussian law $p(s^j_t \mid z^j_t = h) = \mathcal{N}(\mu_{jh}, \sigma^2_{jh})$. This model is very convenient for at least two reasons:

• It is an interesting alternative to non-parametric modeling.
• It is a convenient representation of weakly dependent phenomena.
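As an illustration of this double stochastic process, the following sketch simulates one such source component; the two-state parameters at the end are hypothetical values chosen only to show a switch between a low-variance and a high-variance regime:

```python
import numpy as np

def simulate_hmm_source(T, pi, P, mu, sigma2, rng=None):
    """Simulate one source s^j: hidden labels follow a homogeneous Markov
    chain (pi, P); given the label, s^j_t ~ N(mu[label], sigma2[label])."""
    rng = rng or np.random.default_rng()
    K = len(pi)
    z = np.empty(T, dtype=int)
    z[0] = rng.choice(K, p=pi)
    for t in range(1, T):
        z[t] = rng.choice(K, p=P[z[t - 1]])
    s = rng.normal(mu[z], np.sqrt(sigma2[z]))
    return s, z

# e.g. a two-state source switching between a low- and a high-variance regime
s, z = simulate_hmm_source(1000, pi=[0.5, 0.5],
                           P=np.array([[0.95, 0.05], [0.05, 0.95]]),
                           mu=np.array([0.0, 0.0]),
                           sigma2=np.array([0.1, 2.0]))
```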



Mixing matrix model

We suppose that the mixing matrix coefficients $A_{ij}$ follow Gaussian laws:

$$p(A_{ij}) = \mathcal{N}(M_{ij}, \sigma^2_{A_{ij}})$$

which can be interpreted as knowing every element $M_{ij}$ with some uncertainty $\sigma^2_{A_{ij}}$.
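A draw from this prior is straightforward; the prior means and the common uncertainty below are hypothetical values used only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
M = np.array([[1.0, 0.5],
              [0.3, 1.0],
              [0.2, 0.8]])               # prior means M_ij (illustrative values)
sigma_A = 0.1                            # common prior standard deviation, for simplicity
A_sample = rng.normal(M, sigma_A)        # one draw of A from its Gaussian prior
```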

DATA AUGMENTATION ALGORITHMS

The sources $(s_t)_{t=1..T}$ are not directly observed, so they form a second level of hidden variables, the first level being represented by the labels $(z_t)_{t=1..T}$ of the density mixture. Thus, the separation problem consists of two mixing operations: a mixture of densities, which is a mathematical representation of our a priori distribution with unknown hyperparameters $\theta_s$, and a real physical mixture of sources with unknown mixing matrix $A$:

$$z \;\xrightarrow{\ \text{mixing densities } (\theta_s)\ }\; s \;\xrightarrow{\ \text{mixing sources } (A)\ }\; x$$

We have an incomplete data problem. The incomplete data are the observations $(x_t)_{t=1..T}$; the missing data are the sources $(s_t)_{t=1..T}$ and the vector labels $(z_t)_{t=1..T}$. The parameters to be estimated are $\theta = (A, R_\epsilon, \theta_s)$. This incomplete data structure suggests the development of restoration-maximization algorithms. Starting with an initial point $\theta^{(0)}$, perform two steps:

• Restoration: given the current estimate $\theta^{(k)}$, any function $h(s, z)$ of the missing data is replaced by an attributed value $h^{(k)}$.
• Maximization: find $\theta^{(k+1)}$ which maximizes the penalized complete likelihood $p(x, s, z \mid \theta)\, p(\theta)$.

The restoration step can be carried out in three different ways (a schematic sketch of the resulting iteration follows the list):

1. $h^{(k)}$ is the conditional expectation of $h(s, z)$, which leads to the EM algorithm.
2. The hidden variables are replaced by their maximum a posteriori.
3. The hidden variables are sampled according to their a posteriori distribution.
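A minimal sketch of the resulting iteration, with `restore` and `maximize` standing for whichever restoration strategy and maximization step are chosen (these function names are placeholders, not from the paper):

```python
# Schematic restoration-maximization loop; `restore` stands for any of the
# three strategies (conditional expectation, MAP, or posterior sampling) and
# `maximize` for the penalized complete-likelihood maximization.
def restoration_maximization(x, theta0, restore, maximize, n_iter=100):
    theta = theta0
    for _ in range(n_iter):
        stats = restore(x, theta)     # restoration step: attribute values to h(s, z)
        theta = maximize(x, stats)    # maximization step: argmax of p(x, s, z | theta) p(theta)
    return theta
```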

In the following, we give an overview of each strategy.

Exact EM algorithm

The functional $Q(\theta, \theta^{(k)}) = \mathbb{E}\big[\log p(x_{1..T}, s_{1..T}, z_{1..T} \mid \theta)\,\big|\, x_{1..T}, \theta^{(k)}\big]$, computed in the first step of the EM algorithm, is separable into three functionals $Q_A$, $Q_{\mu\sigma}$ and $Q_{\pi P}$:

$$Q = Q_A + Q_{\mu\sigma} + Q_{\pi P}$$

• The first functional $Q_A$ depends on $A$ and $R_\epsilon$.
• The second functional $Q_{\mu\sigma}$ depends on the means and variances $(\mu_{jh}, \sigma^2_{jh})_{j=1..n,\; h=1..K_j}$ of the Gaussian mixture.
• The third functional $Q_{\pi P}$ depends on the initial probabilities and transition matrices $(\pi_j, P_j)_{j=1..n}$ of the Markov chains.

$(A, R_\epsilon)$-maximization:

The functional to be optimized at each iteration is:

$$Q_A(A, R_\epsilon \mid \theta^{(k)}) = -\frac{T}{2}\,\log\left|R_\epsilon\right| \;-\; \frac{T}{2}\,\mathrm{Tr}\!\left[ R_\epsilon^{-1}\left( R_{xx} - A\,R_{sx} - R_{xs}\,A^{*} + A\,R_{ss}\,A^{*} \right)\right]$$

where the statistics $R_{xx}$, $R_{sx} = R_{xs}^{*}$ and $R_{ss}$ involve the a posteriori label probabilities and conditional source moments:

$$R_{xx} = \frac{1}{T}\sum_{t=1}^{T} x_t\,x_t^{*}, \qquad R_{sx} = \frac{1}{T}\sum_{t=1}^{T}\sum_{i} p(z_t = i \mid x_{1..T}, \theta^{(k)})\, \mathbb{E}\left[s_t \mid x_t, z_t = i\right] x_t^{*},$$
$$R_{ss} = \frac{1}{T}\sum_{t=1}^{T}\sum_{i} p(z_t = i \mid x_{1..T}, \theta^{(k)})\, \mathbb{E}\left[s_t\,s_t^{*} \mid x_t, z_t = i\right],$$

with $i = (i_1, \ldots, i_n)$, $i_j \in \{1, \ldots, K_j\}$, where $K_j$ is the number of Gaussians of each source component. Thus, we have $K_1 K_2 \cdots K_n$ vector labels $i$ in the previous sums.
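Once $R_{xx}$, $R_{sx}$ and $R_{ss}$ are available, one convenient way to maximize $Q_A$ is the standard closed form obtained by zeroing its gradient; the sketch below states it as an illustration under simplifying assumptions (real-valued data, invertible $R_{ss}$), not as a quotation of the paper's update:

```python
import numpy as np

def maximize_Q_A(Rxx, Rsx, Rss):
    """Closed-form maximizer of the quadratic functional Q_A over (A, R_eps),
    given the E-step statistics R_xx, R_sx (source/observation cross-moment)
    and R_ss (source second moment)."""
    Rxs = Rsx.T
    A_new = Rxs @ np.linalg.inv(Rss)      # A = R_xs R_ss^{-1}
    R_eps_new = Rxx - A_new @ Rsx         # = R_xx - R_xs R_ss^{-1} R_sx
    return A_new, R_eps_new
```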

The a posteriori expectations of the sources, given the variables $z_t = i$, are easily derived:

$$\mathbb{E}\left[s_t \mid x_t, z_t = i\right] = \left(A^{*} R_\epsilon^{-1} A + \Gamma_i^{-1}\right)^{-1}\left(A^{*} R_\epsilon^{-1} x_t + \Gamma_i^{-1} m_i\right)$$
$$\mathbb{E}\left[s_t\,s_t^{*} \mid x_t, z_t = i\right] = \left(A^{*} R_\epsilon^{-1} A + \Gamma_i^{-1}\right)^{-1} + \mathbb{E}\left[s_t \mid x_t, z_t = i\right]\,\mathbb{E}\left[s_t \mid x_t, z_t = i\right]^{*}$$

However, the computation of the marginal probabilities $p(z_t = i \mid x_{1..T}, \theta)$ represents the major part of the computational cost.
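These conditional moments are simply the posterior mean and second moment of a Gaussian linear model; a direct translation (real-valued case, plain matrix inversion for readability) could look like:

```python
import numpy as np

def posterior_source_moments(x, A, R_eps, m_i, Gamma_i):
    """Posterior mean and second moment of s_t given x_t and label z_t = i,
    under x = A s + eps with eps ~ N(0, R_eps) and s | z=i ~ N(m_i, Gamma_i)."""
    Ri = np.linalg.inv(R_eps)
    Gi = np.linalg.inv(Gamma_i)
    V = np.linalg.inv(A.T @ Ri @ A + Gi)          # posterior covariance
    mean = V @ (A.T @ Ri @ x + Gi @ m_i)          # E[s | x, z = i]
    second = V + np.outer(mean, mean)             # E[s s^T | x, z = i]
    return mean, second
```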

The Baum-Welch procedure [?] can be extended to the case where the sources are not directly observed. We define the Forward $\alpha_t(i)$ and Backward $\beta_t(i)$ variables by:

$$\alpha_t(i) = p(z_t = i \mid x_{1..t}, \theta), \qquad \beta_t(i) = \frac{p(x_{t+1..T} \mid z_t = i, \theta)}{p(x_{t+1..T} \mid x_{1..t}, \theta)} \qquad (2)$$

The computation of these variables is performed by recurrence formulas (complexity $O(K^2)$ per time step, with $K = K_1 \cdots K_n$ the number of vector labels):

$$\alpha_1(i) = c_1^{-1}\,\pi_i\,\mathcal{N}\!\left(x_1;\, A\,m_i,\; A\,\Gamma_i\,A^{*} + R_\epsilon\right), \qquad \alpha_{t+1}(i) = c_{t+1}^{-1}\Big[\sum_{l} \alpha_t(l)\,P_{l i}\Big]\,\mathcal{N}\!\left(x_{t+1};\, A\,m_i,\; A\,\Gamma_i\,A^{*} + R_\epsilon\right)$$

$$\beta_T(i) = 1, \qquad \beta_t(i) = c_{t+1}^{-1}\sum_{l} P_{i l}\,\mathcal{N}\!\left(x_{t+1};\, A\,m_l,\; A\,\Gamma_l\,A^{*} + R_\epsilon\right)\beta_{t+1}(l)$$

where the $c_t$ are normalization constants:

$$c_1 = \sum_{i} \pi_i\,\mathcal{N}\!\left(x_1;\, A\,m_i,\; A\,\Gamma_i\,A^{*} + R_\epsilon\right), \qquad c_{t+1} = \sum_{i}\Big[\sum_{l} \alpha_t(l)\,P_{l i}\Big]\,\mathcal{N}\!\left(x_{t+1};\, A\,m_i,\; A\,\Gamma_i\,A^{*} + R_\epsilon\right)$$

and where $m_i$ and $\Gamma_i$ collect the scalar means and variances of the vector label $i = (i_1, \ldots, i_n)$:

$$m_i = \begin{bmatrix} \mu_{1 i_1} \\ \vdots \\ \mu_{n i_n} \end{bmatrix}, \qquad \Gamma_i = \begin{bmatrix} \sigma^2_{1 i_1} & & \\ & \ddots & \\ & & \sigma^2_{n i_n} \end{bmatrix}.$$

Then $p(z_t = i \mid x_{1..T}, \theta)$ is easily derived as:

$$p(z_t = i \mid x_{1..T}, \theta) = \alpha_t(i)\,\beta_t(i)$$
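A compact sketch of this scaled forward-backward recursion over the vector labels, assuming precomputed vector-label means `means[i]` ($= m_i$) and covariances `covs[i]` ($= \Gamma_i$), and using SciPy only to evaluate the Gaussian densities:

```python
import numpy as np
from scipy.stats import multivariate_normal

def forward_backward(X, A, R_eps, means, covs, pi, P):
    """X: (T, m) observations; means: (K, n); covs: (K, n, n);
    pi: (K,) initial probabilities; P: (K, K) vector-label transition matrix."""
    T = X.shape[0]
    K = len(pi)
    # Gaussian likelihoods N(x_t ; A m_i, A Gamma_i A^T + R_eps)
    lik = np.empty((T, K))
    for i in range(K):
        cov_i = A @ covs[i] @ A.T + R_eps
        lik[:, i] = multivariate_normal(A @ means[i], cov_i).pdf(X)
    alpha = np.empty((T, K)); beta = np.empty((T, K)); c = np.empty(T)
    # forward pass with scaling: alpha_t(i) = p(z_t = i | x_{1..t})
    alpha[0] = pi * lik[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ P) * lik[t]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    # backward pass reusing the same scaling constants
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (P @ (lik[t + 1] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta          # p(z_t = i | x_{1..T})
    return alpha, beta, gamma, c
```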

The spatial independence of the source components, or more precisely the spatial independence of the labels, implies:

$$\pi_i = p(z_1 = i) = \prod_{j=1}^{n} p(z^j_1 = i_j) = \prod_{j=1}^{n} \pi_{j,\, i_j}, \qquad P_{i l} = \prod_{j=1}^{n} P_{j,\, i_j l_j}$$

where $\pi_j$ is the initial probability vector of the Markov chain of the component $j$ and $P_j$ its transition matrix.
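In code, this factorization means the vector-label chain can be assembled from the per-source chains with Kronecker products, assuming the vector label $i$ is enumerated with $i_1$ as the slowest-varying index:

```python
import numpy as np
from functools import reduce

def vector_label_chain(pis, Ps):
    """Combine independent per-source chains (pi_j, P_j) into the
    vector-label chain (pi, P) via Kronecker products."""
    pi = reduce(np.kron, pis)   # p(z_1 = i) = prod_j p(z_1^j = i_j)
    P  = reduce(np.kron, Ps)    # P_{i,l}   = prod_j P_{j, i_j l_j}
    return pi, P
```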

$(\mu, \sigma)$-maximization:

In order to establish the connection with the estimation of the parameters of hidden Markov models when the sources are directly observed, and to elucidate the origin of the high computational cost of the hyperparameter re-estimation, we begin with the vectorial formulas and then give the scalar expressions of interest. The vector label $i$ denotes $(i_1, \ldots, i_n)$, the vector $m_i$ denotes $[\mu_{1 i_1}, \ldots, \mu_{n i_n}]^{*}$ and the matrix $\Gamma_i$ denotes $\mathrm{diag}(\sigma^2_{1 i_1}, \ldots, \sigma^2_{n i_n})$, as above.

The re-estimation of the vectorial means and covariances yields:

$$m_i^{(k+1)} = \frac{\sum_{t=1}^{T} p(z_t = i \mid x_{1..T}, \theta^{(k)})\; \mathbb{E}\left[s_t \mid x_t, z_t = i, \theta^{(k)}\right]}{\sum_{t=1}^{T} p(z_t = i \mid x_{1..T}, \theta^{(k)})}$$

$$\Gamma_i^{(k+1)} = \frac{\sum_{t=1}^{T} p(z_t = i \mid x_{1..T}, \theta^{(k)})\; \mathbb{E}\left[s_t\,s_t^{*} \mid x_t, z_t = i, \theta^{(k)}\right]}{\sum_{t=1}^{T} p(z_t = i \mid x_{1..T}, \theta^{(k)})} \;-\; m_i^{(k+1)}\,\big(m_i^{(k+1)}\big)^{*}$$

with $\mathbb{E}\left[\cdot \mid x_t, z_t = i, \theta^{(k)}\right]$ the conditional expectations computed above.
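A vectorized sketch of these two updates, taking as inputs the label posteriors and the conditional moments computed in the E-step (the array shapes are assumptions of this sketch):

```python
import numpy as np

def update_vector_means_covs(gamma, s_mean, s_outer):
    """gamma: (T, K) posteriors p(z_t = i | x_{1..T});
    s_mean: (T, K, n) conditional means E[s_t | x_t, z_t = i];
    s_outer: (T, K, n, n) conditional moments E[s_t s_t^T | x_t, z_t = i]."""
    w = gamma.sum(axis=0)                                    # (K,)
    m = np.einsum('tk,tkn->kn', gamma, s_mean) / w[:, None]
    S = np.einsum('tk,tknm->knm', gamma, s_outer) / w[:, None, None]
    Gamma = S - np.einsum('kn,km->knm', m, m)                # subtract m_i m_i^T
    return m, Gamma
```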

The re-estimation of the scalar means and variances is obtained by a spatial marginalization of the vector labels in the previous expressions:

$$\mu_{j h}^{(k+1)} = \frac{\sum_{t=1}^{T} \sum_{i:\, i_j = h} p(z_t = i \mid x_{1..T}, \theta^{(k)})\; \mathbb{E}\left[s^j_t \mid x_t, z_t = i, \theta^{(k)}\right]}{\sum_{t=1}^{T} \sum_{i:\, i_j = h} p(z_t = i \mid x_{1..T}, \theta^{(k)})}$$

$$\big(\sigma^2_{j h}\big)^{(k+1)} = \frac{\sum_{t=1}^{T} \sum_{i:\, i_j = h} p(z_t = i \mid x_{1..T}, \theta^{(k)})\; \mathbb{E}\left[(s^j_t)^2 \mid x_t, z_t = i, \theta^{(k)}\right]}{\sum_{t=1}^{T} \sum_{i:\, i_j = h} p(z_t = i \mid x_{1..T}, \theta^{(k)})} \;-\; \big(\mu_{j h}^{(k+1)}\big)^2$$

We can see clearly that, in addition to the marginalization in time needed to compute the quantities $p(z_t = i \mid x_{1..T}, \theta^{(k)})$, we have to perform another marginalization in the spatial domain.
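The spatial marginalization itself is a simple reshape-and-sum over the vector-label axis, assuming the same label enumeration as in the Kronecker construction above:

```python
import numpy as np

def marginalize_labels(gamma, sizes, j):
    """Spatially marginalize the vector-label posteriors: from
    p(z_t = i | x_{1..T}) over i = (i_1..i_n) to p(z_t^j = h | x_{1..T}).
    gamma: (T, K_1*...*K_n); sizes: (K_1, ..., K_n); j: zero-based source index."""
    T = gamma.shape[0]
    g = gamma.reshape((T, *sizes))                  # (T, K_1, ..., K_n)
    axes = tuple(a for a in range(1, len(sizes) + 1) if a != j + 1)
    return g.sum(axis=axes)                         # (T, K_j)
```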

$(\pi, P)$-maximization:

The re-estimation of the initial probabilities and the stochastic matrices for the vectorial labels yields:

$$\pi_i^{(k+1)} = p(z_1 = i \mid x_{1..T}, \theta^{(k)}), \qquad P_{i l}^{(k+1)} = \frac{\sum_{t=2}^{T} p(z_{t-1} = i,\, z_t = l \mid x_{1..T}, \theta^{(k)})}{\sum_{t=2}^{T} p(z_{t-1} = i \mid x_{1..T}, \theta^{(k)})}$$

In the same way, the probabilities of the scalar labels are derived from the above expressions by spatial marginalization:

$$\pi_{j,\, h}^{(k+1)} = \sum_{i:\, i_j = h} p(z_1 = i \mid x_{1..T}, \theta^{(k)}), \qquad P_{j,\, h r}^{(k+1)} = \frac{\sum_{t=2}^{T} \sum_{i:\, i_j = h}\sum_{l:\, l_j = r} p(z_{t-1} = i,\, z_t = l \mid x_{1..T}, \theta^{(k)})}{\sum_{t=2}^{T} \sum_{i:\, i_j = h} p(z_{t-1} = i \mid x_{1..T}, \theta^{(k)})}$$

The expressions of $p(z_{t-1} = i,\, z_t = l \mid x_{1..T}, \theta)$ are obtained directly from the Forward and Backward variables defined by (2):

$$p(z_{t-1} = i,\, z_t = l \mid x_{1..T}, \theta) = c_t^{-1}\,\alpha_{t-1}(i)\,P_{i l}\,\mathcal{N}\!\left(x_t;\, A\,m_l,\; A\,\Gamma_l\,A^{*} + R_\epsilon\right)\beta_t(l)$$
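Putting the pieces together, here is a sketch of the $(\pi, P)$ re-estimation for the vector labels from the scaled forward/backward quantities, using the conventions of the forward-backward sketch above (`lik[t, l]` holds the Gaussian density $\mathcal{N}(x_t; A m_l, A \Gamma_l A^{*} + R_\epsilon)$):

```python
import numpy as np

def update_chain_params(alpha, beta, lik, P, c):
    """Re-estimate (pi, P) for the vector labels from the joint probabilities
    xi_t(i, l) = alpha_{t-1}(i) P_{il} N(x_t; A m_l, ...) beta_t(l) / c_t."""
    T, K = alpha.shape
    gamma = alpha * beta                     # p(z_t = i | x_{1..T})
    pi_new = gamma[0]                        # p(z_1 = i | x_{1..T})
    num = np.zeros((K, K))
    for t in range(1, T):
        xi = alpha[t - 1][:, None] * P * (lik[t] * beta[t])[None, :] / c[t]
        num += xi                            # accumulate sum_t p(z_{t-1}=i, z_t=l | x)
    P_new = num / num.sum(axis=1, keepdims=True)
    return pi_new, P_new
```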