Image fusion and unsupervised joint segmentation using a HMM and MCMC algorithms

Olivier Féron† and Ali Mohammad-Djafari†

† Laboratoire des signaux et systèmes (LSS), UMR 8506 (CNRS-Supélec-UPS), Supélec, plateau de Moulon, 3 rue Joliot-Curie, 91192 Gif-sur-Yvette, France

Abstract

In this paper we propose a Bayesian framework for unsupervised image fusion and joint segmentation. More specifically, we consider the case where we have observed images of the same object through different imaging processes or through different spectral bands (multi- or hyperspectral images). The objective of this work is to propose a coherent approach to combine these images and obtain a joint segmentation which can be considered as the fusion result of these observations. The proposed approach is based on a Hidden Markov Model (HMM) of the images where the hidden variables represent the common classification or segmentation labels. These label variables are modeled by a Potts Markov Random Field (PMRF). We propose two particular models for the pixels in each segment (iid or Markovian) and develop appropriate Markov Chain Monte Carlo (MCMC) algorithms for their implementation. Finally, we present some simulation results to show the relative performances of these models and mention the potential applications of the proposed methods in medical imaging and in survey and security imaging systems.

Key words: data fusion, segmentation, Markov random field, multispectral images, HMM, MCMC, Gibbs sampling.

1 Introduction

Data fusion of multi-source information has become a very active area of research in many domains: industrial non-destructive testing and evaluation ([1]), industrial inspection ([2]), and medical imaging ([3, 4, 5, 6, 7]). In all these domains the main objective of image fusion schemes is to extract all the useful information from the source images and represent it in a single image. There is a large literature describing techniques of image fusion, which use different approaches:

• Pixel-based approach: These methods are the simplest and work directly on the pixels of the source images ([8]). For example, the very intuitive method of averaging consists in constructing a pixel of the fused image by averaging the corresponding pixels of the source images. These methods can be used if the different images represent the same physical quantity (luminance for example) on the same scale. Their main limitation is that, very often, different images do not represent the same physical quantity.

• Feature-based and transform-domain approach: The main idea here is to extract particular features of the images (contours, regions) which are more robust to pixel-value scaling and variations, and then use data fusion techniques to obtain common features. This domain is more developed in the fusion literature and considers that the fused image must preserve all the features of the source images. To extract those features, typical methods use pyramid transforms (wavelet, Laplacian, gradient, ...) ([9, 10]), which were particularly developed because they give information on contours or contrast changes, to which human vision is particularly sensitive. In these methods the coefficients in the transform domain represent the characteristics of the source images. The fusion then consists in selecting the main coefficients of the sensor images according to certain criteria, constructing a fused image in the transform domain, and finally applying the inverse transform to obtain the resulting fused image.

• Image fusion after PCA or ICA: When the number of images to fuse becomes large (hyperspectral images) it may be necessary to first extract the principal (Principal Component Analysis, PCA [11]) or independent (Independent Component Analysis, ICA [12, 13]) components, and then use image fusion techniques on these components.

• Probabilistic model-based approach: This type of approach consists in introducing a model which represents a relationship between the observed images and the source images or some particular features of them ([5, 7, 8, 14]). The model can also take into account noise and unknown parameters such as the registration parameters of the images. These methods may be supervised or not. In the supervised case a training step, or more generally a pre-processing step, is used to estimate the parameters of the image model ([15]). In the unsupervised case these parameters are estimated from the data themselves.

These different approaches are not exhaustive and not independent, and they can be mixed in hybrid methods. In all these methods there are two different objectives:

• to obtain an image which represents all the information of the sources. Because human vision is very sensitive to contrast changes in an image, this objective is often reduced to constructing a segmentation in which all the regions and contours of the different sources are represented;

• to improve the reconstruction of an image by using complementary information present in other data sets.

The method presented in this work can be classified as a probabilistic model-based approach, and our objective is to obtain a common segmentation and to improve the reconstruction at the same time. The main problem is how to combine the information contents of the different sets of data g_i(r). Very often the data sets g_i, and the corresponding images f_i, do not represent the same quantities. A general model for these problems is the following:

g_i(r) = [H_i f_i](r) + ε_i(r),   i = 1, …, M   (1)

where the H_i are the functional operators of the measuring systems, or registration operators if the observations have to be registered. We may note that estimating f_i given each set of data g_i is an inverse problem by itself.

Figure 1: Examples of images for data fusion and joint segmentation. a) T1-weighted, T2-weighted and T1-weighted with contrast agent transversal slices of a 3D brain MR image. b) Two observations from transmission and backscattering X-rays in security systems (with the permission of American Science and Engineering, Inc., 2003).

In this paper we consider the case where the measuring systems can be assumed almost perfect and the observations registered, which means that we can write:

g_i(r) = f_i(r) + ε_i(r),   i = 1, …, M   (2)

for r ∈ Z². Note that since we consider images, the pixels r belong to a finite lattice S, and we denote by S the number of pixels of this lattice. In the following we also use the notations:

g_i = f_i + ε_i   or   g = f + ε   (3)

where g_i = {g_i(r), r ∈ S} and g = {g_i, i = 1, …, M}.

Figure 1 shows two examples of the image fusion problem. The first set of data consists of multispectral noisy images of transversal slices of 3D brain MR images. The second example shows a multimodal case with transmission and backscattering X-ray acquisitions of a suitcase. As we can see in the observed images g_i, the only thing these images really have in common is their anatomy (contours and regions). In this work we introduce a label variable z(r) for the regions and consider the region labels as the common feature between all the images. The data fusion then becomes the estimation of the joint segmentation labels z = {z(r), r ∈ S}.

Segmentation is a long-standing problem in computer vision. Recent works in medical imaging propose methods to construct a segmentation from multispectral images ([4, 6, 15]), which can be considered as a fusion problem. The probabilistic framework for unsupervised segmentation is a very active area and has shown effective results in many domains. In [16] and [17] the authors propose a Markov Chain Monte Carlo (MCMC) method for image segmentation, using a Bayesian framework and a Markov field prior probability. In this paper we propose to use these types of methods in the case of multiple source images. The Bayesian approach we propose models the observed data through p(g_i|f_i), the images through p(f_i|z) and the classification labels z through P(z). When these priors are appropriately assigned we obtain the expression of the posterior p(f, z|g), from which we infer not only z but also f. Our aim is then to obtain a common segmentation of the M observations and to reconstruct f_i, i = 1, …, M, at the same time.

This paper is organized as follows. In section 2 we introduce the common feature z, model the relation of the images f_i to it through p(f_i|z) and its own characteristics through a prior law P(z). In section 3 we give detailed expressions of the posterior laws. Section 4 gives the general structure of the MCMC algorithm we use to estimate f and z. In section 5 we introduce a more complex model accounting for a spatial dependency of f|z in order to reduce the noise of the observations. In section 6 we present some simulation results showing the performances of the proposed methods and their potential applications in medical imaging and security imaging systems. Finally, in section 7 we discuss the estimation of the number of labels.

2 Modeling for Bayesian data fusion

Within the observation model (3), the expression of the posterior law p(f, z|g) is given by the relation:

p(f, z|g) ∝ p(g|f) p(f|z) P(z)   (4)

We then need to give precise expressions of p(g|f), p(f|z) and P(z), according to appropriate hypotheses on the noise model, the image model and the label model.

2.1 Observation noise model and the likelihood

Assuming independent noises ε_i among the different observations, we have:

p(g|f) = ∏_{i=1}^M p(g_i|f_i) = ∏_{i=1}^M p_{ε_i}(g_i − f_i)

Assuming ε_i centered, white and Gaussian, p(ε_i) = N(0, σ_{ε_i}² I), and with S the number of pixels of an image, we have:

p(g_i|f_i) = N(f_i, σ_{ε_i}² I) = ( 1/(2π σ_{ε_i}²) )^{S/2} exp( − ||g_i − f_i||² / (2 σ_{ε_i}²) )
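Under these assumptions the per-image log-likelihood is a simple quadratic form in the residual. A minimal numerical sketch (NumPy; image size and variance are illustrative, not from the paper):

```python
import numpy as np

def log_likelihood(g, f, sigma2_eps):
    """log p(g_i | f_i) for centered white Gaussian noise of variance sigma2_eps."""
    S = g.size                        # number of pixels
    resid = np.sum((g - f) ** 2)      # ||g_i - f_i||^2
    return -0.5 * S * np.log(2 * np.pi * sigma2_eps) - resid / (2 * sigma2_eps)

# Toy check: simulate g = f + eps and compare two candidate images.
rng = np.random.default_rng(0)
f = rng.random((32, 32))
g = f + rng.normal(0.0, np.sqrt(0.01), f.shape)
ll_true = log_likelihood(g, f, 0.01)
ll_far = log_likelihood(g, f + 1.0, 0.01)   # a clearly wrong candidate
assert ll_true > ll_far
```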



2.2 Hidden Markov modeling of images

As we want to reconstruct images with statistically homogeneous regions, it is natural to introduce a hidden variable z = (z(1), …, z(S)) ∈ {1, …, K}^S which represents a common classification of the images f_i. The problem is now to estimate the set of variables (f, z) using the Bayesian approach:

p(f, z|g) = p(f|z, g) P(z|g)   (5)

Thus, to be able to give an expression for p(f, z|g) using the Bayes formula, we need to define p(g_i|f_i) and p(f_i|z) for p(f|z, g), and p(g_i|z) and P(z) for P(z|g). To assign p(f_i|z) we first define the sets of pixels which are in the same class:

R_k = {r : z(r) = k},   |R_k| = n_k
f_{ik} = {f_i(r) : z(r) = k}

In this paper, as a first step, we assume that all the pixels f_{ik} of an image f_i which are in the same class k are characterized by a mean m_{ik} and a variance σ_{ik}²:

p(f_i(r) | z(r) = k) = N(m_{ik}, σ_{ik}²),   ∀r ∈ S

With these notations we have:

p(f_{ik}) = N(m_{ik} 1, σ_{ik}² I)   (6)

and thus

p(f_i|z) = ∏_{k=1}^K N(m_{ik} 1, σ_{ik}² I) = ∏_{k=1}^K ( 1/√(2π σ_{ik}²) )^{n_k} exp( − ||f_{ik} − m_{ik} 1||² / (2 σ_{ik}²) ),   i = 1, …, M   (7)

where 1 is a vector with all components equal to 1. As we will see in section 5, we extend this model to the case where the pixels in different regions are assumed independent but, inside any homogeneous region, we account for their local correlation by using a Gauss-Markov model.
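The class-conditional model (6)-(7) says that, given the labels, each image is a patchwork of iid Gaussian regions. A sketch of drawing f_i | z under this model (NumPy; the label map and class parameters below are purely illustrative):

```python
import numpy as np

def sample_image_given_labels(z, means, variances, rng):
    """Draw f_i ~ p(f_i | z): pixel r gets N(m_{ik}, sigma_{ik}^2) with k = z(r)."""
    m = means[z]                  # per-pixel mean m_{i z(r)}
    s = np.sqrt(variances[z])     # per-pixel std sigma_{i z(r)}
    return m + s * rng.normal(size=z.shape)

rng = np.random.default_rng(1)
K = 3
z = rng.integers(0, K, size=(64, 64))        # hypothetical toy label map
means = np.array([0.0, 0.5, 1.0])
variances = np.array([0.001, 0.001, 0.001])
f = sample_image_given_labels(z, means, variances, rng)
# Empirical class means should be close to the model means.
emp = np.array([f[z == k].mean() for k in range(K)])
assert np.allclose(emp, means, atol=0.01)
```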

2.3 Potts-Markov modeling of labels

Finally we have to assign P(z). As we introduced the hidden variable z to find statistically homogeneous regions in the images, it is natural to define a spatial dependency on these labels. The simplest model to account for this desired local spatial dependency is the Potts Markov Random Field model:

P(z) = (1/T(α)) exp{ α ∑_{r∈S} ∑_{s∈V(r)} δ(z(r) − z(s)) }   (8)

where S is the set of pixels, δ(0) = 1 and δ(t) = 0 if t ≠ 0, V(r) denotes the neighborhood of the pixel r (here we consider a neighborhood of 4 pixels), T(α) is the partition function or normalization constant, and α represents the degree of spatial dependency of the variable z. There are many studies on the influence of this parameter. In [18], D. Higdon showed that there exists a critical value α_c which depends on the size of the images and the number of classes. For values α < α_c the Potts model realizations are strongly noisy, with a great number of small regions. For values α > α_c the realizations consist mainly of a few large regions, which quickly become prevalent, over a homogeneous background. The Potts model therefore appears inappropriate for segmenting small regions; however it is used, and gives satisfactory results, in the case of images with a homogeneous background. In practice we fix the value of α well above the critical point α_c in order to enforce the spatial dependency.

We now have all the necessary prior laws p(g_i|f_i), p(f_i|z), p(g_i|z) and P(z), and we can give an expression for p(f, z|g). However, these probability laws in general have unknown parameters, such as σ_{ε_i}² in p(g_i|f_i) or m_{ik} and σ_{ik}² in p(f_i|z). In a full Bayesian approach, we have to assign prior laws to these "hyperparameters".
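For a single-site Gibbs update, only the exponent of (8) restricted to the 4 neighbors of r matters: the conditional P(z(r) = k | neighbors) is proportional to exp(α · #{neighbors of r with label k}). A minimal sketch of that conditional (NumPy; grid size and α are illustrative):

```python
import numpy as np

def potts_conditional(z, r, c, K, alpha):
    """P(z(r,c) = k | 4-neighbors) for the Potts model, normalized over k."""
    H, W = z.shape
    counts = np.zeros(K)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < H and 0 <= cc < W:
            counts[z[rr, cc]] += 1       # delta(z(r) - z(s)) contributions
    p = np.exp(alpha * counts)
    return p / p.sum()

# A site surrounded by label 0 strongly prefers label 0 for large alpha.
z = np.zeros((5, 5), dtype=int)
p = potts_conditional(z, 2, 2, K=2, alpha=2.0)
assert p[0] > 0.99
```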

2.4 Conjugate priors for the hyperparameters

Let m_i = (m_{ik})_{k=1,…,K} and σ_i² = (σ_{ik}²)_{k=1,…,K} be the means and the variances of the pixels in the different regions of the image f_i, as defined before. We define θ_i as the set of all the parameters which must be estimated:

θ_i = (σ_{ε_i}², m_i, σ_i²),   i = 1, …, M

and we note θ = (θ_i)_{i=1,…,M}. The choice of prior laws for the hyperparameters is still an open problem. In [19] the authors used differential geometry tools to construct particular priors which contain the entropic and conjugate priors as special cases. In this paper we choose the latter. Applying the particular priors of [19] to our case, we find the following conjugate priors:

• Inverse Gamma IG(α_0^{ε_i}, β_0^{ε_i}) and IG(α_{i0}, β_{i0}) for the variances σ_{ε_i}² and σ_{ik}² respectively,

• Gaussian N(m_{i0}, σ_{i0}²) for the means m_{ik}.

The hyper-hyperparameters α_{i0}, β_{i0}, m_{i0} and σ_{i0}² are fixed, and the results are in general not too sensitive to their exact values. However, in the case of noisy images, we can constrain σ_{i0}² to a small value in order to force the reconstruction of homogeneous regions.

3 A posteriori distributions for the Gibbs algorithm

The Bayesian approach now consists in estimating the whole set of variables (f, z, θ) following the joint posterior distribution p(f, z, θ|g). It is difficult to simulate a joint sample (f̂, ẑ, θ̂) directly from this joint posterior distribution. However, we may note that, with the prior laws defined before, we are able to simulate the conditional posterior laws p(f, z|g, θ) and p(θ|g, f, z). That is the main reason to propose a Gibbs algorithm to estimate (f̂, ẑ, θ̂), splitting this set of variables first into two subsets, (f, z) and θ, and then into three subsets f, z and θ, using the relation:

p(f, z|g, θ) = p(f|z, g, θ) P(z|g, θ)   (9)

The sampling of the joint distribution p(f, z|g, θ) is thus obtained by sampling first P(z|g, θ) and then p(f|z, g, θ). We now define the conditional posterior distributions used in the Gibbs algorithm.

Sampling z using P(z|g, θ): For this step we have:

P(z|g, θ) ∝ p(g|z, θ) P(z) = ∏_{i=1}^M p(g_i|z, θ_i) P(z)

where, using relation (3) and the laws p(f_i|z) and p(ε_i), we obtain:

p(g_i|z, θ_i) = ∏_{r∈S} p(g_i(r)|z(r), θ_i)

and

p(g_i(r)|z(r) = k) = N(m_{ik}, σ_{ik}² + σ_{ε_i}²)   (10)

As we chose a Potts Markov Random Field model for the labels z, exact sampling of the posterior distribution P(z|g, θ) is still impossible. However, P(z|g, θ) is still a PMRF, whose probabilities are weighted by the likelihood p(g|z, θ). We use this fact to propose, in section 4, a parallel implementation of Gibbs sampling for this PMRF.

Sampling f_i using p(f_i|g_i, z, θ_i): We can write the posterior law p(f_i(r)|g_i(r), z(r), θ_i) as follows:

p(f_i(r)|g_i(r), z(r) = k, θ_i) = N(m_{ik}^{apost}, σ_{ik}^{2,apost})

where

m_{ik}^{apost} = σ_{ik}^{2,apost} ( g_i(r)/σ_{ε_i}² + m_{ik}/σ_{ik}² )
σ_{ik}^{2,apost} = ( 1/σ_{ε_i}² + 1/σ_{ik}² )^{−1}

Sampling θ_i using p(θ_i|f_i, g_i, z): We have the relation:

p(θ_i|f_i, g_i, z) ∝ p(m_i, σ_i²|f_i, z) p(σ_{ε_i}²|f_i, g_i)

For the first term, p(m_i, σ_i²|f_i, z), we use a Gibbs algorithm and sample from the conditional distributions p(m_i|σ_i², f_i, z) and p(σ_i²|m_i, f_i, z). Using the Bayes formula again, the posterior distributions are calculated from the prior selection fixed before, and we have:

• m_{ik} | f_i, z, σ_{ik}², m_{i0}, σ_{i0}² ∼ N(μ_{ik}, v_{ik}²), with

μ_{ik} = v_{ik}² ( m_{i0}/σ_{i0}² + (1/σ_{ik}²) ∑_{r∈R_k} f_i(r) )
v_{ik}² = ( n_k/σ_{ik}² + 1/σ_{i0}² )^{−1}

• σ_{ik}² | f_i, z, m_{ik}, α_{i0}, β_{i0} ∼ IG(α_{ik}, β_{ik}), with

α_{ik} = α_{i0} + n_k/2
β_{ik} = β_{i0} + (1/2) ∑_{r∈R_k} (f_i(r) − m_{ik})²

• σ_{ε_i}² | f_i, g_i ∼ IG(α_i, β_i), with

α_i = S/2 + α_0^{ε_i},   S = number of pixels
β_i = (1/2) ||g_i − f_i||² + β_0^{ε_i}
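The hyperparameter steps above are standard conjugate draws: a Gaussian for the class mean and Inverse Gamma draws for the variances. A sketch of the first two updates for one image and one class (NumPy; the hyper-hyperparameter values are illustrative):

```python
import numpy as np

def sample_class_mean(fk, sigma2_k, m0, sigma2_0, rng):
    """m_{ik} | ... ~ N(mu_{ik}, v_{ik}^2), with the formulas of section 3."""
    nk = fk.size
    v2 = 1.0 / (nk / sigma2_k + 1.0 / sigma2_0)
    mu = v2 * (m0 / sigma2_0 + fk.sum() / sigma2_k)
    return mu + np.sqrt(v2) * rng.normal()

def sample_class_variance(fk, m_k, alpha0, beta0, rng):
    """sigma_{ik}^2 | ... ~ IG(alpha0 + n_k/2, beta0 + 0.5 * sum (f - m_k)^2)."""
    alpha = alpha0 + fk.size / 2.0
    beta = beta0 + 0.5 * np.sum((fk - m_k) ** 2)
    # If X ~ Gamma(alpha, scale=1), then beta / X ~ IG(alpha, beta).
    return beta / rng.gamma(alpha)

rng = np.random.default_rng(2)
fk = rng.normal(0.5, 0.1, size=5000)      # hypothetical pixels of one class
m = sample_class_mean(fk, 0.01, 0.0, 10.0, rng)
s2 = sample_class_variance(fk, 0.5, 2.0, 0.5, rng)
assert abs(m - 0.5) < 0.05
assert 0.005 < s2 < 0.02
```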

4 Parallel implementation of the sampling of P(z|g, θ)

As we saw in the previous section, to generate samples from p(f, z, θ|g) we alternately generate samples z from P(z|g, θ), then f from p(f|g, z, θ), and finally θ from p(θ|f, g, z). The second step is easy because p(f|z, θ, g) is Gaussian. The last step is also easy because we have to generate samples from either a Gaussian or an Inverse Gamma distribution. The first step, i.e. sampling z from P(z|g, θ), is not easy and by itself needs a Gibbs sampler. However, as we chose a first-order neighborhood system for the prior PMRF of the labels P(z), the posterior is still a PMRF with the same neighborhood. We can then decompose the whole set of pixels into two subsets (odd and even positions) forming a checkerboard (see figure 2). In this case, if we fix the black (respectively white) labels, then the white (respectively black) labels become independent. This decomposition reduces the complexity of the Gibbs algorithm, because we can simulate the whole set of labels in only two steps.

Figure 2: Checkerboard decomposition of the labels z into black and white sites.

The parallel Gibbs algorithm we implemented is then the following: given an initial state (θ̂^(0), ẑ^(0)), repeat until convergence:

1. Parallel Gibbs sampling:
   simulate ẑ_B^(n) ∼ P( z_B | ẑ_W^(n−1), g, θ̂^(n−1) )
   simulate ẑ_W^(n) ∼ P( z_W | ẑ_B^(n), g, θ̂^(n−1) )
2. simulate f̂_i^(n) ∼ p( f_i | g_i, ẑ^(n), θ̂_i^(n−1) )
   simulate θ̂_i^(n) ∼ p( θ_i | f̂_i^(n), ẑ^(n), g_i )
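The checkerboard trick lets all sites of one color be resampled simultaneously, since their conditionals only depend on the other color. The following is a vectorized sketch of one half-step, combining the Potts neighbor counts with the likelihood weights (10), for a single image (NumPy; sizes, α and the class parameters are illustrative toy values, not the paper's settings):

```python
import numpy as np

def update_color(z, g, means, var_tot, alpha, color, rng):
    """One parallel Gibbs half-step: resample all sites with (r + c) % 2 == color."""
    H, W = z.shape
    K = means.size
    logp = np.zeros((K, H, W))
    zp = np.pad(z, 1, constant_values=-1)   # -1 never matches a real label
    for k in range(K):
        # Potts term: alpha * (number of 4-neighbors with label k)
        nb = ((zp[:-2, 1:-1] == k).astype(float) + (zp[2:, 1:-1] == k)
              + (zp[1:-1, :-2] == k) + (zp[1:-1, 2:] == k))
        # Likelihood term: log N(g(r); m_k, sigma_k^2 + sigma_eps^2), cf. (10)
        ll = -0.5 * np.log(2 * np.pi * var_tot[k]) - (g - means[k]) ** 2 / (2 * var_tot[k])
        logp[k] = alpha * nb + ll
    p = np.exp(logp - logp.max(axis=0))
    p /= p.sum(axis=0)
    # Inverse-CDF categorical draw at every pixel, applied only on one color.
    draw = (rng.random((H, W))[None] < np.cumsum(p, axis=0)).argmax(axis=0)
    mask = (np.add.outer(np.arange(H), np.arange(W)) % 2) == color
    z = z.copy()
    z[mask] = draw[mask]
    return z

rng = np.random.default_rng(3)
truth = np.zeros((32, 32), dtype=int); truth[:, 16:] = 1
g = truth.astype(float) + rng.normal(0, 0.1, truth.shape)
z = rng.integers(0, 2, truth.shape)
for _ in range(10):
    z = update_color(z, g, np.array([0.0, 1.0]), np.array([0.02, 0.02]), 1.0, 0, rng)
    z = update_color(z, g, np.array([0.0, 1.0]), np.array([0.02, 0.02]), 1.0, 1, rng)
assert (z == truth).mean() > 0.95
```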



5 Accounting for local spatial dependency inside regions

We now want to introduce a local dependency between the pixels of f_{ik} which are in a same homogeneous region k. In the previous sections we assumed these pixels to be independent, even though they share the same mean and variance. In this section we relax this hypothesis by accounting for possible local correlation. Our aim is to improve the reconstructed images and thus (because our algorithm is iterative) improve the quality of the classification. We now describe this new model and the modifications it implies.

5.1 New model for the images f_i

We now consider that pixels f_i(r) inside a same region are locally dependent, while pixels in different regions remain independent. Note that this is our a priori hypothesis: all the pixels, whether inside a given region or in different regions, are a posteriori interdependent. To be able to distinguish between pixels in different regions, we introduce a hidden "contour" variable q = {q(r), r ∈ S} as follows:

q(r) = 0 if {z(s), s ∈ V(r)} are all in the same region,
q(r) = 1 otherwise.

We may note that when z is given, q is obtained in a deterministic way. Thus q(r) is related to z(r), and the distribution of q is related to the distribution of z by the relation:

P(q(r) = 1 | z) = 1 − ∏_{s∈V(r)} δ(z(r) − z(s))   (11)

Then we have:

p(f_i|z, q, θ_i) = ∏_{k=1}^K p(f_{ik}|z, q, θ_i)

Let us note f_{iV}(r) = {f_i(s), s ∈ V(r)}, where V(r) stands for the neighborhood of r and |V| is its size (the number of pixels of the neighborhood system, which is 4 here). Then we can write:

p(f_i(r)|z(r) = k, q(r), f_{iV}(r), θ_i) = N(μ_k, σ_k²)                              if q(r) = 1
                                        = N( (1/4) ∑_{s∈V(r)} f_i(s), σ_k²/4 )       if q(r) = 0   (12)

where (1/4) ∑_{s∈V(r)} f_i(s) is the mean value of the four neighboring pixels around the pixel position r. Note also that we can group these two cases together by noting:

m_{f_i}(r) = q(r) μ_k + (1 − q(r)) (1/4) ∑_{s∈V(r)} f_i(s)
σ²_{f_i}(r) = q(r) σ_k² + (1 − q(r)) σ_k²/4

With these notations we can write, as in section 2:

p(f_i(r)|z(r) = k, q(r), f_{iV}(r), θ_i) = N(m_{f_i}(r), σ²_{f_i}(r))

and the likelihood

p(g_i(r)|z(r) = k, q(r), f_{iV}(r), θ_i) = N(m_{f_i}(r), σ²_{f_i}(r) + σ_{ε_i}²)
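Given z, the contour map q and the conditional means and variances of (12) are deterministic. A sketch of these computations (NumPy; 4-neighbor system, toy label map):

```python
import numpy as np

def contours_from_labels(z):
    """q(r) = 1 iff some 4-neighbor of r carries a different label than z(r)."""
    zp = np.pad(z, 1, mode="edge")      # edge padding: no false contours at borders
    same = ((zp[:-2, 1:-1] == z) & (zp[2:, 1:-1] == z)
            & (zp[1:-1, :-2] == z) & (zp[1:-1, 2:] == z))
    return (~same).astype(int)

def local_params(f, z, q, mu, sigma2):
    """m_{f_i}(r) and sigma^2_{f_i}(r) of section 5.1: class statistics on contour
    pixels, 4-neighbor average (and variance / 4) inside homogeneous regions."""
    fp = np.pad(f, 1, mode="edge")
    nb_mean = (fp[:-2, 1:-1] + fp[2:, 1:-1] + fp[1:-1, :-2] + fp[1:-1, 2:]) / 4.0
    m = q * mu[z] + (1 - q) * nb_mean
    s2 = q * sigma2[z] + (1 - q) * sigma2[z] / 4.0
    return m, s2

z = np.zeros((6, 6), dtype=int); z[:, 3:] = 1
q = contours_from_labels(z)
assert q[:, 2].all() and q[:, 3].all()   # contour along the label boundary
assert not q[:, 0].any()                 # interior pixels are homogeneous
f = z.astype(float)
m, s2 = local_params(f, z, q, np.array([0.0, 1.0]), np.array([0.04, 0.04]))
assert s2[0, 0] == 0.01                  # interior pixel: sigma_k^2 / 4
```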

5.2 A posteriori distributions

As we chose a spatial dependency between pixels f_i(r) with a 4-pixel neighborhood system, we have the same problem as for the labels. We therefore have to decompose the set of variables f_i into two subsets, f_{iW} and f_{iB}, which represent respectively the odd-numbered (white) and even-numbered (black) pixel positions of the image f_i. Let us also note f_W = {f_{iW}}_{i=1,…,M} and f_B = {f_{iB}}_{i=1,…,M}. For this case we then propose to decompose the whole set of variables directly into three subsets, (f_W, z_W), (f_B, z_B) and θ, and to sample them from their conditional posterior distributions. For the first two subsets we can use the same decomposition as (9):

p(f_W, z_W | f_B, z_B, g, θ, q) = p(f_W | f_B, z, g, θ, q) P(z_W | f_B, z_B, g, θ, q)
p(f_B, z_B | f_W, z_W, g, θ, q) = p(f_B | f_W, z, g, θ, q) P(z_B | f_W, z_W, g, θ, q)

and we also have

p(f_W | f_B, z, g, θ, q) = ∏_{i=1}^M p(f_{iW} | f_{iB}, z, g_i, θ_i, q)
p(f_B | f_W, z, g, θ, q) = ∏_{i=1}^M p(f_{iB} | f_{iW}, z, g_i, θ_i, q)

Sampling f_{iB} and f_{iW} using p(f_{iB}|f_{iW}, z, g_i, θ_i, q) and p(f_{iW}|f_{iB}, z, g_i, θ_i, q): With this decomposition we have the relations:

p(f_{iB} | g_i, f_{iW}, z, q, θ_i) = ∏_{r black} p(f_i(r) | g_i(r), f_{iV}(r), z(r), q(r), θ_i)
p(f_{iW} | g_i, f_{iB}, z, q, θ_i) = ∏_{r white} p(f_i(r) | g_i(r), f_{iV}(r), z(r), q(r), θ_i)

and, with the same method as before, we obtain the posterior distribution:

p(f_i(r) | g_i(r), z(r), q(r), f_{iV}(r), θ_i) = N(m_apost, σ²_apost)

with

m_apost = σ²_apost ( g_i(r)/σ_{ε_i}² + m_{f_i}(r)/σ²_{f_i}(r) )
σ²_apost = ( 1/σ_{ε_i}² + 1/σ²_{f_i}(r) )^{−1}

Sampling z_B and z_W using P(z_B | f_W, z_W, g, θ, q) and P(z_W | f_B, z_B, g, θ, q): Using the Bayes rule we have:

P(z_B | z_W, g, f_W, q, θ) ∝ p(g | z, f_W, q, θ) p(f_W | z, q, θ) P(z_B | z_W)   (13)

Due to the term p(f_W | z, q, θ) on the right-hand side of (13), we cannot obtain an explicit expression for the posterior distribution P(z_B | z_W, g, f_W, q, θ). We therefore propose two different approximations for this step. The first one is to approximate p(f_W | z, q, θ) by its expected value with respect to z_B:

p(f_W | z, q, θ) ≈ ∑_{z_B} p(f_W | z, q, θ) P(z_B | z_W) = p(f_W | z_W, q, θ)   (14)

which becomes a constant with respect to z_B. Indeed, this approximation can be interpreted as a mean field approximation method ([20]). The approximated posterior distribution we propose to use for this step is:

P̂(z_B | z_W, g, f_W, q, θ) ∝ p(g | z, f_W, q, θ) P(z_B | z_W)
                           ∝ P(z_B | z_W) ∏_{i=1}^M ∏_{r black} p(g_i(r) | z(r), f_{iV}(r), q(r), θ_i)   (15)

We also have the symmetric relation:

P̂(z_W | z_B, g, f_B, q, θ) ∝ P(z_W | z_B) ∏_{i=1}^M ∏_{r white} p(g_i(r) | z(r), f_{iV}(r), q(r), θ_i)   (16)

Note that the likelihood function p(g_i(r) | z(r), f_{iV}(r), q(r), θ_i) = N(m_{f_i}(r), σ²_{f_i}(r) + σ_ε²) is different from p(g_i(r) | z(r), θ_i) = N(m_k, σ_k² + σ_ε²) of section 3 and is more expensive in computing time. The second approximation we propose is then to use the second expression in place of the first one in this step.

Updating q: As we mentioned before, given z, q is determined in a deterministic way and is updated using the current variable z and the relation (11).

Sampling θ_i using p(θ_i | z, g_i, f_i, q): We still use the same method to obtain the posterior distributions of the parameters of θ_i. However, we now have to decompose the set R_k into two subsets as follows:

R_k = R_k⁰ ∪ R_k¹,   with R_k^i = {r : z(r) = k, q(r) = i}

Let us also note n_k^i = |R_k^i|. With this decomposition we can calculate the posterior distributions of θ_i:

• m_{ik} | f_i, z, q, σ_{ik}², m_{i0}, σ_{i0}² ∼ N(μ_{ik}, v_{ik}²), with

μ_{ik} = v_{ik}² ( m_{i0}/σ_{i0}² + (1/σ_{ik}²) ∑_{r∈R_k¹} f_i(r) )
v_{ik}² = ( n_k¹/σ_{ik}² + 1/σ_{i0}² )^{−1}

An approximation of these equations is to replace R_k¹ by the whole R_k in the determination of μ_{ik}. Indeed, even if we have changed the model by introducing a spatial dependency on f_i, we still expect that pixels f_i(r) in a same homogeneous region have the same mean. Replacing R_k¹ by R_k thus increases the number of pixels used to calculate μ_{ik}.

• σ_{ik}² | f_i, z, q, m_{ik}, α_{i0}, β_{i0} ∼ IG(α_{ik}, β_{ik}), with

α_{ik} = α_{i0} + n_k/2
β_{ik} = β_{i0} + (1/2) ∑_{r∈R_k¹} (f_i(r) − m_{ik})² + 2 ∑_{r∈R_k⁰} ( f_i(r) − (1/4) ∑_{s∈V(r)} f_i(s) )²   (17)

• σ_{ε_i}² | f_i, g_i ∼ IG(ν_i, Σ_i), with

ν_i = S/2 + α_0^{ε_i},   S = total number of pixels
Σ_i = (1/2) ||g_i − f_i||² + β_0^{ε_i}
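The split of R_k by the contour variable only changes the quadratic term of the Inverse Gamma update (17): contour pixels are compared to the class mean, interior pixels to their 4-neighbor average, with the factor 2 coming from the variance σ_k²/4. A sketch of computing (α_{ik}, β_{ik}) (NumPy; the toy inputs are illustrative):

```python
import numpy as np

def ig_params_with_contours(f, z, q, k, m_k, alpha0, beta0):
    """alpha_{ik}, beta_{ik} of (17) for class k, given labels z and contours q."""
    fp = np.pad(f, 1, mode="edge")
    nb_mean = (fp[:-2, 1:-1] + fp[2:, 1:-1] + fp[1:-1, :-2] + fp[1:-1, 2:]) / 4.0
    in_k = (z == k)
    Rk1 = in_k & (q == 1)     # contour pixels of class k
    Rk0 = in_k & (q == 0)     # interior pixels of class k
    alpha = alpha0 + in_k.sum() / 2.0
    beta = (beta0 + 0.5 * np.sum((f[Rk1] - m_k) ** 2)
            + 2.0 * np.sum((f[Rk0] - nb_mean[Rk0]) ** 2))
    return alpha, beta

f = np.full((4, 4), 0.5)
z = np.zeros((4, 4), dtype=int)
q = np.zeros((4, 4), dtype=int)
alpha, beta = ig_params_with_contours(f, z, q, k=0, m_k=0.5, alpha0=2.0, beta0=0.5)
assert alpha == 10.0    # alpha0 + n_k/2 with n_k = 16
assert beta == 0.5      # constant image: both quadratic terms vanish
```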

5.3 New Gibbs algorithm

The difference from the algorithm of section 4 lies in the decomposition of the set of variables. The Gibbs algorithm we have implemented is then: repeat until convergence:

1. simulate ẑ_W^(n) ∼ P̂( z_W | ẑ_B^(n−1), f̂_B^(n−1), g, θ̂^(n−1), q̂^(n−1) )
   simulate f̂_{iW}^(n) ∼ p( f_{iW} | f̂_{iB}^(n−1), g_i, ẑ^(n), θ̂_i^(n−1), q̂^(n−1) )
2. simulate ẑ_B^(n) ∼ P̂( z_B | ẑ_W^(n), f̂_W^(n), g, θ̂^(n−1), q̂^(n−1) )
   simulate f̂_{iB}^(n) ∼ p( f_{iB} | f̂_{iW}^(n), g_i, ẑ^(n), θ̂_i^(n−1), q̂^(n−1) )
3. compute q̂^(n) using ẑ^(n)
4. simulate θ̂_i^(n) ∼ p( θ_i | f̂_i^(n), ẑ^(n), g_i )

6 Simulation and results

In this section we present results of our two models in different cases. First we test our methods on fully simulated data sets to evaluate their respective performances in the presence of noise. Then we present results on MRI images, with an artificial noise added in order to compare the two methods; this second test provides noisy registered data sets of the same object, and these images are therefore also used here as test images to compare the two proposed methods. Finally, we test our algorithm in a real application of a security system using two X-ray images of the same object: X-ray in transmission and in backscattering. These data sets are used with the permission of American Science and Engineering, Inc., 2003 (www.as-e.com). In the following we denote by "HMMI" the first method described in this paper, and by "HMMC" the second method, where we have introduced a local spatial correlation between the pixels of the images f_i.

6.1 Simulated data

Here we constructed two normalized (256 × 256) images, denoted f_1 and f_2, with individual and common regions (fig. 3-a). We then added independent Gaussian noises to obtain the noisy images g_1 and g_2 (fig. 3-b), which we used as data for our proposed data fusion methods. The performances of these methods are evaluated using the following measure between two images u and v:

d(u, v) = ||u − v||² / ||u||²

We then compared the performances of these methods as a function of the noise variance in the observations. The estimated images are denoted respectively f̂_i^HMMI and f̂_i^HMMC.
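The performance measure is a squared relative error. A trivial sketch (NumPy):

```python
import numpy as np

def d(u, v):
    """d(u, v) = ||u - v||^2 / ||u||^2, the relative quadratic error used below."""
    return np.sum((u - v) ** 2) / np.sum(u ** 2)

u = np.ones((4, 4))
assert d(u, u) == 0.0
assert np.isclose(d(u, 0.9 * u), 0.01)
```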

σ_{ε_i}²   d(f1,g1)   d(f1,f̂1^HMMI)   d(f1,f̂1^HMMC)   d(f2,g2)   d(f2,f̂2^HMMI)   d(f2,f̂2^HMMC)
0          0          0.0000          0.0000          0          0.0000          0.0000
0.001      0.0126     0.0053          0.0014          0.0013     0.0006          0.0001
0.002      0.0258     0.0109          0.0016          0.0027     0.0011          0.0002
0.005      0.0644     0.0279          0.0021          0.0067     0.0029          0.0003
0.01       0.1288     0.0568          0.0058          0.0134     0.0064          0.0011
0.02       0.2575     0.1239          0.0094          0.0268     0.0142          0.0019
0.05       0.6438     0.3215          0.0279          0.0670     0.0367          0.0049
0.1        1.2876     0.6805          0.0628          0.1340     0.0766          0.0105

Table 1: Comparison of the two methods with noisy data

σ_{ε_i}²   ŝ_{ε_1}^HMMI   ŝ_{ε_2}^HMMI   ŝ_{ε_1}^HMMC   ŝ_{ε_2}^HMMC
0          0.0003         0.0003         0.0002         0.0002
0.001      0.0007         0.0007         0.0010         0.0010
0.002      0.0011         0.0011         0.0019         0.0018
0.005      0.0020         0.0019         0.0048         0.0046
0.01       0.0040         0.0038         0.0094         0.0088
0.02       0.0068         0.0069         0.0193         0.0180
0.05       0.0155         0.0146         0.0486         0.0453
0.1        0.0310         0.0254         0.0966         0.0937

Table 2: Estimation of the noise variance

Results of reconstruction and segmentation: Figures 3 and 4 show some results for different values of the noise variance. When the observations are noise-free, both methods give perfect segmentation results; however, we can note that in this case the first method converges faster than the second. In the presence of noise, the degradation of the segmentation appears in the smallest regions: the segmentation results show the loss of small regions, especially with the first method. This is due to the fact that, even if the Gibbs algorithm asymptotically ensures convergence to the global minimum, it can get locked in a local minimum; this particular case appears when the observations are too noisy. We can also remark that the first method does not significantly increase the quality of the reconstructed images.

Figure 3: Results of data fusion with high SNR (σ_{ε_i}² = 0.001): (a) original images f_1 and f_2; (b) their corresponding observations g_1 and g_2; (c) results of data fusion (7 labels) with the first model: f̂_1^HMMI, f̂_2^HMMI and ẑ^HMMI; (d) results of data fusion (7 labels) with the second model: f̂_1^HMMC, f̂_2^HMMC and ẑ^HMMC.

Figure 4: Results of data fusion with low SNR (σ_{ε_i}² = 0.01): (a) original images f_1 and f_2; (b) their corresponding observations g_1 and g_2; (c) results of data fusion with the first model: f̂_1^HMMI, f̂_2^HMMI and ẑ^HMMI; (d) results of data fusion with the second model: f̂_1^HMMC, f̂_2^HMMC and ẑ^HMMC.

The first algorithm seems to reconstruct the data exactly, without canceling the noise, whereas the second method gives denoised images and a better segmentation. We can also note that individual regions of the first data set g_1 (resp. g_2) appear in the reconstructed image f̂_2 (resp. f̂_1). This is due to the modeling we have chosen, where we considered a unique segmentation and reconstructed both images from it, which means that the two images are assumed to consist of the same objects.

Noise estimation: Table 1 summarizes the performances of the second method. The denoising part of this method reduces the measure d by a factor of 20 relative to the initial measure between the real data and the noisy observations, whereas the first method reduces this measure only by a factor of 2. The performance gain of the second method is therefore significant when the observations are noisy. In table 2 we can also see that the noise is better estimated by the second algorithm, which confirms the better quality of the denoising step. However, even though the estimated images are better, the common segmentation is unchanged relative to the first method.

6.2 Medical imaging

(a) f1 , f2 , f3

(b) g1 , g2 , g3

HMMI HMMI HMMI (c) fˆ1 , fˆ2 , fˆ3 , zˆ HMMI

HMMC HMMC HMMC (d) fˆ1 , fˆ2 , fˆ3 and zˆ HMMC

Figure 5: Data fusion of medical images : (a) original data. (b) noisy observations with a Gaussian noise of variance 0.005. (c) Estimation with the first method (7 labels). (d) Estimation with the second method (7 labels). Here we illustrate an example of MRI noisy images : T1-weighted, T2-weighted and T1weighted with contrast agent slices of a MR brain image, which are (289 × 236) images. 25

We used these images as the test images f1, f2 and f3. Then we added iid Gaussian noise to them to obtain the simulated observations g1, g2 and g3, according to the observation model (3). Figure 5 shows the reconstruction and joint segmentation results of our algorithms. This confirms the remarks made on simulated data: the reconstructions are largely better with the second method. However, the segmentation results are almost the same, except in the central part where some single pixels are badly classified by the first method. In the case of a high signal-to-noise ratio it is not necessary to use the second method. Finally, we note that we did not introduce any physiological information on particular tissues or on the characteristics of MRI images; these data sets are only used as test images. We can expect better results from a more detailed study of the particularities of MRI images.
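As a concrete illustration of this simulation step, the noisy observations can be generated by adding iid Gaussian noise to each test image. The sketch below is not our actual code; the random stand-in images are an assumption (in the paper they would be the T1, T2 and contrast-agent slices), and only the noise variance 0.005 and the 289 × 236 size come from the setting above:

```python
import numpy as np

def simulate_observations(images, noise_var=0.005, seed=0):
    """Produce g_i = f_i + eps_i with eps_i iid Gaussian of variance noise_var."""
    rng = np.random.default_rng(seed)
    return [f + rng.normal(0.0, np.sqrt(noise_var), f.shape) for f in images]

# Stand-in test images of the same size as the MR slices (289 x 236).
f1, f2, f3 = np.random.default_rng(1).random((3, 289, 236))
g1, g2, g3 = simulate_observations([f1, f2, f3])
```

With roughly 68,000 pixels per image, the empirical standard deviation of g_i − f_i is very close to √0.005 ≈ 0.07, matching the simulation setting.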

6.3 Imaging in security systems

Here we test our algorithms on two images (transmission and backscattering X-ray data) of a suitcase, of size 141 × 198 pixels. We then compared our two fusion methods with some classical algorithms provided by a Matlab fusion toolbox ([21]): Average, Principal Component Analysis (PCA), Laplacian pyramid and Shift Invariant Discrete Wavelet Transform (SIDWT). Figure 6 shows the results of the different fusion methods. With all the methods of the Matlab fusion toolbox the right gun is not detected, because it does not have enough contrast with respect to the details present at the same location in the backscattering X-ray image. Because our algorithms produce a segmentation, the right gun appears clearly

(a) g1 and g2

(b) Average

(c) PCA

(d) Laplacian pyramid

(e) SIDWT

(f) HMMI

(g) HMMC

Figure 6: Data fusion of X-ray security system images: (a) original data; fusion results of the different methods: (b) Average, (c) PCA, (d) Laplacian pyramid, (e) SIDWT, (f) our first method (8 labels), (g) our second method (8 labels).

after convergence. In particular, our first method gives good detection results for the two guns. However, we could expect better results on these images by implementing a texture classification. This would be possible by extending our second model to consider the neighborhood of a pixel fi(r) through richer features than its mean. This remains an open problem.
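For reference, the baseline that such a texture extension would replace — the mean over a pixel's four nearest neighbors, as used by our second model — can be sketched as follows. This is only an illustrative implementation; border handling by edge replication is an assumption, not a detail taken from the model description:

```python
import numpy as np

def neighbor_mean(f):
    """Mean of the 4 nearest neighbors of each pixel (edge replication at borders)."""
    fp = np.pad(f, 1, mode="edge")
    return 0.25 * (fp[:-2, 1:-1] + fp[2:, 1:-1] + fp[1:-1, :-2] + fp[1:-1, 2:])

# Example on a 3x3 image: the center pixel's neighbors are 2, 8, 4 and 6,
# so its neighborhood mean is (2 + 8 + 4 + 6) / 4 = 5.
f = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
m = neighbor_mean(f)
```

A texture-oriented extension would replace this single scalar by richer statistics over the same neighborhood, e.g. a local variance or directional differences.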


7 Estimation of K

In the proposed model we must have a quite precise idea of the number K of labels. Indeed, in the case of the simulated data we chose K = 7 even though the theoretical perfect segmentation consists of only 6 regions and thus 6 labels. Both algorithms canceled one label during the iterations and resulted in a good segmentation. In the simulations of section 6.3 too, we obtained quite similar results with K ∈ {8, 9, 10}. However, our algorithms need a fixed value of this parameter, and in a fully unsupervised joint segmentation we have to estimate K.

There is a great number of works on the estimation of K. In [17], K can vary along the iterations of the algorithm using a Reversible Jump MCMC method, but this solution is too expensive to be implemented in real applications. A more tractable solution ([22, 23]) consists in estimating this parameter in a preprocessing step, using prior information or an upper bound on K. In particular, the authors of [22] propose to use the minimum description length (MDL) as a function of K in the case of a Finite Normal Mixture (FNM) model. In our case we can write the MDL function as follows:

MDL(K) = −log L(Θ̂) + 0.5 Ka log(S),

where Θ̂ is the ML estimate of Θ = {{P(z(r) = k)}_{r∈S, k=1,…,K}, θ}, L(Θ) is the likelihood of the model parameters, and Ka is the number of degrees of freedom of the model. Considering the HMM model and the assumptions of section 3 we have:

p(g1(r), …, gM(r) | Θ) = Σ_{k=1}^{K} P(z(r) = k) p(g1(r), …, gM(r) | z(r) = k, θ)
                       = Σ_{k=1}^{K} P(z(r) = k) Π_{i=1}^{M} p(gi(r) | z(r) = k, θi)

Because we chose a PMRF model on the labels, the exact computation of P(z(r) = k) is intractable. For this preprocessing step we therefore make the approximation that the labels are independent, P(z(r) = k) = πk, which is the case of the FNM model, and we can write:

p(g1(r), …, gM(r) | Θ) = Σ_{k=1}^{K} πk Π_{i=1}^{M} p(gi(r) | z(r) = k, θi)
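Under this independence (FNM) approximation, the MDL-based choice of K can be sketched as below. This is an illustrative one-dimensional re-implementation, not the hybrid EM/CM procedure of [22]: a plain EM fit of a finite normal mixture, with Ka = 3K − 1 free parameters (K − 1 mixing proportions, K means, K variances); the quantile-based initialization is our own assumption:

```python
import numpy as np

def em_fnm(x, K, n_iter=200):
    """Fit a 1-D finite normal mixture by EM; return the maximized log-likelihood."""
    n = x.size
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # spread initial means over the data
    var = np.full(K, x.var())

    def log_joint(pi, mu, var):
        # log pi_k + log N(x | mu_k, var_k), array of shape (n, K)
        return (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (x[:, None] - mu) ** 2 / var)

    for _ in range(n_iter):
        logp = log_joint(pi, mu, var)
        m = logp.max(axis=1, keepdims=True)
        r = np.exp(logp - m)
        r /= r.sum(axis=1, keepdims=True)          # E-step: responsibilities
        nk = r.sum(axis=0) + 1e-12
        pi = nk / n                                # M-step: weights, means, variances
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    logp = log_joint(pi, mu, var)
    m = logp.max(axis=1, keepdims=True)
    return float((m.ravel() + np.log(np.exp(logp - m).sum(axis=1))).sum())

def mdl(x, K):
    """MDL(K) = -log L(Theta_hat) + 0.5 * Ka * log(S), with S the number of samples."""
    Ka = 3 * K - 1
    return -em_fnm(x, K) + 0.5 * Ka * np.log(x.size)

# Illustrative selection on data drawn from two well-separated Gaussians:
# MDL should favor K = 2 over K = 1.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 250), rng.normal(10.0, 1.0, 250)])
scores = {K: mdl(x, K) for K in (1, 2)}
```

Note that this sketch is only valid under the FNM assumption; when the labels follow the Potts model, the P(z(r) = k) are no longer independent, which is exactly the approximation discussed above.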

The computation of the ML estimate Θ̂ = {π1, …, πK, θ} is done by a hybrid method using the Expectation-Maximization (EM) and Classification-Maximization (CM) algorithms ([22]).

K                    3      4      5      6       7       8       9       K̂
MDL (without noise)  -6037  -8305  -9236  -10026  -10024  -10021  -10019  6
MDL (with noise)     -1225  -1213  -1199  -1239   -1227   -1214   -1202   6

Table 3: Estimation of K for the simulated data

K        5       6       7       8       9       10      K̂
MDL(g1)  -13359  -13587  -13648  -13715  -13717  -13695  9
MDL(g2)  -16149  -16599  -16816  -17026  -17134  -17175  10

Table 4: Estimation of K for the suitcases taken independently

Figure 7: Estimation of joint K for the suitcases (MDL, in units of 10⁴, plotted against K = 5, …, 30).

Table 3 shows the results of the estimation of K for the simulated data. The method seems to give good estimation results both for noise-free and noisy data. In the case of inspection imaging this method also gives a reasonable estimation of K if we take each


image independently (Table 4). However, in the case of the joint segmentation problem, the MDL objective function does not reach a minimum (Figure 7) and is almost constant between 20 and 30 labels. These results are due to the fact that both images contain many small regions with different grey-scale values. The rough approximation of the likelihood of the HMM model by the likelihood of the FNM model therefore does not seem efficient here. Our future work will focus on a criterion that takes the HMM model into account.

Acknowledgement

The authors would like to thank the referees for their useful remarks and suggestions, which improved the content of this paper.

8 Conclusion

We proposed a Bayesian approach to the data fusion of images, using a hierarchical Markov modeling which allows us to obtain a joint segmentation of these images as the data fusion result. The proposed MRF for the labels is the Potts MRF. We then proposed two particular models for the pixels of the images in each segment: the first model considers these pixels independent, and the second introduces a local spatial dependency between them. We then developed appropriate Gibbs samplers for the two models and illustrated how joint segmentation and reconstruction can be obtained on simulated data sets. We showed how denoising and fusion can be obtained at the same time with an MCMC algorithm. We also showed that our approach gives better fusion results than classical

methods in the case of X-ray inspection images. However, we assume for the moment that the sensor images are registered. We think that our modeling is promising for introducing a registration and blur operator Hi and then performing the common segmentation, deblurring and registration at the same time. This remains an open problem and is part of our future work.

References

[1] S. Gautier, G. Le Besnerais, A. Mohammad-Djafari, and B. Lavayssière, “Data fusion in the field of non destructive testing,” in Maximum Entropy and Bayesian Methods. Kluwer Academic Publ., Santa Fe, NM, K. Hanson edition, 1995.

[2] T. Bass, “Intrusion detection systems and multisensor data fusion,” in Comm. of the ACM, vol. 43, April 2000, pp. 99–105.

[3] G. Matsopoulos, S. Marshall, and J. Brunt, “Multiresolution morphological fusion of MR and CT images of the human brain,” in IEEE Proceedings on Vision, Image and Signal Processing, vol. 141, issue 3, Seattle, USA, 1994, pp. 137–142.

[4] B. Johnston and B. Mackiewitch, “Segmentation of multiple sclerosis lesions in intensity corrected multispectral MRI,” IEEE Trans. on medical imaging, pp. 154–169, April 1996.

[5] Chuin-Mu Wang and Clayton Chi-Chang Chen et al., “Detection of spectral signatures in multispectral MR images for classification,” IEEE Trans. on medical imaging, pp. 50–61, January 2003.

[6] E. Reddick and J.O. Glass et al., “Automated segmentation and classification of multispectral Magnetic Resonance Images of brain using artificial neural networks,” IEEE Trans. on medical imaging, pp. 911–918, December 1997.

[7] M.N. Ahmed and M. Yamany et al., “A modified fuzzy c-means algorithm for bias field estimation and segmentation of MRI data,” IEEE Trans. on medical imaging, pp. 193–199, March 2002.

[8] R.K. Sharma, Probabilistic model-based multisensor image fusion, Ph.D. thesis, Graduate Institute of Science and Technology, Oregon, USA, 1999.

[9] Du Yong et al., “Satellite image fusion with multiscale wavelet analysis for marine applications: preserving spatial information and minimizing artifacts (PSIMA),” J. Remote Sensing, vol. 29, no. 1, pp. 14–23, 2003.

[10] Ramesh Chaveli et al., “Fusion performance measures and a lifting wavelet transform based algorithm for image fusion,” in Information Fusion, Proc. of the 5th int. conf. on, July 2002, pp. 317–320.

[11] P.S. Chavez and A.Y. Kwarteng, “Extracting spectral contrast in Landsat thermal mapper image data using Principal Component Analysis,” in PE and RS(55), 1989, pp. 339–348.

[12] G. Simone and F.C. Morabito, “ICA-NN based data fusion approach in ECT signal restoration,” in Neural Networks, Proceedings of the IEEE-INNS-ENNS International Joint Conference on, vol. 5, July 2000, pp. 59–64.


[13] C.H. Chen and Z. Xiaouhui, “On the roles of PCA and ICA in data fusion,” in Geoscience and Remote Sensing Symposium, IEEE International Proceedings IGARSS, vol. 6, July 2000, pp. 2620–2622.

[14] K. Held and E.R. Kops et al., “Markov Random Field segmentation of brain MR images,” IEEE Trans. on medical imaging, pp. 878–886, December 1997.

[15] L. Aurdal, Analysis of Multi-Image Magnetic Resonance Acquisitions for Segmentation and Quantification of Cerebral Pathologies, Ph.D. thesis, Ecole Nationale Supérieure des Télécommunications, ENST, Paris, France, 1997.

[16] Tu Zhuowen and Zhu Song-Chun, “Image segmentation by data-driven Markov Chain Monte Carlo,” IEEE Trans. on pattern analysis and machine intelligence, pp. 657–673, May 2002.

[17] Z. Kato, “Bayesian color image segmentation using Reversible Jump Markov Chain Monte Carlo,” Tech. Rep., ERCIM (European Research Consortium for Informatics and Mathematics), February 1999.

[18] D. Higdon, Spatial Applications of Markov Chain Monte Carlo for Bayesian Inference, Ph.D. thesis, University of Washington, 1994.

[19] H. Snoussi and A. Mohammad-Djafari, “Fast joint separation and segmentation of mixed images,” Journal of Electronic Imaging, vol. 13(2), April 2004.

[20] D. Chandler, Introduction to modern statistical mechanics, Oxford University Press, 1987.


[21] O. Rockinger and T. Feshner, “Pixel-level image fusion: the case of image sequences,” in Proc. SPIE, vol. 3374, February 1998, pp. 378–398.

[22] Tianhu Lei and Wilfred Sewchand, “Statistical approach to X-ray CT imaging and its applications in image analysis–part II: a new stochastic model-based image segmentation technique for X-ray CT image,” IEEE Trans. on medical imaging, vol. 11, no. 1, pp. 62–69, March 1992.

[23] Tianhu Lei and Jayaram K. Udupa, “Performance evaluation of finite normal mixture model-based image segmentation techniques,” IEEE Trans. on image processing, vol. 12, no. 10, pp. 1153–1169, October 2003.

[24] F. Samadzadegan, “Fusion techniques in remote sensing,” in Com. IV Joint workshop on challenges in geospatial analysis integration and visualisation II, Stuttgart, Germany, September 2003.

[25] G. Gindi, M. Lee, A. Rangarajan, and I. George Zubal, “Bayesian reconstruction of functional images using anatomical information as priors,” IEEE Transactions on medical imaging, vol. 12, no. 4, pp. 670–680, 1993.

[26] T. Hebert and R. Leahy, “A generalized EM algorithm for 3-D Bayesian reconstruction from Poisson data using Gibbs priors,” IEEE Transactions on medical imaging, vol. 8, no. 2, pp. 194–202, June 1989.

[27] S. Gautier, J. Idier, A. Mohammad-Djafari, and B. Lavayssière, “X-ray and ultrasound data fusion,” in Proceedings of the International Conference on Image Processing, Chicago, USA, October 1998, pp. 366–369.

[28] C. Robert, M´ethodes de Monte Carlo par Chaˆınes de Markov, Economica, Paris, France, 1996.

List of Figures

1  Examples of images for data fusion and joint segmentation: a) T1-weighted, T2-weighted and T1-weighted with contrast agent transversal slices of a 3D brain MR image; b) two observations from transmission and backscattering X-rays in security systems (with the permission of American Science and Engineering, Inc., 2003).

2  Chess board decomposition of the labels z.

3  Results of data fusion with high SNR (σ²_εi = 0.001).

4  Results of data fusion with low SNR (σ²_εi = 0.01).

5  Data fusion of medical images.

6  Data fusion of X-ray security system images.

7  Estimation of joint K for the suitcases.

List of Tables

1  Comparison of the two methods with noisy data.

2  Estimation of the noise variance.

3  Estimation of K for the simulated data.

4  Estimation of K for the suitcases taken independently.