Spatial Information in Entropy-based Image Registration

Mert R. Sabuncu and Peter J. Ramadge
Department of Electrical Engineering, Princeton University, Princeton, NJ 08544

Abstract. Information-theoretic approaches (e.g. mutual information) have yielded accurate and robust similarity measures for multi-modal and mono-modal image registration. However, recent research suggests that registration based on mutual information has room for improvement. This paper proposes a method for including spatial information in this approach by using spatial feature vectors obtained from the images. A minimum spanning tree algorithm is employed to compute an estimate of the conditional entropy in higher dimensions. The paper includes the theory motivating the proposed similarity measure. Experimental results indicate that the suggested method can achieve more robust and accurate registration than similarity measures that do not include spatial information.

1 Introduction

The goal of image registration is to geometrically transform one image so that physical (in medical imaging, physiological) correspondences line up. This is usually a precursor to comparison or fusion of the images. The problem is multi-modal if the images are obtained through two different imaging devices, e.g. functional Magnetic Resonance images (fMRI) and structural MR images. A popular approach to this problem defines a similarity measure between the images and performs an optimization over allowed transformations to maximize this measure. Here, we consider "intrinsic" image registration methods that are based on voxel or pixel intensities. Many different similarity measures of this kind have been proposed in the literature. However, most of these methods assume a strict prior distribution on the noise (such as i.i.d. Gaussian noise) and/or a linear relation between the two image intensities. Due to these assumptions, these methods generally perform poorly for multi-modal registration. An information-theoretic approach was introduced by Viola et al. [11]. The main idea is to use a distance measure based on Mutual Information (MI) and its variants [8]. The use of MI was motivated by a maximum likelihood (ML) solution to a certain model. However, the underlying assumption that the imaging function does not depend on spatial position [12] is very restrictive. Many researchers [7, 9] noticed this flaw and proposed heuristic improvements. One approach is to use higher-dimensional feature vectors to capture the spatial information; however, until recently a good estimate of the entropy of higher-dimensional p.d.f.'s was not available.

We extend Viola's approach to include spatial information by proposing a more general similarity measure for the registration problem. Experimental results suggest that our method improves the accuracy and robustness of the registration in the presence of intensity variations. We apply the minimum spanning tree (MST) based entropy estimator [3] and our new registration function to several examples to illustrate the performance of our method. Additionally, we compare our similarity measure to that of [4], where the joint entropy estimated through an MST algorithm was used as the registration function.

2 Prior Work

Viola et al. [11] and Collignon et al. [5] were the first to use information-theoretic similarity measures for image registration. Mutual information has the advantage of not requiring the definition of landmarks or features such as surfaces. Moreover, it has been shown to be well-suited for multi-modal image registration problems. However, despite some promising results, recent research [7, 10, 9] suggests that registration based on mutual information has room for improvement. There can be cases where the mutual information registration function contains local maxima, and these can cause misregistration [7]. Different approaches have been proposed to address these issues, including other information-theoretic similarity measures such as normalized mutual information [10] and "higher-order" mutual information [9], and combining gradient information with mutual information [7]. In [7], Pluim et al. propose incorporating spatial information by multiplying the registration function with a so-called gradient information term. The main idea is that, at registration, locations with large gradient magnitude should be aligned and the orientations of the gradients at those locations should be similar. Experiments suggested that this new measure outperforms regular mutual information based techniques; however, the approach is ad hoc. In [9], Rueckert et al. suggest employing "second-order" mutual information, using the co-occurrence matrices of neighboring voxels' intensities. They showed that second-order mutual information outperforms first-order mutual information, since it incorporates spatial information into the registration process. In [4], Hero et al. use the theory of entropic spanning graphs to directly estimate the entropy of a distribution. This has the advantage of bypassing density estimation. Furthermore, for high dimensions this algorithm can be easily implemented, whereas histogram methods cannot. A detailed review of the applications of entropic spanning graphs can be found in [1].

3 Image Registration and Spatial Information

3.1 Model and Main Assumptions

We are given two images: the reference image, u(x, y), and the floating image, v(x, y). We assume the reference image contains the scene represented in the floating image. Additionally, we assume that each structure (each tissue, in medical images) is mapped through a different imaging function for each image to possibly distinct intensities. These mappings may depend on the spatial location. The mathematical relation between the two images is hence modelled as:

v(x, y) = f_{x,y}(u(t(x, y))) + W(x, y),  ∀ (x, y) ∈ Ω.  (1)

In the above equation, t(x, y) represents a geometrical transformation of the coordinates. t may be constrained to a certain class, e.g. affine, rigid-body, spline-based nonlinear, or polynomial transformations, depending on the specific application. Ω is a finite subset of the image plane. W(x, y) is random noise. f_{x,y}(·) is the imaging function; it represents the relation between the pixel intensity values of the two images and may depend on the spatial location. In general, we will constrain f by a parameterization as follows: let s(x, y) be a function from the image plane to R^k, for some k ∈ Z^+, and let f_{x,y}(·) = g_{s(x,y)}(·). Then:

v(x, y) = g_{s(x,y)}(u(t(x, y))) + W(x, y),  ∀ (x, y) ∈ Ω.  (2)
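To make the model concrete, the following minimal sketch generates a floating image from a reference image under (2), assuming, purely for illustration, a multiplicative bias field along the x-direction as the spatially varying imaging function (so s(x, y) = x), the identity geometric transformation t, and i.i.d. Gaussian pixel noise W. The function name and parameter values are hypothetical, not part of the paper:

```python
import numpy as np

def make_floating_image(u, bias=0.5, noise_sigma=0.1, rng=None):
    """Sketch of model (2) with s(x, y) = x and t = identity: the imaging
    function is the multiplicative bias field g_x(i) = (1 + bias * x) * i
    along the x-direction, and W is i.i.d. Gaussian pixel noise."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = u.shape
    x = np.linspace(0.0, 1.0, w)            # normalized x-coordinate per column
    return (1.0 + bias * x)[None, :] * u + rng.normal(0.0, noise_sigma, u.shape)
```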

3.2 Maximum Likelihood, Conditional Entropy and Registration

In this section we will develop a registration function in a maximum likelihood framework. Let T be a random geometric transformation selected from a given class with respect to a p.d.f. p_T(t). Let U and V be random images, dependent through the transformation T, with pixel intensity values satisfying (2). Let t̂ be the ML estimate of T given U and V:

t̂ = arg max_t P_{U,V|T}(U = u, V = v | T = t).  (3)

With the assumption P_{U|T}(U = u | T = t) = P_U(U = u), (3) reduces to

t̂ = arg max_t P_{V|U,T}(V = v | U = u, T = t).  (4)

Assume the pixel noise W(x, y) is an independent process. It follows from (2) that

P_{V|U,T}(V = v | U = u, T = t) = ∏_{(x,y)∈Ω} P_{W(x,y)}(W(x, y) = v(x, y) − g_{s(x,y)}(u(t(x, y)))).

Hence, maximizing the log-likelihood function of t is equivalent to maximizing

m(t) = Σ_{(x,y)∈Ω} log P_{W(x,y)}(W(x, y) = v(x, y) − g_{s(x,y)}(u(t(x, y)))).  (5)

However, since the imaging function g(·) and the probability distribution of the pixel noise P_{W(x,y)}(·) are unknown, it is not clear how to calculate or estimate m(t). We will investigate the use of conditional entropy to approximate m(t). Let N := |Ω|, N_c := |Ω ∩ {(x, y) : s(x, y) = c}| and N_{a,b,c} := |Ω ∩ {(x, y) : U(t(x, y)) = a, V(x, y) = b, s(x, y) = c}|. Note that N and N_c are deterministic values, while N_{a,b,c}, since it depends upon U and V, is a random variable. Let Δ := {(a, b, c) : ∃ (x, y) ∈ Ω, U(t(x, y)) = a, V(x, y) = b, s(x, y) = c}. Assume that the pixel noise is identically distributed on every level set of s(x, y); that is, W(x, y) has an arbitrary distribution that satisfies

P_{W(x_1,y_1)}(W(x_1, y_1) = w_1) = P_{W(x_2,y_2)}(W(x_2, y_2) = w_1) = P_{W_c}(w_1)  (6)

for all (x_1, y_1), (x_2, y_2) ∈ Ω such that s(x_1, y_1) = s(x_2, y_2) = c. Then we can rewrite (5) using this alternative notation:

m(t) = Σ_{(a,b,c)∈Δ} N_{a,b,c} log P_{W_c}(b − g_c(a)).  (7)

Next, we will introduce some new random variables and relate m(t) to their conditional entropy. Let (X, Y) be a uniformly distributed random vector taking values in Ω. For a given transformation t, let A := U(t(X, Y)), B := V(X, Y) and C := s(X, Y). It is clear that P_{B|A,C}(B = b | A = a, C = c) = P_{W_c}(b − g_c(a)). Hence m(t) can be rewritten as

m(t) = Σ_{(a,b,c)∈Δ} N_{a,b,c} log P_{B|A,C}(B = b | A = a, C = c).  (8)

Theorem 1. For all (x_1, y_1), (x_2, y_2) ∈ Ω and all t, assume

P_{U(t(x_1,y_1))}(U(t(x_1, y_1)) = a) = P_{U(t(x_2,y_2))}(U(t(x_2, y_2)) = a) = P_a.

Then

E(N_{a,b,c}) = N P_{A,B,C}(A = a, B = b, C = c).  (9)

Proof.

E(N_{a,b,c}) = Σ_{(x,y)∈Ω : s(x,y)=c} P_{U(t(x,y)),V(x,y)}(U(t(x, y)) = a, V(x, y) = b).

Applying the assumption on the distribution of the pixel intensity values of U yields

E(N_{a,b,c}) = Σ_{(x,y)∈Ω : s(x,y)=c} P_{V(x,y)|U(t(x,y))}(V(x, y) = b | U(t(x, y)) = a) P_a.

Then, using (2), E(N_{a,b,c}) = N_c P_{W_c}(b − g_c(a)) P_a. Now consider P_{A,B,C}(·). Since (X, Y) is uniform on Ω, P_C(C = c) = N_c/N and, by the assumption, P_{A|C}(A = a | C = c) = P_a. Hence

P_{A,B,C}(A = a, B = b, C = c) = P_{B|A,C}(B = b | A = a, C = c) P_{A|C}(A = a | C = c) P_C(C = c)
= P_{W_c}(b − g_c(a)) P_a (N_c / N) = E(N_{a,b,c}) / N.  ⊓⊔

The conditional entropy H(B|A, C) is defined as

H(B|A, C) = − Σ_{a,b,c} P_{A,B,C}(A = a, B = b, C = c) log [ P_{A,B,C}(A = a, B = b, C = c) / P_{A,C}(A = a, C = c) ].  (10)

Corollary 1. For m(t) defined in (5),

E(m(t)) = −N H(B|A, C).  (11)

Proof. From (8), E(m(t)) = Σ_{(a,b,c)∈Δ} E(N_{a,b,c}) log P_{B|A,C}(B = b | A = a, C = c). Using (9) and (10), we get E(m(t)) = −N H(B|A, C). ⊓⊔

Instead of m(t), which is difficult to estimate, we propose using an estimate of E(m(t)) = −N H(B|A, C) as the registration function. This can be obtained with the MST estimator.

3.3 The Minimum Spanning Tree Estimator

In this section, we provide a brief background on the MST estimator [3]. One advantage of the MST estimator is that it skips density estimation and directly computes an estimate of the entropy; thus, the complication of fine-tuning density estimator parameters is avoided. It has also been shown in [2] that the MST estimator has a faster convergence rate than plug-in estimators, especially for non-smooth densities. Another advantage is that it provides a consistent estimator for higher-dimensional feature vectors. Hence, feature vectors that incorporate spatial or other types of important information can be easily employed.

In [4], Hero et al. suggest using the Rényi entropy as a similarity measure for registration. The α-Rényi entropy of a density p_Z is defined as

R_α(Z) = 1/(1−α) log ∫_{R^d} p_Z^α(z) dz,  (12)

where log is the natural logarithm. The limit of the Rényi entropy as α approaches 1 is the well-known Shannon differential entropy. The use of the Rényi entropy is motivated by the consistent MST estimator, which uses the power-weighted length of the minimum spanning tree over samples from the density. In [4], Hero et al. use raw pixel intensity values obtained from the images as the vertices of the graph, i.e. as samples from the density.

Given a set χ_n = {z_1, z_2, ..., z_n} of n points (vertices) in R^d, a spanning tree is a connected acyclic graph that passes through all points in χ_n. In this graph, all n points are connected through n − 1 edges {e_i}. The minimum spanning tree is the spanning tree that minimizes the total power-weighted edge length L_γ(χ_n) = Σ_i |e_i|^γ for a fixed γ. In [3], Hero et al. show that R̂(Z), defined below, is a strongly consistent estimate of the Rényi entropy. Let L_γ be the weighted edge functional and let χ_n = {z_1, z_2, ..., z_n} be an i.i.d. sample drawn from a distribution on [0, 1]^d with density p_Z(z). Then

R̂(Z) = 1/(1−α) [ log ( L_{d(1−α)}(χ_n) / n^α ) − log β_{L,d(1−α)} ].
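As an illustration, here is a minimal sketch of the power-weighted MST length and the resulting entropy estimate, using SciPy's minimum spanning tree routine. The constant β_{L,d(1−α)} is omitted: it does not depend on the data and cancels in the registration criterion below. This is a sketch of the estimator's structure under the stated assumptions, not a reference implementation:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_length(points, gamma):
    """Power-weighted length L_gamma = sum_i |e_i|^gamma of the Euclidean
    minimum spanning tree over the rows of `points` (an n x d array)."""
    d = squareform(pdist(points))      # dense pairwise Euclidean distances
    d[d == 0] = 1e-12                  # duplicates: scipy drops zero-weight edges
    np.fill_diagonal(d, 0)             # no self-loops
    mst = minimum_spanning_tree(d)     # sparse matrix holding the n-1 MST edges
    return np.power(mst.data, gamma).sum()

def renyi_entropy_estimate(points, alpha):
    """Sketch of the estimate R_hat(Z); samples assumed rescaled to [0,1]^d.
    The constant log(beta_{L,d(1-alpha)}) is dropped, since it is independent
    of the data and cancels in the registration criterion (13)."""
    n, d = points.shape
    L = mst_length(points, d * (1.0 - alpha))
    return np.log(L / n**alpha) / (1.0 - alpha)
```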

Hero et al. claim that under "proper" registration the Rényi entropy of the joint distribution of features obtained from the overlapped images should be minimized [4, 6]. This is reflected in the smallest weighted length of the MST.

In the previous section, we proposed E(m(t)) = −N H(B|A, C) as the registration function. Note that −N H(B|A, C) = N (H(A, C) − H(A, B, C)). Let s : Ω → R^k. Let Λ and Ψ be lists with repetition defined as {(a, c) : ∃ (x, y) ∈ Ω, U(t(x, y)) = a, s(x, y) = c} and {(a, b, c) : ∃ (x, y) ∈ Ω, U(t(x, y)) = a, V(x, y) = b, s(x, y) = c}, respectively. For α ≈ 1, we suggest the following as an estimate of (11):

N/(1−α) [ log ( L_{(1+k)(1−α)}(Λ) / N^α ) − log β_{L,(1+k)(1−α)} − log ( L_{(2+k)(1−α)}(Ψ) / N^α ) + log β_{L,(2+k)(1−α)} ].

Since β_{L,(1+k)(1−α)} and β_{L,(2+k)(1−α)} do not depend on the transformation t, the registration problem reduces to maximizing

N log [ L_{(1+k)(1−α)}(Λ) / L_{(2+k)(1−α)}(Ψ) ].  (13)
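Under the same assumptions, criterion (13) can be sketched by assembling the feature lists Λ and Ψ from the overlapping pixels and comparing the two MST lengths. `mst_length` is the helper from the previous sketch; the function and argument names are hypothetical, and the features are assumed rescaled to comparable ranges:

```python
def registration_criterion(u_t, v, s, alpha=0.99):
    """Sketch of criterion (13); larger is better.
    u_t : transformed reference intensities, shape (N,)
    v   : floating image intensities, shape (N,)
    s   : spatial feature vectors s(x, y), shape (N, k)
    All features are assumed rescaled to [0, 1]."""
    N, k = s.shape
    lam = np.column_stack([u_t, s])     # samples of (A, C), dimension 1 + k
    psi = np.column_stack([u_t, v, s])  # samples of (A, B, C), dimension 2 + k
    L_lam = mst_length(lam, (1 + k) * (1 - alpha))
    L_psi = mst_length(psi, (2 + k) * (1 - alpha))
    return N * np.log(L_lam / L_psi)
```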

3.4 Spatial Information

Our model incorporates spatial information through the spatial function s. In general, s may not be known; however, depending on the application, one may have good prior knowledge of it. Note that the validity of assumptions (2) and (6) depends on s: the more prior knowledge one has of this function, the more realistic the model. At one extreme, one can set s(x, y) = (x, y). In this case the assumptions on the noise (6) and the imaging function (2) are relatively weak; however, intuition and experimental results suggest that the coarser the partitioning defined by s, the more accurate the resulting estimator. At the other extreme, one can set s(x, y) = constant. In this case, the assumptions on the noise (6) and the imaging function (2) may be too strong.
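For instance, the two extremes discussed above correspond to the following feature constructions (a sketch continuing the earlier snippets, with coordinates normalized to [0, 1]):

```python
h, w = u.shape                      # u: reference image from the earlier sketch
ys, xs = np.mgrid[0:h, 0:w]
xs = xs / (w - 1.0)                 # normalize coordinates to [0, 1]
ys = ys / (h - 1.0)
s_full = np.column_stack([xs.ravel(), ys.ravel()])  # s(x, y) = (x, y): weakest assumptions, k = 2
s_x    = xs.ravel()[:, None]                        # s(x, y) = x: matched to an x-direction bias, k = 1
# s(x, y) = constant carries no spatial information (strongest assumptions).
```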

4 Implementation

The main part of our algorithm involves solving the MST problem on the set of feature vectors obtained from the images; these are defined as vectors containing the intensity values and the spatial function values s(x, y) of each pixel. Kruskal's algorithm is a standard general-purpose choice for this problem. Its time complexity is O(E log E) in the number of edges E, which for the complete graph on V vertices is O(V² log V). The algorithm can be accelerated by applying the modifications suggested by Neemuchwala et al. in [6]. This study concentrates on the accuracy and robustness of the proposed similarity measure in the presence of intensity variations, such as those caused by RF inhomogeneity in MR imaging. Thus, an optimization scheme for determining the registration parameters and the choice of transformation space were not included in the study. Instead, an exhaustive search was carried out in the registration parameter space to determine the values that maximize the proposed similarity measure. Cubic spline interpolation was used in all experiments. A sketch of this procedure for a one-dimensional translation search is given below.
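The following hypothetical sketch searches exhaustively over x-translations only, scoring each candidate with the earlier `registration_criterion`; `scipy.ndimage.shift` supplies the cubic-spline interpolation mentioned above. Names and ranges are illustrative, not the paper's code:

```python
from scipy.ndimage import shift

def exhaustive_translation_search(u, v, s, tx_values):
    """Exhaustive search over x-translations, maximizing criterion (13)."""
    best_tx, best_score = None, -np.inf
    for tx in tx_values:
        u_t = shift(u, (0.0, tx), order=3)   # cubic spline interpolation
        score = registration_criterion(u_t.ravel(), v.ravel(), s)
        if score > best_score:
            best_tx, best_score = tx, score
    return best_tx

# Example usage, e.g. a sweep matching the plots in Section 5:
# t_hat = exhaustive_translation_search(u, v_float, s_x, np.arange(-10, 10.5, 0.5))
```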

5 Results

In this section, we demonstrate the performance of our proposed measure in a variety of cases. For comparison, we also illustrate some results obtained by applying the joint entropy measure proposed in [4] and mutual information [11]. In Section 5.1, we demonstrate the behavior of the aforementioned registration functions in the case of low-resolution multi-modal registration. In Section 5.2, we evaluate the accuracy of the proposed measures by comparing the registration results against the true values.

Fig. 1. (a) Functional MR image. (b) Floating image (artificial).

5.1 Registration Functions

The experiment involves a functional MR image (the reference image). The floating image was artificially created by negating the reference image and adding a bias field in the x-direction (see Fig. 1.b). Thus, no transformation is required to align these images. The images have been subsampled by a factor of 8 in each direction. Since there is no random noise, the experimental results characterize the performance of the similarity measures on this pair of images. Figure 2 shows plots of the calculated registration functions. The results suggest that our proposed registration functions are relatively smooth, even with a small number of samples. However, the conditional entropy with restricted spatial information (Fig. 2.b) has a sharper peak than the one with full spatial information (Fig. 2.a). This suggests that conditional entropy with spatial information matched to the spatial variation of the imaging function is less susceptible to noise. The joint entropy measure (Fig. 2.c) achieves its optimum at a translation of 1, which is incorrect. It is known that the mutual information registration function becomes less smooth when the number of samples is small, as occurs, for instance, in multi-resolution methods. In the rotation case, conditional entropy with ideal spatial information has the sharpest peak, whereas the joint entropy measure contains many local optima.

Fig. 2. Registration functions (horizontal axes: translation or rotation angle; vertical axes: similarity measure). From left to right: conditional entropy with s(x, y) = (x, y), conditional entropy with s(x, y) = x, negative joint entropy, mutual information. Top row: translation along the x-axis; bottom row: rotation around the image center.

Fig. 3. First experiment: (a) reference image (structural MR), (b) floating image. Second experiment: (c) reference image, (d) floating image.

5.2 Accuracy

These experiments use the images in Fig. 3. In the first experiment, the floating image was created by applying a bias field in the x-direction and adding i.i.d. Gaussian pixel noise to the reference image. This simulates a mono-modal registration experiment in the presence of intensity variations. In the second experiment, the reference image was negated before going through the same process to create the floating image. This simulates multi-modal registration. Table 1 presents the results for a variety of noise and bias field magnitudes. The table entries are r.m.s. values of the estimated translations over 5 trials. The true translation is 0.

                  First Exp.                          Second Exp.
Bias  SNR1  C.E.(s=(x,y))  C.E.(s=x)  J.E.    SNR2  C.E.(s=(x,y))  C.E.(s=x)  J.E.
0     4.02  0.32           0.50       0       9.45  0              0          0
0.5   4.0   0.39           0.50       3.35    9.45  0              0          1.16
1.0   4.0   1.80           0.50       6.35    9.45  0.22           0.32       0.74
1.5   4.0   2.41           1.32       5.37    9.45  0.45           0.67       0.84
0     2.0   0.39           0.50       0       4.73  0.22           0          0
0.5   2.0   0.63           0.50       2.91    4.73  0              0          1.16
1.0   2.0   2.25           0.50       6.63    4.73  0.55           0.39       1.45
1.5   2.0   1.00           0.50       4.12    4.73  1.50           0.71       1.63
0     1.34  0.50           0.45       0.22    3.15  0.22           0.55       0.32
0.5   1.34  1.00           1.00       2.02    3.15  0.32           0.50       0.84
1.0   1.34  3.27           1.47       6.90    3.15  1.47           0.71       1.02
1.5   1.34  3.81           2.21       6.45    3.15  1.41           0.32       1.83

Table 1. Estimated translations (r.m.s. over 5 trials) for the MR (experiment 1) and "baboon" (experiment 2) image pairs with different SNR and bias field magnitudes. The methods are: conditional entropy (C.E.) with full spatial information (s(x, y) = (x, y)) and ideal spatial information (s(x, y) = x), and joint entropy (J.E.) with no spatial information. SNR is defined as the ratio of the average pixel intensity to the standard deviation of the noise.

These results indicate that our proposed method, with both ideal and non-ideal spatial information, produces a more accurate registration in the presence of intensity variations than the joint entropy measure. The experiments also suggest that conditional entropy with spatial information restricted by prior knowledge of the intensity variation is a robust registration function in the presence of high-magnitude noise.

6 Future Work

Several aspects of the proposed method need further investigation. The robustness of the conditional entropy with restricted spatial information should be studied more extensively. In MR imaging, one may have strong prior knowledge of the RF inhomogeneity, and our experiments indicate that this knowledge can be used to improve registration. Another important topic is the integration of an optimization scheme into the MST-based algorithm. The smoothness of the proposed registration function, even with very low-resolution images, is encouraging and suggests that hill-climbing methods can be applied for the optimization. This is currently under investigation.

7 Acknowledgements

The MR images were provided by the Center for the Study of Brain, Mind and Behavior at Princeton University. The fast MST code was obtained from Huzefa Neemuchwala and Alfred Hero at the University of Michigan, Ann Arbor.

References

1. Hero, A.O., Ma, B., Michel, O., Gorman, J.D.: Applications of entropic spanning graphs. IEEE Signal Processing Magazine, vol. 19, no. 5 (2002) 85–95
2. Hero, A.O., Ma, B., Michel, O., Gorman, J.D.: Alpha-divergence for classification, indexing and retrieval. Technical Report CSPL-328, Communications and Signal Processing Laboratory, The University of Michigan (2001)
3. Hero, A.O., Michel, O.: Estimation of Rényi information divergence via pruned minimal spanning trees. IEEE Workshop on Higher Order Statistics, Caesarea, Israel (1999)
4. Ma, B., Hero, A.O., Gorman, J., Michel, O.: Image registration with minimal spanning tree algorithm. IEEE International Conference on Image Processing, Vancouver (2000)
5. Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., Suetens, P.: Multimodality image registration by maximization of mutual information. IEEE Trans. Med. Imag., vol. 16, no. 2 (1997) 187–198
6. Neemuchwala, H.F., Hero, A.O., Carson, P.L.: Image registration using alpha-entropy measures and entropic graphs. European Journal on Signal Processing (2002)
7. Pluim, J., Maintz, A., Viergever, M.: Image registration by maximization of combined mutual information and gradient information. IEEE Trans. Med. Imag., vol. 19, no. 8 (2000) 809–814
8. Blahut, R.E.: Principles and Practice of Information Theory. Addison-Wesley, Reading, MA (1987)
9. Rueckert, D., Clarkson, M.J., Hill, D.L.G., Hawkes, D.J.: Non-rigid registration using higher-order mutual information. In: Medical Imaging: Image Processing, K.M. Hanson, Ed., SPIE Press, Bellingham, WA (2000)
10. Studholme, C., Hill, D.L.G., Hawkes, D.J.: An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition, vol. 32, no. 1 (1999) 71–86
11. Viola, P.A., Wells III, W.M., Atsumi, H., Nakajima, S., Kikinis, R.: Multi-modal volume registration by maximization of mutual information. Medical Image Analysis, vol. 1, no. 1 (1996) 5–51
12. Viola, P.A.: Alignment by maximization of mutual information. Ph.D. thesis, MIT, Cambridge, MA (1995)