EVALUATION AND PRACTICAL ISSUES OF SUBPIXEL IMAGE REGISTRATION USING PHASE CORRELATION METHODS

Fabrice Humblot, Bertrand Collin,

Ali Mohammad-Djafari

DGA/DCE/CTA/DT/GIP, Centre Technique d’Arcueil, 94114 Arcueil, France. {Fabrice.Humblot, Bertrand.Collin}@etca.fr

LSS (CNRS-SUPELEC-UPS), École Supérieure d'Électricité, 91192 Gif-sur-Yvette Cedex, France. [email protected]

ABSTRACT

To appear in Proc. PSIP2005, Jan 31 - Feb 2, 2005, Toulouse, France

In this study we first propose and compare two frequency-domain motion estimation methods using the phase correlation principle. Then, focusing on the particular case of subpixel translational movements, we evaluate the accuracy of these two registration methods and give some practical hints for their implementation. Finally, we compare the performance of these methods for both an ideal theoretical case and for real images.

Keywords. Subpixel image registration, Phase correlation, Super-resolution.

1. INTRODUCTION

In image processing, precise estimation of the shift between two consecutive images is a fundamental step for a great number of other data and image processing tasks, such as super-resolution (SR) reconstruction. Image registration is used to match two or more images that show the same scene from different viewpoints, from different sensors, or at different times (in a video sequence for instance). In order to register two images, a transformation that maps each point of one image onto the other must be found.

The main objective of this work is to perform SR reconstruction in order to gain better detection of faintly contrasted and isolated specks having a size of almost one pixel in an image. However, in this communication, we focus only on the subpixel image registration problem. Indeed, the transformation that maps one image onto another must be known accurately if we want a good SR reconstruction.

SR reconstruction consists in producing a high-resolution (HR) image from a set of low-resolution (LR) images. These LR images can be taken from a video sequence. They must come from a moving scene: the movement and the non-redundant information it brings are what make the SR process possible. So, the obtained HR image contains more useful information than any one of the LR images taken from the initial video sequence.

In this work we assume our video camera will be using the maximum zoom when the operator decides to use the SR reconstruction, and so, the registration process. Moreover, we suppose that our camera is set on a platform that is stabilized with respect to rotation about the camera axis. These two remarks allow us to remove homothety and rotation from the field of possible transformations. So, this particular context enables us to limit the field of possible transformations to a global translatory motion.

L. Brown presents a survey of image registration techniques in [1]. F. Dufaux and F. Moscheni give a good review of motion estimation techniques in [2]. These techniques can be divided into five main groups:
i) Gradient techniques [3] are based on the hypothesis that the image luminance is invariant along motion trajectories. They solve an optical flow constraint equation, which results in a dense motion field.
ii) Pel-recursive techniques [4] can be considered as a subset of gradient techniques in which the spatiotemporal constraint is minimized recursively.
iii) Block matching techniques [5] are based on the matching of blocks between two images, the goal being to minimize a disparity measure.
iv) Frequency-domain techniques [6] are based on the relationship between transformed coefficients (obtained by the Gabor or the Fourier Transform (FT)) of shifted images. The phase correlation method belongs to this group.
v) Finally, there are methods based on level sets [7]. A level set represents a group of pixels having the same intensity in an image. The goal of these methods is to try to match sections of two images using measures such as surface area and main moments of inertia.

With the purpose of estimating the global translational movement between two images, one efficient, noise-robust and accurate way uses the FT shift property. This general method is called image registration by phase correlation. It enables an accuracy which is below the size of a pixel; this is why we speak of subpixel accuracy. Moreover, the FT can be efficiently implemented using the Fast Fourier Transform (FFT). The objectives of this study are to evaluate and compare two methods that are derived from the phase correlation principle. We were also motivated by the practical issues of their implementation.

This paper is a presentation of a more complete work [8]. It is organized as follows. Section 2 describes the theoretical aspects of the phase correlation method and the two extensions ([9, 10]) that gain subpixel accuracy, which will be evaluated afterwards. Section 3 gives the equation we used to do the registration process in the case of a bilinear approximation. Section 4 presents the evaluation results before the conclusions of this work in section 5.

2. THEORETICAL ASPECTS

2.1. Basic Principle of the Phase Correlation Method

A neat way to register two images is to use phase correlation [11], which benefits from the FT shift theorem. Let the image I2 be the image I1 shifted by (x0, y0), so that

    I2(x, y) = I1(x − x0, y − y0).

With Î1 and Î2 being their FT, we have

    Î2(u, v) = Î1(u, v) e^(−j(u·x0 + v·y0)).

Then, the cross-power spectrum (CPS) equation gives

    Î2(u, v) Î1*(u, v) / |Î2(u, v) Î1*(u, v)| = e^(−j(u·x0 + v·y0)).

Finally, the inverse FT (IFT) of this exponential function gives the Dirac function centered on (x0, y0).

2.2. Extension to Subpixel Registration

The basics presented in the previous part suppose an integer shift (x0, y0) between images. [9] suggests a model based on the hypothesis that images shifted by a subpixel value have originally been moved by an integer value, and that a downsampling reduced their shift to a subpixel value. Thus, if Î is the FT of I1 before downsampling, the FT expressions of I1 and I2 downsampled by integer factors M on the x-axis and N on the y-axis are:

    Îs1(u, v) = (1/MN) Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} Î((u + 2πm)/M, (v + 2πn)/N)

and

    Îs2(u, v) = (1/MN) Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} Î((u + 2πm)/M, (v + 2πn)/N) e^(−j(((u + 2πm)/M)·x0 + ((v + 2πn)/N)·y0)).

Using these relations, computing the CPS and going back to the image domain, the Dirac function obtained previously becomes:

    C(x, y) = IFT[ Îs2(u, v) Îs1*(u, v) / |Îs2(u, v) Îs1*(u, v)| ]
            ≈ [sin(π(Mx − x0)) / (π(Mx − x0))] · [sin(π(Ny − y0)) / (π(Ny − y0))].    (1)

At last, using (1) applied to a set of three points, we can estimate the subpixel shifts ∆x = x0/M and ∆y = y0/N on the x- and y-axes respectively, without knowledge of x0, y0, M or N. Indeed, for two images differing from one another by a translatory motion, the signal power is concentrated in a main peak (xp, yp) and in two close secondary ones, (xs, yp) with xs = xp ± 1 and (xp, ys) with ys = yp ± 1. Eventually, we find two solutions for ∆x:

    ∆x = [C(xs, yp)·xs + C(xp, yp)·xp] / [C(xs, yp) + C(xp, yp)]

and

    ∆x = [C(xs, yp)·xs − C(xp, yp)·xp] / [C(xs, yp) − C(xp, yp)].

The valid solution is the one lying in [xp, xs] in the case xs = xp + 1, and in [xs, xp] for xs = xp − 1; ∆y is estimated in the same manner from C(xp, ys) and C(xp, yp).

2.3. Extension Using Image Gradients

Based on the previous model, [10] suggests using two complex gradient images G1 and G2 in place of I1 and I2 in the previous algorithm. A complex gradient image G is a complex-valued image whose real and imaginary parts are the horizontal and vertical gradients of the original image:

    G = Gh + j·Gv

with

    Gh(x, y) = I(x + 1, y) − I(x − 1, y)

and

    Gv(x, y) = I(x, y + 1) − I(x, y − 1).

The performance of the algorithm is thus improved because the gradients increase, in some way, the contrast of the image, more specifically the contours of the objects (see Fig. 5 (c) and (d) for an illustration of this feature).

3. REGISTRATION EQUATION ACCOUNTING FOR BORDERS

With the intent to translate an image, we have to define a registration equation. We choose to express it in the case of a bilinear approximation. Given two images I0 and I1, and a vector (−dx, −dy) representing the translatory motion between both images, we want to construct I2 as being I1 shifted by (dx, dy), and so register I2 on I0. In the case of a subpixel movement of translation (i.e. (dx, dy) ∈ ]−1; 1[²), we based our reflection on the observation of Fig. 1. The image on the right is the one on the left shifted by dx = 1/4 pixel and dy = 3/4 pixel.

Fig. 1. Example of a subpixel move with a 9 pixel image (bilinear approximation)
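The peak-based estimation of section 2.2 can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function names are ours, and we assume positive shifts with a correlation peak away from the array border.

```python
import numpy as np

def phase_correlation_surface(i1, i2):
    """Inverse FT of the normalized cross-power spectrum (CPS)."""
    f1, f2 = np.fft.fft2(i1), np.fft.fft2(i2)
    cps = f2 * np.conj(f1)
    cps /= np.abs(cps) + 1e-12           # normalize; guard against empty bins
    return np.real(np.fft.ifft2(cps))

def _resolve(cp, cs, p, s):
    """Keep the closed-form candidate lying between the two peak positions."""
    lo, hi = min(p, s), max(p, s)
    for num, den in ((cs * s + cp * p, cs + cp),
                     (cs * s - cp * p, cs - cp)):
        if abs(den) > 1e-12 and lo <= num / den <= hi:
            return num / den
    return float(p)                      # fall back to the integer peak

def estimate_shift(i1, i2):
    """Subpixel translation of i2 relative to i1 via the three-point rule."""
    c = phase_correlation_surface(i1, i2)
    ny, nx = c.shape
    yp, xp = np.unravel_index(np.argmax(c), c.shape)
    # secondary peaks: the larger of the two neighbours on each axis
    xs = xp + 1 if c[yp, (xp + 1) % nx] >= c[yp, xp - 1] else xp - 1
    ys = yp + 1 if c[(yp + 1) % ny, xp] >= c[yp - 1, xp] else yp - 1
    dx = _resolve(c[yp, xp], c[yp, xs % nx], xp, xs)
    dy = _resolve(c[yp, xp], c[ys % ny, xp], yp, ys)
    return dx, dy

# usage: shift a broadband test image by a known subpixel amount in the
# Fourier domain, then re-estimate that shift
rng = np.random.default_rng(0)
img = rng.random((64, 64))
u = 2 * np.pi * np.fft.fftfreq(64)
dx0, dy0 = 3.4, 1.7
ramp = np.exp(-1j * np.add.outer(u * dy0, u * dx0))
img2 = np.real(np.fft.ifft2(np.fft.fft2(img) * ramp))
dx, dy = estimate_shift(img, img2)       # close to (3.4, 1.7)
```

For the synthetic Fourier-domain shift above, the normalized CPS is an exact phase ramp, so the three-point estimate is accurate to well under a hundredth of a pixel.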

First, we notice that for each new value of I2(x, y), four pixels from I1 are used. Depending on the direction of the displacement, the equation is not quite the same: it depends on the signs of dx and dy. This is why we introduce sign(dx) as the sign of dx, and sign(dy) as the sign of dy. Thus, for a move smaller than the pixel size, we obtained the following generalized registration equation:

    I2(x, y) = (1 − |dx|)(1 − |dy|) I1(x, y)
             + |dx| (1 − |dy|) I1(x − sign(dx), y)
             + (1 − |dx|) |dy| I1(x, y − sign(dy))
             + |dx| |dy| I1(x − sign(dx), y − sign(dy)).    (2)

Next, for the border case, we investigated several solutions that we do not detail here. The one we retained is based on a simple observation: the registration process uses a reference image, which is I0 in our example. We found that the best way to reconstruct the unknown area is to take it identically from the reference image I0.

4. NUMERICAL EVALUATION AND PRACTICAL RESULTS

Fig. 2. Reference image (a), difference between reference and shifted images (b), and between reference and shifted/registered images (c)
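Equation (2), together with the retained border rule (copying the unknown strip from the reference image), can be sketched as follows. The function name and the circular-roll trick are our own illustrative choices; the sketch assumes |dx|, |dy| < 1.

```python
import numpy as np

def register_bilinear(i1, i0, dx, dy):
    """Shift i1 by (dx, dy), |dx|,|dy| < 1, following equation (2);
    border pixels with no data are taken from the reference image i0."""
    sx, sy = int(np.sign(dx)), int(np.sign(dy))
    ax, ay = abs(dx), abs(dy)
    # the four neighbours of equation (2); np.roll wraps circularly, and the
    # wrapped border strip is overwritten with the reference image below
    i1_x  = np.roll(i1, sx, axis=1)              # I1(x - sign(dx), y)
    i1_y  = np.roll(i1, sy, axis=0)              # I1(x, y - sign(dy))
    i1_xy = np.roll(i1, (sy, sx), axis=(0, 1))   # I1(x - sign(dx), y - sign(dy))
    i2 = ((1 - ax) * (1 - ay) * i1
          + ax * (1 - ay) * i1_x
          + (1 - ax) * ay * i1_y
          + ax * ay * i1_xy)
    # unknown border area: take it identically from the reference image i0
    if sx > 0:   i2[:, 0]  = i0[:, 0]
    elif sx < 0: i2[:, -1] = i0[:, -1]
    if sy > 0:   i2[0, :]  = i0[0, :]
    elif sy < 0: i2[-1, :] = i0[-1, :]
    return i2

# usage: shifting a horizontal ramp by a quarter pixel interpolates
# between each pixel and its left neighbour
i0 = np.tile(np.arange(8.0), (8, 1))
i2 = register_bilinear(i0, i0, 0.25, 0.0)
```

On the ramp, each interior pixel becomes 0.75·x + 0.25·(x − 1) = x − 0.25, and the first column is copied unchanged from the reference.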

4.1. Accurate Case of an Analytical Function

With the purpose of correctly evaluating both phase correlation registration methods, we decided to appraise them on two images designed with the analytical function (3). Their size is 256 × 256 pixels². We chose two Gaussian functions, one centred on (73, 88) with standard deviation (SD) σ1 = 30, and the other centred on (173, 183) with SD σ2 = 23:

    I(x, y) = (1/(2πσ1²)) e^(−((x − 73)² + (y − 88)²)/(2σ1²))
            + (1/(2πσ2²)) e^(−((x − 173)² + (y − 183)²)/(2σ2²)).    (3)

In order to build the second image, which is the reference one translated by (dx, dy), we use (3) to calculate I(x − dx, y − dy). The reference image can be seen in Fig. 2 (a). We compute all possible subpixel displacements between 1 and 2, i.e. (dx, dy) ∈ [1; 2[², using a 0.05 pixel step to perform this evaluation. So there are 400 cases, and each time we estimated the value of the translatory move.

It is important to mention one practical issue we met with the numerical image construction. At first, we neglected the integrating behaviour of each charge-coupled device (CCD) camera sensor (hereafter, we do not distinguish between a CCD sensor and a pixel): we were taking the value of the function at the middle of each pixel, instead of using the following integration model (see [12] for more details):

    s[i, j] = ∫_{i−0.5}^{i+0.5} ∫_{j−0.5}^{j+0.5} h0(u, v) du dv

where s[i, j] is the percentage of light intensity at pixel (i, j), and h0 is the optical point-spread function. To account for this aspect, we introduced the size of a pixel side. Subdividing this size by an odd number q (3, 5, 7, 9 and 11), we get a regular grid with q² squares. Then, we only need to compute the mean of these q² values to obtain an accurate value of the function at pixel (i, j). The normalized pixel values were finally included between 0 and 55. The difference between the image built with the function value at the middle of each pixel and the one built with the integration model (with q = 11) goes from −0.65 to 0.81 (i.e. a 2.65 % variation of the values), and its mean value is 3.89 × 10⁻⁴.

Table 1 sums up the results obtained; x̄ and max x denote the mean error and the maximum error committed on the dx estimations (likewise ȳ and max y for dy).

                                    Method 1                 Method 2 (gradient)
                              x̄     ȳ   max x  max y      x̄     ȳ   max x  max y
Value in the middle
of pixels                   3.36  2.93   15.5   12.4     3.17  3.00   12.9   13.2
Integration (q = 3)         2.77  1.55    8.8    4.1     2.47  2.22    6.5    5.5
Integration (q = 5)         1.05  0.43    5.0    2.6     0.76  0.64    2.7    1.8
Integration (q = 7)         0.86  0.78    4.1    2.4     0.57  0.47    1.6    1.4
Integration (q = 9)         0.78  0.92    3.7    2.5     0.52  0.45    1.7    1.3
Integration (q = 11)        0.73  1.06    3.5    2.4     0.50  0.48    1.8    1.5

Table 1. Mean value and maximum of the errors for both methods (on 400 estimations, expressed in %)
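The q × q sub-sampling used to approximate the sensor integration can be sketched as follows. The function names are ours, and the paper's final rescaling of the pixel values to [0, 55] is omitted.

```python
import numpy as np

def gaussian_pair(x, y, s1=30.0, s2=23.0):
    """Analytical test function (3): two normalized 2-D Gaussians."""
    g1 = np.exp(-((x - 73.0)**2 + (y - 88.0)**2) / (2 * s1**2)) / (2 * np.pi * s1**2)
    g2 = np.exp(-((x - 173.0)**2 + (y - 183.0)**2) / (2 * s2**2)) / (2 * np.pi * s2**2)
    return g1 + g2

def render(n=256, q=11, dx=0.0, dy=0.0):
    """Sample (3), shifted by (dx, dy), on an n x n grid, averaging q*q
    sub-samples per pixel to mimic the integrating CCD sensor.
    q = 1 reduces to the naive middle-of-pixel sampling."""
    off = (np.arange(q) + 0.5) / q - 0.5     # sub-sample offsets in [-0.5, 0.5[
    y, x = np.mgrid[0:n, 0:n]
    img = np.zeros((n, n))
    for oy in off:
        for ox in off:
            img += gaussian_pair(x + ox - dx, y + oy - dy)
    return img / (q * q)

# usage: the integrated image differs slightly from middle-of-pixel sampling
img_int = render(q=3)
img_mid = render(q=1)
```

Averaging over the pixel area slightly smooths the Gaussians, which is exactly the small but systematic difference the evaluation above shows to matter.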

We notice a ratio of 6 between the errors depending on the way the image was built, in favor of the integration model vs. the middle-point value. We also observe improving results when increasing q. Finally, we remark that the method using gradients of the image almost always gives a more accurate estimation than the classical method using the plain images. With regard to the best case, the maximum error committed is 2.4 % with the first method, and 1.3 % with the second one. If we look at the example of the translatory move (1.35, 1.85), for images obtained by integration with q = 11, the first method computes a move of (1.342, 1.857) and the second gives (1.340, 1.847). These are errors of 0.8 % and 0.7 % for the first method, and 1.0 % and 0.3 % for the second.

Fig. 3 (a) shows the perfect grid of the 400 move vector values used to evaluate the registration methods. Each move vector represents a translation such that (dx, dy) ∈ [1; 2[² with a 0.05 pixel step. The grids (b) and (c) depict the results obtained when computing the shift. In figure (b) the shifted image is built using the middle-point value of function (3) and we estimate the translation using the gradient method. Using the integration model (with q = 11) leads to the far better grid of figure (c). The distortion of figure (b) clearly outlines the importance of the integration model.

Fig. 3. Initial constellation (a), constellation obtained after movement estimation (b) and (c)

Finally, Fig. 4 shows the error committed by the gradient method during the dx estimation, in the case of construction by integration with q = 11. To each value of dx correspond 20 estimated values, and for each estimation we show the difference between the real and computed values. This figure reveals a dependency of the error ∆x = |d̂x − dx| on the real dx value, which is due to the non-linear relation between the observations and the translation parameter dx. This error is greater when the subpixel move dx is between 0.35 and 0.45 modulo one pixel.

Fig. 4. Error on the dx estimation

To end with these first simulations, Fig. 2 (b) and (c) give an idea of the registration quality. The initial translatory motion between both images was (1.350, 1.850), the second method estimated (1.340, 1.847), and we used equation (2) to register the shifted image. Let ∆ be the difference image (shown in Fig. 2 (b)) between the reference image and the moved image. We define

    α = Σ_i Σ_j ∆²(i, j) / Σ_i Σ_j I0²(i, j)

as a relative error measure. We find: ∆ ∈ [−3.36, 3.36], ∆mean = 2.04 × 10⁻³, ∆SD = 0.75 and α = 4.49 × 10⁻³. The same estimations computed on the difference (shown in Fig. 2 (c)) between the reference image and the registered shifted image give: ∆ ∈ [−0.27, 0.82], ∆mean = 1.36 × 10⁻⁴, ∆SD = 5.2 × 10⁻² and α = 2.15 × 10⁻⁵.

4.2. Real Unknown Data

The second simulation presented here allows us to show another advantage of the method using image gradients. We took the first and the last image of a 350-image video sequence showing clouds in the sky. The camera was fixed, and the size of the images is 156 × 205 pixels². We did not know the real shift between both images, but we could see the clouds had moved by a few pixels. As said in the introduction, the FT can be efficiently implemented using the FFT algorithm. One consequence is that images are truncated to sizes that are powers of 2; so, before any computation, these sky images are resized to 128 × 128 pixels. Fig. 5 (a) and (b) show both images used in this simulation; (c) and (d) are the norms of their complex gradient images. In this case, the first method detects the translational movement (−0.07, −0.12), whereas the method using image gradients estimates (9.51, 2.48). The observable results of Fig. 6 let us conclude that the real shift must be very close to (9.51, 2.48). In this figure, (a) is the reference image, (b) is the second image after registration, (c) is the difference between the reference and the unregistered second image, and (d) is the difference between the reference and the registered second image. For image (c) we found ∆mean = 0.82, ∆SD = 15.5 and α = 3.26 × 10⁻². Image (d) gives ∆mean = 0.32, ∆SD = 3.6 and α = 1.79 × 10⁻³.
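The ∆ statistics and the relative error measure α used in these evaluations are a direct transcription of the definitions above; only the helper name is ours.

```python
import numpy as np

def error_stats(i0, i_reg):
    """Difference-image statistics and the relative error measure
    alpha = sum(delta^2) / sum(I0^2), with delta = I0 - I_reg."""
    delta = i0 - i_reg
    alpha = float((delta**2).sum() / (i0**2).sum())
    return float(delta.mean()), float(delta.std()), alpha

# usage on a toy pair of constant images: delta is 1 everywhere, so the
# mean is 1, the SD is 0, and alpha = 16 * 1 / (16 * 4) = 0.25
d_mean, d_sd, alpha = error_stats(np.full((4, 4), 2.0), np.full((4, 4), 1.0))
```

A perfectly registered pair gives (0, 0, 0); smaller α means a better registration.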

Fig. 5. (a) and (b) are the 1st and the 350th image of a video sequence showing the sky; (c) and (d) are the norms of their gradients (not at the same scale)
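The complex gradient images whose norms appear in Fig. 5 (c) and (d) follow the central differences of section 2.3; the circular border treatment below is our own assumption, not necessarily the authors'.

```python
import numpy as np

def complex_gradient(img):
    """G = Gh + j*Gv with Gh = I(x+1, y) - I(x-1, y) and
    Gv = I(x, y+1) - I(x, y-1) (borders wrapped via np.roll)."""
    gh = np.roll(img, -1, axis=1) - np.roll(img, 1, axis=1)
    gv = np.roll(img, -1, axis=0) - np.roll(img, 1, axis=0)
    return gh + 1j * gv

# usage: on a horizontal ramp the horizontal gradient is 2 in the interior
# and the vertical gradient is 0; np.abs(g) is the norm shown in Fig. 5
ramp = np.tile(np.arange(8.0), (8, 1))
g = complex_gradient(ramp)
```

For the gradient-based method, G1 and G2 simply replace I1 and I2 in the phase correlation estimator; the sharpened contours are what improve the correlation peak.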

Fig. 6. 1st and reference image (a), 350th image registered on the reference image (b), initial difference (c) and difference after registration (d)

5. CONCLUSION

In this work, we first proposed and evaluated two frequency-domain subpixel motion estimation methods based on the phase correlation principle. We then focused on the case of a translational shift, established a registration equation in the case of a bilinear approximation (2), and proposed a better way to deal with the unknown area of an image registration process: the conclusion is to take this area identically from the reference image.

In a first numerical experiment with a known analytical function as the reference image (Subsection 4.1), we evaluated the subpixel accuracy of these methods. In particular, we showed that with a good image formation model, the error on the result is less than 4 % using the first method, and less than 2 % using the gradient method. In the second numerical experiment (Subsection 4.2), we showed the advantage brought by the use of image gradients. Indeed, the gradient method also works on images that contain faint and badly delineated areas, for which the first method does not function.

Finally, we implemented the proposed methods using the FFT, which makes the process fast: it needs less than one second to register two images of 220 × 320 pixels² on an Intel Celeron processor at 500 MHz. At last, we used these methods with the intention of estimating a global translatory motion between two similar images. Nevertheless, they can obviously also be used locally, on sections of the image: we can apply the process on small pieces of the images, preferably choosing side lengths that are powers of 2 to be compatible with the FFT.

6. REFERENCES

[1] L. G. Brown, "A Survey of Image Registration Techniques," ACM Computing Surveys, vol. 24, no. 4, pp. 325–376, December 1992.

[2] F. Dufaux and F. Moscheni, "Motion Estimation Techniques for Digital TV: A Review and a New Contribution," Proceedings of the IEEE, vol. 83, no. 6, pp. 858–876, June 1995.

[3] J. Barron, D. Fleet, and S. Beauchemin, "Performance of Optical Flow Techniques," Int. J. Computer Vision, vol. 12, no. 1, pp. 43–77, 1994.

[4] H. Musmann, P. Pirsch, and H. Grallert, "Advances in Picture Coding," Proceedings of the IEEE, vol. 73, pp. 523–548, April 1985.

[5] J. Jain and A. Jain, "Displacement Measurement and its Application in Interframe Image Coding," IEEE Trans. Commun., vol. COM-29, pp. 1799–1808, December 1981.

[6] D. Fleet and A. Jepson, "Computation of Component Image Velocity from Local Phase Information," Int. J. Computer Vision, vol. 5, pp. 77–104, 1990.

[7] F. Cao and B. Collin, "Region Tracking Using Level Set Registration," in Aerosense, Acquisition, Tracking, and Pointing, SPIE, 2001.

[8] F. Humblot, "Recalage sub-pixellique des images par la méthode de corrélation de phase," Tech. Rep., DGA, http://fabricehumblot.free.fr/publis/2004 02 RapTech Recalage.pdf, February 2004.

[9] H. Foroosh, J. Zerubia, and M. Berthod, "Extension of Phase Correlation to Subpixel Registration," IEEE Trans. Image Processing, vol. 11, no. 3, pp. 188–200, 2002.

[10] V. Argyriou and T. Vlachos, "Sub-Pixel Motion Estimation Using Gradient Cross-Correlation," in The 7th International Symposium on Signal Processing and its Applications (ISSPA), Paris, 1–4 July 2003.

[11] C. D. Kuglin and D. C. Hines, "The Phase Correlation Image Alignment Method," in Proc. IEEE 1975 Int. Conf. Cybernetics and Society, September 1975, pp. 163–165.

[12] V. Samson, F. Champagnat, and J. F. Giovanelli, "Point Target Detection and Subpixel Position Estimation in Optical Imagery," Applied Optics, vol. 43, no. 2, pp. 257–263, January 2004.