Proceedings of Acivs 2004 (Advanced Concepts for Intelligent Vision Systems), Brussels, Belgium, Aug. 31-Sept. 3, 2004

FUNDAMENTAL MATRIX ESTIMATION REVISITED THROUGH A GLOBAL 3D RECONSTRUCTION FRAMEWORK

1Benjamin Albouy, 1Sylvie Treuillet, 1Yves Lucas and 2Dimitar Birov

1 University of Orleans, Vision and Robotics Laboratory, 63 av. De Lattre de Tassigny, Bourges, France
2 University of Sofia, Faculty of Mathematics and Computer Science, Bulgaria
[email protected]

ABSTRACT

This work presents a complete 3D world reconstruction chain starting from point correspondences on two uncalibrated images taken from very different points of view. Five popular algorithms for estimating the fundamental matrix, selected from three groups (linear, iterative and robust), have been tested. After this simple self-calibration step, a triangulation algorithm is applied to reconstruct the scene. The estimation methods are evaluated by measuring the error between the reconstructed scene and the synthetic one. Finally, a new method to estimate the fundamental matrix that takes advantage of both linear and robust methods is presented.

1. INTRODUCTION

The apprehension of the third dimension from 2D images is a difficult problem in computer vision. It is based on several images taken from different points of view, which allow the computation of 3D positions through a triangulation algorithm. Moreover, the system needs to be calibrated to reconstruct the world in a Euclidean space. A system is calibrated when the pose and the intrinsic parameters of the different cameras are known. When the system is uncalibrated, the world can only be reconstructed in a projective space, that is to say there exists a projective transformation between the reconstructed world and the real one. Though this is sufficient in a robotic context, it cannot provide a realistic display for a user. Faugeras has defined a stratification of space [1] according to the degree of calibration of the system. This concept has been widely used by other scientists (Pollefeys [2] for example). With increasing levels of calibration, the reconstructed world can be expressed in the projective, affine, metric and finally Euclidean strata. So the aim of calibration is to reach a higher stratum from a lower one.

Two kinds of stereovision systems exist, leading to two ways of calibration. In the first one, the cameras are rigidly fixed to each other, so no camera can move relative to another. The simplest way to calibrate such a system is to use a test pattern, which must be as accurate as possible to provide a reliable reconstruction. Moreover, this pattern needs to be suited to the observed scene [3]. The main drawback of these systems and methods is the lack of flexibility due to specific equipment. A second approach consists in using a single camera and taking several views by moving it in space. In practice, the previous calibration method is then no longer possible. As the camera positions are unknown, a self-calibration step is necessary: the camera parameters are computed only from correspondences between the different images of the sequence. The first step of all self-calibration methods is to compute the epipolar geometry. This concept was first published by Longuet-Higgins in 1981 [4] and has initiated all the 3D reconstruction methods. The epipolar geometry is defined by the fundamental matrix, which has been widely studied since Longuet-Higgins' paper. For instance, Hartley published a method to find the extrinsic parameters of cameras in 1992 [5], and many works based on the Kruppa equations succeed in computing the intrinsic parameters of cameras (Faugeras et al. [6], Lourakis and Deriche [7], Ma et al. [8]). Self-calibration is not possible in some singular camera configurations, as shown by Sturm [9].

The aim of this work is to propose a user framework to select the method best suited for 3D reconstruction, starting with point correspondences on two uncalibrated images, when two very different points of view are considered for best triangulation accuracy. The size of the observed object is such that it just fits the image plane. These two assumptions are easily verified when handling a digital camera, as the centering is done manually. In this context, a self-calibration algorithm is required and, since it derives from the fundamental matrix, the estimation of this matrix is the critical step. In a recent work, Armangué and Salvi [10] surveyed the most popular techniques to achieve it. They distinguish three kinds of methods: linear, iterative and robust ones. Each method has been evaluated by measuring the average distance between the projected points and the associated epipolar lines, but this measurement does not reflect the accuracy of the final 3D world reconstruction. In our work, five techniques for estimating the fundamental matrix are investigated, which are representative of the three classes pointed out by Armangué and Salvi. They are analyzed according to the final reconstruction error brought by the fundamental matrix. To discriminate between techniques, robustness to noise added on the projected points and to bad correspondences between images is tested on synthetic data. This paper is organized as follows. After a short description of epipolar geometry, the 3D reconstruction chain and the fundamental matrix estimation methods are presented, followed by the experimental results. Finally, a discussion of these results ends the paper.

2. EPIPOLAR GEOMETRY

Figure 1. Epipolar geometry.

The epipolar geometry is directly linked to the positions of the camera viewpoints. By using the classical pinhole model, a point M of the 3D space is projected on m with the first camera and on m' with the second (Figure 1). The camera optical centers are placed at O and O', and their respective image planes are I and I'. Using homogeneous coordinates, M is defined by the vector (x, y, z, 1)^T where (x, y, z) is its position in Euclidean space. Its projections on the camera planes are computed as follows:

s \cdot m = P \cdot M \quad \text{and} \quad s' \cdot m' = P' \cdot M    (1)

where s and s' are two scale factors such that m = (u, v, 1)^T and m' = (u', v', 1)^T, with u and v referring to image plane coordinates. P and P' are the 3x4 projection matrices associated with cameras 1 and 2. As a simplification, the origin of the world is chosen at camera 1 (point O). The x and y axes are chosen parallel to the image plane, the x-axis from left to right and the y-axis from top to bottom of the image. The third axis, z, points toward the image plane and is perpendicular to it; it is also the optical axis of the camera. The P and P' matrices contain all the information about the cameras. They can be decomposed as follows:

P = A \cdot [\,\mathrm{Id}_{3\times3} \mid 0\,] \cdot T \quad \text{and} \quad P' = A' \cdot [\,\mathrm{Id}_{3\times3} \mid 0\,] \cdot T'    (2)

where A and A' are 3x3 matrices that encapsulate the intrinsic parameters of cameras 1 and 2, which transform world coordinates (millimeters for example in (x, y) coordinates) into image coordinates (pixels in (u, v) coordinates). T and T' represent the extrinsic parameters of the two cameras and are the 4x4 rigid transformation matrices that place the cameras at the origin of the space. Since camera 1 is the origin of the world, T is the 4x4 identity matrix. From this point on, T' is noted T and is the transformation from camera 2 to camera 1. The matrix Id3x3 is the 3x3 identity matrix. For each camera, the A-matrix can be written as follows:

A = \begin{pmatrix} \alpha_u & -\alpha_u \cot\theta & c_u \\ 0 & \alpha_v / \sin\theta & c_v \\ 0 & 0 & 1 \end{pmatrix}    (3)

where \alpha_u = f \cdot k_u and \alpha_v = f \cdot k_v, f is the focal length expressed in millimeters and ku and kv are respectively the sensor resolutions in the horizontal and vertical directions. The values ku and kv are expressed in pixels/mm and θ is the skew angle between the two axes u and v. Finally, cu and cv are the pixel coordinates of the principal point in the image, i.e. the intersection of the optical axis of the camera with the image plane. The rigid transformation described by T is the translation by vector t that brings camera 2 to the origin, followed by the rotation R that aligns the camera 1 and 2 axes. So the matrix T is computed as:

T = \begin{pmatrix} R & 0 \\ 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} \mathrm{Id}_{3\times3} & t \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} R & R \cdot t \\ 0 & 1 \end{pmatrix}    (4)
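To make equations (1)-(4) concrete, the following short sketch builds the intrinsic matrix A, the rigid transform T and the projection matrix P = A·[Id|0]·T, then projects a homogeneous 3D point. It is written in Python/NumPy (an assumption, since the paper does not prescribe an implementation language here) and all numerical values are arbitrary illustration values, not the paper's configuration.

```python
import numpy as np

def intrinsic_matrix(alpha_u, alpha_v, theta, cu, cv):
    """Intrinsic matrix A of equation (3)."""
    return np.array([[alpha_u, -alpha_u / np.tan(theta), cu],
                     [0.0, alpha_v / np.sin(theta), cv],
                     [0.0, 0.0, 1.0]])

def rigid_transform(R, t):
    """4x4 matrix T of equation (4): translation by t, then rotation R."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = R @ t
    return T

# Arbitrary illustration values (not the paper's synthetic setup).
A = intrinsic_matrix(800.0, 800.0, np.pi / 2, 320.0, 240.0)
R = np.eye(3)                      # camera axes aligned with the world axes
t = np.array([-1.0, 0.0, 0.0])     # translation bringing the camera to the origin

# Projection matrix of equation (2): P = A . [Id3x3 | 0] . T
P = A @ np.hstack([np.eye(3), np.zeros((3, 1))]) @ rigid_transform(R, t)

# Equation (1): s.m = P.M for a homogeneous 3D point M = (x, y, z, 1).
M = np.array([0.2, -0.1, 5.0, 1.0])
s_m = P @ M
m = s_m / s_m[2]                   # pixel coordinates (u, v, 1)
print(m)
```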

The three points M, O and O' define a plane Π that intersects the two image planes I and I' along the epipolar lines l and l'. As the two points O and O' are fixed, all the epipolar lines intersect at a single point in each image plane, called the epipole. These points e and e' are also the intersections of the line (OO') with the two image planes. All this epipolar geometry is summarized by the so-called fundamental matrix, defined by the following equation:


m'^T \cdot F \cdot m = 0    (5)

The rank of this matrix is 2. The elements of the vector u' = F·m are the coefficients of the equation of l', and reciprocally u = m'^T·F is associated with l. So equation (5) gives access to the distance from m to l and from m' to l'. This distance should be null with respect to the epipolar geometry. The fundamental matrix relating the camera frames can be written as follows:

F = A_2^{-T} \cdot R \cdot [t]_\times \cdot A_1^{-1}    (6)

with the following notations:

[t]_\times = \begin{pmatrix} 0 & -t_3 & t_2 \\ t_3 & 0 & -t_1 \\ -t_2 & t_1 & 0 \end{pmatrix}    (7)

A^{-T} = (A^{-1})^T = (A^T)^{-1}    (8)

Equation (6) verifies the constraint of equation (5). This can easily be demonstrated by injecting equations (1), (2) and (6) into equation (5).

3. THE RECONSTRUCTION CHAIN

The reconstruction chain is deduced from the last section and can be divided into four steps, as shown in Figure 2. From point correspondences, the fundamental matrix is first estimated. Then the camera parameters are extracted, starting with the intrinsic ones thanks to a self-calibration algorithm. From the A-matrices, it is possible to compute the essential matrix, which is the calibrated version of the fundamental matrix and is expressed as follows:

E = A_2^T \cdot F \cdot A_1 = R \cdot [t]_\times    (9)

Then the extrinsic parameters are computed using a decomposition of the essential matrix. Once all these parameters have been found, a triangulation method is applied to reconstruct the 3D scene from the matched points in the images.

Figure 2. 3D reconstruction chain: point correspondences (m ↔ m'), fundamental matrix estimation (F), intrinsic parameters estimation (A1, A2), extrinsic parameters estimation (R, t), triangulation (M).

3.1. Estimation of the fundamental matrix

According to the survey of Armangué and Salvi [10], many methods exist to estimate the fundamental matrix from point correspondences. They have tested 19 methods among the most popular ones. The differences between methods lie mainly in: the criterion to minimize, the weighting of points, the number of points, the constraints on F and the image coordinate system. In [10] a classification of methods into three categories is proposed, namely linear, iterative and robust methods. The linear ones are the simplest and most direct. Iterative methods are more accurate thanks to the weighting of the points, and robust methods detect outliers in the set of points. In this work, we have selected five methods representing the three groups. The group of linear methods is represented by the classical 8-point algorithm from Longuet-Higgins adapted to more points by Hartley [11], the orthogonal least squares technique [12] and the adaptation of the Bookstein method by Torr [13]. The group of iterative methods is represented by the iteratively re-weighted least squares method proposed by Torr and Murray [12]. Finally, the robust MAPSAC method of Torr [14] is tested. Two key points are common to all tested algorithms. Firstly, the normalization of image coordinates proposed by Hartley [11] is applied. Hartley has shown that normalization may improve the accuracy of fundamental matrix estimation algorithms. Normalization consists in translating all coordinates so that the centroid of the points coincides with the origin of the image, then scaling all coordinates so that the mean distance of the points from the origin is equal to √2. Thanks to normalization, the resulting fundamental matrix does not depend on the coordinate system used in the images. Secondly, at the end of each algorithm, the fundamental matrix F is replaced by its closest rank-2 matrix by using the Singular Value Decomposition (SVD), as done by Hartley [11]. This method is as follows: let F = UDV^T be the SVD of F. D is then a 3x3 diagonal matrix of the singular values of F, which can be noted D = diag(r, s, t). The matrix F' = UD'V^T where D' = diag(r, s, 0) is the closest rank-2 matrix to F under the Frobenius norm. If F is not a rank-2 matrix, the epipolar lines do not intersect at one point, and so the epipoles are undetermined. This must be avoided in self-calibration algorithms.
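The two common steps just described (Hartley normalization and the rank-2 enforcement through the SVD) can be sketched together with a basic linear fit. The snippet below is a minimal Python/NumPy reading of the normalized 8-point algorithm, given only as an illustration of the procedure described above, not as the authors' Matlab implementation.

```python
import numpy as np

def normalize_points(pts):
    """Hartley normalization: translate the centroid to the origin and scale
    so that the mean distance to the origin is sqrt(2).
    pts: (N, 2) pixel coordinates. Returns homogeneous normalized points and
    the 3x3 transform T such that pts_norm = T . pts."""
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    scale = np.sqrt(2) / mean_dist
    T = np.array([[scale, 0, -scale * centroid[0]],
                  [0, scale, -scale * centroid[1]],
                  [0, 0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T

def eight_point(pts1, pts2):
    """Normalized 8-point estimate of F from N >= 8 correspondences,
    with the rank-2 constraint enforced through the SVD."""
    x1, T1 = normalize_points(pts1)
    x2, T2 = normalize_points(pts2)
    # Each correspondence gives one row of the linear system A.f = 0,
    # obtained by expanding m'^T F m = 0 (equation (5)).
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # least-squares solution
    U, D, Vt = np.linalg.svd(F)
    D[2] = 0.0                        # closest rank-2 matrix (Frobenius norm)
    F = U @ np.diag(D) @ Vt
    F = T2.T @ F @ T1                 # undo the normalization
    return F / np.linalg.norm(F)
```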

187

Proceedings of Acivs 2004 (Advanced Concepts for Intelligent Vision Systems), Brussels, Belgium, Aug. 31-Sept. 3, 2004

Except for the 8-point algorithm, all the tested techniques come from the structure and motion Matlab toolkit developed by Torr, which is freely available on his web site (http://wwwcms.brookes.ac.uk/~philiptorr/). The 8-point algorithm has been implemented from the information provided in Hartley's paper [11].
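Before moving to the calibration steps, equation (6) can also be checked numerically: with known intrinsics and pose, the fundamental matrix built from A, R and t must satisfy the epipolar constraint (5) for any 3D point. The values below are arbitrary illustration values (not the synthetic setup used later in the paper), and the snippet is only a sanity-check sketch.

```python
import numpy as np

def skew(t):
    """[t]x cross-product matrix of equation (7)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

# Arbitrary test configuration: same intrinsics A for both cameras,
# rotation R and translation t of the second camera.
A = np.array([[500.0, 0, 320], [0, 500, 240], [0, 0, 1]])
angle = np.deg2rad(10)
R = np.array([[np.cos(angle), 0, np.sin(angle)],
              [0, 1, 0],
              [-np.sin(angle), 0, np.cos(angle)]])
t = np.array([-1.0, 0.2, -0.1])

# Projection matrices of equation (2): camera 1 at the world origin,
# camera 2 given by the rigid transform [R | R.t].
P1 = A @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = A @ np.hstack([R, (R @ t).reshape(3, 1)])

# Fundamental matrix of equation (6).
F = np.linalg.inv(A).T @ R @ skew(t) @ np.linalg.inv(A)

# Project a 3D point with both cameras and check equation (5).
M = np.array([0.5, -0.3, 4.0, 1.0])
m1 = P1 @ M; m1 /= m1[2]
m2 = P2 @ M; m2 /= m2[2]
print(m2 @ F @ m1)   # should be ~0 up to numerical precision
```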

3.2. Estimation of the camera intrinsic parameters

The intrinsic parameters are estimated using a self-calibration method. Sturm has shown that self-calibration is only possible under certain conditions and has pointed out all the cases that lead to an unsolvable problem [9]. Moreover, to simplify the problem and to make the existence of a unique solution possible, we have to make some assumptions on the intrinsic parameters. First of all, we choose to use the same camera model with the same focal length for all images, so all the virtual cameras corresponding to each image have the same matrix A as in equation (3). Then, considering cameras with square pixels, it follows that θ = π/2 and that ku = kv, thus αu = αv = α. Finally, one can suppose that the principal point is the center of the image, that is to say that cu and cv are known. With these assumptions, the matrix A becomes:

A = \begin{pmatrix} \alpha & 0 & c_u \\ 0 & \alpha & c_v \\ 0 & 0 & 1 \end{pmatrix}    (10)

The problem of estimating the intrinsic parameters is now reduced to the determination of α. It is possible to compute α from the fundamental matrix thanks to the Kruppa equations ([7], [8] and [15]). All these methods are very similar, so we chose to follow the one explained by Sturm [15], which was the simplest to implement and best matches our needs. According to Sturm's paper, this method is robust to noise.

3.3. Estimation of extrinsic parameters

Once the fundamental matrix and the intrinsic parameters are known, we can compute the essential matrix according to equation (9). The goal is now to decompose E to obtain R and t, which respectively represent the rotation and the translation between the two cameras, with camera 1 as origin. Hartley proposed a method to decompose E [5], but this decomposition is not unique. There are two solutions for the rotation matrix, R1 and R2, and also two for the translation vector, t1 and t2, thus giving four possible decompositions. To select the valid solution, the 3D scene must be reconstructed using each decomposition to identify the realistic one.

It has to be noticed that since F is defined up to a scale factor, the essential matrix E computed from F is also defined up to a scale factor. So determining the norm of t1 and t2 does not make sense; instead, an overall scale factor must be determined for a Euclidean reconstruction. It may easily be determined by knowing one distance in the real world. We can observe that the overall scale factor is equal to the real distance between the two cameras, since the norm of t is equal to one. To choose the right (R, t) couple, the 3D scene must be reconstructed for each one using the method explained in the next section. The couple that gives the greatest number of points in front of the two cameras and such that the reprojected points fall inside the images is selected. To determine the position of a 3D point with respect to the camera plane, one simply computes the Euclidean scalar product between the vector from the camera to the 3D point and the viewing direction of the camera. If the result is positive, the point is in front of the camera, otherwise it is behind.

3.4. Triangulation

Once all the parameters are known, the stereovision system is calibrated, so it is possible to reconstruct the 3D scene using a classical triangulation method. To retrieve the position of a 3D point M, the image points m and m' are backprojected along two lines issued from O and O'. These two lines intersect at M. Unfortunately, the two lines do not intersect in noisy images. Setting M as the middle of the shortest segment that connects the two lines is the best choice, because the self-calibration process enables a metric reconstruction [16]. As the fundamental matrix is only defined up to a scale factor, it is necessary to scale the reconstructed data to fit the real world. To estimate this scale factor, one distance must be known on the real object, in order to compute the ratio between the reconstructed distance and the real one.

4. EXPERIMENTAL RESULTS

This section presents the results obtained with the five tested methods for 3D reconstruction evaluation. The experiments are carried out on synthetic data. The 3D scene is a set of 100 randomly generated points in a box. The box is defined by -5 < x < 5, -5 < y < 5 and 10 < z < 20 in arbitrary space coordinates. Figure 3 shows a perspective view of it with the positions of the two cameras.
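To make the protocol concrete, the following sketch generates such a synthetic scene and its two projections; it is a plausible Python/NumPy reading of the setup (the authors worked in Matlab), using the box bounds given above and the camera parameters reported in equations (11)-(13) below. The random seed and the noise level are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 points drawn uniformly in the box -5 < x, y < 5, 10 < z < 20.
scene = np.column_stack([rng.uniform(-5, 5, 100),
                         rng.uniform(-5, 5, 100),
                         rng.uniform(10, 20, 100)])

# Camera parameters as reported in equations (11)-(13) below.
A = np.array([[600.0, 0, 320], [0, 600, 240], [0, 0, 1]])
R = np.array([[0.44721, 0, 0.89443],
              [0, 1, 0],
              [-0.89443, 0, 0.44721]])
t = np.array([-20.0, 0, -5.0])

P1 = A @ np.hstack([np.eye(3), np.zeros((3, 1))])   # camera 1 at the origin
P2 = A @ np.hstack([R, (R @ t).reshape(3, 1)])      # camera 2, equation (2)

def project(P, X, sigma=0.0):
    """Project 3D points X (N, 3) and optionally add Gaussian pixel noise."""
    Xh = np.column_stack([X, np.ones(len(X))])
    m = (P @ Xh.T).T
    m = m[:, :2] / m[:, 2:3]
    return m + rng.normal(0, sigma, m.shape)

# Noisy projections (sigma between 0 and 2 pixels in the experiments).
pts1 = project(P1, scene, sigma=1.0)
pts2 = project(P2, scene, sigma=1.0)
```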


Figure 3. Representation of the 3D scene (bounding box of points, with camera 1 at the origin and camera 2 to the side).

The first camera is at the origin of the space coordinates and is viewing along increasing z. The second camera is at position (20, 0, 5) and is looking toward the point (0, 0, 15), which is the center of the box. The up direction of the second camera is the same as that of the first one (y axis). The camera resolution is a standard 640x480 pixels and the focal length of both cameras is α = 600 pixels, a realistic value for this image resolution. This whole setup is consistent with the use of a hand-held camera to take the pictures. Figure 4 shows the projection of the bounding box on each camera. These experimental data lead to the following matrices, according to the definitions given in section 2:

A = \begin{pmatrix} 600 & 0 & 320 \\ 0 & 600 & 240 \\ 0 & 0 & 1 \end{pmatrix}    (11)

R = \begin{pmatrix} 0.44721 & 0 & 0.89443 \\ 0 & 1 & 0 \\ -0.89443 & 0 & 0.44721 \end{pmatrix}    (12)

t = \begin{pmatrix} -20 \\ 0 \\ -5 \end{pmatrix}    (13)

Figure 4. Projection of the bounding box on each camera. Projection on camera 1 (a) and projection on camera 2 (b).

4.1. Robustness against noise

Once the points have been projected on the two images, Gaussian noise is added to their positions. The standard deviation of the Gaussian noise varies from 0 to 2 pixels, which corresponds to a maximal relative displacement of 6 pixels for 98% of the points. To obtain statistical evidence, the measurements are averaged over 100 realizations of randomly generated 3D points. Robustness is evaluated using two indexes, which are the relative error on the focal length and the average distance between the reconstructed points and the real ones (Figure 5). The error on the focal length is expressed in percent using the following equation:

\mathrm{err}_{\%\alpha} = \frac{\hat{\alpha} - \alpha}{\alpha}    (14)

where \hat{\alpha} is the estimate of α. To show meaningful errors, the average distance between reconstructed and ideal points is reported without dimension: distances are divided by the greatest distance of the real object. In our experiments, the object is the bounding box, so the greatest distance is its diagonal, which is 10√3. These errors are thus measured as a percentage of the greatest distance included in the object to reconstruct. The expression of the error on one point is thus:

\mathrm{err}_p = \frac{\| \hat{p} - p \|}{10\sqrt{3}}    (15)

where \hat{p} is the reconstructed point of the real point p. As shown in section 3, a scale factor must be applied to the reconstructed point coordinates to enable comparisons with the real world. As the measured errors are distances between the real world and the reconstructed one, this scale factor is needed. As remarked in section 3.3, the scale factor is the real distance between the two cameras. It is thus equal to 20.62, which is the norm of t.
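The two error indexes can be written compactly. The sketch below is a minimal Python/NumPy transcription of equations (14) and (15), under the assumption that the reconstructed cloud is expressed with a unit camera baseline, so that the scale factor 20.62 discussed above applies directly; function names and defaults are ours.

```python
import numpy as np

def focal_error(alpha_hat, alpha=600.0):
    """Relative focal length error of equation (14), in percent."""
    return 100.0 * (alpha_hat - alpha) / alpha

def reconstruction_error(P_hat, P, baseline=20.62, diagonal=10 * np.sqrt(3)):
    """Average point error of equation (15), in percent of the largest object
    dimension. P_hat: reconstructed points (N, 3) with unit camera baseline,
    P: true points (N, 3). The cloud is first rescaled by the real baseline,
    the overall scale factor discussed in section 3.3."""
    dists = np.linalg.norm(baseline * P_hat - P, axis=1)
    return 100.0 * np.mean(dists) / diagonal
```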

Table 1 shows the computation times of the algorithms for 100-point correspondences. All the algorithms have been implemented in Matlab 6.5 running on a Pentium IV 3GHz under Windows XP.


Figure 5. Robustness against noise, for the 8-point, orthogonal LS, Bookstein, iterative and MAPSAC methods as a function of the noise standard deviation sigma (0 to 2 pixels): (a) error measured on α, (b) average distances between reconstructed points and ideal ones.

Algorithm                8-point   Orth. LS   Bookstein   Iterative   MAPSAC
Computation time in ms   2         1          1.2         5.5         505

Table 1. Computation times of the algorithms in ms.

Points on image 1:               1 ... N-n   N-n+1   N-n+2   ...   N-1   N
Corresponding points on image 2: 1 ... N-n   N-n+2   N-n+3   ...   N     N-n+1

Table 2. Method to generate n bad matches.
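The corruption protocol of Table 2 amounts to a circular shift of the last n points of image 2. A minimal sketch of this step (the helper name is ours, Python/NumPy) could be:

```python
import numpy as np

def add_bad_matches(pts2, n):
    """Corrupt the last n correspondences by a circular permutation of the
    points of image 2, as described by Table 2 (a sketch of the protocol,
    not the authors' code). pts2: (N, 2) array of image-2 points."""
    corrupted = pts2.copy()
    # N-n+1 takes the coordinates of N-n+2, ..., N-1 those of N, N those of N-n+1.
    corrupted[-n:] = np.roll(pts2[-n:], -1, axis=0)
    return corrupted
```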

It appears from these experiments that the 8-point and orthogonal least squares algorithms are the best, since their errors are always minimal, as is their computing time. It should be noted that this conclusion is only valid when no bad match has occurred during the matching process.

4.2. Robustness against bad correspondences

In practice, the matching process introduces some bad matches. It is thus necessary to investigate the behavior of the algorithms in such conditions. For this second experiment, no Gaussian noise has been added to the data and the variable is now the number of bad matches. To generate n bad correspondences among the N couples of projected points, the n last projected points on image 2 are reordered by a circular permutation, as represented in Table 2.

As before, 100 realizations of randomly generated points have been tested to obtain averaged measurements. The same errors are measured, but this time as a function of the number of bad matches. One bad match corresponds to no permutation (a circular permutation of a single point leaves it unchanged). The results are reported in Figure 6. They show that the robust method (MAPSAC) provides the best fundamental matrix for estimating the focal length and reconstructing the 3D data, since its error is always zero. Non-robust algorithms are not able to deal with bad matches: as soon as there are bad correspondences, the errors become very large and may exceed 100%. One could expect these results, since the aim of robust methods is to deal with bad correspondences. Since MAPSAC has been proven to be robust to bad correspondences but not really to noise on the point positions, and since linear methods are robust to noise but not to bad correspondences, we suggest looking for an algorithm that takes advantage of both; but first we need to detail MAPSAC itself.

Figure 6. Robustness against bad matches, for the five methods as a function of the number of bad matches (1 to 10): (a) error measured on α, (b) average distances between reconstructed points and real ones.

4.3. Deeper into the MAPSAC algorithm

At each iteration of the MAPSAC algorithm, the fundamental matrix is estimated using a simpler estimation algorithm called the 7-point algorithm. This algorithm can output up to three solutions. MAPSAC selects the one that minimizes the distances between points and epipolar lines, but it is not always the most realistic one. The first idea to solve this problem was to replace the 7-point algorithm with the 8-point algorithm. But as Torr explains in the documentation of his Matlab toolbox, the more points are selected, the more the set of points can contain bad correspondences. Moreover, Luong et al. [17] have shown that linear algorithms give a biased solution. The second idea is to use MAPSAC as a classifying algorithm. As MAPSAC is able to point out the outliers among the correspondences, the large number of inliers can be used to compute the fundamental matrix with a linear method. The more points there are, the more robust to noise the linear methods are. Also, the fundamental matrix is no longer computed with only 7 points as in standard MAPSAC, but with all the points marked as inliers, so the result is more accurate.

Figure 7. Comparison of standard MAPSAC and MAPSAC followed by the 8-point algorithm, as a function of the number of bad matches. The standard deviation of the Gaussian noise added to the positions of the projected points is 1. (a) error measured on α, (b) average distances between reconstructed points and ideal ones.

This supposition is corroborated by our experiments (Figure 7). The MAPSAC algorithm has been applied, followed by the 8-point algorithm applied only on the inliers, so the errors are computed only with the points that were marked as inliers. Gaussian noise with a standard deviation equal to 1 has also been added to the positions of the projected points. The results show a stable error for all numbers of bad correspondences, thanks to the MAPSAC algorithm: its role is to put away the bad correspondences, so that the normalized 8-point algorithm can compute the fundamental matrix with valid data. The curves also demonstrate that the errors of the final algorithm are the same as those of the 8-point algorithm without outliers and with sigma equal to 1 (Figure 5). So the goal of combining the advantages of MAPSAC and of a linear algorithm is reached. The error of MAPSAC is divided by two using this method. The resulting algorithm has a better accuracy than MAPSAC itself for a negligible additional computation cost, since the computation time of the 8-point algorithm is very small compared to that of MAPSAC (see Table 1).
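The combined scheme can be sketched as glue logic around any robust estimator. MAPSAC itself is not re-implemented below; the robust estimator and the 8-point routine are passed as callables (the latter could be the normalized 8-point sketch given in section 3.1), and the 1.5-pixel inlier threshold is an arbitrary assumption, not a value from the paper.

```python
import numpy as np

def point_line_distances(F, pts1, pts2):
    """Distance from each point of image 2 to the epipolar line F.m of its
    correspondent in image 1 (see the discussion of equation (5))."""
    m1 = np.column_stack([pts1, np.ones(len(pts1))])
    m2 = np.column_stack([pts2, np.ones(len(pts2))])
    lines = (F @ m1.T).T                          # epipolar lines in image 2
    num = np.abs(np.sum(lines * m2, axis=1))
    return num / np.linalg.norm(lines[:, :2], axis=1)

def robust_then_8point(pts1, pts2, robust_estimator, eight_point, thresh=1.5):
    """Combined scheme of section 4.3: a robust estimator (MAPSAC in the
    paper) is only used to classify correspondences, then the normalized
    8-point algorithm is refitted on the inliers.
    robust_estimator and eight_point are callables (pts1, pts2) -> F."""
    F_robust = robust_estimator(pts1, pts2)
    inliers = point_line_distances(F_robust, pts1, pts2) < thresh
    return eight_point(pts1[inliers], pts2[inliers]), inliers
```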

5. CONCLUSION

We have surveyed five representative fundamental matrix estimation methods within a global 3D world reconstruction framework. These methods have been confronted with noise on the point positions and with bad correspondences in the data set. It appears that the 8-point method is the most robust to noisy image points, and that MAPSAC is the most robust to bad correspondences. So the two methods have been combined: MAPSAC is used to classify good and bad correspondences, whereas the 8-point algorithm is used to compute the fundamental matrix from the resulting inliers.

However, some parameters have been arbitrarily fixed in this work, such as the positions of the cameras and the intrinsic parameters. These parameters normally do not interfere with the fundamental matrix estimation but only with the self-calibration algorithm. They have been chosen to provide optimal conditions for self-calibration. The influence of these parameters will be investigated in future work. The method presented here has been tested only on synthetic data; it will be necessary to test it on real images. But this raises questions that have to be answered. Firstly, the number of points detected in the images is not necessarily equal to 100, so the influence of the number of points should be investigated. Next appears the tricky problem of point matching, since the images are supposed to be taken from very different points of view. Maybe some targets should be used to help obtain good correspondences.

6. REFERENCES

[1] O. Faugeras, Stratification of 3-D vision: projective, affine, and metric representations, Journal of the Optical Society of America A, Vol. 12, No. 3, p. 465-484, March 1995.
[2] M. Pollefeys, Tutorial on 3D Modeling from Images, ECCV 2000, 26 June 2000, Dublin, Ireland.
[3] J.-M. Lavest, M. Viala, M. Dhome, Quelle précision pour une mire d'étalonnage ?, Traitement du Signal, Vol. 16, No. 3, 1999.
[4] H. C. Longuet-Higgins, A computer algorithm for reconstructing a scene from two projections, Nature, Vol. 293, p. 133-135, 1981.
[5] R. Hartley, Estimation of Relative Camera Positions for Uncalibrated Cameras, ECCV'92, Lecture Notes in Computer Science, Vol. 588, Ed. G. Sandini, Springer-Verlag, 1992, p. 579-587.
[6] O. Faugeras, Q.-T. Luong and S. J. Maybank, Camera Self-Calibration: Theory and Experiments, ECCV'92, Lecture Notes in Computer Science, Vol. 588, Ed. G. Sandini, Springer-Verlag, 1992, p. 321-334.
[7] M. Lourakis, R. Deriche, Camera Self-Calibration Using the Kruppa Equations and the SVD of the Fundamental Matrix: From Point Correspondences to 3D Measurements, Research Report 3748, INRIA, August 1999.
[8] Y. Ma, R. Vidal, J. Kosecka, S. Sastry, Kruppa Equation Revisited: its Renormalization and Degeneracy, ECCV 2000, 26 June 2000, Dublin, Ireland, p. 561-577.
[9] P. Sturm, Vision 3D non calibrée : contributions à la reconstruction projective et études des mouvements critiques pour l'auto-calibrage, PhD report, GRAVIR Laboratory - IMAG - INRIA Rhône-Alpes.
[10] X. Armangué, J. Salvi, Overall view regarding fundamental matrix estimation, Image and Vision Computing, Vol. 21, p. 205-220, 2003.
[11] R. Hartley, In defense of the eight-point algorithm, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 6, p. 580-593, June 1997.
[12] P. Torr, D. Murray, The development and comparison of robust methods for estimating the fundamental matrix, International Journal of Computer Vision, Vol. 24, No. 3, p. 271-300, 1997.
[13] P. Torr, A. Fitzgibbon, Invariant fitting of two view geometry or "in defiance of the 8 point algorithm", Proceedings British Machine Vision Conference, p. 83-92, 2003.
[14] P. Torr, Bayesian Model Estimation and Selection for Epipolar Geometry and Generic Manifold Fitting, International Journal of Computer Vision, Vol. 50, No. 1, p. 35-61, 2002.
[15] P. Sturm, On focal length calibration from two views, Proceedings of the Conference on Computer Vision and Pattern Recognition, Vol. 2, p. 145-150, December 2001.
[16] R. Hartley, P. Sturm, Triangulation, Computer Vision and Image Understanding, Vol. 68, No. 2, p. 146-157, November 1997.
[17] Q.-T. Luong, R. Deriche, O. Faugeras, T. Papadopoulo, On determining the fundamental matrix: analysis of different methods and experimental results, INRIA Research Report RR-1894, April 1993.