Faugeras (1995) Stratification of 3-d vision

Stratification of 3-D vision: projective, affine, and metric representations

Olivier Faugeras
INRIA, 2004 route des Lucioles, B.P. 93, 06902 Sophia-Antipolis, France

Abstract

In this article we provide a conceptual framework in which to think of the relationships between the three-dimensional structure of the physical space and the geometric properties of a set of cameras which provide pictures from which measurements can be made. We usually think of the physical space as being embedded in a three-dimensional euclidean space where measurements of lengths and angles do make sense. It turns out that for artificial systems, such as robots, this is not a mandatory viewpoint, and that it is sometimes sufficient to think of the physical space as being embedded in an affine or even projective space. The question then arises of how to relate these models to image measurements and to geometric properties of sets of cameras. We show that in the case of two cameras, a stereo rig, the projective structure of the world can be recovered as soon as the epipolar geometry of the stereo rig is known, and that this geometry is summarized by a single 3 × 3 matrix, which we have called the fundamental matrix [1, 2]. The affine structure can then be recovered if we add to this information a projective transformation between the two images which is induced by the plane at infinity. Finally, the euclidean structure (up to a similitude) can be recovered if we add to these two elements the knowledge of two conics (one for each camera) which are the images of the absolute conic, a circle of radius $\sqrt{-1}$ in the plane at infinity. In all three cases we show how the three-dimensional information can be recovered directly from the images without explicitly reconstructing the scene structure. This defines a natural hierarchy of geometric structures, a set of three strata, that we overlay on the physical world and which we show to be recoverable by simple procedures relying on two items: the physical space itself, together with possibly, but not necessarily, some a priori information about it, and some voluntary motions of the set of cameras.


1 Introduction

This article discusses several ways of representing the geometry of three-dimensional space, which we will call the world, when viewed by a system of cameras. We usually think of the world as being euclidean, i.e., as a place where it makes sense to measure angles and distances. When we look at this space with a system of cameras, we first make measurements in the images and then attempt to relate them to three-dimensional quantities. This has been, and still is, one of the main research topics in computer vision, in areas such as motion analysis, stereo, and camera calibration. All computer vision scientists know that going from image quantities to reliable three-dimensional metric quantities is very difficult, basically because a camera is not a metric device unless it has been carefully calibrated, which is itself a very difficult task. We will see later in the paper that a camera is really a projective device, hence part of the difficulty. Two very important ideas related to this problem have emerged in recent years. The first idea is that it is not always necessary to use metric measurements in order to perform tasks in the world, and that less detailed measures, such as ratios of lengths, may quite often be sufficient to achieve these tasks. This is also an active research area in robotics and vision. The second idea is that calibration in the usual sense, i.e., by using special calibration grids, can be entirely avoided by using active camera motions and exploiting the fact that the world can be modelled as euclidean. In this paper I want to articulate these two ideas with the idea that we can define on the three-dimensional space that we call the world not only the familiar structure of a euclidean metric space, but also weaker (hence more general) structures, e.g., affine and projective. These structures can be thought of as geometric strata which are overlaid one after the other upon the world.
First comes the projective stratum, which can be specialized into an affine stratum, which can itself be specialized further into a euclidean stratum. Although we are used to thinking about projective, affine, and euclidean spaces, we rarely think of these three structures simultaneously. But I think that in order to really understand the relationship between the world and its images, we must be able to picture in our minds these three structures overlaid upon each other. Closely related to this stratification idea is the idea of a group of geometric transformations acting on the elements of these strata and leaving invariant some properties of geometric configurations of these elements. Attached to the projective stratum is the group of projective transformations, or collineations; attached to the affine stratum is the group of affine transformations; and attached to the metric stratum is the group of rigid transformations, or displacements. In fact, because we rarely have an absolute yardstick for measuring distances, we are in practice interested in a group that contains the displacements, namely the group of similitudes (displacements combined with uniform scalings), otherwise known as euclidean transformations. Interestingly enough, this group appears very naturally when we build the series of strata. It is well known, but remarkable enough to be stressed here, that these four groups are nested as subgroups of each other: e.g., the group of affine transformations can be considered as a subgroup of the group of collineations, and the group of similitudes as a subgroup of the group of affine transformations. These relationships will be made clearer in the paper. This notion of groups naturally brings in another notion which is central to this paper, the notion of invariant. In our context, an invariant is a property of a geometric configuration which does not change when a transformation of a given group is applied to that configuration. For a given geometric configuration, e.g., a set of points, lines, or surfaces, there may exist projective invariants, which are properties of the configuration that do not change when we apply a projective transformation to its elements. These invariants are also affine and similitude invariants of the same configuration. From the practical standpoint, this means that if we can measure invariant properties of the world within the projective stratum (the most general), then these properties will remain invariant in the next strata, i.e., affine and euclidean.

We do not present any experimental results in this paper. This is not to say that we are not interested in the actual implementation of the ideas that I will present. In fact, many of these ideas, or consequences of them, have been implemented, sometimes on special-purpose hardware, and we refer the interested reader to the corresponding publications [2, 1, 3, 4, 5, 6]. The purpose of this paper is to provide a coherent framework in which to express these ideas in a somewhat systematic and formal, perhaps even elegant, way. We hope that the reader will be willing to step back a little and take a fresh look at a number of old problems. This may provide opportunities for solving some newer problems more efficiently.

2 Related work

Koenderink and van Doorn sparked the interest of the computer vision community in non-metric reconstructions from sets of cameras with their pioneering work on affine structure from motion [7]. Gunnar Sparr has developed a theory based on a novel definition of shape [8, 9] and showed that this theory could also be used to compute affine and projective reconstructions of the world from point correspondences. Unlike us, he does not rely on epipolar geometry to obtain these reconstructions. The price he pays is a somewhat more complicated theory than ours and a more difficult combinatorial problem of obtaining the point correspondences, since he cannot rely on the epipolar constraint [10, 11, 12]. Roger Mohr and coworkers [13] have used some ideas from projective geometry to reconstruct the world from a number of point correspondences. However, they neither clearly distinguish between the three main classes of reconstructions nor abstract them from the relevant camera geometry. Richard Hartley and coworkers [14] developed, simultaneously with and independently of [15], a method to compute a projective reconstruction of the world from point correspondences. The second paper also included a preliminary discussion of the affine reconstruction case. More recently, Amnon Shashua [16, 17, 18] has developed a set of similar ideas, which differ slightly from those expressed in [15, 14] in that, through the use of an extra reference plane, he introduces a special projective invariant which allows him to elegantly predict the position of image points in other views (for a related approach, see [19]). No attempt has been made in this work to relate the three types of possible reconstruction. The ideas developed in the present paper are closely related to those expressed by Luong and Viéville [20], who also looked at the problem of representing systems of cameras in the framework of projective, affine, and euclidean geometries.
Their main emphasis was on the characterization of invariant representations for the perspective projection operation performed by the cameras, while we are interested here in obtaining invariant representations of the 3-D scene, in determining how the minimum information about the camera geometry necessary to estimate such representations can be obtained from the images, and in showing how the 3-D representations themselves can be obtained from the images without actually performing an explicit 3-D reconstruction.


3 Stratification of 3-D space: projective, affine and euclidean structures

3.1 Notations

We represent vectors and matrices in boldface, i.e., the vector x is denoted $\mathbf{x}$. Since a great deal of our discussion deals with geometric entities which can sometimes be represented by vectors or matrices, we sometimes differentiate between the geometric entity itself, e.g., a point x, and its vector representation $\mathbf{x}$.

3.2 3-D space as a projective space

We will first consider that the world is embedded in a projective space of dimension three, denoted $\mathbb{P}^3$. Similarly, we will consider the retinal plane of a camera as embedded in a projective space of dimension two, denoted $\mathbb{P}^2$. Since we will also have to consider projective spaces of dimension one, we begin with a brief pedestrian introduction to general projective spaces of any dimension and then specialize to the cases where this dimension equals one, two, or three.

3.2.1 General projective spaces

The use of projective spaces has been popular in three-dimensional computer graphics and robotics from the early days, because they allow a very compact representation of all changes of coordinate systems as $4 \times 4$ matrices instead of a rotation matrix and a translation vector. This is because such changes are special cases of linear projective transformations called collineations. Let us look at general projective spaces in more detail. An n-dimensional projective space $\mathbb{P}^n$ (in this article we will use only the cases n = 1, 2, 3) can be thought of as arising from an (n+1)-dimensional vector space (real or complex) in which we define the following relation between nonzero vectors. To help guide the reader's intuition, it is useful to think of a nonzero vector as defining a line through the origin. We say that two such vectors $\mathbf{x}$ and $\mathbf{y}$ are equivalent if and only if they define the same line. Mathematically, this can be stated as the fact that there exists a nonzero scalar $\lambda$ such that

$$\mathbf{y} = \lambda\,\mathbf{x}$$

It is easily verified that this defines an equivalence relation on the vector space minus the zero vector. The equivalence class of a vector is the set of all nonzero vectors which are parallel to it (it can be thought of as the line defined by this vector). The set of all equivalence classes is the projective space $\mathbb{P}^n$. A point in that space is called a projective point; it is an equivalence class of vectors and can therefore be represented by any vector in the class. If $\mathbf{x}$ is such a vector, then $\lambda\mathbf{x}$, $\lambda \neq 0$, is also in the class and represents the same projective point. The coordinates of any vector in the equivalence class are the coordinates of the corresponding projective point. They are therefore not all equal to zero (we have excluded the zero vector from the beginning) and are defined up to a scale factor. It is sometimes useful to differentiate between the projective point, denoted x, and one of its coordinate vectors, denoted $\mathbf{x}$. What do we have so far? A projective point in $\mathbb{P}^n$ is represented by an (n+1)-vector of coordinates $\mathbf{x} = [x_1, \ldots, x_{n+1}]^T$, where at least one of the $x_i$ is nonzero. The numbers $x_i$ are sometimes called the homogeneous or projective coordinates of the point, and the vector $\mathbf{x}$ is called a coordinate vector.

Two (n+1)-vectors $[x_1, \ldots, x_{n+1}]^T$ and $[y_1, \ldots, y_{n+1}]^T$ represent the same point if and only if there exists a nonzero scalar $\lambda$ such that $x_i = \lambda y_i$ for $1 \le i \le n+1$. Therefore, the correspondence between points and coordinate vectors is not one-to-one, and this makes the application of linear algebra to projective geometry a little more complicated.
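As a minimal sketch (Python with exact rational arithmetic; the function names are ours, not the paper's), the equivalence test can be carried out without computing $\lambda$: two nonzero vectors are proportional exactly when every $2 \times 2$ minor $x_i y_j - x_j y_i$ vanishes. Scaling so that the last nonzero coordinate equals 1 gives one convenient canonical representative of the class.

```python
from fractions import Fraction


def same_projective_point(x, y):
    """True if the nonzero vectors x and y represent the same point of P^n,
    i.e. if y = lambda * x for some nonzero scalar lambda."""
    if len(x) != len(y):
        raise ValueError("coordinate vectors must both have n + 1 entries")
    if all(v == 0 for v in x) or all(v == 0 for v in y):
        raise ValueError("the zero vector does not represent a projective point")
    # x and y are proportional iff every 2x2 minor x_i*y_j - x_j*y_i is zero.
    return all(x[i] * y[j] - x[j] * y[i] == 0
               for i in range(len(x)) for j in range(i + 1, len(x)))


def normalize(x):
    """Canonical representative of the class of x: scale so that the last
    nonzero coordinate equals 1."""
    k = max(i for i, v in enumerate(x) if v != 0)
    return tuple(Fraction(v) / Fraction(x[k]) for v in x)


print(same_projective_point([1, 2, 3], [-2, -4, -6]))  # True
print(same_projective_point([1, 2, 3], [1, 2, 4]))     # False
print(normalize([2, 4, 6]))  # (Fraction(1, 3), Fraction(2, 3), Fraction(1, 1))
```

The minor-based test avoids dividing by a coordinate that might be zero, which is exactly the case that makes a naive "divide and compare" approach fragile.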

Collineations. We now look at the linear transformations of a projective space. An $(n+1) \times (n+1)$ matrix $\mathbf{A}$ such that $\det(\mathbf{A}) \neq 0$ defines a linear transformation, or collineation, from $\mathbb{P}^n$ into itself. It is easy to see that the set of collineations is a group for the usual operation of matrix multiplication. This group is also known as the projective group. The matrix associated with a given collineation is defined up to a nonzero scale factor, which we usually denote by

$$\rho\,\mathbf{y} = \mathbf{A}\mathbf{x} \quad \text{or} \quad \mathbf{y} \simeq \mathbf{A}\mathbf{x}$$

Quite often we will omit the factor $\rho$ for simplicity and write simply $\mathbf{y} = \mathbf{A}\mathbf{x}$. The reader must remember that this is a projective equality, equivalent to the equality of n ratios.

Projective basis. Another important notion is that of a projective basis. This is the extension to projective spaces of the idea of a coordinate system. A projective basis is a set of n + 2 points of $\mathbb{P}^n$ such that no n + 1 of them are linearly dependent. A set of projective points is linearly independent if, considering any set of coordinate vectors of these points, these vectors are linearly independent. It is readily verified that this is independent of the choice of the coordinate vectors and of the choice of basis vectors. For example, the set $\mathbf{e}_i = [0, \ldots, 1, \ldots, 0]^T$, $i = 1, \ldots, n+1$, where 1 is in the ith position, together with $\mathbf{e}_{n+2} = [1, 1, \ldots, 1]^T$, is a projective basis, called the standard projective basis. A projective point of $\mathbb{P}^n$ represented by any of its coordinate vectors $\mathbf{x}$ can be described as a linear combination of any n + 1 points of the standard basis. For example:

$$\mathbf{x} = \sum_{i=1}^{n+1} x_i\,\mathbf{e}_i$$
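The defining condition of a projective basis can be checked mechanically. The following Python sketch (helper names are illustrative, not from the paper) builds the standard projective basis of $\mathbb{P}^n$ and tests whether a set of n + 2 coordinate vectors is a projective basis by verifying that every choice of n + 1 of them has a nonzero determinant; exact fractions avoid spurious round-off.

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Exact determinant of a square matrix by Gaussian elimination."""
    a = [[Fraction(v) for v in row] for row in m]
    n, result = len(a), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if a[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result  # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return result


def standard_projective_basis(n):
    """e_1, ..., e_{n+1} (unit vectors) and e_{n+2} = [1, ..., 1] in P^n."""
    basis = [[1 if j == i else 0 for j in range(n + 1)] for i in range(n + 1)]
    return basis + [[1] * (n + 1)]


def is_projective_basis(points):
    """True if these n + 2 points of P^n have no n + 1 linearly dependent ones."""
    size = len(points[0])  # n + 1
    if len(points) != size + 1:
        return False
    return all(det(list(sub)) != 0 for sub in combinations(points, size))


print(standard_projective_basis(1))                        # [[1, 0], [0, 1], [1, 1]]
print(is_projective_basis(standard_projective_basis(3)))   # True
print(is_projective_basis([[1, 0, 0], [0, 1, 0], [1, 1, 0], [1, 1, 1]]))  # False
```

In the last example the first three points are collinear in $\mathbb{P}^2$, so they do not form a projective basis.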

We will use the following result several times in the sequel; it is borrowed from, for example, [21], and its proof can be found in [15].

Proposition 1. Let $\mathbf{x}_1, \ldots, \mathbf{x}_{n+2}$ be n + 2 coordinate vectors of points in $\mathbb{P}^n$, no n + 1 of which are linearly dependent, i.e., a projective basis. If $\mathbf{e}_1, \ldots, \mathbf{e}_{n+1}, \mathbf{e}_{n+2}$ is the standard projective basis, there exist nonsingular matrices $\mathbf{A}$ such that $\mathbf{A}\mathbf{e}_i = \lambda_i\mathbf{x}_i$, $i = 1, \ldots, n+2$, where the $\lambda_i$ are nonzero scalars; any two matrices with this property differ at most by a scalar factor.

This proposition tells us that any projective basis can be transformed, via a collineation (namely $\mathbf{A}^{-1}$), into the standard projective basis.
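One standard way to construct such a matrix, sketched below in Python under the assumption that the input really is a projective basis, is to solve $\sum_{i=1}^{n+1} \lambda_i \mathbf{x}_i = \mathbf{x}_{n+2}$ for the scalars $\lambda_i$ and take $\lambda_i\mathbf{x}_i$ as the ith column of $\mathbf{A}$. Then $\mathbf{A}\mathbf{e}_i = \lambda_i\mathbf{x}_i$ for $i \le n+1$ and $\mathbf{A}\mathbf{e}_{n+2} = \mathbf{x}_{n+2}$. The system is solved here by Cramer's rule, which is adequate for the small dimensions ($n \le 3$) used in this article; the function names are hypothetical.

```python
from fractions import Fraction


def det(m):
    """Exact determinant by cofactor expansion along the first row (small n only)."""
    if len(m) == 1:
        return Fraction(m[0][0])
    return sum((-1) ** j * Fraction(m[0][j]) * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def basis_to_matrix(points):
    """Given x_1, ..., x_{n+2} forming a projective basis of P^n, return A (list
    of rows) with A e_i = lambda_i x_i, i = 1..n+1, and A e_{n+2} = x_{n+2}."""
    cols, target = points[:-1], points[-1]
    size = len(target)
    m = [[cols[j][r] for j in range(size)] for r in range(size)]  # columns x_1..x_{n+1}
    d = det(m)
    if d == 0:
        raise ValueError("x_1, ..., x_{n+1} are linearly dependent")
    lam = []
    for i in range(size):  # Cramer's rule: replace column i by x_{n+2}
        mi = [row[:i] + [target[r]] + row[i + 1:] for r, row in enumerate(m)]
        lam.append(det(mi) / d)
    if any(v == 0 for v in lam):
        raise ValueError("some n + 1 of the points are linearly dependent")
    return [[lam[j] * cols[j][r] for j in range(size)] for r in range(size)]


def apply(a, x):
    """Matrix-vector product A x."""
    return [sum(a[r][c] * x[c] for c in range(len(x))) for r in range(len(a))]


A = basis_to_matrix([[1, 1, 0], [0, 1, 1], [1, 0, 1], [3, 4, 5]])
print(A == [[1, 0, 2], [1, 3, 0], [0, 3, 2]])  # True (lambda = 1, 3, 2)
print(apply(A, [1, 1, 1]) == [3, 4, 5])        # True: A e_4 = x_4
```

The nonzero-$\lambda_i$ check is where the "no n + 1 dependent" hypothesis is used: a zero $\lambda_i$ would mean that $\mathbf{x}_{n+2}$ lies in the span of only n of the other points.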

Change of projective basis. Let us consider two sets of n + 2 points represented by the coordinate vectors $\mathbf{x}_1, \ldots, \mathbf{x}_{n+2}$ and $\mathbf{y}_1, \ldots, \mathbf{y}_{n+2}$. It can be proved that if the points in these two sets are in general position, there exists a unique collineation that maps the first set of points onto the second.

Proposition 2. If $\mathbf{x}_1, \ldots, \mathbf{x}_{n+2}$ and $\mathbf{y}_1, \ldots, \mathbf{y}_{n+2}$ are two sets of n + 2 coordinate vectors such that in either set no n + 1 vectors are linearly dependent, i.e., they form two projective bases, then there exists a nonsingular $(n+1) \times (n+1)$ matrix $\mathbf{P}$ such that $\mathbf{P}\mathbf{x}_i = \rho_i\mathbf{y}_i$, $i = 1, \ldots, n+2$, where the $\rho_i$ are scalars, and the matrix $\mathbf{P}$ is uniquely determined apart from a scalar factor.

This proposition shows that a collineation is defined by n + 2 pairs of corresponding points. The proof can be found, for example, in [22].
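Proposition 2 can be obtained from Proposition 1: if $\mathbf{A}$ maps the standard basis to the $\mathbf{x}_i$ and $\mathbf{B}$ maps it to the $\mathbf{y}_i$ (both up to scale), then $\mathbf{P} = \mathbf{B}\mathbf{A}^{-1}$ maps each $\mathbf{x}_i$ to a multiple of $\mathbf{y}_i$. The Python sketch below (helper names are ours; exact arithmetic via `fractions`) computes $\mathbf{P}$ this way, solving the linear systems by Gauss-Jordan elimination. It assumes both point sets are projective bases.

```python
from fractions import Fraction


def solve(m, b):
    """Solve m x = b exactly by Gauss-Jordan elimination (m square, nonsingular)."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(m)]
    for c in range(n):
        p = next((r for r in range(c, n) if a[r][c] != 0), None)
        if p is None:
            raise ValueError("singular matrix")
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(n):
            if r != c and a[r][c] != 0:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [a[i][n] for i in range(n)]


def basis_matrix(points):
    """Proposition 1: matrix (list of rows) whose columns are lambda_i x_i, so that
    A e_i is proportional to x_i for i = 1, ..., n + 2."""
    cols, last = points[:-1], points[-1]
    size = len(last)
    lam = solve([[cols[j][r] for j in range(size)] for r in range(size)], last)
    if any(v == 0 for v in lam):
        raise ValueError("points are not in general position")
    return [[lam[j] * cols[j][r] for j in range(size)] for r in range(size)]


def inverse(m):
    n = len(m)
    cols = [solve(m, [1 if i == j else 0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def apply(m, x):
    return [sum(m[r][c] * x[c] for c in range(len(x))) for r in range(len(m))]


def collineation_from_bases(xs, ys):
    """Proposition 2: P with P x_i proportional to y_i, i = 1..n+2, as P = B A^{-1}."""
    return matmul(basis_matrix(ys), inverse(basis_matrix(xs)))


def proportional(x, y):
    return all(x[i] * y[j] == x[j] * y[i] for i in range(len(x)) for j in range(len(x)))


xs = [[1, 2, 0], [0, 1, 3], [2, 0, 1], [1, 1, 1]]
ys = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [3, 4, 5]]
P = collineation_from_bases(xs, ys)
print(all(proportional(apply(P, x), y) for x, y in zip(xs, ys)))  # True
```

Because each $\mathbf{A}$ is scaled so that $\mathbf{A}\mathbf{e}_{n+2} = \mathbf{x}_{n+2}$ exactly, the last pair is matched with $\rho_{n+2} = 1$; the other $\rho_i$ are generally different from 1.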

3.2.2 Projective lines, planes and spaces

We illustrate these general notions on three examples which will be used in the rest of the paper.

The projective line. The space $\mathbb{P}^1$ is known as the projective line. It is the simplest of all projective spaces, which is the first reason why we start with it. The second reason is that many structures embedded in higher-dimensional projective spaces have the same structure as $\mathbb{P}^1$. The standard projective basis of the projective line is $\mathbf{e}_1 = [1, 0]^T$, $\mathbf{e}_2 = [0, 1]^T$, and $\mathbf{e}_3 = [1, 1]^T$. A point on the line can be written as

$$\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 \tag{1}$$

with $x_1$ and $x_2$ not both equal to 0. Let us consider the subset of $\mathbb{P}^1$ of the points such that $x_2 \neq 0$. This is the same as excluding the point represented by $\mathbf{e}_1$. Since the homogeneous coordinates are defined up to a scalar, these points are described by a parameter $\theta$, $-\infty < \theta < +\infty$, so that

$$\mathbf{x} = \theta\,\mathbf{e}_1 + \mathbf{e}_2, \qquad \theta = \frac{x_1}{x_2}$$

The parameter $\theta$ is often called the projective parameter of the point. Note that the point represented by $\mathbf{e}_2$ has projective parameter equal to 0.

We now define the very important concept of the cross-ratio, which is a quantity that remains invariant under the group of collineations. Let a, b, c, d be four points of $\mathbb{P}^1$ with respective projective parameters $\theta_a$, $\theta_b$, $\theta_c$, $\theta_d$. Then the cross-ratio $\{a, b, c, d\}$ is defined to be

$$\{a, b, c, d\} = \frac{\theta_a - \theta_c}{\theta_a - \theta_d} : \frac{\theta_b - \theta_c}{\theta_b - \theta_d} \tag{2}$$

The significance of the cross-ratio is that it is invariant under collineations of $\mathbb{P}^1$. In particular, $\{a, b, c, d\}$ is independent of the choice of coordinates in $\mathbb{P}^1$. Note that the collineations of $\mathbb{P}^1$ are usually called homographies.
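Equation (2) and the invariance of the cross-ratio are easy to check numerically. The Python sketch below (function names are ours) computes the cross-ratio from projective parameters, maps the four points through an arbitrarily chosen invertible $2 \times 2$ matrix, i.e., a homography, and recomputes it. It assumes that none of the points, before or after the mapping, is $\mathbf{e}_1$, so that all parameters are finite.

```python
from fractions import Fraction


def cross_ratio(ta, tb, tc, td):
    """Cross-ratio {a, b, c, d} of equation (2) from finite projective parameters."""
    ta, tb, tc, td = map(Fraction, (ta, tb, tc, td))
    return ((ta - tc) / (ta - td)) / ((tb - tc) / (tb - td))


def homography(h, theta):
    """Image of the point theta*e1 + e2 under the 2x2 matrix h, returned as its
    projective parameter x1 / x2 (assumes the image is not e1)."""
    x1 = h[0][0] * theta + h[0][1]
    x2 = h[1][0] * theta + h[1][1]
    return Fraction(x1) / Fraction(x2)


thetas = [0, 1, 2, 3]
H = [[2, 1], [1, 3]]  # det = 5, so H is a collineation of P^1
print(cross_ratio(*thetas))                               # 4/3
print(cross_ratio(*[homography(H, t) for t in thetas]))   # 4/3
```

Exact rational arithmetic is used so that the two results compare equal without any tolerance.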

The projective plane. The space $\mathbb{P}^2$ is known as the projective plane. A point in $\mathbb{P}^2$ is defined by three numbers, not all zero, $(x_1, x_2, x_3)$. They form a coordinate vector $\mathbf{x}$ defined up to a scale factor. In $\mathbb{P}^2$ there are objects other than points, such as lines. A line is also defined by a triplet of numbers $(u_1, u_2, u_3)$, not all zero. They form a coordinate vector $\mathbf{u}$ defined up to a scale factor. The equation of the line is

$$\sum_{i=1}^{3} u_i x_i = 0 \tag{3}$$

Formally, there is no difference between points and lines in $\mathbb{P}^2$. This is known as the principle of duality. A point represented by $\mathbf{x}$ can be thought of as the set of lines through it. These lines are represented by the coordinate vectors $\mathbf{u}$ satisfying $\mathbf{u}^T\mathbf{x} = 0$. This is sometimes referred to as the line equation of the point. Conversely, a line represented by $\mathbf{u}$ can be thought of as the set of points represented by $\mathbf{x}$ satisfying the same equation, called the point equation of the line. The principle of duality is a statement about theorems: given a theorem about points and lines, interchange the roles of the points and lines and adjust the wording accordingly; the new statement will then also be true.

Let us now generalize the notion of cross-ratio, introduced in the previous section for four points of $\mathbb{P}^1$, to four lines of $\mathbb{P}^2$ intersecting at a point. Given four lines $l_1, l_2, l_3, l_4$ of $\mathbb{P}^2$ that intersect at a point, their cross-ratio $\{l_1, l_2, l_3, l_4\}$ is defined as the cross-ratio $\{P_1, P_2, P_3, P_4\}$ of their four points of intersection with any line l not going through their point of intersection. This value is of course independent of the choice of l.

There is a structure of the projective plane that has numerous applications, especially in stereo and motion: the pencil of lines, i.e., the set of lines of $\mathbb{P}^2$ passing through a fixed point. It is a one-dimensional projective space. Let us consider two lines $l_1$ and $l_2$ of the pencil, represented by their coordinate vectors $\mathbf{u}_1$ and $\mathbf{u}_2$. Any line l of the pencil goes through the point of intersection of $l_1$ and $l_2$, represented by the cross product $\mathbf{u}_1 \wedge \mathbf{u}_2$. Thus, its coordinate vector $\mathbf{u}$ satisfies $\mathbf{u}^T(\mathbf{u}_1 \wedge \mathbf{u}_2) = 0$, or equivalently

$$\mathbf{u} = \alpha\,\mathbf{u}_1 + \beta\,\mathbf{u}_2$$

for two scalars $\alpha$ and $\beta$. This equation is formally equivalent to equation (1), and therefore the structure of a pencil of lines is the same as that of the projective line $\mathbb{P}^1$. Another, perhaps more elegant, way of proving this result is to apply the principle of duality: the set of lines going through a point is the dual of the set of points on a line, i.e., of a projective line.

Collineations of $\mathbb{P}^2$ are defined by $3 \times 3$ invertible matrices, defined up to a scalar factor. According to proposition 2, such a collineation is defined by 4 pairs of corresponding points. Collineations transform points, lines, and pencils of lines into points, lines, and pencils of lines, and preserve cross-ratios. In the projective plane, the class of conic curves is especially important, for reasons which will become apparent in sections 4.4 and 7. We give some simple properties of conics that will be used in later sections. A conic $\Omega$ is a curve defined as the locus of points of the projective plane that satisfy the equation

$$S(\mathbf{x}) = \sum_{i,j=1}^{3} a_{ij}\,x_i x_j = 0$$

where the scalars $a_{ij}$ satisfy $a_{ij} = a_{ji}$ for all i, j and hence form a $3 \times 3$ symmetric matrix $\mathbf{A}$. We can rewrite this equation in matrix form as

$$S(\mathbf{x}) = \mathbf{x}^T\mathbf{A}\mathbf{x} = 0$$

$\mathbf{A}$ is defined up to a scale factor, and thus the conic depends on five independent parameters. In the following we consider only non-singular conics, for which the matrix $\mathbf{A}$ is invertible. Let y and z be two points of the plane, represented by $\mathbf{y}$ and $\mathbf{z}$ respectively. A variable point on the line $\langle y, z \rangle$ with projective parameter $\theta$ is represented by $\mathbf{y} + \theta\mathbf{z}$, and this point lies on the conic $\Omega$ if and only if

$$S(\mathbf{y} + \theta\mathbf{z}) = 0$$

By expanding this and grouping terms of equal degree in $\theta$, we obtain

$$S(\mathbf{y}) + 2\theta\,S(\mathbf{y}, \mathbf{z}) + \theta^2 S(\mathbf{z}) = 0 \tag{4}$$

where $S(\mathbf{y}, \mathbf{z}) = \mathbf{y}^T\mathbf{A}\mathbf{z} = S(\mathbf{z}, \mathbf{y})$. This means that, in general, there are two points of intersection of the line $\langle y, z \rangle$ with the conic $\Omega$. These intersection points can be real or complex and are obtained by solving the quadratic equation (4). The two points are the same if and only if the following relation holds:

$$S(\mathbf{y}, \mathbf{z})^2 - S(\mathbf{y})\,S(\mathbf{z}) = 0$$
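Equation (4) gives a direct procedure for intersecting a line with a conic. The Python sketch below (names are ours; floating-point arithmetic) returns the real intersection points of the line $\langle y, z \rangle$ with the conic of matrix $\mathbf{A}$, handling separately the case $S(\mathbf{z}) = 0$, where z itself lies on the conic and (4) degenerates. The example conic, the unit circle $x_1^2 + x_2^2 - x_3^2 = 0$, is chosen only for illustration.

```python
import math


def bilinear(a, y, z):
    """S(y, z) = y^T A z for a symmetric 3x3 matrix A."""
    return sum(y[i] * a[i][j] * z[j] for i in range(3) for j in range(3))


def intersect_line_conic(a, y, z):
    """Real intersections of the line <y, z> (points y + theta z) with x^T A x = 0,
    obtained from S(y) + 2 theta S(y, z) + theta^2 S(z) = 0 (equation (4))."""
    syy, syz, szz = bilinear(a, y, y), bilinear(a, y, z), bilinear(a, z, z)
    if szz == 0:
        # z lies on the conic (the root theta = infinity); the other root is linear.
        points = [list(z)]
        if syz != 0:
            theta = -syy / (2 * syz)
            points.append([yi + theta * zi for yi, zi in zip(y, z)])
        return points
    disc = syz * syz - syy * szz  # zero iff the line is tangent to the conic
    if disc < 0:
        return []                 # two complex conjugate intersection points
    root = math.sqrt(disc)
    thetas = sorted({(-syz + root) / szz, (-syz - root) / szz})
    return [[yi + t * zi for yi, zi in zip(y, z)] for t in thetas]


circle = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
print(intersect_line_conic(circle, [0, 0, 1], [1, 0, 0]))  # [[-1.0, 0.0, 1.0], [1.0, 0.0, 1.0]]
print(intersect_line_conic(circle, [0, 1, 1], [1, 0, 0]))  # [[0.0, 1.0, 1.0]] (tangent)
print(intersect_line_conic(circle, [0, 2, 1], [1, 0, 0]))  # []
```

The discriminant computed here is exactly the tangency expression $S(\mathbf{y}, \mathbf{z})^2 - S(\mathbf{y})S(\mathbf{z})$: positive for two real points, zero for a tangent, negative for two complex points.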

If we consider the point y as fixed, the tangency condition $S(\mathbf{y}, \mathbf{z})^2 - S(\mathbf{y})S(\mathbf{z}) = 0$ is quadratic in the coordinates of z: it is the equation of the two tangents from y to $\Omega$. Specializing further, if y belongs to $\Omega$, then $S(\mathbf{y}) = 0$ and the equation of the tangents reduces to $S(\mathbf{y}, \mathbf{z}) = 0$, which is linear in the coordinates of z: there is only one tangent to the conic at a point of the conic. Note that this tangent l is represented by the vector $\mathbf{l} = \mathbf{A}\mathbf{y}$. When y varies along the conic, it satisfies the equation $\mathbf{y}^T\mathbf{A}\mathbf{y} = 0$; substituting $\mathbf{y} = \mathbf{A}^{-1}\mathbf{l}$, the tangent l satisfies $\mathbf{l}^T\mathbf{A}^{-T}\mathbf{l} = 0$. This shows that the tangents to a conic $\Omega$ defined by the matrix $\mathbf{A}$ (which we assume to be of rank 3) can be thought of as belonging to a conic $\Omega^*$ of the dual plane, defined by a matrix proportional to $\mathbf{A}^{-T}$. This conic is called the dual conic of the conic $\Omega$. Let $\mathbf{B}$ be the matrix of cofactors of the matrix $\mathbf{A}$. Since $\mathbf{A}^{-1} = \frac{1}{\det(\mathbf{A})}\mathbf{B}^T$, we conclude that we can use $\mathbf{B}$ instead of $\mathbf{A}^{-T}$ to represent $\Omega^*$.

Related to these ideas are those of poles and polars, which we will use in section 4.4.2. Given a point x represented by the vector $\mathbf{x}$, the polar of x with respect to the conic $\Omega$ defined by the matrix $\mathbf{A}$ is the line represented by the vector $\mathbf{A}\mathbf{x}$. Therefore, the relation $S(\mathbf{x}, \mathbf{y}) = 0$ is equivalent to saying that the point y is on the polar of the point x, and vice versa. Given a line l represented by $\mathbf{l}$, the pole of l with respect to the conic $\Omega$ is the point x whose polar is l. Assuming that the matrix $\mathbf{A}$ is of rank 3, this point is therefore represented by the vector $\mathbf{A}^{-1}\mathbf{l}$.
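The following Python sketch (names are ours; integer arithmetic) illustrates these relations on the unit circle $\mathbf{A} = \mathrm{diag}(1, 1, -1)$, chosen only as an example. It computes the tangent $\mathbf{l} = \mathbf{A}\mathbf{y}$ at a point of the conic, checks that this tangent lies on the dual conic defined by the cofactor matrix $\mathbf{B}$, and recovers the point of contact as the pole of the tangent, computed as $\mathbf{B}^T\mathbf{l}$, which is proportional to $\mathbf{A}^{-1}\mathbf{l}$ and avoids any division.

```python
def mat_vec(m, x):
    return [sum(m[i][j] * x[j] for j in range(3)) for i in range(3)]


def quad(m, x):
    """x^T M x."""
    return sum(x[i] * m[i][j] * x[j] for i in range(3) for j in range(3))


def cofactor_matrix(a):
    """B with B[i][j] = (-1)^(i+j) * minor(i, j), so that A^{-1} = B^T / det(A)."""
    def minor(i, j):
        m = [[v for c, v in enumerate(row) if c != j] for r, row in enumerate(a) if r != i]
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[(-1) ** (i + j) * minor(i, j) for j in range(3)] for i in range(3)]


def polar(a, x):
    """Polar line of the point x with respect to the conic A (the tangent if x is on it)."""
    return mat_vec(a, x)


def pole(a, l):
    """Pole of the line l: proportional to A^{-1} l, computed as B^T l."""
    b = cofactor_matrix(a)
    bt = [[b[j][i] for j in range(3)] for i in range(3)]
    return mat_vec(bt, l)


A = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]   # unit circle x1^2 + x2^2 - x3^2 = 0
y = [0, 1, 1]                            # a point of the circle
l = polar(A, y)                          # tangent at y: the line x2 - x3 = 0
print(quad(A, y), l)                     # 0 [0, 1, -1]
print(quad(cofactor_matrix(A), l))       # 0: l lies on the dual conic
print(pole(A, l))                        # [0, -1, -1], proportional to y
```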

The projective space. The space $\mathbb{P}^3$ is known as the projective space. A point x of $\mathbb{P}^3$ is defined by four numbers $(x_1, x_2, x_3, x_4)$, not all zero. They form a coordinate vector $\mathbf{x}$ defined up to a scale factor. In $\mathbb{P}^3$ there are objects other than just points and lines, such as planes. A plane is also defined by a four-tuple of numbers $(u_1, u_2, u_3, u_4)$, not all zero, which form a coordinate vector $\mathbf{u}$ defined up to a scale factor. The equation of this plane is then

$$\sum_{i=1}^{4} u_i x_i = 0 \tag{5}$$

This shows that the same principle of duality that exists in $\mathbb{P}^2$ between points and lines exists in $\mathbb{P}^3$ between points and planes. A point represented by $\mathbf{x}$ can be thought of as the set of planes through it. These planes are represented by the vectors $\mathbf{u}$ satisfying $\mathbf{u}^T\mathbf{x} = 0$, which is called the plane equation of the point. Conversely, a plane represented by $\mathbf{u}$ can be thought of as the set of points represented by $\mathbf{x}$ satisfying the same equation, called the point equation of the plane.

Let us generalize the notion of cross-ratio, introduced for four points of $\mathbb{P}^1$ and for four lines of $\mathbb{P}^2$ intersecting at a point, to four planes of $\mathbb{P}^3$ intersecting at a line. Given four planes $\Pi_1, \Pi_2, \Pi_3, \Pi_4$ of $\mathbb{P}^3$ that intersect at a line l, their cross-ratio $\{\Pi_1, \Pi_2, \Pi_3, \Pi_4\}$ is defined as the cross-ratio $\{l_1, l_2, l_3, l_4\}$ of their four lines of intersection with any plane $\Pi$ not going through l. This is of course independent of the choice of $\Pi$. The cross-ratio can also be defined as the cross-ratio of the four points in which the four planes meet any line that does not meet l (such a line lies in none of the four planes). This is also independent of the choice of the line.

The structure that is analogous to the pencil of lines of $\mathbb{P}^2$ is the pencil of planes, the set of all the planes that intersect at a given line. This structure is also a projective space of dimension one, analogous to $\mathbb{P}^1$ since, by the principle of duality, a pencil of planes is projectively equivalent to a set of points on a line (this line is the dual of the line of intersection of the planes).

Let us use this concept to show that the ratios of the projective coordinates of a point M in a given projective basis can be interpreted as cross-ratios. To do this, we assume without loss of generality (thanks to proposition 1) that the projective basis is the standard projective basis of the projective space. Let M be a point of projective coordinates (p, q, r, s), and consider the four planes $\Pi_1 = (e_1, e_2, e_3)$, $\Pi_2 = (e_1, e_2, e_4)$, $\Pi_3 = (e_1, e_2, e_5)$ and $\Pi_4 = (e_1, e_2, M)$, which all go through the line $\langle e_1, e_2 \rangle$. Writing a plane through this line as $a x_3 + b x_4 = 0$ and requiring it to contain $e_3$, $e_4$, $e_5 = (1, 1, 1, 1)$ and M respectively, the equations of these four planes are readily shown to be

$$\Pi_1 : x_4 = 0 \qquad \Pi_2 : x_3 = 0 \qquad \Pi_3 : x_4 - x_3 = 0 \qquad \Pi_4 : r x_4 - s x_3 = 0$$

We can use the two planes $\Pi_1$ and $\Pi_2$ and the plane of equation $x_3 + x_4 = 0$ as the projective basis of the pencil of planes of axis $\langle e_1, e_2 \rangle$. Looking at the previous equations, we see that

$$\Pi_3 = \Pi_1 - \Pi_2, \qquad \Pi_4 = r\,\Pi_1 - s\,\Pi_2$$

In this basis the plane $\lambda\Pi_1 + \mu\Pi_2$ has projective parameter $\theta = \lambda / \mu$, so $\Pi_1$, $\Pi_2$, $\Pi_3$, $\Pi_4$ have parameters $\infty$, $0$, $-1$ and $-r/s$. Since $\theta_1 = \infty$, the first ratio in (2) equals 1, and therefore

$$\{\Pi_1, \Pi_2, \Pi_3, \Pi_4\} = 1 : \frac{0 + 1}{0 + \frac{r}{s}} = \frac{r}{s}$$

i.e., the cross-ratio is equal to the ratio of the third to the fourth projective coordinates of M. This is shown in figure 1. We will use this remarkable relation in sections 5.2 and 6.2.

Figure 1 approximately here.

Collineations of $\mathbb{P}^3$ are defined by $4 \times 4$ invertible matrices defined up to a scale factor. According to proposition 2, such a collineation is defined by 5 pairs of corresponding points. Collineations transform points, lines, planes, and pencils of planes into points, lines, planes, and pencils of planes, preserving cross-ratios.
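The relation between the cross-ratio of the planes $\Pi_1, \ldots, \Pi_4$ and the ratio r/s of the coordinates of M can be checked numerically by intersecting the four planes with a line that does not meet the axis $\langle e_1, e_2 \rangle$ and computing the cross-ratio of the four intersection points. In the Python sketch below (names, sample point and sample lines are ours), a point $\lambda\mathbf{q} + \mu\mathbf{p}$ of the line is described by its coordinates $(\lambda, \mu)$ on $\mathbb{P}^1$, and the cross-ratio is written with $2 \times 2$ determinants, which agrees with equation (2) for finite parameters and also handles a point with infinite parameter.

```python
from fractions import Fraction


def dot(u, x):
    return sum(ui * xi for ui, xi in zip(u, x))


def pencil_planes(m):
    """Coordinate vectors of Pi_1..Pi_4 through <e1, e2> for the point M = (p, q, r, s)."""
    _, _, r, s = m
    return [(0, 0, 0, 1),    # Pi_1 = (e1, e2, e3): x4 = 0
            (0, 0, 1, 0),    # Pi_2 = (e1, e2, e4): x3 = 0
            (0, 0, -1, 1),   # Pi_3 = (e1, e2, e5): x4 - x3 = 0
            (0, 0, -s, r)]   # Pi_4 = (e1, e2, M):  r x4 - s x3 = 0


def meet(u, p, q):
    """Intersection of the plane u with the line {lam q + mu p}, as (lam, mu)."""
    lam, mu = dot(u, p), -dot(u, q)
    if lam == 0 and mu == 0:
        raise ValueError("the line lies in the plane")
    return lam, mu


def cross_ratio_p1(a, b, c, d):
    """Cross-ratio of four points of P^1 given by coordinate pairs (equation (2)
    when each pair is (theta, 1))."""
    def br(x, y):
        return x[0] * y[1] - x[1] * y[0]
    return Fraction(br(a, c), br(a, d)) / Fraction(br(b, c), br(b, d))


M = (2, 5, 3, 7)
for p, q in [((1, 0, 0, 1), (0, 1, 1, 0)), ((1, 2, 3, 4), (0, 1, 5, -2))]:
    points = [meet(u, p, q) for u in pencil_planes(M)]
    print(cross_ratio_p1(*points))   # 3/7 for both lines, i.e. r / s
```

Both sample lines give the same value, as the independence of the cross-ratio from the choice of the line requires.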

3.3 3-D space as an affine space

We now describe the second stratum that we will consider. The idea is to think of the world (and, for that matter, of the retina) as an affine space embedded in the corresponding projective space, i.e., $\mathbb{P}^3$ and $\mathbb{P}^2$, respectively. We consider first the case of the retina, i.e., of the projective plane, and then the case of the world, i.e., of the projective space. But to make things clearer, we start with the projective line $\mathbb{P}^1$ and show how we can associate an affine line with it.

3.3.1 Projective and affine lines

The point represented by $\mathbf{e}_1$ is called the point at infinity of the line $\mathbb{P}^1$. It is defined by the linear equation $x_2 = 0$. The reason for this terminology is that if we think of the projective line as containing the usual affine line under the correspondence $\theta \mapsto \theta\mathbf{e}_1 + \mathbf{e}_2$, then the projective parameter $\theta$ of the point gives us a one-to-one correspondence between the projective and affine lines for all values of $\theta$ different from $\infty$ (the affine line is simply the set of real numbers). The values $\theta = \pm\infty$ correspond to the point $\mathbf{e}_1$, which is outside the affine line but is the limit of points of the affine line with large values of $|\theta|$. This turns out to be an extremely useful interpretation of the relationship between the affine and projective lines and, as we show later, it can be generalized to higher dimensions. Note that the choice of $\mathbf{e}_1$ as the point at infinity is arbitrary, and any other point would do equally well.


3.3.2 Projective and affine planes

The line at infinity. Suppose we choose a line in the projective plane. Without loss of generality, we can assume its equation to be $x_3 = 0$. We call this line the line at infinity of $\mathbb{P}^2$, denoted $l_\infty$. Just as in the previous case of the projective line, the choice of $l_\infty$ as the line at infinity is arbitrary, and any other line would do equally well. But it is worth noting that points and lines at infinity can be chosen consistently: if $l_\infty$ is the line at infinity of $\mathbb{P}^2$ and l is a line of $\mathbb{P}^2$ different from $l_\infty$, then $l \cap l_\infty$ is a suitable choice for the point at infinity on l (see next paragraph). The reason for this terminology is that we can think of the projective plane as containing the usual affine plane under the correspondence $\mathbf{X} = [X_1, X_2]^T \mapsto [X_1, X_2, 1]^T$, or $X_1\mathbf{e}_1 + X_2\mathbf{e}_2 + \mathbf{e}_3$. This is a one-to-one correspondence between the affine plane and the projective plane minus the line of equation $x_3 = 0$. For each projective point of coordinates $(x_1, x_2, x_3)$ that is not on that line, we have

$$X_1 = \frac{x_1}{x_3}, \qquad X_2 = \frac{x_2}{x_3} \tag{6}$$

If $X_1 \to \infty$ while $X_2$ does not, we obtain $\mathbf{e}_1$, which is on $l_\infty$. Similarly, when $X_2 \to \infty$ while $X_1$ does not, we obtain $\mathbf{e}_2$. Each line of the projective plane with an equation of the form (3) intersects $l_\infty$ at the point $(-u_2, u_1, 0)$, which is that line's point at infinity. Note that the vector $[-u_2, u_1]^T$ gives the direction of the affine line of equation $u_1X_1 + u_2X_2 + u_3 = 0$. This gives us a neat interpretation of the line at infinity: each point on that line, with coordinates $(x_1, x_2, 0)$, can be thought of as a direction in the underlying affine plane, the direction parallel to the vector $[x_1, x_2]^T$. Indeed, it does not matter that $x_1$ and $x_2$ are defined only up to a scale factor, since the direction does not change. We will use this observation later. As a first, and very useful, application of the idea of thinking about the affine plane as embedded in a projective plane, let us consider the case of two parallel (but not identical) lines. Since by definition these two lines have the same direction, parallel to the vector $[-u_2, u_1]^T$, if we consider them as projective lines of the projective plane, they intersect at the point of $l_\infty$ represented by $[-u_2, u_1, 0]^T$. Therefore, two distinct parallel lines intersect at a point of $l_\infty$: thinking of the affine plane as embedded in the projective plane allows us to avoid considering special cases.
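In homogeneous coordinates the intersection of two lines is the cross product of their coordinate vectors (the $\wedge$ operation used for pencils of lines), so parallel lines need no special treatment. The Python sketch below (names and sample lines are ours) intersects two parallel lines and two non-parallel lines and converts the results with equations (6), returning `None` for a point of $l_\infty$.

```python
from fractions import Fraction


def cross(a, b):
    """Cross product: the point common to the lines a and b of P^2."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def to_affine(x):
    """Equations (6): (X1, X2) = (x1 / x3, x2 / x3); None if x lies on l_inf."""
    if x[2] == 0:
        return None
    return (Fraction(x[0], x[2]), Fraction(x[1], x[2]))


def point_at_infinity(u):
    """The point (-u2, u1, 0) where the line u meets l_inf = (0, 0, 1)."""
    return (-u[1], u[0], 0)


# Parallel lines X1 - X2 = 0 and 2 X1 - 2 X2 + 3 = 0 meet on l_inf,
# at a point proportional to (1, 1, 0), their common direction.
p = cross((1, -1, 0), (2, -2, 3))
print(p, to_affine(p), point_at_infinity((1, -1, 0)))   # (-3, -3, 0) None (1, 1, 0)

# Non-parallel lines X1 = 1 and X2 = 2 meet at the affine point (1, 2).
print(to_affine(cross((1, 0, -1), (0, 1, -2))))   # (Fraction(1, 1), Fraction(2, 1))
```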

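Affine transformations of the plane. We have seen that there is a one-to-one correspondence between the usual affine plane and the projective plane minus the line at infinity. In the affine plane, we know that an affine transformation defines a correspondence $\mathbf{X} \mapsto \mathbf{X}'$, which can be expressed in matrix form as

$$\mathbf{X}' = \mathbf{B}\mathbf{X} + \mathbf{b} \tag{7}$$

where $\mathbf{B}$ is a $2 \times 2$ matrix of rank 2 and $\mathbf{b}$ is a $2 \times 1$ vector. From this equation it is clear that these transformations form a group, called the affine group, which is a subgroup of the projective group. This subgroup has the interesting property that it preserves the line at infinity. Indeed, let $\mathbf{A}$ be the matrix of a collineation of $\mathbb{P}^2$ that leaves $l_\infty$ invariant. Since $\mathbf{A}$ must map every point $(x_1, x_2, 0)$ of $l_\infty$ to a point of $l_\infty$, the first two entries of its last row vanish, and $\mathbf{A}$ can be written as

$$\mathbf{A} = \begin{bmatrix} \mathbf{C} & \mathbf{c} \\ \mathbf{0}^T & a_{33} \end{bmatrix}$$

where $\mathbf{C}$ is a $2 \times 2$ matrix and $\mathbf{c}$ is a $2 \times 1$ vector. The condition that the rank of $\mathbf{A}$ is 3 implies that $a_{33} \neq 0$ and that the rank of $\mathbf{C}$ is equal to 2. Using equations (6), we can write the transformation in the form of equation (7), with $\mathbf{B} = \frac{1}{a_{33}}\mathbf{C}$ and $\mathbf{b} = \frac{1}{a_{33}}\mathbf{c}$.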
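The correspondence between this block form of $\mathbf{A}$ and equation (7) can be checked directly. The Python sketch below (names and the sample matrix are ours) tests whether a $3 \times 3$ matrix has the block form above, extracts $\mathbf{B} = \mathbf{C}/a_{33}$ and $\mathbf{b} = \mathbf{c}/a_{33}$, and compares the projective action followed by equations (6) with the affine formula.

```python
from fractions import Fraction


def preserves_line_at_infinity(a):
    """True if the collineation a has last row (0, 0, a33), a33 != 0, i.e. maps
    every point (x1, x2, 0) of l_inf to a point of l_inf."""
    return a[2][0] == 0 and a[2][1] == 0 and a[2][2] != 0


def affine_part(a):
    """B = C / a33 and b = c / a33 for A = [[C, c], [0^T, a33]]."""
    a33 = Fraction(a[2][2])
    if a[0][0] * a[1][1] - a[0][1] * a[1][0] == 0:
        raise ValueError("C must have rank 2")
    b_mat = [[a[i][j] / a33 for j in range(2)] for i in range(2)]
    b_vec = [a[i][2] / a33 for i in range(2)]
    return b_mat, b_vec


def apply_projective(a, X):
    """Map the affine point X through A via [X1, X2, 1] and equations (6)."""
    x = (X[0], X[1], 1)
    y = [sum(a[i][j] * x[j] for j in range(3)) for i in range(3)]
    return (Fraction(y[0]) / y[2], Fraction(y[1]) / y[2])


def apply_affine(b_mat, b_vec, X):
    """Equation (7): X' = B X + b."""
    return tuple(b_mat[i][0] * X[0] + b_mat[i][1] * X[1] + b_vec[i] for i in range(2))


A = [[2, 1, 4], [0, 3, -2], [0, 0, 2]]
B, b = affine_part(A)
print(preserves_line_at_infinity(A))                              # True
print(apply_projective(A, (1, 1)) == apply_affine(B, b, (1, 1)))  # True
```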

3.3.3 Projective and affine spaces

The plane at infinity Similarly to the previous case, let us assume that we choose a plane in the projective space $\mathbb{P}^3$. Without loss of generality, we can assume its equation to be $x_4 = 0$. We call this plane the plane at infinity $\Pi_\infty$ of $\mathbb{P}^3$. The reason for this terminology, just as in the case of $\mathbb{P}^2$, is that it is possible to think of the projective space as containing the usual affine space under the correspondence $X = [X_1, X_2, X_3]^T \to [X_1, X_2, X_3, 1]^T$, or $X_1\mathbf{e}_1 + X_2\mathbf{e}_2 + X_3\mathbf{e}_3 + \mathbf{e}_4$. This is a one-to-one correspondence between the affine space and the projective space minus the plane at infinity of equation $x_4 = 0$. For each projective point of coordinates $(x_1, x_2, x_3, x_4)$ not in that plane, we have

$$X_1 = \frac{x_1}{x_4}, \quad X_2 = \frac{x_2}{x_4}, \quad X_3 = \frac{x_3}{x_4}$$

Similarly to the case of $\mathbb{P}^2$, points, lines, and planes at infinity can be chosen consistently in $\mathbb{P}^3$. Let $\Pi_\infty$ be the plane at infinity of $\mathbb{P}^3$ and $\Pi$ (resp. $l$) a plane (resp. a line) of $\mathbb{P}^3$ not equal to (resp. not included in) $\Pi_\infty$. Then $\Pi \cap \Pi_\infty$ (resp. $l \cap \Pi_\infty$) is a suitable choice for the line at infinity (resp. the point at infinity) of $\Pi$ (resp. of $l$). Hence each plane of equation (5) intersects the plane at infinity along a line that is its line at infinity. As in the case of the projective plane, it is often useful to think of the points in the plane at infinity as the set of directions of the underlying affine space. For example, the point of projective coordinates $[x_1, x_2, x_3, 0]^T$ represents the direction parallel to the vector $[x_1, x_2, x_3]^T$. Indeed, it does not matter that $x_1, x_2, x_3$ are defined only up to a scale factor, since the direction does not change. An analysis similar to the one done in the two-dimensional case shows that two distinct parallel affine planes can be considered as two projective planes intersecting along a line in the plane at infinity $\Pi_\infty$.
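The three-dimensional statement can be checked in the same way, as a minimal Python/NumPy sketch with illustrative plane coefficients. The projective intersection of two planes is the null space of the $2 \times 4$ matrix stacking their homogeneous coordinates. For two distinct parallel planes, every point of that line has $x_4 = 0$.

```python
import numpy as np

# Two distinct parallel planes u1 X1 + u2 X2 + u3 X3 + u4 = 0 (same normal, different u4),
# as homogeneous plane vectors of P^3 (hypothetical values).
p1 = np.array([1.0, 2.0, -1.0, 3.0])
p2 = np.array([1.0, 2.0, -1.0, -4.0])

# Their projective intersection is the null space of the 2 x 4 matrix [p1; p2]:
# a 2-dimensional subspace, i.e. a projective line. The SVD gives a basis of it.
_, _, Vt = np.linalg.svd(np.vstack([p1, p2]))
line_basis = Vt[2:]                       # two 4-vectors spanning the intersection line

# Every point of that line has x4 = 0 (it lies in the plane at infinity), and its first
# three coordinates are directions orthogonal to the common normal (1, 2, -1).
print(np.abs(line_basis[:, 3]).max())             # ~0
print(np.abs(line_basis[:, :3] @ p1[:3]).max())   # ~0
```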

Affine transformations of the space In a similar fashion to the case of the projective plane, we can consider the subset of the projective group that preserves the plane at infinity. This set is a subgroup of the projective group called the affine group, and the transformations can be written in the same way as in equation (7):

$$X' = BX + b \qquad (8)$$

where the matrix $B$ is $3 \times 3$ and has rank 3, and $b$ is a $3 \times 1$ vector.

3.4 3-D space as a euclidean space

As a final stratum, and to complete our trilogy, we want to think of the world (and for that matter of the retina) as a euclidean space embedded in the previous affine space. We consider first the case of the retina, i.e. of the affine and projective plane, and then the case of the world, i.e. of the affine and projective space.

3.4.1 Euclidean transformations of the plane: the absolute points

We can further specialize the set of affine transformations of the plane and require that they preserve two special points on the line at infinity, in addition to the line itself. These are the absolute points $I$ and $J$, with coordinates $(1, \pm i, 0)$, where $i = \sqrt{-1}$. This imposes constraints on the matrix $B$ in equation (7). Since we insist that $I$ and $J$ remain invariant, we have, for some nonzero scalars $\rho$ and $\rho'$,

$$\rho \begin{bmatrix} 1 \\ i \\ 0 \end{bmatrix} = \begin{bmatrix} b_{11} + b_{12}\, i \\ b_{21} + b_{22}\, i \\ 0 \end{bmatrix}, \qquad \rho' \begin{bmatrix} 1 \\ -i \\ 0 \end{bmatrix} = \begin{bmatrix} b_{11} - b_{12}\, i \\ b_{21} - b_{22}\, i \\ 0 \end{bmatrix}$$

which yields

$$(b_{11} - b_{22})\, i - (b_{12} + b_{21}) = 0 \quad \text{and} \quad -(b_{11} - b_{22})\, i - (b_{12} + b_{21}) = 0$$

Therefore $b_{11} - b_{22} = b_{12} + b_{21} = 0$, and we can write

$$X' = c \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} X + b \qquad (9)$$

with $c > 0$ and $0 \le \theta < 2\pi$. This class of transformations is sometimes called the class of similitudes. It forms a subgroup of the affine group and therefore of the projective group. This group is called the similitude group or the euclidean transformations group. The affine point represented by $X$ is first rotated by $\theta$ around the origin, then scaled by $c$, and translated by $b$. If we specialize the class of transformations further by assuming that $c = 1$, we obtain another subgroup, called the group of (proper) rigid displacements. As an application of the absolute points, we show how they can be used to define the angle between two lines. The angle $\alpha$ between two lines $l_1$ and $l_2$ can be defined by considering their point of intersection $m$ and the two lines $i_m$ and $j_m$ joining $m$ to the absolute points $I$ and $J$ (see figure 2). The angle is given by Laguerre's formula:

$$\alpha = \frac{1}{2i} \log(\{l_1, l_2; i_m, j_m\}) \qquad (10)$$

The cross-ratio $\{l_1, l_2; i_m, j_m\}$ is also equal to the cross-ratio of the four points $m_1, m_2, I, J$ in which the four lines intersect the line at infinity $l_\infty$. Because $e^{i\pi} = \cos\pi + i\sin\pi = -1$, we see that if the cross-ratio $\{l_1, l_2; i_m, j_m\}$ is equal to $-1$, the two lines $l_1$ and $l_2$ are perpendicular. Figure 2 approximately here.
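A minimal Python/NumPy sketch of this subsection. It checks that a similitude of equation (9), written as a $3 \times 3$ collineation, maps $I$ and $J$ to multiples of themselves, while a non-similitude affine map does not. It then evaluates equation (10) on the line at infinity and verifies that the angle is unchanged by the similitude. The cross-ratio parametrization $t = x_2/x_1$ and all function names are illustrative assumptions. The sign of the result depends on the order of $l_1$ and $l_2$, consistent with angles between lines being defined modulo $\pi$.

```python
import numpy as np

I = np.array([1.0, 1j, 0.0])              # absolute point I = (1, i, 0)
J = np.array([1.0, -1j, 0.0])             # absolute point J = (1, -i, 0)

def similitude(c, theta, b):
    """Equation (9) written as a 3 x 3 collineation of P^2."""
    R = np.array([[np.cos(theta), np.sin(theta)], [-np.sin(theta), np.cos(theta)]])
    return np.block([[c * R, np.reshape(b, (2, 1))], [np.zeros((1, 2)), np.ones((1, 1))]])

def proportional(x, y):
    """True if the (possibly complex) homogeneous vectors x and y are equal up to scale."""
    return np.allclose(np.cross(x, y), 0.0)

def cross_ratio_at_infinity(p1, p2, p3, p4):
    """Cross-ratio {p1, p2; p3, p4} of four points (x1, x2, 0), using t = x2 / x1 (x1 != 0)."""
    t1, t2, t3, t4 = (p[1] / p[0] for p in (p1, p2, p3, p4))
    return ((t3 - t1) / (t3 - t2)) / ((t4 - t1) / (t4 - t2))

def laguerre_angle(d1, d2):
    """Equation (10): angle (modulo pi) between two lines with directions d1, d2 on l_inf."""
    return (np.log(cross_ratio_at_infinity(d1, d2, I, J)) / 2j).real

def direction(deg):
    """Point at infinity of the lines making an angle deg (degrees) with the X1 axis."""
    a = np.deg2rad(deg)
    return np.array([np.cos(a), np.sin(a), 0.0])

S = similitude(c=1.5, theta=0.7, b=[2.0, -1.0])
print(proportional(S @ I, I), proportional(S @ J, J))    # True True
print(proportional(np.diag([2.0, 1.0, 1.0]) @ I, I))     # False: not a similitude

alpha = laguerre_angle(direction(10), direction(40))
alpha_moved = laguerre_angle(S @ direction(10), S @ direction(40))
print(np.rad2deg(alpha), np.rad2deg(alpha_moved))        # both approximately -30
```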

3.4.2 Euclidean transformations of the space: the absolute conic

We can also further specialize the affine transformations of the space and require that they leave a special conic invariant. This conic, $\Omega$, is obtained as the intersection of the quadric of equation $\sum_{i=1}^{4} x_i^2 = 0$ with $\Pi_\infty$:

$$\sum_{i=1}^{4} x_i^2 = x_4 = 0$$

The conic $\Omega$ is also called the absolute conic. Note that in $\Pi_\infty$, $\Omega$ can be interpreted as a circle of radius $i = \sqrt{-1}$, an imaginary circle! Therefore, all its points have complex coordinates in the standard projective basis. If $m$ is a point of $\Omega$, then $\bar{m}$, the complex conjugate point, is also on $\Omega$, since the absolute conic is defined by equations with real coefficients. It is not difficult to show that the affine transformations that keep $\Omega$ invariant can be written

$$X' = cCX + b \qquad (11)$$

where $c > 0$ and $C$ is orthogonal, i.e., satisfies $CC^T = I$ (see for example [22]). As in the two-dimensional case, this subset of the affine group is a subgroup called the similitude group. Similarly, the subset of the similitude group where $c = 1$ is also a subgroup, called the group of (proper) rigid displacements.
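A small Python/NumPy check of the invariance, under illustrative values.

- A point of $\Omega$ is built as $[\mathbf{u} + i\mathbf{v}, 0]^T$ from two orthonormal real vectors. Its squared coordinates (plain squares, no complex conjugation) sum to $\mathbf{u}^T\mathbf{u} - \mathbf{v}^T\mathbf{v} + 2i\,\mathbf{u}^T\mathbf{v} = 0$.
- A transformation of the form of equation (11), written as a $4 \times 4$ matrix, keeps it on $\Omega$.
- A non-similitude affine map generally does not keep it on $\Omega$.

```python
import numpy as np

rng = np.random.default_rng(0)

def on_absolute_conic(x, tol=1e-9):
    """x is on Omega iff x4 = 0 and x1^2 + x2^2 + x3^2 + x4^2 = 0 (plain squares, no conjugation)."""
    return abs(x[3]) < tol and abs(np.sum(x * x)) < tol

# A point of Omega built from two orthonormal real vectors u, v: (u + i v)^T (u + i v) = 0.
q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
X = np.concatenate([q[:, 0] + 1j * q[:, 1], [0.0]])

# Equation (11) as a 4 x 4 collineation [[c C, b], [0^T, 1]] with C orthogonal (hypothetical values).
C, _ = np.linalg.qr(rng.standard_normal((3, 3)))
T_sim = np.eye(4)
T_sim[:3, :3] = 2.5 * C
T_sim[:3, 3] = [1.0, -2.0, 0.5]

# An affine map that is not a similitude, for comparison.
T_aff = np.diag([2.0, 1.0, 1.0, 1.0])

print(on_absolute_conic(X), on_absolute_conic(T_sim @ X), on_absolute_conic(T_aff @ X))  # True True False
```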

3.5 Conclusion

We have shown how the world (resp. the retina) can be considered as a succession of strata. Each stratum corresponds to a specific geometric structure that we impose on the world (resp. on the retina). These geometric structures can be ordered in a hierarchy, from general (i.e. the projective structure) to more specialized (i.e. the euclidean structure). To each stratum corresponds a group of transformations. These three groups are included in each other in a group-theoretical sense: the group of similitudes is a subgroup of the affine group, which is itself a subgroup of the projective group. Each group leaves some geometric quantities invariant:

- for the projective group, the most notable invariant is the cross-ratio;
- for the affine group, it is the ratio of the lengths of two parallel vectors;
- for the group of similitudes, the most notable invariants are angles and ratios of lengths.

We will see more of these invariants in the next sections.

4 Camera geometry

Let us now turn to the sensor that we use to measure the world: the camera. We classically model a camera as a pinhole. This has proven to be an excellent approximation for most practical purposes. It is important to keep in mind that the pinhole model is only an approximation of a real physical camera, albeit usually a very good one. Even so, we hope to convince the reader in what follows that it is useful to set the actual physical device aside for some time and to think of the camera as a projective geometric engine. We develop this line of thought in the following sections and relate the projective modelling of the camera to the three strata which were presented in the previous section.

4.1 The perspective projection model

This projective engine maps the three-dimensional projective space $\mathbb{P}^3$ onto the two-dimensional projective plane $\mathbb{P}^2$ by perspective projection from a center of projection $C$ (the optical center of the camera) onto a plane $\mathcal{R}$ not containing $C$ (the retinal plane of the camera). This projection operation is projective linear in the following sense. If we choose a projective basis of $\mathbb{P}^3$ and a projective basis of $\mathbb{P}^2$, the correspondence between a point $M$ of $\mathbb{P}^3$ represented by $\mathbf{M}$ and its image $m$ of $\mathbb{P}^2$ represented by $\mathbf{m}$ can be written in vector form

$$\mathbf{m} = P\mathbf{M} \qquad (12)$$

where $P$ is a $3 \times 4$ matrix of rank 3, defined up to multiplication by a nonzero scalar. This matrix therefore depends upon 11 parameters and is called the perspective projection matrix of the camera. Suppose we change the projective basis in the world by $\mathbf{M}' = K\mathbf{M}$ ($K$ a $4 \times 4$ matrix of rank 4) and in the retinal plane by $\mathbf{m}' = H\mathbf{m}$ ($H$ a $3 \times 3$ matrix of rank 3). The perspective projection matrix then becomes $P' = HPK^{-1}$. Given the perspective projection matrix $P$, and without any further assumption about the world, we can recover the coordinates of the optical center in the projective basis of the world. Indeed, the optical center is the only point for which the perspective projection is not defined: it has no image and must therefore satisfy

$$P\mathbf{C} = \mathbf{0}$$

which shows that $C$ is represented by any nonzero vector of the nullspace of the matrix $P$. This nullspace is of dimension 1 since $\mathrm{rank}(P) = 3$.
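Both facts are easy to check numerically. This Python/NumPy sketch uses a random $3 \times 4$ matrix as a hypothetical stand-in for $P$, and random (almost surely invertible) change-of-basis matrices $K$ and $H$. The optical center is taken as the last right singular vector of $P$, which spans its null space. The sketch then confirms two things: $P' = HPK^{-1}$ maps $K\mathbf{M}$ to $H\mathbf{m}$, and $K\mathbf{C}$ is its optical center.

```python
import numpy as np

rng = np.random.default_rng(1)

def optical_center(P):
    """A nonzero vector spanning the 1-dimensional null space of P, so that P C = 0."""
    _, _, Vt = np.linalg.svd(P)
    return Vt[-1]

# Hypothetical 3 x 4 perspective projection matrix (random, hence almost surely of rank 3).
P = rng.standard_normal((3, 4))
C = optical_center(P)
print(np.abs(P @ C).max())                # ~0: C has no image

# Image of a world point M (defined up to scale): equation (12).
M = np.array([0.2, -1.0, 3.0, 1.0])
m = P @ M

# Changes of projective basis M' = K M and m' = H m (random, almost surely invertible).
K = rng.standard_normal((4, 4))
H = rng.standard_normal((3, 3))
P_new = H @ P @ np.linalg.inv(K)

print(np.allclose(P_new @ (K @ M), H @ m))   # True: P' maps K M to H m
print(np.allclose(P_new @ (K @ C), 0.0))     # True: the optical center is now K C
```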

4.2 Two cameras and the fundamental matrix: the projective stratum

If we now consider a binocular stereo rig, we can bring in more geometric information, which has profound implications for computer vision problems. Let us call $C'$ the optical center of the second camera and $\mathcal{R}'$ its retinal plane. The line $\langle C, C'\rangle$ intersects $\mathcal{R}$ (resp. $\mathcal{R}'$) in a point that we denote by $e$ (resp. $e'$). These two points are called the epipoles of the stereo rig. A plane containing the line $\langle C, C'\rangle$ is called an epipolar plane. By construction, any epipolar plane intersects $\mathcal{R}$ (resp. $\mathcal{R}'$) along a line going through the epipole $e$ (resp. $e'$); see figure 3. Two such lines are called corresponding epipolar lines and are of immense importance for stereo algorithms. We can rephrase the situation in projective terms by saying that the pencil of epipolar planes induces in each retinal plane a pencil of epipolar lines. According to section 3, these three pencils are projective lines, i.e. one-dimensional projective spaces $\mathbb{P}^1$. Figure 3 approximately here.

The fundamental property of this geometric construction is that the "natural" correspondence between the two pencils of epipolar lines is projective linear: it is a homography between the two pencils considered as projective lines. The "natural" correspondence associates with each epipolar line of the first pencil the corresponding epipolar line of the second. That line is the intersection with the second retinal plane of the epipolar plane defined by the first line and the two optical centers. This correspondence is a homography because it is one-to-one and preserves cross-ratios. The cross-ratio of four lines of the first pencil is equal to the cross-ratio of the four corresponding epipolar planes, which in turn equals the cross-ratio of the four corresponding epipolar lines in the second retinal plane. This homography is at the heart of many of the ideas which will be presented in the next sections (see figure 4). We call it the epipolar homography. Figure 4 approximately here. Having presented the geometric viewpoint, let us now present its algebraic face. In order to do this, we will adopt a slightly different view: we will characterize the relationship between a point $m$ in the first retinal plane and its epipolar line $l'_m$ in the second. This correspondence is also clearly projective linear. It is a projective linear mapping between the first retinal plane, considered as a $\mathbb{P}^2$, and the dual of the second retinal plane, also considered as a $\mathbb{P}^2$. Therefore there exists a $3 \times 3$ matrix $F$, defined up to a scale factor, such that

$$\mathbf{l}'_m = F\mathbf{m}$$

The matrix $F$ is not of rank 3: if $m$ coincides with the epipole $e$, its epipolar line is undefined, and therefore

$$F\mathbf{e} = \mathbf{0}$$

Let us now further consider a point $m'$ on the epipolar line $l'_m$ of $m$. This point satisfies the relation

$$\mathbf{m}'^T F\mathbf{m} = 0 \qquad (13)$$

which shows that the epipolar line $l_{m'}$ of $m'$ in the first retinal plane is represented by $F^T\mathbf{m}'$:

$$\mathbf{l}_{m'} = F^T\mathbf{m}'$$

In particular, we have

$$F^T\mathbf{e}' = \mathbf{0}$$

The matrix $F$ expresses algebraically the epipolar correspondence between the two retinal planes. It is called the fundamental matrix [2, 1, 4]. Its rank is in general equal to 2, and it therefore depends upon seven free parameters. One set of such parameters has a neat geometric interpretation: the four ratios of projective coordinates of the two epipoles, and the three ratios of the coefficients of the homography between the two pencils of epipolar lines. In the one-camera case we related the optical center to the perspective projection matrix $P$. Likewise, in the two-camera case we can relate the fundamental matrix $F$ to the two perspective projection matrices $P$ and $P'$. The interested reader is referred to, for example, [22].
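For readers who want to experiment, here is a minimal Python/NumPy sketch of one standard way to relate $F$ to $P$ and $P'$: $F = [\mathbf{e}']_\times P' P^{+}$. Here $P^{+}$ is the Moore-Penrose pseudo-inverse of $P$ and $[\mathbf{e}']_\times$ is the cross-product matrix of $\mathbf{e}' = P'\mathbf{C}$. This formula comes from the standard multiple-view geometry literature (e.g. Hartley and Zisserman), not from this paper. The two cameras, named `P1` and `P2` for $P$ and $P'$, are random hypothetical matrices. The printed values should be zero up to rounding, illustrating equation (13), $F\mathbf{e} = \mathbf{0}$, $F^T\mathbf{e}' = \mathbf{0}$ and the rank-2 property.

```python
import numpy as np

rng = np.random.default_rng(2)

def skew(v):
    """Cross-product matrix [v]_x, such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def optical_center(P):
    """Null vector of P (P C = 0), from the SVD."""
    return np.linalg.svd(P)[2][-1]

# Two hypothetical cameras P (P1) and P' (P2).
P1 = rng.standard_normal((3, 4))
P2 = rng.standard_normal((3, 4))
C1, C2 = optical_center(P1), optical_center(P2)

e1 = P1 @ C2                              # epipole e: image of C' in the first camera
e2 = P2 @ C1                              # epipole e': image of C in the second camera

# Standard construction (not from this paper): F = [e']_x P' P^+.
F = skew(e2) @ P2 @ np.linalg.pinv(P1)

M = np.array([0.5, -0.3, 2.0, 1.0])       # a world point and its two images
m, m_p = P1 @ M, P2 @ M

print(m_p @ F @ m)                                     # ~0: equation (13)
print(np.abs(F @ e1).max(), np.abs(F.T @ e2).max())    # ~0, ~0
print(np.linalg.svd(F, compute_uv=False))              # last singular value ~0: rank 2
```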

4.3 Two cameras looking at planes: the affine stratum

According to our discussion of section 3.3, in order to go from a projective representation of the world to an affine representation, we have to identify the plane at infinity. Before explaining how this can be achieved, we will start with a brief description of the relationship between planes in the world and a pair of cameras. Indeed, planes in the world have very interesting properties with respect to our stereo rig. In effect, a plane $\Pi$ in general induces a projective linear correspondence, a collineation, between the two retinal planes. To see this, note that the perspective projection from $\Pi$ to $\mathcal{R}$ (resp. from $\Pi$ to $\mathcal{R}'$) is one-to-one if $\Pi$ does not contain the optical center $C$ (resp. $C'$). It also preserves cross-ratios, and is therefore a collineation. Composing the inverse of the first collineation with the second defines a collineation from $\mathcal{R}$ to $\mathcal{R}'$, called the collineation induced by $\Pi$. We denote it $H_\Pi$ and also write $H_\Pi$ for the $3 \times 3$ matrix that represents it. A collineation of $\mathbb{P}^2$ depends upon 8 parameters while a plane depends upon only 3, but there is no contradiction. Indeed, the collineation is related to the fundamental matrix [2] in the following manner. Let $m$ be a point of $\mathcal{R}$. The point $m'$ of $\mathcal{R}'$ represented by $H_\Pi\mathbf{m}$ is the image in the second camera of the intersection of the optical ray $\langle C, m\rangle$ with $\Pi$. Therefore it belongs to the epipolar line of $m$, and we have

$$(H_\Pi\mathbf{m})^T F\mathbf{m} = 0$$

for all points $m$. This implies that the matrix $H_\Pi^T F$ is antisymmetric:

$$H_\Pi^T F + F^T H_\Pi = 0 \qquad (14)$$

This imposes six homogeneous constraints on the collineation. The interaction between the geometry of the stereo rig and $H_\Pi$ can also be seen as follows. Let us consider the point of intersection $P$ of the line $\langle C, C'\rangle$ with $\Pi$. The images of $P$ in the two retinal planes are the two epipoles $e$ and $e'$, which therefore correspond to each other through $H_\Pi$. Consequently, if the fundamental matrix (and thus the epipoles) is known, three pairs of corresponding points are sufficient to determine the collineation, since a fourth pair $(e, e')$ is already available. This observation can be turned into a very simple geometric construction. Let us assume that the plane $\Pi$ (or its collineation) is represented by three pairs of corresponding points $(m_i, m'_i)$, $i = 1, 2, 3$, and the pair of epipoles $(e, e')$. Given a point $m$ in the first image, how do we construct its image $m'$ in the second image under the plane collineation? This is shown in figure 5, in three steps:

1. We construct the point $m_{12}$, intersection of the two lines $\langle m_1, m_3\rangle$ and $\langle m_2, m\rangle$.
2. The point $m'_{12}$ in the second image, at the intersection of the line $\langle m'_1, m'_3\rangle$ and the epipolar line $l'_{m_{12}}$ of $m_{12}$, corresponds to $m_{12}$ under the plane collineation. This is because the line projecting to $\langle m_1, m_3\rangle$ in the first image and to $\langle m'_1, m'_3\rangle$ in the second certainly lies in the plane $\Pi$.
3. Therefore, the line $\langle m'_2, m'_{12}\rangle$ is the image under $H_\Pi$ of the line $\langle m_2, m_{12}\rangle$, and its intersection with the epipolar line $l'_m$ of $m$ yields the sought-for point $m'$.

We call this procedure the Point-Plane procedure. We can use it to solve another problem which will appear several times in the remainder of the paper. Suppose we are given a plane $\Pi$ in the world, represented either by its collineation or by three pairs of point correspondences, and a line in the world, represented by its pair of images $(l, l')$. The problem is to construct the images of the point of intersection of the line with the plane. If the plane is represented by its collineation, we just apply it to the line $l$, obtaining $d'$. The point of intersection $m'$ of $l'$ and $d'$ is the image in the second camera of the point $M$ where the 3-D line meets the plane. The point $m$ is then obtained, for example, by intersecting the epipolar line of $m'$ with $l$. If the plane is represented by three pairs of corresponding points $(m_i, m'_i)$, $i = 1, 2, 3$, and the pair of epipoles $(e, e')$, then we can solve our problem very simply by applying Point-Plane twice. We call the resulting procedure Line-Plane. Figure 5 approximately here. Let us return to the problem of going from a projective representation of the world to an affine one. We need to estimate the collineation induced by the plane at infinity between the two retinal planes. The problem is therefore to obtain at least three pairs of corresponding points which are the images of three points in the plane at infinity. We describe several ways of doing this in section 6.
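Equation (14) and the Point-Plane procedure can be checked together on synthetic data. This Python/NumPy sketch makes several assumptions that are not in the text:

- It works in a projective basis where $P = [I \mid \mathbf{0}]$ and $P' = [A_0 \mid \mathbf{a}]$, with hypothetical $A_0$ and $\mathbf{a}$.
- It uses the standard expressions $F = [\mathbf{a}]_\times A_0$ and, for a plane $\mathbf{n}^T X + d = 0$, $H_\Pi = A_0 - \mathbf{a}\mathbf{n}^T/d$.
- It encodes the epipoles implicitly through $F$, computing epipolar lines as $F\mathbf{m}$.

Joins and intersections are computed as cross products. The names `point_plane` and `proportional` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def skew(v):
    """Cross-product matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def proportional(x, y, tol=1e-8):
    """True if homogeneous vectors x and y are equal up to a scale factor."""
    x, y = x / np.linalg.norm(x), y / np.linalg.norm(y)
    return np.linalg.norm(np.cross(x, y)) < tol

# Hypothetical rig in a convenient projective basis: P = [I | 0], P' = [A0 | a].
A0 = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
a = np.array([1.0, 0.2, -0.1])
F = skew(a) @ A0                          # fundamental matrix of this rig

# Plane n . X + d = 0 and the collineation it induces between the two images.
n, d = np.array([0.1, -0.2, 1.0]), -5.0
H = A0 - np.outer(a, n) / d
print(np.allclose(H.T @ F + F.T @ H, 0.0))   # True: equation (14)

def point_on_plane(x, y):
    return np.array([x, y, (-d - n[0] * x - n[1] * y) / n[2]])

# Three points of the plane and their images: P [M; 1] = M and P' [M; 1] = A0 M + a.
Ms = [point_on_plane(*rng.uniform(-1.0, 1.0, 2)) for _ in range(3)]
pts = Ms
pts_p = [A0 @ M + a for M in Ms]

def point_plane(m, pts, pts_p, F):
    """Point-Plane: image m' of m under the plane collineation, from three pairs and F."""
    (m1, m2, m3), (m1p, m2p, m3p) = pts, pts_p
    m12 = np.cross(np.cross(m1, m3), np.cross(m2, m))   # <m1, m3> meets <m2, m>
    m12p = np.cross(np.cross(m1p, m3p), F @ m12)        # <m1', m3'> meets epipolar line of m12
    return np.cross(np.cross(m2p, m12p), F @ m)         # <m2', m12'> meets epipolar line of m

m = np.array([0.3, -0.4, 1.0])
print(proportional(point_plane(m, pts, pts_p, F), H @ m))   # True
```

The construction never forms $H_\Pi$ itself. The comparison with `H @ m` only serves as a check.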

4.4 Two cameras looking at the absolute conic: the euclidean stratum

The image of the absolute conic in each camera is also a conic, and this conic does not change when we move the camera around. This is because, as shown in section 3, the absolute conic is invariant under similitudes of the world, and hence under rigid displacements. It is hard to envision, but it is nonetheless true, that the absolute conic is a curve with only complex points (see section 3.4.2) whose image in a camera does not change when the camera is moved about the world. This phenomenon is analogous to what happens to the image of a point at infinity when we translate the camera: it does not change either. Both properties are intimately tied to the structure of the world as an affine or euclidean space and to the geometric operation performed by a camera.
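This invariance can be illustrated numerically under the usual model $P = K[R \mid \mathbf{t}]$. That model, and the upper-triangular $K$ below, are assumptions here, with hypothetical values. A point of $\Omega$ has the form $[\mathbf{d}^T, 0]^T$ with $\mathbf{d}^T\mathbf{d} = 0$, and its image is $KR\mathbf{d}$. The quantity $(KR\mathbf{d})^T K^{-T}K^{-1}(KR\mathbf{d}) = \mathbf{d}^T\mathbf{d}$ vanishes for every rotation $R$ and translation $\mathbf{t}$. Every pose therefore sees the same conic, of matrix $K^{-T}K^{-1}$. This Python/NumPy sketch checks the property for a few random poses.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical intrinsic matrix (arbitrary units) and the matrix of omega, K^{-T} K^{-1}.
K = np.array([[2.0, -0.1, 0.3], [0.0, 1.8, -0.2], [0.0, 0.0, 1.0]])
Kinv = np.linalg.inv(K)
omega = Kinv.T @ Kinv

def random_pose(rng):
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    R = q if np.linalg.det(q) > 0 else -q          # proper rotation
    return R, rng.standard_normal(3)

# A point of Omega: [u + i v, 0] with u, v orthonormal, so d^T d = 0 (no conjugation).
q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
X = np.concatenate([q[:, 0] + 1j * q[:, 1], [0.0]])

values = []
for _ in range(3):
    R, t = random_pose(rng)
    P = K @ np.hstack([R, t[:, None]])
    m = P @ X                                      # complex image point
    values.append(abs(m @ omega @ m) / np.linalg.norm(m) ** 2)
print(values)                                      # ~0 for every pose
```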

4.4.1 One camera and the absolute conic: Measuring the angle between two optical rays

If the image of the absolute conic is known in a camera, the camera becomes a metric measurement device that can compute angles between optical rays [22]. This can be readily seen by using Laguerre's formula, given in section 3.4.1, as follows. Let $m$ and $n$ be two image points and consider the two optical rays $\langle C, m\rangle$ and $\langle C, n\rangle$. Let us call $\theta$ the angle (between 0 and $\pi$) that they form, and let $M$ and $N$ be their intersections with the plane at infinity. Let $A$ and $B$ be the two intersections of the line $\langle M, N\rangle$ with the absolute conic $\Omega$. The angle $\theta$ between $\langle C, m\rangle$ and $\langle C, n\rangle$ is given by Laguerre's formula $\theta = \frac{1}{2i}\log(\{M, N; A, B\})$. The reason for this is as follows. The line at infinity of the plane defined by the three points $C$, $m$, $n$ is the intersection of that plane with the plane at infinity, i.e., the line $\langle M, N\rangle$. The absolute points of that plane are the intersections $A$ and $B$ of that line with the absolute conic $\Omega$. The cross-ratio $\{M, N; A, B\}$ is preserved under the projection to the retinal plane. Thus the angle between $\langle C, m\rangle$ and $\langle C, n\rangle$ is given by $\frac{1}{2i}\log(\{m, n; a, b\})$, where the points $a$ and $b$ are the "images" of the points $A$ and $B$. Since $a$ and $b$ are the two intersections of the line $\langle m, n\rangle$ with $\omega$, the angle can be computed from the image $\omega$ of the absolute conic alone. The situation is depicted in figure 6.

Figure 6 approximately here. In detail, the line $\langle m, n\rangle$ is represented by $\mathbf{m} + \lambda\mathbf{n}$. The variable $\lambda$ is a projective parameter of that line. Point $m$ has projective parameter 0, and point $n$ has projective parameter $\infty$. The reader should not worry about this, since the magic of the cross-ratio will take care of it! In order to compute the projective parameters of $a$ and $b$, we apply equation (4) with $S$ being the equation of $\omega$. The projective parameters are the roots of the quadratic equation

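$$S(\mathbf{m}) + 2\lambda S(\mathbf{m}, \mathbf{n}) + \lambda^2 S(\mathbf{n}) = 0$$

Let $\lambda_0$ and $\bar{\lambda}_0$ be the two roots, which are complex conjugate. According to equation (2), we have

$$\{m, n; a, b\} = \frac{\lambda_0 - 0}{\bar{\lambda}_0 - 0} : \frac{\infty - \lambda_0}{\infty - \bar{\lambda}_0}$$

The ratio containing $\infty$ is equal to 1 (that is the magic!), and therefore

$$\{m, n; a, b\} = \frac{\lambda_0}{\bar{\lambda}_0} = e^{2i\,\mathrm{Arg}(\lambda_0)}$$

where $\mathrm{Arg}(\lambda_0)$ is the argument of the complex number $\lambda_0$. In particular, we have $\theta = \mathrm{Arg}(\lambda_0) \pmod{\pi}$. A straightforward computation shows that the two roots are equal to

$$\frac{-S(\mathbf{m}, \mathbf{n}) \pm i\sqrt{S(\mathbf{m})S(\mathbf{n}) - S(\mathbf{m}, \mathbf{n})^2}}{S(\mathbf{n})} \qquad (15)$$

Simple considerations show that

$$\cos\theta = -\frac{S(\mathbf{m}, \mathbf{n})}{\sqrt{S(\mathbf{m})S(\mathbf{n})}} \qquad (16)$$

which uniquely defines $\theta$ between 0 and $\pi$. The sine is therefore positive and given by $\sin\theta = \sqrt{1 - \cos^2\theta}$.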
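As a sanity check on equations (15) and (16), this Python/NumPy sketch computes $\theta$ in two ways. The first takes $\mathrm{Arg}(\lambda_0)$, with $\lambda_0$ the root of the quadratic that has positive imaginary part. The second uses equation (16) directly. The sketch assumes a hypothetical camera with intrinsic matrix $K$ and uses $A = K^{-T}K^{-1}$ as the matrix of $\omega$, the standard expression when $P = K[R \mid \mathbf{t}]$. That expression is an assumption here; the matrix of $\omega$ is introduced in section 4.4.2. With the sign convention of equation (16), $\theta$ comes out as $\pi - \varphi$, where $\varphi$ is the angle between the ray vectors $K^{-1}\mathbf{m}$ and $K^{-1}\mathbf{n}$. Both values describe the same pair of lines.

```python
import numpy as np

# Hypothetical intrinsic matrix K; A = K^{-T} K^{-1} is taken as the matrix of omega.
K = np.array([[2.0, -0.1, 0.3], [0.0, 1.8, -0.2], [0.0, 0.0, 1.0]])
Kinv = np.linalg.inv(K)
A = Kinv.T @ Kinv

def S(x, y=None):
    """S(x, y) = x^T A y and S(x) = S(x, x)."""
    return x @ A @ (x if y is None else y)

def angle_from_roots(m, n):
    """theta = Arg(lambda_0), lambda_0 the root with positive imaginary part of
    S(m) + 2 lambda S(m, n) + lambda^2 S(n) = 0."""
    roots = np.roots([S(n), 2.0 * S(m, n), S(m)])       # highest degree first
    lam0 = roots[np.argmax(roots.imag)]
    return np.angle(lam0)

def angle_from_cosine(m, n):
    """Equation (16): cos(theta) = -S(m, n) / sqrt(S(m) S(n))."""
    return np.arccos(-S(m, n) / np.sqrt(S(m) * S(n)))

m = np.array([0.5, 0.2, 1.0])
n = np.array([-0.3, 0.4, 1.0])
theta_roots, theta_cos = angle_from_roots(m, n), angle_from_cosine(m, n)

# Angle between the ray direction vectors K^{-1} m and K^{-1} n, for comparison.
r1, r2 = Kinv @ m, Kinv @ n
phi = np.arccos(r1 @ r2 / (np.linalg.norm(r1) * np.linalg.norm(r2)))
print(theta_roots, theta_cos, np.pi - phi)   # three equal values
```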

4.4.2 The absolute conic and the intrinsic parameters

We have seen previously that the image $\omega$ of the absolute conic does not change when we move the camera in space. This, together with the fact that $\omega$ contains only complex points, has some strong implications on the coefficients of the equation defining $\omega$. We now examine these consequences. Let $A$ be the symmetric matrix defining the equation of $\omega$ in the retinal plane:

$$S(\mathbf{m}) = \mathbf{m}^T A\mathbf{m}$$

Since $\omega$ does not contain any real point, $S(\mathbf{m})$ is either strictly positive or strictly negative for all nonzero vectors $\mathbf{m}$ with real coordinates. Let us assume that it is strictly positive. The quadratic form defined by the matrix $A$ is then positive definite. We can therefore use a theorem which says that a quadratic form is positive definite if and only if its matrix can be written as $WW^T$, where $W$ is a lower-triangular matrix with positive diagonal elements [23]. This decomposition is called the Cholesky decomposition of the matrix $A$ and is unique. If we define $\mathbf{p} = W^T\mathbf{m}$, the equation of $\omega$ can be written $S(\mathbf{p}) = \mathbf{p}^T\mathbf{p}$. The matrix $W$ can be interpreted as defining a change of projective coordinates in the retinal plane. For reasons which will become clear later, we are more interested in the matrix $W^{-T}$, which is upper triangular. Let us write this matrix

$$W^{-T} = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix} \qquad (17)$$

It is easy to see that $W^{-T}$, like $W$, has positive diagonal elements. Since $W$ is, like $A$, defined up to a scale factor, we can assume that $f = 1$. ($f$ cannot be equal to 0, because otherwise the rank of $A$ would be less than 3.) We also have $a > 0$ and $d > 0$. Changing notations, we write

$$a = \alpha_u, \quad b = -\alpha_u\cot\theta, \quad c = u_0, \quad d = \frac{\alpha_v}{\sin\theta}, \quad e = v_0 \qquad (18)$$

These equations uniquely define the five parameters $\alpha_u$, $\alpha_v$, $u_0$, $v_0$, and $\theta$. This is clear for $\alpha_u$, $u_0$, $v_0$. For $\theta$ and $\alpha_v$, we see that the equation $b = -\alpha_u\cot\theta$ defines $\theta$ between 0 and $\pi$. Thus the sine is positive, and since $d > 0$ this uniquely defines $\alpha_v$ as a positive number. These parameters have been introduced by several authors in the past from physical and heuristic considerations [22, 2] and are called the intrinsic parameters of the camera. Here they appear without such considerations, as a consequence of the fact that the image of the absolute conic is an imaginary curve. Let us assume that the retinal plane is an affine plane, which makes sense if we are using the image coordinates provided by the sensor. Note that we consider this affine plane as embedded in a projective plane, in agreement with our general approach. The affine plane can be considered as obtained from the projective plane by throwing away the line at infinity of equation $x_3 = 0$. A point of the affine plane of coordinates $(u, v)$ can be considered as a projective point of coordinates $(u, v, 1)$. Conversely, a projective point of coordinates $(x_1, x_2, x_3)$ not belonging to the line at infinity can be considered as an affine point of coordinates $(\frac{x_1}{x_3}, \frac{x_2}{x_3})$. We now give an intuitive interpretation of the intrinsic parameters in this context. First, let us determine the center of $\omega$. We know that the center of a conic is the pole of the line at infinity; thus it is the point represented by (see section 3.2.2)

$$A^{-1}\mathbf{e}_3$$

since the vector $\mathbf{e}_3 = [0, 0, 1]^T$ represents the line at infinity. The relation $A = WW^T$ implies $A^{-1} = W^{-T}W^{-1}$. Therefore, according to equations (17) and (18), the center $c$ of $\omega$ is the point of affine coordinates $(u_0, v_0)$. Let us now consider the optical ray $\langle C, c\rangle$. We will show that it is perpendicular to all the directions of lines of the retinal plane. To do this, let us consider a point $m$ on the line at infinity of the retinal plane. The optical ray $\langle C, m\rangle$ is therefore parallel to the retinal plane. To compute the angle between these two optical rays, we simply apply equation (15) to these two points. Since $\mathbf{m}^T\mathbf{e}_3 = 0$, we have $S(\mathbf{m}, \mathbf{c}) = \mathbf{m}^T A A^{-1}\mathbf{e}_3 = 0$. The two roots are therefore purely imaginary, and the cross-ratio $\{m, c; I, J\}$ is equal to $-1$. This means that $\langle C, c\rangle$ is perpendicular to $\langle C, m\rangle$ for each point $m$ on the line at infinity of the retinal plane. We have seen that each such point represents a line direction in the underlying affine plane. Hence the line $\langle C, c\rangle$ is orthogonal to all lines in the retinal plane, and therefore to the retinal plane itself. The optical ray $\langle C, c\rangle$ can be considered as the optical axis of the camera. Note that this interpretation is valid only if the original $(u, v)$ plane is a real affine plane, i.e. if it has not been projectively distorted. If it has, the line represented by $(0, 0, 1)$ is not the real line at infinity, and we can no longer say that the line $\langle C, c\rangle$ is perpendicular to the retinal plane. We call this problem the problem of the hidden projective transformation. In a similar spirit, we can give an interpretation of the angle $\theta$ defined above in terms of the retinal coordinate system. Let us consider the directions of the $u$- and $v$-axes, i.e. the points at infinity of coordinates $(1, 0, 0)$ and $(0, 1, 0)$. The angle $\varphi$ between these two directions is obtained by applying equation (16): $\cos\varphi = -\cos\theta$. This shows that the angle is $\pi - \theta$ or $\theta$, since we are actually measuring angles between lines. (To talk about angles between vectors, we would have to orient the plane, which is equivalent to distinguishing between the two absolute points.) Since the matrix $W^T$ is upper triangular, it defines a collineation which preserves the line at infinity (an affine transformation). Therefore, after the change of coordinate system defined by $\mathbf{p} = W^T\mathbf{m}$, the line at infinity has not changed. The directions of the new $u'$- and $v'$-axes are, however, orthogonal, since the equation of the image of the absolute conic is $S(\mathbf{p}) = \mathbf{p}^T\mathbf{p}$. Indeed, this implies that $S(u'_\infty, v'_\infty) = 0$, where $u'_\infty$ and $v'_\infty$ are the points of projective coordinates $(1, 0, 0)$ and $(0, 1, 0)$ in the $(u', v')$ coordinate system. According to equation (16), this shows that the angle between the new $u'$- and $v'$-axes is equal to $\frac{\pi}{2}$. We can now give a familiar interpretation of equations (18). We consider the matrix $W^T$ as defining a change of coordinate system from the affine plane $(u, v)$ to the affine plane $(u', v')$. We know that the directions $u'$ and $v'$ are orthogonal. Let us consider an orthonormal system of coordinates centered at $c$ with axes $u'$ and $v'$. Let $p$ be a point represented by the vector $[u', v', 1]^T$ in that coordinate system. The equation $\mathbf{p} = W^T\mathbf{m}$ and equations (18) can be written as follows:

$$u' = \frac{u - u_0}{\alpha_u} + \frac{v - v_0}{\alpha_v}\cos\theta, \qquad v' = \frac{v - v_0}{\alpha_v}\sin\theta$$

where $u$ and $v$ are the affine coordinates of the point $m$.
This shows how the $(u, v)$ (pixel) coordinate system is obtained from the $(u', v')$ (normalized) coordinate system:

1. translate the origin by $(-u_0, -v_0)$;
2. rotate the $v'$-axis by $\theta - \frac{\pi}{2}$;
3. scale the unit vectors in the $u$- and $v$-directions by $\frac{1}{\alpha_u}$ and $\frac{1}{\alpha_v}$, respectively (see figure 7).

This is precisely the definition of the intrinsic parameters given, for example, in [22] from heuristic considerations. As mentioned previously, this interpretation no longer holds if the plane $(u, v)$ has been projectively distorted. The points $u_\infty$ and $v_\infty$ are then no longer at infinity, and the interpretation of $\theta$ does not make sense (hidden projective transformation problem). This could happen, for example, if the retinal plane were tilted with respect to the real optical axis of the optical system of the camera (misalignment). On the other hand, and this is very important, all angle measurements based on equation (16) are still valid, because they do not make any assumption about the line at infinity of the retinal plane. To summarize, the usual interpretation of the intrinsic parameters $\alpha_u$, $\alpha_v$, $\theta$, $u_0$, $v_0$ in terms of physical parameters attached to the camera is valid only if the original retinal plane has not been distorted by a projective transformation. But even if it has been, the camera can still be used to perform euclidean measurements through equation (16), which does not require knowledge of the line at infinity in the retinal plane.

v

v

u

v

Figure 7 approximately here.
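The chain of section 4.4.2, from the conic matrix $A$ to the intrinsic parameters, can be sketched in a few lines of Python/NumPy. The parameter values are hypothetical and in arbitrary units. $A$ is built as a positive multiple of $(KK^T)^{-1}$, with $K$ standing for the matrix $W^{-T}$ of equations (17) and (18); this construction is an assumption of the sketch. `numpy.linalg.cholesky` returns the lower-triangular factor $W$ with positive diagonal such that $A = WW^T$. Then $W^{-T}$ is normalized so that $f = 1$, and equations (18) are inverted, with $\theta \in (0, \pi)$ obtained from $\cot\theta = -b/a$. The sketch also checks three further claims:

- the center of $\omega$, $A^{-1}\mathbf{e}_3$, is $(u_0, v_0)$;
- equation (16) gives $\cos\varphi = -\cos\theta$ for the $u$- and $v$-axes;
- $\mathbf{p} = W^T\mathbf{m}$ reproduces the expressions of $u'$ and $v'$ given above.

```python
import numpy as np

# Hypothetical intrinsic parameters (arbitrary units; theta in radians), for illustration only.
alpha_u, alpha_v, theta, u0, v0 = 2.0, 1.8, np.deg2rad(80.0), 0.3, -0.2

# Upper-triangular matrix of equations (17)-(18), with f = 1.
K = np.array([[alpha_u, -alpha_u / np.tan(theta), u0],
              [0.0, alpha_v / np.sin(theta), v0],
              [0.0, 0.0, 1.0]])

# Matrix A of omega, known only up to a positive factor (3.7 is arbitrary).
A = 3.7 * np.linalg.inv(K @ K.T)

# Cholesky: W lower triangular with positive diagonal and A = W W^T.
W = np.linalg.cholesky(A)

def intrinsics_from_cholesky(W):
    """Invert equations (18) from W^{-T}, normalized so that f = 1."""
    U = np.linalg.inv(W).T
    U = U / U[2, 2]
    a, b, c = U[0]
    d, e = U[1, 1], U[1, 2]
    th = np.arctan2(1.0, -b / a)          # cot(theta) = -b / a, theta in (0, pi)
    return a, d * np.sin(th), th, c, e    # alpha_u, alpha_v, theta, u0, v0

recovered = intrinsics_from_cholesky(W)

# Center of omega: the pole A^{-1} e3 of the line at infinity.
center_h = np.linalg.solve(A, np.array([0.0, 0.0, 1.0]))
center = center_h[:2] / center_h[2]

# Equation (16) applied to the u- and v-axis directions (1, 0, 0) and (0, 1, 0).
cos_phi = -A[0, 1] / np.sqrt(A[0, 0] * A[1, 1])

# Normalized coordinates p = W^T m of an image point (u, v).
u, v = 0.9, -0.4
p = W.T @ np.array([u, v, 1.0])
u_n, v_n = p[:2] / p[2]
print(recovered, center, cos_phi, (u_n, v_n))
```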


4.4.3 Two cameras and the absolute conic

As a final property of the absolute conic, let us consider its pair of images in the two retinal planes of a stereo rig. Consider also the two epipolar planes which are tangent to the absolute conic; they are complex conjugate and intersect along the real line $\langle C, C'\rangle$. By definition, these two planes intersect the two retinal planes along two pairs of corresponding epipolar lines. These epipolar lines are tangent to the images of the absolute conic in the two retinal planes. This property will be used in section 7.1.2 to derive the Kruppa equations.
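This tangency property can be illustrated numerically with a Python/NumPy sketch, under assumptions that are not made in the text:

- Both cameras share a hypothetical intrinsic matrix $K$, with $P = K[I \mid \mathbf{0}]$ and $P' = K[R \mid \mathbf{t}]$.
- Consequently $F = K^{-T}[\mathbf{t}]_\times R K^{-1}$, and in both images a line $\mathbf{l}$ is tangent to $\omega$ when $\mathbf{l}^T KK^T \mathbf{l} = 0$.

The two complex tangent lines from $e$ to $\omega$ are the roots of a quadratic in the pencil $\mathbf{l}_1 + \lambda\mathbf{l}_2$ of lines through $e$. The epipolar line corresponding to a line $\mathbf{l}$ through $e$ is computed as $F(\mathbf{e} \times \mathbf{l})$. This is the epipolar line of the point $\mathbf{e} \times \mathbf{l}$, which lies on $\mathbf{l}$ and differs from $e$.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

# Hypothetical rig with the same intrinsic matrix K in both cameras:
# P = K [I | 0], P' = K [R | t].
K = np.array([[1.5, 0.05, 0.2], [0.0, 1.4, -0.1], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t = np.array([1.0, 0.3, -0.2])

Kinv = np.linalg.inv(K)
F = Kinv.T @ skew(t) @ R @ Kinv          # F = K^{-T} [t]_x R K^{-1}
e = K @ (-R.T @ t)                       # epipole in image 1 (image of C' = -R^T t)
omega_star = K @ K.T                     # line l tangent to omega iff l^T K K^T l = 0

# Pencil of lines through e: l(lam) = l1 + lam * l2.
l1 = np.cross(e, [1.0, 0.0, 0.0])
l2 = np.cross(e, [0.0, 1.0, 0.0])
quad = [l2 @ omega_star @ l2, 2 * (l1 @ omega_star @ l2), l1 @ omega_star @ l1]

residuals = []
for lam in np.roots(quad):               # the two complex-conjugate tangent lines
    l = l1 + lam * l2
    l_p = F @ np.cross(e, l)             # corresponding epipolar line in image 2
    residuals.append((abs(l @ omega_star @ l) / np.linalg.norm(l) ** 2,
                      abs(l_p @ omega_star @ l_p) / np.linalg.norm(l_p) ** 2))
print(residuals)                         # all entries ~0 up to rounding
```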

5 Recovering the projective stratum

In this section we show that a stereo rig for which the fundamental matrix has been estimated allows us to recover the first stratum of the world, its projective structure. Even though this can be done algebraically [15], we will develop here a purely geometric approach. But first we give some indications about the way the fundamental matrix can be estimated from a pair of images.

5.1 Learning the fundamental matrix

The estimation of the fundamental matrix of a stereo rig is a problem which has recently received a lot of attention from a variety of people [2, 1, 5]. The basic idea is to use equation (13) for a number $N$ of known pairs of corresponding pixels $(m_i, m'_i)$. We obtain equations which are linear in the coefficients of the matrix $F$. More specifically, let us denote by $\mathbf{f}$ the 9-dimensional vector whose coordinates are the elements of $F$. Each equation (13) can be written as

$$\mathbf{a}_i^T\mathbf{f} = 0$$

and the whole set of equations can be written in matrix form

$$A\mathbf{f} = \mathbf{0}$$

where $A$ is an $N \times 9$ matrix. Let $\mathbf{a}_8$ and $\mathbf{a}_9$ be the last two column vectors of $A$, and write $\mathbf{f} = [\mathbf{g}^T, f_8, f_9]^T$. We can then rewrite the previous equation as

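$$B\mathbf{g} = -f_8\mathbf{a}_8 - f_9\mathbf{a}_9$$

Assuming that the rank of the $N \times 7$ matrix $B$ is seven, we can solve for the first seven components $\mathbf{g}$ of $\mathbf{f}$ in the usual way:

$$\mathbf{g} = -f_8(B^TB)^{-1}B^T\mathbf{a}_8 - f_9(B^TB)^{-1}B^T\mathbf{a}_9$$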

The solution depends upon two free parameters $f_8$ and $f_9$, which can be determined by using the constraint $\det(F) = 0$. We obtain a third-degree homogeneous equation in $f_8$ and $f_9$, and we can solve for their ratio. Since a third-degree equation has at least one real root, we are guaranteed to obtain at least one solution for $F$. This solution is defined up to a scale factor, and some normalization must be performed in order to make comparisons. One possibility is to normalize $\mathbf{f}$ so that its vector norm is equal to 1. If there are three real roots, we choose the one which minimizes the vector norm of $A\mathbf{f}$, subject to the previous constraint. In fact, we can do the same computation for any of the 36 choices of pairs of coordinates of $\mathbf{f}$. We then choose, among the possibly 108 solutions, the one which minimizes the previous vector norm. This approach has the problem that it does not minimize a meaningful criterion in terms of image measurements. Equation (13) can be normalized by imposing that the vector norms of $\mathbf{m}$ and $\mathbf{m}'$ are equal to 1. What we would really like, however, is for the image distance of $m'$ (resp. of $m$) to the epipolar line of $m$ (resp. of $m'$) to be small, and this is not guaranteed by the previous approach. This has led people to minimize a sum over the pairs of corresponding pixels. Each term is the distance of $m'$ to the epipolar line of $m$ plus the distance of $m$ to the epipolar line of $m'$. The reader can easily verify that this criterion is not polynomial in the elements of $F$. Its minimization therefore poses the usual problems of minimizing a criterion which is not a positive quadratic form in the unknowns. The best results have been obtained by initializing the nonlinear criterion with the result of the first method [5]. A stereo rig for which the fundamental matrix $F$ is known is said to be weakly calibrated.
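Here is a minimal Python/NumPy sketch of the linear method. It makes two simplifications: it handles a single choice of the pair $(f_8, f_9)$ rather than all 36, and it sets $f_9 = 1$, so it assumes $f_9 \neq 0$. It recovers the cubic $\det F(f_8) = 0$ by interpolating the determinant at four values of $f_8$. It then keeps the real roots and picks the normalized $\mathbf{f}$ that minimizes $\|A\mathbf{f}\|$. The data are synthetic and noise-free, generated from a hypothetical rig $P = [I \mid \mathbf{0}]$, $P' = [R \mid \mathbf{t}]$ whose true fundamental matrix is $[\mathbf{t}]_\times R$. The function names are illustrative. The last function evaluates the image-based criterion discussed above, the sum of point-to-epipolar-line distances. Here it is only evaluated, not minimized.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def linear_fundamental(m, mp):
    """m, mp: N x 3 arrays of corresponding points (u, v, 1), N >= 8."""
    A = np.array([np.kron(q, p) for p, q in zip(m, mp)])   # row i: a_i^T, so a_i^T f = m'^T F m
    B, a8, a9 = A[:, :7], A[:, 7], A[:, 8]
    g8 = np.linalg.lstsq(B, a8, rcond=None)[0]              # (B^T B)^{-1} B^T a8
    g9 = np.linalg.lstsq(B, a9, rcond=None)[0]              # (B^T B)^{-1} B^T a9

    def F_of(f8):                                           # f = [g, f8, f9] with f9 = 1
        return np.concatenate([-f8 * g8 - g9, [f8, 1.0]]).reshape(3, 3)

    samples = np.array([-1.0, 0.0, 1.0, 2.0])               # det(F(f8)) is a cubic in f8
    cubic = np.polyfit(samples, [np.linalg.det(F_of(x)) for x in samples], 3)
    best_cost, best_F = np.inf, None
    for r in np.roots(cubic):
        if abs(r.imag) > 1e-9 * max(1.0, abs(r)):            # keep real roots only
            continue
        f = F_of(r.real).ravel()
        f /= np.linalg.norm(f)                               # ||f|| = 1
        cost = np.linalg.norm(A @ f)
        if cost < best_cost:
            best_cost, best_F = cost, f.reshape(3, 3)
    return best_F

def epipolar_distance_sum(F, m, mp):
    """Sum over pairs of d(m', F m) + d(m, F^T m')."""
    lp, l = m @ F.T, mp @ F                                  # rows: F m_i and F^T m'_i
    dp = np.abs(np.sum(mp * lp, axis=1)) / np.hypot(lp[:, 0], lp[:, 1])
    d = np.abs(np.sum(m * l, axis=1)) / np.hypot(l[:, 0], l[:, 1])
    return np.sum(dp + d)

# Synthetic noise-free data from a hypothetical rig P = [I | 0], P' = [R | t].
rng = np.random.default_rng(6)
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.2, -0.3])
X = np.column_stack([rng.uniform(-1, 1, (20, 2)), rng.uniform(4, 6, 20)])
m = X / X[:, 2:]
Xp = X @ R.T + t
mp = Xp / Xp[:, 2:]

F_hat = linear_fundamental(m, mp)
F_true = skew(t) @ R
F_true /= np.linalg.norm(F_true)
print(F_hat, epipolar_distance_sum(F_hat, m, mp))
```

With noisy data, the same functions could supply the initial estimate and the criterion for a nonlinear refinement, as suggested above. That refinement step is not shown.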

5.2 Recovering the projective structure of the world: the projective stratum

Let us choose five pairs of point correspondences $(a_i, a'_i)$, $i = 1, \dots, 5$, in the two images. These correspondences may have been obtained, for example, in the process of estimating the fundamental matrix. We choose the five points $A_i$ in the world as a projective basis. Note that these points are not known in the usual sense: the only thing we know is that their projections in the two images are the pairs $(a_i, a'_i)$. They must be such that no four of them are coplanar (section 3.2.1), but this property can be checked directly from the pair of images [15]. In order to show, for example, that the point $A_4$ is not in the plane defined by $A_1, A_2, A_3$, it is sufficient to show that the projective coordinates of $a_4$ in the projective basis $(e, a_1, a_2, a_3)$ are different from those of $a'_4$ in the projective basis $(e', a'_1, a'_2, a'_3)$. Any further point correspondence $(m, m')$ defines a 3-D point $M$. We will show that the ratios of its projective coordinates in the previous projective basis can be computed from the pair of images. To do this, we use the fact that each such ratio is the cross-ratio of four planes, together with the Line-Plane construction described in section 4.2. Suppose, for example, that we want to compute the ratio of the third to the fourth projective coordinates, in the previous projective basis, of the point $M$ with images $(m, m')$. We have seen earlier that it is equal to the cross-ratio of the four planes $(A_1, A_2, A_5)$, $(A_1, A_2, M)$, $(A_1, A_2, A_3)$ and $(A_1, A_2, A_4)$. Let $P$ and $Q$ be the points of intersection of the line $\langle A_3, A_4\rangle$ with the planes $(A_1, A_2, A_5)$ and $(A_1, A_2, M)$, respectively. Our cross-ratio is therefore equal to the cross-ratio of the four points $(P, Q, A_3, A_4)$, which can be computed from either one of the two images once we have constructed the images $(p, p')$ and $(q, q')$ of $P$ and $Q$, which we can do using the Line-Plane construction of section 4.2. These coordinates are invariant, by definition, under any collineation of the world.
We have therefore computed a projective invariant representation of the world from a pair of weakly calibrated cameras.
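The two image-side computations used in this subsection, the cross-ratio of four collinear image points and the coplanarity check on the basis points, can be sketched as follows. This is an illustration under our own conventions: points are homogeneous NumPy vectors, the cross-ratio is computed by expressing the last two points in the basis formed by the first two (so that the harmonic value is $-1$ and $\{A_1, A_\infty, Q, A\} = A_1Q/A_1A$), and the Line-Plane procedure itself is not implemented, so $p$ and $q$ would have to be supplied by it. The function names and the tolerance are hypothetical.

```python
import numpy as np


def cross_ratio(p1, p2, p3, p4):
    """{p1, p2; p3, p4} for four collinear points in homogeneous coordinates."""
    basis = np.column_stack([p1, p2])
    (a3, b3), *_ = np.linalg.lstsq(basis, p3, rcond=None)   # p3 = a3 p1 + b3 p2
    (a4, b4), *_ = np.linalg.lstsq(basis, p4, rcond=None)   # p4 = a4 p1 + b4 p2
    return (b3 / a3) / (b4 / a4)


def projective_coordinates(basis, x):
    """Unit-norm coordinates of x in the planar basis (b0, b1, b2; unit point b3)."""
    P = np.column_stack(basis[:3])
    scales = np.linalg.solve(P, basis[3])   # b3 = sum_i scales_i * b_i
    c = np.linalg.solve(P * scales, x)      # rescaled columns send b3 to (1, 1, 1)
    return c / np.linalg.norm(c)


def coplanar_from_images(e, a, ep, ap, tol=1e-8):
    """A4 lies in the plane (A1, A2, A3) iff a4 and a'4 have the same projective
    coordinates in the bases (e, a1, a2, a3) and (e', a'1, a'2, a'3)."""
    c = projective_coordinates([e, a[0], a[1], a[2]], a[3])
    cp = projective_coordinates([ep, ap[0], ap[1], ap[2]], ap[3])
    return min(np.linalg.norm(c - cp), np.linalg.norm(c + cp)) < tol
```

The coplanarity test works because the plane $(A_1, A_2, A_3)$ induces a collineation that maps $e$ to $e'$ and each $a_i$ to $a'_i$, and projective coordinates are invariant under collineations.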

6 Recovering the affine stratum

In this section, we show that a stereo rig for which the fundamental matrix and the collineation induced by the plane at infinity have been estimated allows us to recover the second stratum of the world, its affine structure.

6.1 Estimating the plane at infinity

In some cases, some affine invariant information about the scene may be available. For example, we may know that two lines are parallel. Two parallel lines intersect in the plane at infinity, and therefore the points of intersection of their images in the two retinal planes are the images of that point of $\Pi_\infty$. Another example is when we know the midpoint of a segment. Let $a_1$ (resp. $a'_1$) and $a_2$ (resp. $a'_2$) be the images of the two endpoints $A_1$ and $A_2$, and let $a$ (resp. $a'$) be the known images of the midpoint $A$. What does this tell us about the plane at infinity? Consider the point at infinity $B$ of the line supporting the segment. Since $A$ is the midpoint of $A_1A_2$, the cross-ratio $\{A, B, A_1, A_2\}$ equals $-1$. Since the cross-ratio is preserved by perspective projection, the image $b$ (resp. $b'$) of $B$ satisfies $\{a, b, a_1, a_2\} = -1$ (resp. $\{a', b', a'_1, a'_2\} = -1$). To construct $b$ (resp. $b'$) we only have to construct the harmonic conjugate of $a$ with respect to $a_1$ and $a_2$ (resp. of $a'$ with respect to $a'_1$ and $a'_2$), a standard geometric construction that can be performed with a straight-edge only [21]. The correspondence $(b, b')$ yields one point in the plane at infinity.

More generally, if we have three pairs of correspondences $(a_i, a'_i)$, $i = 1, 2, 3$, such that $a_1, a_2, a_3$ (resp. $a'_1, a'_2, a'_3$) are aligned, then the corresponding 3-D points $A_1, A_2, A_3$ are aligned if and only if the two cross-ratios $\{e, a_1, a_2, a_3\}$ and $\{e', a'_1, a'_2, a'_3\}$ are equal. If we happen to know the ratio of lengths $\frac{A_1A_2}{A_1A_3}$, then it determines the vanishing point $b$ (resp. $b'$) of the two image lines, and thus one point in the plane at infinity, since we must have $\{a_1, b, a_2, a_3\} = \{a'_1, b', a'_2, a'_3\} = \frac{A_1A_2}{A_1A_3}$.

If no such information is available but we can control the displacement of our stereo rig, then we can exploit the fact that if we translate it without rotating it, straight lines remain parallel to themselves. More precisely, suppose we have a line $L$ with images $l$ and $l'$ before the translation of the stereo rig. After the translation, the images of $L$ are $l_1$ and $l'_1$ and, because the rig has only translated, the points $a$ and $a'$, intersections of $l$ and $l_1$ in the first image and of $l'$ and $l'_1$ in the second, are the images of a point $A$ at infinity, namely the point at infinity of $L$.
Note that in order to obtain this information we must have obtained the correspondence $(l, l')$ by some other process and kept track of $l$ (resp. $l'$) while the rig was translating in order to obtain the correspondence $(l_1, l'_1)$. A variant of this idea, which has been implemented in the author's laboratory, is to obtain point correspondences between the two images (this is needed to estimate the fundamental matrix) and then track them while the rig is translating [6]. Since two points define a line, we are back to the line case.
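The three ways of getting a point of $\Pi_\infty$ described in this subsection reduce to small computations in one image, sketched below with NumPy under our own conventions (homogeneous points and lines, cross-ratio $\{p_1, p_2; p_3, p_4\}$ with harmonic value $-1$, as in the text's $\{a, b, a_1, a_2\} = -1$). The helper `point_with_cross_ratio` is hypothetical: it writes $p_1$ in the basis $(p_3, p_4)$ and returns the point $p_2$ of $\langle p_3, p_4\rangle$ that gives the requested cross-ratio. The harmonic conjugate here is computed algebraically rather than by the straight-edge construction of [21].

```python
import numpy as np


def point_with_cross_ratio(p1, p3, p4, k):
    """Homogeneous point p2 on the line <p3, p4> such that {p1, p2; p3, p4} = k."""
    (alpha, beta), *_ = np.linalg.lstsq(np.column_stack([p3, p4]), p1, rcond=None)
    return k * alpha * p3 + beta * p4        # p1 = alpha * p3 + beta * p4


def harmonic_conjugate(a, a1, a2):
    """Image b of the point at infinity of <A1, A2> when a is the image of the midpoint."""
    return point_with_cross_ratio(a, a1, a2, -1.0)


def vanishing_point_from_known_ratio(a1, a2, a3, rho):
    """b with {a1, b; a2, a3} = rho, where rho = A1A2 / A1A3 is known."""
    return point_with_cross_ratio(a1, a2, a3, rho)


def vanishing_point_from_translation(l, l1):
    """Images l and l1 of one line before and after a pure translation meet at its vanishing point."""
    return np.cross(l, l1)
```

Applying the same functions in the second image gives $b'$, and the pair $(b, b')$ is one correspondence in the plane at infinity.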

6.2 Affine reconstruction

Suppose that we have identified the plane at infinity. As strange as this may sound, the plane at infinity has nothing special to it and, just like any other plane, it induces a collineation between the two images. We know that this collineation is in general defined by four point correspondences but, if the fundamental matrix has been estimated, three point correspondences are sufficient. We have described in the previous section very simple ways of obtaining these correspondences, by actively moving the cameras or by using some information about the scene. We now choose four pairs of point correspondences $(a_i, a'_i)$ in the two images. We choose the corresponding four points $A_i$ in the world as an affine basis. More precisely, we choose $A_1$ as the origin of the affine frame and the three vectors $\mathbf{e}_{i-1} = \overrightarrow{A_1A_i}$, $i = 2, 3, 4$, as the basis vectors. This assumes that none of the four points $A_i$ lies in the plane at infinity. This can be checked since the collineation $H_\infty$ induced by $\Pi_\infty$ between the retinal planes is known: we simply have to check that $H_\infty a_i$ is sufficiently different from $a'_i$ for each $i$. Given any further point correspondence $(m, m')$, not in the plane at infinity, we will show that the affine coordinates of the corresponding 3-D point $M$ in the affine basis $(A_1, \mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3)$ can be computed from the pair of images. We will do it in two different ways: first we will simply adapt the method presented in section 5.2 to this case, and second we will present a somewhat more intuitive construction. Both methods are of course equivalent.
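The check "$H_\infty a_i$ is sufficiently different from $a'_i$" can be written as a test on the angle between the two homogeneous vectors. In this sketch the threshold `tol` is a hypothetical choice (the text does not give one), and the usage example assumes the camera pair $P = K[I \mid 0]$, $P' = K[R \mid t]$, for which $H_\infty = KRK^{-1}$.

```python
import numpy as np


def in_plane_at_infinity(H_inf, a, a_prime, tol=1e-6):
    """True when a' is numerically proportional to H_inf a.

    tol is a hypothetical threshold on the sine of the angle between
    the two unit vectors H_inf a / |H_inf a| and a' / |a'|.
    """
    x = H_inf @ a
    x = x / np.linalg.norm(x)
    y = a_prime / np.linalg.norm(a_prime)
    return np.linalg.norm(np.cross(x, y)) < tol
```

A basis point $A_i$ is acceptable when `in_plane_at_infinity(H_inf, a_i, a_prime_i)` is false; points far away from the rig give small angles, which is why a tolerance is needed in practice.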

The affine basis $(A_1, \mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3)$ can be considered as a projective basis $(A_1, A_{12}^\infty, A_{13}^\infty, A_{14}^\infty, A_5)$, where the points $A_{1i}^\infty$, $i = 2, 3, 4$, are the points at infinity of the lines $\langle A_1, A_i\rangle$ and $A_5$ is the point of coordinates $(1, 1, 1)$ in the affine basis. Since the images of the three points $A_{1i}^\infty$ can be constructed using the Line-Plane procedure, and since the images of $A_5$ can also be constructed from the images, as shown below, we can apply exactly the projective scheme described in section 5.2. Indeed, we know from section 3.3 that when the projective coordinates in $\mathbb{P}^3$ are chosen in such a way that the equation of the plane at infinity is $x_4 = 0$, the affine coordinates of a point in $\mathbb{P}^3 \setminus \Pi_\infty$ are the ratios of its first three projective coordinates to the fourth. This construction is shown in figure 8. Figure 8 approximately here.

It remains to show how to construct $(a_5, a'_5)$. According to figure 9, this can be done in three main steps, each of which can be implemented in the images:
1. Construct $P$, the intersection of the line going through $A_2$ parallel to $\langle A_1, A_3\rangle$ with the line going through $A_3$ parallel to $\langle A_1, A_2\rangle$.
2. Construct the line going through $P$ parallel to $\langle A_1, A_4\rangle$.
3. Construct the line going through $A_4$ parallel to $\langle A_1, P\rangle$.
These last two lines intersect in $A_5$. Figure 9 approximately here.

The corresponding construction in the first image plane follows the same pattern and is shown in figure 10. In what follows, we denote by $v_{pq}$ the vanishing point of the image line $\langle p, q\rangle$, where $p$ and $q$ are any two of the image points involved; for example, $v_{a_1a_2}$ is the vanishing point of the line $\langle a_1, a_2\rangle$.
1. Construct, using the Line-Plane procedure, the vanishing points $v_{a_1a_2}$ and $v_{a_1a_3}$; $p$ is at the intersection of $\langle a_3, v_{a_1a_2}\rangle$ and $\langle a_2, v_{a_1a_3}\rangle$.
2. Construct, using the Line-Plane procedure, the vanishing point $v_{a_1a_4}$ of the line $\langle a_1, a_4\rangle$.
3. Construct, using the Line-Plane procedure, the vanishing point $r$ of the line $\langle a_1, p\rangle$.
The point $a_5$ is at the intersection of $\langle a_4, r\rangle$ and $\langle p, v_{a_1a_4}\rangle$.
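The image construction of $a_5$ is a sequence of joins and meets of homogeneous points and lines, which the NumPy sketch below spells out. It assumes that $v_{a_1a_2}$, $v_{a_1a_3}$ and $v_{a_1a_4}$ have already been obtained (by the Line-Plane procedure, not implemented here). One deliberate deviation from the text: instead of a third Line-Plane call for $r$, the sketch intersects $\langle a_1, p\rangle$ with the vanishing line $\langle v_{a_1a_2}, v_{a_1a_3}\rangle$ of the plane $(A_1, A_2, A_3)$, which is valid because $\langle A_1, P\rangle$ lies in that plane. Function names are ours.

```python
import numpy as np


def join(p, q):
    """Line through two homogeneous image points (normalized for conditioning)."""
    l = np.cross(p, q)
    return l / np.linalg.norm(l)


def meet(l, m):
    """Intersection point of two homogeneous image lines."""
    x = np.cross(l, m)
    return x / np.linalg.norm(x)


def construct_a5(a1, a2, a3, a4, v12, v13, v14):
    """Image a5 of A5 = A1 + e1 + e2 + e3, following the three image steps.

    v12, v13, v14: vanishing points of <a1,a2>, <a1,a3>, <a1,a4> (assumed given).
    """
    p = meet(join(a3, v12), join(a2, v13))   # step 1
    r = meet(join(a1, p), join(v12, v13))    # vanishing point of <a1, p> (shortcut, see above)
    a5 = meet(join(a4, r), join(p, v14))     # <a4, r> meets <p, v14> at a5
    return a5 / a5[2]
```

Running the same function on the primed quantities gives $a'_5$, which completes the projective basis needed by the scheme of section 5.2.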
Figure 10 approximately here. The second method may be more intuitive and can be found in [24]. According to figure 11, in order to compute the affine coordinates of $M$, we need to construct the images of two points: the point $Q_4$ on the line $\langle A_1, A_4\rangle$ such that the line $\langle M, Q_4\rangle$ is parallel to the plane $(A_1, A_2, A_3)$, and the point $Q$, intersection of the line going through $M$ parallel to $\langle A_1, A_4\rangle$ with the plane $(A_1, A_2, A_3)$. From $Q$ we then compute $Q_2$ (resp. $Q_3$), the intersection of the line going through $Q$ parallel to $\langle A_1, A_3\rangle$ (resp. parallel to $\langle A_1, A_2\rangle$) with $\langle A_1, A_2\rangle$ (resp. with $\langle A_1, A_3\rangle$). The three affine coordinates of $M$ are the ratios $\frac{A_1Q_i}{A_1A_i}$, $i = 2, 3, 4$. Introducing the points at infinity $A_{1i}^\infty$, $i = 2, 3, 4$, of the three lines $\langle A_1, A_i\rangle$, these ratios are in fact equal to the cross-ratios $\{A_1, A_{1i}^\infty, Q_i, A_i\}$, which are preserved by perspective projection and can therefore be computed from the images.

Figure 11 approximately here. The images of $Q$ are readily obtained: $v_{a_1a_4}$ (resp. $v'_{a_1a_4}$) can be constructed through the Line-Plane procedure. This pair, together with the pair $(m, m')$, defines a 3-D line, namely the line through $M$ parallel to $\langle A_1, A_4\rangle$, and, applying the Line-Plane procedure a second time, we construct the images $(q, q')$ of the intersection of that line with the plane $(A_1, A_2, A_3)$. Once this construction has been completed, one notices that the line $\langle M, Q_4\rangle$ is parallel to the line $\langle A_1, Q\rangle$. We thus construct, using the Line-Plane procedure again, the vanishing points $v_{a_1q}$ and $v'_{a_1q}$ of the lines $\langle a_1, q\rangle$ and $\langle a'_1, q'\rangle$. $q_4$ (resp. $q'_4$) is then obtained as the intersection of the line $\langle v_{a_1q}, m\rangle$ (resp. $\langle v'_{a_1q}, m'\rangle$) with the line $\langle a_1, a_4\rangle$ (resp. $\langle a'_1, a'_4\rangle$). From $v_{a_1a_2}$ and $v_{a_1a_3}$ (resp. $v'_{a_1a_2}$ and $v'_{a_1a_3}$) we construct the points $q_2$ and $q_3$ (resp. $q'_2$ and $q'_3$), intersections of the lines $\langle q, v_{a_1a_3}\rangle$ and $\langle q, v_{a_1a_2}\rangle$ (resp. $\langle q', v'_{a_1a_3}\rangle$ and $\langle q', v'_{a_1a_2}\rangle$) with $\langle a_1, a_2\rangle$ and $\langle a_1, a_3\rangle$ (resp. with $\langle a'_1, a'_2\rangle$ and $\langle a'_1, a'_3\rangle$). The affine coordinates $(X, Y, Z)$ of $M$ are then obtained in either one of the two images as the following cross-ratios:

$$X = \{a_1, v_{a_1a_2}, q_2, a_2\} = \{a'_1, v'_{a_1a_2}, q'_2, a'_2\}$$
$$Y = \{a_1, v_{a_1a_3}, q_3, a_3\} = \{a'_1, v'_{a_1a_3}, q'_3, a'_3\}$$
$$Z = \{a_1, v_{a_1a_4}, q_4, a_4\} = \{a'_1, v'_{a_1a_4}, q'_4, a'_4\}$$
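The second construction can be followed step by step in one image with the NumPy sketch below. It assumes that $q$ and the vanishing points $v_{a_1a_2}$, $v_{a_1a_3}$, $v_{a_1a_4}$ are already available from the Line-Plane procedure (not implemented here). As a shortcut of ours, $v_{a_1q}$ is obtained as the intersection of $\langle a_1, q\rangle$ with the vanishing line $\langle v_{a_1a_2}, v_{a_1a_3}\rangle$, which is valid because $\langle A_1, Q\rangle$ lies in the plane $(A_1, A_2, A_3)$. The cross-ratio uses the convention under which $\{A_1, A_{1i}^\infty, Q_i, A_i\} = A_1Q_i/A_1A_i$.

```python
import numpy as np


def join(p, q):
    l = np.cross(p, q)                  # line through two points
    return l / np.linalg.norm(l)


def meet(l, m):
    x = np.cross(l, m)                  # intersection of two lines
    return x / np.linalg.norm(x)


def cross_ratio(p1, p2, p3, p4):
    """{p1, p2; p3, p4}: p3, p4 written in the basis (p1, p2)."""
    basis = np.column_stack([p1, p2])
    (a3, b3), *_ = np.linalg.lstsq(basis, p3, rcond=None)
    (a4, b4), *_ = np.linalg.lstsq(basis, p4, rcond=None)
    return (b3 / a3) / (b4 / a4)


def affine_coordinates(a1, a2, a3, a4, m, q, v12, v13, v14):
    """(X, Y, Z) of M from a single image, following the second construction."""
    v1q = meet(join(a1, q), join(v12, v13))   # vanishing point of <a1, q>
    q4 = meet(join(v1q, m), join(a1, a4))
    q2 = meet(join(q, v13), join(a1, a2))
    q3 = meet(join(q, v12), join(a1, a3))
    return (cross_ratio(a1, v12, q2, a2),
            cross_ratio(a1, v13, q3, a3),
            cross_ratio(a1, v14, q4, a4))
```

Evaluating the same function on the primed quantities should return the same triple, which is a convenient consistency check on real data.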

These coordinates are invariant, by definition, under any affine transformation of the world. We have therefore computed an affine invariant representation of the world from a pair of weakly calibrated cameras for which the collineation induced by the plane at infinity is known. As a final remark, in many cases one may not be interested in computing these affine coordinates, but only in computing affine invariant three-dimensional properties of the scene such as ratios of lengths or midpoints, in checking affine invariant three-dimensional properties of the scene such as the parallelism of lines and planes, or even in performing affine invariant constructions such as drawing a line going through a given point and parallel to a given line, constructing the midpoint of a line segment, etc. All these operations can be performed without choosing coordinates, just by using the fundamental matrix and the knowledge of the plane at infinity. If coordinates must be computed, this can also be done directly from the images themselves, without explicitly reconstructing the points.

7 Recovering the euclidean stratum

In this section we push our ideas to their final stage and show that a stereo rig for which the fundamental matrix, the collineation induced by the plane at infinity and the two images of the absolute conic have been estimated allows us to recover the third stratum of the world, its euclidean structure or, more precisely, its structure up to a similitude. In fact, as shown later, knowing both the collineation at infinity and the two images of the absolute conic is redundant.

7.1 Estimating the image of the absolute conic

We now describe several ways of estimating the image of the absolute conic. First, in some cases, some similitude invariants of the scene may be available. For example, we may know the angle between two lines, or the ratio of the lengths of two non-parallel segments. Each such piece of information yields a constraint on the image of the absolute conic, an idea that is used, at least in the case of angles, in [25].

7.1.1 A priori information about the scene

If we know the angle $\theta$ between two lines in the world, then, according to the analysis of section 3.4.2 and to equation (16), this yields the following constraint on the coefficients of the equation of $\omega$, where $m$ and $n$ are the vanishing points of the two lines:

$$S(m, n)^2 = S(m)S(n)\cos^2\theta \qquad (19)$$

This equation is seen to be a quadratic constraint on the coefficients of the equation of $\omega$. If we have two images of the scene for which we know the plane at infinity (see section 6), then we can obtain the vanishing point $v_{pq}$ of any line $\langle p, q\rangle$. Now, if we know the ratio of the lengths of two non-coplanar segments $AB$ and $CD$, we can use equation (27), which will be derived in section 7.3.2, to derive another constraint on the coefficients of the equation of $\omega$. Let us call $r$ the (known) ratio $\frac{AB}{CD}$ and define

$$D(v_{pq}, v_{st}) = S(v_{pq}, v_{st})^2 - S(v_{pq})S(v_{st})$$

Using equations (19) and (27), we obtain the following constraint on the coefficients of the equation of $\omega$:

$$D(v_{ac}, v_{bc})\,D(v_{cd}, v_{bd})\,S(v_{ab}) = r^2\,D(v_{ac}, v_{ab})\,D(v_{bc}, v_{bd})\,S(v_{cd}) \qquad (20)$$

which is seen to be a polynomial of degree 5 in the coefficients of the equation of $\omega$. A similar constraint on the image of the absolute conic in the second image can be written. If that image has been obtained with the same camera without changing the internal parameters, then we obtain a second constraint on $\omega$.
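Constraints (19) and (20) can be evaluated as residuals for a candidate conic matrix $\omega$, which is how they would feed a solver. In this sketch, $S(p, q) = p^T\omega q$ is our assumed matrix form of the bilinear form, and the vanishing points are passed in a dictionary keyed by the segment names; both conventions, and the function names, are ours. Both residuals are insensitive to the arbitrary homogeneous scale (and sign) of each vanishing point.

```python
import numpy as np


def S(omega, p, q=None):
    """S(p, q) = p^T omega q and S(p) = S(p, p)."""
    return p @ omega @ (p if q is None else q)


def D(omega, p, q):
    """D(p, q) = S(p, q)^2 - S(p) S(q), as defined before (20)."""
    return S(omega, p, q) ** 2 - S(omega, p) * S(omega, q)


def angle_residual(omega, m, n, theta):
    """Left minus right side of (19) for vanishing points m, n and a known angle."""
    return S(omega, m, n) ** 2 - S(omega, m) * S(omega, n) * np.cos(theta) ** 2


def ratio_residual(omega, v, r):
    """Left minus right side of (20); v maps 'ab', 'ac', 'bc', 'bd', 'cd' to points."""
    lhs = D(omega, v["ac"], v["bc"]) * D(omega, v["cd"], v["bd"]) * S(omega, v["ab"])
    rhs = r ** 2 * D(omega, v["ac"], v["ab"]) * D(omega, v["bc"], v["bd"]) * S(omega, v["cd"])
    return lhs - rhs
```

Each known angle contributes one quadratic residual and each known length ratio one quintic residual; enough of them can be minimized jointly over the coefficients of $\omega$.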

7.1.2 Moving the camera and using Kruppa's equations

If no a priori information about the scene is available, we can still estimate the image of the absolute conic by using motions of the cameras. Note that the camera motions do not have to be known and can be anything as long as they are not pure translations or pure rotations, as shown in section 7.2.4. This observation was made in [26] and turned into an algorithm and a working method in [2, 1]. We now show that each such motion yields two quadratic polynomial equations in the coefficients of the equation of the dual of the image of the absolute conic. To do this, we note that if we move a camera from position 1 to position 2 without changing its internal parameters, the image of the absolute conic remains the same, as was pointed out in sections 3.4.2 and 4.4. Also, as we noticed in section 4.4.3, the two tangents from the epipoles to this image correspond to each other in the epipolar homography. Expressing these two facts algebraically yields the two equations. Let $B$ be the matrix of the dual of the image of the absolute conic. Let $m$ be a point in the retinal plane and $e$ the epipole. The line $\langle e, m\rangle$ is represented by the cross-product $e \wedge m$, which we write in matrix form $[e]_\times m$, with $[e]_\times$ being the antisymmetric matrix representing the cross-product with the vector $e$. To say that this line is tangent to $\omega$ is equivalent to saying that the point of the dual plane represented by $[e]_\times m$ is on the dual conic $\omega^*$. Hence we write the algebraic equation

$$([e]_\times m)^T B\,[e]_\times m = 0 \quad\text{or}\quad m^T [e]_\times B\,[e]_\times m = 0 \qquad (21)$$

This quadratic equation in the coordinates of $m$ is the equation of the two tangents from $e$ to $\omega$. From the previous considerations, for each point $m$ on either one of these two tangents, its epipolar line must also be tangent to the image of the absolute conic in the second image. But we have seen that the image of the absolute conic in the second image is identical to its image in the first. The same is of course true of the dual conics. Therefore, introducing the fundamental matrix $F$, we can write that

$$m^T F^T B F m = 0 \qquad (22)$$

if and only if the point $m$ is on one of the previous two tangents. This second quadratic equation in the coordinates of $m$ therefore also defines the two tangents from $e$ to $\omega$ and, thus, the two equations (21) and (22) are equivalent: their matrices are proportional. This yields a priori five quadratic equations in the coefficients of $B$. But in fact, because equations (21) and (22) represent a pair of lines and not a general conic, only two of these five equations are independent, since it is sufficient to look at the intersection of the tangents with another line not going through $e$. In [27, 28, 26, 2], the line was chosen to be the line at infinity, but in principle any line not going through $e$ will do. The two equations are called the Kruppa equations in recognition of the work of the Austrian mathematician Kruppa [29], who worked on a variant of a problem posed by Chasles [30]. For details about the implementation of these ideas and experimental results, see [1, 4].
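A quick numerical way to see the Kruppa equations is to build the two conics of (21) and (22) for a candidate $B$ and measure how far they are from being proportional. The sketch below does this with NumPy; the proportionality measure (distance between the unit-norm matrices, up to sign) is our own choice and is not the two-equation formulation of the text, which intersects the tangents with a line. The usage example assumes $P = K[I \mid 0]$, $P' = K[R \mid t]$, so that $F = K^{-T}[t]_\times RK^{-1}$ and $B = KK^T$.

```python
import numpy as np


def skew(v):
    """[v]_x, the antisymmetric matrix of the cross-product with v."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])


def kruppa_conics(F, B):
    """Conics (21) and (22) for a candidate dual conic B (same camera in both views)."""
    e = np.linalg.svd(F)[2][-1]          # epipole in the first image: F e = 0
    return skew(e) @ B @ skew(e), F.T @ B @ F


def kruppa_residual(F, B):
    """Distance between the two unit-norm conics, up to sign (0 iff proportional)."""
    c21, c22 = (C.ravel() / np.linalg.norm(C) for C in kruppa_conics(F, B))
    return min(np.linalg.norm(c21 - c22), np.linalg.norm(c21 + c22))
```

For a pure translation the residual vanishes for every $B$, which is the degenerate case discussed in section 7.2.4.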

7.2 From a pair of images of the absolute conic to the plane at infinity

We have now estimated the images $\omega$ and $\omega'$ of the absolute conic in the two retinal planes of our stereo rig. Since the absolute conic lies in the plane at infinity, from the two conics and the epipolar geometry defined by the fundamental matrix, we should be able to recover the plane at infinity.

7.2.1 From two images of the absolute conic to $H_\infty$

As shown previously, this is equivalent to estimating the collineation that the plane at infinity induces between the two retinal planes and, for that matter, three point correspondences are sufficient. The question is of course how to obtain such correspondences. We may think of choosing a point $m$ on $\omega$ and computing the intersection of its epipolar line with $\omega'$ to obtain a corresponding point $m'$ on $\omega'$. The problem with this approach is that $m$ has complex coordinates and that its epipolar line (also a complex line) intersects $\omega'$ in general in two complex points. Thus we have an ambiguity. One way around this difficulty is to do a bit of geometry. Let $m$ be a point in the first retina. The optical ray $\langle C, m\rangle$ intersects the plane at infinity in a point which we denote by $M_\infty$. How can we build the image $m'_\infty$ of $M_\infty$ in the second retinal plane? This point is on the epipolar line $l'_m$ of $m$, and it is such that the angle between the line $\langle C, C'\rangle$ and the optical ray $\langle C', m'_\infty\rangle$ is the same as the angle between $\langle C, C'\rangle$ and the optical ray $\langle C, m\rangle$ (the two rays are parallel, since they meet at infinity). But this angle is known since we know $\omega$ and the epipoles: considering the two points $a$ and $b$ of intersection of the epipolar line $\langle e, m\rangle$ with $\omega$, the angle is given by the cross-ratio $\{e, m, a, b\}$. Therefore, considering the two points of intersection $a'$ and $b'$ of $l'_m$ with $\omega'$, $m'_\infty$ can be built as the point of $l'_m$ such that the cross-ratio $\{e', m'_\infty, a', b'\}$ is equal to $\{e, m, a, b\}$. These two cross-ratios are of course equal to the cross-ratio $\{E, M_\infty, A, B\}$, as shown in figure 12. The details of the computation can be found in appendix A. Figure 12 approximately here.

The situation is of course symmetric between the two retinal planes: given a point $m'$ in the second retinal plane, we can similarly build the image $m_\infty$ of the intersection $M'_\infty$ of the optical ray $\langle C', m'\rangle$ with the plane at infinity. We can thus build an arbitrarily large number of pairs of point correspondences $(m, m'_\infty)$ or $(m_\infty, m')$ corresponding to points in the plane at infinity. From these pairs, the collineation $H_\infty$ can be estimated.
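For the last step, estimating $H_\infty$ from the pairs $(m, m'_\infty)$, one standard option is the direct linear transform (DLT) sketched below. This is a generic substitute rather than the text's method: the plain DLT needs at least four pairs and ignores the fundamental matrix, whereas the text notes that three pairs suffice when $F$ is known. In practice one would also normalize the pixel coordinates first.

```python
import numpy as np


def homography_dlt(x, xp):
    """Least-squares H with xp ~ H x from n >= 4 homogeneous pairs (rows of x, xp).

    Each pair contributes two rows of the linear system xp x (H x) = 0.
    """
    rows = []
    for p, (u, v, w) in zip(x, xp):
        rows.append(np.concatenate([np.zeros(3), -w * p, v * p]))
        rows.append(np.concatenate([w * p, np.zeros(3), -u * p]))
    H = np.linalg.svd(np.array(rows))[2][-1].reshape(3, 3)   # right null vector
    return H / np.linalg.norm(H)
```

Because the construction can generate as many pairs as needed, the overdetermined least-squares form is the natural one here.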

7.2.2 A three-dimensional euclidean interpretation of $H_\infty$

Let us show that after an affine change of coordinates in the two retinal planes such that the equation of the image of the absolute conic in the first image (resp. the second) is $p^Tp = 0$ (resp. $p'^Tp' = 0$), the matrix of $H_\infty$ is proportional to a rotation matrix. Indeed, let us suppose that we change coordinate systems in the two retinal planes and define $p = W^Tm$ and $p' = W'^Tm'$, where $W$ and $W'$ are defined from the equations of $\omega$ and $\omega'$ as explained in section 4.4.2. The equations of the images $\omega$ and $\omega'$ of the absolute conic are $S(p) = p^Tp = 0$ and $S'(p') = p'^Tp' = 0$. Let $p$ be a point of $\omega$ and $p'$ its image under $H_\infty$. $p'$ belongs to $\omega'$ and therefore $S'(p') = 0$. But this is also equal to $p^TH_\infty^TH_\infty p$ and must equal 0 for all points $p$ of $\omega$, i.e. $H_\infty^TH_\infty$ is proportional to the identity matrix $I$. This shows that the matrix $H_\infty$ is proportional to a rotation matrix $R^T$. This matrix has a very intuitive interpretation. If we consider an orthonormal system of coordinates centered at the optical center $C_1$ (resp. $C_2$), with vectors parallel to the optical axis $\langle C_1, c_1\rangle$ (resp. $\langle C_2, c_2\rangle$) and to the $u'_1$ and $v'_1$ orthogonal directions (resp. the $u'_2$ and $v'_2$ orthogonal directions), the matrix $R$ is the one transforming the directions of the axes of the first coordinate system into those of the second; see figure 13. If we express $H_\infty$ in the pixel coordinate systems, we obtain (up to a scale factor)

$$H_\infty = W'^{-T} R^T W^T$$

which shows that in the case where the two cameras are identical, i.e. $W = W'$, $H_\infty$ is such that

$$R^T = W^T H_\infty W^{-T}$$

Hence, using the fact that $R$ is orthogonal, scaling $H_\infty$ so that $\det H_\infty = 1$, and introducing the matrix $A = WW^T$, we write:

$$A = H_\infty^T A H_\infty$$

This shows that if the collineation $H_\infty$ has been estimated by some means, perhaps by matching points between the two views, then this matrix equation can be used to solve for the coefficients of $A$ and then, by the Cholesky decomposition, for those of $W$. This idea has been proposed and put into a working algorithm by Hartley [31]. Figure 13 approximately here.

The idea that was used to construct $H_\infty$ from $\omega$ and $\omega'$ can be used to compute directly the vanishing points of a pair $(l, l')$ of image lines without using the Line-Plane procedure. Here is how it works. Suppose that $l$ is defined by the two points $(m, p)$ (resp. $l'$ is defined by the two points $(m', p')$). Note that we do not require that either $m$ and $m'$ or $p$ and $p'$ be corresponding points. We then build $m'_\infty$ and $p'_\infty$ in the second retinal plane, images of the points at infinity of the optical rays $\langle C, m\rangle$ and $\langle C, p\rangle$, and $m_\infty$ and $p_\infty$ in the first retinal plane, images of the points at infinity of the optical rays $\langle C', m'\rangle$ and $\langle C', p'\rangle$. The vanishing point $q$ of $l$ (resp. $q'$ of $l'$) is then obtained as the intersection of $l$ and $\langle m_\infty, p_\infty\rangle$ (resp. of $l'$ and $\langle m'_\infty, p'_\infty\rangle$). We call this procedure the Vanishing-Point procedure.
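Solving $A = H_\infty^T A H_\infty$ is a linear problem in the six entries of the symmetric matrix $A$, followed by a Cholesky factorization; the NumPy sketch below illustrates both steps. One assumption is ours and is stated plainly: with a single $H_\infty$ the linear system generically has a two-dimensional solution space (the rotation axis contributes a spurious solution), so the sketch stacks the equations from two collineations obtained from motions with different rotation axes. The usage example takes $H_\infty = KRK^{-1}$, for which $A = (KK^T)^{-1}$ and $W = K^{-T}$; the function names are ours.

```python
import numpy as np

PAIRS = [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]  # independent entries


def sym(a):
    """Symmetric 3x3 matrix from its six independent entries."""
    A = np.zeros((3, 3))
    for k, (i, j) in enumerate(PAIRS):
        A[i, j] = A[j, i] = a[k]
    return A


def solve_A(Hs):
    """Solve H^T A H = A (A symmetric, up to scale) for a list of collineations."""
    rows = []
    for H in Hs:
        H = H / np.cbrt(np.linalg.det(H))              # fix the scale: det H = 1
        basis = [H.T @ sym(np.eye(6)[k]) @ H - sym(np.eye(6)[k]) for k in range(6)]
        for i, j in PAIRS:                             # one equation per entry
            rows.append([Bk[i, j] for Bk in basis])
    A = sym(np.linalg.svd(np.array(rows))[2][-1])
    return A if A[2, 2] > 0 else -A                    # positive-definite sign


def intrinsics_from_A(A):
    """Cholesky A = W W^T (W lower triangular); with W = K^{-T} this returns K."""
    W = np.linalg.cholesky(A)
    K = np.linalg.inv(W).T
    return K / K[2, 2]
```

With real data the entries of $A$ can differ by many orders of magnitude in pixel units, so normalizing the image coordinates beforehand is advisable.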

7.2.3 How $H_\infty$ constrains $\omega$ and $\omega'$

We show in the next section how to use both the homography of the plane at infinity and the images of the absolute conic to compute ratios of distances, an invariant for the group of similitudes. But before going into this, we study how the knowledge of the collineation $H_\infty$ of the plane at infinity constrains $\omega$ and $\omega'$. Since $\omega$ and $\omega'$ are the images of the absolute conic $\Omega$ in the two retinas, their duals $\omega^*$ and $\omega'^*$ are the images of the dual conic $\Omega^*$. As $H_\infty$ is the collineation from the first retinal plane to the second induced by $\Pi_\infty$, $H_\infty^{-T}$ represents the collineation from the dual of the first retinal plane to the dual of the second. In other words, lines of $R$ are transformed into lines of $R'$ by the collineation $H_\infty^{-T}$. Indeed, let $l'$ be a line of $R'$. Its equation can be written

$$l'^T m' = 0 \qquad (23)$$

But for every point $m'$ of $R'$ there exists a unique point $m$ of $R$ such that

$$m' = H_\infty m$$

Replacing $m'$ by its value in (23), we obtain

$$(H_\infty^T l')^T m = 0$$

and therefore $l'$ is the image of the line $l$ of $R$ represented by $H_\infty^T l'$, i.e. $l' = H_\infty^{-T} l$.

Let now $l$ and $l'$ be corresponding lines for the collineation $H_\infty$. If $l$ belongs to $\omega^*$, we have $l^TBl = 0$. But $l'$ must then belong to $\omega'^*$ and satisfy $l'^TB'l' = 0$. This implies the following relation between $B$ and $B'$:

$$H_\infty B H_\infty^T = B' \qquad (24)$$

which imposes six homogeneous linear constraints on the coefficients of $B$ and $B'$. If the epipolar geometry is also known, a reasoning similar to that of the previous section shows that

$$F^T B F = B' \qquad (25)$$

which yields another set of six homogeneous linear equations which, according to (14), are not independent of (24).
7.2.4 The Longuet-Higgins equation, pure camera translation

Let us finish this section by giving an interpretation of the fundamental matrix $F$ when each retinal plane is referred to its normalized coordinates. We know that in this case we can choose $H_\infty = R^T$. We then write equation (14):

$$RF + F^TR^T = 0$$

which says that the matrix $F^TR^T$ is antisymmetric. Let us write it $[t]_\times$, where $t$ is a vector. We thus have

$$F^T = [t]_\times R$$

which shows that the transpose of the fundamental matrix is nothing but the essential matrix $E$ defined by Longuet-Higgins in his 1981 paper [32]. The properties of this matrix have subsequently been studied by several authors [33, 34, 22]. The vector $t$ which appears in the definition is parallel to the line $\langle C_1, C_2\rangle$. Equation (13) becomes the well-known Longuet-Higgins equation

$$m^T E m' = 0$$

This allows us to interpret the vector $t$ just introduced as the direction of the translation between the two optical centers, i.e. the direction of the line $\langle C_1, C_2\rangle$ in figure 13. Returning to equations (21) and (22), we see that if the displacement between the two camera positions is a pure translation, i.e. $R = I$, we have $F^T = [t]_\times = [e]_\times$ (up to scale, since $e$ is then proportional to $t$). Equations (21) and (22) are then identical and there are no Kruppa equations in that case.
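The derivation above can be checked numerically: in normalized coordinates, $F^TR^T$ must be antisymmetric, and its three independent entries give $t$. The sketch assumes the convention implied by $m^TEm' = 0$ with $E = [t]_\times R$, namely that a 3-D point expressed in the second camera frame as $X_2$ is $X_1 = RX_2 + t$ in the first; the function name and the tolerance are ours.

```python
import numpy as np


def skew(v):
    """[v]_x, the antisymmetric matrix of the cross-product with v."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])


def translation_from_F(F, R, tol=1e-9):
    """Normalized coordinates: F^T R^T = [t]_x; return t (up to the scale of F)."""
    T = F.T @ R.T
    if not np.allclose(T, -T.T, atol=tol * np.abs(T).max()):
        raise ValueError("F^T R^T is not antisymmetric: F and R are inconsistent")
    return np.array([T[2, 1], T[0, 2], T[1, 0]])
```

With $R = I$ the same function simply reads $t$ off $F^T = [t]_\times$, the pure-translation case in which the Kruppa equations disappear.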

7.3 From a pair of images of the absolute conic and the plane at infinity to similitude invariants

7.3.1 Angles

We know from a previous section that the knowledge of the image of the absolute conic in one camera allows us to compute angles between optical rays. Knowing the images of the absolute conic in two cameras allows us to compute angles between any lines. For example, given three points $A, B, C$ in the world with images $a, b, c$ (resp. $a', b', c'$) in the first (resp. second) retinal plane, how do we compute the angle between the lines $\langle A, B\rangle$ and $\langle A, C\rangle$? Let $P$ (resp. $Q$) be the point at infinity of $\langle A, B\rangle$ (resp. $\langle A, C\rangle$). The angle is obtained by a straightforward application of Laguerre's formula. In order to compute the cross-ratio that appears in it, we must be able to compute the images $p, q$ (resp. $p', q'$) of $P$ and $Q$, i.e. the vanishing points of the image lines $\langle a, b\rangle$ (resp. $\langle a', b'\rangle$) and $\langle a, c\rangle$ (resp. $\langle a', c'\rangle$). This is possible since, according to the previous section, we know the plane at infinity. We can thus call upon our Line-Plane or Vanishing-Point procedures. More precisely, according to equation (16), we have

$$\cos(\langle A, B\rangle, \langle A, C\rangle) = -\frac{S(p, q)}{\sqrt{S(p)S(q)}} = -\frac{S'(p', q')}{\sqrt{S'(p')S'(q')}} \qquad (26)$$

and of course the sine is obtained as $\sqrt{1 - \cos^2(\langle A, B\rangle, \langle A, C\rangle)}$. More generally, the angle between two general lines $\langle A, B\rangle$ and $\langle C, D\rangle$, not necessarily coplanar, is obtained by considering the point at infinity $P$ of $\langle A, B\rangle$ and the point at infinity $Q$ of $\langle C, D\rangle$ (i.e. the directions of the two lines) and computing the cross-ratio of $P$, $Q$ and the points of intersection of the line $\langle P, Q\rangle$ with the absolute conic, this computation being of course performed in the images, i.e. using equation (26).
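The closed form (26) and Laguerre's formula can be compared numerically. In this sketch, $S(p, q) = p^T\omega q$ with $\omega$ given as a symmetric matrix is our assumption, and the two complex intersection points of $\langle p, q\rangle$ with $\omega$ are computed explicitly by writing the points of the line as $p + \lambda q$, so that $p$ sits at $\lambda = 0$ and $q$ at $\lambda = \infty$. Because the homogeneous sign of a vanishing point is arbitrary and lines are unoriented, both results are only meaningful up to sign and modulo $\pi$, so the usage example compares $|\cos|$.

```python
import numpy as np


def S(omega, p, q=None):
    """S(p, q) = p^T omega q and S(p) = S(p, p)."""
    return p @ omega @ (p if q is None else q)


def angle_closed_form(omega, p, q):
    """Equation (26): angle from the vanishing points p and q."""
    return np.arccos(-S(omega, p, q) / np.sqrt(S(omega, p) * S(omega, q)))


def angle_laguerre(omega, p, q):
    """theta = log({p, q; i, j}) / 2i, with i, j the intersections of <p, q> and omega.

    With p at lam = 0 and q at infinity, the cross-ratio reduces to lam_i / lam_j.
    """
    s, sp, sq = S(omega, p, q), S(omega, p), S(omega, q)
    w = np.sqrt(sp * sq - s ** 2)        # positive for a definite omega
    lam_i, lam_j = (-s + 1j * w) / sq, (-s - 1j * w) / sq
    return (np.log(lam_i / lam_j) / 2j).real
```

Working out `lam_i / lam_j` by hand gives a unit complex number whose half-argument has cosine $-S(p,q)/\sqrt{S(p)S(q)}$, which is where the minus sign of (26) comes from in this parametrization.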

7.3.2 Ratios of lengths

Let us now describe how we can compute the other type of similitude invariants, the ratios of lengths. Using the fact that angles can be computed, we show how to use them to compute ratios of lengths. Let us consider four points $A$, $B$, $C$ and $D$, and suppose we want to compute the ratio $\frac{AB}{CD}$. Considering the two triangles $ABC$ and $BCD$, as shown in figure 14, we can write, for the first triangle,

$$\frac{AB}{\sin\widehat{ACB}} = \frac{BC}{\sin\widehat{BAC}}$$

and, for the second,

$$\frac{BC}{\sin\widehat{BDC}} = \frac{CD}{\sin\widehat{CBD}}$$

from which we obtain the ratio $\frac{AB}{CD}$ as a function of the four angles:

$$\frac{AB}{CD} = \frac{\sin\widehat{ACB}\,\sin\widehat{BDC}}{\sin\widehat{BAC}\,\sin\widehat{CBD}} \qquad (27)$$

Figure 14 approximately here. Figure 15 shows the computation in the first image plane. Using the Line-Plane or Vanishing-Point procedure, we construct the five vanishing points $v_{ab}$, $v_{ac}$, $v_{bc}$, $v_{bd}$ and $v_{cd}$ of the image lines $\langle a, b\rangle$, $\langle a, c\rangle$, $\langle b, c\rangle$, $\langle b, d\rangle$ and $\langle c, d\rangle$. The sines which appear in equation (27) are then obtained through equation (16). More specifically, we obtain the following neat formula for the ratio $\frac{AB}{CD}$ computed in the first image, in which only the equation of $\omega$ appears:

$$\frac{AB}{CD} = \sqrt{\frac{S(v_{ab})}{S(v_{cd})}\cdot\frac{S(v_{ac})S(v_{bc}) - S(v_{ac}, v_{bc})^2}{S(v_{ab})S(v_{ac}) - S(v_{ab}, v_{ac})^2}\cdot\frac{S(v_{bd})S(v_{cd}) - S(v_{bd}, v_{cd})^2}{S(v_{bc})S(v_{bd}) - S(v_{bc}, v_{bd})^2}} \qquad (28)$$

The geometry is shown in figure 15.
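Equation (28) translates directly into code. The sketch below assumes, as in the earlier examples, that $S(p, q) = p^T\omega q$ for a symmetric matrix $\omega$ and that the five vanishing points are passed in a dictionary keyed by segment name; these conventions and the function name are ours. The result does not depend on the homogeneous scale or sign of the vanishing points, since each of them appears to the same degree in the numerator and the denominator.

```python
import numpy as np


def length_ratio(omega, v):
    """AB / CD from equation (28); v maps 'ab', 'ac', 'bc', 'bd', 'cd' to vanishing points."""
    S = lambda p, q=None: p @ omega @ (p if q is None else q)
    N = lambda p, q: S(p) * S(q) - S(p, q) ** 2     # = S(p) S(q) sin^2(angle)
    return np.sqrt(S(v["ab"]) / S(v["cd"])
                   * N(v["ac"], v["bc"]) / N(v["ab"], v["ac"])
                   * N(v["bd"], v["cd"]) / N(v["bc"], v["bd"]))
```

Each factor $N(\cdot, \cdot)$ is, up to the products of $S$ values that cancel in (28), the squared sine of one of the four angles in (27).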

Figure 15 approximately here.

8 Conclusion

Table 1 approximately here. We have reached the end of a fairly long journey in which we have seen that when we look at the physical world with a set of two cameras, there appears a natural hierarchy of geometric descriptions of this world, involving a trilogy of groups of transformations which leave these descriptions invariant, i.e. among which we cannot discriminate. These three groups are the projective, the affine, and the similitude groups. For each of these descriptions, we have indicated a corresponding geometric property of the set of two cameras which, once known, allows us to recover the related description from pairs of corresponding image features. We have also indicated how these properties of the pair of cameras can be estimated from images of the physical world. This is interesting in itself as well as in connection with the psychophysical work of Droulez and Cornilleau [35], in which they showed that humans with normal uncorrected vision wearing distorting lenses could recover, after some time, the ability to make correct metric judgments. Another important aspect of our work is that it clearly shows that, for each subgroup of interest, all three-dimensional invariants of the scene can be estimated directly from the images without performing an explicit 3-D reconstruction of the scene. This may improve stability in applications, in particular because it avoids the problem, mentioned in section 4.4.2, of the hidden projective transformation, but this has to be checked experimentally. Table 1 summarizes the relations between the three strata of the physical world, the geometric properties of the stereo rig, and some of the three-dimensional quantities that can be recovered directly from the images.


A Computing $m'_\infty$

From section 4 we know that the angle $\theta$ between the optical ray $\langle C, m\rangle$ and the baseline $\langle C, C'\rangle$ is given by

$$\cos\theta = -\frac{S(e, m)}{\sqrt{S(e)S(m)}}$$

Therefore we want to find $m'_\infty$ on the epipolar line $l'_m$ such that

$$\frac{S'(e', m'_\infty)}{\sqrt{S'(e')S'(m'_\infty)}}$$

is equal to $-\cos\theta$. Let us choose any point $m'$ on $l'_m$ and write

$$m'_\infty = m' + \lambda e'$$

The problem is to determine $\lambda$. We write

$$S'(e', m'_\infty) = S'(e', m') + \lambda S'(e'), \qquad S'(m'_\infty) = S'(m') + 2\lambda S'(e', m') + \lambda^2 S'(e')$$

We then express the fact that $\cos^2\theta = \frac{S'^2(e', m'_\infty)}{S'(e')S'(m'_\infty)}$. We obtain a quadratic equation in the unknown $\lambda$:

$$\lambda^2 S'^2(e')\sin^2\theta + 2\lambda S'(e')S'(e', m')\sin^2\theta + S'^2(e', m') - S'(e')S'(m')\cos^2\theta = 0$$

In order to compute its roots, we compute the (reduced) discriminant

$$\Delta = \left[S'(e')S'(m') - S'^2(e', m')\right] S'^2(e')\sin^2\theta\cos^2\theta$$

Since, according to equation (15), the quantity $S'(e')S'(m') - S'^2(e', m')$ is positive, our equation has two real roots

$$\lambda = -\frac{S'(e', m')}{S'(e')} \pm \frac{|\cos\theta|\sqrt{S'(e')S'(m') - S'^2(e', m')}}{S'(e')\sin\theta}$$

The sign of the cosine of the angle between $\langle C', m'_\infty\rangle$ and $\langle C, C'\rangle$ is given by $S'(e', m'_\infty)$, which is equal to

$$S'(e', m'_\infty) = S'(e', m') + \lambda S'(e') = \pm\frac{|\cos\theta|}{\sin\theta}\sqrt{S'(e')S'(m') - S'^2(e', m')}$$

Therefore only one of the two roots provides the correct sign, and the solution is unique, as expected.
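The appendix computation can be sketched in NumPy as follows. The sign bookkeeping is where care is needed, so the sketch fixes its own conventions, which are assumptions rather than the text's: $\omega$ and $\omega'$ are positive-definite matrices with $S(x, y) = x^T\omega y$; the epipoles are oriented as $e = P C'$ and $e' = P' C$ (so each points from one optical center toward the other); and $m$, $m'$ are the images $P M$, $P' M$ of a point in front of the cameras, so that $m'$ lies on the same side of $e'$ as $m'_\infty$. Under these conventions, the two rays toward $M_\infty$ make cosines of opposite signs with their respective baseline directions, and the root is chosen accordingly. The function name is ours.

```python
import numpy as np


def point_at_infinity_image(m, e, omega, mp, ep, omega_p):
    """m'_inf = m' + lam e' on the epipolar line of m (Appendix A, oriented sketch)."""
    S = lambda x, y: x @ omega @ y
    Sp = lambda x, y: x @ omega_p @ y
    cos2 = S(e, m) ** 2 / (S(e, e) * S(m, m))
    sin2 = 1.0 - cos2
    E, P, Q = Sp(ep, ep), Sp(ep, mp), Sp(mp, mp)
    # lam^2 E^2 sin2 + 2 lam E P sin2 + P^2 - E Q cos2 = 0
    root_disc = abs(E) * np.sqrt(sin2 * cos2 * (E * Q - P ** 2))  # sqrt of reduced discriminant
    roots = [(-E * P * sin2 + sgn * root_disc) / (E ** 2 * sin2) for sgn in (1.0, -1.0)]
    # With our orientation, S'(e', m'_inf) = P + lam E has the sign opposite to S(e, m).
    lam = roots[0] if np.sign(P + roots[0] * E) == -np.sign(S(e, m)) else roots[1]
    return mp + lam * ep
```

When $\cos\theta = 0$ the two roots coincide and the sign test is moot; the function then returns that double root.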

Figure 1: The ratio of the third to the last projective coordinates of the point $M$ in the projective basis $(e_1, e_2, e_3, e_4, e_5)$ is equal to the cross-ratio of the four planes $(e_1, e_2, e_3)$, $(e_1, e_2, e_4)$, $(e_1, e_2, e_5)$, and $(e_1, e_2, M)$.

Figure 2: The angle $\theta$ between $l_1$ and $l_2$ is given by Laguerre's formula: $\theta = \frac{1}{2i}\log(\{l_1, l_2, i_m, j_m\})$.

Figure 3: The epipolar geometry.

Figure 4: The epipolar pencils.

m1

lm0

m0

m

m012 m12

m03

m01 m3

lm0 12 m02

m2

Figure 5: Construction of the image m0 of the point m under the collineation induced by the plane dened by the three point correspondences (mi  m0i ) i = 1 2 3 and the epipolar geometry.

Figure 6: How to compute the angle between the optical rays $\langle C, m\rangle$ and $\langle C, n\rangle$ using the image of the absolute conic.
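The computation in Figure 6 can be checked against a direct 3-D computation. This sketch makes three assumptions. First, it uses the usual pinhole model, in which a ray of direction $d$ in the camera frame is imaged at $m = A d$. Second, the image of the absolute conic is $\omega = A^{-T}A^{-1}$. Third, $S(x, y) = x^T \omega y$, so that $\cos = S(m, n)/\sqrt{S(m)S(n)}$. The intrinsic matrix `A` and the helper `ray_cosine` are hypothetical.

```python
import numpy as np

def ray_cosine(omega, m, n):
    """Cosine of the angle between the optical rays <C, m> and <C, n>,
    computed from image points only, assuming S(x, y) = x^T omega y."""
    return (m @ omega @ n) / np.sqrt((m @ omega @ m) * (n @ omega @ n))

# Hypothetical intrinsic parameters (with a small skew) and the image of the absolute conic.
A = np.array([[800.0, 2.0, 320.0],
              [0.0, 780.0, 240.0],
              [0.0, 0.0, 1.0]])
A_inv = np.linalg.inv(A)
omega = A_inv.T @ A_inv

# Two ray directions in the camera frame and their images.
d1 = np.array([0.2, -0.1, 1.0])
d2 = np.array([-0.4, 0.3, 1.0])
m, n = A @ d1, A @ d2

direct = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
print(ray_cosine(omega, m, n), direct)  # should agree up to rounding
```

The two values agree because $m^T \omega n = d_1^T A^T A^{-T} A^{-1} A d_2 = d_1 \cdot d_2$. The sign of the result depends on the homogeneous representatives chosen for $m$ and $n$. Here both have a positive third coordinate, which corresponds to points in front of the camera.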

Figure 7: From pixel coordinates $(u, v)$ to normalized coordinates $(u', v')$.

Figure 8: The third affine coordinate of $M$ in the affine basis of origin $A_1$ and basis vectors $\overrightarrow{A_1A_i}$, $i = 2, 3, 4$, is equal to the cross-ratio of the four planes $(A_1, A_2, A_3)$, $(A_1, A_2, A_4)$, $(A_1, A_2, A_5)$, and $(A_1, A_2, M)$.

Figure 9: Three-dimensional construction of $A_5$.

Figure 10: Construction of $a_5$, the image of $A_5$ in the first retina.

Figure 11: Computation of the affine coordinates of $M$.

Figure 12: Construction of the pair $(m, m'_\infty)$ corresponding to the point $M_\infty$ of the plane at infinity $\Pi_\infty$.

Figure 13: The collineation induced by the plane at infinity $\Pi_\infty$ is proportional to the matrix $R$ of the 3-D rotation.

Figure 14: Computing the ratio of the lengths of the two segments $AB$ and $CD$ from the angles marked in the figure.

Figure 15: Computing the ratio of the lengths of the two segments $AB$ and $CD$ from the vanishing points $v_{ab}$, $v_{ac}$, $v_{bc}$, $v_{bd}$, and $v_{cd}$ of the image lines $\langle a, b\rangle$, $\langle a, c\rangle$, $\langle b, c\rangle$, $\langle b, d\rangle$, and $\langle c, d\rangle$.

| Geometric structure | Stereo rig | Invariant measures |
|---|---|---|
| Projective | Fundamental matrix | Cross-ratios |
| Affine | Collineation of the plane at infinity | Ratios of lengths of parallel segments |
| Similitude | Images of the absolute conic | Angles, ratios of lengths of non-parallel segments |

Table 1: Relations between the three strata and the geometric properties of the stereo rig.


References

[1] Olivier D. Faugeras, Tuan Luong, and Steven Maybank. Camera self-calibration: theory and experiments. In Giulio Sandini, editor, Proceedings of the 2nd European Conference on Computer Vision, pages 321-334. Springer-Verlag, Lecture Notes in Computer Science 588, 1992.

[2] Quang-Tuan Luong. Matrice Fondamentale et Calibration Visuelle sur l'Environnement - Vers une plus grande autonomie des systèmes robotiques. PhD thesis, Université de Paris-Sud, Centre d'Orsay, December 1992.

[3] Olivier Faugeras, Bernard Hotz, Hervé Mathieu, Thierry Viéville, Zhengyou Zhang, Pascal Fua, Eric Théron, Laurent Moll, Gérard Berry, Jean Vuillemin, Patrice Bertin, and Catherine Proy. Real time correlation based stereo: algorithm implementations and applications. Research Report 2013, INRIA Sophia-Antipolis, France, 1993. Submitted to The International Journal of Computer Vision.

[4] Quang-Tuan Luong and Olivier Faugeras. The Fundamental matrix: theory, algorithms, and stability analysis. The International Journal of Computer Vision, 1994. To appear.

[5] R. Deriche, Z. Zhang, Q.-T. Luong, and O. Faugeras. Robust Recovery of the Epipolar Geometry for an Uncalibrated Stereo Rig. In Jan-Olof Eklundh, editor, Proceedings of the 3rd European Conference on Computer Vision, pages 567-576, Vol. I, Stockholm, Sweden, 1994. Springer-Verlag, Lecture Notes in Computer Science 800-801.

[6] C. Zeller and Olivier Faugeras. Applications of non-metric vision to some visual guided tasks. In Proc. International Conference on Pattern Recognition, Israel, 1994. To appear.

[7] Jan J. Koenderink and Andrea J. van Doorn. Affine Structure from Motion. Journal of the Optical Society of America, A8:377-385, 1991.

[8] Gunnar Sparr. An algebraic-analytic method for reconstruction from image correspondences. In Proceedings 7th Scandinavian Conference on Image Analysis, pages 274-281, 1991.

[9] Gunnar Sparr. Projective invariants for affine shapes of points configurations. In J.L. Mundy and A. Zisserman, editors, Proceedings of DARPA-ESPRIT Workshop on Applications of Invariance in Computer Vision, pages 151-169, 1991.

[10] A. Sparr and G. Sparr. On a theorem of M. Chasles. Technical Report LUFTD2/TFMA-7001-SE, Lund University, Dept. of Mathematics, 1993.

[11] Gunnar Sparr. A common framework for kinetic depth, reconstruction and motion for deformable objects. In Jan-Olof Eklundh, editor, Proceedings of the 3rd European Conference on Computer Vision, volume 801 of Lecture Notes in Computer Science, pages 471-482. Springer-Verlag, 1994.

[12] Gunnar Sparr. Applications of a theorem of Chasles to computer vision. Technical Report LUFTD2/TFMA-7002-SE, Lund University, Dept. of Mathematics, 1994.

[13] Roger Mohr, Luce Morin, and Enrico Grosso. Relative positioning with uncalibrated cameras. In J.L. Mundy and A. Zisserman, editors, Proceedings of DARPA-ESPRIT Workshop on Applications of Invariance in Computer Vision, Artificial Intelligence Series, chapter 22, pages 440-460. MIT Press, 1992.

[14] Richard Hartley, Rajiv Gupta, and Tom Chang. Stereo from Uncalibrated Cameras. In Proceedings of CVPR92, Champaign, Illinois, pages 761-764, June 1992.

[15] Olivier D. Faugeras. What can be seen in three dimensions with an uncalibrated stereo rig. In Giulio Sandini, editor, Proceedings of the 2nd European Conference on Computer Vision, pages 563-578. Springer-Verlag, Lecture Notes in Computer Science 588, May 1992.

[16] Amnon Shashua. Projective Structure from two Uncalibrated Images: Structure from Motions and Recognition. Technical Report A.I. Memo No. 1363, MIT, September 1992.

[17] Amnon Shashua. Projective Depth: A Geometric Invariant for 3D Reconstruction From Two Perspective/Orthographic Views and For Visual Recognition. In Proc. Fourth International Conference on Computer Vision, pages 583-590, 1993.

[18] Amnon Shashua. Algebraic Functions for Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1994. To appear.

[19] Olivier Faugeras and Luc Robert. What can two images tell us about a third one? The International Journal of Computer Vision, 1994. To appear. Also INRIA Technical report 2018.

[20] Q.-T. Luong and T. Viéville. Canonic representations for the geometries of multiple projective views. In Jan-Olof Eklundh, editor, Proceedings of the 3rd European Conference on Computer Vision, pages 589-599, Vol. I. Springer-Verlag, Lecture Notes in Computer Science 800-801, 1994.

[21] J.G. Semple and G.T. Kneebone. Algebraic Projective Geometry. Oxford: Clarendon Press, 1952. Reprinted 1979.

[22] Olivier D. Faugeras. Three-Dimensional Computer Vision: a Geometric Viewpoint. MIT Press, 1993.

[23] Gene H. Golub and Charles F. Van Loan. Matrix computations. The Johns Hopkins University Press, Baltimore, Maryland, 1983.

[24] T. Moons, L. Van Gool, M. Van Dienst, and E. Pauwels. Affine structure from perspective image pairs under relative translations between object and camera. Technical Report KUL/ESAT/MI2/9306, Katholieke Universiteit Leuven, 1993.

[25] Boubakeur Boufama, Roger Mohr, and Françoise Veillon. Euclidean Constraints for Uncalibrated Reconstruction. Technical Report RT96 IMAG 17 LIFIA, LIFIA, INSTITUT IMAG, March 1993.

[26] S.J. Maybank and O.D. Faugeras. A Theory of Self-Calibration of a Moving Camera. The International Journal of Computer Vision, 8(2):123-152, August 1992.

[27] Olivier D. Faugeras and Steven Maybank. Motion from point matches: multiplicity of solutions. The International Journal of Computer Vision, 4(3):225-246, June 1990. Also INRIA Tech. Report 1157.

[28] Olivier D. Faugeras and Steven Maybank. Mouvement à partir de points : nombre de solutions. Comptes rendus de l'Académie des Sciences de Paris, pages 177-183, 1991. t. 312, Série II.

[29] E. Kruppa. Zur Ermittlung eines Objektes aus zwei Perspektiven mit innerer Orientierung. Sitz.-Ber. Akad. Wiss., Wien, math. naturw. Kl., Abt. IIa., 122:1939-1948, 1913.

[30] M. Chasles. Question No. 296. Nouv. Ann. Math., 14:50, 1855.

[31] Richard Hartley. Self-Calibration from Multiple Views with a Rotating Camera. In Jan-Olof Eklundh, editor, 3rd European Conference on Computer Vision, volume 800 of Lecture Notes in Computer Science, pages 471-478, Stockholm, May 1994. Springer-Verlag.

[32] H.C. Longuet-Higgins. A Computer Algorithm for Reconstructing a Scene from Two Projections. Nature, 293:133-135, 1981.

[33] Thomas S. Huang and Olivier D. Faugeras. Some Properties of the E Matrix in Two-View Motion Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(12):1310-1312, December 1989.

[34] S.J. Maybank. Properties of Essential Matrices. International Journal of Imaging Systems and Technology, 2:380-384, 1990.

[35] Jacques Droulez and Valérie Cornilleau. Adaptive changes in perceptual responses and visuomanual coordination during exposure to visual metrical distortion. Vision Research, 26(11):1783-1792, 1986.

List of Figures

1. The ratio of the third to the last projective coordinates of the point $M$ in the projective basis $(e_1, e_2, e_3, e_4, e_5)$ is equal to the cross-ratio of the four planes $(e_1, e_2, e_3)$, $(e_1, e_2, e_4)$, $(e_1, e_2, e_5)$, and $(e_1, e_2, M)$.
2. The angle between $l_1$ and $l_2$ is given by the Laguerre formula: $\frac{1}{2i}\log(\{l_1, l_2; i_m, j_m\})$.
3. The epipolar geometry.
4. The epipolar pencils.
5. Construction of the image $m'$ of the point $m$ under the collineation induced by the plane defined by the three point correspondences $(m_i, m'_i)$, $i = 1, 2, 3$, and the epipolar geometry.
6. How to compute the angle between the optical rays $\langle C, m\rangle$ and $\langle C, n\rangle$ using the image of the absolute conic.
7. From pixel coordinates $(u, v)$ to normalized coordinates $(u', v')$.
8. The third affine coordinate of $M$ in the affine basis of origin $A_1$ and basis vectors $\overrightarrow{A_1A_i}$, $i = 2, 3, 4$, is equal to the cross-ratio of the four planes $(A_1, A_2, A_3)$, $(A_1, A_2, A_4)$, $(A_1, A_2, A_5)$, and $(A_1, A_2, M)$.
9. Three-dimensional construction of $A_5$.
10. Construction of $a_5$, the image of $A_5$ in the first retina.
11. Computation of the affine coordinates of $M$.
12. Construction of the pair $(m, m'_\infty)$ corresponding to the point $M_\infty$ of the plane at infinity $\Pi_\infty$.
13. The collineation induced by the plane at infinity $\Pi_\infty$ is proportional to the matrix $R$ of the 3-D rotation.
14. Computing the ratio of the lengths of the two segments $AB$ and $CD$ from the angles marked in the figure.
15. Computing the ratio of the lengths of the two segments $AB$ and $CD$ from the vanishing points $v_{ab}$, $v_{ac}$, $v_{bc}$, $v_{bd}$, and $v_{cd}$ of the image lines $\langle a, b\rangle$, $\langle a, c\rangle$, $\langle b, c\rangle$, $\langle b, d\rangle$, and $\langle c, d\rangle$.

List of Tables

1. Relations between the three strata and the geometric properties of the stereo rig.