Human Machine Interfaces Recognition and Tracking

LAAS/ISR Report 04
Recognition and tracking using eigen objects and Kalman filter
Date: 11th August 2005

Enguerran Boissier
LAAS, Universite Paul Sabatier, 118 route de Narbonne, 31062 Toulouse, FRANCE
http://www.iup-ups.ups-tlse.fr/si

Project sponsored by ISR, University of Coimbra - Polo II, Pinhal de Marrocos, 3030-290 Coimbra, PORTUGAL
http://www.isr.uc.pt

Abstract: One application of Human Machine Interfaces is the interpretation of human motions by robots. The hands and the face are human body parts which can convey a lot of information for the interaction between humans and robots. To extract this information with a camera observing the human, we need to recognize and track these parts after having detected them. This report shows the implementation of a face recognition program using eigenfaces and PCA, and of a tracking program using the Kalman filter, both based on the Intel OpenCV library.

Key words: recognition, tracking, OpenCV, Kalman filter, eigenfaces

Contents

1 Introduction
2 Recognition using eigenfaces
  2.1 PCA
  2.2 Implementation
3 Tracking using Kalman filter
  3.1 Matrices initialization
  3.2 State correction
  3.3 State prediction
4 Discussion and results
  4.1 Eigenfaces
  4.2 Kalman filter
5 Conclusion

1 Introduction

Autonomous robots inhabiting an environment need to navigate. Autonomous robots inhabiting a human populated environment also need to interact. The interaction takes place in both directions, though we mainly focus on a human (transmitter) acting on a robot (receiver). Several input modalities are possible (e.g. audio and tactile cues), but we are mainly interested in the visual cue. Humans can speak through their body using fingers, hands, the face or the whole body. In any case the property conveying the information is the 3-D position of the human body part, either static, like a (facial) expression or (body) posture, or dynamic, like (hand) gestures. To extract this information with a camera observing the human, we need to detect, recognize and track these parts. This report continues to present the ongoing work on a human motion tracking interface.

Figure 1: architecture of the system

After the face has been detected using the method described in [1], we want to identify the person with a face recognition method using PCA and eigen objects. If the person belongs to a group of known people, we start to track the hand and face motion using the methods described in [1]. To be able to deal with noise, detection losses and occlusions, we use a Kalman filter to estimate the position of the hands and face. The face recognition system is based on the eigenfaces method introduced by Turk et al. [2]. Eigenvector-based methods extract low-dimensional subspaces which tend to simplify tasks such as classification. The Karhunen-Loeve Transform (KLT) and Principal Component Analysis (PCA) are the eigenvector-based techniques used for dimensionality reduction and feature extraction in automatic face recognition. Face recognition using eigen objects and its implementation with the Intel OpenCV library are shown in chapter 2. On the other hand, the tracking system is based on Kalman filter state predictions. The Kalman filter was published by R.E. Kalman in 1960 and has since been the subject of extensive research and application, particularly in the area of autonomous or assisted navigation. It is a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process in a way that minimizes the mean of the squared error. Chapter 3 explains the Kalman filter and its implementation using OpenCV functions.


2 Recognition using eigenfaces

The face recognition system is based on eigenspace decompositions for face representation and modelling. The learning method estimates the complete probability distribution of the face’s appearance using an eigenvector decomposition of the image space. The face density is decomposed into two components: the density in the principal subspace (containing the traditionally-defined principal components) and its orthogonal complement (which is usually discarded in standard PCA).

2.1 PCA

Given a training set of W × H images, it is possible to form a training set of vectors $x^T$. The basis functions for the Karhunen-Loeve Transform (KLT) are obtained by solving the eigenvalue problem

$$\Delta = \Phi^T \Sigma \Phi \qquad (1)$$

where $\Sigma$ is the covariance matrix, $\Phi$ is the eigenvector matrix of $\Sigma$ and $\Delta$ is the corresponding diagonal matrix of eigenvalues $\lambda_i$. In PCA, a partial KLT is performed to identify the eigenvectors with the largest eigenvalues and obtain a principal component feature vector $y = \Phi_M^T \tilde{x}$, where $\tilde{x} = x - \bar{x}$ is the mean-normalised image vector and $\Phi_M$ is a submatrix of $\Phi$ containing the principal eigenvectors ([3]).
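For concreteness (the report does not spell these estimates out), with a training set of N image vectors $x_i$ the quantities above are typically computed as

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \Sigma = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T,$$

and a face is approximated from its M principal coefficients by the reconstruction

$$\hat{x} = \bar{x} + \Phi_M y,$$

which is the projection used for matching in section 2.2.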

2.2 Implementation

The system architecture consists of three main modules: learning, face detection and face recognition. The first one is the (offline) learning process, in which the system builds the eigenspace of several people. We capture 50 images per person and use a certain number of them to build the database. The image capture is performed automatically using the Haar-like feature detector and produces 32×32 pixel grayscale images. Then the orthonormal eigen basis and the average image for these people are computed (using the function cvCalcEigenObjects from OpenCV [4]) (see figure 2). Once this eigenspace is calculated, the system is able to recognize the face of the person during the tracking process in real time. The first step of the recognition process is to detect the face within the image. This task is accomplished by our face detection system (using Haar-like features [1]). As in the learning process, we then extract the face from the image and resize and convert it (see fig. 2). We then calculate all decomposition coefficients (function cvEigenDecomposite) for the extracted image using the eigen objects of each person in the database. In the next step we calculate the projection of the image from the eigen objects of the person and the coefficients calculated before (function cvEigenProjection). We finally get one projection image for each person (see figure 3).


Figure 2: learning process

Each projected image is then compared with the original extracted image. The best result of this comparison (if it passes a certain threshold) is defined as the recognized person. The two images are compared using the OpenCV function cvMatchTemplate with the normalized squared difference method:

$$R = \frac{\sum_{x,y} \left( Img2(x,y) - Img1(x,y) \right)^2}{\sqrt{\sum_{x,y} Img1(x,y)^2 \cdot \sum_{x,y} Img2(x,y)^2}} \qquad (2)$$

We obtain a value between 0 and 1 (0 being the best value), which can be transformed into a percentage as $(1 - value) \cdot 100$. We keep only values under a certain threshold (here we use 0.04, equivalent to 96%).
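To make the pipeline concrete, the following C sketch shows the per-person recognition step with the legacy OpenCV functions named above. The helper match_person and its arguments (the person's eigen objects eigenObjs, their count nEigens and average image avgImg, built offline with cvCalcEigenObjects) are hypothetical names of ours, not taken from the report's code.

    #include <stdlib.h>
    #include <cv.h>
    #include <cvaux.h>  /* eigen object functions */

    /* Hypothetical helper: score one extracted 32x32 grayscale face
       against one person's eigenspace. Returns the value of
       equation (2), 0 meaning a perfect match. */
    float match_person(IplImage* face, IplImage** eigenObjs,
                       int nEigens, IplImage* avgImg)
    {
        float* coeffs = (float*)malloc(nEigens * sizeof(float));
        IplImage* proj = cvCreateImage(cvGetSize(face), IPL_DEPTH_8U, 1);
        CvMat* res = cvCreateMat(1, 1, CV_32FC1);

        /* decomposition coefficients of the face in this eigenspace */
        cvEigenDecomposite(face, nEigens, (void*)eigenObjs,
                           CV_EIGOBJ_NO_CALLBACK, NULL, avgImg, coeffs);
        /* projection rebuilt from the eigen objects and coefficients */
        cvEigenProjection((void*)eigenObjs, nEigens, CV_EIGOBJ_NO_CALLBACK,
                          NULL, coeffs, avgImg, proj);

        /* normalized squared difference of equation (2); equal-size
           images yield a single value in [0,1] */
        cvMatchTemplate(face, proj, res, CV_TM_SQDIFF_NORMED);
        float score = (float)cvGetReal2D(res, 0, 0);

        cvReleaseMat(&res);
        cvReleaseImage(&proj);
        free(coeffs);
        return score;  /* accept only if score < 0.04 (96%) */
    }

Calling this helper for every person in the database and keeping the lowest score reproduces the comparison described above.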

Figure 3: projections created from an extracted face, with the percentage of recognition


3 Tracking using Kalman filter

The Kalman filter was published by R.E. Kalman in 1960 [5] and has since been the subject of extensive research and application, particularly in the area of autonomous or assisted navigation. The Kalman filter is a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process in a way that minimizes the mean of the squared error. It addresses the general problem of trying to estimate the state x of a discrete-time controlled process that is governed by the linear stochastic difference equation

$$x_k = A x_{k-1} + B u_{k-1} + w_{k-1} \qquad (3)$$

with a measurement z that is

$$z_k = H x_k + v_k \qquad (4)$$

The random variables w and v represent the process and measurement noise, respectively. They are assumed to be independent of each other, white, and with normal probability distributions

$$p(w) \sim N(0, Q) \qquad (5)$$

$$p(v) \sim N(0, R) \qquad (6)$$

In practice, the process noise covariance Q and the measurement noise covariance R might change with each time step or measurement; here, however, we assume they are constant. The n × n matrix A (transition matrix) in the difference equation (3) relates the state at the previous time step k-1 to the state at the current step k, in the absence of either a driving function or process noise. A might change with each time step in practice, but here we assume it is constant. The n × l matrix B relates the optional control input u to the state x. The m × n matrix H (measurement matrix) in the measurement equation (4) relates the state to the measurement $z_k$. In practice H might change with each time step or measurement, but here we assume it is constant. The main goal of the Kalman filter for our task is to predict the state values $(x, \dot{x}, y, \dot{y})$, which represent the positions and velocities of the face and hands within the image. We will, however, use only the x, y values of the position, and not the $\dot{x}, \dot{y}$ values of the velocity. We update the state vector using the face position measured by our Haar-like feature face detection and the hand positions measured by our skin detector [1]. First we update the state vector with these values, and then we predict the next state.

3.1 Matrices initialization

The previously mentioned matrices need to be initialized according to the constant velocity model we are using.

$$A = \begin{pmatrix} 1 & T & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & T \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad H = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

$$P = \begin{pmatrix} width^2 & \frac{width^2}{T} & 0 & 0 \\ \frac{width^2}{T} & \frac{width^2}{T^2} & 0 & 0 \\ 0 & 0 & height^2 & \frac{height^2}{T} \\ 0 & 0 & \frac{height^2}{T} & \frac{height^2}{T^2} \end{pmatrix}$$

$$Q = q \begin{pmatrix} \frac{T^4}{4} & \frac{T^3}{2} & 0 & 0 \\ \frac{T^3}{2} & T^2 & 0 & 0 \\ 0 & 0 & \frac{T^4}{4} & \frac{T^3}{2} \\ 0 & 0 & \frac{T^3}{2} & T^2 \end{pmatrix} \qquad R = \begin{pmatrix} 16 & 0 \\ 0 & 16 \end{pmatrix}$$

Here T is the sampling time, which we take to be the time between two frames; at 15 frames per second, T = 66 ms. R is the measurement noise matrix, whose values also correspond to the lower limit of the system's uncertainty on the prediction (a variance of 16, i.e. 4 pixels). q is a variable, function of T, chosen so that $q \frac{T^4}{4} > 16$. For our case we choose $q \frac{T^4}{4} = 36$ (corresponding to a minimum uncertainty of 6 pixels). P is the uncertainty matrix, and we initialize it with the size of the whole image (width for x, height for y). We initialize the state vector with the first positions given by the Haar-like face detection and the skin detectors. After the initialization, a first prediction follows, then the cycle of correction and prediction starts.
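As an illustration, here is a minimal sketch of this initialization with the OpenCV C API. The helper name create_tracker is ours; the report does not show its code, so this is a reconstruction of the matrices above under the stated assumptions.

    #include <string.h>
    #include <cv.h>

    /* Hypothetical helper building the filter of this section: state
       (x, x', y, y'), measurement (x, y), constant velocity model. */
    CvKalman* create_tracker(int width, int height)
    {
        const float T = 0.066f;              /* 15 fps -> T = 66 ms  */
        const float q = 144.0f / (T*T*T*T);  /* so that q*T^4/4 = 36 */
        CvKalman* kf = cvCreateKalman(4, 2, 0);

        /* A: constant velocity transition matrix */
        const float A[16] = { 1, T, 0, 0,
                              0, 1, 0, 0,
                              0, 0, 1, T,
                              0, 0, 0, 1 };
        memcpy(kf->transition_matrix->data.fl, A, sizeof(A));

        /* H: the measurement picks the x and y positions */
        cvZero(kf->measurement_matrix);
        cvmSet(kf->measurement_matrix, 0, 0, 1.0);
        cvmSet(kf->measurement_matrix, 1, 2, 1.0);

        /* R: 16 on the diagonal (4 pixels of measurement noise) */
        cvSetIdentity(kf->measurement_noise_cov, cvRealScalar(16));

        /* Q: q * [T^4/4 T^3/2; T^3/2 T^2] per axis */
        cvZero(kf->process_noise_cov);
        cvmSet(kf->process_noise_cov, 0, 0, q*T*T*T*T/4);
        cvmSet(kf->process_noise_cov, 0, 1, q*T*T*T/2);
        cvmSet(kf->process_noise_cov, 1, 0, q*T*T*T/2);
        cvmSet(kf->process_noise_cov, 1, 1, q*T*T);
        cvmSet(kf->process_noise_cov, 2, 2, q*T*T*T*T/4);
        cvmSet(kf->process_noise_cov, 2, 3, q*T*T*T/2);
        cvmSet(kf->process_noise_cov, 3, 2, q*T*T*T/2);
        cvmSet(kf->process_noise_cov, 3, 3, q*T*T);

        /* P: initialized with the size of the whole image */
        cvZero(kf->error_cov_post);
        cvmSet(kf->error_cov_post, 0, 0, (double)width*width);
        cvmSet(kf->error_cov_post, 0, 1, (double)width*width/T);
        cvmSet(kf->error_cov_post, 1, 0, (double)width*width/T);
        cvmSet(kf->error_cov_post, 1, 1, (double)width*width/(T*T));
        cvmSet(kf->error_cov_post, 2, 2, (double)height*height);
        cvmSet(kf->error_cov_post, 2, 3, (double)height*height/T);
        cvmSet(kf->error_cov_post, 3, 2, (double)height*height/T);
        cvmSet(kf->error_cov_post, 3, 3, (double)height*height/(T*T));

        /* kf->state_post is then filled with the first detected position */
        return kf;
    }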

3.2 State correction

The state correction is done in OpenCV using the function cvKalmanCorrect. The following equations are used to update the state from the last measurement:

$$x_k = x'_k + K (z_k - H x'_k) \qquad (7)$$

$$P_k = (I - K H) P'_k \qquad (8)$$

$$K = P'_k H^T (H P'_k H^T + R)^{-1} \qquad (9)$$

3.3 State prediction

The state prediction is done in OpenCV using the function cvKalmanPredict. The following equations are used to predict the state from the last estimates:

$$x'_k = A x_{k-1} \qquad (10)$$

$$P'_k = A P_{k-1} A^T + Q \qquad (11)$$


                 probability values
wrong matches    70% to 94%
right matches    90% to 100%

Table 1: approximate probability values returned by the recognition system

However, when no measurements are provided by the system (e.g. in case of occlusions), we have to keep predicting from the last predicted values until new measurements arrive. During that time, the state is not corrected.
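A minimal sketch of the resulting cycle, including the no-measurement branch just described; detect_face is a hypothetical stand-in for the Haar-like detector of [1], and kf is the filter from the sketch in section 3.1.

    #include <cv.h>

    /* Hypothetical detector signature standing in for the Haar-like
       face detection of [1]; returns 1 when a face was measured. */
    extern int detect_face(float* x, float* y);

    /* One correction/prediction cycle per frame. */
    void track(CvKalman* kf)
    {
        CvMat* z = cvCreateMat(2, 1, CV_32FC1);
        cvKalmanPredict(kf, NULL);           /* first prediction */
        for (;;) {
            float mx, my;
            if (detect_face(&mx, &my)) {
                cvmSet(z, 0, 0, mx);
                cvmSet(z, 1, 0, my);
                cvKalmanCorrect(kf, z);      /* equations (7)-(9) */
            } else {
                /* no measurement (e.g. occlusion): adopt the last
                   prediction as the state so the filter keeps moving
                   at constant velocity and P keeps growing by Q */
                cvCopy(kf->state_pre, kf->state_post, NULL);
                cvCopy(kf->error_cov_pre, kf->error_cov_post, NULL);
            }
            const CvMat* pred = cvKalmanPredict(kf, NULL); /* (10)-(11) */
            float px = (float)cvmGet(pred, 0, 0);  /* predicted x */
            float py = (float)cvmGet(pred, 2, 0);  /* predicted y */
            (void)px; (void)py;  /* estimated face position this frame */
        }
    }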

4 Discussion and results

The recognition program is meant to be used before initializing the Kalman tracker, in order to make sure that we track the person we want to track.

4.1 Eigenfaces

Using eigenfaces for recognition takes a comparatively small amount of time (about 2 to 3 ms per person in the database). We are able to recognize several people at the same time and choose the one to track (see figure 4).

Figure 4: recognition of several people

Table 1 shows the wrong matches (other people from the database) and the right matches in terms of probability values. To obtain probability values as in table 1, it is necessary to use the same background, illumination conditions and camera settings for the recognition as for the creation of the database. Otherwise the probability values will be lower, due to the fact that we are using a correlation method for matching; in that case the threshold needs to be adapted. We now present some results concerning the size and the content of the database. First we compared results using a big database (50 images) and a small database (10 images) on an image sequence. The sequence contains several facial expressions and orientations (the person to be recognized is enguer). In figure 5, the first chart represents the recognition using 6 people and 50 images, but with similar orientation and expression (most are frontal faces).


Figure 5: comparison between two kinds of databases for different facial expressions and orientations

The second chart uses the same sequence but with a 10-image database of hand-picked images with different orientations and expressions. The first chart shows better results for frontal faces. In both cases we can observe some downward peaks when the head changes its orientation or when the person produces different facial expressions. Finally, looking at the threshold (red dotted line), we see that the overall probability value of the right match (enguer) is lower in the second chart.

Figure 6: large database composed of all kinds of facial expressions and orientations

Figure 6 represents the same sequence as before, but using a big database (50 images) with diverse facial expressions and orientations. Again, the results for the probability value of the right match (enguer) are lower than in the first chart of figure 5, but some facial expressions are better recognized (e.g. open mouth at the end).


The results show that a database of 50 images with diverse orientations and facial expressions produces a more stable recognition, though the overall probability values might be lower.

4.2 Kalman filter

Some results of the Kalman filter can be seen in figure 7. The measurements shown in figure 7 are taken from a sequence of a person sitting on a chair and moving only occasionally forward or backward. The face is occluded several times. We can see that when no measurements are available, the predicted and state values increase or decrease with constant velocity, according to our model. During this period the predicted and state values are similar, as we use the last predicted values in the state vector to predict the new values. When a new measurement value becomes available, the predicted and state values are corrected again; in this case the measurement values and the state values are similar.

Figure 7: behavior of the Kalman state, predicted and measured values

Results about the uncertainty matrix are shown in figure 8. The blue points correspond to the minimum and maximum uncertainty values. From these we can calculate the size of the uncertainty window around the predicted position point within the image. We see that when we have no measurement values, the size of the uncertainty window grows. With new measured values, the size comes back to the minimum in a few steps. The minimum is due to the values we put in the Q matrix, corresponding to our system noise; here we have a minimum of 6 pixels (see the explanations in section 3.1). We expect the Kalman filter to deal with occlusions (see figure 9). When using the skin color [1] to detect hands and head, we lose the tracking of a hand if it passes in front of the face (hand/head occlusion).


Figure 8: behavior of the Kalman uncertainty, prediction and measured values

This is due to the fact that the hands and the face usually have the same skin color. The detection of the hand gets stuck to the face because of the Camshift function (see [6]). To overcome this problem, we need to switch off the measurement (detection) of the hand when it enters the face area. With the Kalman filter we can predict where and when the hand leaves the face area and thus reactivate the skin detection. We suppose that the hand moves with constant velocity.
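A possible shape of this gating test (hypothetical names; the report does not show its code): the hand measurement stays switched off while the hand position predicted by its Kalman filter lies inside the face rectangle.

    #include <cv.h>

    /* Hypothetical gating test: returns 1 while the skin measurement
       of the hand should stay switched off, i.e. while the predicted
       hand position (px, py) still lies inside the face rectangle. */
    static int hand_inside_face(CvRect face, float px, float py)
    {
        return px >= face.x && px < face.x + face.width &&
               py >= face.y && py < face.y + face.height;
    }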

Figure 9: sequence showing a hand/head occlusion and the recovery of the hand tracking thanks to the Kalman filter prediction


5 Conclusion

This report has shown the implementation of a recognition and tracking program for human motions, using eigenfaces (PCA) and the Kalman filter from the Intel OpenCV library. These are two modules of a human tracking interface (see figure 1) which also includes the face and skin detection from prior work [1]. Indeed, the face and hands tracking only starts after the recognition of the person. We presented results concerning the face recognition, showing the feasibility of our approach. We showed results on the behavior of the Kalman tracking, especially in the case of (hand/head) occlusions. As we are now able to follow the motion of some human body parts, we can try to reproduce them using 3D modelling and animation software (e.g. Maya).


References

[1] J. Rett and E. Boissier. Face and hands detection based on Haar-like features and skin color. 2005.

[2] M. Turk and A. Pentland. Face recognition using eigenfaces. 1991.

[3] J. Barreto, P. Menezes and J. Dias. Human-Robot Interaction based on Haar-like Features and Eigenfaces. 2004.

[4] Intel Corporation. Open Source Computer Vision Library Reference Manual (opencvref_cv.htm).

[5] R. E. Kalman. A New Approach to Linear Filtering and Prediction Problems. Transactions of the ASME, Journal of Basic Engineering, pages 35–45, 1960.

[6] G. R. Bradski (Microcomputer Research Lab, Intel Corporation, Santa Clara, CA). Computer vision face tracking for use in a perceptual user interface. 1998.