
LAAS/ISR Report 03
Face and hands detection based on Haar-like features and skin color
Date: 13th July 2005

Enguerran Boissier
LAAS, Université Paul Sabatier, 118 route de Narbonne, 31062 Toulouse, France

Project sponsored by ISR http://www.isr.uc.pt

Abstract: One application of Human Machine Interfaces is the interpretation of human motions by robots. The hands and the face are human body parts which convey a lot of information for the interaction between humans and robots. To extract this information from a camera observing the human, we first need to detect these parts. This report describes the implementation of a face and hands detection program that combines Haar-like features and skin color, built on the Intel OpenCV library. The different steps used in this implementation, as well as the main results obtained, are presented.

Key words: OpenCV, Haar-like features, skin color, face detection, hands detection

Contents

1 Introduction
2 Detection using Haar-like features
3 Implementation using the Intel OpenCV library
4 Detection using skin color
5 Coupling Haar-like features and skin detection
6 Discussions and results
   6.1 Main results
   6.2 Hand/Head occlusion
   6.3 Settings
   6.4 Head roll with the camshift function
7 Conclusion


1 Introduction

Autonomous robots inhabiting an environment need to navigate. Autonomous robots inhabiting a human-populated environment also need to interact. The interaction takes place in both directions, though we mainly focus on a human (transmitter) acting on a robot (receiver). Several input modalities are possible (e.g. audio and tactile cues), but we are mainly interested in the visual cue. Humans can communicate with their body using fingers, hands, the face or the whole body. In any case the property conveying the information is the 3-D position of the human body part, static like a (facial) expression or a (body) posture, or dynamic like (hand) gestures. To extract this information from a camera observing the human, we need to detect, recognize and track these parts. This report presents the first step towards recognizing human gestures, that is, hands and face detection.

Figure 1: example of motion tracking

This means we need to identify and locate the face and the hands within the image. A basic method to detect both the hands and the face is to use skin color. Human skin color has been used and proved to be an effective feature for many applications, from human face detection to hand tracking. However, most studies use either simple thresholding or a single Gaussian distribution to characterize the properties of skin color. In our case, we will use an image of the skin color probability distribution within the original image. Chapters 4 and 5 show how to detect skin color and how to implement skin color detection using OpenCV. Although skin colors of different ethnic groups fall into a small cluster in normalized RGB or HSV color space, it is not trivial to build a good color model for every skin tone. Therefore, a second method using Haar-like features can also be used to detect the face and the hands, although hand detection appears less robust due to shape variation and background influence. Haar-like features were introduced by Viola et al. [1] and improved first by Lienhart et al. [2] and then by Menezes, Barreto and Dias [3] [4]. They are based on a set of simple features able to represent the object we want to detect, e.g. a human face. Using Haar-like features, several faces can be detected at any scale and in constant time within an image. This method, and how to implement it with OpenCV, are described in chapters 2 and 3.


2 Detection using Haar-like features

The Haar-like detection technique introduced by Viola et al. [1] is based on the idea of the wavelet template (Oren et al., 1997 [5]), which defines the shape of an object in terms of a subset of the wavelet coefficients of the image. The features used are reminiscent of Haar basis functions (see figure 2, picture taken from [4]).

Figure 2: example of Haar basis features (picture taken from [4])

Figure 3: Example of Haar features used to define the face (picture taken from [3])

Any of these Haar-like features can be computed at any scale or location in constant time using the integral image representation [1]. Using a set of these features, we can encode some knowledge of the object we want to detect, i.e. the contrasts exhibited by a human face and their spatial relationships (see figure 3, picture taken from [3]). A variant of the AdaBoost learning algorithm [1] is used to boost the feature classification performance, by selecting the features which best separate the positive and negative images and by training the classifier [6]. In its original form, the AdaBoost learning algorithm is used to boost the classification performance of a simple (sometimes called weak) learning algorithm. The main challenge is to find a very small number of these features that can be combined to form an effective classifier. Because Haar-like features encode contrast between regions, it is easier to use them to detect faces than hands. Indeed, the face can be detected simply from its internal contrast, independently of the background. On the contrary, hands have almost no internal contrast; the only contrast we can exploit is the one between the hand and the background, and this contrast is reversed between a dark and a light background. Thus, in our case, Haar-like features will only be used for face detection.
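To make the constant-time claim concrete, the following is a minimal sketch in Python/NumPy, not taken from the report: it builds an integral image and evaluates a simple two-rectangle feature with four array lookups per rectangle (OpenCV provides an equivalent cv2.integral function). Function and variable names are illustrative.

```python
import numpy as np

def integral_image(gray):
    # ii[y, x] holds the sum of all pixels above and to the left of (x, y);
    # a leading row/column of zeros avoids bounds checks in rect_sum().
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(gray.astype(np.int64), axis=0), axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    # Sum of the pixels inside the rectangle (x, y, w, h), using only four lookups.
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    # Horizontal two-rectangle Haar-like feature: left half minus right half.
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, w - half, h)
```

Whatever the size of the rectangles, the cost of evaluating a feature stays constant once the integral image is built, which is what makes the multi-scale scan affordable.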


Search window size    Time consumption
640*480 image         345 ms
320*240 image         70 ms
adaptive              15 ms to 70 ms (depending on the search window size)

Table 1: OpenCV face detection function time consumption per processing loop

3 Implementation using the Intel OpenCV library

The Intel OpenCV library already provides a face detection demo. Both frontal and profile face detections are provided. The frontal face detection is quite robust, but detects only frontal faces. The profile version can detect profile faces as well as frontal ones, though it is less robust with the latter. For our task, we will at first only use the frontal face detection, but both profile and frontal cascades could be used: our strategy would be to start with frontal face detection and, once the face is lost, initiate a search for the profile. To be used in real-time applications, the detection needs to be sufficiently fast. The relation between image size and time consumption for our system (Athlon AMD64 3000+, firewire camera) is shown in table 1. To obtain a 320*240 image from a 640*480 image, we use pyramidal downsampling (the cvPyrDown function provided by the OpenCV library). The time consumption can be further decreased by searching only inside a small window of the image. This points towards a tracking approach: the search window is placed at the last known center position of the face, or around the next predicted position when using a predicting tracker (e.g. a Kalman filter), with a size of e.g. two times the last bounding box of the face (see figure 4).

Figure 4: search-window around the detected face
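A sketch of this strategy with the current OpenCV Python API (the report used the original C demo and cvPyrDown, so function names differ; the cascade path and the parameter values below are illustrative assumptions, not the report's settings):

```python
import cv2

# Frontal-face cascade shipped with opencv-python; this path is an assumption of that packaging.
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(frame_bgr, last_face=None):
    small = cv2.pyrDown(frame_bgr)                 # 640*480 -> 320*240 pyramidal downsampling
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)

    if last_face is not None:
        # Search only inside a window of twice the size of the last bounding box.
        x, y, w, h = last_face
        x0, y0 = max(0, x - w // 2), max(0, y - h // 2)
        x1 = min(gray.shape[1], x + w + w // 2)
        y1 = min(gray.shape[0], y + h + h // 2)
        hits = cascade.detectMultiScale(gray[y0:y1, x0:x1], scaleFactor=1.2, minNeighbors=3)
        if len(hits) > 0:
            rx, ry, rw, rh = hits[0]
            return (rx + x0, ry + y0, rw, rh)

    # Fall back to a full-image search when there is no previous detection.
    hits = cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=3)
    return tuple(hits[0]) if len(hits) > 0 else None
```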

4 Detection using skin color

Human skin color has been used and proved to be an effective feature for many applications, from human face detection to hand tracking. However, most studies use either simple thresholding or a single Gaussian distribution to characterize the properties of skin color. Although skin colors of

different ethnic groups fall into a small cluster in normalized RGB or HSV color space, a single Gaussian distribution is neither sufficient to model human skin color nor effective in general applications. A solution could be to use a huge database of human faces from different ethnic groups to build a skin color model [7]. Our solution is to use the Haar-like feature face detection and to take the color of the detected face as the skin color model. Having our skin color model, we represent our color image as a probability distribution of that color model. In order to do this, we first create a model of the desired hue by building a color histogram. We use the Hue Saturation Value (HSV) color system, which corresponds to projecting the standard Red, Green, Blue (RGB) color space along its principal diagonal from white to black (see arrow in figure 5). The result is the hexcone in figure 6.

Figure 5: RGB color cube

Figure 6: HSV color system

Descending the V axis in figure 6 gives us smaller hexcones, corresponding to smaller (darker) RGB subcubes in figure 5. HSV space separates hue (color) from saturation (how concentrated the color is) and from brightness. We then create our color model by taking a 1D histogram of the H (hue) channel in HSV space. The stored flesh color histogram is then used as a model, or lookup table, to convert the image into a corresponding flesh probability image, as can be seen in figure 7 (algorithm from the OpenCV camshift function [8]).

Figure 7: skin-color probability image
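One possible implementation of this step with the OpenCV Python API is sketched below. The report relies on the histogram code inside the camshift demo [8]; the function here is an illustrative reconstruction, and the 32-bin histogram is an assumption. The saturation mask anticipates the threshold discussed in section 6.3.

```python
import cv2

def skin_probability(frame_bgr, face_rect, s_min=100):
    # Build the skin color model from the face found by the Haar-like detector,
    # then backproject it over the whole frame to get a flesh probability image.
    x, y, w, h = face_rect
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)

    # Discard weakly saturated pixels (cf. the saturation threshold of section 6.3).
    mask = cv2.inRange(hsv, (0, s_min, 10), (180, 255, 255))

    # 1D hue histogram of the face region = color model / lookup table.
    hist = cv2.calcHist([hsv[y:y + h, x:x + w]], [0],
                        mask[y:y + h, x:x + w], [32], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

    # Backprojection: each pixel of the result is the histogram value of its hue.
    prob = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    return cv2.bitwise_and(prob, prob, mask=mask)
```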


5 Coupling Haar-like features and skin detection

Several methods could be used to detect the face and the hands from this probability image. Our idea is to take advantage of the window returned by the Haar-like detection around the face, and to compute the center of mass (or mean point) in this window within the probability image. This point should represent the center of the face. For discrete 2D image probability distributions, the mean location (the centroid) within the search window is found as follows. Find the zeroth moment

M_{00} = \sum_x \sum_y I(x, y)    (1)

Find the first moments for x and y

M_{10} = \sum_x \sum_y x \, I(x, y); \qquad M_{01} = \sum_x \sum_y y \, I(x, y)    (2)

Then the mean search window location (the centroid) is

x_c = \frac{M_{10}}{M_{00}}; \qquad y_c = \frac{M_{01}}{M_{00}}    (3)

where I(x, y) is the pixel (probability) value at position (x, y) in the image, and x and y range over the search window.
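Equations (1)-(3) translate directly into a few NumPy lines. The sketch below assumes prob is the skin probability image and win an (x, y, w, h) search window; the names are illustrative, and cv2.moments would give the same quantities.

```python
import numpy as np

def centroid(prob, win):
    # Zeroth and first moments of the probability values inside the window (eqs. 1-3).
    x, y, w, h = win
    patch = prob[y:y + h, x:x + w].astype(np.float64)
    ys, xs = np.indices(patch.shape)
    m00 = patch.sum()
    if m00 == 0:
        return None                       # nothing skin-colored in the window
    m10 = (xs * patch).sum()
    m01 = (ys * patch).sum()
    return x + m10 / m00, y + m01 / m00   # centroid in image coordinates
```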

Then, knowing the position of the face, we can estimate the position of the hands and create two search windows around these positions. We assume that the person stands frontal to the camera, with the hands in a relaxed position visible beneath the hips, close to the upper legs. For our task, we set two areas at the bottom of the image, separated by the face x position (see blue rectangles in the first image of figure 8), with a height of 1/4 of the image height and a width of two times the face width. We can then compute the mean position in each area, as seen above, and thus obtain the hand positions within the image.

Figure 8: Search windows created around hands supposed positions (blue rectangles)
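The placement of the two hand search windows described above could be written as follows; this is a sketch of one reading of the layout in the text, and the clamping at the image borders is an added assumption.

```python
def hand_search_windows(image_shape, face_rect):
    # Two areas at the bottom of the image, split at the face x position,
    # each 1/4 of the image height tall and two face widths wide.
    img_h, img_w = image_shape[:2]
    fx, fy, fw, fh = face_rect
    face_cx = fx + fw // 2
    win_h, win_w = img_h // 4, 2 * fw
    top = img_h - win_h
    left_hand = (max(0, face_cx - win_w), top, win_w, win_h)
    right_hand = (min(face_cx, img_w - win_w), top, win_w, win_h)
    return left_hand, right_hand
```

Applying the centroid() function from the previous sketch to each of these windows of the probability image then yields the two hand positions.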

Figure 9 shows the sequence of face detection by Haar-like features (green rectangle) followed by hands detection by skin color (yellow and purple ellipses). The red ellipse corresponds to the face detection by skin color. In this case, we assume the hands to be found in the bottom corners of the image. The ellipse around each hand shrinks until it fits the hand size (see figure 8). Haar-like features can also be used to make the face skin detection more robust. Indeed, we can update the histogram representing the color model of the face and the position of the track window using the data returned


Figure 9: As a sequence from left to right. Left: Searching for a face, Middle: Detection of the face, Right: Detection of the hands

by the Haar-like face detection. Thus, after a face has been found in 5 successive frames, we update the skin color model and the track window. We use the value 5 in order to be sure that we update the skin detection with a valid face and not with wrong information.

6 Discussions and results

6.1 Main results

By using frontal and profile face Haar-like cascades as described in chapter 2, we obtain a robust face detector, able to detect several faces at any scale in real time. For our task, that is, combining Haar-like features with skin color detection, we focus on only one face and two hands. Using Haar-like face detection and skin color detection together, we are able to track the face and the hands of one person. Moreover, by using the Haar-like face detection to set the color model, we manage to detect the face and hands of people of different ethnic origins (see figure 10). To use this program in real-time applications, we obviously need to use search windows for the face detection and the skin color detection, in order to save computing time.

6.2 Hand/Head occlusion

An interesting question is the behavior when a tracked hand passes in front of the tracked face. Due to the Haar-feature detection, the tracker loses the face as long as the hand occludes it. The face skin color detector incorporates the hand as a grown face, and the hand skin color detector incorporates the face as a grown hand. When the hand has passed the face, the Haar-like feature tracker detects the face again and reinitializes the skin color detector; thus the red ellipse adjusts back to the real size of the head. The hand tracker, however, stays stuck with the (too big) head skin color region (see figure 11).


sharpness            100      gamma                1
shutter              20       gain                 150
white balance (U/B)  100      white balance (V/R)  65
saturation           100

Table 2: Initial camera settings

6.3 Settings

To get an image under stable conditions, we disabled all automatic control adjustments of the camera. With good settings, skin color is well distinguished from the background. In our case, we set the camera as described in table 2.

Figure 10: different color people detection

Figure 11: occlusion of the face by a hand

We also set a threshold on the S (saturation) values of the HSV image at 100. Each pixel with an S value lower than 100 is not used for the probability image (see figure 12).

6.4 Head roll with the camshift function

Using the camshift function of OpenCV, we are also able to get the orientation of the head. The 2D orientation of the probability distribution is easy to obtain from the second moments during the course of CAMSHIFT's operation, where (x, y) ranges over the search window and I(x, y) is the pixel (probability) value at (x, y).


Figure 12: Saturation threshold at 30 and at 100 respectively

The second moments are

M_{20} = \sum_x \sum_y x^2 \, I(x, y); \qquad M_{02} = \sum_x \sum_y y^2 \, I(x, y); \qquad M_{11} = \sum_x \sum_y x y \, I(x, y)    (4)

Then the object orientation (major axis) is

\theta = \frac{1}{2} \arctan\left( \frac{2\left( \frac{M_{11}}{M_{00}} - x_c y_c \right)}{\left( \frac{M_{20}}{M_{00}} - x_c^2 \right) - \left( \frac{M_{02}}{M_{00}} - y_c^2 \right)} \right)    (5)

However, this head roll estimate is sensitive to many situations, such as the presence of distractors (e.g. another person) within the image.
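With the OpenCV Python API, the camshift call that yields this orientation looks roughly as follows. This is a sketch: prob and face_win reuse the names of the earlier sketches, and the termination criteria values are assumptions rather than the report's settings.

```python
import cv2

def head_roll(prob, face_win):
    # Iterate camshift on the skin probability image, starting from the last face window.
    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    rot_rect, new_win = cv2.CamShift(prob, face_win, term_crit)
    (cx, cy), (width, height), angle = rot_rect
    return angle, new_win   # in-plane head roll in degrees, plus the updated search window
```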

7 Conclusion

This report has shown the implementation of a detection program using Haar-like features and skin color on the Intel OpenCV library. We have compared two different approaches to detect the face and the hands, and shown how to use them together. Thus, we manage to detect the face and both hands of a person in an image, using Haar-like features to detect the face and skin color for the hands. For this task, we assume that the hands are in the bottom part of the image (see figure 8). We were able to use these two detection methods together as a real-time face and hands tracker. In the future we plan to use a Kalman filter for both the hands and the face, in order to cope with occlusions and other cases where we cannot detect these body parts. We also plan to recognize the face, using eigenfaces.


References

[1] Paul Viola and Michael J. Jones. Rapid object detection using a boosted cascade of simple features. IEEE CVPR, 2001.
[2] Rainer Lienhart and Jochen Maydt. An extended set of Haar-like features for rapid object detection. IEEE ICIP 2002, 1:900-903, 2002.
[3] Paulo Menezes, Jose Barreto and Jorge Dias. Human-robot interaction based on Haar-like features and eigenfaces. 2004.
[4] Jose Barreto, Paulo Menezes and Jorge Dias. Face tracking based on Haar-like features and eigenfaces. 2004.
[5] C. Papageorgiou, M. Oren, P. Sinha and T. Poggio. Pedestrian detection using wavelet templates. 1997.
[6] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. 1995.
[7] Ming-Hsuan Yang and Narendra Ahuja. Face detection and gesture recognition for human-computer interaction. Kluwer Academic Publishers, 2001.
[8] Gary R. Bradski, Microcomputer Research Lab, Intel Corporation, Santa Clara, CA. Computer vision face tracking for use in a perceptual user interface. 1998.