M1 Informatique fondamentale, École Normale Supérieure de Lyon

Feature selection for faces clustering in low constraint environment

Internship in the Slipguru research group of the University of Genoa, under the direction of Nicola Rebagliati and Alessandro Verri

June 4th 2008 – August 14th 2008

Olivier Schwander

August 31, 2008

ENS Lyon

Università degli studi di Genova

Introduction

The main research interests of the Slipguru group are statistical learning and computer vision. The goal of my internship was to study methods of image clustering applied to faces. This report briefly presents the problem of face clustering, describes some studies using classical methods and details the design of a new region descriptor well suited to face-related applications.


Contents

1 Clustering
  1.1 Differences between supervised and unsupervised learning
  1.2 The faces
  1.3 Feature selection

2 Similarity and features
  2.1 Goal
  2.2 Similarity measure
  2.3 Feature selection

3 From Local Binary Patterns to Local Orientation Patterns
  3.1 Motivation
  3.2 Interpretation of the uniform patterns
  3.3 Building a new descriptor
  3.4 Justification

4 Clustering algorithm
  4.1 Algorithm
  4.2 Results

Bibliography

A Mapping between LBP and LOP

B Data set for the LOP stability tests


Chapter 1

Clustering

1.1 Differences between supervised and unsupervised learning

1.1.1 Supervised learning

In this case, the data set is split into two subsets: a learning set and a test set. The learning set is used to compute the parameters of the model, and the test set is used to measure the performance. Let's take an example. A PhD thesis recently defended by a Slipguru member ([Destrero, 2008]) describes a facial authentication system whose goal is to verify the identity of someone. For each person known by the system, a set of images is used to learn the face of that person. After the learning step, the system is able to recognise the people in the database and to reject the others. The main challenge is to avoid memorising the learning set and instead to generalise: the system needs to be efficient not only on the learning data but also on never-seen data.

1.1.2 Unsupervised learning

Now we want to avoid the need for a dedicated learning set: the system must choose by itself how to split the data set. This is the context of this internship: given a database of images containing the faces of n people, we want to get n groups, one for each person, each containing all the images of the same person. This problem is called clustering. The challenge is to make the system choose an interesting criterion for the clustering: here, we want to split according to the identity of the people and not, say, the hair colour. With most algorithms, it is mandatory to give the desired number of groups.

1.2 The faces

1.2.1 Acquisition

The images are taken by a video camera installed in the corridor of the lab. Given a picture of the corridor, an algorithm explained in [Destrero, 2008] searches for and extracts the faces. Then, sequences of images are built using an algorithm that follows the movements of people in the corridor. Each time someone walks down the corridor, we get a sequence of 10 to 20 images representing the person's progression.



Figure 1.1: A view of the corridor from the camera

Figure 1.2: A sequence of faces

1.2.2 Face-specific problems

All human faces are very similar: everybody has two eyes, a nose, a mouth, etc. It is often very difficult, even for a human, to find specific details to identify someone. Using small parts of the face (like the eyes) is difficult because it requires a precise positioning of the face. Moreover, even for the same person, the face can change considerably with the expression (smile, sadness, ...).

1.2.3 Low-constraint-specific problems

Here, the pictures are taken in a low-constraint environment: we have no information about the position and the orientation of the face (figure 1.4). Moreover, the illumination changes a lot during the day (morning, midday, evening) and we also have no control over the facial expression (figure 1.3).

Figure 1.3: Expression changes into a sequence

Figure 1.4: Moves into a sequence

Another problem is the reliability of the face detection algorithm: some images are not actually faces but false positives. We also sometimes have errors in the sequences: a face can belong to one person but be put in another person's sequence.

Figure 1.5: Some false positives

Figure 1.6: Two persons in the same sequence

1.3 Feature selection

A global description of the whole face is the simplest way to build a recognition system, but it is obvious that all the parts do not have the same importance in the recognition process: psychophysical studies (see [Zhao et al., 2003]) show for instance that the eyes are more important than the cheeks. So we will use local descriptors of regions that are interesting for the recognition. A way to avoid bias in the choice of the interesting regions is to use a feature selection step: for each possible region (the feature), we measure its efficiency and keep only the most efficient ones.


Chapter 2

Similarity and features

2.1 Goal

We want to evaluate the discriminative power of some similarity measures. The input of the test system is a pair of image sequences, from which we build a similarity matrix. Given two sequences a_0, ..., a_{n1-1} and a_{n1}, ..., a_{n1+n2-1} of n1 and n2 images and a similarity measure s, the output is the matrix m_ij = s(a_i, a_j) for all i, j in [0, n1+n2-1] (note that this matrix is symmetric since s must be symmetric). In the ideal case, we get a block-diagonal matrix with non-zero values (high similarity) in the two diagonal blocks and zero values (low similarity) in the two off-diagonal blocks (figure 2.1). In real cases, we only expect high values in the non-zero parts and low values in the zero parts, with enough contrast to separate the blocks.

Figure 2.1: A block-diagonal matrix

Note that all the matrices are represented with the (0, 0) coefficient in the bottom-left hand corner.

2.2 Similarity measure

2.2.1 Choice of the similarity measure

Before using a quantitative test, we begin with a rough selection based on visual inspection of the similarity matrices.

Euclidean distance

Let A and B be two matrices representing two images; the Euclidean distance is:



d(A, B) = √( Σ_{i,j} (A_ij − B_ij)² )

Figure 2.2: An easy case and a hard one for visualising the two blocks with the Euclidean distance

We see in figure 2.2 that the contrast is not very high in either case, even for very different faces.

Scalar product

Let A and B be two matrices representing two images; the scalar product is:

scalar(A, B) = Σ_{i,j} A_ij B_ij

In this particular case, the similarity measure is not a distance, so we must invert the zero and non-zero cases: the value is high in the high-similarity case and low in the low-similarity case.

Figure 2.3: An easy case and a hard one for visualising the two blocks with the scalar product

We see in figure 2.3 that the scalar product behaves a little better in the difficult case.

Conclusion

The simple similarity measures using the grey-level values directly are not sufficient to discriminate two sequences in every case; we need a more elaborate descriptor to build an efficient similarity measure.
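The two baseline measures above can be sketched in a few lines. This is a minimal sketch assuming images are stored as numpy arrays of grey levels; the report does not specify an implementation:

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean distance between two grey-level images (lower = more similar)."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

def scalar_product(a, b):
    """Scalar-product similarity (higher = more similar)."""
    return float(np.sum(a * b))

def similarity_matrix(seq_a, seq_b, measure):
    """m[i, j] = measure(image_i, image_j) over the two concatenated sequences."""
    images = list(seq_a) + list(seq_b)
    n = len(images)
    m = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            m[i, j] = measure(images[i], images[j])
    return m
```

Since both measures are symmetric in their arguments, the resulting matrix is symmetric, as noted above.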



2.2.2 The LBP descriptor

Definition

Local Binary Patterns were introduced by [Ojala et al., 1996] to classify textures. The operator maps a 3 × 3 region to an 8-bit word:

LBP(x_c, y_c) = Σ_{n=0..7} 2^n s(i_n − i_c)    (2.1)

with s(u) = 1 if u > 0 and 0 otherwise. Here (x_c, y_c) are the coordinates of the centre of the region, i_c is the grey value at this point, and the i_n cover the neighbourhood, as shown in figure 2.4.

Figure 2.4: Building of the pattern 00101010
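Equation (2.1) translates directly into code. A minimal sketch: the exact ordering of the neighbours i_n is not specified in the report, so the offset convention below is an assumption:

```python
import numpy as np

# Neighbour offsets for n = 0..7; this clockwise ordering starting at the
# top-left corner is an assumed convention, not taken from the report.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp(image, xc, yc):
    """8-bit Local Binary Pattern of the 3 x 3 region centred on (xc, yc),
    following equation (2.1): bit n is set when i_n > i_c."""
    ic = image[xc, yc]
    code = 0
    for n, (dx, dy) in enumerate(OFFSETS):
        if image[xc + dx, yc + dy] > ic:   # s(u) = 1 if u > 0
            code |= 1 << n
    return code
```

Whatever ordering is chosen, the descriptor stays in the range 0 to 255, one bit per neighbour.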

2.2.3 Histogram

To compute the descriptor of a region, we build the LBP histogram of the region: bin i of the histogram contains the number of occurrences of pattern i in the region.

Uniform patterns

A uniform pattern is a pattern with exactly 0, 1 or 2 "01" or "10" transitions (10000001 and 00011000 are uniform but not 00101010). Around 90% of the patterns found in real images are uniform, so the non-uniform ones can be discarded without losing too much information, reducing the dimension of the problem from 256 possible values to 58.

Similarity matrices

We get the matrices in figure 2.5. The results are far better, especially in the difficult cases.



Figure 2.5: An easy case and a hard one for visualising the two blocks with LBP descriptor and χ2 distance
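The uniformity test and the χ² distance used for figure 2.5 can be sketched as follows. This is a minimal sketch; the circular bit ordering is an assumption, but note that it reproduces the count of 58 uniform patterns quoted above:

```python
import numpy as np

def is_uniform(pattern):
    """True when the 8-bit pattern has at most 2 circular 0/1 transitions."""
    bits = [(pattern >> n) & 1 for n in range(8)]
    return sum(bits[n] != bits[(n + 1) % 8] for n in range(8)) <= 2

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms (lower = more similar)."""
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    return float(0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))
```

Enumerating all 256 patterns with `is_uniform` yields exactly 58 uniform ones, which matches the reduced dimension mentioned in 2.2.3.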

2.3 Feature selection

2.3.1 Motivation

The idea that the parts of the face do not all have the same importance for recognition is widely accepted: [Ahonen et al., 2004] gets good results for face authentication using a regular grid and giving different weights to the cells of the grid. This approach is valuable for precisely aligned images such as those of the FRGC¹ database, but in a low-constraint case we cannot rely on a precise position of the face's elements in the image. Moreover, a regular grid may be too constraining even for precisely aligned images: there is no reason that the interesting parts of the face should fit the cells of the grid. Instead of giving weights to the features, we discard the less efficient ones and keep only the most efficient ones: in return for the cost of a preprocessing stage to select the interesting features, we greatly reduce the dimension of the problem, decrease the computation time and discard regions with little or no valuable information. This method was used with great success in particular in [Viola and Jones, 2001] and [Destrero, 2008], which present state-of-the-art algorithms for face detection and face authentication.

2.3.2 Rectangular features

The simplest form of feature is the descriptor of a rectangular region. Since it would take too long (about 20 seconds per rectangle and pair of sequences) to test all the possible rectangles (672400 for the 40 × 40 images we use), we need to limit the number of tests. One way could be to discard the smallest rectangles: a small rectangle carries little information and is more likely to change under perturbations of the image (translations, rotations, noise, ...). But this is not sufficient, since we still have:

• 443556 possibilities with only rectangles bigger than 5 × 5;
• 246016 possibilities with only rectangles bigger than 10 × 10.

Our solution is to choose a small set of rectangles at random positions with random sizes (bigger than 10 × 10) and to measure the performance of each rectangle on some pairs of face sequences.

¹ Face Recognition Grand Challenge, a US government database for the evaluation of face recognition algorithms.
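The rectangle counts quoted above can be checked by direct enumeration. A minimal sketch, reading "bigger than k × k" as both sides being at least k pixels, which reproduces the reported figures:

```python
def count_rectangles(side=40, min_size=1):
    """Number of axis-aligned rectangles with both dimensions at least
    min_size inside a side x side image: for each width w there are
    (side - w + 1) horizontal placements, and likewise for heights."""
    total = 0
    for w in range(min_size, side + 1):
        for h in range(min_size, side + 1):
            total += (side - w + 1) * (side - h + 1)
    return total
```

With no size constraint this is (Σ_{k=1..40} k)² = 820² = 672400, matching the text.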



2.3.3 Quantitative measure

A visual inspection is not practical for an automated selection process: we need a quantitative measure which allows us to evaluate the efficiency of a feature. The feature selection algorithm is:

1. Choose a rectangle at random.
2. Choose a set of pairs of sequences at random.
3. Compute the similarity matrix of each pair.
4. Compute the efficiency of the feature on each pair.
5. Return the mean of the efficiencies over all the pairs.

In order to compute this efficiency, we use the normalised graph Laplacian (as defined in [von Luxburg, 2007]). Let D be the diagonal degree matrix:

D_ii = Σ_{j=1..n} W_ij,   D_ij = 0 if i ≠ j

The normalised Laplacian is the following matrix:

L = I − D^{−1/2} W D^{−1/2}

The similarity matrix W is seen as the adjacency matrix of an undirected, weighted, complete graph: the problem of finding a partition of the two sequences is equivalent to finding a MaxCut of the associated graph. Here it is sufficient to get an approximation of the MaxCut: if a cut with a high cost exists, two blocks are highly visible. As explained in [von Luxburg, 2007], a spectral graph theory result relates an approximation of the maximum cut to the eigenvector associated with the largest eigenvalue, and this largest eigenvalue is proportional to the cost of the cut. So we only need to look at the maximum eigenvalue to see whether the matrix is block-diagonal.
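The efficiency score used in steps 4 and 5 can be sketched as below. This is a minimal numpy sketch assuming W is symmetric with strictly positive row sums; a two-block similarity matrix scores 1, while a matrix whose high values sit entirely across the cut approaches the maximum of 2:

```python
import numpy as np

def efficiency_score(W):
    """Largest eigenvalue of the normalised Laplacian
    L = I - D^{-1/2} W D^{-1/2} of a symmetric similarity matrix W."""
    W = np.asarray(W, float)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))          # D^{-1/2} as a vector
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return float(np.linalg.eigvalsh(L).max())          # eigvalsh: symmetric case
```

`eigvalsh` is used rather than the general eigensolver because the normalised Laplacian of a symmetric W is itself symmetric.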

2.3.4 Results

x1   y1   x2   y2   λ
2    9    25   41   1.562
1    4    25   40   1.556
2    17   24   37   1.538
9    2    23   40   1.530
2    8    25   34   1.527
10   13   23   39   1.522
2    2    21   40   1.516
15   9    25   38   1.512
5    8    35   34   1.505
5    5    24   40   1.500

Table 2.1: The 10 best rectangles with their scores

Figure 2.6 shows the best features found by our algorithm. This experiment was made with 100 randomly selected rectangles and 20 pairs of sequences. We manage to get good features, since experience shows that values larger than 1.3 are needed for a feature to be useful for clustering.

Figure 2.6: Drawing of the best features

If we look at the selected rectangles, we notice a preference for the eyes and the nose. There are two reasons: these two parts carry a lot of information and, due to the design of the face detection algorithm, they are always the most precisely positioned parts of the face (see [Destrero, 2008]).


Chapter 3

From Local Binary Patterns to Local Orientation Patterns

3.1 Motivation

[Ojala et al., 1996] chose to discard the non-uniform patterns, arguing that they represent only 10% of all the LBPs in real images. Experimental results support this choice, since there is no significant difference in performance with or without the non-uniform patterns. We wonder whether the uniform LBPs carry any particular meaning. Ojala assumes that there is little information in the non-uniform patterns because they are much rarer than the others. Nonetheless, this hypothesis does not rely on any justification other than these experimental results. A possible drawback of this approach is that Ojala may discard meaningful patterns and keep noisy ones. Let's take an example:

[three 3 × 3 neighbourhoods: a uniform pattern and two non-uniform variants of it]

The first pattern is the original one, the two others may be patterns got after little changes in the images due to move, lightning change or noise. In this case, Ojala discards patterns that have the same meaning as the first one. Besides, Ojala keeps patterns with few information: 0 0 0

1 0

0 0 0

1 0 0

0 0

0 0 0

0 0 0

1 0

1 0 0

Again, the first pattern is the original one; little changes may lead to the two others, which are also uniform patterns. In order to check these assumptions, we ran stability tests on sets of close images (the method is described in 3.4.1). As expected, we see in table 3.1 that even within the uniform pattern set some patterns are very unstable (like 23 and 54, with 77% and 76% deviation) while others are much more stable (like 10, 11 or 33, for instance). So it seems legitimate to search for a better classification that discards the unstable patterns and keeps only the better ones.


Pattern  Count  Std. dev.  Stability
0        99.88  8.82       0.09
1        40.62  2.74       0.07
2        7.00   2.12       0.30
3        6.12   2.67       0.44
4        16.00  2.24       0.14
5        7.88   2.26       0.29
6        16.75  2.33       0.14
7        63.50  6.96       0.11
8        17.50  4.03       0.23
9        20.38  3.81       0.19
10       39.75  3.11       0.08
11       75.38  7.11       0.09
12       2.75   1.30       0.47
13       11.00  1.94       0.18
14       48.38  4.58       0.09
15       62.75  7.51       0.12
16       28.50  3.64       0.13
17       3.50   1.22       0.35
18       6.00   1.73       0.29
19       26.00  2.29       0.09
20       54.25  8.38       0.15
21       24.50  2.12       0.09
22       8.50   1.66       0.20
23       3.12   2.42       0.77
24       5.88   3.06       0.52
25       39.12  5.06       0.13
26       80.75  6.22       0.08
27       31.88  3.79       0.12
28       10.38  2.96       0.28
29       13.38  2.50       0.19
30       16.88  4.37       0.26
31       20.12  2.52       0.13
32       33.62  5.38       0.16
33       89.50  5.74       0.06
34       29.12  2.76       0.09
35       9.25   2.77       0.30
36       2.00   1.32       0.66
37       14.75  1.79       0.12
38       52.00  4.58       0.09
39       61.88  5.46       0.09
40       22.88  2.57       0.11
41       9.25   2.68       0.29
42       7.62   3.90       0.51
43       39.88  5.01       0.13
44       69.75  5.52       0.08
45       25.12  4.01       0.16
46       8.88   3.18       0.36
47       1.50   0.87       0.58
48       88.75  7.46       0.08
49       40.12  6.57       0.16
50       10.75  3.96       0.37
51       16.00  4.06       0.25
52       20.50  3.50       0.17
53       6.75   3.11       0.46
54       2.38   1.80       0.76
55       5.50   1.41       0.26
56       6.00   1.87       0.31
57       1.75   0.83       0.47
58       16.12  2.26       0.14

Table 3.1: LBP stability results on close images (pattern, count, standard deviation, stability)

3.2 Interpretation of the uniform patterns

The usual point of view is to see uniform local binary patterns only as binary words. We suggest here a different interpretation, as the combination of two parameters (shown in figure 3.1):

• a direction,
• an aperture.

Figure 3.1: Direction and aperture of a LBP pattern

Since the aperture may change a lot under a little perturbation, we expect that only the direction is useful.

3.3 Building a new descriptor

3.3.1 Idea

The possible values of the new descriptor will be the following:

Figure 3.2: Little change for the direction but big change for the aperture

• 8 directions (figure 3.3),
• a maximum,
• a minimum.

Figure 3.3: The 8 possible directions

Like a gradient, the direction represents the sense of the variation of the grey-level values inside the 3 × 3 neighbourhood. The maximum and minimum patterns are designed to describe the cases where there is an extremum in the neighbourhood (see table 3.2).

Table 3.2: A minimum pattern and two maximum ones

Since we no longer rely on anything binary but only on directions, we call our new descriptor the Local Orientation Pattern.

3.3.2 Effective computation

The effective computation of the Local Orientation Patterns is done through a mapping between the Local Binary Patterns and the possible values of the Local Orientation Patterns: for each possible LBP, we choose a corresponding LOP. Since we saw that a meaningful uniform LBP can become a non-uniform pattern due to noise or other perturbations, we work on all the possible patterns, and we need to choose the mapping such that a uniform LBP and a noised non-uniform version of it become the same LOP. Some choices are obvious, for instance:


[three obvious examples: one neighbourhood clearly corresponds to an up pattern, one to a right pattern and one to an up-right pattern]

but others need more rigorous criteria.

3.3.3 Requirements

Our mapping needs to satisfy some requirements:

• the same number of patterns in each direction;
• coherence under the action of rotations: if a LBP is mapped to a direction, the image of this LBP by a rotation must be mapped to the image of the direction by the same rotation;
• central symmetry: if a LBP is mapped to a direction, its binary complement must be mapped to the opposite direction (for instance, the complement of a LBP mapped to up-right is mapped to bottom-left).

3.3.4 Drawbacks

First, we must say that it is not easy to build a satisfying mapping. In particular, it is difficult to choose between up and up-right patterns (and all their rotations), and between an extremum and a real direction. In table 3.3, is the first pattern an up pattern with some noise on the right side, or a true up-right? And is the second pattern a maximum, or an up pattern with some noise?

Table 3.3: Some dubious patterns



3.3.5 Noise pattern

In order to deal with the most dubious patterns, we introduce a noise pattern. The goal is to group all the patterns that may vary a lot under small perturbations. The full description of the mapping is available in appendix A.
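Once the mapping of appendix A is fixed, computing a LOP descriptor amounts to a 256-entry table lookup applied to each LBP code. A minimal sketch: only two table entries are filled in here (both taken from appendix A) and the rest default to the noise code for illustration, so this is not the full mapping:

```python
import numpy as np

# Hypothetical excerpt of the appendix A lookup table.
# LOP codes: 0 = minimum, 10 = maximum, 5 = noise, 1-4 and 6-9 = directions
# (numpad convention, so 8 is "up").
LOP_OF_LBP = np.full(256, 5, dtype=np.uint8)   # default everything to noise
LOP_OF_LBP[0b00000000] = 0                     # flat neighbourhood -> minimum
LOP_OF_LBP[0b00000111] = 8                     # an "up" pattern

def lop_histogram(lbp_codes):
    """11-bin LOP histogram of a region, given the LBP codes of its pixels."""
    hist = np.zeros(11, dtype=int)
    for code in lbp_codes:
        hist[LOP_OF_LBP[code]] += 1
    return hist
```

The lookup makes LOP nearly as fast to compute as LBP while shrinking the histogram from 59 bins to 11.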

3.4 Justification

3.4.1 Stability

The LOP descriptor is designed to be more stable than the LBP descriptor under small variations in the images. To show that LOP is better in this respect, we compare the stability of several descriptors computed on a data set of nearly identical faces (described in appendix B): LBP, but also a gradient-based pattern. Indeed, we want to test another descriptor using orientation, since ours gives the direction of the variation of the grey levels in the image. To make the descriptors comparable, the gradient vector is mapped to 8 directions plus a zero pattern (equivalent to the two extrema patterns of LOP) when the norm of the vector is below a threshold (chosen so as to have roughly the same number of zero patterns for the gradient as extrema patterns for LOP). For each dimension, we compute the standard deviation divided by the mean of this dimension over all the images of the data set, so the value does not depend on the number of patterns in a dimension. Finally, we take the mean of all these normalised standard deviations as the stability of the descriptor.

Dimension  Count   Std. dev.  Stability
0          123.75  13.90      0.11
1          86.12   7.75       0.09
2          87.75   6.28       0.07
3          117.62  10.52      0.09
4          134.12  8.78       0.07
5          489.38  9.89       0.02
6          126.88  8.21       0.06
7          124.00  4.97       0.04
8          148.00  11.93      0.08
9          74.12   6.72       0.09
10         88.25   5.02       0.06

Table 3.4: Stability results for Local Orientation Patterns

We see in table 3.1 that LBP has a lot of unstable patterns and some very stable ones, and in table 3.5 that the gradient is always very unstable. By contrast, table 3.4 shows that all the possible LOP patterns are far more stable than the others. Looking at the global stability value, the mean of the normalised standard deviations over all the dimensions, LOP behaves far better than the other two. Surprisingly, the Gradient Pattern is very bad, even though it has nearly the same meaning as LOP.
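The stability measure described above (per-dimension standard deviation divided by the mean, then averaged over the dimensions) can be sketched as follows; skipping empty dimensions is an assumption about how the original experiment handled bins that never occur:

```python
import numpy as np

def descriptor_stability(histograms):
    """Stability of a descriptor over a set of nearly identical images:
    per-dimension std/mean, averaged over the dimensions (lower = stabler)."""
    h = np.asarray(histograms, float)   # shape (n_images, n_dimensions)
    means, stds = h.mean(axis=0), h.std(axis=0)
    valid = means > 0                   # skip dimensions that never occur
    return float((stds[valid] / means[valid]).mean())
```

Dividing by the mean makes the score independent of how many patterns fall in each dimension, which is what allows LBP, gradient and LOP histograms of different sizes to be compared in tables 3.1, 3.4 and 3.5.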

Dimension  Count   Std. dev.  Stability
0          214.88  49.69      0.23
1          136.50  63.73      0.47
2          254.38  76.73      0.30
3          97.62   32.39      0.33
4          217.25  62.95      0.29
6          221.62  40.85      0.18
7          201.38  53.26      0.26
8          198.38  29.56      0.15
9          58.00   33.26      0.57

Table 3.5: Stability results for Gradient Patterns

Pattern   Stability
LBP       0.24
Gradient  0.31
LOP       0.07

Table 3.6: Global stability of the 3 patterns

3.4.2 Invariance

A very important characteristic of a region descriptor is its invariance under some transformations:

• translations,
• rotations,
• scale changes,
• noise.

Pattern   Noise  Rotation  Scale  Translation  Mean
LBP       0.34   0.16      0.97   0.23         0.56
Gradient  0.37   0.36      0.59   0.81         0.71
LOP       0.18   0.05      0.51   0.14         0.29

Table 3.7: Stability under the action of some transformations for the 3 studied patterns

We clearly see in table 3.7 that our new descriptor performs far better than the two others (on average 27% better than LBP and 42% better than the gradient), even for rotations, which is a surprising result for a pattern relying on orientation.


Chapter 4

Clustering algorithm

4.1 Algorithm

We begin with a classical dimension reduction method explained in [von Luxburg, 2007]: as described in 2.3.3, we compute the Laplacian of the similarity matrix and the two eigenvectors u and v associated with the two largest eigenvalues. Image n is mapped to the point (u[n], v[n]). So, from very high-dimensional descriptors, we get a set of points in R². In our problem, we do not want to cluster single points but sequences of images. Therefore, in order to use classical clustering algorithms, we need to describe each sequence by a single point. We choose the centroid of each sequence (due to the errors described in 1.2.3, we may need to exclude aberrant values). We can then use the classical k-means algorithm, and more particularly the k-means++ variant described in [Arthur and Vassilvitskii, 2007], which gives guarantees on the quality of the output clusters through a judicious choice of the initial cluster centroids. Obviously, we do not avoid the classical drawback of most clustering algorithms: we need to know how many clusters we want.
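The embedding and centroid steps of this pipeline can be sketched as below. This is a minimal numpy sketch, assuming a symmetric similarity matrix with positive row sums; the outlier rejection and the final k-means++ step (for which an off-the-shelf implementation would normally be used) are omitted:

```python
import numpy as np

def spectral_embedding(W):
    """Map each image to a point of R^2 using the eigenvectors of the
    normalised Laplacian associated with the two largest eigenvalues."""
    W = np.asarray(W, float)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return vecs[:, -2:]                  # the columns u and v

def sequence_centroids(points, sequence_lengths):
    """Summarise each sequence of images by the centroid of its points."""
    centroids, start = [], 0
    for n in sequence_lengths:
        centroids.append(points[start:start + n].mean(axis=0))
        start += n
    return np.array(centroids)
```

The resulting one-point-per-sequence array is what would be fed to k-means++ with the desired number of clusters.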

4.2 Results

Unfortunately, we did not have time to implement this algorithm and to run experiments on real data, so the results will have to wait a little.


Conclusion

Building on Slipguru's successes in the supervised case for face authentication, this internship led to a method of face clustering using state-of-the-art techniques for feature selection, region descriptors, dimensionality reduction and spectral clustering. The most important part of my internship is the design of a new region descriptor, the Local Orientation Pattern (chapter 3), which has very interesting properties: very good stability under little changes, as described in 3.4.1, and good invariance under usual transformations, as shown in 3.4.2. Moreover, the Local Orientation Pattern descriptor is simpler than the Local Binary Pattern descriptor (only 11 dimensions instead of 59) while being nearly as fast to compute, since the mapping has a very low cost. What is missing is a full implementation of the clustering algorithm, and therefore experimental results validating our global approach.

Future work

The next step is to refine the construction of the mapping between LBP and LOP in order to get more meaningful patterns and to increase the stability of LOP. As for LBP in [Ojala et al., 2002], we may try a higher radius and/or a higher number of directions. Given the encouraging results with LOP, it would be very interesting to try the descriptor in other applications, like face detection, face authentication or even non-face-related applications.

Thanks

Big thanks to Nicola Rebagliati for all the time spent working with me. Thanks to Luca Baldassarre for the two weeks in his home, and to the university of Genoa for the other two months. Special thanks to Nicola's plumber for all the nice stories we got thanks to him. I would also particularly like to thank all the Slipguru group members for their kindness and their welcome.

Bibliography

[Ahonen et al., 2004] Ahonen, T., Hadid, A., and Pietikäinen, M. (2004). Face recognition with local binary patterns. In Proceedings of the European Conference on Computer Vision, pages 469–481.

[Arthur and Vassilvitskii, 2007] Arthur, D. and Vassilvitskii, S. (2007). k-means++: the advantages of careful seeding. In SODA '07: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 1027–1035, Philadelphia, PA, USA. Society for Industrial and Applied Mathematics.

[Destrero, 2008] Destrero, A. (2008). Selecting features for face recognition from examples. PhD thesis, Università degli studi di Genova.

[Ojala et al., 2002] Ojala, T., Pietikäinen, M., and Mäenpää, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell., 24(7):971–987.

[Ojala et al., 1996] Ojala, T., Pietikäinen, M., and Harwood, D. (1996). A comparative study of texture measures with classification based on feature distributions. Pattern Recognition, 29(1):51–59.

[Viola and Jones, 2001] Viola, P. and Jones, M. (2001). Robust real-time face detection. In Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), volume 2, page 747.

[von Luxburg, 2007] von Luxburg, U. (2007). A tutorial on spectral clustering. ArXiv e-prints, 711.

[Zhao et al., 2003] Zhao, W., Chellappa, R., Phillips, P. J., and Rosenfeld, A. (2003). Face recognition: A literature survey. ACM Comput. Surv., 35(4):399–458.


Appendix A

Mapping between LBP and LOP

Here is the complete mapping between the Local Binary Patterns and the Local Orientation Patterns. Each entry beginning with sharps is a representation of the LBP on the 3 × 3 region; the next line gives the LBP value and the associated LOP code. The codes are as follows:

• 0: minimum
• 10: maximum
• 5: noise
• 1–4, 6–9: the 8 directions (think of the numpad of a keyboard, see figure A.1)

7 8 9
4   6
1 2 3

Figure A.1: Numpad and directions

# 0 0 0 # 0 0 0 # 0 0 0 00000000 0

# 1 1 0 # 0 0 0 # 0 0 0 00000011 8

# 0 1 1 # 0 0 0 # 0 0 0 00000110 8

# 1 0 0 # 0 0 1 # 0 0 0 00001001 0

# 0 0 1 # 0 0 1 # 0 0 0 00001100 6

# 1 0 0 # 0 0 0 # 0 0 0 00000001 0

# 0 0 1 # 0 0 0 # 0 0 0 00000100 0

# 1 1 1 # 0 0 0 # 0 0 0 00000111 8

# 0 1 0 # 0 0 1 # 0 0 0 00001010 9

# 1 0 1 # 0 0 1 # 0 0 0 00001101 9

# 0 1 0 # 0 0 0 # 0 0 0 00000010 0

# 1 0 1 # 0 0 0 # 0 0 0 00000101 8

# 0 0 0 # 0 0 1 # 0 0 0 00001000 0

# 1 1 0 # 0 0 1 # 0 0 0 00001011 8

# 0 1 1 # 0 0 1 # 0 0 0 00001110 9


# 1 1 1 # 0 0 1 # 0 0 0 00001111 5

# 1 0 0 # 0 0 1 # 0 0 1 00011001 5

# 1 1 0 # 0 0 0 # 0 1 0 00100011 5

# 1 0 1 # 0 0 1 # 0 1 0 00101101 0

# 1 1 1 # 0 0 0 # 0 1 1 00110111 10

# 0 0 0 # 0 0 0 # 0 0 1 00010000 0

# 0 1 0 # 0 0 1 # 0 0 1 00011010 6

# 0 0 1 # 0 0 0 # 0 1 0 00100100 0

# 0 1 1 # 0 0 1 # 0 1 0 00101110 5

# 0 0 0 # 0 0 1 # 0 1 1 00111000 3

# 1 0 0 # 0 0 0 # 0 0 1 00010001 0

# 1 1 0 # 0 0 1 # 0 0 1 00011011 9

# 1 0 1 # 0 0 0 # 0 1 0 00100101 5

# 1 1 1 # 0 0 1 # 0 1 0 00101111 5

# 1 0 0 # 0 0 1 # 0 1 1 00111001 3

# 0 1 0 # 0 0 0 # 0 0 1 00010010 0

# 0 0 1 # 0 0 1 # 0 0 1 00011100 6

# 0 1 1 # 0 0 0 # 0 1 0 00100110 5

# 0 0 0 # 0 0 0 # 0 1 1 00110000 2

# 0 1 0 # 0 0 1 # 0 1 1 00111010 3

# 1 1 0 # 0 0 0 # 0 0 1 00010011 0

# 1 0 1 # 0 0 1 # 0 0 1 00011101 6

# 1 1 1 # 0 0 0 # 0 1 0 00100111 5

# 1 0 0 # 0 0 0 # 0 1 1 00110001 5

# 1 1 0 # 0 0 1 # 0 1 1 00111011 10

# 0 0 1 # 0 0 0 # 0 0 1 00010100 6

# 0 1 1 # 0 0 1 # 0 0 1 00011110 5

# 0 0 0 # 0 0 1 # 0 1 0 00101000 3

# 0 1 0 # 0 0 0 # 0 1 1 00110010 5

# 0 0 1 # 0 0 1 # 0 1 1 00111100 3

# 1 0 1 # 0 0 0 # 0 0 1 00010101 9

# 1 1 1 # 0 0 1 # 0 0 1 00011111 9

# 1 0 0 # 0 0 1 # 0 1 0 00101001 0

# 1 1 0 # 0 0 0 # 0 1 1 00110011 0

# 1 0 1 # 0 0 1 # 0 1 1 00111101 5

# 0 1 1 # 0 0 0 # 0 0 1 00010110 5

# 0 0 0 # 0 0 0 # 0 1 0 00100000 0

# 0 1 0 # 0 0 1 # 0 1 0 00101010 0

# 0 0 1 # 0 0 0 # 0 1 1 00110100 5

# 0 1 1 # 0 0 1 # 0 1 1 00111110 6

# 1 1 1 # 0 0 0 # 0 0 1 00010111 5

# 1 0 0 # 0 0 0 # 0 1 0 00100001 0

# 1 1 0 # 0 0 1 # 0 1 0 00101011 0

# 1 0 1 # 0 0 0 # 0 1 1 00110101 0

# 1 1 1 # 0 0 1 # 0 1 1 00111111 6

# 0 0 0 # 0 0 1 # 0 0 1 00011000 6

# 0 1 0 # 0 0 0 # 0 1 0 00100010 0

# 0 0 1 # 0 0 1 # 0 1 0 00101100 5

# 0 1 1 # 0 0 0 # 0 1 1 00110110 5

# 0 0 0 # 0 0 0 # 1 0 0 01000000 0


# 1 0 0 # 0 0 0 # 1 0 0 01000001 4

# 1 1 0 # 0 0 1 # 1 0 0 01001011 0

# 1 0 1 # 0 0 0 # 1 0 1 01010101 0

# 1 1 1 # 0 0 1 # 1 0 1 01011111 9

# 1 0 0 # 0 0 1 # 1 1 0 01101001 0

# 0 1 0 # 0 0 0 # 1 0 0 01000010 0

# 0 0 1 # 0 0 1 # 1 0 0 01001100 0

# 0 1 1 # 0 0 0 # 1 0 1 01010110 0

# 0 0 0 # 0 0 0 # 1 1 0 01100000 2

# 0 1 0 # 0 0 1 # 1 1 0 01101010 0

# 1 1 0 # 0 0 0 # 1 0 0 01000011 5

# 1 0 1 # 0 0 1 # 1 0 0 01001101 0

# 1 1 1 # 0 0 0 # 1 0 1 01010111 10

# 1 0 0 # 0 0 0 # 1 1 0 01100001 1

# 1 1 0 # 0 0 1 # 1 1 0 01101011 10

# 0 0 1 # 0 0 0 # 1 0 0 01000100 0

# 0 1 1 # 0 0 1 # 1 0 0 01001110 9

# 0 0 0 # 0 0 1 # 1 0 1 01011000 3

# 0 1 0 # 0 0 0 # 1 1 0 01100010 0

# 0 0 1 # 0 0 1 # 1 1 0 01101100 3

# 1 0 1 # 0 0 0 # 1 0 0 01000101 5

# 1 1 1 # 0 0 1 # 1 0 0 01001111 10

# 1 0 0 # 0 0 1 # 1 0 1 01011001 0

# 1 1 0 # 0 0 0 # 1 1 0 01100011 4

# 1 0 1 # 0 0 1 # 1 1 0 01101101 10

# 0 1 1 # 0 0 0 # 1 0 0 01000110 0

# 0 0 0 # 0 0 0 # 1 0 1 01010000 2

# 0 1 0 # 0 0 1 # 1 0 1 01011010 0

# 0 0 1 # 0 0 0 # 1 1 0 01100100 0

# 0 1 1 # 0 0 1 # 1 1 0 01101110 10

# 1 1 1 # 0 0 0 # 1 0 0 01000111 8

# 1 0 0 # 0 0 0 # 1 0 1 01010001 1

# 1 1 0 # 0 0 1 # 1 0 1 01011011 10

# 1 0 1 # 0 0 0 # 1 1 0 01100101 0

# 1 1 1 # 0 0 1 # 1 1 0 01101111 10

# 0 0 0 # 0 0 1 # 1 0 0 01001000 0

# 0 1 0 # 0 0 0 # 1 0 1 01010010 0

# 0 0 1 # 0 0 1 # 1 0 1 01011100 5

# 0 1 1 # 0 0 0 # 1 1 0 01100110 0

# 0 0 0 # 0 0 0 # 1 1 1 01110000 2

# 1 0 0 # 0 0 1 # 1 0 0 01001001 0

# 1 1 0 # 0 0 0 # 1 0 1 01010011 0

# 1 0 1 # 0 0 1 # 1 0 1 01011101 10

# 1 1 1 # 0 0 0 # 1 1 0 01100111 10

# 1 0 0 # 0 0 0 # 1 1 1 01110001 1

# 0 1 0 # 0 0 1 # 1 0 0 01001010 0

# 0 0 1 # 0 0 0 # 1 0 1 01010100 3

# 0 1 1 # 0 0 1 # 1 0 1 01011110 5

# 0 0 0 # 0 0 1 # 1 1 0 01101000 3

# 0 1 0 # 0 0 0 # 1 1 1 01110010 0


# 1 1 0 # 0 0 0 # 1 1 1 01110011 10

# 1 0 1 # 0 0 1 # 1 1 1 01111101 3

# 1 1 1 # 1 0 0 # 0 0 0 10000111 5

# 1 0 0 # 1 0 0 # 0 0 1 10010001 0

# 1 1 0 # 1 0 1 # 0 0 1 10011011 10

# 0 0 1 # 0 0 0 # 1 1 1 01110100 5

# 0 1 1 # 0 0 1 # 1 1 1 01111110 6

# 0 0 0 # 1 0 1 # 0 0 0 10001000 0

# 0 1 0 # 1 0 0 # 0 0 1 10010010 0

# 0 0 1 # 1 0 1 # 0 0 1 10011100 6

# 1 0 1 # 0 0 0 # 1 1 1 01110101 10

# 1 1 1 # 0 0 1 # 1 1 1 01111111 10

# 1 0 0 # 1 0 1 # 0 0 0 10001001 0

# 1 1 0 # 1 0 0 # 0 0 1 10010011 7

# 1 0 1 # 1 0 1 # 0 0 1 10011101 10

# 0 1 1 # 0 0 0 # 1 1 1 01110110 10

# 0 0 0 # 1 0 0 # 0 0 0 10000000 0

# 0 1 0 # 1 0 1 # 0 0 0 10001010 0

# 0 0 1 # 1 0 0 # 0 0 1 10010100 0

# 0 1 1 # 1 0 1 # 0 0 1 10011110 9

# 1 1 1 # 0 0 0 # 1 1 1 01110111 10

# 1 0 0 # 1 0 0 # 0 0 0 10000001 4

# 1 1 0 # 1 0 1 # 0 0 0 10001011 5

# 1 0 1 # 1 0 0 # 0 0 1 10010101 10

# 1 1 1 # 1 0 1 # 0 0 1 10011111 8

# 0 0 0 # 0 0 1 # 1 1 1 01111000 5

# 0 1 0 # 1 0 0 # 0 0 0 10000010 7

# 0 0 1 # 1 0 1 # 0 0 0 10001100 0

# 0 1 1 # 1 0 0 # 0 0 1 10010110 10

# 0 0 0 # 1 0 0 # 0 1 0 10100000 1

# 1 0 0 # 0 0 1 # 1 1 1 01111001 2

# 1 1 0 # 1 0 0 # 0 0 0 10000011 7

# 1 0 1 # 1 0 1 # 0 0 0 10001101 10

# 1 1 1 # 1 0 0 # 0 0 1 10010111 7

# 1 0 0 # 1 0 0 # 0 1 0 10100001 5

# 0 1 0 # 0 0 1 # 1 1 1 01111010 5

# 0 0 1 # 1 0 0 # 0 0 0 10000100 0

# 0 1 1 # 1 0 1 # 0 0 0 10001110 9

# 0 0 0 # 1 0 1 # 0 0 1 10011000 0

# 0 1 0 # 1 0 0 # 0 1 0 10100010 0

# 1 1 0 # 0 0 1 # 1 1 1 01111011 10

# 1 0 1 # 1 0 0 # 0 0 0 10000101 5

# 1 1 1 # 1 0 1 # 0 0 0 10001111 8

# 1 0 0 # 1 0 1 # 0 0 1 10011001 10

# 1 1 0 # 1 0 0 # 0 1 0 10100011 5

# 0 0 1 # 0 0 1 # 1 1 1 01111100 3

# 0 1 1 # 1 0 0 # 0 0 0 10000110 8

# 0 0 0 # 1 0 0 # 0 0 1 10010000 0

# 0 1 0 # 1 0 1 # 0 0 1 10011010 10

# 0 0 1 # 1 0 0 # 0 1 0 10100100 0


# 1 0 1 # 1 0 0 # 0 1 0 10100101 10

# 1 1 1 # 1 0 1 # 0 1 0 10101111 8

# 1 0 0 # 1 0 1 # 0 1 1 10111001 10

# 1 1 0 # 1 0 0 # 1 0 0 11000011 7

# 1 0 1 # 1 0 1 # 1 0 0 11001101 5

# 0 1 1 # 1 0 0 # 0 1 0 10100110 10

# 0 0 0 # 1 0 0 # 0 1 1 10110000 0

# 0 1 0 # 1 0 1 # 0 1 1 10111010 5

# 0 0 1 # 1 0 0 # 1 0 0 11000100 0

# 0 1 1 # 1 0 1 # 1 0 0 11001110 5

# 1 1 1 # 1 0 0 # 0 1 0 10100111 7

# 1 0 0 # 1 0 0 # 0 1 1 10110001 1

# 1 1 0 # 1 0 1 # 0 1 1 10111011 10

# 1 0 1 # 1 0 0 # 1 0 0 11000101 7

# 1 1 1 # 1 0 1 # 1 0 0 11001111 8

# 0 0 0 # 1 0 1 # 0 1 0 10101000 0

# 0 1 0 # 1 0 0 # 0 1 1 10110010 10

# 0 0 1 # 1 0 1 # 0 1 1 10111100 5

# 0 1 1 # 1 0 0 # 1 0 0 11000110 7

# 0 0 0 # 1 0 0 # 1 0 1 11010000 5

# 1 0 0 # 1 0 1 # 0 1 0 10101001 10

# 1 1 0 # 1 0 0 # 0 1 1 10110011 10

# 1 0 1 # 1 0 1 # 0 1 1 10111101 10

# 1 1 1 # 1 0 0 # 1 0 0 11000111 7

# 1 0 0 # 1 0 0 # 1 0 1 11010001 5

# 0 1 0 # 1 0 1 # 0 1 0 10101010 10

# 0 0 1 # 1 0 0 # 0 1 1 10110100 10

# 0 1 1 # 1 0 1 # 0 1 1 10111110 6

# 0 0 0 # 1 0 1 # 1 0 0 11001000 0

# 0 1 0 # 1 0 0 # 1 0 1 11010010 10

# 1 1 0 # 1 0 1 # 0 1 0 10101011 7

# 1 0 1 # 1 0 0 # 0 1 1 10110101 10

# 1 1 1 # 1 0 1 # 0 1 1 10111111 10

# 1 0 0 # 1 0 1 # 1 0 0 11001001 5

# 1 1 0 # 1 0 0 # 1 0 1 11010011 5

# 0 0 1 # 1 0 1 # 0 1 0 10101100 10

# 0 1 1 # 1 0 0 # 0 1 1 10110110 10

# 0 0 0 # 1 0 0 # 1 0 0 11000000 4

# 0 1 0 # 1 0 1 # 1 0 0 11001010 10

# 0 0 1 # 1 0 0 # 1 0 1 11010100 10

# 1 0 1 # 1 0 1 # 0 1 0 10101101 10

# 1 1 1 # 1 0 0 # 0 1 1 10110111 10

# 1 0 0 # 1 0 0 # 1 0 0 11000001 4

# 1 1 0 # 1 0 1 # 1 0 0 11001011 5

# 1 0 1 # 1 0 0 # 1 0 1 11010101 10

# 0 1 1 # 1 0 1 # 0 1 0 10101110 9

# 0 0 0 # 1 0 1 # 0 1 1 10111000 2

# 0 1 0 # 1 0 0 # 1 0 0 11000010 5

# 0 0 1 # 1 0 1 # 1 0 0 11001100 10

# 0 1 1 # 1 0 0 # 1 0 1 11010110 10


# 1 1 1 # 1 0 0 # 1 0 1 11010111 7

# 0 0 0 # 1 0 0 # 1 1 0 11100000 1

# 1 0 0 # 1 0 1 # 1 1 0 11101001 5

# 0 1 0 # 1 0 0 # 1 1 1 11110010 1

# 1 1 0 # 1 0 1 # 1 1 1 11111011 10

# 0 0 0 # 1 0 1 # 1 0 1 11011000 5

# 1 0 0 # 1 0 0 # 1 1 0 11100001 5

# 0 1 0 # 1 0 1 # 1 1 0 11101010 1

# 1 1 0 # 1 0 0 # 1 1 1 11110011 4

# 0 0 1 # 1 0 1 # 1 1 1 11111100 2

# 1 0 0 # 1 0 1 # 1 0 1 11011001 5

# 0 1 0 # 1 0 0 # 1 1 0 11100010 4

# 1 1 0 # 1 0 1 # 1 1 0 11101011 4

# 0 0 1 # 1 0 0 # 1 1 1 11110100 2

# 0 1 0 # 1 0 1 # 1 0 1 11011010 5

# 1 1 0 # 1 0 0 # 1 1 0 11100011 4

# 0 0 1 # 1 0 1 # 1 1 0 11101100 10

# 1 0 1 # 1 0 0 # 1 1 1 11110101 1

# 1 1 0 # 1 0 1 # 1 0 1 11011011 10

# 0 0 1 # 1 0 0 # 1 1 0 11100100 1

# 1 0 1 # 1 0 1 # 1 1 0 11101101 10

# 0 1 1 # 1 0 0 # 1 1 1 11110110 10

# 0 0 1 # 1 0 1 # 1 0 1 11011100 5

# 1 0 1 # 1 0 0 # 1 1 0 11100101 4

# 0 1 1 # 1 0 1 # 1 1 0 11101110 10

# 1 1 1 # 1 0 0 # 1 1 1 11110111 10

# 1 0 1 # 1 0 1 # 1 0 1 11011101 10

# 0 1 1 # 1 0 0 # 1 1 0 11100110 5

# 1 1 1 # 1 0 1 # 1 1 0 11101111 10

# 0 0 0 # 1 0 1 # 1 1 1 11111000 2

# 0 1 1 # 1 0 1 # 1 0 1 11011110 10

# 1 1 1 # 1 0 0 # 1 1 0 11100111 4

# 0 0 0 # 1 0 0 # 1 1 1 11110000 5

# 1 0 0 # 1 0 1 # 1 1 1 11111001 2

# 1 1 1 # 1 0 1 # 1 0 1 11011111 10

# 0 0 0 # 1 0 1 # 1 1 0 11101000 5

# 1 0 0 # 1 0 0 # 1 1 1 11110001 1

# 0 1 0 # 1 0 1 # 1 1 1 11111010 2


# 1 0 1 # 1 0 1 # 1 1 1 11111101 10

# 0 1 1 # 1 0 1 # 1 1 1 11111110 10

# 1 1 1 # 1 0 1 # 1 1 1 11111111 10
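Each entry of the table above lists a 3×3 binary neighborhood (three rows of three values), the corresponding 8-bit LBP code, and the mapped LOP value. The following sketch shows how the code column can be recomputed from a neighborhood; the bit order (clockwise from the top-left neighbor, least-significant bit first) is an assumption inferred from the table entries themselves, and the LOP value in the last column is not recomputed here.

```python
def lbp_code(neighborhood):
    """Return the 8-bit LBP code string for a 3x3 binary neighborhood.

    `neighborhood` is a list of three rows of three 0/1 values;
    the center value is ignored, only the 8 neighbors contribute.
    """
    (tl, t, tr), (l, _, r), (bl, b, br) = neighborhood
    # Clockwise from the top-left neighbor, least-significant bit first;
    # written most-significant bit first, the code reads L BL B BR R TR T TL.
    bits = [tl, t, tr, r, br, b, bl, l]
    value = sum(bit << i for i, bit in enumerate(bits))
    return format(value, "08b")

# First two entries of the table:
print(lbp_code([[1, 1, 1], [0, 0, 1], [0, 0, 0]]))  # 00001111
print(lbp_code([[1, 0, 0], [0, 0, 1], [0, 0, 1]]))  # 00011001
```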

Appendix B

Data set for the LOP stability tests

We chose 8 very similar images of the same person. All of these images come from the same sequence, so they were taken under nearly identical conditions.
