Electronic Letters on Computer Vision and Image Analysis 8(1):27-43, 2009

Realtime Kernel based Tracking

T. Chateau and J.T. Lapresté
Lasmea, CNRS/Blaise-Pascal University, Clermont-Ferrand, France

Received 26 May 2008; accepted 13 January 2009

Abstract

We present a solution for realtime tracking of a planar pattern. Tracking is seen as the estimation of a parametric function between observations and motion, and we propose an extension of the learning based approach presented simultaneously by Cootes et al. and by Jurie and Dhome. We show that the classic hyperplane algorithm is a specific case of a more generic model: a linearly-weighted sum of fixed non-linear basis functions. The weights associated with the basis functions (kernel functions) of the model are estimated from a training set of perturbations and associated observations generated in a synthetic way. The resulting tracker is then composed of several iterations over trackers learned with coarse-to-fine magnitudes of perturbation. We compare the performance of the method with the linear algorithm in terms of accuracy and convergence frequency. Moreover, we illustrate the behaviour of the method on several real toy video sequences including different patterns, motions and illumination conditions, and on several real video sequences sampled from rear-car tracking databases.

Key Words: realtime planar tracking, kernel based regression functions, rear-car tracking

1 Introduction

Tracking a planar textured pattern is a popular topic in computer vision [2]. The aim is to estimate the unknown motion (generally expressed by a homography) of the pattern observed in a video sequence. Solutions to this problem can be classified according to the video information available at estimation time. A first family of approaches, called offline, uses all the video information to solve the motion estimation problem; in this case, global optimization techniques can be applied. In [6], the authors present an optimization solution for estimating pedestrian trajectories from a video sequence, and in [16] a multiple target tracking system is proposed within a global Monte-Carlo framework. A second family of approaches is called online: only past and present video information is available. Real-time tracking methods are online methods for which the result of the motion estimation must be produced before the next video frame; the method presented in this paper is a real-time method.

Generally, real-time tracking can be modeled by a regression function between observations (the image) and a motion model of the template to be tracked. Some methods (called model based approaches) rely on an analytical model of the regression function. In [8], the authors compute a first order approximation of the relation between the observations (luminance) and the motion model and propose to use the Jacobian matrix to define the regression function. Recently, Benhimane et al. [3] extended the method to a second order approximation involving the Hessian matrix. Other methods (called learning based approaches) use a parametric form for the regression function, whose parameters are estimated from a training set of motions and associated observations. Jurie and Dhome [10] and Cootes et al. [4] simultaneously proposed a first order hyperplane model for the regression function, where the parameters of the linear matrix between motion and observations are estimated using a least-squares error criterion computed from a training set. The image features used are generally normalized pixel luminances; however, in [11], Chateau et al. use Haar-like wavelets to describe the image.

We propose to extend the learning based approach presented simultaneously by Jurie and Dhome [10] and Cootes et al. [4] to a parametric regression model using non-linear basis functions. Kernel based regression functions have recently been used for tracking objects, but with simple motion models (translation/scale). The pioneering work is that of Avidan [1] (support vector tracking), which uses the output of an SVM based regression function to perform a tracking task. The idea is to link the SVM scores with the motion of the pattern between two image shots. This method provides a way to track classes of objects: no model of the current object is learnt, but the classifier uses a generic model learnt offline. Williams [15] proposes a probabilistic interpretation of the SVM and presents a solution based on the RVM (relevance vector machine) [13], combined with Kalman temporal filtering; the RVM links the image luminance measure to the relative motion of the object through a regression relation. Recently, Thayananthan et al. [12] presented a learning based approach to track articulated human body motion, extending the RVM kernel based machine to the multivariate case (MVRVM).

This paper is organized as follows. Section two deals with the general framework of visual tracking. Section three presents the proposed regression function: a kernel based parametric model whose parameters are estimated from a training set. The resulting solution is compared with the classical one (hyperplane approximation) in section four, in terms of accuracy and convergence frequency, with respect to several relevant parameters such as the magnitude of the perturbations chosen for the learning step or noisy observations. Moreover, we illustrate the behaviour of the method on several real toy video sequences including different patterns, motions and illumination conditions, and on several real video sequences sampled from rear-car tracking databases.

2 Visual Tracking

We present a generic solution for pattern tracking based on the definition of a regression function between a motion model and the variation of the template appearance.

Let I_k be an image extracted from a video sequence at time k, and W a planar pattern to be tracked, defined by four corner points in the image. Let us define a state associated with the image position of the pattern by p_k = (p_k^1, p_k^2, p_k^3, p_k^4), a vector composed of the four corners of the pattern, where p_k^i = (u_k^i, v_k^i)^t denotes the position of the i-th corner expressed in the image reference frame.

Temporal matching of W can be seen as the estimation of the state p̂_k for each new image of the video sequence. It can be realized in an iterative way, from the motion estimate δp_k of the four template corners between two successive images:

p̂_k = p_{k−1} + δp_k    (1)

Let z(I_k, W, p_k) be an observation function which provides a feature vector associated with W for the position defined by the state p_k in the image at time k (for example the luminance computed on a sub-sampling grid of W). A direct consequence of the so-called image constancy assumption can be expressed as follows:

∀ i, j ∈ {1, .., K},  z(I_i, W, p_i) = z(I_j, W, p_j) = z*_W    (2)


where K is the number of images of the video sequence and z*_W is the feature vector extracted from the pattern in the first image. Now, the variation of the observations between two successive images, given the previous state, is defined by:

δz_k = z(I_k, W, p_{k−1}) − z(I_{k−1}, W, p_{k−1})    (3)

Using (2), the previous equation becomes:

δz_k = z(I_k, W, p_{k−1}) − z*_W    (4)

We propose to link the motion δp_k and the observation variation δz_k by a regression function:

δp_k = f(δz_k; w_k) + ε_k    (5)

where ε_k denotes a random noise and w_k is the vector of parameters of the regression function. Since this formulation depends on time k, the parameters must be estimated at each iteration. The idea is to find a new relation in which the parameters of the regression function have to be estimated only for the first image of the sequence. Let H_k be the homography between the position of the four corners of the pattern to be tracked and a canonical reference frame: p_k is projected onto P_k = P_0 = ((0,0)^t, (0,1)^t, (1,1)^t, (1,0)^t) with:

P̃_0 = P̃_k ∝ H_k · p̃_k    (6)

The notation p̃_k is introduced to denote p_k in homogeneous coordinates. H_k is computed simply by matching P_0 with p_k and solving the resulting linear system [9]. The variation of the state vector can be expressed in the canonical reference frame by:

δP̃_k = H_{k−1} · δp̃_k    (7)

δP_k is the variation of the projection of p_{k−1} into the canonical reference frame using the previous homography H_{k−1}. We propose to link δP_k with the variation of the observation by the following regression function:

δP_k = F(δz_k; w) + ε_k    (8)

where ε_k is a random noise. In this expression, the parameter vector w to be estimated does not depend on time, so it can be estimated once for all frames of the sequence. The resulting template tracking method is summarized in Algorithm 1, and Figure 1 illustrates the principle of the method.
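The paper computes H_k simply by matching P_0 with p_k and solving the resulting linear system [9]. As a concrete illustration, here is a minimal sketch of that step using the standard direct linear transform; Python/NumPy is used for all code examples below, and the function name homography_from_corners is ours, not the authors'.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Estimate H (3x3) such that dst ~ H @ src for four point pairs,
    via the direct linear transform (DLT) of [9]."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the system A h = 0.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # h is the right singular vector of A with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Canonical corners P0 and an example pattern position p_k in the image.
P0 = np.array([[0., 0.], [0., 1.], [1., 1.], [1., 0.]])
p_k = np.array([[120., 80.], [118., 140.], [185., 143.], [183., 82.]])
H_k = homography_from_corners(p_k, P0)   # maps the image corners onto P0
```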

3 Kernel based Regression Functions

3.1 Learning

A key point of the method proposed in the previous section concerns the model of the regression function F and the estimation of the associated vector of parameters w. F can be either linear [8, 10] or non-linear [3]. The vector of parameters w can be estimated either in an analytical way (estimation of the Jacobian matrix in [8], of a second order matrix in [3]), or using machine learning techniques [10].

Let V = {δP^(n), δz^(n)} be a learning set built from random motions {δP^(n)}_{n=1}^N of the pattern (projected into a reference frame), associated with the variations of the feature vector (observation) {δz^(n)}_{n=1}^N. The learning motions are applied to the reference image (generally the first image of the video sequence). Jurie and Dhome propose to use a hyperplane model for the regression function:

F(δz; W) = W δz    (9)
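For concreteness, a minimal sketch of this hyperplane learning with stand-in random data in place of the real synthetic perturbations (assumed shapes: 8 motion parameters for the four corners, d luminance features):

```python
import numpy as np

# Hypothetical training set: N corner perturbations (8 motion parameters)
# and the corresponding observation variations (d features each).
N, d = 400, 225
rng = np.random.default_rng(0)
delta_P = rng.uniform(-0.1, 0.1, size=(8, N))  # stand-in for real motions
delta_Z = rng.normal(size=(d, N))              # stand-in for real delta-z

# Hyperplane model (eq. 9): delta_P ~ W @ delta_Z, solved in the
# least-squares sense with the Moore-Penrose pseudo-inverse.
W = delta_P @ np.linalg.pinv(delta_Z)          # shape (8, d)

delta_p_pred = W @ delta_Z[:, 0]               # predicted motion, one sample
```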


[Figure 1 appears here: the pattern in images k−1 and k, the canonical unit square with corners (0,0), (0,1), (1,1), (1,0), and the homographies H_{k−1} and H_k relating the image motion δp_k to the canonical motion δP_k.]

Figure 1: Illustration of the tracking method. The reference pattern at time k is projected into a canonical reference frame using the homography H_{k−1}. A regression function estimates the motion between k−1 and k according to the observed variation. The motion of the pattern is related to the canonical-frame motion by δP̃_k = H_{k−1} · δp̃_k, and the homography is updated.

The parameter matrix W is learnt from the training set V by minimizing a least-squares error measure. We propose to use models which are a linearly-weighted sum of M fixed non-linear basis functions:

F(δz; W) = W · φ(δz) = Σ_{m=1}^{M} w_m φ_m(δz)    (10)

where φ(δz) denotes a vector of M basis functions:

φ(δz) = [φ_1(δz), φ_2(δz), ..., φ_M(δz)]^T    (11)

and φ_m(δz) = k(δz, δz*_m) is a kernel function applied to δz and a basis vector denoted δz*_m. Let W = (w_1, w_2, ..., w_M) be the matrix of parameters associated with the M basis functions. The objective is to find values of W such that F(δz; W) makes good predictions for new data, i.e. models the underlying generative function. A classic approach to estimating W is least squares, i.e. minimization of the error measure:

E_D(W) = (1/2) Σ_{n=1}^{N} || δP^(n) − W · φ(δz^(n)) ||²    (12)

This can be rewritten as the following system:

[δP^(1), δP^(2), ..., δP^(N)] = W [φ(δz^(1)), φ(δz^(2)), ..., φ(δz^(N))]    (13)

Denoting Φ = [φ(δz^(1)), φ(δz^(2)), ..., φ(δz^(N))] and ΔP = [δP^(1), δP^(2), ..., δP^(N)], the system can be expressed in a more compact form:

ΔP = W · Φ    (14)
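A compact sketch of this kernel regression pipeline under assumed shapes (d features, N training samples, M basis vectors); the helper design_matrix and the stand-in data are ours, and the Gaussian kernel anticipates eq. (16) below:

```python
import numpy as np

def design_matrix(dZ, basis, sigma):
    """Phi with Phi[m, n] = exp(-||dz_n - basis_m||^2 / sigma^2).

    dZ: (d, N) observation variations; basis: (d, M); returns (M, N)."""
    d2 = ((basis[:, :, None] - dZ[:, None, :]) ** 2).sum(axis=0)
    return np.exp(-d2 / sigma ** 2)

rng = np.random.default_rng(1)
d, N, M = 225, 400, 100
dZ = rng.normal(size=(d, N))                    # stand-in training delta-z
dP = rng.uniform(-0.1, 0.1, size=(8, N))        # stand-in corner motions
basis = dZ[:, rng.choice(N, M, replace=False)]  # basis vectors from the set

Phi = design_matrix(dZ, basis, sigma=15.0)      # sigma is arbitrary here
W = dP @ np.linalg.pinv(Phi)                    # W_LS = DeltaP Phi^+ (eq. 15 below)

# Prediction for a new observation variation (eq. 10):
dz_new = rng.normal(size=(d, 1))
dP_pred = W @ design_matrix(dz_new, basis, sigma=15.0)
```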


Algorithm 1 Tracking
Input: state p_0, image I_0 and regression function F.
Output: set of states {p_k}_{k=1}^K.
Initialisation: k = 0, extraction of the reference feature vector z*_W = z(I_0, W, p_0) and estimation of the canonical homography H_0.
for k = 1 to K (loop on the images of the video sequence) do
    Observation: extraction of the feature vector δz_k = z(I_k, W, p_{k−1}) − z*_W
    Estimation: estimation of the motion in the canonical reference frame, then in the image reference frame:
        δP_k = F(δz_k; w)
        δp̃_k ∝ H_{k−1}^{−1} · δP̃_k
    Update: state vector and homography:
        p_k = p_{k−1} + δp_k
        H_k: homography between P_0 and p_k
end for

The estimation of the parameter matrix W using (12) is given by:

W_LS = ΔP · Φ^+    (15)

where Φ^+ denotes the pseudo-inverse of Φ.

Alternative methods based on the least-squares criterion can be used to estimate the parameter matrix W. One solution is to place a prior over W in order to set many weights to zero; the resulting model is then called a sparse linear model. The SVM (Support Vector Machine) [14] is a sparse linear model where the weights are estimated by the minimization of a functional based on Lagrange multipliers. Other sparse linear models, such as the RVM (Relevance Vector Machine) [13] or the multivariate RVM [12], may also be employed.

Generally, the vectors used in the basis functions (δz*_m) are chosen from the training set. It is also possible to use the entire training set, in which case M = N. Moreover, note that the hyperplane model presented in eq. (9) is a special case of the model (10) with linear basis functions φ_m(δz). We make the common choice of Gaussian data-centred basis functions:

φ_m(δz) = exp( −||δz − δz*_m||² / σ² )    (16)

which gives a radial basis function (RBF) type model, for which the parameter σ must be adjusted. On one hand, if σ is too small, the design matrix Φ is mostly composed of zeros; on the other hand, if σ is too large, Φ is mostly composed of ones (exp(0)). We propose to adjust σ by a non-linear optimization maximizing a cost function based on the sum of the variances computed over each column of Φ:

σ = arg max_σ C(σ)    (17)

with

C(σ) = Σ_{n=1}^{N} Σ_{m=1}^{M} ( φ_m(δz^(n)) − φ̄(δz^(n)) )²    (18)

and

φ̄(δz^(n)) = (1/M) Σ_{m=1}^{M} φ_m(δz^(n))    (19)
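The paper leaves the non-linear optimization of σ unspecified; as a stand-in, the sketch below maximizes C(σ) over a log-spaced grid, reusing the design_matrix helper defined in the previous sketch:

```python
import numpy as np

def variance_cost(dZ, basis, sigma):
    """C(sigma) of eq. (18): summed per-sample variance terms of Phi."""
    Phi = design_matrix(dZ, basis, sigma)            # (M, N)
    phi_bar = Phi.mean(axis=0, keepdims=True)        # eq. (19), per sample n
    return ((Phi - phi_bar) ** 2).sum()

def tune_sigma(dZ, basis, sigmas):
    """Grid-search stand-in for the paper's non-linear optimization:
    pick the sigma that maximizes C(sigma)."""
    costs = [variance_cost(dZ, basis, s) for s in sigmas]
    return sigmas[int(np.argmax(costs))]

# Example on the stand-in data from the previous sketch:
# sigma = tune_sigma(dZ, basis, np.geomspace(0.1, 100.0, 50))
```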


An overview of the learning method proposed in this section, called KBT (Kernel Based Tracker), is given in Algorithm 2. The learning method is called L times, for different values of b, following a coarse-to-fine scheme [5, 7].

Algorithm 2 KBT: learning step
Input: size of the training set N, training parameter b, number of basis functions M, initial homography H_0, initial std. of the kernel function σ_0.
Output: parameter matrix W and std. of the kernel function σ.
Pattern motion generation: a set of N motion vectors {δP^(n)}_{n=1}^N is drawn according to a uniform law on the interval [−b, b]: δP^(n) ∼ U(−b, b).
Observation: compute {δz^(n)}_{n=1}^N such that δz^(n) = z(I_0, W, p^(n)) − z*_W, with p̃^(n) ∝ H_0^{−1} · P̃^(n).
Draw basis functions: draw {δz*_m}_{m=1}^M using a uniform law over {δz^(n)}_{n=1}^N.
Estimation of the kernel function parameter σ: non-linear optimization of σ, such that
    σ = arg max_σ Σ_{n=1}^N Σ_{m=1}^M ( φ_m(δz^(n)) − φ̄(δz^(n)) )²
Estimation of the weight (parameter) matrix W:
    W = ΔP · Φ^T (Φ Φ^T)^{−1}, with Φ = [φ(δz^(1)), φ(δz^(2)), ..., φ(δz^(N))]
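An end-to-end sketch of one learning level, hiding the image warping behind a hypothetical observe callable (which would warp the reference image by a canonical-frame corner perturbation and return the feature vector) and reusing design_matrix and tune_sigma from the previous sketches:

```python
import numpy as np

def kbt_learn_level(observe, z_ref, b, N=400, M=100,
                    rng=np.random.default_rng(0)):
    """One level of the KBT learning step (Algorithm 2), as a sketch.

    observe(dP): hypothetical helper returning the feature vector z for a
    canonical-frame corner perturbation dP (8-vector); z_ref is z*_W."""
    # Pattern motion generation: N corner perturbations ~ U(-b, b).
    dP = rng.uniform(-b, b, size=(8, N))
    # Observations: delta-z for each synthetic perturbation.
    dZ = np.stack([observe(dP[:, n]) - z_ref for n in range(N)], axis=1)
    # Basis vectors drawn uniformly from the training observations.
    basis = dZ[:, rng.choice(N, M, replace=False)]
    # Kernel width: maximize the variance criterion (eqs. 17-19).
    sigma = tune_sigma(dZ, basis, np.geomspace(0.1, 100.0, 50))
    # Weights: W = DeltaP Phi^+ (eq. 15).
    W = dP @ np.linalg.pinv(design_matrix(dZ, basis, sigma))
    return W, basis, sigma
```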

3.2 Tracking

The learning step provides, for each training level l:

1. the weight matrix W_l;
2. the set of basis functions {δz*_{m,l}}_{m=1}^M;
3. the width parameter associated with the kernel function, σ_l.

These parameters are then used to build the regression function F(δz; W_l) = Σ_{m=1}^M w_{m,l} φ_{m,l}(δz). Moreover, it is possible to call the regression function several times in order to increase the tracking precision. The resulting method is presented in Algorithm 3.

4 Experiments

This section presents the experiments carried out to compare the proposed method with the reference linear algorithm. After a description of the datasets and the methodology, experimental results are presented and then discussed.


Algorithm 3 KBT: tracking step
Input: regression function parameters W_l, {δz*_{m,l}}_{m=1}^M, initial state p_0 and reference image I_0.
Output: set of states {p_k}_{k=1}^K.
Initialisation: k = 0, extract the reference feature vector z*_W = z(I_0, W, p_0) and compute the initial homography H_0.
for k = 1 to K (loop on the images) do
    p′ = p_{k−1} and H′ = H_{k−1}
    for l = 1 to L (loop on the levels) do
        for i = 1 to I (iterations per level) do
            Observation: feature vector extraction δz = z(I_k, W, p′) − z*_W
            Estimation: motion estimation in the canonical reference frame, then in the image reference frame:
                δP′ = Σ_{m=1}^M w_{m,l} φ_{m,l}(δz)
                δp̃′ ∝ (H′)^{−1} · δP̃′
            Update: state vector and homography: p′ = p′ + δp′; estimate H′ by solving P̃_0 ∝ H′ · p̃′
        end for
    end for
    Final update: p_k = p′ and H_k = H′
end for
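A sketch of this per-frame loop, with the warping and homography machinery abstracted behind hypothetical helpers (extract_z, apply_motion, update_homography) and reusing design_matrix from the learning sketches:

```python
import numpy as np

def kbt_track_frame(image, p_prev, H_prev, levels, z_ref,
                    extract_z, apply_motion, update_homography, iters=3):
    """One frame of the KBT tracking step (Algorithm 3), as a sketch.

    levels: list of (W_l, basis_l, sigma_l) triples, coarse to fine.
    extract_z(image, p): samples the feature vector at state p;
    apply_motion(p, H, dP): maps the canonical motion dP back to the image
    and returns updated corners; update_homography(p): re-estimates H.
    All three helpers are hypothetical stand-ins for the warping code."""
    p, H = p_prev.copy(), H_prev.copy()
    for W, basis, sigma in levels:               # coarse-to-fine levels
        for _ in range(iters):                   # I iterations per level
            dz = extract_z(image, p) - z_ref     # observation variation
            dP = (W @ design_matrix(dz[:, None], basis, sigma))[:, 0]
            p = apply_motion(p, H, dP)           # back to the image frame
            H = update_homography(p)             # refresh the homography
    return p, H
```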


4.1 Datasets and Methodology

Methodologies for the evaluation of region based tracking methods have already been proposed for both rigid [10, 3, 11, 8] and non-rigid objects [4, 7]. The experiments reported can be classified into three categories:

1. Accuracy of motion parameter estimation. The aim of this experiment is to show the estimated variation of the motion parameters related to the true variation. The experimental data used for this test are generated from a static image: virtual variations of the motion parameters and the resulting synthetic regions are produced and stored in a ground truth database. Motions are usually generated in a marginalized way (the variation is applied to one parameter at a time, for example one translation coordinate). The tracking algorithm is then run on the generated dataset and the estimated motion parameters are compared with the true values. Several tests are usually carried out for the most relevant parameters of the tracking algorithm.

2. Convergence frequency. The aim of this test is to compute the convergence frequency of tracking algorithms. In [3], the algorithm is said to diverge when the final SSD motion error is bigger than the initial SSD. A database containing motions and the resulting synthetic regions is generated and then used to compute the convergence rate of the algorithm. This frequency is presented as a function of the amplitude of the motion parameters, or of appearance perturbations such as illumination variations (affine or Gaussian for example); a sketch of such a harness is given below.

3. Illustration on real sequences. The aim of this experiment is to illustrate the behaviour of the algorithm on several real sequences with illumination variation, cluttered background, or partial occlusion. Since ground truth cannot be known for such sequences, key frames of the video are extracted and presented with the tracked region superimposed.

The methodology proposed here is close to the one already used to evaluate similar methods. We compare two algorithms in terms of accuracy and convergence rate on a synthetic image. Moreover, three algorithms have been compared on real videos. Two classical algorithms:

• LBT: the Linear Based Tracker algorithm proposed in [10].
• ESM: the ESM visual tracking software, based on a fast optimization technique called ESM (Efficient Second-order Minimization). The ESM technique was proposed for improving standard visual servoing techniques and, thanks to its generality, has been extended to template-based visual tracking [3]. Since we used the ESM MATLAB(TM) toolkit provided by the authors, the method was run with default parameters.

The proposed algorithm:

• KBT: the Kernel Based Tracker.

Since LBT and KBT are very close, several experiments compare the two algorithms with respect to their most relevant parameters. The same learning dataset is used for both algorithms; it is generated by random (Gaussian) disturbance of the motion parameters (variation of the position of the four corners of the pattern). In the following, we define b as the standard deviation of the Gaussian law as a percentage of the tracked pattern width. The learning step of KBT can be achieved by two methods: least-squares linear optimization or multivariate relevance vector machine learning. Both approaches have been tested and the results obtained are very close, so the following results are obtained with the least-squares approach, since the learning step is much longer for MVRVM.
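The convergence-frequency harness of item 2 above could be sketched as follows, under assumptions of our own (track is a hypothetical callable wrapping a full tracker run from a perturbed initial state; the threshold tol is arbitrary):

```python
import numpy as np

def convergence_rate(track, p_true, perturb_std, n_trials=1000, tol=1.0,
                     rng=np.random.default_rng(3)):
    """Perturb the four corners with Gaussian noise, run the tracker, and
    count the runs whose final quadratic corner error is below tol."""
    hits = 0
    for _ in range(n_trials):
        p0 = p_true + rng.normal(scale=perturb_std, size=p_true.shape)
        p_final = track(p0)                      # hypothetical tracker run
        if np.sum((p_final - p_true) ** 2) < tol:
            hits += 1
    return hits / n_trials
```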
The observation function is based on the luminance of pixels extracted from a regular sampling of the pattern to track. In the following experiments, the size of the sampling grid is 15×15 pixels (225 points). The resulting feature vector is then zero-normalized in order to be invariant to affine luminance transformations. Moreover, all the tests presented here have been carried out with a coarse-to-fine learning strategy using three levels and three loops per level.
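A sketch of this observation function is given below; it uses nearest-neighbour sampling on a bilinearly interpolated grid of the four corners (a projective warp through the homography would be more faithful), and the function name is ours:

```python
import numpy as np

def extract_features(image, corners, grid=15):
    """Sample luminance on a grid x grid lattice inside the pattern and
    zero-normalize the result.

    image: (H, W) gray-level array; corners: (4, 2) corner positions in
    (u, v) order, matching P0 = (0,0), (0,1), (1,1), (1,0)."""
    s = np.linspace(0.0, 1.0, grid)
    gx, gy = np.meshgrid(s, s)
    c00, c01, c11, c10 = corners
    # Bilinear blend of the four corners gives the sampling positions.
    pts = ((1 - gx)[..., None] * (1 - gy)[..., None] * c00
           + (1 - gx)[..., None] * gy[..., None] * c01
           + gx[..., None] * gy[..., None] * c11
           + gx[..., None] * (1 - gy)[..., None] * c10)
    u = np.clip(pts[..., 0].round().astype(int), 0, image.shape[1] - 1)
    v = np.clip(pts[..., 1].round().astype(int), 0, image.shape[0] - 1)
    z = image[v, u].astype(float).ravel()        # grid*grid luminance samples
    # Zero-normalization: invariance to affine luminance changes.
    return (z - z.mean()) / (z.std() + 1e-12)
```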

4.2 Results

In order to assess the algorithms under different controlled conditions, we synthesize images from a template. The LBT and KBT algorithms have been implemented on a desktop PC (P4, 3.2 GHz) and run in about 6 ms per image for KBT and 3 ms per image for LBT.

The first test shows the ability of the method to estimate a known variation (cf. fig. 2). A horizontal displacement of the pattern is generated from a static image. The training step is realized with three levels (L = 3), N = 400 and M = N basis functions, and three iterations are applied by the tracker (I = 3). The figure shows the mean translation error as a function of the translation amplitude (% of the pattern width), for four values of the training parameter b (the disturbance magnitude used to build the training set for the first level), and illustrates the accuracy of the LBT and KBT algorithms.

The second test measures the convergence frequency of LBT and KBT in relation to the disturbance magnitude used for the training set. We define a successful convergence by a threshold on the quadratic error between the estimated position (the four corners of the pattern to track) and the real position. A random perturbation (Gaussian law) is applied to the four corners of the pattern and a synthetic dataset is generated from one thousand realisations of this process. Figure 3 shows the convergence rate according to the quadratic motion generated, for b = 0.1 (training parameter) in the left subfigure and b = 0.2 in the right subfigure.

The third test compares the convergence frequency of LBT and KBT under noisy observations. A randomly selected subset of the observation vector z(I_k, W, p_{k−1}) is replaced by random values drawn from a uniform distribution between 0 and 255 (the gray-level range). Figure 4 shows the convergence rate of LBT and KBT as a function of the percentage of noisy features.

The algorithms have then been compared on three sets of real data. For the first set (see fig. 5), the two learning based methods are also compared with ESM. The figure shows the output of the three algorithms for key images of the video; the selected toy video exhibits strong rotations and scale variations, and the pattern to be tracked is selected in the first image. The second set is composed of five other toy sequences, with several patterns, motions and illumination conditions. Figure 6 shows the results of the tracking process for 4 sampled images of each sequence. The ”Box” sequence is quite simple, with no specularity, slow motion and scale variation. The ”Crisp 1” and ”Crisp 2” sequences contain strong rotations with blur and specularities. The ”Lipton 1” and ”Lipton 2” sequences exhibit large scale variations; moreover, for ”Lipton 2”, the learning step is achieved using a low resolution pattern. For each sequence, we measure the number of frames successfully tracked by each method before divergence of the algorithm; Table 1 summarizes the resulting rates. The third set is composed of four real video sequences extracted from the PETS (Performance Evaluation of Tracking and Surveillance) dataset, which provides traffic videos from a camera embedded in a vehicle; a related application is collision avoidance or automatic cruise control using vision based rear-car tracking. Figure 7 shows the results of the tracking process for 4 sampled images of each sequence, and Table 2 summarizes the computed rates.

4.3 Discussion

The accuracy test (fig. 2) shows that for small training amplitudes (b = 0.1 and b = 0.2), the translation estimation is correct for values within the training area (lower than the training amplitude b) and the accuracy is similar for LBT and KBT. For large translations, the error increases for both the LBT and the KBT method. For a training amplitude b = 0.3, LBT fails while KBT still provides a correct translation estimate.


Seq. name (#frames)    LBT: frames tracked (%)    KBT: frames tracked (%)
Box (#198)             191 (96%)                  198 (100%)
Crisp 1 (#200)         15 (7%)                    200 (100%)
Crisp 2 (#249)         165 (66%)                  249 (100%)
Lipton 1 (#144)        17 (12%)                   144 (100%)
Lipton 2 (#341)        107 (31%)                  267 (78%)

Table 1: Convergence frequency of LBT and KBT for five toy sequences.

Seq. name (#frames)    LBT: frames tracked (%)    KBT: frames tracked (%)
Vehicle 1 (#495)       60 (12%)                   495 (100%)
Vehicle 2 (#458)       57 (12%)                   198 (43%)
Vehicle 3 (#765)       0 (0%)                     180 (23%)
Vehicle 4 (#97)        7 (7%)                     97 (100%)

Table 2: Convergence frequency of LBT and KBT for four sequences related to rear-car tracking.

The first order approximation used in the LBT learning step is only correct for small motions. Since KBT is a non-linear model, it is possible to learn the non-linear regression function for large motions (up to b = 0.4).

The convergence rate test presented in fig. 3 shows that KBT has higher convergence rates than the LBT algorithm. The convergence rate of LBT decreases because the first order approximation made in the method is only correct for small motions. These results are linked to the accuracy test: since KBT can learn larger motions than LBT, its convergence area is larger, which results in a larger convergence basin for KBT.

Fig. 4 shows that the convergence rate of LBT falls to 50% with only 0.5% of noisy features, whereas KBT still converges in 50% of the cases with 4.4% of noisy features. The reason is that when a noisy observation occurs, the vector δz contains high values. For LBT, δz is directly multiplied with the interaction matrix and generates large displacements. For KBT, δz enters the kernel functions and the resulting responses are small, so the estimated motion is only slightly modified by the noisy observation. This is an important property of KBT: if the features are far from those of the learning dataset, the estimated motion is near zero.

Both the accuracy and the convergence tests show that KBT performs well compared to LBT. In order to illustrate these results on real data, six video sequences have been chosen under different conditions. Table 1 reports the number of frames successfully tracked by each method before divergence: the KBT algorithm gives a higher rate of successfully tracked frames than LBT.

The last experiment illustrates the performance of the method for a rear-vehicle tracking application. Figure 7 shows the results of the tracking process for 4 sampled images of each sequence, and Table 2 gives the number of frames successfully tracked by each method before divergence. For all the sequences, the success rate of the KBT method is higher than that of LBT.

5 Conclusion

We have proposed an extension of the linear planar pattern tracking algorithm to non-linear models. The learning based framework provides an efficient way to build a regression function between observation and motion. Moreover, an empirical rule is proposed to estimate the parameter of the kernel function in order to obtain an informative design matrix. The method has been implemented using simple gray-level features, and experiments have been performed to compare it with the linear algorithm in terms of accuracy and convergence frequency. For small learning magnitudes, the accuracy of LBT and KBT is similar; moreover, the convergence basin is larger for KBT than for LBT. However, further tests on a large number of patterns are needed to establish this firmly. The algorithm has also been tested for realtime rear-car tracking, a necessary subtask for applications like vision based automatic cruise control or vision based collision avoidance. Results indicate that KBT has a better convergence frequency than LBT on the sampled video sequences. Future work will aim to improve robustness against noisy data caused by partial occlusion or specularities. Since the method runs at 6 ms per image, more complex kernel functions (e.g. robust kernels) may be used.

References

[1] S. Avidan. Support vector tracking. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Hawaii, December 2001.
[2] S. Baker and I. Matthews. Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3):221–255, 2004.
[3] S. Benhimane and E. Malis. Real-time image-based tracking of planes using efficient second-order minimization. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Japan, October 2004.
[4] T.F. Cootes, G.J. Edwards, and C.J. Taylor. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681–685, 2001.
[5] C. Dehais, M. Douze, V. Charvillat, and G. Morin. Augmented reality through real-time tracking of video sequences using a panoramic view. In International Conference on Pattern Recognition, 2004.
[6] F. Fleuret, J. Berclaz, R. Lengagne, and P. Fua. Multi-camera people tracking with a probabilistic occupancy map. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007.
[7] V. Gay-Bellile, A. Bartoli, and P. Sayd. Feature-driven non-rigid image registration. In British Machine Vision Conference (BMVC), Warwick, United Kingdom, September 2007.
[8] G.D. Hager and P.N. Belhumeur. Efficient region tracking with parametric models of geometry and illumination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(10):1025–1039, October 1998.
[9] R.I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, second edition, 2004.
[10] F. Jurie and M. Dhome. Real time template matching. In International Conference on Computer Vision, pages 544–549, Vancouver, Canada, July 2001.
[11] T. Chateau, F. Jurie, M. Dhome, and X. Clady. Real-time tracking using wavelets representation. In Symposium for Pattern Recognition (DAGM'02), pages 523–530, Zurich, September 2002. Springer.
[12] A. Thayananthan, R. Navaratnam, B. Stenger, P. Torr, and R. Cipolla. Multivariate relevance vector machines for tracking. In European Conference on Computer Vision (ECCV), 2006.
[13] M.E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211–244, 2001.
[14] V.N. Vapnik. Statistical Learning Theory. John Wiley and Sons, 1998.
[15] O. Williams, A. Blake, and R. Cipolla. A sparse probabilistic learning algorithm for real-time tracking. In International Conference on Computer Vision (ICCV), pages 353–361, Nice, France, 2003.
[16] Q. Yu, G. Medioni, and I. Cohen. Multiple target tracking using spatio-temporal Markov chain Monte Carlo data association. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.

[Figure 2 appears here: four plots (b = 0.1, b = 0.2, b = 0.3, b = 0.4) of the translation error (pixels) of LBT and KBT against the applied translation (% of the pattern width), comparing the translated pattern with the reference pattern; the training area is marked on each plot.]

Figure 2: Comparison of the LBT and the KBT in terms of accuracy, against horizontal displacement magnitude, for four different values of the training parameter b.

[Figure 3 appears here: two plots of convergence rate against motion amplitude, for training parameters b = 0.1 (left) and b = 0.2 (right), with curves for LBT and KBT.]

Figure 3: Comparison of the LBT and the KBT method in terms of convergence frequency, against the displacement magnitude.

[Figure 4 appears here: convergence rate of LBT and KBT against the percentage of noisy features (0%, 0.5%, 1.3%, 2.2%, 4.4%, 9%, 18%).]

Figure 4: Comparison of the LBT based tracker and the KBT method against noisy observations.

[Figure 5 appears here: key frames of a real video sequence with the tracked pattern superimposed; LBT in the left column, KBT in the middle column, ESM in the right column.]

Figure 5: Comparison of three tracking algorithms on a real video sequence: LBT (left column), KBT (middle column) and ESM (right column). Default parameters have been used for ESM. Moreover, 1000 samples have been generated for the training step of LBT and KBT, and 100 basis functions have been used for KBT.

[Figure 6 appears here: four sampled frames of each toy sequence (Box, Crisp 1, Crisp 2, Lipton 1, Lipton 2, with zoomed insets), the tracked region marked as OK or ERROR for LBT and KBT.]

Figure 6: Comparison of the LBT and the KBT for the five toy sequences.

[Figure 7 appears here: four sampled frames of each rear-car sequence (Vehicle 1 to Vehicle 4, with zoomed insets), the tracked region marked as OK or ERROR for LBT and KBT.]

Figure 7: Comparison of the LBT and the KBT for the four sequences related to rear-car tracking.