A Cooperative Distributed Vision Algorithm for Wide Area Vehicle Tracking

Cina MOTAMED, Olivier WALLART
Laboratoire LASL, Université du Littoral Côte d'Opale, Bat 2, 50 Rue F. Buisson, 62228 Calais, France.
[email protected]

Abstract

In this paper we address the design of a distributed vision system for vehicle surveillance on a freeway. The tracking is performed by re-identification of objects perceived by fixed distant cameras without overlap of their fields of view. The cross-camera tracking of objects is performed by a cooperative distributed architecture based on a multi-agent society framework. The data association is distributed over Local Vision Systems (LVS) and is controlled by a temporal fusion scheme. Decisions combine temporal and visual information: the object's 3D dimensions and normalised colour histogram constitute the visual information. Temporal constraints, based on the acceleration between nodes, are updated continuously with respect to the observed traffic behaviour. These constraints maintain a dynamic lifespan for all tracked objects. Management of uncertainty represents an important component of the system. Statistical algorithms are used for low-level decisions, and a possibilistic approach controls the data association stage.

1. Introduction

The development of distributed vision systems for monitoring or surveillance of sites is an interesting field of investigation. Indeed, motivations are multiple and concern domains as diverse as monitoring of protected sites, control and estimation of flows (freeway, airport, port), and continuous coverage of large battlefield areas. Because of the rapid evolution of data processing and network communications, such applications have become possible. For several years, distributed multi-camera systems have been developed for wide area traffic monitoring. A first configuration consists in completely covering the scene with a set of cameras with adjacent Fields Of View (FOV) [4]. Much of the computer vision community's work on this configuration has concerned its geometrical aspects: by using several overlapping calibrated cameras, the system generates a global virtual view of the scene. The use of multiple views in the tracking process provides the ability to resolve some occlusion situations. A second configuration, less explored, is based on a network of non-overlapping cameras. This second configuration is economically attractive because it substantially reduces the number of sensors. The system has to recognise objects from one sensor to another (cross-camera tracking). However, the incomplete coverage makes the tracking problem more difficult; the main difficulty is the establishment of correspondences between objects captured by multiple sensors. We have focused our work on this second configuration. Our primary objective is the development of a robust architecture for surveillance applications using a set of interconnected monocular vision systems without overlap of their fields of view. The principal task of the proposed perception system then becomes the cross-camera tracking of specific objects during their movement. Such a capability is interesting for several security tasks:
• Global tracking of critical convoys: military, bank transfers, prisoner transfers, dangerous chemical containers.
• Detection and tracking of dangerous drivers; tracking of thieves, terrorists …
Section 2 presents the cross-camera tracking problem and the associated research. In Section 3, we present the global strategy of our distributed tracking system. Section 4 presents the image processing stage for object detection and visual feature extraction. Section 5 explains the generation of temporal constraints. Section 6 presents the data association stage. In Section 7, some experimental results are discussed.

2. The problem of cross-camera tracking

The cross-camera tracking task can be considered close to conventional target tracking (radar or vision), because the trajectories of objects are constructed from observations. The main difference is that conventional tracking algorithms based on state filtering obtain observations synchronously in a continuous space. In cross-camera tracking, objects move between discrete locations and can disappear for a long time. Cross-camera tracking over a wide outdoor traffic scene raises several difficulties:
• There are numerous cases where objects look similar.
• Objects' behaviours are unpredictable or only coarsely defined.
• Variations of outdoor illumination, sensor response and object orientation induce changes in the visual appearance of objects.
• Occlusion situations induce some loss of visibility of the objects.

Under such hypotheses, re-identification for cross-camera object tracking becomes a complex task. It has to support the incompleteness and uncertainty of the available information. The design of an effective surveillance system requires the efficient management of the uncertainty of decisions. In order to make re-identification easier, in addition to the object's signature, the system has to integrate the ability to constrain the object motion in blind zones. This prediction mechanism is the core of all online tracking systems; it permits focusing attention on interesting candidates in front of a set of sensors. A vehicle re-identification system for consecutive detectors on a freeway has been developed using magnetic dual loop detectors [1]. A vehicle measurement made at a downstream detector station is matched with the vehicle's corresponding measurement at an upstream station. The data association algorithm is based on the temporal ordering of the arrival of objects in front of the downstream sensors, completed by the vehicle's length signature. Other signatures, from scanning lasers or camera-based systems, have also been developed for re-identification [2]. In the work of Huang and Russell [5], prediction takes the form of appearance probabilities: they define how an object observed at some location in the past can be expected to appear at other locations in the future. Huang [5] and Kettnaker [6] have addressed the problem of data association in cross-camera tracking in the context of the Artificial Intelligence and Computer Vision communities, respectively. Huang's algorithm has been tested in freeway environments, and Kettnaker's algorithm has been designed for human re-identification inside an office. Both algorithms use the Bayesian formalism and attempt to track all detected objects over a network of cameras. The data association stage is cast as a weighted assignment problem. This approach seems to be robust. However, the major drawback of these algorithms is that they work with a centralized organisation: in a configuration containing distant sensors, the difficulty lies in transmitting voluminous information to a central node. The previous algorithms are not explicitly designed for online surveillance. The main applications of previous re-identification systems are link travel time estimation between locations and origin-destination counting, except Kettnaker's system, which is designed for human activity monitoring. The specificities of our work concern several points. The principal objective is the design of a modular and distributed system for online surveillance, which can tolerate typical wide area network bandwidth constraints. The tracking algorithm has to focus its attention on a specific target or group of targets and has to track them securely, without confusion. The algorithm is goal-oriented, so it is not obliged to track all objects. On the other hand, it is important to manage explicitly the uncertainty of the object's identity. In other words, the system has to notify situations where the risk of errors becomes important. This capacity is essential for a safety device and is known as "positive security". In our system, high-level uncertainties are represented and managed by the Possibility Theory. This formal approach, introduced by Zadeh [11], is based on the notion of fuzzy sets. A classic probabilistic approach could also be used, but we think that the possibilistic approach is more flexible when the system has to deal with coarse or qualitative models and incomplete information. In common with the Dempster-Shafer approach, it can efficiently represent the notion of ignorance. In this theory, each fact is associated with a degree of possibility and a degree of necessity. The necessity expresses the certainty of a fact by taking into account all eventualities. A high degree of possibility means that the fact is fully possible. If the degree of necessity is low with a high degree of possibility, it means that the system knows nothing about the fact. In addition, in the framework of the Possibility Theory, the choice of combination operators is very wide, spanning from disjunctive to conjunctive behaviour.
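To make these notions concrete, the following minimal Python sketch (our illustration, not the authors' code) computes possibility and necessity from a possibility distribution over competing hypotheses:

```python
def possibility(h, dist):
    """Degree of possibility of hypothesis h (dist maps hypotheses to [0, 1])."""
    return dist[h]

def necessity(h, dist):
    """Necessity of h: certainty once all alternatives are accounted for."""
    alternatives = [p for k, p in dist.items() if k != h]
    return 1.0 - max(alternatives) if alternatives else 1.0

# Fully possible but not necessary: the system "knows nothing" about H1.
print(possibility("H1", {"H1": 1.0, "H2": 1.0}))  # 1.0
print(necessity("H1", {"H1": 1.0, "H2": 1.0}))    # 0.0

# Fully possible and highly necessary: no ambiguity about H1.
print(necessity("H1", {"H1": 1.0, "H2": 0.1}))    # 0.9
```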

3. Strategy of distributed cross-camera tracking

Under a surveillance objective, the system has to react in real time and manage its distributed sensor resources efficiently. In particular, in a multi-camera organisation, raw image sequences cannot easily be sent over the network and, in addition, the central unit generally does not have enough capacity to process all the information alone, simultaneously. The distributed organisation is suited to reduced network bandwidth: all low-level information is processed locally and only high-level information is transmitted over the network. The decentralised architecture has, in addition, many other advantages; among the principal ones are modularity and survivability: the system can easily extend its coverage by adding new sensors, and it can tolerate some local sensor failures. The proposed algorithm integrates a cooperative and decentralized strategy. It is composed of a network of Local Vision Systems (LVS). In addition, the system contains a central node, called the supervisor, whose role is to define and transmit the global surveillance task and then to receive intermediate high-level tracking decisions from the LVSs. Each local vision system focuses on the re-identification of specific objects within its FOV. The cooperative architecture is formulated as a multi-agent society framework. The multi-agent strategy is an interesting approach when the distributed problem can be represented and solved with a group of cooperative intelligent entities [7]. From a communication point of view, we consider that the distance between each LVS and the supervisor is large, so they have to use a reduced bandwidth (1 Mbps). Each LVS has the capacity to communicate with its neighbour LVSs; because of their proximity, communication between neighbours is possible at a higher speed (100 Mbps). Once an object is detected by an LVS, a first-level society is generated in order to await its future reappearance over the network (Figure 1a). The society contains experts positioned on the neighbour LVSs likely to perceive the mobile object (the awaited object). The manager of the society is located on the LVS that has detected the object. The waiting procedure is activated during a time delay controlled by the temporal constraints. Under this organisation, the system works with a strategy of prediction–verification. The manager sends, to each expert of the society, a message containing the visual and temporal constraints associated with the appearance of the awaited object (Figure 1b). These messages permit the creation of temporary tracks between the manager and each expert. When an expert Ei re-identifies the awaited object, it sends a message to the manager of the society and confirms the current track of the object (Figures 1c, 1d, 1e). The initial society is then destroyed, and the previous expert E1 becomes a new manager, creating a new society in order to follow the track. At each re-identification, before the destruction of the society, the manager sends the supervisor a report containing the association decision.
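A schematic sketch of this prediction–verification exchange follows; the class and field names below are our own illustration of the messages, not an API defined in the paper:

```python
from dataclasses import dataclass

@dataclass
class AwaitedObject:
    """Message content sent by a society manager to each expert LVS
    (illustrative fields, not the paper's wire format)."""
    object_id: str
    visual_signature: dict   # 3D dimensions and normalised colour histogram
    dop: tuple               # parameters of the fuzzy temporal interval (DOP)

@dataclass
class Society:
    """First-level society: a manager LVS plus the expert LVSs likely to
    perceive the awaited object next."""
    manager: str
    experts: list

    def dispatch(self, awaited: AwaitedObject, send):
        # The manager opens a temporary track towards each expert by sending
        # the visual and temporal constraints of the awaited object.
        for expert in self.experts:
            send(expert, awaited)

    def on_reidentification(self, expert: str):
        # The confirming expert becomes the manager of a new society; the
        # current society is destroyed and a report goes to the supervisor.
        return Society(manager=expert, experts=[])
```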

Figure 1: Organisation of society for cooperative tracking

4. Local image analysis module

The goal of the image analysis module is to extract vehicle visual features within the field of view of each LVS. Features are region, speed, dimension and colour. These measurements are used to estimate the visual compatibility between detected objects. The vehicle's speed is used for object prediction in order to generate temporal constraints. The extraction of the object is performed by motion detection. For fixed cameras, the standard motion detection approach is to model the stationary background. In this case, the moving objects representing the foreground are easily extracted by a simple difference between the current image and the background. In an outdoor context, the background updating process is an important component of the detection algorithm. It allows the toleration of slow variations of illumination.

R^{k+1}(p) = a·R^k(p) + (1−a)·I^k(p)   (1)

R and I represent respectively the background and the current image at frame k. The parameter a controls the updating process, which has to be slower than the object flow. In order to tolerate slow traffic, we have developed a selective reference-updating algorithm [8], which takes into account object motion at the pixel level. The strategy consists in enabling the updating of each pixel once a stability of observations is established throughout a defined time interval. Stability means that no motion is detected over the pixel. The stability indicator accumulates the history of the pixel behaviour. This indicator S is based on the inter-frame difference; its result is similar to the Motion History Image used in gesture recognition. At each pixel inter-frame motion detection, the indicator S drops to its minimal value of zero. After each non-detection, the indicator S is increased recursively until the value 1 is reached. The elementary step value θ of the indicator is defined with respect to a defined time delay, and so depends directly on the working temporal resolution. This time delay, also called the integration delay, is tuned from contextual information (T = 2 s). The updating is then activated only once the stability indicator reaches the value 1. This prevents updating the recent parts of the background where objects have induced some apparent motion. The following algorithm updates the stability indicator S for each pixel P:

if max_{c=R,G,B} |I_c^k(P) − I_c^{k−1}(P)| < ω  →  S^k(P) = min(S^{k−1}(P) + θ, 1)
else  →  S^k(P) = 0   (2)

The motion detection decision is then taken against the background:

if max_{c=R,G,B} |I_c^k(P) − R_c^k(P)| > ω  →  D^k(P) = 1
else  →  D^k(P) = 0   (3)

D^k represents the detection decision (1: moving object, 0: background). The threshold ω for the detection is the same as the one used for updating the stability indicator; ω is equal to twice the median absolute deviation of the image noise, computed over a preliminary training sequence. The motion detection result generally presents many artefacts, so two kinds of cleaning procedures are applied. First, we use standard morphological operations of erosion and dilation to reduce foreground noise. Then, regions too small to be of interest are removed. Typically, we consider that a vehicle is observed by the LVS during a short sequence of at least 1 second (10 frames). The calibration of the scene is obtained from a rectangular grid placed on the road image by the analyst; it permits the removal of perspective distortions. This procedure is specific to straight roads. The speed of the object is obtained by tracking, which associates the centre of gravity of each detected region throughout the observation sequence. The dimensions of the vehicle are deduced by fitting the projection of an oriented parallelepiped to the detected region. The parallelepiped is oriented along the road direction and verifies the vanishing point from the calibration grid. Measured speed and dimensional information integrate some errors due to motion segmentation artefacts and the calibration procedure. Measurements are approximated by a normal distribution, represented by mean and variance values.
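A compact NumPy sketch of equations (1)–(3), under our own variable names, is given below; the values of a and ω are placeholders, while θ = 0.05 corresponds to the integration delay T = 2 s at 10 frames/s:

```python
import numpy as np

def process_frame(R, I_prev, I_curr, S, a=0.95, theta=0.05, omega=10.0):
    """One frame of selective background updating and motion detection.

    R: background (H x W x 3), I_prev/I_curr: previous/current RGB frames,
    S: per-pixel stability indicator in [0, 1]. a and omega are placeholder
    values to be tuned as described in the text.
    """
    I_prev, I_curr, R = (x.astype(float) for x in (I_prev, I_curr, R))

    # Eq. (2): stability from the inter-frame difference (max over channels).
    interframe = np.abs(I_curr - I_prev).max(axis=2)
    S = np.where(interframe < omega, np.minimum(S + theta, 1.0), 0.0)

    # Eq. (3): detection against the background (max over channels).
    D = (np.abs(I_curr - R).max(axis=2) > omega).astype(np.uint8)

    # Eq. (1): update the background only where the pixel has been stable.
    stable = (S >= 1.0)[..., None]
    R = np.where(stable, a * R + (1 - a) * I_curr, R)
    return R, S, D
```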

The final image feature is the RGB colour histogram of the pixels inside the detected blobs. The colour histogram is an efficient signature for object recognition, but in the context of outdoor multi-camera colour comparison, this information has to be normalised [9]. The "grey world" normalisation is used independently on each channel; it permits the removal of part of the illumination artefacts. The histogram is reduced to 32 bins per colour channel in order to reduce the information quantity. The visual compatibility between an observation and an awaited object combines distances between measured features (length, width, height and colour). In order to take measurement uncertainty into account, for each feature n and each pair of association candidates, the Mahalanobis distance d_n is computed. For the colour histogram, this distance is the mean of the Mahalanobis distances computed for each bin independently. All variance values are estimated offline on a training sequence and remain fixed. The global visual compatibility C_v between objects A and B is defined by a conjunctive operator:

C_v(A, B) = ∏_n e^{−d_n(A,B)}   (4)
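A sketch of this compatibility measure follows (our variable names; the variances are the fixed, offline-estimated values mentioned above):

```python
import numpy as np

def visual_compatibility(a, b, var):
    """Eq. (4): Cv(A, B) = prod_n exp(-d_n(A, B)), with Mahalanobis distances.

    a, b: feature dicts with scalar 'length', 'width', 'height' and an array
    'histogram' (32 bins per channel); var: fixed offline-estimated variances
    (one per scalar feature, one per histogram bin). Names are illustrative.
    """
    cv = 1.0
    for n in ("length", "width", "height"):
        d_n = abs(a[n] - b[n]) / np.sqrt(var[n])   # 1-D Mahalanobis distance
        cv *= np.exp(-d_n)
    # Colour distance: mean of the per-bin Mahalanobis distances.
    d_col = np.mean(np.abs(np.asarray(a["histogram"]) - np.asarray(b["histogram"]))
                    / np.sqrt(np.asarray(var["histogram"])))
    return cv * np.exp(-d_col)
```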

5. Temporal constraints

Temporal constraints are represented in our system by a fuzzy temporal interval called DOP (Domain Occurrence Possibility), represented by a possibility distribution. A DOP(Oi, Sj, Sl) is the prediction generated by the sensor Sj, expressing the temporal appearance possibility of an awaited object Oi in the field of a specific sensor Sl. Each society manager has to generate these coarse temporal constraints for each expert of its society positioned on the neighbour LVSs. The DOP between two nodes is estimated from the kinematic equation:

d = v_0·t + A·t²/2  ⇒  t = (−v_0 + √(v_0² + 2·A·d)) / A,  or t = d / v_0 (if A = 0)   (5)

with
v_0 : the measured local speed of the object
A : the model of object acceleration between the two nodes
d : the distance between the two nodes

The computation of the parameter t has to integrate acceleration variability and measurement errors. For this, all variables are approximated by a normal distribution with respect to their mean and variance values. The distribution of the variable t, (m_t, σ_t), is computed by using specific arithmetic operators for normal distributions [3]. This procedure allows the propagation of the theoretical errors through the expression of t.
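The paper propagates the errors with dedicated arithmetic operators for normal distributions [3]; a Monte Carlo approximation (our sketch, with illustrative parameter values) yields the same (m_t, σ_t) estimate:

```python
import numpy as np

def travel_time_distribution(v0_mean, v0_var, a_mean, a_var, d, n=100_000):
    """Approximate the distribution (m_t, sigma_t) of the travel time t of
    equation (5) by sampling normally distributed speed and acceleration."""
    rng = np.random.default_rng(0)
    v0 = rng.normal(v0_mean, np.sqrt(v0_var), n)
    a = rng.normal(a_mean, np.sqrt(a_var), n)
    # Numerically stable form of eq. (5); it also covers the A = 0 case,
    # since 2d / (v0 + v0) = d / v0.
    t = 2 * d / (v0 + np.sqrt(np.maximum(v0**2 + 2 * a * d, 0.0)))
    t = t[(t > 0) & np.isfinite(t)]   # keep physically meaningful samples
    return t.mean(), t.std()

# Illustrative values: ~30 m/s measured speed, 2900 m between the two nodes.
m_t, sigma_t = travel_time_distribution(30.0, 1.0, 0.1, 0.05, 2900.0)
```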

The DOP is then built by integrating the parameters of the normal distribution of the variable t. A standard transformation from a normal distribution to a trapezoidal possibility distribution is applied: the trapezoidal model is centred at the instant m_t with a core of 2·σ_t and a base of 3·σ_t. After analysing real freeway scenes quantitatively, we have observed that a vehicle's acceleration depends mainly on its dimensions. Consequently, for freeway surveillance, we have defined four classes linked to vehicle dimension: motorbike, vehicle, van and truck. These classes are built with a fuzzy membership function over the external object dimensions. Membership functions are adjusted by a human analyst with respect to manufacturer information. Increasing the number of acceleration models introduces an additional classification step, but it permits the integration of better-adjusted prediction information. This classification may also be helpful for some of the supervisor's high-level queries; it permits, for example, focusing the surveillance on trucks only. A human analyst also coarsely initialises a generic acceleration model for each class, using contextual knowledge. These models are deliberately imprecise, in order to take into account the variability of each class; they essentially represent physical limitations of the class behaviour between two nodes. After their initialisation, these models are adjusted and then updated continuously by taking into account the regularly observed vehicles of the different classes. This is performed by measuring the time delay of each vehicle re-identification. This updating of the acceleration model is managed locally between LVSs, without consulting the global supervisor. The updating of the acceleration of each class is controlled by two parameters α and β; in our experimentation we have used α = 0.2 and β = 0.05. This permits rapid adaptation of the mean of the acceleration and, slowly, of its variance. The parameter σ_s is included to preserve a minimal dispersion for each acceleration model.

m_a^{t+1} = (1−α)·m_a^t + α·m_o^t   (6)
σ_a^{t+1} = max((1−β)·σ_a^t + β·σ_o^t, σ_s)   (7)

with
m_a^{t+1}, σ_a^{t+1} : acceleration model
m_o^t, σ_o^t : mean and variance of the last observations of the class
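Below is a sketch of the resulting trapezoidal DOP and of the acceleration model update of equations (6) and (7); function and parameter names are ours, and the floor value s_min is arbitrary:

```python
def dop_possibility(t, m_t, sigma_t):
    """Trapezoidal DOP: possibility 1 on the core of width 2*sigma_t centred
    at m_t, falling linearly to 0 at the edges of the base of width 3*sigma_t."""
    half_core, half_base = sigma_t, 1.5 * sigma_t
    dt = abs(t - m_t)
    if dt <= half_core:
        return 1.0
    if dt >= half_base:
        return 0.0
    return (half_base - dt) / (half_base - half_core)

def update_acceleration_model(m_a, s_a, m_obs, s_obs,
                              alpha=0.2, beta=0.05, s_min=0.05):
    """Eqs. (6)-(7): fast adaptation of the mean, slow adaptation of the
    dispersion, with the floor s_min preserving a minimal dispersion."""
    m_a = (1 - alpha) * m_a + alpha * m_obs
    s_a = max((1 - beta) * s_a + beta * s_obs, s_min)
    return m_a, s_a
```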

6. Data association strategy

In a complex multi-sensor network, there is no effective bijection between awaited objects and observed objects. The system has to take into account the existing entrances and exits and has to deal with non-detected and lost objects. The two principal strategies of data association in tracking are the instantaneous approach and the deferred one. A completely deferred algorithm is naturally not adapted to online surveillance. The instantaneous approach also has limitations with asynchronous observations: in some ambiguous situations, the system may not yet have received all the interesting information. An intermediate and efficient approach is the temporal fusion strategy, which tries to improve the quality of decisions dynamically as more observations arrive. For a distributed multi-sensor architecture with no common field of view, the data fusion has to work mainly with the complementarity of observations over time and over the fields of the sensors. The global compatibility between two objects is obtained by the product of their visual compatibility and their temporal compatibility. The temporal compatibility value is the value of the DOP at the time of object detection. The strategy for declaring a new object is chosen close to the Open World assumption of the Dempster-Shafer Theory of Evidence: when the observation is not compatible with any of the awaited objects, the compatibility of the new object is favoured.

∀ Oi ∈ Φ, C_j(*, obs) = 1 − max_{Oi∈Φ} C_j(Oi, obs)   (8)

with
Φ : set of awaited objects
C_j(Oi, obs) : compatibility of the observation with object Oi at sensor j

The track termination of an object is decided when an awaited object is not detected by its current society during its lifespan, controlled by its DOPs. When several objects are tracked simultaneously, multiple societies have to work together. For each observation, within the LVS, the measures of compatibility with all awaited objects represent a local distribution of preference. The validation of an association H_i by an expert is performed by computing the possibility P(H_i) and the necessity N(H_i) of the association. The possibility distribution is built by normalising the distribution of preference into the segment [0, 1]. The necessity of a hypothesis translates the notion of uniqueness of the hypothesis with respect to the other alternatives. If the necessity is high, no ambiguity is present; in this situation the expert can validate the association locally within the LVS. When the necessity of the association is low, an ambiguity is declared.

N(H_i) = 1 − P(H̄_i)   (9)

with
H̄_i = Φ* − {H_i}
Φ* = Φ ∪ {*}
P(H̄_i) = max_{H_j ∈ H̄_i} P(H_j)
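A minimal sketch of the local association decision, combining equations (8) and (9), is given below; the validation threshold on the necessity is our own illustrative choice:

```python
def association_decision(compat, threshold=0.3):
    """Possibilistic validation of an association (eqs. (8)-(9); names ours).

    compat: dict mapping each awaited object to its global compatibility with
    the current observation. The 'new object' hypothesis '*' receives
    1 - max of the other compatibilities (open-world behaviour)."""
    compat = dict(compat)
    compat["*"] = 1.0 - max(compat.values(), default=0.0)
    # Normalise preferences into a possibility distribution (max = 1).
    top = max(compat.values())
    poss = {h: c / top for h, c in compat.items()} if top > 0 else compat
    best = max(poss, key=poss.get)
    alternatives = [p for h, p in poss.items() if h != best]
    necessity = 1.0 - max(alternatives, default=0.0)
    if necessity > threshold:
        return best     # unambiguous: validate locally within the LVS
    return None         # ambiguity: defer to a second-level society

# Example: a clearly matching awaited object O1.
print(association_decision({"O1": 0.9, "O2": 0.2}))   # 'O1'
```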

We present in Figure 2 two situations of ambiguity appearing during the association of two visually compatible objects. DOP1 and DOP2 represent the predictions of objects O1 and O2, respectively, on the downstream sensor. Obs1 and Obs2 are the observations of O1 and O2 detected by the downstream sensor. In the first scenario, the first observation cannot, with respect to the predictions, be associated to a unique object; the second observation can remove the ambiguity through a temporal fusion strategy. In the second scenario, the two observations are detected within the intersection of the two predictions. The system decides that it cannot make the association. However, the system reacts quickly, as soon as it has detected the two observations, without waiting for the end of the awaited objects' lifespans.

Figure 2: Scenarios of ambiguity with two similar objects

When a first-level expert detects an ambiguity involving two or more objects, a second-level society is generated by integrating all the concerned first-level societies containing the objects of the ambiguity. The expert that has detected the ambiguity manages this new society. The objective of the second-level society is to carry out a temporal fusion procedure. In order to compare all hypotheses efficiently, the temporal fusion uses an MHT (Multiple Hypothesis Tracking) tree [10]. When a second-level society is activated, the attached first-level societies stop their own decisions and wait for the decisions of the temporal fusion. The MHT tree describes all feasible hypotheses resulting from the observations obtained by the set of LVSs concerned by the second-level society. An MHT hypothesis integrates a set of associations with their global degrees of compatibility. The data association decision is then deferred as long as the confidence level of one of the MHT hypotheses is not significant enough compared to the other ones. At each observation, the tree is updated. For each MHT hypothesis, a possibility and a necessity measurement are estimated. The possibility measurement of an MHT hypothesis is built as the product of the compatibility scores of its associations. A pruning procedure removes an MHT hypothesis when its possibility is low (< 0.1). The MHT tree is stopped when the relative necessity measurement of an MHT hypothesis is sufficient (> 0.3). Unfortunately, there are three distinct situations where the temporal fusion procedure cannot resolve the ambiguity:
- The first case is when all observations have been detected but the association remains unresolved (low necessity). The temporal fusion stops rapidly (scenario 2).
- The second case is when a part of the awaited observations are not detected by the end of the predicted DOPs. The temporal fusion is stopped at the end of the DOPs.
- The third case is when the number of awaited objects interacting in the MHT tree becomes too large (more than 5 objects). The temporal fusion stops prematurely in order to avoid the explosion of the MHT tree.
Decisions of success or failure obtained by the temporal fusion are sent to the managers of all primary societies. Then, in their turn, they forward their final decisions to the supervisor. Complementary to the association decisions from the first-level societies, the network supervisor can also resolve, at a higher level, some specific ambiguities. The principal reason is that the local vision systems assume that the predictions are correct; errors appear when an unexpected event slows down or speeds up an object. In such situations, the supervisor can focus its attention on newly detected objects and try to associate them coherently to lost objects. Figure 3 illustrates a pedagogical example where the temporal fusion strategy, based on an MHT tree, efficiently resolves an ambiguity associated with a scenario of type 1. The ambiguity concerns two visually similar vehicles (O1, O2).

Figure 3: Resolution of scenario of type 1

The detection of the first observation (Obs1) generates an ambiguity on the LVS associated to camera C1. The second observation (Obs2), from camera C2, permits the elimination of the ambiguity (Table 1). The MHT hypothesis 2.1 is accepted: Obs1 is associated to O1 and Obs2 to O2.

MHT Hyp. | Observation 1, camera C1 (compat.) | Observation 2, camera C2 (compat.) | Possibility | Necessity
1.1 | O1 (1.0) | - | 1.0 | 0.1
1.2 | O2 (0.9) | - | 0.9 | 0.0
2.1 | O1 (1.0) | O2 (1.0) | 1.0 | 0.73
2.2 | O2 (0.9) | O1 (0.3) | 0.27 | 0.0

Table 1: MHT tree with possibility and necessity of hypotheses, first example

Type | Percentage
Upstream detected vehicles | 100%
Downstream detected vehicles | 99%
Lost objects | 1%
Correct associations | 79%
Association errors | 3%
Maintained ambiguities | 15%
False new objects | 2%
Verification of temporal constraint by observation | 95%

Table 2: Algorithm results in the first configuration.
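The following toy Python sketch (our reconstruction, with illustrative data structures) reproduces the Table 1 computation: hypotheses are extended at each observation, pruned when their possibility falls below 0.1, and a decision is taken once the relative necessity of the best hypothesis exceeds 0.3:

```python
def extend_hypotheses(hyps, obs_compat, prune=0.1):
    """Extend every MHT hypothesis with each feasible association of a new
    observation. hyps: list of (assignments, possibility) pairs; possibility
    is the product of the association compatibilities."""
    new_hyps = []
    for assignments, poss in hyps:
        for obj, c in obs_compat.items():
            if obj in assignments.values():
                continue                  # an object explains one observation
            p = poss * c
            if p >= prune:                # pruning of weak hypotheses
                new_hyps.append(({**assignments, len(assignments): obj}, p))
    return new_hyps

def try_decide(hyps, stop=0.3):
    """Accept the best hypothesis once its relative necessity exceeds 'stop'."""
    top = max(p for _, p in hyps)
    ranked = sorted(((a, p / top) for a, p in hyps),
                    key=lambda h: h[1], reverse=True)
    necessity = 1.0 - (ranked[1][1] if len(ranked) > 1 else 0.0)
    return ranked[0][0] if necessity > stop else None

# The scenario of Table 1: Obs1 is ambiguous, Obs2 settles the association.
hyps = extend_hypotheses([({}, 1.0)], {"O1": 1.0, "O2": 0.9})
print(try_decide(hyps))   # None: necessity 0.1, ambiguity maintained
hyps = extend_hypotheses(hyps, {"O1": 0.3, "O2": 1.0})
print(try_decide(hyps))   # {0: 'O1', 1: 'O2'}: hypothesis 2.1, necessity 0.73
```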

7. Experimentation

We have tested our algorithm on real and simulated sequences. In order to measure the performance of the system with sufficient samples, the system is given the task of tracking all objects. Real sequences were obtained by recording synchronised image sequences from two distant cameras along a freeway over 45 minutes. The distance between the two cameras is approximately 2.9 km. The objective of this first experimentation is to test the visual feature extraction module and the temporal constraints under real conditions (1045 vehicles). The configuration of the observed scene has no intermediate entrance or exit. Three acceleration models are associated with three classes of vehicles: vehicle, van and truck. Figure 4 shows detection results and the model adjustment. Table 2 summarises some indicators of the data association decisions.

Figure 5: Sensor configuration

In order to evaluate the multi-sensor data association strategy in a more complex configuration, we have developed a traffic simulator. We present here the results of an experimentation using a configuration with four cameras (one upstream sensor and three downstream sensors, Figure 5). The traffic is composed of five visually different patterns. The traffic is randomly generated from the upstream sensor C1 in the direction of the three downstream sensors. Figure 6 shows, on the left, the global volume of the generated traffic. The average speed of vehicles measured by the upstream sensor decreases when the volume of traffic increases (Figure 6, right).

Figure 6: The volume of simulated traffic (left), measured speed from upstream sensor (right)

Figure 4: Vehicle detection, tracking, and model fitting.

The performance of the temporal fusion algorithm (TF) is compared with the standard Nearest Neighbour (NN) approach and evaluated in terms of data association errors (Figure 7). The NN algorithm decides the association of the observation with the best instantaneous candidate. When the traffic is below 20 vehicles per minute, the TF algorithm reacts correctly, without errors and with an acceptable non-decision rate (Figure 8). Beyond this limit, the performance of the TF algorithm decreases significantly. In fact, ambiguous situations increase, and the "positive security" strategy of the algorithm induces non-decision situations where the NN algorithm would commit association errors. An excessive increase of vehicle traffic favours scenarios of type 2, which are detected but not solved by the temporal fusion.

Figure 7: Association indicators of the temporal fusion and NN algorithms

Figure 8: Representation of the performance of the temporal fusion

8. Conclusion

In this paper, we have described a framework for vehicle tracking over a wide area including blind zones. The performance of the system is acceptable when the density of traffic is low or moderate. The core of the algorithm is composed of the distributed cooperative strategy and a temporal reasoning scheme that permits management of the lifespan of all decisions. The online learning of temporal constraints brings to the system a high degree of adaptability with respect to the global traffic behaviour. Statistical tools have permitted the representation of the measurement errors and their propagation into higher-level decisions. We have validated our distributed algorithm in a software environment by using recorded and synchronised image sequences.

9. Acknowledgements

This work is partially supported by the Nord-Pas de Calais Regional Council (AVIVA & RaVioLi projects).

10. References

[1] B. Coifman, "Vehicle re-identification and travel time measurement in real time on freeways using the existing loop detector infrastructure", Transportation Research Record 1643, Transportation Research Board, pages 181-191, 1998.
[2] B. Coifman, D. Beymer, P. McLauchlan, and J. Malik, "A real-time computer vision system for vehicle tracking and traffic surveillance", Transportation Research Part C, vol. 6, no. 4, pages 271-288, 1998.


[3] P. Courtney, N. A. Thacker, "Performance characterization in computer vision: The role of statistics in testing and design", in Imaging and Vision Systems: Theory, Assessment and Applications, Jacques Blanc-Talon and Dan Popescu (Eds.), NOVA Science Books, 2001.
[4] M. Forthofer, S. Bouzar, F. Lenoir, J. M. Blosseville, D. Aubert, "Automatic incident detection: wrong way vehicle detection using image processing", Proc. of the Third Annual World Congress on Intelligent Transportation Systems, 1996.
[5] T. Huang, S. Russell, "Object identification in a Bayesian context", in International Joint Conference on Artificial Intelligence, pages 1276-1282, 1997.
[6] V. Kettnaker and R. Zabih, "Bayesian multi-camera surveillance", in IEEE Conference on Computer Vision and Pattern Recognition, pages 253-259, June 1999.
[7] T. Matsuyama, "Cooperative distributed vision: dynamic integration of visual perception, camera action, and network communication", 4th International Workshop on Cooperative Distributed Vision, March 22-24, Kyoto, Japan, 2001.



[8] C. Motamed, "Motion detection and tracking using belief indicators for video surveillance applications", First IEEE Workshop on Performance Evaluation of Tracking and Surveillance, PETS'2000, Grenoble, March 2000.
[9] J. Orwell, P. Remagnino and G. A. Jones, "Multi-camera colour tracking", in IEEE Workshop on Visual Surveillance, IEEE Computer Society, pages 14-21, 1999.
[10] D. B. Reid, "An algorithm for tracking multiple targets", IEEE Transactions on Automatic Control, AC-24(6), pages 843-854, 1979.
[11] L. A. Zadeh, "Fuzzy sets as a basis for a theory of possibility", Fuzzy Sets and Systems, no. 1, pages 3-28, 1978.