
TRANSITION: A RELEVANT IMAGE FEATURE FOR FAST OBSTACLE DETECTION

C. Vestri*, R. Bendahan, F. Abad, S. Wybo, S. Bougnoux
IMRA Europe, 220, rue Albert-Caquot, 06904 Sophia-Antipolis, France
[email protected]

ABSTRACT

Currently, the automotive industry is actively seeking generic obstacle sensors based on monocular vision and able to run on low-frequency CPUs. We have tackled the challenge of designing a vision-based obstacle detection system using a common in-vehicle microcontroller: an 80 MHz 32-bit RISC CPU. The system uses a single, commercially available wide-angle rear camera. To ensure real-time obstacle detection, we propose a novel fast feature-point extractor applied along a 1D signal, which we name the transition. It is 100 times faster than Harris and relevant on man-made objects. We present feature extraction results and demonstrate that transitions can be used to detect various types of obstacles with an 80 MHz CPU.

KEYWORDS

Feature detector, Transition, SIFT, Edge points, Structure from motion, 3D reconstruction

INTRODUCTION

Toyota proposed for its Prius an Intelligent Parking Assist system (IPA) that automatically parks the car, and other parking systems are now commonly available. All those systems assist the parking maneuver, but none of them detects obstacles using cameras; some systems use sonars. There are many works on obstacle detection and reconstruction in the automotive field, such as [1-6], but all of them are designed to run on high-power CPUs. Our objective is to develop a vision-based obstacle detection system using a common in-vehicle microcontroller: an 80 MHz 32-bit RISC CPU. There are two main challenges. The first is to detect obstacles, especially at the epipole (note 1). The second is to achieve obstacle detection in real time on the targeted CPU: this low-speed CPU places strong processing-time constraints on the selected vision algorithms. In another paper [5] we defined a novel algorithm that ensures the success of both the implementation and the real-time application. In this paper we present a new image feature point, named the transition, which is a key element in the success of real-time, complete obstacle detection. After presenting our vehicle prototype, we motivate and detail the transition feature. We then compare it to classic feature-point detectors such as SIFT and show that it has good properties for each stage of our system: extraction, matching, and 3D point refinement. We finish by presenting detection results on various obstacles.

Note 1: The 3D reconstruction of obstacles is difficult near the epipole (a 2D location, generally inside the image). This difficulty comes from the geometry of monocular vision systems.
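
For reference, here is the standard multi-view-geometry fact behind this difficulty (it is not specific to this paper): if P and P' are the projection matrices of two camera poses and C is the homogeneous centre of the first camera (so that P C = 0), the epipole in the second image is

    e' = P' C .

When the camera translates along its optical axis, as in a straight backing maneuver, C projects near the image centre, so e' lies inside the image; image points near e' show almost no parallax between frames, and triangulating their 3D position from near-zero parallax is ill-conditioned.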

VEHICLE PROTOTYPE

The objective of our research is to provide a perception of the environment using only cameras: detect and identify the obstacles of the scene, understand their behavior, and perform risk assessment [6]. As a first application, we aim at detecting all kinds of obstacles during a backing maneuver using a single camera. The system assumes that objects are static, that the road and the vehicle displacement are flat, and that the vehicle speed is low (≈10 km/h).

Figure 1: Vehicle prototype. The left picture presents a global view of our test vehicle. The system acquires images and odometry; after processing, an interface presents the live image and the detected obstacle shape with its distance to our vehicle. The right pictures present the prototype display and board.

Our prototype vehicle is equipped with odometry sensors to gather information about the vehicle displacement (see Figure 1). A steering-wheel angle sensor provides additional data, and the reverse gear is detected to know the direction of movement. Finally, a wide field-of-view camera mounted at the back of the vehicle provides the images used to detect close obstacles. These sensors are linked to an automotive-grade computer equipped with an additional data-acquisition board and a frame grabber.

MOTIVATION

Our previous system was a classic structure-from-motion system [2]. It used a Kitchen and Rosenfeld [7] feature-point detector and a correlation-based matching algorithm. These two image-processing functions are expensive in processing time and cannot be used on the targeted low-speed CPU.

We then reviewed and analyzed feature-point detectors with the objective of finding a very fast detector for object reconstruction. An interesting review and evaluation of feature-point extractors was done by Schmid in [8]. At that time, the common point extractors used in computer vision were those of Moravec, Kanade-Lucas-Tomasi (KLT) [9], and Harris [10]. Later, SIFT was introduced by Lowe in [11] and became the most popular. More recently, other feature-point detectors have been introduced, such as SURF [12], CenSurE [13], and Star [14]. All of them have nice invariance properties with respect to viewpoint variation, but they have two main drawbacks related to our needs: they are time consuming and not adapted to the obstacle detection application. The first drawback is a processing time too large for our targeted CPU. Table 1 presents a comparison of processing times between various feature extractors. Timings have been averaged over about 1000 images (640×480) on a Dell Precision PWS670 dual-Xeon CPU at 3 GHz with 2 GB of RAM. Canny, Harris, and KLT processing times were obtained using the OpenCV library [15], SIFT using the Hess implementation [16], and Star using the Willow Garage code [14]. Our transition extractor has been designed for the specific obstacle detection system described in [5]. It is not applied directly to the image but to scanlines (epipolar lines) of image pairs. It operates on a 1D signal, so its processing time depends on the number of processed scanlines. In Table 1 we compare two versions of our transition detector: Transition (dense), which processes the complete image (about 400 scanlines), and Transition (optimized), an optimized version (the one used for the final results). Note that this processing-time comparison is not strictly fair, since we are comparing 2D and 1D operators; but even with 1D implementations of those algorithms, the processing-time factors should remain similar. The Harris and KLT detectors were tuned to extract the same density of points (about 400) as the reference timing (note 2), Transition (optimized). The SIFT and Star detectors were simply tuned to give representative results; their density of detected features is much lower (see Figure 2).

Table 1: Comparison of processing times between various image feature extractors. Canny, Harris and KLT processing times were obtained using the OpenCV library [15], SIFT using the Hess implementation [16], and Star using the Willow Garage code [14]. The transition extractor is the fastest algorithm.

    Algorithm               Feature  Dimension  Average time/image  Factor vs. Transition
    Transition (optimized)  Point    1D         0.13 ms             1
    Transition (dense)      Point    1D         2 ms                15
    Canny                   Edge     2D         10 ms               75
    Harris                  Point    2D         34 ms               260
    KLT                     Point    2D         43 ms               330
    Star                    Point    2D         82 ms               630
    SIFT                    Point    2D         483 ms              3700

Note 2: KLT: cvGoodFeaturesToTrack(im, tmp, tmp2, corners, &n, 0.025, 7); Harris: cvGoodFeaturesToTrack(im, tmp, tmp2, corners, &n, 0.001, 7, 0, 3, 1, 0.04).

The average processing time of the dense version of transition extraction is about 2 ms, so transitions are at least 5 times faster than the other extractors. The combination of our transition extractor and our incremental processing strategy reduces the average processing time to 0.13 ms/image: the obstacle detection strategy of [5] alone reduces the processing time by a factor of 15. The overall extraction is much faster than all the others, which is its main advantage, for two reasons: first, the other extractors apply 2D processing to the entire image, while transitions analyze only selected scanlines of this image; second, the transition extractor is an algorithm of very low complexity. The other algorithms can be strongly accelerated using a GPU or dedicated hardware, but they still cannot run in real time in our configuration: an 80 MHz CPU.

Figure 2: Extraction results of the studied feature detectors. Transition is the only extractor that detects features on the car bumper in column 2 and on the hood limit in column 4, but it cannot detect the ground white lines in column 3.

A second drawback of most feature-point extraction methods is a lack of pertinence. Figure 2 presents the extraction results of the studied feature detectors. All extractors detect features on the cars (columns 2, 3, and 5), but only the transition detector finds points on the vehicle bumper (column 2) and on the hood limit (column 4). Detecting features on bumpers, and more generally on the limits of objects, is mandatory for the application. Those limits usually correspond to image edges. The SIFT and Star detectors were designed to extract feature points that are robust to large viewpoint changes (note 3); they are not adapted to obstacle reconstruction. Harris and KLT are better adapted but search for corners and junctions. Canny searches for edges, but the bumper edge is too smooth to be extracted. The transition extractor searches for edges along scanlines, a strategy well adapted to detecting bumpers, hoods, and other kinds of obstacles. Its main drawback is its dependence on the scanlines, and therefore on calibration: it cannot detect contours that are aligned with the epipole. Because the camera moves along its optical axis, the epipole is inside the image; its position in the image depends on the camera pose. The undetected edges are those parallel to the vehicle trajectory (see section Discussions). An example of failure appears in the transition results of column 3: the ground white lines are not detected. In practice, obstacles are never composed only of such edges; several example results are shown in Figure 5. These two drawbacks of the available classic extractors led us to design our own feature detector, intended for real-time obstacle detection on an 80 MHz CPU. We wanted a very fast technique working both on the limits and on the surfaces of objects. The result is the transition extractor introduced above, which we now describe in detail.

TRANSITION FEATURE

The transition extractor has been designed to extract edges from a 1D signal occurring along an epipolar line. The transition is similar to the weak string defined in the edge extractor of Blake and Zisserman [18], but this feature is extracted much faster (0.13 ms/image, see Table 1). A transition models an edge between two different regions, which are modeled by plateaus; a transition is thus defined by two plateaus, as presented in Figure 3a. The intensity difference between the two plateaus is called the height of the transition, and the distance between the two plateaus is called its ambiguity. The height represents the contrast of the transition and the ambiguity its sharpness. These attributes are used to characterize a transition: for example, an important transition has a large height and long plateaus, while an inaccurate transition has a high ambiguity.
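
To fix ideas, here is a minimal C++ sketch of how a transition and its plateaus could be represented. The paper gives no implementation, so all type and member names here are ours:

    // A plateau: an approximately constant-intensity run on a 1D scanline.
    struct Plateau {
        int   start;          // index of the left extremity
        int   end;            // index of the right extremity (inclusive)
        float meanIntensity;  // average intensity between the extremities
        int length() const { return end - start + 1; }
    };

    // A transition: an edge modeled by two plateaus.
    struct Transition {
        Plateau left;
        Plateau right;
        // Contrast: intensity difference between the two plateaus.
        float height() const { return right.meanIntensity - left.meanIntensity; }
        // Sharpness: gap between the plateaus; a large gap means an
        // inaccurately localized edge.
        int ambiguity() const { return right.start - left.end - 1; }
        // Center of the transition, usable for matching and reconstruction
        // only when the ambiguity is low.
        float center() const { return 0.5f * (left.end + right.start); }
    };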

Transition extraction

The transition extraction procedure follows a multi-resolution approach. We remind the reader that it is applied to a 1D signal, which can be an image line/column or an epipolar line. It is divided into three stages, performed at different image resolution levels: transition localization, transition computation, and ambiguity reduction. The first stage locates transitions in a low-resolution scanline. We compute the slopes of the signal (monotonic variations of intensity); transitions are then localized by selecting the most relevant slopes, those whose intensity variation exceeds a given height threshold.
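
A sketch of this first stage under our reading of the text, assuming the low-resolution scanline is given as an array of intensities; the run-closing rule and the handling of flat segments are our simplifications:

    #include <cmath>
    #include <vector>

    // A candidate slope: a monotone intensity run [begin, end] on the
    // low-resolution scanline.
    struct Slope { int begin; int end; };

    // Stage 1 (localization): keep the monotone runs whose total intensity
    // variation exceeds heightThreshold.
    std::vector<Slope> locateSlopes(const std::vector<float>& line,
                                    float heightThreshold) {
        std::vector<Slope> slopes;
        if (line.size() < 2) return slopes;
        int runStart = 0;
        int dir = 0;  // +1 rising, -1 falling, 0 unknown
        for (int i = 1; i < (int)line.size(); ++i) {
            int d = (line[i] > line[i - 1]) - (line[i] < line[i - 1]);
            if (d != 0 && d != dir) {  // monotony breaks: close current run
                if (std::fabs(line[i - 1] - line[runStart]) >= heightThreshold)
                    slopes.push_back({runStart, i - 1});
                runStart = i - 1;
                dir = d;
            }
        }
        if (std::fabs(line.back() - line[runStart]) >= heightThreshold)
            slopes.push_back({runStart, (int)line.size() - 1});
        return slopes;
    }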

Note 3: They are not adapted for point extraction. It has been shown in [17] that the SIFT descriptor can be used as a similarity measure for dense matching.


In the second stage, transitions are computed using the full-resolution scanline. Each transition is deduced from the computation of its left and right plateaus, as illustrated in Figure 3c. Plateau computation consists in finding the locations of its two extremities and its mean intensity value. The extremities are computed by moving the initial extremities of the slope leftward and rightward; the process stops when the intensity value falls outside an intensity search band centered on the intensity value of the plateau. This plateau intensity is obtained by averaging the intensity values of the pixels between the plateau extremities. The third stage tries to make transitions as unambiguous as possible. The transition computation procedure just presented is simple and cannot deal with some particularly irregular slopes (large steps, oscillations); as a consequence, it may return transitions that are too ambiguous. Unambiguous transitions are kept; filters are applied to the others to recover them. These filters try to reduce the ambiguity of a transition by adjusting its plateaus to the underlying 1D intensity variation, each filter being adapted to one irregular-slope configuration. Note that, as presented in Figure 3b, transitions are computed individually, so the plateaus of different transitions may overlap. An important point is that smoothed transitions should remain ambiguous: the filters must fail to reduce the ambiguity of smooth but regular transitions, since in such transitions it is impossible to localize any important intensity variation. If the ambiguity is low, the center of the transition is accurate and can be used for matching and reconstruction; if a transition is ambiguous, it is rejected.
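
The plateau computation of the second stage might look like the following sketch, which reuses the Plateau type from the earlier snippet; the search-band half-width `tolerance` is a hypothetical parameter, and the right plateau would be grown symmetrically rightward:

    #include <cmath>
    #include <vector>

    // Grow the left plateau of a transition from the slope's left extremity
    // `seed`, moving leftward on the full-resolution scanline while pixels
    // stay within `tolerance` of the plateau's running mean intensity.
    Plateau growLeftPlateau(const std::vector<float>& line, int seed,
                            float tolerance) {
        Plateau p{seed, seed, line[seed]};
        float sum = line[seed];
        for (int i = seed - 1; i >= 0; --i) {
            if (std::fabs(line[i] - p.meanIntensity) > tolerance) break;
            sum += line[i];
            p.start = i;
            p.meanIntensity = sum / (seed - i + 1);  // update running mean
        }
        return p;
    }
    // A transition is kept only if the residual gap between its two grown
    // plateaus (its ambiguity) is small.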

Figure 3: Feature extraction. (a) Transition definition: a transition models an edge between two different regions, named plateaus; the center of the transition is used for reconstruction when the ambiguity is low. (b) Example of transitions extracted from a scanline. (c) Transition computation: a transition is deduced from the computation of its left and right plateaus.


Discussions

The main advantage of the transition feature-point extractor is its very low processing time, which comes from its intrinsic computational simplicity and from 1D processing. Our algorithm also has many characteristics suited to the intended outdoor urban application:
• Invariance to geometric and photometric changes: it extracts image edges.
• Pertinence: the extractor has a high response on man-made objects and a low response on highly textured and chaotic image regions. Transitions focus on edges that generally delimit homogeneous regions, and color homogeneity is a common characteristic of man-made objects and pedestrians (pants or coat for a pedestrian; bumper and body parts for cars; poles; stopping blocks).
• Extraction of smooth edges: gradient-based edge extractors generally cannot recover smooth edges. The transition extractor avoids gradient computation; the algorithm is robust and able to extract smooth edges. Very smooth edges, however, although extracted, cannot be used for 3D reconstruction since they are generally ambiguous.
• Simplicity of the transition model: an edge is modeled by very few but relevant parameters: intensity levels, lengths of regions, and transition height (contrast). This allows fast and simple discrimination of transitions during matching or tracking (see 3D point updating in [5]).
• Scale invariance: transitions can evaluate the scale of edges since plateaus are not limited in size. This helps discriminate transitions during matching.

Even though the transition is the fastest feature-point extractor, it is not sufficient to achieve obstacle detection in all situations. The main drawback of our obstacle detection strategy is the impossibility of extracting and reconstructing contours parallel to the epipolar lines. This drawback matters because many contours are in this situation (lines parallel to the vehicle trajectory: ground lines, horizontal lines on the side of a vehicle, a vertical pole at the epipole), but it is not critical since there are always other lines on obstacles (see section Results). Another limit is that transitions focus mainly on contours between homogeneous regions; highly textured and chaotic image regions such as bushes or leaves are badly detected.

MATCHING TRANSITIONS

Feature points are detected quickly along scanlines; we now propose a fast matching strategy based on the good properties of transitions. This matching strategy is part of our global detection system described in [5], which we encourage the reader to consult. Scanline matching is a key element of the processing chain since there are only few data to match: detection results depend on its ability to recover a maximum of good matches while avoiding wrong ones. A transition models an edge with very few but relevant parameters (intensity levels, lengths of regions, and transition height, i.e. contrast), so transitions are simple to compare for matching. We exploit this to define a simple and fast strategy that privileges important transitions while dealing automatically with ambiguous matches. The strategy has three steps: it constructs every possible sequence of matching candidates (a sequence is a list of selected candidates), evaluates the cost of each sequence, and selects the best sequence, the one with the lowest cost.
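
A compact sketch of these three steps, based only on the description above; `MatchCandidate`, the precomputed ambiguity flags, and the unbounded duplication are our modeling choices, not the paper's:

    #include <algorithm>
    #include <vector>

    struct MatchCandidate {
        int leftIdx, rightIdx;  // transition indices in the two scanlines
        float cost;             // transition difference cost (see below)
    };

    // Does candidate c share a transition with a match already in the sequence?
    static bool conflicts(const std::vector<MatchCandidate>& seq,
                          const MatchCandidate& c) {
        return std::any_of(seq.begin(), seq.end(), [&](const MatchCandidate& m) {
            return m.leftIdx == c.leftIdx || m.rightIdx == c.rightIdx;
        });
    }

    // Candidates are assumed pre-sorted by importance; ambiguous[i] marks
    // candidates sharing a transition with another candidate in the list.
    // Worst case is exponential; the paper targets the simple ambiguities
    // that arise on a single scanline.
    std::vector<MatchCandidate> bestSequence(
            const std::vector<MatchCandidate>& cands,
            const std::vector<bool>& ambiguous) {
        std::vector<std::vector<MatchCandidate>> seqs{{}};
        for (size_t i = 0; i < cands.size(); ++i) {
            std::vector<std::vector<MatchCandidate>> next;
            for (const auto& s : seqs) {
                if (conflicts(s, cands[i])) { next.push_back(s); continue; }
                if (ambiguous[i]) next.push_back(s);  // duplicate: without it
                auto withC = s;
                withC.push_back(cands[i]);
                next.push_back(withC);                // and with it
            }
            seqs = std::move(next);
        }
        // Sequence cost: average candidate cost divided by the number of
        // matches, so low-cost and numerous matches are both rewarded.
        auto score = [](const std::vector<MatchCandidate>& s) {
            if (s.empty()) return 1e9f;
            float sum = 0.f;
            for (const auto& m : s) sum += m.cost;
            return (sum / s.size()) / s.size();
        };
        return *std::min_element(seqs.begin(), seqs.end(),
            [&](const auto& a, const auto& b) { return score(a) < score(b); });
    }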


We define a transition difference cost based on the average of the differences in intensity and length of both plateaus. This cost measure is sufficient to recover correct matches. Before matching, we build the list of all possible matching candidates. The transition difference cost is evaluated for every candidate, and candidates with dissimilar transitions are rejected. Candidates are then sorted by importance/quality: high-quality candidates pair two transitions with large plateaus and large heights, and they are matched first. Sorted candidates are added progressively to construct the sequences. We start with an empty sequence, and candidates are added one by one to all existing sequences. When an added candidate is ambiguous (it shares a transition with one or several other candidates of the list), the ambiguity is avoided by duplicating all existing sequences, each ambiguous candidate being added to a different copy. After sequence construction, the cost of each sequence is computed: it is the average cost of its candidates divided by the number of candidates, so that a good sequence is made of good matches (small average cost) and contains a large number of them (for dense reconstruction).
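
The candidate cost could then be written as follows, reusing the Transition sketch from earlier; the equal weighting of the intensity and length terms is our assumption, not a value given in the paper:

    #include <cmath>

    // Transition difference cost: average of the differences in plateau mean
    // intensities and plateau lengths, over both the left and right plateaus.
    float transitionCost(const Transition& a, const Transition& b) {
        float dIntensity =
              std::fabs(a.left.meanIntensity  - b.left.meanIntensity)
            + std::fabs(a.right.meanIntensity - b.right.meanIntensity);
        float dLength =
              std::fabs((float)a.left.length()  - b.left.length())
            + std::fabs((float)a.right.length() - b.right.length());
        return 0.25f * (dIntensity + dLength);  // mean of the four terms
    }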

Figure 4: Matching of two epipolar scanlines. The top and bottom of the figure present the intensity profiles (in red) of the epipolar lines and the detected transitions (in blue). The middle part presents the image line intensities and the detected transitions (in green). Matching consists in selecting the best list of matches (3 in this example).

The matching result corresponds to the candidates of the sequence with the lowest cost. This matching is applied twice: first only to the high-quality candidates, then to all remaining candidates. Ambiguity is thus solved first for the important transitions, which ensures they are not mismatched. An example of matching is presented in Figure 4.
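
One possible way to express this two-pass scheme on top of the previous sketch (purely illustrative; the pass boundaries and pruning rule are our reading of the text):

    // Two-pass matching: solve ambiguity for important transitions first,
    // then match the remaining candidates around the fixed first-pass matches.
    std::vector<MatchCandidate> matchTwoPass(
            const std::vector<MatchCandidate>& highQuality,
            const std::vector<MatchCandidate>& remaining,
            const std::vector<bool>& ambHigh,
            const std::vector<bool>& ambRem) {
        auto matches = bestSequence(highQuality, ambHigh);
        // Keep only remaining candidates compatible with first-pass matches.
        std::vector<MatchCandidate> rest;
        std::vector<bool> amb2;
        for (size_t i = 0; i < remaining.size(); ++i) {
            if (!conflicts(matches, remaining[i])) {
                rest.push_back(remaining[i]);
                amb2.push_back(ambRem[i]);
            }
        }
        auto second = bestSequence(rest, amb2);
        matches.insert(matches.end(), second.begin(), second.end());
        return matches;
    }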


This simple strategy has some limitations. First, the algorithm can solve simple ambiguities (when a transition has two candidates) but not complex ones (repetitive textures); one solution is to enforce an ordering constraint, but it is not satisfactory when some transitions are missing, or with poles. Occluding contours are the second limitation: one plateau of the transition corresponds to the background and can be completely different when the viewpoint changes; this case is currently not managed. The last limitation concerns large variations of intensity between two images, which often appear with hood reflections and with the ambient shadow of our own vehicle; the matching then sometimes rejects good matches. The matching strategy is nevertheless sufficient: it is fast, and the main transitions are rarely mismatched.

RESULTS

Figure 5 presents detection results. All presented results were obtained in the following conditions: speed below 10 km/h and a straight backward maneuver. The prototype is able to detect various kinds of objects, from cars to poles, all in real time on the 80 MHz CPU. Cars are detected from about 6 meters, humans from 3 to 4 meters, and smaller objects from 2 meters. Because of the inaccuracy of the reconstructed 3D points, the shape is not always correct at those distances and is unstable. Below 2 meters, the accuracy improves and the shape becomes generally correct and stable (as can be seen in the top views of the figure). This system currently runs in our prototype vehicle (see Figure 1) and demonstrates the feasibility of an obstacle detector using a single camera and an 80 MHz CPU. The transition feature extractor is a large part of this success.

Figure 5: Detection results on various objects. Each result presents the live image on the left and the top view of the scene on the right (the grid represents 1 m). The prototype vehicle is displayed at the top, and the outlines of detected obstacles are displayed in blue. All situations are backward maneuvers toward obstacles. Cars, humans, and poles are detected; the shape of a detected obstacle is generally correct below 2 meters.

CONCLUSION

We presented in this paper a novel fast feature-point extractor, the transition, and compared it to various feature detectors. The extractor is fast and proved to perform well on man-made objects. We used it in a complete structure-from-motion system for obstacle detection, able to detect various kinds of objects, from cars to poles, at distances up to 6 meters. We demonstrated that transitions can be used in a vision-based obstacle detector: their low processing time allows detection to run on an 80 MHz CPU.

REFERENCES

[1] AISIN SEIKI. Parking assist system. www.aisin.com/product/automotive/info/ot.html
[2] Fintzel, K., R. Bendahan, C. Vestri, S. Bougnoux, T. Kakinami (2004). 3D parking assistant system. International Symposium on Intelligent Vehicles, 881-886.
[3] Yamaguchi, K., T. Kato, Y. Ninomiya (2006). Moving obstacle detection using monocular vision. Intelligent Vehicles Symposium 2006, Tokyo, Japan, June 13-15.
[4] Michels, J., A. Saxena, A.Y. Ng (2005). High speed obstacle avoidance using monocular vision and reinforcement learning. Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany.
[5] Vestri, C., R. Bendahan, F. Abad, S. Wybo, S. Bougnoux (2009). Real-time monocular 3D vision system. 16th World Congress on Intelligent Transport Systems, Stockholm, Sweden.
[6] Wybo, S., D. Tsishkou, C. Vestri, F. Abad, R. Bendahan, S. Bougnoux (2008). Obstacles avoidance by monocular multi-cue image analysis. 15th World Congress on Intelligent Transport Systems, New York, USA.
[7] Kitchen, L., A. Rosenfeld (1982). Gray level corner detection. Pattern Recognition Letters 1, 95-102.
[8] Schmid, C., R. Mohr, C. Bauckhage (2000). Evaluation of interest point detectors. The International Journal of Computer Vision 37, 151-172.
[9] Tomasi, C., T. Kanade (1991). Detection and tracking of point features. Carnegie Mellon University, Tech. Report CMU-CS-91-132.
[10] Harris, C., M. Stephens (1988). A combined corner and edge detector. Alvey Vision Conference, 147-152.
[11] Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. The International Journal of Computer Vision 60, 91-110.
[12] Bay, H., T. Tuytelaars, L. Van Gool (2006). SURF: Speeded up robust features. Proceedings of the 9th European Conference on Computer Vision.
[13] Agrawal, M., K. Konolige, M.R. Blas (2008). CenSurE: Center surround extremas for realtime feature detection and matching. Proceedings of the 10th European Conference on Computer Vision.
[14] Star feature detector (2008). http://pr.willowgarage.com/wiki/Star_Detector
[15] OpenCV: Open source computer vision library. opencvlibrary.sourceforge.net
[16] Rob Hess SIFT implementation (2009). http://web.engr.oregonstate.edu/~hess/
[17] Tola, E., V. Lepetit, P. Fua (2008). A fast local descriptor for dense matching. IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, Alaska.
[18] Blake, A., A. Zisserman (1987). Visual Reconstruction. MIT Press.
