Self-Supervised Learning from High-Dimensional Data for Autonomous Off-Road Driving
Ayse Naz Erkan1 Raia Hadsell1
Marc'Aurelio Ranzato1 Urs Muller2
Pierre Sermanet1,2 Yann LeCun1 Koray Kavukcuoglu1
(1) Courant Institute of Mathematical Sciences, New York University (2) NetScale Technologies, Morganville, NJ
Problem: Autonomous, Vision-based Navigation in Complex Off-Road Environments
Stereo-based navigation uses simple heuristics to identify pixels as ground or obstacle.
Stereo is insufficient:
● sparse, noisy, and short-range (0-12 meters)
● pure stereo navigation is myopic, like driving in fog
The Platform: LAGR Mobile Robot
Challenge: Vision-based Navigation for Mobile Robots
Why is it hard?
● Extreme environmental variability
● Visual complexity: shadows, clutter
● Hilly, bumpy, uneven terrain
● Real-time constraints on processing
● Tricks: collapsible vegetation, hidden obstacles
● Position estimation errors: wheel slip, GPS
● Planning with uncertainty
● Lighting variability: glare, time of day
Challenges for machine learning solutions:
● supervised learning limits the variability of environments
● online learning is adaptive, but has no memory
● large image patches are necessary for accurate learning, which makes the input high-dimensional
● generalization from near-range to far-range (inverse size/distance)
● planning with uncertainty from classifiers
● concept drift
LAGR (Learning Applied to Ground Robots)
DARPA program, 2005-2008: 8 competing research labs develop navigation for fixed platform
Periodic testing in unfamiliar terrain
CMU & NREC designed platform and baseline software:
● 4 color cameras (2 stereo pairs, 640x480)
● GPS receiver for global navigation
● 2 front bumper switches
● Onboard IMU (inertial measurement unit)
● 4 onboard Linux computers: 2 “eye” machines (dual core 2 GHz), 1 “planning” machine (single core 1 GHz), 1 low-level control computer (single core)
The Solution: Online Self-Supervised Learning
Strategy: Online Near-to-Far Learning
● Inputs: large windows in image
● Labels: heuristics from stereo module
● Classifier: unsupervised autoencoder + online logistic regression
[Figure panels: input image; stereo labels (0-12 m); classifier prediction (5-80 m)]
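The near-to-far strategy can be sketched as a split of image windows into training data and queries. This is a minimal illustration, not the poster's implementation: the `Window` type, the per-window distance estimate, and the label encoding (+1 obstacle, -1 ground) are assumptions; only the approximate ranges come from the poster.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

STEREO_MAX_M = 12.0    # stereo labels reach roughly 12 m (from the poster)
PREDICT_MIN_M = 5.0    # classifier predictions span roughly 5-80 m (from the poster)
PREDICT_MAX_M = 80.0


@dataclass
class Window:
    features: List[float]        # feature vector for one image window
    distance_m: float            # hypothetical per-window distance estimate
    stereo_label: Optional[int]  # assumed encoding: +1 obstacle, -1 ground, None unlabeled


def near_to_far_split(
    windows: List[Window],
) -> Tuple[List[Tuple[List[float], Optional[int]]], List[Window]]:
    """Return (training pairs from near-range stereo, windows to classify)."""
    train = [
        (w.features, w.stereo_label)
        for w in windows
        if w.stereo_label is not None and w.distance_m <= STEREO_MAX_M
    ]
    query = [w for w in windows if PREDICT_MIN_M <= w.distance_m <= PREDICT_MAX_M]
    return train, query


windows = [
    Window([0.2, 0.9], 3.0, -1),     # near ground, labeled by stereo
    Window([0.8, 0.1], 9.0, +1),     # near obstacle, labeled by stereo
    Window([0.7, 0.2], 40.0, None),  # far window, no stereo label
]
train, query = near_to_far_split(windows)
```

The two ranges overlap between 5 m and 12 m, so in this sketch a window in that band is both a training sample and a query.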
Stereo-based obstacle detector
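Robust feature extraction
● Trained offline
● 100,000 training images from log files
[Figure: auto-encoder layer. The input $Y'$ is encoded into a code $Z' = F'_{enc}(Y')$ and decoded as $F'_{dec}(Z')$, with reconstruction error $\|Y' - F'_{dec}(Z')\|^2$. The code of the first layer is the input of the second: $Y' = Z$.]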
Kernels (2 layers) learned by Auto-Encoder: 20x7x6 in first layer; 300x6x5 in second layer
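The encode/decode structure and the reconstruction error of one auto-encoder layer can be sketched as follows. This is a simplified illustration under stated assumptions: it uses small dense matrices instead of the convolutional kernels listed above, and the tanh nonlinearity and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode(w_enc: Matrix, y: List[float]) -> List[float]:
    """Code Z = tanh(W_enc Y); the tanh nonlinearity is an assumption."""
    return [math.tanh(a) for a in matvec(w_enc, y)]


def decode(w_dec: Matrix, z: List[float]) -> List[float]:
    """Linear reconstruction F_dec(Z) = W_dec Z (also an assumption)."""
    return matvec(w_dec, z)


def reconstruction_error(w_enc: Matrix, w_dec: Matrix, y: List[float]) -> float:
    """Squared error ||Y - F_dec(Z)||^2 with Z = F_enc(Y)."""
    recon = decode(w_dec, encode(w_enc, y))
    return sum((a - b) ** 2 for a, b in zip(y, recon))


def stacked_features(layers: List[Matrix], x: List[float]) -> List[float]:
    """Feed each layer's code to the next layer (Y' = Z); return the final code."""
    code = x
    for w_enc in layers:
        code = encode(w_enc, code)
    return code


identity = [[1.0, 0.0], [0.0, 1.0]]
err = reconstruction_error(identity, identity, [1.0, 2.0])
features = stacked_features([identity, identity], [1.0, 2.0])
```

Offline training would adjust the encoder and decoder weights to reduce `reconstruction_error` over the logged images; only the encoders are needed at run time to produce features.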
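Features: $D = F_W(X)$, with input patch $X$ (YUV: 13x24x3)
● Loss, over $n$ samples $D_i$ with labels $y_i$: $L = -\sum_{i=1}^{n} \log g(y_i \cdot W'D_i) + R(W)$
● Learning (gradient of the data term for one sample): $\frac{\partial L}{\partial W} = -y \cdot g(-y \cdot W'D)\, D$
● Inference: $y = g(W'D)$, where $g(z) = \frac{1}{1 + e^{-z}}$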
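The loss, gradient, and inference rules for the logistic classifier can be sketched in a few lines. This is a minimal illustration, assuming training labels in {-1, +1}, a per-sample loss of -log g(y W'D), and no regularizer; the function names and learning rate are hypothetical.

```python
import math
from typing import List


def g(z: float) -> float:
    """Logistic sigmoid g(z) = 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w: List[float], d: List[float]) -> float:
    return sum(wi * di for wi, di in zip(w, d))


def sample_loss(w: List[float], d: List[float], y: int) -> float:
    """Per-sample loss -log g(y * W'D) for a label y in {-1, +1}."""
    return -math.log(g(y * dot(w, d)))


def sample_grad(w: List[float], d: List[float], y: int) -> List[float]:
    """Gradient of sample_loss with respect to W: -y * g(-y * W'D) * D."""
    scale = -y * g(-y * dot(w, d))
    return [scale * di for di in d]


def sgd_step(w: List[float], d: List[float], y: int, lr: float = 0.1) -> List[float]:
    """One gradient-descent update on a single labeled feature vector."""
    return [wi - lr * gi for wi, gi in zip(w, sample_grad(w, d, y))]


def predict(w: List[float], d: List[float]) -> float:
    """Inference: y = g(W'D), a value in (0, 1)."""
    return g(dot(w, d))


w0 = [0.5, -0.3]
d0 = [1.0, 2.0]
w1 = sgd_step(w0, d0, +1)
```

After the step, `sample_loss(w1, d0, +1)` is lower than `sample_loss(w0, d0, +1)`, which is the behavior expected from descending along this gradient.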
Online Ensemble Learning: Mixture of Experts Architecture
[Figure: 2-Layer Auto-Encoder Network, from input patch to classifier weights $W$]
The online classifier is trained at each frame using gradient descent on the 100-dimensional features extracted via the unsupervised autoencoder network.
● Weights $W$ are trained with a cross-entropy loss function
● Regularization: decay to default weights, L2 regularization
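One way to read "decay to default weights" is an L2 penalty that pulls the weights back toward a stored default vector at every update. The sketch below assumes that reading; the penalty form (decay / 2) * ||w - w_default||^2, the function name, and the parameter values are all hypothetical.

```python
from typing import List


def regularized_step(
    w: List[float],
    w_default: List[float],
    data_grad: List[float],
    lr: float = 0.05,
    decay: float = 0.01,
) -> List[float]:
    """One descent step on data_grad plus the gradient of
    (decay / 2) * ||w - w_default||^2, which pulls w toward w_default."""
    return [
        wi - lr * (gi + decay * (wi - w0))
        for wi, gi, w0 in zip(w, data_grad, w_default)
    ]


# With a zero data gradient, the weights relax toward the defaults
# instead of staying where the last frames left them.
w = [1.0, -2.0]
w_default = [0.0, 0.0]
for _ in range(200):
    w = regularized_step(w, w_default, [0.0, 0.0], lr=0.5, decay=0.1)
```

In a per-frame loop, `data_grad` would be the cross-entropy gradient computed from that frame's stereo-labeled features.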
The input image is normalized so that the size of an object is independent of its distance from the robot. This allows consistent processing of windows at different scales. Distance normalization also allows learning using large, context-rich windows.
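The idea behind distance normalization can be illustrated with a simple pinhole camera model. This is only a sketch under that assumption; the focal length, the reference distance, and the function names are hypothetical and do not come from the poster.

```python
def apparent_height_px(height_m: float, distance_m: float, focal_px: float) -> float:
    """Pinhole projection: image height in pixels of an object at a given distance."""
    return focal_px * height_m / distance_m


def normalization_scale(distance_m: float, reference_m: float) -> float:
    """Resize factor that makes an object at distance_m look as it would at reference_m."""
    return distance_m / reference_m


FOCAL_PX = 500.0     # hypothetical focal length in pixels
REFERENCE_M = 10.0   # hypothetical reference distance in meters

# A 1 m tall object seen at 5 m, 10 m, and 40 m, after rescaling.
normalized = [
    apparent_height_px(1.0, d, FOCAL_PX) * normalization_scale(d, REFERENCE_M)
    for d in (5.0, 10.0, 40.0)
]
```

Every entry of `normalized` equals the height the object would have at the reference distance, so windows cut from the rescaled images show objects at a consistent size.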
[Figure labels: $X$, $1..p$, $F_W(X)$]
Online Learning – Logistic Regression with gradient descent
Distance-Normalized Image Pyramid
Autoencoder network $F_W$
Input: calibrated stereo images
Output: training set of labeled feature vectors
A single online classifier exhibits fast learning and fast forgetting. Architectures that combine high-capacity slow learners with low-capacity, highly adaptive controllers could solve this memory problem. An online mixture of experts is one such architecture.
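[Figure: mixture of experts. The input feeds three experts and a controller; the expert outputs are combined in a sum (∑) under the controller to produce the output.]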
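A mixture-of-experts output can be sketched as a controller-weighted sum of expert predictions. This is an illustrative sketch, not the poster's implementation: the softmax gating, the use of logistic experts, and all names and weights here are assumptions.

```python
import math
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(scores: List[float]) -> List[float]:
    """Normalized exponential; subtracting the max keeps exp() stable."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_predict(
    x: List[float],
    expert_weights: List[List[float]],
    controller_weights: List[List[float]],
) -> Tuple[float, List[float]]:
    """Return (gated output, gates): the output is sum_k gate_k * expert_k(x)."""
    gates = softmax([dot(v, x) for v in controller_weights])
    outputs = [sigmoid(dot(w, x)) for w in expert_weights]
    return sum(gk * ok for gk, ok in zip(gates, outputs)), gates


x = [1.0, 0.5]
experts = [[2.0, 0.0], [-2.0, 0.0], [0.0, 0.0]]
uniform_controller = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
y_uniform, gates = mixture_predict(x, experts, uniform_controller)

# A controller that strongly prefers the first expert for this input.
y_first, _ = mixture_predict(x, experts, [[10.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
```

With uniform gates the output is the plain average of the experts; as the controller commits to one expert, the output approaches that expert's prediction. This lets slow-learning experts retain what they learned while the controller adapts quickly.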
Results: Evaluation of Learning and Driving Performance
ex. A: road following and man-made obstacle detection
[Figure panels, two examples: Input image, Stereo Labels, Classifier Output]
ex. B: difficult ground recognition, multi-color and shadows
[Figure labels: Start, No Learning, With Learning]
ex. C: very long-range vision to the horizon
[Figure panels: Input image, Stereo Labels, Classifier Output]
Direct path to goal ends in cul-de-sac
Short-range stereo (