Local Invariant Features: Detection & Description
Visual Object Recognition Tutorial
Perceptual and Sensory Augmented Computing
Presentation: Frédéric Devernay, INRIA
With many slides from Tinne Tuytelaars and others
Motivation
• Global representations have major limitations
• Instead, describe and match only local regions
• Increased robustness to
  - Occlusions
  - Articulation
  - Intra-category variations
Approach
1. Find a set of distinctive keypoints
2. Define a region around each keypoint
3. Extract and normalize the region content
4. Compute a local descriptor from the normalized region
5. Match local descriptors
[Figure: regions A1, A2, A3 and B1, B2, B3 of N pixels each, compared with a similarity measure (e.g. color)]
Requirements
• Region extraction needs to be repeatable and precise
  - Translation, rotation, scale changes
  - (Limited out-of-plane (≈affine) transformations)
  - Lighting variations
• We need a sufficient number of regions to cover the object
• The regions should contain “interesting” structure
Many Existing Detectors Available
• Hessian & Harris [Beaudet ‘78], [Harris ‘88]
• Laplacian, DoG [Crowley & Parker ‘84], [Lindeberg ‘93-‘98], [Lowe ‘99]
• Harris-/Hessian-Laplace [Mikolajczyk & Schmid ‘01]
• Harris-/Hessian-Affine [Mikolajczyk & Schmid ‘04]
• EBR and IBR [Tuytelaars & Van Gool ‘04]
• MSER [Matas ‘02]
• Salient Regions [Kadir & Brady ‘01]
• Others…
Keypoint Localization
• Goals:
  - Repeatable detection
  - Precise localization
  - Interesting content
⇒ Look for two-dimensional signal changes
Hessian Detector [Beaudet78]
• Hessian determinant
$$H(I) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}, \qquad \det(H(I)) = I_{xx} I_{yy} - I_{xy}^2$$
Intuition: Search for strong derivatives in two orthogonal directions
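The Hessian determinant can be sketched in a few lines. This is a minimal, unsmoothed version that uses central finite differences on a list-of-lists image; the function name `hessian_determinant` is made up for illustration, and a practical detector would typically use Gaussian derivatives, followed by thresholding and non-maxima suppression.

```python
def hessian_determinant(img):
    """Return Ixx * Iyy - Ixy**2 at every interior pixel (border pixels stay 0)."""
    h, w = len(img), len(img[0])
    det = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Second derivatives by central finite differences (no smoothing)
            ixx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
            iyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
            ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
                   - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
            det[y][x] = ixx * iyy - ixy * ixy
    return det
```

On a blob-like patch both second derivatives are strong and the determinant is large and positive; on a pure intensity ramp (an edge-like structure) it vanishes, which matches the intuition of two orthogonal strong derivatives.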
Hessian Detector – Responses [Beaudet78]
Effect: Responses mainly on corners and strongly textured areas.
Harris Detector [Harris88]
• Second moment matrix (autocorrelation matrix)
$$M = g(\sigma_I) * \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$
Intuition: Search for local neighborhoods where the image content has two main directions (eigenvectors).
1. Image derivatives: $I_x$, $I_y$
2. Square of derivatives: $I_x^2$, $I_y^2$, $I_x I_y$
3. Gaussian filter $g(\sigma_I)$: $g(I_x^2)$, $g(I_y^2)$, $g(I_x I_y)$
4. Cornerness function – both eigenvalues are strong: $har = g(I_x^2)\,g(I_y^2) - [g(I_x I_y)]^2 - \alpha\,[g(I_x^2) + g(I_y^2)]^2$
5. Non-maxima suppression
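The five steps can be sketched as follows. This is a plain-Python illustration under stated assumptions, not a reference implementation: the function names, the central-difference derivatives, the value `alpha=0.06`, the threshold and the 3x3 suppression window are all choices made for this example, and the cornerness used is the common form det(M) − α·trace(M)².

```python
import math

def gaussian_kernel(sigma):
    """Normalised 1D Gaussian, truncated at 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def smooth(img, sigma):
    """Separable Gaussian filter; pixels outside the image repeat the border."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    rows = [[sum(k[i + r] * row[min(max(x + i, 0), w - 1)] for i in range(-r, r + 1))
             for x in range(w)] for row in img]
    return [[sum(k[i + r] * rows[min(max(y + i, 0), h - 1)][x] for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def harris_response(img, sigma_i=1.0, alpha=0.06):
    h, w = len(img), len(img[0])
    # 1. Image derivatives (central differences, border repeated)
    ix = [[(img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0 for x in range(w)]
          for y in range(h)]
    iy = [[(img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0 for x in range(w)]
          for y in range(h)]
    # 2. Products of derivatives, then 3. Gaussian filter g(sigma_i)
    gxx = smooth([[a * a for a in row] for row in ix], sigma_i)
    gyy = smooth([[b * b for b in row] for row in iy], sigma_i)
    gxy = smooth([[a * b for a, b in zip(ra, rb)] for ra, rb in zip(ix, iy)], sigma_i)
    # 4. Cornerness: det(M) - alpha * trace(M)^2
    return [[gxx[y][x] * gyy[y][x] - gxy[y][x] ** 2 - alpha * (gxx[y][x] + gyy[y][x]) ** 2
             for x in range(w)] for y in range(h)]

def harris_keypoints(img, threshold=1e-4, **kwargs):
    """5. Non-maxima suppression over 3x3 neighbourhoods; returns (x, y) tuples."""
    r = harris_response(img, **kwargs)
    keypoints = []
    for y in range(1, len(r) - 1):
        for x in range(1, len(r[0]) - 1):
            neighbours = [r[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dx, dy) != (0, 0)]
            if r[y][x] > threshold and r[y][x] >= max(neighbours):
                keypoints.append((x, y))
    return keypoints
```

With this form of the cornerness, straight edges give a negative response (one strong eigenvalue only), flat regions give zero, and only neighbourhoods with two strong gradient directions give a positive value.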
Harris Detector – Responses [Harris88]
Effect: A very precise corner detector.
Automatic Scale Selection
Same operator responses if the patch contains the same image up to a scale factor.
How to find corresponding patch sizes?
• Function responses for increasing scale (scale signature)
What Is A Useful Signature Function?
• Laplacian-of-Gaussian = “blob” detector
Laplacian-of-Gaussian (LoG)
• Local maxima in scale space of Laplacian-of-Gaussian
[Figure: LoG responses at scale levels σ, σ2, σ3, σ4, σ5]
⇒ List of (x, y, s)
Results: Laplacian-of-Gaussian
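A scale signature and the resulting characteristic scale can be sketched as below. The sketch assumes the scale-normalised Laplacian σ²·(Lxx + Lyy) as signature function, evaluates it at a single pixel for a hand-picked list of scales, and uses a plain-Python Gaussian filter; all function names are hypothetical and chosen for this example only.

```python
import math

def gaussian_kernel(sigma):
    """Normalised 1D Gaussian, truncated at 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def smooth(img, sigma):
    """Separable Gaussian filter; pixels outside the image repeat the border."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    rows = [[sum(k[i + r] * row[min(max(x + i, 0), w - 1)] for i in range(-r, r + 1))
             for x in range(w)] for row in img]
    return [[sum(k[i + r] * rows[min(max(y + i, 0), h - 1)][x] for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def scale_signature(img, x, y, sigmas):
    """|sigma^2 * Laplacian(G_sigma * img)| at interior pixel (x, y), one value per sigma."""
    signature = []
    for s in sigmas:
        L = smooth(img, s)
        laplacian = L[y][x + 1] + L[y][x - 1] + L[y + 1][x] + L[y - 1][x] - 4.0 * L[y][x]
        signature.append(abs(s * s * laplacian))
    return signature

def characteristic_scale(img, x, y, sigmas):
    """Scale at which the signature is largest (among the sampled scales only)."""
    signature = scale_signature(img, x, y, sigmas)
    return sigmas[signature.index(max(signature))]
```

For an ideal disc of radius r the normalised LoG response at the centre is strongest near σ = r/√2, so a disc of radius 6 sampled at scales 2, 3, 4, 6, 8 should select σ = 4. A full detector would search for maxima over (x, y, s) jointly instead of at one fixed pixel.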
Difference-of-Gaussian (DoG)
• Difference of Gaussians as approximation of the Laplacian-of-Gaussian
[Figure: DoG kernel = difference of two Gaussian kernels]
DoG – Efficient Computation
• Computation in Gaussian scale pyramid
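The approximation can be checked numerically. Since ∂G/∂σ = σ∇²G for a 2D Gaussian, a small scale step gives G(kσ) − G(σ) ≈ (k − 1)·σ²·∇²G. The sketch below compares both sides at a few points and also shows how a DoG stack would be formed from already-blurred images; the function names and the factor `k` are illustrative assumptions.

```python
import math

def gaussian(x, y, sigma):
    """2D isotropic Gaussian with unit integral."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) / (2.0 * math.pi * sigma * sigma)

def laplacian_of_gaussian(x, y, sigma):
    """Analytic Laplacian of the 2D Gaussian."""
    r2 = x * x + y * y
    return (r2 - 2.0 * sigma * sigma) / sigma ** 4 * gaussian(x, y, sigma)

def difference_of_gaussians(x, y, sigma, k):
    """G(k * sigma) - G(sigma)."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

def dog_stack(blurred):
    """Subtract adjacent levels of an already-blurred stack of images (lists of rows)."""
    return [[[b - a for a, b in zip(row_lo, row_hi)] for row_lo, row_hi in zip(lo, hi)]
            for lo, hi in zip(blurred, blurred[1:])]

if __name__ == "__main__":
    sigma, k = 2.0, 1.01
    for x, y in [(0, 0), (1, 0), (1, 2)]:
        approx = (k - 1) * sigma ** 2 * laplacian_of_gaussian(x, y, sigma)
        print((x, y), difference_of_gaussians(x, y, sigma, k), approx)
```

The efficiency argument is that the blurred levels are needed anyway to build the pyramid, so each DoG level costs one image subtraction instead of a separate LoG convolution.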
[Figure: Gaussian scale pyramid above the original image, levels labelled σ, sampling with step $\sigma^4 = 2$]
Results: Lowe’s DoG
Harris-Laplace [Mikolajczyk ‘01]
1. Initialization: Multiscale Harris corner detection
[Figure: computing the Harris function and detecting local maxima at scale levels σ, σ2, σ3, σ4]
2. Scale selection based on Laplacian
(same procedure with Hessian ⇒ Hessian-Laplace)
[Figure: Harris points and Harris-Laplace points]
Maximally Stable Extremal Regions [Matas ‘02]
• Based on Watershed segmentation algorithm
• Select regions that stay stable over a large parameter range
Example Results: MSER
You Can Try It At Home…
• For most local feature detectors, executables are available online:
  - http://robots.ox.ac.uk/~vgg/research/affine
  - http://www.cs.ubc.ca/~lowe/keypoints/
  - http://www.vision.ee.ethz.ch/~surf
Local Descriptors
• The ideal descriptor should be
  - Repeatable
  - Distinctive
  - Compact
  - Efficient
• Most available descriptors focus on edge/gradient information
  - Capture texture information
  - Color still relatively seldom used (more suitable for homogeneous regions)
Orientation Normalization [Lowe, SIFT, 1999]
• Compute orientation histogram (gradient magnitude and orientation by finite differences)
• Select dominant orientation (gradient histogram weighted by magnitude and Gaussian window, σ=1.5s)
• Normalize: rotate to fixed orientation (interpolate with parabola, keep all peaks within 80% of max)
[Figure: orientation histogram over 0 to 2π]
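The orientation normalization steps (finite-difference gradients, a histogram weighted by magnitude and a Gaussian window with σ = 1.5s, peaks within 80% of the maximum, parabola interpolation) can be sketched as follows. The 36-bin histogram, the function name and the patch layout (a square list of rows centred on the keypoint) are assumptions made for this example.

```python
import math

def dominant_orientations(patch, scale, n_bins=36, peak_ratio=0.8):
    """Return the dominant gradient orientations (radians in [0, 2*pi)) of a patch."""
    h, w = len(patch), len(patch[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    sigma = 1.5 * scale                       # Gaussian window, sigma = 1.5 s
    bin_width = 2.0 * math.pi / n_bins
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Gradient by finite differences (y grows downwards, as in image coordinates)
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            theta = math.atan2(dy, dx) % (2.0 * math.pi)
            window = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma * sigma))
            hist[int(theta / bin_width) % n_bins] += magnitude * window
    top = max(hist)
    peaks = []
    for i, v in enumerate(hist):
        left, right = hist[i - 1], hist[(i + 1) % n_bins]
        # Keep every local peak within peak_ratio of the highest bin
        if v > 0 and v >= peak_ratio * top and v > left and v > right:
            # Vertex of the parabola through the three neighbouring bins
            offset = 0.5 * (left - right) / (left - 2.0 * v + right)
            peaks.append(((i + 0.5 + offset) * bin_width) % (2.0 * math.pi))
    return peaks
```

A patch with several returned peaks would yield one keypoint copy per orientation; the patch is then rotated so that the chosen orientation maps to a fixed direction before the descriptor is computed.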
T. Tuytelaars, B. Leibe
Local Descriptors: SIFT Descriptor
• Histogram of oriented gradients
  - Captures important texture information
  - Robust to small translations / affine deformations
[Lowe, ICCV 1999] [Lowe, IJCV 2004]
SIFT Descriptor
• 4x4 Gradient window
• Histogram of 4x4 samples per window in 8 directions
• Gaussian weighting around center (σ is 0.5 times that of the scale of a keypoint)
• 4x4x8 = 128 dimensional feature vector
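The 4x4x8 layout can be illustrated with a simplified, SIFT-like descriptor. The sketch assumes a 16x16 patch that is already scale- and orientation-normalized, uses hard binning without the interpolation between bins found in full implementations, and picks a Gaussian window of half the patch width; these choices and the function names are assumptions for this example. The last step normalizes to unit length, clips entries at 0.2 and renormalizes, which is the usual way of making the vector robust to lighting changes.

```python
import math

def normalise(vec, clip=0.2):
    """Unit-normalise, clip large entries at `clip`, then renormalise."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    vec = [min(v / norm, clip) for v in vec]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sift_like_descriptor(patch, n_cells=4, n_bins=8, clip=0.2):
    """128-D vector for a square patch whose side is a multiple of n_cells (e.g. 16x16)."""
    size = len(patch)
    cell = size // n_cells                    # samples per cell side (4 for a 16x16 patch)
    centre = (size - 1) / 2.0
    sigma = 0.5 * size                        # assumption: half the window width
    desc = [0.0] * (n_cells * n_cells * n_bins)
    for y in range(size):
        for x in range(size):
            # Finite-difference gradient, border pixels repeated
            dx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            theta = math.atan2(dy, dx) % (2.0 * math.pi)
            b = int(theta / (2.0 * math.pi) * n_bins) % n_bins
            weight = math.exp(-((x - centre) ** 2 + (y - centre) ** 2) / (2.0 * sigma ** 2))
            index = ((y // cell) * n_cells + x // cell) * n_bins + b
            desc[index] += weight * math.hypot(dx, dy)
    return normalise(desc, clip)
```

Because only gradients enter the histogram, adding a constant to the patch leaves the vector unchanged, and multiplying the patch by a constant is cancelled by the normalization.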
Image from: Jonas Hurrelmann
SIFT Descriptor – Lighting changes
• Additive brightness changes (offsets) do not affect gradients
• Normalization to unit length removes contrast
• Saturation affects magnitudes much more than orientation
• Threshold gradient magnitudes to 0.2 and renormalize
Local Descriptors: SURF
• Fast approximation of SIFT idea
  - Efficient computation by 2D box filters & integral images ⇒ 6 times faster than SIFT
  - Equivalent quality for object identification
• GPU implementation available
  - Feature extraction @ 100Hz (detector + descriptor, 640×480 img)
http://www.vision.ee.ethz.ch/~surf
[Bay, ECCV’06], [Cornelis, CVGPU’08]
Methodology
• Using integral images for a major speed-up
  - The integral image (summed-area table) is an intermediate representation of the image: each entry contains the sum of the gray-scale pixel values above and to the left of that position
  - Second-order derivative and Haar-wavelet responses
  - The sum over any rectangular box then needs only four look-ups, combined by additions and subtractions
SURF-64 descriptor
• To bring in information about the polarity of the intensity changes, also extract the sum of the absolute values of the responses ⇒ feature vector of length 64
• Normalize the vector to unit length
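The integral image (summed-area table) idea can be sketched as follows. The table is padded with an extra zero row and column so that box sums need no special cases at the border; the function names and the half-open box convention are choices made for this example.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (extra zero row and column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, x0, y0, x1, y1):
    """Sum of pixels with x0 <= x < x1 and y0 <= y < y1, from four table look-ups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

if __name__ == "__main__":
    table = integral_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(box_sum(table, 1, 1, 3, 3))  # 5 + 6 + 8 + 9 = 28
```

The cost of `box_sum` does not depend on the box size, which is why box-filter approximations of derivatives and Haar wavelets stay cheap at every scale.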
SURF descriptor examples
Local Descriptors: Shape Context
• Count the number of points inside each bin, e.g.: Count = 4, ..., Count = 10
• Log-polar binning: more precision for nearby points, more flexibility for farther points.
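Log-polar binning can be sketched as a small histogram routine. The defaults of 5 radial and 12 angular bins, the radius limits and the function name are assumptions for this example; the sketch also works on raw distances, whereas a scale-invariant variant would first normalize distances, for instance by their mean.

```python
import math

def shape_context(points, ref, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of `points` around `ref`: n_r rings of n_theta sectors."""
    # Ring boundaries are spaced logarithmically between r_min and r_max
    edges = [r_min * (r_max / r_min) ** (i / n_r) for i in range(n_r + 1)]
    hist = [[0] * n_theta for _ in range(n_r)]
    for px, py in points:
        dx, dy = px - ref[0], py - ref[1]
        r = math.hypot(dx, dy)
        if r == 0 or r >= r_max:
            continue  # skip the reference point itself and points beyond the outer ring
        # Points closer than r_min fall into the innermost ring
        ring = sum(1 for e in edges[1:-1] if r >= e)
        theta = math.atan2(dy, dx) % (2.0 * math.pi)
        sector = int(theta / (2.0 * math.pi) * n_theta) % n_theta
        hist[ring][sector] += 1
    return hist
```

Because the rings grow geometrically, nearby points are resolved finely while distant points share large bins, which is the trade-off between precision and flexibility described in the bullet.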
Belongie & Malik, ICCV 2001
Applications of local invariant features
• Panorama stitching
• High-resolution document scan (similar to panorama)
• 3D modelling
• Location recognition
Panorama stitching
Brown, ICCV 2003
3D modelling
(from Sudderth et al., 2006)
Location recognition
Simultaneous Localization and Mapping (SLAM)
[Se, Lowe & Little, 2001]
Recent SLAM improvements
• Real time (with a laptop)
• Robust to motion blur
• Source code available (PTAM)
[Klein et al.]
So, What Local Features Should I Use?
• There have been extensive evaluations/comparisons [Mikolajczyk et al., IJCV’05, PAMI’05]
  - All detectors/descriptors shown here work well
• Best choice often application dependent
  - MSER works well for buildings and printed things
  - Harris-/Hessian-Laplace/DoG work well for many natural categories
• More features are better
  - Combining several detectors often helps