Local Invariant Features: Detection & Description - Frédéric Devernay

Local Invariant Features: Detection & Description

Visual Object Recognition Tutorial (Perceptual and Sensory Augmented Computing)

Presentation: Frédéric Devernay, INRIA
With many slides from Tinne Tuytelaars and others

Motivation

• Global representations have major limitations
• Instead, describe and match only local regions
• Increased robustness to
  ➢ Occlusions
  ➢ Articulation
  ➢ Intra-category variations

Approach

1. Find a set of distinctive keypoints
2. Define a region around each keypoint
3. Extract and normalize the region content
4. Compute a local descriptor from the normalized region
5. Match local descriptors

[Figure: regions A1, A2, A3 and B1, B2, B3 are normalized to patches of N pixels; their descriptors (e.g. color) are compared with a similarity measure]

Requirements

• Region extraction needs to be repeatable and precise
  ➢ Translation, rotation, scale changes
  ➢ (Limited out-of-plane (≈ affine) transformations)
  ➢ Lighting variations
• We need a sufficient number of regions to cover the object
• The regions should contain “interesting” structure

Many Existing Detectors Available

• Hessian & Harris [Beaudet ’78], [Harris ’88]
• Laplacian, DoG [Crowley & Parker ’84], [Lindeberg ’93-’98], [Lowe ’99]
• Harris-/Hessian-Laplace [Mikolajczyk & Schmid ’01]
• Harris-/Hessian-Affine [Mikolajczyk & Schmid ’04]
• EBR and IBR [Tuytelaars & Van Gool ’04]
• MSER [Matas ’02]
• Salient Regions [Kadir & Brady ’01]
• Others…

Keypoint Localization

• Goals:
  ➢ Repeatable detection
  ➢ Precise localization
  ➢ Interesting content

⇒ Look for two-dimensional signal changes

Hessian Detector [Beaudet78]

• Hessian determinant

$$\mathrm{Hessian}(I) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}$$

Intuition: Search for strong derivatives in two orthogonal directions

$$\det(\mathrm{Hessian}(I)) = I_{xx} I_{yy} - I_{xy}^2$$

In Matlab: Ixx.*Iyy - Ixy.^2

Hessian Detector – Responses [Beaudet78]

Effect: Responses mainly on corners and strongly textured areas.
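A minimal NumPy sketch of the Hessian determinant response described above. The function names, the pre-smoothing scale `sigma`, and the use of `np.gradient` finite differences are assumptions made for illustration; this is not the code from the original slide.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing with edge replication (kernel cut at 3 sigma)."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def hessian_determinant(img, sigma=1.0):
    """det(Hessian) = Ixx * Iyy - Ixy^2 at every pixel (sigma is an assumed smoothing scale)."""
    smooth = gaussian_blur(np.asarray(img, dtype=float), sigma)
    Iy, Ix = np.gradient(smooth)   # axis 0 is y (rows), axis 1 is x (columns)
    Ixy, Ixx = np.gradient(Ix)     # derivatives of Ix along y and along x
    Iyy, _ = np.gradient(Iy)       # derivative of Iy along y
    return Ixx * Iyy - Ixy**2

# Toy example: a Gaussian blob centred at row 20, column 20.
yy, xx = np.mgrid[0:41, 0:41]
blob = np.exp(-((xx - 20)**2 + (yy - 20)**2) / (2 * 4.0**2))
response = hessian_determinant(blob)
peak = np.unravel_index(np.argmax(response), response.shape)
print("strongest response at (row, col):", peak)
```

On a straight step edge the response stays at zero, because only one of the two orthogonal second derivatives is non-zero there.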


Harris Detector [Harris88]

• Second moment matrix (autocorrelation matrix)

$$M = \begin{bmatrix} g(I_x^2) & g(I_x I_y) \\ g(I_x I_y) & g(I_y^2) \end{bmatrix}$$

Intuition: Search for local neighborhoods where the image content has two main directions (eigenvectors).

1. Image derivatives: $I_x$, $I_y$
2. Square of derivatives: $I_x^2$, $I_y^2$, $I_x I_y$
3. Gaussian filter $g(\sigma_I)$: $g(I_x^2)$, $g(I_y^2)$, $g(I_x I_y)$
4. Cornerness function – both eigenvalues are strong:

$$har = \det(M) - \alpha\,\mathrm{trace}(M)^2 = g(I_x^2)\,g(I_y^2) - [g(I_x I_y)]^2 - \alpha\,[g(I_x^2) + g(I_y^2)]^2$$

   where $\alpha$ is a small empirical constant.
5. Non-maxima suppression

Harris Detector – Responses [Harris88]

Effect: A very precise corner detector.
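The five Harris steps listed above can be sketched in NumPy as follows. This is an illustrative sketch, not the original implementation: the function names, the plain finite-difference derivatives, the integration scale `sigma_i`, the constant `alpha = 0.06`, and the 3x3 suppression window are all assumptions.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing with edge replication (kernel cut at 3 sigma)."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def harris_response(img, sigma_i=1.5, alpha=0.06):
    img = np.asarray(img, dtype=float)
    Iy, Ix = np.gradient(img)                    # 1. image derivatives
    Ix2, Iy2, Ixy = Ix * Ix, Iy * Iy, Ix * Iy    # 2. squares and product of derivatives
    gIx2 = gaussian_blur(Ix2, sigma_i)           # 3. Gaussian filter g(sigma_I)
    gIy2 = gaussian_blur(Iy2, sigma_i)
    gIxy = gaussian_blur(Ixy, sigma_i)
    det = gIx2 * gIy2 - gIxy**2                  # 4. cornerness: det(M) - alpha * trace(M)^2
    trace = gIx2 + gIy2
    return det - alpha * trace**2

def non_max_suppression(resp, threshold):
    """5. Keep pixels above threshold that are maximal in their 3x3 neighbourhood."""
    h, w = resp.shape
    padded = np.pad(resp, 1, mode="constant", constant_values=-np.inf)
    neigh = np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])
    is_max = resp >= neigh.max(axis=0)
    ys, xs = np.nonzero(is_max & (resp > threshold))
    return list(zip(ys.tolist(), xs.tolist()))

# Toy example: a bright square whose four corners should be detected.
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
resp = harris_response(img)
keypoints = non_max_suppression(resp, threshold=0.5 * resp.max())
print(keypoints)
```

Along the straight sides of the square only one eigenvalue is large, so the trace term makes the response negative there and only the corners survive the threshold.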


Automatic Scale Selection

• Same operator responses if the patch contains the same image up to a scale factor
• How to find corresponding patch sizes?
• Function responses for increasing scale (scale signature)

What Is A Useful Signature Function?

• Laplacian-of-Gaussian = “blob” detector

Laplacian-of-Gaussian (LoG)

• Local maxima in scale space of Laplacian-of-Gaussian

[Figure: LoG responses at scales σ, σ2, σ3, σ4, σ5]

⇒ List of (x, y, s)

Results: Laplacian-of-Gaussian
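To make the scale signature idea concrete, here is a small NumPy sketch that evaluates a scale-normalised Laplacian-of-Gaussian at one pixel for increasing σ and picks the scale with the strongest response. The σ² normalisation, the 5-point Laplacian stencil, the list of scales, and all names are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing with edge replication (kernel cut at 3 sigma)."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def scale_signature(img, y, x, sigmas):
    """Scale-normalised |Laplacian-of-Gaussian| at pixel (y, x), one value per sigma."""
    img = np.asarray(img, dtype=float)
    values = []
    for s in sigmas:
        L = gaussian_blur(img, s)
        # 5-point discrete Laplacian of the smoothed image
        lap = L[y - 1, x] + L[y + 1, x] + L[y, x - 1] + L[y, x + 1] - 4.0 * L[y, x]
        values.append(s**2 * abs(lap))   # sigma^2 makes responses comparable across scales
    return np.array(values)

# Toy example: a Gaussian blob with standard deviation 4 at the image centre.
yy, xx = np.mgrid[0:65, 0:65]
blob = np.exp(-((xx - 32)**2 + (yy - 32)**2) / (2 * 4.0**2))
sigmas = 2.0 ** (np.arange(7) / 2.0)   # 1, 1.41, 2, 2.83, 4, 5.66, 8
signature = scale_signature(blob, 32, 32, sigmas)
characteristic_scale = sigmas[np.argmax(signature)]
print("characteristic scale:", characteristic_scale)
```

For this synthetic blob the signature should peak at the scale matching the blob size. Running the same procedure on a rescaled copy of the image would shift the peak by the same factor, which is what makes the selected patch sizes correspond.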

Difference-of-Gaussian (DoG)

• Difference of Gaussians as approximation of the Laplacian-of-Gaussian

[Figure: DoG = Gaussian − Gaussian (difference of two Gaussian-smoothed images)]

DoG – Efficient Computation

• Computation in Gaussian scale pyramid
• Sampling with step σ^4 = 2

[Figure: original image smoothed repeatedly by σ]

Results: Lowe’s DoG
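A hedged NumPy sketch of the DoG construction: the image is smoothed at geometrically spaced scales and adjacent levels are subtracted. It assumes a scale step of 2^(1/4) between levels (reading the slide's step as σ⁴ = 2) and, for brevity, keeps every level at full resolution instead of building a downsampled pyramid. All names and parameter values are illustrative.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing with edge replication (kernel cut at 3 sigma)."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def dog_stack(img, sigma0=1.0, k=2 ** 0.25, levels=12):
    """Return the scales and the stack of differences between adjacent Gaussian levels."""
    img = np.asarray(img, dtype=float)
    sigmas = sigma0 * k ** np.arange(levels)
    blurred = [gaussian_blur(img, s) for s in sigmas]
    dogs = np.stack([blurred[i + 1] - blurred[i] for i in range(levels - 1)])
    return sigmas, dogs

# Toy example: a Gaussian blob with standard deviation 4 at the image centre.
yy, xx = np.mgrid[0:65, 0:65]
blob = np.exp(-((xx - 32)**2 + (yy - 32)**2) / (2 * 4.0**2))
sigmas, dogs = dog_stack(blob)
best = int(np.argmax(np.abs(dogs[:, 32, 32])))
print("strongest DoG between sigma =", sigmas[best], "and", sigmas[best + 1])
```

The strongest difference at the blob centre should come from a pair of levels whose scales bracket the blob size, mirroring the behaviour of the Laplacian-of-Gaussian at lower cost, since only smoothing and subtraction are needed.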

Harris-Laplace [Mikolajczyk ’01]

1. Initialization: Multiscale Harris corner detection
   [Figure: computing the Harris function and detecting local maxima at scales σ, σ2, σ3, σ4]
2. Scale selection based on Laplacian
   (same procedure with Hessian ⇒ Hessian-Laplace)
   [Figure: Harris points compared with Harris-Laplace points]

Maximally Stable Extremal Regions [Matas ’02]

• Based on Watershed segmentation algorithm
• Select regions that stay stable over a large parameter range

Example Results: MSER

You Can Try It At Home…

• For most local feature detectors, executables are available online:
  ➢ http://robots.ox.ac.uk/~vgg/research/affine
  ➢ http://www.cs.ubc.ca/~lowe/keypoints/
  ➢ http://www.vision.ee.ethz.ch/~surf

Orientation Normalization [Lowe, SIFT, 1999]

• Compute orientation histogram (gradient magnitude and orientation by finite differences)
• Select dominant orientation (gradient histogram weighted by magnitude and Gaussian window, σ = 1.5s)
• Normalize: rotate to fixed orientation (interpolate with parabola, keep all peaks within 80% of max)

[Figure: orientation histogram]

Local Descriptors

• The ideal descriptor should be
  ➢ Repeatable
  ➢ Distinctive
  ➢ Compact
  ➢ Efficient
• Most available descriptors focus on edge/gradient information
  ➢ Capture texture information
  ➢ Color still relatively seldom used (more suitable for homogeneous regions)
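The orientation normalization steps (orientation histogram, dominant orientation, parabola interpolation, 80% rule) can be sketched as below. The choice of 36 bins follows Lowe's paper and is not stated on the slide; the function name, the patch-centred window, and the omission of histogram smoothing are assumptions of this sketch.

```python
import numpy as np

def dominant_orientations(patch, scale=1.0, n_bins=36, peak_ratio=0.8):
    """Return orientations (radians) of all histogram peaks within peak_ratio of the maximum."""
    patch = np.asarray(patch, dtype=float)
    dy, dx = np.gradient(patch)                 # finite-difference gradients
    mag = np.hypot(dx, dy)                      # gradient magnitude
    ang = np.arctan2(dy, dx) % (2 * np.pi)      # gradient orientation in [0, 2*pi)

    # Gaussian window centred on the patch, sigma = 1.5 * keypoint scale
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    window = np.exp(-((yy - cy)**2 + (xx - cx)**2) / (2 * (1.5 * scale)**2))

    # Histogram of orientations weighted by magnitude and window
    bins = np.floor(ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=(mag * window).ravel(), minlength=n_bins)

    peaks = []
    for i in range(n_bins):
        left, centre, right = hist[i - 1], hist[i], hist[(i + 1) % n_bins]
        if centre > left and centre > right and centre >= peak_ratio * hist.max():
            # Vertex of the parabola through the three neighbouring bins
            offset = 0.5 * (left - right) / (left - 2 * centre + right)
            peaks.append(((i + 0.5 + offset) * 2 * np.pi / n_bins) % (2 * np.pi))
    return peaks

# Toy example: an intensity ramp whose gradient points along the image diagonal.
yy, xx = np.mgrid[0:16, 0:16]
ramp = (xx + yy).astype(float)
peaks = dominant_orientations(ramp, scale=2.0)
print(np.degrees(peaks))
```

Once a dominant orientation is known, the patch would be resampled after rotating by minus that angle, so that descriptors computed afterwards are expressed relative to a fixed orientation. When several peaks pass the 80% rule, one keypoint per peak can be created.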

T. Tuytelaars, B. Leibe

SIFT Descriptor

Histogram of oriented gradients
• Captures important texture information
• Robust to small translations / affine deformations
[Lowe, ICCV 1999] [Lowe, IJCV 2004]

Local Descriptors: SIFT Descriptor

• 4x4 Gradient window
• Histogram of 4x4 samples per window in 8 directions
• Gaussian weighting around center (σ is 0.5 times that of the scale of a keypoint)
• 4x4x8 = 128 dimensional feature vector

Image from: Jonas Hurrelmann
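A simplified NumPy sketch of a SIFT-style descriptor for a patch that is already scale- and rotation-normalized: a 4x4 grid of cells, 8 orientation bins per cell, Gaussian weighting, then unit-length normalization with clipping at 0.2 as in Lowe's descriptor. It is an assumption-laden illustration: the 16x16 patch size, the window width `weight_sigma`, the hard bin assignment (no trilinear interpolation), and the function name are choices made here, not the reference implementation.

```python
import numpy as np

def sift_like_descriptor(patch, weight_sigma=8.0):
    """Return a 4*4*8 = 128-dimensional descriptor for a 16x16 patch."""
    patch = np.asarray(patch, dtype=float)
    if patch.shape != (16, 16):
        raise ValueError("expected a 16x16 normalized patch")
    dy, dx = np.gradient(patch)
    mag = np.hypot(dx, dy)
    ang = np.arctan2(dy, dx) % (2 * np.pi)

    # Gaussian weighting around the patch centre (weight_sigma is an assumed value)
    yy, xx = np.mgrid[0:16, 0:16]
    window = np.exp(-((yy - 7.5)**2 + (xx - 7.5)**2) / (2 * weight_sigma**2))

    # Each pixel votes into its 4x4 cell and one of 8 orientation bins
    obin = np.floor(ang / (2 * np.pi) * 8).astype(int) % 8
    hist = np.zeros((4, 4, 8))
    np.add.at(hist, (yy // 4, xx // 4, obin), mag * window)

    v = hist.ravel()
    v = v / (np.linalg.norm(v) + 1e-12)   # unit length: removes contrast changes
    v = np.minimum(v, 0.2)                # clip large magnitudes (saturation effects)
    return v / (np.linalg.norm(v) + 1e-12)

rng = np.random.default_rng(0)
patch = rng.random((16, 16))
descriptor = sift_like_descriptor(patch)
brighter = sift_like_descriptor(2.0 * patch + 0.5)   # changed contrast and brightness
print(descriptor.shape, np.abs(descriptor - brighter).max())
```

The second call illustrates the intended lighting behaviour: an additive offset vanishes in the gradients and a multiplicative contrast change is removed by the normalization, so the two descriptors should agree up to rounding.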

SIFT Descriptor – Lighting changes

• Brightness offsets (additive changes) do not affect gradients
• Normalization to unit length removes contrast
• Saturation affects magnitudes much more than orientation
• Threshold gradient magnitudes to 0.2 and renormalize

Local Descriptors: SURF

• Fast approximation of SIFT idea
  ➢ Efficient computation by 2D box filters & integral images ⇒ 6 times faster than SIFT
  ➢ Equivalent quality for object identification
• GPU implementation available
  ➢ Feature extraction @ 100Hz (detector + descriptor, 640×480 img)
  ➢ http://www.vision.ee.ethz.ch/~surf

[Bay, ECCV’06], [Cornelis, CVGPU’08]

Methodology

• Using integral images for major speed up
  ➢ Integral image (summed area table) is an intermediate representation of the image that contains the sums of gray scale pixel values of the image
  ➢ Second order derivative and Haar-wavelet response
  ➢ A box sum costs only four addition operations

SURF-64 descriptor

• In order to bring in information about the polarity of the intensity changes, extract the sum of absolute values of the responses ⇒ feature vector of length 64
• Normalize the vector to unit length

SURF descriptor examples
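The integral image idea behind SURF can be illustrated in a few lines of plain Python. This sketch only shows the summed area table, a constant-time box sum, and a Haar-wavelet response built from two box sums; the function names and the half-open index convention are assumptions, and it is not the SURF implementation itself.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows 0..y-1 and columns 0..x-1."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 using four table lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def haar_x(ii, y, x, size):
    """Horizontal Haar-wavelet response: right half minus left half of a size x size window."""
    half = size // 2
    right = box_sum(ii, y, x + half, y + size, x + size)
    left = box_sum(ii, y, x, y + size, x + half)
    return right - left

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
total = box_sum(ii, 0, 0, 3, 3)    # whole image: 45
inner = box_sum(ii, 1, 1, 3, 3)    # 5 + 6 + 8 + 9 = 28
response = haar_x(ii, 0, 0, 2)     # (2 + 5) - (1 + 4) = 2
print(total, inner, response)
```

The cost of `box_sum` does not depend on the box size, which is why box filters of any scale can be evaluated at the same speed once the table has been built.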

Local Descriptors: Shape Context

• Count the number of points inside each bin, e.g.: Count = 4, ..., Count = 10
• Log-polar binning: more precision for nearby points, more flexibility for farther points.

Belongie & Malik, ICCV 2001
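A plain-Python sketch of the log-polar point count described above. The bin layout (5 radial by 12 angular bins, radii from 1/8 to 2 after normalization) follows commonly used shape context settings and is an assumption here, as are the function name and the simplification of normalizing by the mean distance from the reference point only.

```python
import math

def shape_context(points, index, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of the other points as seen from points[index]."""
    px, py = points[index]
    offsets = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    dists = [math.hypot(dx, dy) for dx, dy in offsets]
    mean_d = sum(dists) / len(dists)            # normalization for scale invariance

    log_lo, log_hi = math.log(r_min), math.log(r_max)
    hist = [[0] * n_theta for _ in range(n_r)]
    for (dx, dy), d in zip(offsets, dists):
        r = d / mean_d
        if r > r_max:
            continue                            # too far away: not counted
        # Radial bin edges are equally spaced in log(r): fine near the point, coarse far away
        r_bin = int((math.log(max(r, r_min)) - log_lo) / (log_hi - log_lo) * n_r)
        r_bin = min(r_bin, n_r - 1)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        t_bin = int(theta / (2 * math.pi) * n_theta) % n_theta
        hist[r_bin][t_bin] += 1
    return hist

points = [(0, 0), (1, 0.3), (-0.5, 1), (-1, -0.4), (0.3, -1.2), (2.5, 0.4)]
hist = shape_context(points, 0)
counts_per_ring = [sum(ring) for ring in hist]
print(counts_per_ring)
```

Each entry of `hist` is the count for one bin; comparing such histograms between two shapes gives a measure of how similar the point configurations around the two reference points are.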

Applications of local invariant features

• Panorama stitching
• High-resolution document scan (similar to panorama)
• 3D modelling
• Location recognition

Panorama stitching

Brown, ICCV 2003

3D modelling

(from Sudderth et al., 2006)

Location recognition

Simultaneous Localization and Mapping (SLAM)

[Se, Lowe & Little, 2001]

Recent SLAM improvements

• Real time (with a laptop)
• Robust to motion blur
• Source code available (PTAM)

[Klein et al.]

So, What Local Features Should I Use?

• There have been extensive evaluations/comparisons
  ➢ [Mikolajczyk et al., IJCV’05, PAMI’05]
  ➢ All detectors/descriptors shown here work well
• Best choice often application dependent
  ➢ MSER works well for buildings and printed things
  ➢ Harris-/Hessian-Laplace/DoG work well for many natural categories
• More features are better
  ➢ Combining several detectors often helps