Deep Learning Compact and Invariant Image Descriptors for Instance Retrieval
Olivier Morère
08/06/2016

About this thesis

Antoine VEILLARD, Daniel RACOCEANU, Vijay CHANDRASEKHAR, Hanlin GOH

Image Instance Retrieval

Given an existing image database and a new query image, retrieve the database images depicting the same object.

A Wide Range of Applications


A Challenging Problem


Comparing Image Global Descriptors for Retrieval

• The new query image and every image in the existing database are each summarized by one global descriptor, e.g. a binary vector such as 1 1 0 1 1 0 … 0 0
• Retrieval ranks the database images by the pairwise distances between the query descriptor and the database descriptors (see the sketch below)
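For illustration, a minimal numpy sketch of this ranking step, assuming the global descriptors are already binary as depicted above; the function name is illustrative.

import numpy as np

def hamming_rank(query_bits, db_bits):
    """Rank database images by Hamming distance to the query descriptor.

    query_bits: (D,) array of 0/1 values (binary global descriptor of the query).
    db_bits:    (N, D) array, one binary global descriptor per database image.
    Returns database indices ordered from most to least similar.
    """
    distances = np.count_nonzero(db_bits != query_bits, axis=1)
    return np.argsort(distances)

# Toy example: 4 database descriptors of 64 bits each, plus a query.
db = np.random.randint(0, 2, size=(4, 64))
query = db[2].copy()            # the query matches database image 2 exactly
print(hamming_rank(query, db))  # index 2 is ranked first (distance 0)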

SIFT Descriptor

Local descriptor pipeline:
• Second-derivative filters $D_{xx}$, $D_{yy}$, $D_{xy}$ give the blob response $D_{xx}D_{yy} - (0.9\,D_{xy})^2$; interest points are the maxima of this response
• The patch is oriented along the dominant gradient direction
• The descriptor is computed from the gradient field of the oriented patch
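A minimal SciPy sketch of the blob response above, built from Gaussian second derivatives; the scale sigma and the search for local maxima are assumptions left out of the sketch.

import numpy as np
from scipy.ndimage import gaussian_filter

def blob_response(image, sigma=2.0):
    """Blob response from the slide: Dxx*Dyy - (0.9*Dxy)^2, computed with
    Gaussian second-derivative filters. Interest points are local maxima
    of this map."""
    image = np.asarray(image, dtype=float)
    Dxx = gaussian_filter(image, sigma, order=(0, 2))  # second derivative along x (columns)
    Dyy = gaussian_filter(image, sigma, order=(2, 0))  # second derivative along y (rows)
    Dxy = gaussian_filter(image, sigma, order=(1, 1))  # mixed derivative
    return Dxx * Dyy - (0.9 * Dxy) ** 2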

Global Descriptor: VLAD/FV

• Step 1: train K clusters of local descriptors on the image training set
• Step 2: extract local descriptors (128-dim) from the new image
• Step 3: compute the local descriptor residual statistics in each bin
• Step 4: concatenate the residuals into the global descriptor (K × 128 dimensions); a sketch follows
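A minimal numpy sketch of Steps 2-4 for the VLAD variant; FV replaces the hard assignment and residual sums with Gaussian-mixture gradient statistics. The k-means centers from Step 1 are assumed given.

import numpy as np

def vlad(local_descs, centers):
    """Aggregate local descriptors into a VLAD global descriptor:
    assign each local descriptor to its nearest cluster center,
    accumulate the residuals per cluster, concatenate and L2-normalize."""
    K, d = centers.shape
    dists = np.linalg.norm(local_descs[:, None, :] - centers[None, :, :], axis=2)
    assign = np.argmin(dists, axis=1)                  # nearest center per descriptor
    agg = np.zeros((K, d))
    for k in range(K):
        members = local_descs[assign == k]
        if len(members):
            agg[k] = (members - centers[k]).sum(axis=0)   # residual statistics (Step 3)
    v = agg.ravel()                                     # K x 128 concatenation (Step 4)
    return v / (np.linalg.norm(v) + 1e-12)

# Toy example: 200 SIFT-like descriptors, K = 8 clusters of dimension 128.
descs = np.random.rand(200, 128)
centers = np.random.rand(8, 128)
print(vlad(descs, centers).shape)                       # (1024,)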


ImageNet Results over the Years


Deep Convolutional Neural Networks (CNN)

[Krizhevsky, 2012]


Convolution & Pooling

• Convolution: local connectivity; stationarity of the signal
• Pooling: dimensionality reduction; local invariance (sketched below)
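A minimal PyTorch sketch of the two operations; the channel counts and kernel sizes are arbitrary and do not describe the thesis architecture.

import torch
import torch.nn as nn

# Convolution: local connectivity and weight sharing (signal stationarity).
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
# Pooling: dimensionality reduction and local translation invariance.
pool = nn.MaxPool2d(kernel_size=2)

x = torch.randn(1, 3, 224, 224)   # one RGB image
y = pool(conv(x))
print(y.shape)                    # torch.Size([1, 16, 112, 112])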

Learning Abstract Visual Representations [Zeiler, 2013]

Figure: feature visualizations for layers 1 to 5.

Original Contributions
• Image classification with Convolutional Neural Networks (CNN), ImageNet 2014
• Thorough comparison study of FV and CNN for image instance retrieval
• Hashing CNN descriptors
  • Unsupervised dimensionality reduction
  • Descriptor fine-tuning with unsupervised and semi-supervised metric learning
• Robust CNN descriptors with i-theory

List of Publications (1/2)
Conference papers
• O. Morère, J. Lin, A. Veillard, V. Chandrasekhar, T. Poggio. Nested Invariance Pooling and RBM Hashing for Image Instance Retrieval. Submitted to European Conference on Computer Vision (ECCV) 2016
• O. Morère, J. Lin, V. Chandrasekhar, A. Veillard, H. Goh. Co-Sparsity Regularized Deep Hashing for Image Instance Retrieval. Accepted in International Conference on Image Processing (ICIP) 2016
• O. Morère, J. Lin, J. Petta, V. Chandrasekhar and A. Veillard. Tiny Descriptors for Image Retrieval with Unsupervised Triplet Hashing. Data Compression Conference (DCC) 2016
• C. Dao-Duc, H. Xiaohui and O. Morère. Maritime Vessel Images Classification Using Deep Convolutional Neural Networks. Symposium on Information and Communication Technology (SOICT) 2015
• V. Chandrasekhar, J. Lin, O. Morère, A. Veillard and H. Goh. Compact Global Descriptors for Visual Search. Data Compression Conference (DCC) 2015

List of Publications (2/2)
Technical reports
• O. Morère, A. Veillard, J. Lin, J. Petta, V. Chandrasekhar and T. Poggio. Group Invariant Deep Representations for Image Instance Retrieval. Center for Brains, Minds and Machines (CBMM) 2016
• O. Morère, J. Lin, V. Chandrasekhar, A. Veillard and H. Goh. DeepHash: Getting Regularization, Depth and Fine-tuning Right. arXiv preprint arXiv:1501.04711 2015

Contests and workshops
• O. Morère, A. Veillard, H. Goh. Team “LateFusion”. Kaggle National Data Science Bowl Challenge 2015
• O. Morère, H. Goh, A. Veillard, V. Chandrasekhar. Large Scale Image Classification on a Shoe String. ImageNet Large Scale Visual Recognition Challenge, European Conference on Computer Vision (ECCV) 2014

Journal articles
• O. Morère, V. Chandrasekhar, J. Lin, H. Goh and A. Veillard. A Practical Guide to CNNs and Fisher Vectors for Image Instance Retrieval. Accepted in Signal Processing (SIGPRO) 2016

PART 1
1. Thorough comparison study of FV and CNN for image instance retrieval
2. Hashing CNN descriptors
3. Robust CNN descriptors with i-theory

CNN vs Fisher Vector

• Input image → Fisher Vector pipeline → FV global descriptor
• Input image → deep Convolutional Neural Network → CNN global descriptor

How to Extract CNN Descriptors for Image Instance Retrieval?

A classification CNN maps visual input to semantic output (e.g. the class CAT): from which point between the visual and semantic ends should the retrieval descriptor be extracted?

Retrieval Data Sets

Object centric:
• Stanford Mobile Visual Search [Chandrasekhar, 2011]
• University of Kentucky Benchmark [Nister, 2006]

Scene centric:
• Oxford Buildings [Philbin, 2007]
• INRIA Holidays [Jégou, 2008]

Evaluation Metrics

$\mathrm{Recall} = \frac{|\{\mathrm{RelevantIndividuals}\} \cap \{\mathrm{RetrievedIndividuals}\}|}{|\{\mathrm{RelevantIndividuals}\}|}$

$\mathrm{AP} = \frac{\sum_{k=1}^{n} \mathrm{Precision}(k) \times \mathrm{isRelevant}(k)}{|\{\mathrm{RelevantIndividuals}\}|}$
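A short Python sketch of both metrics for a ranked retrieval list; function names are illustrative.

def recall(retrieved, relevant):
    """Recall = |relevant ∩ retrieved| / |relevant|."""
    relevant = set(relevant)
    return len(relevant & set(retrieved)) / len(relevant)

def average_precision(ranked, relevant):
    """AP = sum_k Precision(k) * isRelevant(k) / |relevant|,
    where k runs over positions in the ranked retrieval list."""
    relevant = set(relevant)
    hits, ap = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            ap += hits / k        # Precision(k) at a relevant rank
    return ap / len(relevant)

# Example: relevant images a, b, c; a and b retrieved at ranks 1 and 3.
print(average_precision(["a", "x", "b", "y"], {"a", "b", "c"}))  # ~0.556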

Best Practices for CNN Descriptors


Best Practices for FV

Interest points:
• Multi-scale improves performance over single-scale
• DoG is required when there are big scale and rotation changes (e.g. Graphics)
• Dense works better with highly textured images (e.g. Holidays)

Impact of Rotation

The query image is rotated through a range of angles before feature extraction; the resulting query image vectors are matched against database image vectors extracted from images of unknown objects at unknown orientations.

Impact of Rotation - CNN

Figure: mean average precision on Holidays as a function of the query rotation angle (-200° to 200°) for the pool5, fc6, fc7 and fc8 layers.

• Very limited invariance to rotation
• Invariance to rotation does not increase with depth

Impact of Rotation - CNN vs FV

Figure: mean average precision on Holidays as a function of the query rotation angle (-200° to 200°) for OxfordNet-FC6 and the FV variants (FV-DoG, FV-DS, FV-DM).

Further Readings

O. Morère, V. Chandrasekhar, J. Lin, H. Goh and A. Veillard. A Practical Guide to CNNs and Fisher Vectors for Image Instance Retrieval. Accepted in Signal Processing (SIGPRO) 2016


Part 1: Summary
• CNN performs well for image instance retrieval
• Two problems remain to be addressed:
  • Descriptor dimensionality
  • Lack of robustness

PART 2
1. Thorough comparison study of FV and CNN for image instance retrieval
2. Hashing CNN descriptors
3. Robust CNN descriptors with i-theory

Why 64-bit Hash?
• Motivation:
  • Billions of images can be stored in RAM
  • Fast matching with the ultra-fast Hamming distance (see the sketch below)
• Challenges:
  • Global descriptors are very high dimensional
  • Uncompressed: 4K-25K floating point numbers
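An illustrative sketch of Hamming matching on 64-bit hashes; the packed-byte storage layout is an assumption, not the thesis implementation.

import numpy as np

def hamming64(a, b):
    """Hamming distance between two 64-bit hashes stored as Python ints."""
    return (a ^ b).bit_count()    # popcount, available from Python 3.10

print(hamming64(0b1010, 0b0110))  # 2

# Vectorized matching against a database of one million 64-bit hashes,
# stored as packed bytes (8 uint8 per hash, i.e. 8 MB of RAM in total).
db = np.random.randint(0, 256, size=(1_000_000, 8), dtype=np.uint8)
query = db[42]
dists = np.unpackbits(db ^ query, axis=1).sum(axis=1)
print(dists[42])                  # 0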


Hashing Outline

• Global feature extraction: a Fisher Vector (8K-64K dim.) or deep Convolutional Neural Network (4K dim.) produces a high-dimensional image descriptor
• Training phase 1: unsupervised stacked regularized RBMs learn the weights W1, W2, ..., WL from the input descriptors
• Training phase 2: the model is transferred to a deep Siamese network sharing W1, ..., WL and fine-tuned on matching and non-matching pairs with per-layer losses Loss1, Loss2, ..., LossL
• Testing: the trained DeepHash model hashes each image descriptor into a compact binary hash of 64-1K bits
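As an illustration of the pair-based fine-tuning phase, a standard contrastive loss in PyTorch; this is a generic formulation, not necessarily the per-layer losses Loss1, ..., LossL used in DeepHash.

import torch
import torch.nn.functional as F

def contrastive_loss(h1, h2, is_match, margin=1.0):
    """Pull matching pairs together, push non-matching pairs apart
    beyond a margin. h1 and h2 are the hash-layer activations of the
    two Siamese branches, which share the transferred weights."""
    d = F.pairwise_distance(h1, h2)
    loss_match = is_match * d.pow(2)
    loss_non_match = (1 - is_match) * F.relu(margin - d).pow(2)
    return (loss_match + loss_non_match).mean()

# Toy batch of 16 pairs with binary match labels.
h1 = torch.randn(16, 64, requires_grad=True)
h2 = torch.randn(16, 64)
labels = torch.randint(0, 2, (16,)).float()
contrastive_loss(h1, h2, labels).backward()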

Hashing Outline
1. Dimensionality reduction with stacked RBM
2. Semi-supervised fine-tuning with Siamese networks
3. Unsupervised fine-tuning with triplet networks

RBM

• Bipartite graph model with input units $x$ and latent units $z$, connected by weights $w$
• Closed-form expression from one to the other:
  $P(z_j = 1 \mid x) = \mathrm{sigmoid}\big(\sum_i w_{ij} x_i\big)$   (generate latent)
  $P(x_i = 1 \mid z) = \mathrm{sigmoid}\big(\sum_j w_{ij} z_j\big)$   (reconstruct input)
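A minimal numpy sketch of the two conditionals; bias terms, present in a full RBM, are omitted here as they are on the slide.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def latent_given_input(x, W):
    """P(z_j = 1 | x) = sigmoid(sum_i w_ij x_i): generate the latent units."""
    return sigmoid(x @ W)

def input_given_latent(z, W):
    """P(x_i = 1 | z) = sigmoid(sum_j w_ij z_j): reconstruct the input units."""
    return sigmoid(z @ W.T)

W = 0.01 * np.random.randn(8192, 1024)      # input units x latent units
x = np.random.binomial(1, 0.5, size=8192)   # a binary input vector
p_z = latent_given_input(x, W)              # generate latent
p_x = input_given_latent(p_z, W)            # reconstruct input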

Contrastive Divergence [Hinton, 2002]

• From the original training data $x$ in the input layer, sample the latent representation $z$ via $P(z \mid x)$: $\mathrm{Positive}_{ij} = x_i z_j$
• Reconstruct the data $x'$ via $P(x \mid z)$, then compute its latent representation $z'$ via $P(z \mid x')$: $\mathrm{Negative}_{ij} = x'_i z'_j$
• Weight update: $\Delta W = \epsilon\,(\mathrm{Positive} - \mathrm{Negative})$
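A minimal numpy sketch of one CD-1 weight update following the quantities above; biases and mini-batching are omitted, and the input is assumed to lie in [0, 1].

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_update(x, W, lr=0.01, rng=np.random):
    """One CD-1 update: Positive_ij = x_i z_j on the training data,
    Negative_ij = x'_i z'_j on the reconstruction,
    Delta W = lr * (Positive - Negative)."""
    p_z = sigmoid(x @ W)                              # P(z | x)
    z = (rng.random_sample(p_z.shape) < p_z) * 1.0    # sample latent representation z
    x_rec = sigmoid(z @ W.T)                          # P(x | z): reconstructed data x'
    p_z_rec = sigmoid(x_rec @ W)                      # P(z | x'): latent representation z'
    positive = np.outer(x, p_z)
    negative = np.outer(x_rec, p_z_rec)
    return W + lr * (positive - negative)

W = 0.01 * np.random.randn(8192, 1024)
x = np.random.binomial(1, 0.5, size=8192).astype(float)
W = cd1_update(x, W)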

Greedily Training Stacked RBM [Hinton, 2006; Bengio, 2006]

Stacked RBM: 8192 units → 1024 units (RBM 1) → 64 units (RBM 2)
• Step 1: train RBM1
• Step 2: freeze RBM1 weights; train RBM2 using RBM1's latent layer as its input layer
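A sketch of the greedy procedure with the slide's 8192 → 1024 → 64 layer sizes; train_rbm is a hypothetical stand-in for a CD-based single-RBM trainer (its training loop is elided).

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_rbm(data, n_latent):
    """Hypothetical single-RBM trainer, e.g. repeated CD-1 updates;
    only the weight initialization is shown here."""
    W = 0.01 * np.random.randn(data.shape[1], n_latent)
    # ... contrastive divergence epochs would go here ...
    return W

descriptors = np.random.rand(1000, 8192)    # stand-in training descriptors
W1 = train_rbm(descriptors, 1024)           # Step 1: train RBM1
hidden1 = sigmoid(descriptors @ W1)         # freeze RBM1 weights, map data to its latent layer
W2 = train_rbm(hidden1, 64)                 # Step 2: train RBM2 on RBM1's latent layer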

Regularization [Hinton, 2009]

• It is often desirable that the latent variables follow certain distributions
• For example, sparse distributions work well for classification
• Hinton proposes a regularization that encourages each unit activation q towards a target value p

Robust CNN Descriptors with i-theory

Group-invariant representation of CNN features $f_i$ over a transformation group $G = \{g_0, \ldots, g_{m-1}\}$:
$X_{G,i,n}(x) = \Big(\frac{1}{m}\sum_{j=0}^{m-1} f_i(g_j \cdot x)^n\Big)^{1/n}$

Nested Invariance Pooling (NIP)

Pooling is nested over successive transformation groups:
• Translation invariance
• Rotation invariance
• Scale invariance
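A minimal numpy sketch of nested pooling with the formula above, assuming CNN feature maps are recomputed for a small grid of scales and rotations of the input (the spatial axes stand in for translations); the pairing of groups with moments n here is illustrative only.

import numpy as np

def moment_pool(features, axis, n):
    """i-theory pooling over one transformation group:
    X_{G,i,n} = (mean over the group of f_i(g.x)^n)^(1/n).
    n = 1 is average pooling; n -> infinity approaches max pooling."""
    if np.isinf(n):
        return features.max(axis=axis)
    return (features ** n).mean(axis=axis) ** (1.0 / n)

# Feature maps for 3 scales and 8 rotations of the input image:
# shape (scales, rotations, height, width, channels).
feats = np.random.rand(3, 8, 7, 7, 512)
x = moment_pool(feats, axis=(2, 3), n=2)   # pool over translations
x = moment_pool(x, axis=1, n=np.inf)       # then over rotations
x = moment_pool(x, axis=0, n=1)            # then over scales
print(x.shape)                             # (512,) invariant descriptor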

Distances for Three Matching Pairs

Figure: descriptor distances for three matching image pairs, panels (a), (b) and (c).

NIP Evaluation


NIP + RBMH

• NIP transformation invariant descriptors can be hashed into compact binary codes using RBMH

Pipeline:
1. CNN feature extraction from the input image
2. Nested Invariance Pooling over group transformations ($G_S$, $n = 1$; $G_T$, $n = 2$; $G_R$, $n \to \infty$), giving a 512-dim. invariant descriptor
3. RBM for Hashing (RBMH) with a batch regularizer encouraging activations towards 0.5, giving a compact hash of 32-256 bits
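A minimal sketch of step 3, assuming a trained RBM weight matrix W with biases omitted; thresholding the latent sigmoid activations at 0.5 is consistent with the 0.5 activation targets shown in the diagram.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def rbmh_hash(invariant_desc, W, threshold=0.5):
    """Hash a NIP invariant descriptor into a compact binary code by
    thresholding the RBM latent activations."""
    return (sigmoid(invariant_desc @ W) > threshold).astype(np.uint8)

W = 0.01 * np.random.randn(512, 64)    # e.g. a 64-bit hash from a 512-dim. descriptor
desc = np.random.rand(512)
print(rbmh_hash(desc, W))              # array of 0/1 bits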

NIP + RBMH - Evaluation


Conclusion
• Thorough comparison study of FV and CNN for image instance retrieval
• Hashing CNN descriptors
  • Unsupervised dimensionality reduction
  • Descriptor fine-tuning with unsupervised and semi-supervised metric learning
• Robust CNN descriptors with i-theory

Acknowledgements

Antoine VEILLARD, Daniel RACOCEANU, Vijay CHANDRASEKHAR, Hanlin GOH, Jie Lin