SPATIAL RELATIONSHIPS IN VISUAL GRAPH MODELING FOR IMAGE CATEGORIZATION

Trong-Ton Pham (1,2), Philippe Mulhem (2), Loic Maisonnasse (3)
[email protected], [email protected], [email protected]

(1) Grenoble INP - (2) Laboratoire Informatique de Grenoble - (3) TecKnowMetrix

Acknowledgments: AVEIR (ANR France), Merlion Ph.D (Singapore), ACM SIGIR travel grant
Introduction

Goal: to provide a still-image representation that helps bridge the semantic gap and allows fast retrieval. Prior approaches build on the bag-of-words model [Blei and Jordan 2003] [Barnard et al. 2003] [Fei-Fei and Fergus 2007].

Our proposal:
• a graph-based representation of image content
• a fast graph matching inspired by the Language Model of IR
Applications: robot localization, scene identification

Visual Graph Modeling

Processing pipeline: Feature Extraction → Concept Construction → Graph Modeling → Graph Matching

The visual graph G^i for an image i is defined by G^i = <S_WC, S_WE>, where:
• S_WC is the set of weighted visual concept sets, each WC^i = {(c, #(c, i)) | c ∈ C}
• S_WE is the set of weighted relation sets, each WE^i = {((c, c'), l, #(c, c', l, i)) | (c, c') ∈ C × C', l ∈ L}

Here #(c, i) counts the occurrences of concept c in image i, and #(c, c', l, i) counts the pairs (c, c') linked by the labelled relation l ∈ L (e.g. left_of, top_of, inside).
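As a concrete sketch, the weighted sets WC and WE above can be built from a grid of patch concepts with the left_of and top_of relations. The grid input, concept labels, and function name below are illustrative assumptions, not the authors' implementation:

```python
from collections import Counter

def build_visual_graph(concept_grid):
    """Build the weighted concept set WC and weighted relation set WE
    for one image (G^i = <S_WC, S_WE>).

    `concept_grid` is a 2-D list of visual concept labels, one per image
    patch (the quantization step producing it is not shown here).
    """
    # WC: each concept c weighted by its occurrence count #(c, i)
    wc = Counter(c for row in concept_grid for c in row)

    # WE: each (c, c', label) triple weighted by #(c, c', l, i),
    # using the two intra-relations of the poster: left_of and top_of
    we = Counter()
    rows, cols = len(concept_grid), len(concept_grid[0])
    for r in range(rows):
        for k in range(cols):
            c = concept_grid[r][k]
            if k + 1 < cols:   # c is left_of its right neighbour
                we[(c, concept_grid[r][k + 1], "left_of")] += 1
            if r + 1 < rows:   # c is top_of the patch below it
                we[(c, concept_grid[r + 1][k], "top_of")] += 1
    return wc, we

# Toy 2x2 image whose patches were quantized into concepts c1..c3
wc, we = build_visual_graph([["c1", "c2"],
                             ["c3", "c1"]])
```

Each image thus yields one bag of weighted concepts plus one bag of weighted labelled pairs, which is all the matching step needs.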
Language Model for Graph Matching

Given a query graph G^q = <S_WC^q, S_WE^q>, the probability that a trained graph G^d generates it is computed as an extension of standard Language Modeling [Song and Croft, 1999]:

P(G^q | G^d) = P(S_WC^q | G^d) × P(S_WE^q | S_WC^q, G^d)

Concepts
• Concept sets independence hypothesis:
  P(S_WC^q | G^d) = ∏_{WC^q ∈ S_WC^q} P(WC^q | G^d)
• Multinomial distribution model:
  P(WC^q | G^d) ∝ ∏_{c ∈ C} P(c | G^d)^#(c,q)
• Jelinek-Mercer smoothing:
  P(c | G^d) = (1 − λ_c) · #(c,d)/#(*,d) + λ_c · #(c,D)/#(*,D)

Relations
• Relation sets independence hypothesis:
  P(S_WE^q | S_WC^q, G^d) = ∏_{WE^q ∈ S_WE^q} P(WE^q | S_WC^q, G^d)
• Multinomial distribution model:
  P(WE^q | S_WC^q, G^d) ∝ ∏_{(c,c',l) ∈ C×C'×L} P(L(c,c') = l | WC^q, WC'^q, G^d)^#(c,c',l,q)
• Jelinek-Mercer smoothing:
  P(L(c,c') = l | WC^q, WC'^q, G^d) = (1 − λ_l) · #(c,c',l,d)/#(c,c',*,d) + λ_l · #(c,c',l,D)/#(c,c',*,D)

Here d denotes the trained graph, D the whole collection, and λ_c, λ_l the smoothing parameters.
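A minimal sketch of this matching computation, in log space for numerical stability. Function names and the default λ = 0.5 are assumptions (the poster does not fix the smoothing values), and the counts are passed as plain dicts:

```python
import math

def concept_log_prob(query_wc, doc_wc, coll_wc, lam=0.5):
    """log P(WC^q | G^d): multinomial over query concept counts #(c, q)
    with Jelinek-Mercer smoothing against the collection D."""
    doc_total = sum(doc_wc.values())    # #(*, d)
    coll_total = sum(coll_wc.values())  # #(*, D)
    logp = 0.0
    for c, n in query_wc.items():       # n = #(c, q)
        p = ((1 - lam) * doc_wc.get(c, 0) / doc_total
             + lam * coll_wc.get(c, 0) / coll_total)
        logp += n * math.log(p)
    return logp

def relation_log_prob(query_we, doc_we, coll_we, lam=0.5):
    """log P(WE^q | S_WC^q, G^d): same scheme over labelled pairs; the
    normalizer #(c, c', *, .) sums the counts over all labels l for the
    pair (c, c').  Assumes every query pair occurs in the collection."""
    def pair_total(we, c, cp):
        return sum(n for (a, b, _l), n in we.items() if (a, b) == (c, cp))
    logp = 0.0
    for (c, cp, l), n in query_we.items():   # n = #(c, c', l, q)
        pd, pD = pair_total(doc_we, c, cp), pair_total(coll_we, c, cp)
        p_doc = doc_we.get((c, cp, l), 0) / pd if pd else 0.0
        p = (1 - lam) * p_doc + lam * coll_we.get((c, cp, l), 0) / pD
        logp += n * math.log(p)
    return logp

def graph_log_prob(q_wc, q_we, d_wc, d_we, D_wc, D_we):
    """log P(G^q | G^d) = log P(S_WC^q | G^d) + log P(S_WE^q | S_WC^q, G^d)."""
    return (concept_log_prob(q_wc, d_wc, D_wc)
            + relation_log_prob(q_we, d_we, D_we))
```

Trained graphs are then ranked by graph_log_prob; maximizing the log probability is equivalent to maximizing P(G^q | G^d).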
Case 1: Robot Localization

Objective: localize a mobile robot within a known environment using visual information.
Collection: RobotVision for ImageCLEF'09
• 3 image sets: training set of 1034 images, validation set of 909 images and test set of 1690 images
• 5 rooms, plus an unknown room in the test set
• Training set: night condition; validation set: sunny condition (after 6 months); test set: unknown condition (after 20 months)
Proposed models:
• Graph without relations: LM = <{WCpatch, WCsift}, Φ>
• Graph with inter-relation set: VGM = <{WCpatch, WCsift}, {WEinside}>

Case 2: Scene Identification

Objective: a mobile image search platform to enhance the tourist experience (Snap2Tell).
Collection: Singapore Tourist Object Identification Collection (STOIC)
• Training set of 3189 images, test set of 660 images
• 101 popular Singapore landmarks (101 classes)
Proposed models:
• Graph without relations: LM = <{WCpatch}, Φ>
• Graph with intra-relation sets: VGM = <{WCpatch}, {WEleft_of, WEtop_of}>
[Figure: example visual graph extraction — patch concepts (WCpatch, c1..c4 with occurrence counts) linked by weighted left_of / top_of intra-relations, and SIFT concepts (WCsift, s1..si) linked to patch concepts by the WEinside inter-relation]
Result & Discussion

Collection               #class   LM      VGM              SVM
RobotVision validation        5   0.579   0.675 (+16.6%)   0.535
RobotVision test              6   0.416   0.449 (+7.9%)    0.439
STOIC-101                   101   0.789   0.809 (+2.5%)    0.744

Table: categorization results on the STOIC-101 and RobotVision image collections.
These results show:
• the stability of the visual graph induction process across different types of visual concepts
• the benefit of using spatial relationships among different visual concepts
• good matching performance of visual graphs (~5 graphs/sec)

Future work:
• adding more visual concepts and integrating new types of relations
• completing the general graph theory and framework for image search