Euclidean representations of a set of hierarchies using Multiple Factor Analysis
Cadoret M.*, Lê S.* and Pagès J.*
* Applied mathematics department, Agrocampus Ouest, France
9 February 2011
Correspondence Analysis and Related Methods 2011
Laboratoire de Mathématiques Appliquées Agrocampus
Introduction
Data coding
Statistical analysis
Application
Conclusion
References

Outline
1 Introduction
2 Data coding
3 Statistical analysis
4 Application
5 Conclusion

Introduction
Interested in:
- Set of non-indexed hierarchies
- Synthetic graphical representations
At least 2 possible graphical representations:
- As a consensus hierarchy (Adams, 1972)
  - Same form as the data
  - Consensus difficult to obtain when the number of hierarchies increases
- As a Euclidean representation of the hierarchies: representation of the terminal nodes, etc.

Data coding (1)
[Figure: a hierarchy on the 16 objects A to P, cut at three levels L1, L2 and L3]
Each level is coded by the group (G1, G2, ...) to which each object belongs:
Level A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P
L1    G1 G1 G1 G1 G1 G1 G1 G2 G2 G2 G2 G2 G2 G2 G2 G2
L2    G1 G1 G2 G2 G2 G2 G2 G3 G3 G3 G3 G3 G4 G4 G4 G4
L3    G1 G1 G2 G2 G3 G3 G3 G4 G4 G4 G4 G4 G5 G5 G5 G5
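
The coding of a hierarchy into one qualitative variable per level can be sketched in Python. This is a minimal illustration, assuming each level is supplied as a list of groups of object labels; the `levels` structure and the `code_hierarchy` helper are hypothetical names introduced here, not part of the slides or of SensoMineR.

```python
def code_hierarchy(objects, levels):
    """Return, for each level, the group label (G1, G2, ...) of every object."""
    table = {}
    for name, groups in levels.items():
        # Map every object to the label of the group that contains it.
        label_of = {obj: f"G{g}"
                    for g, group in enumerate(groups, start=1)
                    for obj in group}
        table[name] = [label_of[obj] for obj in objects]
    return table


objects = list("ABCDEFGHIJKLMNOP")
# The three nested partitions read off the example hierarchy.
levels = {
    "L1": ["ABCDEFG", "HIJKLMNOP"],
    "L2": ["AB", "CDEFG", "HIJKL", "MNOP"],
    "L3": ["AB", "CD", "EFG", "HIJKL", "MNOP"],
}
coded = code_hierarchy(objects, levels)
for name, row in coded.items():
    print(name, " ".join(row))
```

Each printed row is one qualitative variable, with one modality per group of the corresponding level.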
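
Data coding (2)
The data table has the I objects as rows (1, ..., I) and one group of columns per hierarchy, each column being one level of that hierarchy:
Hierarchy 1: L1 L2 L3
...
Hierarchy j: L1 L2
...
Hierarchy J: L1 L2 L3 L4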

Check on data coding and analysis of 1 hierarchy
- Data table with qualitative variables
- Multiple Correspondence Analysis + agglomerative hierarchical clustering on the dimensions
[Figure: the initial hierarchy on the 16 objects A to P and the dendrogram obtained from the MCA dimensions, with a height axis from 0.0 to 1.0]
⇒ We recover the initial hierarchy

Objectives
From a data table with a group structure on the variables, we want to perform a global factorial analysis such that:
- it provides graphical representations of objects, hierarchies and levels of hierarchy
- the influence of each hierarchy is balanced
⇒ Multiple Factor Analysis (MFA; Escofier and Pagès, 1982), in which 1 hierarchy corresponds to 1 group of variables

Multiple Correspondence Analysis (MCA)
MCA looks for dimensions $z_s$ that maximize:
\[ \frac{1}{Q} \sum_{q=1}^{Q} \eta^2(z_s, L_q), \]
with:
- $Q$ the number of qualitative variables
- $z_s$ the axis $s$
- $L_q$ the qualitative variable $q$
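
To make the criterion concrete, here is a minimal Python sketch that evaluates it for a given candidate axis. It assumes $\eta^2$ is the usual correlation ratio (between-group variance over total variance); the function names and the toy data are illustrative. The sketch only evaluates the criterion and does not perform the maximization that MCA carries out.

```python
def eta_squared(z, labels):
    """Correlation ratio: between-group sum of squares of z over its total sum of squares."""
    n = len(z)
    mean = sum(z) / n
    total = sum((v - mean) ** 2 for v in z)
    groups = {}
    for v, g in zip(z, labels):
        groups.setdefault(g, []).append(v)
    between = sum(len(vals) * (sum(vals) / len(vals) - mean) ** 2
                  for vals in groups.values())
    return between / total


def mca_criterion(z, variables):
    """Average of eta^2(z, L_q) over the Q qualitative variables."""
    return sum(eta_squared(z, labels) for labels in variables) / len(variables)


# Toy data: 4 objects, 2 qualitative variables (hypothetical values).
z = [-1.0, -1.0, 1.0, 1.0]
variables = [["a", "a", "b", "b"],   # perfectly separated by z
             ["a", "b", "a", "b"]]   # unrelated to z
score = mca_criterion(z, variables)
print(score)
```

On this toy axis the first variable has a correlation ratio of 1 and the second of 0, so the criterion is their average.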

Multiple Factor Analysis (MFA)
MFA looks for dimensions $z_s$ that maximize the following criterion:
\[ \sum_{j=1}^{J} \frac{1}{Q_j} \sum_{q=1}^{Q_j} \eta^2(z_s, L_q), \]
with:
- $Q_j$ the number of levels of hierarchy $j$
- $z_s$ the axis $s$
- $L_q$ the level $q$ of hierarchy $j$
⇒ In this particular case: criterion maximized by MFA ⇔ sum of the criteria maximized by the MCA of each hierarchy
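
The equivalence stated here can be made explicit. The short derivation below assumes the usual MFA weighting, in which each group of variables is divided by the first eigenvalue $\lambda_1^j$ of its separate analysis (standard for MFA, but not spelled out on the slide), and assumes that the coarsest level of each hierarchy has at least two groups. The symbols $C_j$, $G$ and $\mathbb{1}_G$ are notation introduced for this sketch.

```latex
% MCA criterion of hierarchy j taken alone
\[ C_j(z) = \frac{1}{Q_j} \sum_{q=1}^{Q_j} \eta^2(z, L_q) \]

% Let z be the indicator of one group G of the coarsest level.
% The levels are nested, so z is constant inside every group of every level:
\[ \eta^2(\mathbb{1}_G, L_q) = 1 \ \text{for all } q
   \quad\Longrightarrow\quad
   \lambda_1^j = \max_z C_j(z) = 1 \]

% The MFA weights 1/\lambda_1^j are therefore all equal to 1:
\[ \sum_{j=1}^{J} \frac{1}{\lambda_1^j}\, C_j(z_s)
   = \sum_{j=1}^{J} \frac{1}{Q_j} \sum_{q=1}^{Q_j} \eta^2(z_s, L_q) \]
```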
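
Disjunctive data table associated with one hierarchy j
[Figure: disjunctive (0/1) table with one row per object i = 1, ..., I and one block of columns per level L_1, ..., L_q, ..., L_{Q_j}; the columns k = 1, ..., K_j are the groups; the cell y_ik equals 1 if object i belongs to group k and 0 otherwise (example row: 000001000); the column totals are I_1, ..., I_k, ..., I_{K_j}]
Each level (associated with a hierarchy) is represented by a set of dummy variables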
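
A minimal Python sketch of how such a disjunctive table could be built for one hierarchy, assuming each level is given as the list of group labels of the objects. The `disjunctive_table` helper and the toy labels are hypothetical and only serve as illustration.

```python
def disjunctive_table(level_labels):
    """Build the 0/1 dummy columns of one hierarchy.

    level_labels: one list of group labels per level, each of length I.
    Returns (columns, rows): columns are (level number, group) pairs,
    rows hold one 0/1 list per object.
    """
    columns = []
    for q, labels in enumerate(level_labels, start=1):
        for group in sorted(set(labels)):
            columns.append((q, group))
    n_objects = len(level_labels[0])
    rows = [[1 if level_labels[q - 1][i] == group else 0
             for q, group in columns]
            for i in range(n_objects)]
    return columns, rows


# Toy hierarchy on 4 objects with 2 nested levels (hypothetical data).
level_labels = [["G1", "G1", "G2", "G2"],
                ["G1", "G2", "G3", "G3"]]
columns, rows = disjunctive_table(level_labels)
for row in rows:
    print("".join(str(v) for v in row))
```

Every row contains exactly one 1 per level, so each row sums to the number of levels $Q_j$.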

Object representation
Distance between 2 objects:
\[ d^2(i, l) = \sum_{j} \frac{1}{Q_j} \sum_{k \in K_j} \frac{I}{I_k} (y_{ik} - y_{lk})^2 = \sum_{j} d^2_{\mathrm{MCA},j}(i, l), \]
with:
- $Q_j$ the number of levels of hierarchy $j$
- $I$ the number of objects
- $I_k$ the number of objects in group $k$
- $y_{ik}$ the element of the disjunctive data table, equal to 1 if object $i$ belongs to group $k$ and 0 otherwise
- $d^2_{\mathrm{MCA},j}$ the usual MCA distance computed on hierarchy $j$
In this particular case, the distance is the sum of the usual MCA distances
⇒ 2 objects are all the closer as they belong to the same group in many hierarchies
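
The distance formula can be evaluated directly from the group labels, without building the disjunctive table. The sketch below is an illustration under the stated formula; the data layout (a list of hierarchies, each a list of levels, each level a list of group labels) and the function name are assumptions made for the example.

```python
def squared_distance(i, l, hierarchies):
    """d^2(i, l) = sum_j (1/Q_j) sum_k (I/I_k) (y_ik - y_lk)^2."""
    total = 0.0
    for levels in hierarchies:
        q_j = len(levels)
        d_j = 0.0
        for labels in levels:
            n = len(labels)
            if labels[i] != labels[l]:
                # y_ik - y_lk is non-zero only for the two groups
                # that contain i and l respectively.
                d_j += n / labels.count(labels[i]) + n / labels.count(labels[l])
        total += d_j / q_j
    return total


# Toy example: 4 objects, 2 hierarchies (hypothetical data).
hierarchies = [
    [["a", "a", "b", "b"]],                        # hierarchy 1: 1 level
    [["a", "a", "b", "b"], ["a", "b", "c", "c"]],  # hierarchy 2: 2 nested levels
]
print(squared_distance(2, 3, hierarchies))  # always grouped together
print(squared_distance(0, 1, hierarchies))  # separated at one level only
print(squared_distance(0, 2, hierarchies))  # separated at every level
```

Objects that are never separated are at distance zero, and the distance grows with the number of levels and hierarchies that separate them, with larger contributions from small groups.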

Global hierarchy representation
Coordinate of hierarchy $j$ on axis $s$:
\[ \frac{1}{Q_j} \sum_{q=1}^{Q_j} \eta^2(z_s, L_q), \]
with:
- $Q_j$ the number of levels of hierarchy $j$
- $z_s$ the axis $s$
- $L_q$ the level $q$ of hierarchy $j$
[Figure: hierarchies H1, H2 and H3 plotted in the unit square, with axes Lg(z1, H) and Lg(z2, H)]

Level representation
Coordinate of level $q$ on axis $s$:
\[ \eta^2(z_s, L_q), \]
with:
- $L_q$ the level $q$
- $z_s$ the axis $s$
2 consequences:
- Levels ordered along each axis
- Hierarchy = barycenter of its levels
[Figure: levels H3L1, H3L2, H3L3 and hierarchy H3 plotted in the unit square]
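
Both consequences follow from the definitions. The short justification below assumes, as in the data coding, that the levels of a hierarchy are nested (level $q+1$ refines level $q$) and that $z_s$ is centred. Here $k$ indexes the groups of level $q$ and $k'$ those of level $q+1$, while $\bar z_k$ and $I_k$ denote the mean of $z_s$ and the number of objects in group $k$; the notation for the means is introduced for this sketch.

```latex
% Between-group sum of squares of the finer level, decomposed over the coarser groups
\[ \sum_{k'} I_{k'} \bar z_{k'}^{\,2}
   = \sum_{k} I_k \bar z_k^{\,2}
   + \sum_{k} \sum_{k' \subset k} I_{k'} (\bar z_{k'} - \bar z_k)^2
   \ \ge\ \sum_{k} I_k \bar z_k^{\,2} \]

% Dividing both sides by the total sum of squares of z_s: levels are ordered
\[ \eta^2(z_s, L_{q+1}) \ \ge\ \eta^2(z_s, L_q) \]

% The coordinate of hierarchy j is the mean of the coordinates of its levels
\[ \frac{1}{Q_j} \sum_{q=1}^{Q_j} \eta^2(z_s, L_q) \]
```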

Data
- 16 advertisements for an orange juice
- Advertisements built according to a $2^{5-1}$ fractional factorial design
- 22 subjects
- Hierarchical sorting
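
For readers unfamiliar with the notation, a $2^{5-1}$ design crosses 5 two-level factors in 16 runs instead of 32. The sketch below generates one such half-fraction in Python. The generator used (fifth factor equal to the product of the other four) is an assumption made for illustration: the slides do not state which factors or which defining relation were used for the advertisements.

```python
from itertools import product


def half_fraction(n_factors=5):
    """2^(n-1) design: full factorial on n-1 factors, last factor = product of the others."""
    runs = []
    for base in product((-1, 1), repeat=n_factors - 1):
        last = 1
        for level in base:
            last *= level
        # Assumed generator: the last factor is aliased with the
        # interaction of all the other factors.
        runs.append(base + (last,))
    return runs


design = half_fraction()
print(len(design))  # one run per advertisement
for run in design[:4]:
    print(run)
```

Each of the 16 runs would correspond to one advertisement, and every factor appears at each of its two levels in exactly 8 runs.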

Example of hierarchical sorting: subject number 3
[Figure: hierarchical sorting of the 16 advertisements A to P by subject 3]

Example of hierarchical sorting: subject number 5
[Figure: hierarchical sorting of the 16 advertisements A to P by subject 5]

Advertisement representation
[Figure: map of the 16 advertisements A to P on Dim 1 (15.62 %) and Dim 2 (14.18 %); λ1 = 16.55; annotations: "Background color" and "Figurative"]

Hierarchy representation
[Figure: the 22 hierarchies (one per subject, numbered 1 to 22) plotted in the unit square; axes Dim 1 (15.62 %) and Dim 2 (14.18 %)]

Hierarchy representation: subject number 3
[Figure, two panels: the 22 hierarchies in the unit square, and the advertisement map with the levels L1, L2 and L3 of subject 3; axes Dim 1 (15.62 %) and Dim 2 (14.18 %)]

Level representation: subject number 3
[Figure: hierarchy 3 and its levels 3.L1, 3.L2 and 3.L3 plotted in the unit square; axes Dim 1 (15.62 %) and Dim 2 (14.18 %)]

Level representation: trajectories
[Figure: levels of the 22 hierarchies, labelled subject.level (for example 7.L3), plotted in the unit square; axes Dim 1 (15.62 %) and Dim 2 (14.18 %); three groups of trajectories annotated 18 %, 63 % and 18 %]

Conclusion
Methodology providing:
- Representation of objects, hierarchies, levels
- Representations related to each other
- Representations interpretable according to simple rules
In the example, the method suggests groups of hierarchies
Allows hierarchies and partitions to be taken into account simultaneously in the same analysis
Program available in the SensoMineR package

References
Adams, E. N., III (1972). Consensus techniques and the comparison of taxonomic trees. Systematic Zoology, 21:390–397.
Escofier, B. and Pagès, J. (1982). Comparaison de groupes de variables définies sur le même ensemble d'individus. Rapport de recherche INRIA, 149.