Euclidean representations of a set of hierarchies ... - Marine Cadoret

Euclidean representations of a set of hierarchies using Multiple Factor Analysis
Cadoret M.*, Lê S.* and Pagès J.*
* Applied mathematics department, Agrocampus Ouest, France
9 February 2011

Correspondence Analysis and Related Methods 2011

Laboratoire de Mathématiques Appliquées Agrocampus

Introduction

Data coding

Statistical analysis

Application

Conclusion

References

Outline

1. Introduction
2. Data coding
3. Statistical analysis
4. Application
5. Conclusion


Introduction

Interested in:
- Set of non-indexed hierarchies
- Synthetic graphical representations

At least 2 possible graphical representations:
- As a consensus hierarchy (Adams, 1972)
  - Same form as the data
  - Consensus difficult to obtain when the number of hierarchies increases
- As a Euclidean representation of the hierarchies: representation of the terminal nodes, etc.



Data coding (1)

[Figure: a hierarchy on the 16 objects A to P, cut at three nested levels: L1 (2 groups), L2 (4 groups) and L3 (5 groups)]

Each level is coded as a qualitative variable giving the group of each object:

Level  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P
L1     G1 G1 G1 G1 G1 G1 G1 G2 G2 G2 G2 G2 G2 G2 G2 G2
L2     G1 G1 G2 G2 G2 G2 G2 G3 G3 G3 G3 G3 G4 G4 G4 G4
L3     G1 G1 G2 G2 G3 G3 G3 G4 G4 G4 G4 G4 G5 G5 G5 G5
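The coding above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `LEVELS`, `code_level`, `code_hierarchy` and `is_nested` are hypothetical helpers, not part of the authors' program, and the groups are typed in by hand from the table.

```python
# Objects of the toy example (16 objects, A to P).
OBJECTS = list("ABCDEFGHIJKLMNOP")

# Each level is a partition of the objects, written as a list of groups.
# Levels go from the coarsest (L1) to the finest (L3).
LEVELS = {
    "L1": ["ABCDEFG", "HIJKLMNOP"],
    "L2": ["AB", "CDEFG", "HIJKL", "MNOP"],
    "L3": ["AB", "CD", "EFG", "HIJKL", "MNOP"],
}


def code_level(groups, objects):
    """Return the list of group labels ('G1', 'G2', ...) in object order."""
    label = {}
    for k, group in enumerate(groups, start=1):
        for obj in group:
            label[obj] = f"G{k}"
    missing = [o for o in objects if o not in label]
    if missing:
        raise ValueError(f"objects without a group: {missing}")
    return [label[o] for o in objects]


def code_hierarchy(levels, objects):
    """One qualitative variable (list of labels) per level."""
    return {name: code_level(groups, objects) for name, groups in levels.items()}


def is_nested(coarse, fine):
    """True if objects sharing a group in `fine` also share one in `coarse`."""
    parent = {}
    for c, f in zip(coarse, fine):
        if parent.setdefault(f, c) != c:
            return False
    return True


table = code_hierarchy(LEVELS, OBJECTS)
for name, labels in table.items():
    print(name, " ".join(labels))
```

The `is_nested` check is a convenient guard: a set of levels only describes a hierarchy if each level refines the previous one.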


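Data coding (2)

[Figure: global data table with I rows (objects 1, ..., I) and one group of columns per hierarchy; for example, hierarchy 1 contributes the levels L1, L2, L3, hierarchy j the levels L1, L2, and hierarchy J the levels L1, L2, L3, L4]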



Check on data coding and analysis of 1 hierarchy

- Data table with qualitative variables
- Multiple Correspondence Analysis + agglomerative hierarchical clustering on the dimensions

[Figure: the initial hierarchy on the objects A to P, next to the dendrogram obtained from the analysis (height axis from 0.0 to 1.0)]

⇒ We recover the initial hierarchy


Objectives

From a data table with a group structure on the variables, we want to perform a global factorial analysis such that:
- it provides graphical representations of objects, hierarchies and levels of hierarchy
- the influence of each hierarchy is balanced

⇒ Multiple Factor Analysis (MFA; Escofier and Pagès, 1982), in which 1 hierarchy corresponds to 1 group of variables


Multiple Correspondence Analysis (MCA)

MCA is looking for dimensions $z_s$ that maximize:

$$\frac{1}{Q} \sum_{q=1}^{Q} \eta^2(z_s, L_q),$$

with:
- $Q$ the number of qualitative variables
- $z_s$ the axis $s$
- $L_q$ the qualitative variable $q$

[Figure: illustration of the cases $\eta^2 = 0$ and $\eta^2 = 1$]


Multiple Factor Analysis (MFA)

MFA is looking for dimensions $z_s$ that maximize the following criterion:

$$\sum_{j=1}^{J} \frac{1}{Q_j} \sum_{q=1}^{Q_j} \eta^2(z_s, L_q),$$

with:
- $Q_j$ the number of levels of hierarchy $j$
- $z_s$ the axis $s$
- $L_q$ the level $q$ of hierarchy $j$

⇒ In this particular case: criterion maximized by MFA ⇔ sum of criteria maximized by MCA
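To make the criterion concrete, here is a minimal Python sketch that evaluates it for a given candidate axis. It does not search for the optimal axis, which is what MFA itself does. The function names and the toy data are assumptions made for illustration; `eta2` is the usual correlation ratio (between-group variance over total variance).

```python
def eta2(z, labels):
    """Correlation ratio between a numeric axis z and a qualitative variable."""
    n = len(z)
    mean = sum(z) / n
    total = sum((v - mean) ** 2 for v in z)
    if total == 0:
        return 0.0
    groups = {}
    for v, g in zip(z, labels):
        groups.setdefault(g, []).append(v)
    between = sum(len(vs) * (sum(vs) / len(vs) - mean) ** 2 for vs in groups.values())
    return between / total


def mca_criterion(z, levels):
    """(1/Q) * sum of eta2 over the Q levels of one hierarchy."""
    return sum(eta2(z, labels) for labels in levels) / len(levels)


def mfa_criterion(z, hierarchies):
    """Sum over the hierarchies of their MCA criterion."""
    return sum(mca_criterion(z, levels) for levels in hierarchies)


# Toy data (hypothetical): 6 objects, 2 hierarchies.
z = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]          # candidate axis
h1 = [list("aaabbb"), list("aabccd")]          # 2 nested levels
h2 = [list("xyxyxy")]                          # a single level (a partition)
per_hierarchy = [mca_criterion(z, h) for h in (h1, h2)]
total = mfa_criterion(z, [h1, h2])
print(per_hierarchy, total)
```

In this toy case the axis separates the two coarse groups of the first hierarchy perfectly, so that hierarchy contributes its maximum, while the second one contributes little.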


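Disjunctive data table associated with one hierarchy j

[Figure: disjunctive data table with I rows (objects i = 1, ..., I) and K_j columns (groups k = 1, ..., K_j), the columns being gathered by level L_1, ..., L_q, ..., L_{Q_j}; the entry y_ik is 0 or 1, so that within one level a row reads for example 000001000; the column sums are the group sizes I_1, ..., I_k, ..., I_{K_j}]

Each level (associated with a hierarchy) is represented by a set of dummy variables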
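As a rough illustration of this table, the sketch below builds the dummy variables of one hierarchy from its coded levels. The helper name `disjunctive_table` and the 5-object example are hypothetical; groups are listed in order of first appearance within each level.

```python
def disjunctive_table(levels):
    """Build the 0/1 table of one hierarchy.

    levels: list of label lists, one per level (same object order in each).
    Returns (columns, rows) where columns are (level number, group) pairs.
    """
    columns = []
    for q, labels in enumerate(levels, start=1):
        for group in dict.fromkeys(labels):  # unique labels, first-seen order
            columns.append((q, group))
    rows = [
        [1 if levels[q - 1][i] == group else 0 for q, group in columns]
        for i in range(len(levels[0]))
    ]
    return columns, rows


# Hypothetical hierarchy with Q_j = 2 levels on I = 5 objects.
levels_j = [
    ["G1", "G1", "G1", "G2", "G2"],
    ["G1", "G1", "G2", "G3", "G3"],
]
columns, rows = disjunctive_table(levels_j)
row_sums = [sum(r) for r in rows]                                  # Q_j for every object
col_sums = [sum(r[c] for r in rows) for c in range(len(columns))]  # group sizes I_k
print(columns)
for r in rows:
    print(r)
```

Each row sums to the number of levels, since an object belongs to exactly one group per level, and each column sum is the size of the corresponding group.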


Object representation

Distance between 2 objects:

$$d^2(i, l) = \sum_{j} \frac{1}{Q_j} \sum_{k \in K_j} \frac{I}{I_k} (y_{ik} - y_{lk})^2 = \sum_{j} d^2_{\mathrm{MCA}_j}(i, l),$$

with:
- $Q_j$ the number of levels of hierarchy $j$
- $K_j$ the set of groups (dummy variables) of hierarchy $j$
- $I$ the number of objects
- $I_k$ the number of objects in group $k$
- $y_{ik}$ the element of the disjunctive data table, equal to 1 if object $i$ belongs to group $k$ and 0 otherwise

In this particular case: sum of the usual MCA distances
⇒ 2 objects are all the closer as they belong to the same group in many hierarchies
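The distance can be evaluated directly from the coded levels, without building the disjunctive table. The sketch below is an illustration only: the function name and the 4-object data are assumptions. It relies on the fact that, within one level, $(y_{ik} - y_{lk})^2$ equals 1 only for the two groups that separate $i$ and $l$.

```python
from collections import Counter


def squared_distance(i, l, hierarchies):
    """Squared distance between objects i and l.

    hierarchies: list of hierarchies, each a list of levels,
    each level a list of group labels (same object order everywhere).
    """
    d2 = 0.0
    for levels in hierarchies:
        q_j = len(levels)                      # number of levels of hierarchy j
        for labels in levels:
            n = len(labels)                    # number of objects I
            size = Counter(labels)             # group sizes I_k
            if labels[i] != labels[l]:
                # only the group of i and the group of l contribute
                d2 += (n / size[labels[i]] + n / size[labels[l]]) / q_j
    return d2


# Hypothetical data: 4 objects, 2 hierarchies.
h1 = [list("aabb"), list("aabc")]   # 2 nested levels
h2 = [list("aaab")]                 # 1 level
data = [h1, h2]
d01 = squared_distance(0, 1, data)  # always grouped together
d02 = squared_distance(0, 2, data)  # separated in h1 only
d23 = squared_distance(2, 3, data)  # separated at the finest level of h1 and in h2
print(d01, d02, d23)
```

Objects 0 and 1 are never separated, so their distance is zero. Separations involving small groups weigh more, because of the factor $I / I_k$.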


Global hierarchy representation

Coordinate of hierarchy $j$ on axis $s$:

$$\frac{1}{Q_j} \sum_{q=1}^{Q_j} \eta^2(z_s, L_q),$$

with:
- $Q_j$ the number of levels of hierarchy $j$
- $z_s$ the axis $s$
- $L_q$ the level $q$ of hierarchy $j$

[Figure: hierarchies H1, H2 and H3 plotted in the unit square, with axes Lg(z_1, H) and Lg(z_2, H) running from 0 to 1]


Level representation

Coordinate of level $q$ on axis $s$:

$$\eta^2(z_s, L_q),$$

with:
- $L_q$ the level $q$
- $z_s$ the axis $s$

[Figure: hierarchy H3 and its levels H3L1, H3L2 and H3L3 plotted in the unit square]

2 consequences:
- Levels ordered along each axis
- Hierarchy = barycenter of its levels
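The two consequences can be checked numerically on a small example. The sketch below assumes one reading of the slide: because the levels of a hierarchy are nested, a finer level can only explain more of the variance of an axis, so the coordinates increase from the coarsest to the finest level. The axis `z`, the levels and the function names are hypothetical.

```python
def eta2(z, labels):
    """Correlation ratio between a numeric axis z and a qualitative variable."""
    n = len(z)
    mean = sum(z) / n
    total = sum((v - mean) ** 2 for v in z)
    if total == 0:
        return 0.0
    groups = {}
    for v, g in zip(z, labels):
        groups.setdefault(g, []).append(v)
    between = sum(len(vs) * (sum(vs) / len(vs) - mean) ** 2 for vs in groups.values())
    return between / total


def level_coordinates(z, levels):
    """Coordinate of each level on the axis z."""
    return [eta2(z, labels) for labels in levels]


def hierarchy_coordinate(z, levels):
    """Coordinate of the hierarchy: mean (barycenter) of its level coordinates."""
    coords = level_coordinates(z, levels)
    return sum(coords) / len(coords)


# Hypothetical axis and hierarchy with 3 nested levels on 6 objects.
z = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5]
levels = [list("aaabbb"), list("aabccc"), list("abcdde")]
coords = level_coordinates(z, levels)
center = hierarchy_coordinate(z, levels)
print(coords, center)
```

Doing the same on a second axis gives the second coordinate of each level, so the levels of one hierarchy trace a trajectory in the unit square, with the hierarchy at their barycenter.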


Data

- 16 advertisements concerning an orange juice
- Advertisements built according to a $2^{5-1}$ fractional factorial design
- 22 subjects
- Hierarchical sorting


Example of hierarchical sorting: subject number 3

[Figure: hierarchical sorting of the 16 advertisements (A to P) by subject number 3]


Example of hierarchical sorting: subject number 5

[Figure: hierarchical sorting of the 16 advertisements (A to P) by subject number 5]


Advertisement representation

[Figure: map of the 16 advertisements (A to P); axes Dim 1 (15.62 %) and Dim 2 (14.18 %), both from -6 to 6; λ1 = 16.55; annotations "Background color" and "Figurative"]


Hierarchy representation

[Figure: the 22 hierarchies (subjects 1 to 22) plotted in the unit square; axes Dim 1 (15.62 %) and Dim 2 (14.18 %), both from 0.0 to 1.0]


Hierarchy representation: subject number 3

[Figure: two panels on Dim 1 (15.62 %) and Dim 2 (14.18 %): the representation of the 22 hierarchies in the unit square, and the advertisement map annotated with the levels L1, L2 and L3 of subject number 3]


Level representation: subject number 3

[Figure: hierarchy 3 and its levels 3.L1, 3.L2 and 3.L3 plotted in the unit square; axes Dim 1 (15.62%) and Dim 2 (14.18%), both from 0.0 to 1.0]


Level representation: trajectories

[Figure: levels of the 22 hierarchies, labelled subject.level (for example 7.L3), plotted in the unit square; axes Dim 1 (15.62%) and Dim 2 (14.18%), both from 0.0 to 1.0; groups of trajectories annotated with the percentages 18%, 63% and 18%]


Conclusion

Methodology providing:
- Representation of objects, hierarchies, levels
- Representations related to each other
- Representations interpretable according to simple rules

In the example, it suggests groups of hierarchies
It allows hierarchies and partitions to be taken into account simultaneously in the same analysis
Program available in the SensoMineR package


References

Adams, E. I. (1972). Consensus techniques and the comparison of taxonomic trees. Systematic Zoology, 21:390–397.
Escofier, B. and Pagès, J. (1982). Comparaison de groupes de variables définies sur le même ensemble d’individus. Rapport de recherche INRIA, 149.