Hierarchical clustering

François Husson
Applied Mathematics Department, Agrocampus Rennes

[email protected]


Hierarchical clustering

1 Introduction
2 Principles of hierarchical clustering
3 Example
4 K-means: a partitioning algorithm
5 Extras
  • Making more robust partitions
  • Clustering in high dimensions
  • Qualitative variables and clustering
  • Combining factor analysis and clustering
6 Describing classes of individuals



Introduction

• Definitions:
  • Clustering: making or building classes
  • Class: a set of individuals (or objects) sharing similar characteristics
• Examples
  • of clustering: the animal kingdom, a computer hard disk, the geographic division of France, etc.
  • of classes: social classes, political classes, etc.
• Two types of clustering:
  • hierarchical: builds a tree
  • partitioning methods


Hierarchical example: the animal kingdom

[Figure: a hierarchical tree of the animal kingdom]



What data? What goals?

Clustering applies to data tables: rows of individuals, columns of quantitative variables.

Goals: build a tree structure that:
• shows the hierarchical links between individuals or groups of individuals
• detects a "natural" number of classes in the population

[Figure: a small dendrogram on individuals A to H, with merge heights from 0 to 4]


Criteria

Measuring the similarity of individuals:
• Euclidean distance
• similarity indices
• etc.

Measuring the similarity between groups of individuals:
• single linkage, or minimum jump (smallest distance between the groups)
• complete linkage (largest distance between the groups)
• Ward criterion
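As a minimal illustration (a sketch, assuming a table `X` of individuals described by quantitative variables), base R's `hclust` implements these linkage criteria directly:

```r
## Sketch: the three linkage criteria on the same distance matrix
d <- dist(X)                                    # Euclidean distances between individuals
tree.single   <- hclust(d, method = "single")   # single linkage / minimum jump
tree.complete <- hclust(d, method = "complete") # complete linkage
tree.ward     <- hclust(d, method = "ward.D2")  # Ward criterion (squared distances)
plot(tree.ward)                                 # draw the hierarchical tree
```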


Algorithm

Start from the finest partition (one class per individual) and, at each step, merge the two closest classes, recomputing the between-class distances after every merge. On eight individuals A to H, the first merge joins A and C, whose distance 0.25 is the smallest in the matrix:

Start:         {A}, {B}, {C}, {D}, {E}, {F}, {G}, {H}
1st grouping:  {AC}, {B}, {D}, {E}, {F}, {G}, {H}
2nd grouping:  {ABC}, {D}, {E}, {F}, {G}, {H}
3rd grouping:  {ABC}, {D}, {E}, {FG}, {H}
4th grouping:  {ABC}, {DE}, {FG}, {H}
5th grouping:  {ABC}, {DE}, {FGH}
6th grouping:  {ABC}, {DEFGH}
7th grouping:  {ABCDEFGH}

[Figure: the distance matrix recomputed after each grouping, and the dendrogram built from the successive merges]




Trees and partitions

Trees always end up... cut through! Choosing a height at which to cut the tree yields a partition of the individuals.

[Figure: "Hierarchical Clustering" — dendrogram of decathlon athletes with the bar chart of inertia gains; clicking a height cuts the tree into classes]

Remark: given how it was built, the resulting partition is interesting but not optimal.


Partition quality

When is a partition a good one?
• when individuals placed in the same class are close to each other
• when individuals in different classes are far from each other

Mathematically speaking?
• small within-class variability
• large between-class variability

=⇒ Two criteria. Which one should we use?


Partition quality

Let $\bar{x}_k$ be the overall mean of variable $k$ and $\bar{x}_{qk}$ its mean in class $q$ (with $i$ running over the individuals of class $q$):

$$\underbrace{\sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I}(x_{iqk}-\bar{x}_k)^2}_{\text{total inertia}} \;=\; \underbrace{\sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I}(x_{iqk}-\bar{x}_{qk})^2}_{\text{within-class inertia}} \;+\; \underbrace{\sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I}(\bar{x}_{qk}-\bar{x}_k)^2}_{\text{between-class inertia}}$$

[Figure: the decomposition illustrated on three classes in the plane]

=⇒ one criterion only!


Partition quality

Partition quality is measured by:

$$0 \le \frac{\text{between-class inertia}}{\text{total inertia}} \le 1$$

• ratio $= 0 \Rightarrow \forall k, \forall q,\ \bar{x}_{qk} = \bar{x}_k$: for every variable, all classes have the same mean — useless for classifying
• ratio $= 1 \Rightarrow \forall k, \forall q, \forall i,\ x_{iqk} = \bar{x}_{qk}$: individuals in the same class are identical — ideal for classifying

Warning: don't take this criterion at face value: it depends on the number of individuals and on the number of classes.
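To make the criterion concrete, here is a minimal sketch (assuming `X` is a matrix of quantitative variables and `cl` gives the class of each row; the function name is ours):

```r
## Between-class / total inertia of a partition (sketch)
inertia.ratio <- function(X, cl) {
  X <- as.matrix(X)
  total   <- sum(scale(X, scale = FALSE)^2)          # total inertia
  centers <- apply(X, 2, tapply, cl, mean)           # class means, one row per class
  nq      <- as.vector(table(cl))                    # class sizes
  between <- sum(nq * scale(centers, center = colMeans(X), scale = FALSE)^2)
  between / total                                    # between 0 and 1
}
```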


Ward's method

• Initialization: one class per individual =⇒ between-class inertia = total inertia
• At each step: merge the two classes a and b that minimize the decrease in between-class inertia:

$$\text{Inertia}(a) + \text{Inertia}(b) = \text{Inertia}(a \cup b) - \underbrace{\frac{m_a\, m_b}{m_a + m_b}\, d^2(a,b)}_{\text{to minimize}}$$

Ward's criterion groups together objects with small weights and avoids chain effects; it merges classes with similar centers of gravity.

[Figure: single linkage ("saut minimum") vs. Ward dendrograms on the same data: single linkage chains individuals one by one, while Ward recovers compact groups directly usable for clustering]
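The merging cost can be checked numerically (a sketch with unit-weight points, so that $m_a$ and $m_b$ are simply the group sizes):

```r
## Numerical check of Ward's formula with unit-weight points (sketch)
inertia <- function(m) sum(scale(m, scale = FALSE)^2)   # inertia about the centroid
a <- matrix(rnorm(20), ncol = 2)                        # group a: 10 points
b <- matrix(rnorm(10) + 3, ncol = 2)                    # group b: 5 shifted points
lhs <- inertia(rbind(a, b)) - inertia(a) - inertia(b)   # inertia lost by merging
d2  <- sum((colMeans(a) - colMeans(b))^2)               # squared centroid distance
rhs <- nrow(a) * nrow(b) / (nrow(a) + nrow(b)) * d2
all.equal(lhs, rhs)                                     # TRUE
```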



Temperature data

• 23 individuals: European capitals
• 12 variables: mean monthly temperatures over 30 years

City         Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec  Area
Amsterdam    2.9   2.5   5.7   8.2  12.5  14.8  17.1  17.1  14.5  11.4   7.0   4.4  West
Athens       9.1   9.7  11.7  15.4  20.1  24.5  27.4  27.2  23.8  19.2  14.6  11.0  South
Berlin      -0.2   0.1   4.4   8.2  13.8  16.0  18.3  18.0  14.4  10.0   4.2   1.2  West
Brussels     3.3   3.3   6.7   8.9  12.8  15.6  17.8  17.8  15.0  11.1   6.7   4.4  West
Budapest    -1.1   0.8   5.5  11.6  17.0  20.2  22.0  21.3  16.9  11.3   5.1   0.7  East
Copenhagen  -0.4  -0.4   1.3   5.8  11.1  15.4  17.1  16.6  13.3   8.8   4.1   1.3  North
Dublin       4.8   5.0   5.9   7.8  10.4  13.3  15.0  14.6  12.7   9.7   6.7   5.4  North
Elsinki     -5.8  -6.2  -2.7   3.1  10.2  14.0  17.2  14.9   9.7   5.2   0.1  -2.3  North
Kiev        -5.9  -5.0  -0.3   7.4  14.3  17.8  19.4  18.5  13.7   7.5   1.2  -3.6  East
Krakow      -3.7  -2.0   1.9   7.9  13.2  16.9  18.4  17.6  13.7   8.6   2.6  -1.7  East
Lisbon      10.5  11.3  12.8  14.5  16.7  19.4  21.5  21.9  20.4  17.4  13.7  11.1  South
London       3.4   4.2   5.5   8.3  11.9  15.1  16.9  16.5  14.0  10.2   6.3   4.4  North
Madrid       5.0   6.6   9.4  12.2  16.0  20.8  24.7  24.3  19.8  13.9   8.7   5.4  South
Minsk       -6.9  -6.2  -1.9   5.4  12.4  15.9  17.4  16.3  11.6   5.8   0.1  -4.2  East
Moscow      -9.3  -7.6  -2.0   6.0  13.0  16.6  18.3  16.7  11.2   5.1  -1.1  -6.0  East
Oslo        -4.3  -3.8  -0.6   4.4  10.3  14.9  16.9  15.4  11.1   5.7   0.5  -2.9  North
Paris        3.7   3.7   7.3   9.7  13.7  16.5  19.0  18.7  16.1  12.5   7.3   5.2  West
Prague      -1.3   0.2   3.6   8.8  14.3  17.6  19.3  18.7  14.9   9.4   3.8   0.3  East
Reykjavik   -0.3   0.1   0.8   2.9   6.5   9.3  11.1  10.6   7.9   4.5   1.7   0.2  North
Rome         7.1   8.2  10.5  13.7  17.8  21.7  24.4  24.1  20.9  16.5  11.7   8.3  South
Sarajevo    -1.4   0.8   4.9   9.3  13.8  17.0  18.9  18.7  15.2  10.5   5.1   0.8  South
Sofia       -1.7   0.2   4.3   9.7  14.3  17.7  20.0  19.5  15.8  10.7   5.0   0.6  East
Stockholm   -3.5  -3.5  -1.3   3.5   9.2  14.6  17.2  16.0  11.7   6.5   1.7  -1.6  North

Which cities have similar weather patterns? How can we characterize groups of cities?


Temperature data: hierarchical tree

[Figure: "Cluster Dendrogram" — Ward tree of the 23 capitals, with the bar chart of inertia gains; the first split separates the four southern capitals (Athens, Lisbon, Madrid, Rome) from the rest]


Temperature data

Loss in between-class inertia when going from Q to Q−1 clusters:

23 clusters to 22 clusters: 0.01
22 clusters to 21 clusters: 0.01
21 clusters to 20 clusters: 0.01
...
9 clusters to 8 clusters: 0.15
8 clusters to 7 clusters: 0.16
7 clusters to 6 clusters: 0.27
6 clusters to 5 clusters: 0.29
5 clusters to 4 clusters: 0.60
4 clusters to 3 clusters: 0.76
3 clusters to 2 clusters: 2.36
2 clusters to 1 cluster: 6.76

The loss is large when going from 2 clusters to a single one, so we prefer to keep 2 clusters. The sum of all the losses equals the total inertia: 12.


Using the tree to build a partition

Should we make 2 groups? 3? 4?

Cut into 2 groups:

$$\frac{\text{between-class inertia}}{\text{total inertia}} = \frac{6.76}{12} = 56\%$$

What can we compare this percentage with?

[Figure: the dendrogram cut at a height that yields 2 groups]


Using the tree to build a partition

56% of the information is contained in this 2-class cut. What can we compare this percentage with?

[Figure: the two classes displayed on the factor map, Dim 1 (82.90%) × Dim 2 (15.40%)]


Using the tree to build a partition

Separating the cold cities into 2 groups in turn:

$$\frac{\text{between-class inertia}}{\text{total inertia}} = \frac{2.36}{12} = 20\%$$

[Figure: the dendrogram cut at a height that yields 3 groups]


Using the tree to build a partition

Moving from the 23 cities to 3 classes keeps 56% + 20% = 76% of the variability in the data.

[Figure: the three classes displayed on the factor map, Dim 1 (82.90%) × Dim 2 (15.40%)]


Determining the number of classes

• Starting from the tree
• Using the bar plot of inertia gains
• Depending on the intended use (survey, etc.)
• Ultimate criterion: the interpretability of the classes

[Figure: the dendrogram with its inertia-gain bar plot, used to choose where to cut]



Partitioning algorithm: K-means

Algorithm for aggregating around moving centers (K-means):
• Choose Q centers of gravity at random
• Assign each point to its closest center
• Compute the Q centers of gravity anew
• Repeat the two previous steps until the assignments no longer change

[Figure: successive K-means iterations on the temperature data, plotted on the factor plane Dim 1 (82.9%) × Dim 2 (15.4%): the centers move and the class assignments are updated until the partition stabilizes]
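In R the algorithm is available as `kmeans` (a sketch, assuming the table above is loaded as a data frame `temp` with the 12 months in columns 1 to 12):

```r
## K-means on the temperature data (sketch; 'temp' assumed loaded as above)
set.seed(123)                            # the result depends on the random start
res.km <- kmeans(temp[, 1:12], centers = 3, nstart = 50)  # 50 random starts
res.km$cluster                           # class of each city
res.km$betweenss / res.km$totss          # between-class / total inertia
```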



Robustifying a partition obtained using hierarchical clustering

The partition obtained by hierarchical clustering is not optimal; it can be improved and made more robust using K-means.

Algorithm (a code sketch follows):
• use the hierarchical partition to initialize K-means
• run a few iterations of K-means

=⇒ a potentially improved partition
Advantage: a more robust partition
Disadvantage: the hierarchical structure is lost
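A minimal sketch of this consolidation step in base R (FactoMineR's HCPC does it automatically with its consolidation option; `temp` as before):

```r
## Consolidating a hierarchical partition with K-means (sketch)
tree <- hclust(dist(temp[, 1:12]), method = "ward.D2")
part <- cutree(tree, k = 3)                             # partition from the tree
centers <- apply(temp[, 1:12], 2, tapply, part, mean)   # centers of the 3 classes
res.km <- kmeans(temp[, 1:12], centers = centers)       # K-means started from them
table(tree = part, kmeans = res.km$cluster)             # which cities moved?
```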


Hierarchical clustering in high dimensions

• If there are many variables: run PCA first and keep only the first axes =⇒ back to the classical case
• If there are many individuals, the hierarchical algorithm takes too long (a code sketch follows the figure below):
  • use K-means to partition the data into around 100 classes
  • build the tree on these classes (weighted by the number of individuals in each class)
  • this gives the "top" of the tree

[Figure: the dendrogram built on the original data vs. the "top" dendrogram built on the K-means classes]
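A sketch of this two-step strategy in base R (`bigX` stands for a large individuals × variables matrix; in FactoMineR the same idea is exposed, if I recall the interface correctly, through the `kk` argument of HCPC):

```r
## Two-step clustering for many individuals (sketch; 'bigX' is a large matrix)
pre   <- kmeans(bigX, centers = 100, nstart = 5)   # step 1: ~100 K-means classes
sizes <- as.numeric(table(pre$cluster))            # individuals per class
tree.top <- hclust(dist(pre$centers),              # step 2: tree on the class
                   method = "ward.D2",             # centers, weighted by the
                   members = sizes)                # class sizes
plot(tree.top)                                     # the "top" of the tree
```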


Hierarchical clustering on qualitative data

Two strategies (a sketch of the first one follows):
• Transform the data into quantitative variables:
  • do MCA and keep only the first dimensions
  • do hierarchical clustering on the principal axes of the MCA
• Use measures/indices suited to qualitative variables: similarity indices, Jaccard index, etc.
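A sketch of the first strategy with FactoMineR (here on the package's qualitative `tea` survey dataset; the choice of 20 dimensions and of the first 18 columns is ours, for illustration):

```r
## MCA followed by hierarchical clustering on the principal axes (sketch)
library(FactoMineR)
data(tea)                                             # a qualitative survey dataset
res.mca  <- MCA(tea[, 1:18], ncp = 20, graph = FALSE) # keep the first 20 dimensions
res.hcpc <- HCPC(res.mca, graph = FALSE)              # tree + partition on those axes
```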


Doing factor analysis followed by clustering

• Qualitative data: MCA outputs quantitative principal components
• Factor analysis eliminates the last components, which are mostly noise =⇒ more stable clustering

[Figure: the data table (variables x.1 ... x.K) is transformed by PCA into components F1 ... FK; the first components F1 ... FQ carry the structure, the remaining ones the noise]


Doing factor analysis followed by clustering

• Represent the tree and the classes on two factor axes
=⇒ Factor analysis gives continuous information, the tree gives discontinuous information. The tree hints at information hidden in the further axes.

[Figure: "Hierarchical clustering on the factor map" — the tree drawn above the plane Dim 1 (82.9%) × Dim 2, with clusters 1, 2 and 3 coloured]



The class make-up: using "model individuals"

Model individuals: the individuals closest to the center of each class, listed with their distance to that center:

Cluster 1: Oslo 0.339, Helsinki 0.884, Stockholm 0.922, Minsk 0.965, Moscow 1.766
Cluster 2: Berlin 0.576, Sarajevo 0.716, Brussels 1.038, Prague 1.055, Amsterdam 1.124
Cluster 3: Rome 0.360, Lisbon 1.737, Madrid 1.835, Athens 2.167

[Figure: the three clusters on the factor map, Dim 1 (82.90%) × Dim 2 (15.40%)]
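With FactoMineR these model individuals (called "paragons") come with the HCPC output (a sketch; field names as documented, to the best of my knowledge):

```r
## Paragons: individuals closest to each class center (sketch)
library(FactoMineR)
res.pca  <- PCA(temp[, 1:12], graph = FALSE)      # clustering done on the PCA axes
res.hcpc <- HCPC(res.pca, nb.clust = 3, graph = FALSE)
res.hcpc$desc.ind$para                            # paragons and their distances
```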


Characterizing/describing classes

• Goals:
  • find the variables that are most important for the partition
  • characterize a class (or any group of individuals) in terms of quantitative variables
  • sort the variables that best describe the classes
• Questions:
  • Which variables best characterize the partition?
  • How can we characterize the individuals of the 1st class?
  • Which variables describe them best?


Characterizing/describing classes

Which variables best represent the partition?
• For each quantitative variable:
  • build an analysis-of-variance model between the quantitative variable and the class variable
  • do a Fisher test to detect a class effect
• Sort the variables by increasing p-value (a code sketch follows):

           Eta2     P-value
October    0.8990   1.108e-10
March      0.8865   3.556e-10
November   0.8707   1.301e-09
September  0.8560   3.842e-09
April      0.8353   1.466e-08
February   0.8246   2.754e-08
December   0.7730   3.631e-07
January    0.7477   1.047e-06
August     0.7160   3.415e-06
July       0.6309   4.690e-05
May        0.5860   1.479e-04
June       0.5753   1.911e-04
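This ranking is easy to reproduce by hand (a sketch; `part` is the class of each city, as built earlier; FactoMineR's `catdes` produces this kind of table directly):

```r
## Eta2 and Fisher-test p-value of each month against the partition (sketch)
desc <- t(sapply(temp[, 1:12], function(x) {
  tab <- summary(aov(x ~ as.factor(part)))[[1]]       # one-way ANOVA table
  c(Eta2    = tab[1, "Sum Sq"] / sum(tab[, "Sum Sq"]),# between SS / total SS
    p.value = tab[1, "Pr(>F)"])
}))
desc[order(desc[, "p.value"]), ]                      # best-separating months first
```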


Characterizing classes using quantitative variables

[Figure: for each month, the temperatures of the 23 cities plotted along a common axis (roughly −10 to 20 °C), coloured by class: the cold-class cities (Elsinki, Kiev, Minsk, Moscow, Oslo, Reykjavik, Stockholm) sit at the cold end, the southern cities (Athens, Lisbon, Madrid, Rome) at the warm end]


Characterizing classes using quantitative variables

1st idea: if the values of X for class q look like a random draw from all the values of X, then X does not characterize class q.

2nd idea: the more unlikely such a random draw appears, the more X characterizes class q.

[Figure: the January temperatures of the cities of one class compared with a random draw of the same size among all the values (axis from −10 to 10 °C)]


Characterizing classes using quantitative variables

Idea: use as reference a random draw of $n_q$ values among the $N$ values of $X$. What values can $\bar{x}_q$ take? (i.e., what is the distribution of $\bar{X}_q$?)

$$E(\bar{X}_q) = \bar{x} \qquad V(\bar{X}_q) = \frac{s^2}{n_q}\left(\frac{N-n_q}{N-1}\right) \qquad \mathcal{L}(\bar{X}_q) = \mathcal{N} \text{ (because } \bar{X}_q \text{ is a mean)}$$

$$\Longrightarrow \text{test statistic} = \frac{\bar{x}_q - \bar{x}}{\sqrt{\frac{s^2}{n_q}\left(\frac{N-n_q}{N-1}\right)}} \sim \mathcal{N}(0, 1)$$

• If |test statistic| ≥ 1.96, then X characterizes class q
• The larger the |test statistic|, the better X characterizes class q

Idea: rank the variables by decreasing |test statistic| (a code sketch follows).
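This statistic (the v.test reported by FactoMineR's `catdes`) can be computed by hand (a sketch; the uncorrected variance is, I believe, what the package uses):

```r
## v.test of a quantitative variable x for class q (sketch)
v.test <- function(x, cl, q) {
  N  <- length(x)
  nq <- sum(cl == q)
  s2 <- var(x) * (N - 1) / N                 # uncorrected (population) variance
  (mean(x[cl == q]) - mean(x)) /
    sqrt(s2 / nq * (N - nq) / (N - 1))       # compare with N(0, 1)
}
v.test(temp$Jul, part, q = 1)   # should match the -1.99 for July in the table below
```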


Characterizing classes using quantitative variables

$quanti$`1`
           v.test  Mean in category  Overall mean  sd in category  Overall sd   p.value
July        -1.99             16.80         18.90           2.450        3.33  0.046100
June        -2.06             14.70         16.80           2.520        3.07  0.039600
August      -2.48             15.50         18.30           2.260        3.53  0.013100
May         -2.55             10.80         13.30           2.430        2.96  0.010800
September   -3.14             11.00         14.70           1.670        3.68  0.001710
January     -3.26             -5.14          0.17           2.630        5.07  0.001130
December    -3.27             -2.91          1.84           1.830        4.52  0.001080
November    -3.36              0.60          5.08           0.940        4.14  0.000781
April       -3.39              4.67          8.38           1.550        3.40  0.000706
February    -3.44             -4.60          0.96           2.340        5.01  0.000577
October     -3.45              5.76         10.10           0.919        3.87  0.000553
March       -3.68             -1.14          4.06           1.100        4.39  0.000238


Characterizing classes using quantitative variables

$`2`
NULL

$`3`
           v.test  Mean in category  Overall mean  sd in category  Overall sd   p.value
September    3.81             21.20         14.70            1.54        3.68  0.000140
October      3.72             16.80         10.10            1.91        3.87  0.000201
August       3.71             24.40         18.30            1.88        3.53  0.000211
November     3.69             12.20          5.08            2.26        4.14  0.000222
July         3.60             24.50         18.90            2.09        3.33  0.000314
April        3.53             14.00          8.38            1.18        3.40  0.000413
March        3.45             11.10          4.06            1.27        4.39  0.000564
February     3.43              8.95          0.96            1.74        5.01  0.000593
June         3.39             21.60         16.80            1.86        3.07  0.000700
December     3.39              8.95          1.84            2.34        4.52  0.000706
January      3.29              7.92          0.17            2.08        5.07  0.000993
May          3.18             17.60         13.30            1.55        2.96  0.001460


Characterizing classes using qualitative variables

Which variables best characterize the partition?
• For each qualitative variable, do a χ² test between it and the class variable
• Sort the variables by increasing p-value:

$test.chi2
          p.value  df
Area  0.001195843   6
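In base R this is one line per variable (a sketch; here Area is the only qualitative variable, and the 4 areas × 3 classes give the 6 degrees of freedom reported above):

```r
## Chi-squared test between a qualitative variable and the partition (sketch)
chisq.test(table(temp$Area, part))   # df = (4 - 1) * (3 - 1) = 6
## expect a warning about small expected counts with only 23 cities
```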


Characterizing classes using qualitative variables

Does the South category characterize the 3rd class?

            Cluster 3   Other clusters   Total
South       n_mc = 4          1          n_m = 5
Not South        0           18             18
Total       n_c = 4          19         n = 23

Test: $H_0: \frac{n_{mc}}{n_c} = \frac{n_m}{n}$ versus $H_1$: the category m is abnormally overrepresented in class c. Compute $P_{H_0}(N_{mc} \ge n_{mc})$, where under $H_0$, $\mathcal{L}(N_{mc}) = \mathcal{H}\!\left(n_c, \frac{n_m}{n}, n\right)$ (hypergeometric).

Cluster 3, Area = South:
  Cla/Mod = 4/5 × 100 = 80
  Mod/Cla = 4/4 × 100 = 100
  Global  = 5/23 × 100 = 21.74
  p.value = $P_{\mathcal{H}(4,\, 5/23,\, 23)}[N_{mc} \ge 4]$ = 0.000564 (v.test 3.448)

=⇒ H₀ is rejected: South is overrepresented in the 3rd class. Sort the categories in terms of p-values.
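This p-value is a hypergeometric tail probability and can be recomputed directly (a sketch in base R):

```r
## P(N_mc >= 4): 4 "South" cities among the 4 of cluster 3, with 5 South
## cities among the 23 in total
phyper(4 - 1, m = 5, n = 23 - 5, k = 4, lower.tail = FALSE)
## [1] 0.0005647 — matches the 0.000564 above, up to rounding
```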


Characterizing classes using factor axes

Factor axes are also quantitative variables, so the same v.test applies:

$`1`
       v.test  Mean in category  Overall mean  sd in category  Overall sd   p.value
Dim.1   -3.32             -3.37             0            0.85        3.15  0.000908

$`2`
       v.test  Mean in category  Overall mean  sd in category  Overall sd   p.value
Dim.3   -2.41             -0.18             0            0.22        0.36  0.015776

$`3`
       v.test  Mean in category  Overall mean  sd in category  Overall sd   p.value
Dim.1    3.86              5.66             0            1.26        3.15  0.000112


Conclusions

• Clustering can be done on tables of individuals × quantitative variables (MCA transforms qualitative variables into quantitative ones)
• Hierarchical clustering gives a hierarchical tree, which suggests the number of classes
• K-means can be used to make the classes more robust
• Classes can be characterized by active and supplementary variables, quantitative or qualitative


More

Husson F., Lê S. & Pagès J. (2017). Exploratory Multivariate Analysis by Example Using R, 2nd edition, 230 p., Chapman & Hall/CRC Computer Science & Data Analysis Series.

The FactoMineR package for performing clustering: http://factominer.free.fr/index_fr.html

Movies on YouTube:
• a YouTube channel: youtube.com/HussonFrancois
• a playlist with 11 movies in English
• a playlist with 17 movies in French