Hierarchical clustering

François Husson
Applied Mathematics Department - Agrocampus Rennes
[email protected]
Contents

1. Introduction
2. Principles of hierarchical clustering
3. Example
4. Partitioning algorithm : K-means
5. Extras
   • Making more robust partitions
   • Clustering in high dimensions
   • Qualitative variables and clustering
   • Combining factor analysis and clustering
6. Characterizing classes of individuals
1. Introduction
Definitions :
• Clustering : making or building classes
• Class : a set of individuals (or objects) with similar, shared characteristics

Examples :
• of clustering : the animal kingdom, a computer hard disk, the geographic division of France, etc.
• of classes : social classes, political classes, etc.

Two types of clustering :
• hierarchical : a tree
• partitioning methods
Hierarchical example : the animal kingdom
2. Principles of hierarchical clustering
What data ? What goals ?

Clustering applies to data tables : rows of individuals, columns of quantitative variables.

Goals : build a tree structure that :
• shows the hierarchical links between individuals or groups of individuals
• detects a “natural” number of classes in the population

[Figure : dendrogram of eight individuals A to H, with a height scale from 0 to 4]
Criteria

Measuring the similarity of individuals :
• Euclidean distance
• similarity indices
• etc.

Measuring the similarity between groups of individuals :
• single linkage, or minimum jump (smallest distance between the groups)
• complete linkage (largest distance between the groups)
• Ward criterion
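As a minimal sketch of these choices in R (base dist and hclust; the data below are invented for illustration) :

# toy data : 8 individuals described by 2 quantitative variables (invented)
set.seed(1)
X <- matrix(rnorm(16), nrow = 8, dimnames = list(LETTERS[1:8], c("v1", "v2")))

d <- dist(X, method = "euclidean")               # distance matrix between individuals

tree.single   <- hclust(d, method = "single")    # single linkage (smallest distance)
tree.complete <- hclust(d, method = "complete")  # complete linkage (largest distance)
tree.ward     <- hclust(d, method = "ward.D2")   # Ward criterion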
Algorithm

Start from the matrix of pairwise distances between the eight individuals A, ..., H (for instance d(A, C) = 0.25 and d(A, B) = 0.50). At each step, merge the two closest elements (individuals or groups of individuals), then recompute the distances between the new group and all the others :

1st grouping : {A},{B},{C},{D},{E},{F},{G},{H} : merge A and C (d = 0.25)
2nd grouping : {A,C},{B},{D},{E},{F},{G},{H} : merge {A,C} and B (d = 0.50)
3rd grouping : {A,B,C},{D},{E},{F},{G},{H} : merge F and G (d = 0.61)
4th grouping : {A,B,C},{D},{E},{F,G},{H} : merge D and E (d = 1.00)
5th grouping : {A,B,C},{D,E},{F,G},{H} : merge {F,G} and H (d = 1.12)
6th grouping : {A,B,C},{D,E},{F,G,H} : merge {D,E} and {F,G,H} (d = 1.81)
7th grouping : {A,B,C},{D,E,F,G,H} : merge everything into {A,B,C,D,E,F,G,H} (d = 4.07)

[Figure : the successive distance matrices at each step and the resulting dendrogram]
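The same mechanics can be traced in R through the merge history of an hclust object; a sketch on the toy data of the previous block (the heights below come from the invented data, not from the slide's matrix) :

tree <- hclust(d, method = "single")   # d as in the previous sketch

tree$merge    # one row per grouping : negative entries are individuals, positive ones earlier groups
tree$height   # the distance at which each grouping occurs
cutree(tree, k = 3)   # the partition into 3 classes, i.e. after the 5th grouping (8 - 5 = 3)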
Trees and partitions

Trees always end up . . . cut through ! Choosing a height at which to cut the tree gives a partition.

[Figure : dendrogram of decathlon athletes (Casarsa, Parkhomenko, Sebrle, Karpov, ...) with the bar chart of inertia gains; clicking a height cuts the tree into classes]

Remark : given how it was built, this partition is interesting but not optimal.
Partition quality

When is a partition a good one ?
• when individuals placed in the same class are close to each other
• when individuals in different classes are far from each other

Mathematically speaking ?
• small within-class variability
• large between-class variability

=⇒ Two criteria. Which one to use ?
Partition quality

Let $\bar{x}_k$ be the mean of variable $x_k$, and $\bar{x}_{qk}$ its mean in class $q$. Then :

$$\underbrace{\sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I}(x_{iqk}-\bar{x}_{k})^2}_{\text{total inertia}}
= \underbrace{\sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I}(x_{iqk}-\bar{x}_{qk})^2}_{\text{within-class inertia}}
+ \underbrace{\sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I}(\bar{x}_{qk}-\bar{x}_{k})^2}_{\text{between-class inertia}}$$

[Figure : illustration of the decomposition with three points, their class centers and the overall center]

=⇒ one criterion only !
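A sketch checking this decomposition numerically in R (invented data, arbitrary 2-class partition) :

set.seed(2)
X  <- matrix(rnorm(40), nrow = 10)    # 10 individuals, 4 variables (invented)
cl <- rep(1:2, each = 5)              # an arbitrary partition into Q = 2 classes

total.inertia  <- sum(scale(X, scale = FALSE)^2)
within.inertia <- sum(sapply(split(as.data.frame(X), cl),
                             function(g) sum(scale(g, scale = FALSE)^2)))

# between-class inertia computed directly from the class centers
centers <- apply(X, 2, function(v) tapply(v, cl, mean))
between.inertia <- sum(table(cl) * rowSums(sweep(centers, 2, colMeans(X))^2))

all.equal(total.inertia, within.inertia + between.inertia)   # TRUE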
Partition quality

Partition quality is measured by :

0 ≤ between-class inertia / total inertia ≤ 1

• ratio = 0 =⇒ ∀k, ∀q : x̄_qk = x̄_k : for each variable, all classes have the same mean, so the partition is of no help for classifying
• ratio = 1 =⇒ ∀k, ∀q, ∀i : x_iqk = x̄_qk : individuals in the same class are identical, which is ideal for classifying

Warning : don't take this criterion at face value : it depends on the number of individuals and on the number of classes.
Ward's method

• Initialization : 1 class = 1 individual =⇒ between-class inertia = total inertia
• At each step : merge the two classes a and b that minimize the decrease in between-class inertia :

$$\text{Inertia}(a) + \text{Inertia}(b) = \text{Inertia}(a \cup b) - \underbrace{\frac{m_a\, m_b}{m_a + m_b}\, d^2(a, b)}_{\text{to minimize}}$$

where $m_a$ and $m_b$ are the masses of the two classes and $d(a, b)$ the distance between their centers of gravity.

Ward's criterion groups together objects with small weights, avoids the chain effect of single linkage (saut minimum), and groups together classes whose centers of gravity are close.

[Figure : single-linkage (saut minimum) and Ward dendrograms compared on two simulated datasets, with the corresponding partitions shown on the point clouds]
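A sketch of this merging cost for one candidate pair of classes (invented groups, unit masses) :

set.seed(3)
a <- matrix(rnorm(10, mean = 0), ncol = 2)   # class a : 5 individuals (invented)
b <- matrix(rnorm( 6, mean = 3), ncol = 2)   # class b : 3 individuals (invented)

inertia <- function(m) sum(scale(m, scale = FALSE)^2)

m.a <- nrow(a); m.b <- nrow(b)
d2  <- sum((colMeans(a) - colMeans(b))^2)    # squared distance between the centers

cost <- m.a * m.b / (m.a + m.b) * d2         # decrease in between-class inertia
all.equal(inertia(rbind(a, b)), inertia(a) + inertia(b) + cost)   # TRUE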
3. Example
Temperature data

• 23 individuals : European capitals
• 12 variables : mean monthly temperatures over 30 years
• 1 supplementary qualitative variable : Area

             Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec  Area
Amsterdam    2.9   2.5   5.7   8.2  12.5  14.8  17.1  17.1  14.5  11.4   7.0   4.4  West
Athens       9.1   9.7  11.7  15.4  20.1  24.5  27.4  27.2  23.8  19.2  14.6  11.0  South
Berlin      -0.2   0.1   4.4   8.2  13.8  16.0  18.3  18.0  14.4  10.0   4.2   1.2  West
Brussels     3.3   3.3   6.7   8.9  12.8  15.6  17.8  17.8  15.0  11.1   6.7   4.4  West
Budapest    -1.1   0.8   5.5  11.6  17.0  20.2  22.0  21.3  16.9  11.3   5.1   0.7  East
Copenhagen  -0.4  -0.4   1.3   5.8  11.1  15.4  17.1  16.6  13.3   8.8   4.1   1.3  North
Dublin       4.8   5.0   5.9   7.8  10.4  13.3  15.0  14.6  12.7   9.7   6.7   5.4  North
Elsinki     -5.8  -6.2  -2.7   3.1  10.2  14.0  17.2  14.9   9.7   5.2   0.1  -2.3  North
Kiev        -5.9  -5.0  -0.3   7.4  14.3  17.8  19.4  18.5  13.7   7.5   1.2  -3.6  East
Krakow      -3.7  -2.0   1.9   7.9  13.2  16.9  18.4  17.6  13.7   8.6   2.6  -1.7  East
Lisbon      10.5  11.3  12.8  14.5  16.7  19.4  21.5  21.9  20.4  17.4  13.7  11.1  South
London       3.4   4.2   5.5   8.3  11.9  15.1  16.9  16.5  14.0  10.2   6.3   4.4  North
Madrid       5.0   6.6   9.4  12.2  16.0  20.8  24.7  24.3  19.8  13.9   8.7   5.4  South
Minsk       -6.9  -6.2  -1.9   5.4  12.4  15.9  17.4  16.3  11.6   5.8   0.1  -4.2  East
Moscow      -9.3  -7.6  -2.0   6.0  13.0  16.6  18.3  16.7  11.2   5.1  -1.1  -6.0  East
Oslo        -4.3  -3.8  -0.6   4.4  10.3  14.9  16.9  15.4  11.1   5.7   0.5  -2.9  North
Paris        3.7   3.7   7.3   9.7  13.7  16.5  19.0  18.7  16.1  12.5   7.3   5.2  West
Prague      -1.3   0.2   3.6   8.8  14.3  17.6  19.3  18.7  14.9   9.4   3.8   0.3  East
Reykjavik   -0.3   0.1   0.8   2.9   6.5   9.3  11.1  10.6   7.9   4.5   1.7   0.2  North
Rome         7.1   8.2  10.5  13.7  17.8  21.7  24.4  24.1  20.9  16.5  11.7   8.3  South
Sarajevo    -1.4   0.8   4.9   9.3  13.8  17.0  18.9  18.7  15.2  10.5   5.1   0.8  South
Sofia       -1.7   0.2   4.3   9.7  14.3  17.7  20.0  19.5  15.8  10.7   5.0   0.6  East
Stockholm   -3.5  -3.5  -1.3   3.5   9.2  14.6  17.2  16.0  11.7   6.5   1.7  -1.6  North

Which cities have similar weather patterns ? How can we characterize groups of cities ?
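A sketch of the corresponding analysis in R with FactoMineR, assuming the table above is saved as temperature.csv with the cities as row names (the file name is an assumption) :

library(FactoMineR)

temperature <- read.csv("temperature.csv", row.names = 1)   # assumed file

# PCA on the 12 monthly temperatures, Area as supplementary qualitative variable
res.pca <- PCA(temperature, quali.sup = 13, graph = FALSE)

# hierarchical clustering (Ward) on the principal components
res.hcpc <- HCPC(res.pca, graph = FALSE)
res.hcpc$data.clust   # the data with the class of each city appended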
Temperature data : hierarchical tree

[Figure : Ward dendrogram ("Cluster Dendrogram") of the 23 capitals, with the bar chart of inertia gains]
Temperature data

Loss in between-class inertia when going from Q to Q − 1 clusters :

23 clusters to 22 clusters : 0.01
22 clusters to 21 clusters : 0.01
21 clusters to 20 clusters : 0.01
...
9 clusters to 8 clusters : 0.15
8 clusters to 7 clusters : 0.16
7 clusters to 6 clusters : 0.27
6 clusters to 5 clusters : 0.29
5 clusters to 4 clusters : 0.60
4 clusters to 3 clusters : 0.76
3 clusters to 2 clusters : 2.36
2 clusters to 1 cluster : 6.76

The loss is by far the largest when going from 2 clusters to a single cluster, so we prefer to keep 2 clusters. Note that the sum of the losses of inertia equals the total inertia : 12.
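These losses can be recomputed from any tree by comparing the between-class inertia of successive cuts. A sketch, where Z is a standardized version of the 23 x 12 table and tree a Ward tree built on it (both are assumptions carried over from the pipeline sketch; FactoMineR standardizes with the population standard deviation, hence the correction factor below, and the values match the slides only if the cuts match the slides' tree) :

# standardization with the 1/N convention (scale() uses 1/(N-1))
Tm   <- temperature[, 1:12]
Z    <- scale(Tm) * sqrt((nrow(Tm) - 1) / nrow(Tm))
tree <- hclust(dist(Z), method = "ward.D2")

# between-class inertia of a partition, with 1/N weights as in the slides
between.inertia <- function(X, cl) {
  centers <- apply(X, 2, function(v) tapply(v, cl, mean))   # Q x K matrix of class means
  sum(table(cl) * rowSums(sweep(centers, 2, colMeans(X))^2)) / nrow(X)
}

b <- sapply(2:nrow(Z), function(q) between.inertia(Z, cutree(tree, q)))
b[1]          # lost when going from 2 clusters to 1 (about 6.76 here)
b[2] - b[1]   # lost when going from 3 clusters to 2 (about 2.36)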
Using the tree to build a partition

Should we make 2 groups ? 3 ? 4 ?

Cut into 2 groups :

between-class inertia / total inertia = 6.76 / 12 = 56 %

What can we compare this percentage with ?

[Figure : the dendrogram cut into 2 classes]
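This ratio can be checked directly for the 2-class cut (Z, tree and between.inertia as in the earlier sketch; the total inertia is 12 because the 12 variables are standardized) :

cl2 <- cutree(tree, k = 2)
between.inertia(Z, cl2) / (sum(Z^2) / nrow(Z))   # about 6.76 / 12 = 56 %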
Using the tree to build a partition

56 % of the information is contained in this 2-class cut. What can we compare this percentage with ?

[Figure : the 2 classes displayed on the first factor map of the PCA (Dim 1 : 82.90 %, Dim 2 : 15.40 %)]
Using the tree to build a partition

Separate the cold cities into 2 groups :

between-class inertia / total inertia = 2.36 / 12 = 20 %

[Figure : the dendrogram cut into 3 classes]
Using the tree to build a partition

Moving from the 23 cities to 3 classes retains 56 % + 20 % = 76 % of the variability in the data.

[Figure : the 3 classes displayed on the first factor map (Dim 1 : 82.90 %, Dim 2 : 15.40 %)]
Determining the number of classes

• Starting from the tree
• From the plot with the bars (inertia gains)
• Depends on the intended use (survey, etc.)
• Ultimate criterion : interpretability of the classes

[Figure : dendrogram and inertia-gain bar chart used to choose the number of classes]
4. Partitioning algorithm : K-means
Partitioning algorithm : K-means

Algorithm for aggregating around moving centers (K-means) :

• Choose Q centers of gravity at random
• Assign each point to the closest center
• Recalculate the Q centers of gravity
• Repeat the assignment and recalculation steps until the partition no longer changes

[Figure : successive iterations of K-means on the first factor map of the temperature data (Dim 1 : 82.9 %, Dim 2 : 15.4 %)]
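A base-R sketch of this algorithm on the standardized temperatures (Z as in the earlier sketches) :

set.seed(42)
km <- kmeans(Z, centers = 3, nstart = 25)   # 25 random starts, keep the best partition
km$cluster                                  # the class of each city
km$betweenss / km$totss                     # between-class / total inertia of the partition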
5. Extras
Robustifying a partition obtained using hierarchical clustering

The partition obtained by hierarchical clustering is not optimal : it can be improved and made more robust using K-means.

Algorithm :
• use the hierarchical partition to initialize K-means
• run a few iterations of K-means

=⇒ a potentially improved partition

Advantage : a more robust partition
Disadvantage : loss of the hierarchical structure
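In FactoMineR this consolidation is the consol option of HCPC; a sketch, together with the same idea written by hand (res.pca, Z and tree as in the earlier sketches) :

# FactoMineR : K-means consolidation of the 3-class hierarchical partition
res.hcpc <- HCPC(res.pca, nb.clust = 3, consol = TRUE, graph = FALSE)

# by hand : initialize kmeans with the centers of gravity of the tree's classes
cl0     <- cutree(tree, k = 3)
centers <- apply(Z, 2, function(v) tapply(v, cl0, mean))
km      <- kmeans(Z, centers = centers, iter.max = 10)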
Hierarchical clustering in high dimensions

• If there are many variables : do a PCA and keep only the first axes =⇒ takes us back to the classical case
• If there are many individuals, the hierarchical algorithm is too slow (see the sketch after the figure below) :
  • use K-means to partition the individuals into around 100 classes
  • build the tree from these classes (weighted by the number of individuals in each class)
  • this gives us the “top” of the tree
[Figure : "Tree from original data" vs. "Tree using classes" : the dendrogram built from the K-means classes reproduces the top of the dendrogram built from all the individuals]
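A sketch of this two-step strategy (bigX is an assumed large matrix of individuals; the members argument of hclust weights each K-means class by its size, and HCPC's kk argument wraps the same preprocessing) :

# step 1 : reduce the many individuals to about 100 classes
km100 <- kmeans(bigX, centers = 100, nstart = 10, iter.max = 50)

# step 2 : build the top of the tree on the 100 class centers, weighted by size
tree.top <- hclust(dist(km100$centers), method = "ward.D2",
                   members = km100$size)   # approximate top of the full tree

# FactoMineR shortcut : K-means preprocessing inside HCPC
res.hcpc <- HCPC(res.pca, kk = 100, graph = FALSE)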
Hierarchical clustering on qualitative data

Two strategies :
• Transform the qualitative data into quantitative data (see the sketch below) :
  • do an MCA and keep only the first dimensions
  • do hierarchical clustering on the principal components of the MCA
• Use measures/indices suited to qualitative variables : similarity indices, Jaccard index, etc.
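A sketch of the first strategy with FactoMineR (quali.df is an assumed data.frame whose columns are all factors) :

library(FactoMineR)

res.mca  <- MCA(quali.df, ncp = 5, graph = FALSE)   # keep the first 5 dimensions
res.hcpc <- HCPC(res.mca, graph = FALSE)            # cluster on those coordinates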
Doing factor analysis followed by clustering

• Qualitative data : MCA outputs quantitative principal components
• Factor analysis eliminates the last components, which are mostly noise =⇒ a more stable clustering

The data (x.1, ..., x.k, ..., x.K) are transformed by PCA into components F1, ..., FQ, ..., FK : the first components (F1 to FQ) carry the structure, the remaining ones mostly noise.
Doing factor analysis followed by clustering

• Representation of the tree and the classes on the first two factor axes

=⇒ Factor analysis gives continuous information, while the tree gives discontinuous information. The tree hints at information hidden in the later axes.

[Figure : "Hierarchical clustering on the factor map" : the dendrogram drawn above the factor map (Dim 1 : 82.9 %), with clusters 1, 2 and 3 colored]
6. Characterizing classes of individuals
The class make-up : using “model individuals”

Model individuals (paragons) : the individuals closest to the center of each class (in parentheses, the distance to the class center) :

Cluster 1 : Oslo (0.339), Helsinki (0.884), Stockholm (0.922), Minsk (0.965), Moscow (1.766)
Cluster 2 : Berlin (0.576), Sarajevo (0.716), Brussels (1.038), Prague (1.055), Amsterdam (1.124)
Cluster 3 : Rome (0.360), Lisbon (1.737), Madrid (1.835), Athens (2.167)

[Figure : the three clusters on the factor map (Dim 1 : 82.90 %, Dim 2 : 15.40 %)]
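HCPC reports these model individuals directly; a sketch :

res.hcpc$desc.ind$para   # paragons : the individuals closest to each class center
res.hcpc$desc.ind$dist   # the most class-specific individuals, farthest from the other centers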
Characterizing/describing classes

Goals :
• find the variables that are most important for the partition
• characterize a class (or any group of individuals) in terms of quantitative variables
• sort the variables that best describe the classes

Questions :
• Which variables best characterize the partition ?
• How can we characterize the individuals of the 1st class ?
• Which variables describe them best ?
Characterizing/describing classes

Which variables best represent the partition ?
• For each quantitative variable :
  • build an analysis of variance model between the quantitative variable and the class variable
  • do a Fisher test to detect a class effect
• Sort the variables by increasing p-value

Variable     Eta2     P-value
October      0.8990   1.108e-10
March        0.8865   3.556e-10
November     0.8707   1.301e-09
September    0.8560   3.842e-09
April        0.8353   1.466e-08
February     0.8246   2.754e-08
December     0.7730   3.631e-07
January      0.7477   1.047e-06
August       0.7160   3.415e-06
July         0.6309   4.690e-05
May          0.5860   1.479e-04
June         0.5753   1.911e-04
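FactoMineR's catdes function produces exactly this kind of description; a sketch (data.clust as returned by HCPC, with the class variable in the last column) :

library(FactoMineR)

desc <- catdes(res.hcpc$data.clust, num.var = ncol(res.hcpc$data.clust))
desc$quanti.var   # Eta2 and p-value of the one-way ANOVA, per variable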
Characterizing classes using quantitative variables

[Figure : monthly temperatures (January to December) of the 23 cities, displayed variable by variable and city by city]
Characterizing classes using quantitative variables

1st idea : if the values of X for class q look like a random draw from all the values of X, then X does not characterize class q.

2nd idea : the more unlikely such a random draw appears, the better X characterizes class q.

[Figure : January temperatures of the cities of one class compared with a random draw of the same size among all the cities]
Characterizing classes using quantitative variables

Idea : use as reference a random draw of $n_q$ values among the $N$. What values can $\bar x_q$ take, i.e. what is the distribution of $\bar X_q$ ?

$$\mathbb{E}(\bar X_q) = \bar x \qquad \mathbb{V}(\bar X_q) = \frac{s^2}{n_q}\,\frac{N - n_q}{N - 1}$$

and $\bar X_q$ is approximately Gaussian, because it is a mean. Hence :

$$\text{test statistic} = \frac{\bar x_q - \bar x}{\sqrt{\dfrac{s^2}{n_q}\,\dfrac{N - n_q}{N - 1}}} \sim \mathcal{N}(0, 1)$$

• If |test statistic| ≥ 1.96 then X characterizes class q
• The larger |test statistic| is, the better X characterizes class q

Idea : rank the variables by decreasing |test statistic|
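A sketch of this v-test computed by hand for one variable and one class (temperature and res.hcpc as in the earlier sketches) :

x  <- temperature[, "Jan"]            # one quantitative variable (January)
cl <- res.hcpc$data.clust$clust       # the class of each city, from HCPC

N  <- length(x); nq <- sum(cl == 1)
s2 <- var(x) * (N - 1) / N            # variance with the 1/N convention

v.test <- (mean(x[cl == 1]) - mean(x)) / sqrt(s2 / nq * (N - nq) / (N - 1))
2 * pnorm(-abs(v.test))               # the associated two-sided p-value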
Characterizing classes using quantitative variables

$quanti$`1`
           v.test  Mean in category  Overall mean  sd in category  Overall sd   p.value
July        -1.99             16.80         18.90           2.450        3.33  0.046100
June        -2.06             14.70         16.80           2.520        3.07  0.039600
August      -2.48             15.50         18.30           2.260        3.53  0.013100
May         -2.55             10.80         13.30           2.430        2.96  0.010800
September   -3.14             11.00         14.70           1.670        3.68  0.001710
January     -3.26             -5.14          0.17           2.630        5.07  0.001130
December    -3.27             -2.91          1.84           1.830        4.52  0.001080
November    -3.36              0.60          5.08           0.940        4.14  0.000781
April       -3.39              4.67          8.38           1.550        3.40  0.000706
February    -3.44             -4.60          0.96           2.340        5.01  0.000577
October     -3.45              5.76         10.10           0.919        3.87  0.000553
March       -3.68             -1.14          4.06           1.100        4.39  0.000238
Characterizing classes using quantitative variables

$quanti$`2`
NULL

$quanti$`3`
           v.test  Mean in category  Overall mean  sd in category  Overall sd   p.value
September    3.81             21.20         14.70            1.54        3.68  0.000140
October      3.72             16.80         10.10            1.91        3.87  0.000201
August       3.71             24.40         18.30            1.88        3.53  0.000211
November     3.69             12.20          5.08            2.26        4.14  0.000222
July         3.60             24.50         18.90            2.09        3.33  0.000314
April        3.53             14.00          8.38            1.18        3.40  0.000413
March        3.45             11.10          4.06            1.27        4.39  0.000564
February     3.43              8.95          0.96            1.74        5.01  0.000593
June         3.39             21.60         16.80            1.86        3.07  0.000700
December     3.39              8.95          1.84            2.34        4.52  0.000706
January      3.29              7.92          0.17            2.08        5.07  0.000993
May          3.18             17.60         13.30            1.55        2.96  0.001460
Characterizing classes using qualitative variables

Which variables best characterize the partition ?
• For each qualitative variable, do a χ² test between it and the class variable
• Sort the variables by increasing p-value

$test.chi2
          p.value  df
Area  0.001195843   6
Characterizing classes using qualitative variables

Does the South category characterize the 3rd class ?

                South       Not South   Total
Cluster 3       n_mc = 4    0           n_c = 4
Other clusters  1           18          19
Total           n_m = 5     18          n = 23

Test : $H_0 : \frac{n_{mc}}{n_c} = \frac{n_m}{n}$ versus $H_1$ : category m abnormally overrepresented in class c. Compute $P_{H_0}(N_{mc} \ge n_{mc})$, where under $H_0$ : $\mathcal{L}(N_{mc}) = \mathcal{H}(n_c, \frac{n_m}{n}, n)$.

Cluster 3, Area = South :
Cla/Mod = 4/5 × 100 = 80    Mod/Cla = 4/4 × 100 = 100    Global = 5/23 × 100 = 21.74
p.value = $P_{\mathcal{H}(4,\, 5/23,\, 23)}(N_{mc} \ge 4)$ = 0.000564    v.test = 3.448

=⇒ $H_0$ is rejected : South is overrepresented in the 3rd class. Sort the categories by increasing p-value.
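This p-value is a hypergeometric tail probability, directly available in base R :

# P(Nmc >= 4) when drawing nc = 4 cities (cluster 3) among 5 South and 18 others
1 - phyper(3, m = 5, n = 18, k = 4)   # 0.000564..., as in the output above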
Characterizing classes using factor axes

Factor axes are also quantitative variables :

$`1`
        v.test  Mean in category  Overall mean  sd in category  Overall sd   p.value
Dim.1    -3.32             -3.37             0            0.85        3.15  0.000908

$`2`
        v.test  Mean in category  Overall mean  sd in category  Overall sd   p.value
Dim.3    -2.41             -0.18             0            0.22        0.36  0.015776

$`3`
        v.test  Mean in category  Overall mean  sd in category  Overall sd   p.value
Dim.1     3.86              5.66             0            1.26        3.15  0.000112
Conclusions

• Clustering is done on tables of individuals × quantitative variables ⇒ MCA can transform qualitative variables into quantitative ones
• Hierarchical clustering gives a hierarchical tree ⇒ suggests a number of classes
• K-means can be used to make the classes more robust
• Characterize the classes using the active and supplementary variables, quantitative or qualitative
More

Husson F., Lê S. & Pagès J. (2017). Exploratory Multivariate Analysis by Example Using R, 2nd edition, 230 p., Chapman & Hall/CRC Computer Science & Data Analysis Series.

The FactoMineR package for performing clustering : http://factominer.free.fr/index_fr.html

Movies on Youtube :
• a Youtube channel : youtube.com/HussonFrancois
• a playlist with 11 movies in English
• a playlist with 17 movies in French