SFDS 2008 slides about FactoMineR

An R package for exploratory data analysis, for teaching and research
François Husson, Julie Josse & Sébastien Lê

Why?

• To perform exploratory multivariate data analysis with free software
• To be able to propose new methods (taking into account different structures on the data)
• To provide a user-friendly package oriented toward practitioners (a very easy GUI)

1 – The classical methods
• The implemented methods share the same main objective: to summarize and simplify the data by reducing the dimensionality of the dataset
• Continuous variables: Principal Component Analysis
• Contingency table: Correspondence Analysis
• Categorical variables: Multiple Correspondence Analysis
• Continuous and categorical variables: Mixed Data Analysis

PCA example

Data: performances of 41 athletes in two decathlon meetings (excerpt: the top five athletes of each meeting)

           100m  Long.jump  Shot.put  High.jump   400m  110m.hurdle  Discus  Pole.vault  Javeline   1500m  Rank  Points  Competition
SEBRLE    11.04       7.58     14.83       2.07  49.81        14.69   43.75        5.02     63.19  291.70     1    8217  Decastar
CLAY      10.76       7.40     14.26       1.86  49.37        14.05   50.72        4.92     60.15  301.50     2    8122  Decastar
KARPOV    11.02       7.30     14.77       2.04  48.37        14.09   48.95        4.92     50.31  300.20     3    8099  Decastar
BERNARD   11.02       7.23     14.25       1.92  48.93        14.99   40.87        5.32     62.77  280.10     4    8067  Decastar
YURKOV    11.34       7.09     15.19       2.10  50.42        15.31   46.26        4.72     63.44  276.40     5    8036  Decastar
...
Sebrle    10.85       7.84     16.36       2.12  48.36        14.05   48.72        5.00     70.52  280.01     1    8893  OlympicG
Clay      10.44       7.96     15.23       2.06  49.19        14.13   50.11        4.90     69.71  282.00     2    8820  OlympicG
Karpov    10.50       7.81     15.93       2.09  46.81        13.97   51.65        4.60     55.54  278.11     3    8725  OlympicG
Macey     10.89       7.47     15.73       2.15  48.97        14.56   48.34        4.40     58.46  265.42     4    8414  OlympicG
Warners   10.62       7.74     14.48       1.97  47.97        14.01   43.73        4.90     55.39  278.05     5    8343  OlympicG
...
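This is a minimal sketch of a standardized PCA on the ten-athlete excerpt above. It is written in Python with NumPy as an illustration; FactoMineR itself is an R package. The sketch treats the ten events as active variables and leaves out Rank, Points and Competition. It centres and scales each column using the 1/n convention, then takes an SVD. It returns the eigenvalues, the percentage of inertia of each dimension and the coordinates of the individuals. Only 10 of the 41 athletes are used, so the percentages will not match the 32.72% and 17.37% shown on the slides.

```python
import numpy as np

# Active variables of the excerpt: 100m, Long.jump, Shot.put, High.jump, 400m,
# 110m.hurdle, Discus, Pole.vault, Javeline, 1500m (one row per athlete).
X = np.array([
    [11.04, 7.58, 14.83, 2.07, 49.81, 14.69, 43.75, 5.02, 63.19, 291.70],  # SEBRLE
    [10.76, 7.40, 14.26, 1.86, 49.37, 14.05, 50.72, 4.92, 60.15, 301.50],  # CLAY
    [11.02, 7.30, 14.77, 2.04, 48.37, 14.09, 48.95, 4.92, 50.31, 300.20],  # KARPOV
    [11.02, 7.23, 14.25, 1.92, 48.93, 14.99, 40.87, 5.32, 62.77, 280.10],  # BERNARD
    [11.34, 7.09, 15.19, 2.10, 50.42, 15.31, 46.26, 4.72, 63.44, 276.40],  # YURKOV
    [10.85, 7.84, 16.36, 2.12, 48.36, 14.05, 48.72, 5.00, 70.52, 280.01],  # Sebrle
    [10.44, 7.96, 15.23, 2.06, 49.19, 14.13, 50.11, 4.90, 69.71, 282.00],  # Clay
    [10.50, 7.81, 15.93, 2.09, 46.81, 13.97, 51.65, 4.60, 55.54, 278.11],  # Karpov
    [10.89, 7.47, 15.73, 2.15, 48.97, 14.56, 48.34, 4.40, 58.46, 265.42],  # Macey
    [10.62, 7.74, 14.48, 1.97, 47.97, 14.01, 43.73, 4.90, 55.39, 278.05],  # Warners
])

def standardized_pca(X):
    """PCA of centred, unit-variance columns via SVD (1/n weights on individuals)."""
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # np.std uses 1/n by default
    _, s, Vt = np.linalg.svd(Z / np.sqrt(n), full_matrices=False)
    eig = s ** 2                  # eigenvalues of the correlation matrix
    pct = 100 * eig / eig.sum()   # percentage of inertia of each dimension
    coords = Z @ Vt.T             # coordinates of the individuals
    return eig, pct, coords

eig, pct, coords = standardized_pca(X)
print(np.round(pct[:2], 2))  # inertia of dimensions 1 and 2 on this excerpt only
```

The eigenvalues sum to 10, the number of standardized variables. This is why each percentage reads as a share of the total variance.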

PCA example
• Introduction of supplementary continuous variables
• Graphs enriched by representing the variables according to their quality of representation
• Indicators: contribution, quality of representation

[Figure: Variables factor map (PCA), Dimension 1 (32.72%) vs Dimension 2 (17.37%), including Rank and Points]
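The indicators listed above can be sketched with the usual PCA formulas, along with the placement of a supplementary continuous variable. The sketch uses Python with NumPy on hypothetical random data (41 individuals, 10 variables). The formulas are:
• The contribution of individual i to dimension s is F_is² / (n λ_s).
• Its quality of representation (cos²) is F_is² divided by its squared distance to the centre of gravity.
• An active variable's coordinate is its correlation with the dimension.
• A supplementary variable is placed the same way but never enters the axes.

All variable names are illustrative, not FactoMineR's.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 41 individuals, 10 active continuous variables, and one
# supplementary continuous variable that is NOT used to build the axes.
X = rng.normal(size=(41, 10))
x_sup = X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=41)

n = X.shape[0]
Z = (X - X.mean(axis=0)) / X.std(axis=0)
_, s, Vt = np.linalg.svd(Z / np.sqrt(n), full_matrices=False)
eig = s ** 2
F = Z @ Vt.T  # coordinates of the individuals

# Individuals: contribution (%) to each dimension and quality of representation.
contrib = 100 * F ** 2 / (n * eig)
cos2 = F ** 2 / (Z ** 2).sum(axis=1, keepdims=True)

# Active variables: coordinates are correlations with the dimensions,
# and their quality of representation is the squared correlation.
var_coord = Vt.T * np.sqrt(eig)
var_cos2 = var_coord ** 2

# Supplementary variable: correlation with the coordinates on dimensions 1 and 2.
sup_coord = np.array([np.corrcoef(x_sup, F[:, k])[0, 1] for k in range(2)])
print(np.round(sup_coord, 2))
```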

PCA example
• Introduction of supplementary information:
  – supplementary individuals
  – supplementary categorical variables
• Graphs enriched by:
  – coloring according to supplementary information
  – confidence ellipses around the categories
• Indicators: contribution, quality of representation

[Figure: Individuals factor map (PCA), Dimension 1 (32.72%) vs Dimension 2 (17.37%), individuals colored by competition (Decastar, OlympicG)]
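Supplementary individuals and supplementary categories can be sketched as follows, in Python with NumPy on hypothetical data. A supplementary individual is standardized with the means and standard deviations of the active individuals, then projected onto the axes. A category of a supplementary categorical variable sits at the barycentre of the individuals that take it. The confidence ellipses are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 40 active individuals, 2 supplementary individuals and a
# supplementary categorical variable (15 "Decastar", 25 "OlympicG").
X_act = rng.normal(size=(40, 5))
X_sup = rng.normal(size=(2, 5))
competition = np.array(["Decastar"] * 15 + ["OlympicG"] * 25)

mean, sd = X_act.mean(axis=0), X_act.std(axis=0)
Z = (X_act - mean) / sd
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
F_act = Z @ Vt.T

# Supplementary individuals: standardized with the ACTIVE means and standard
# deviations, then projected; they play no role in building the axes.
F_sup = ((X_sup - mean) / sd) @ Vt.T

# Supplementary category: barycentre of the individuals having that category.
cat_coord = {str(c): F_act[competition == c].mean(axis=0) for c in np.unique(competition)}
print({c: np.round(v[:2], 2) for c, v in cat_coord.items()})
```

The active coordinates are centred. As a result, the category barycentres, weighted by category size, average to the origin.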

PCA example

[Figure: Individuals factor map (PCA), Dimension 1 (32.71%) vs Dimension 2 (17.37%), with a "Nb points" legend]

PCA example

[Figure: Individuals factor map (PCA), Dimension 1 (32.71%) vs Dimension 2 (17.37%), with a Pole.vault legend]

Description of the dimensions
• By the quantitative variables:
  – the correlation between each variable and the coordinates of the individuals on dimension s is calculated
  – the correlation coefficients are sorted
  – only the significant correlations are given

Best variables to describe the first dimension:

$Dim.1
$Dim.1$quanti
            Dim.1
Points       0.96
Long.jump    0.74
Shot.put     0.62
Rank        -0.67
400m        -0.68
110m.hurdle -0.75
100m        -0.77

$Dim.2
$Dim.2$quanti
         Dim.2
Discus    0.61
Shot.put  0.60

Significance level = 0.05
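The three steps above can be sketched as follows: correlate, keep the significant variables, sort. The sketch is in Python on hypothetical data. One assumption differs from the slides: the p-value uses the Fisher z normal approximation, which is not necessarily the test FactoMineR applies. The function name describe_dimension is hypothetical.

```python
import math
import numpy as np

def describe_dimension(coord, variables, alpha=0.05):
    """Hypothetical helper: correlate each variable with the coordinates on one
    dimension, keep the significant ones, sort by decreasing correlation."""
    n = len(coord)
    kept = []
    for name, x in variables.items():
        r = float(np.corrcoef(x, coord)[0, 1])
        r = min(max(r, -1 + 1e-12), 1 - 1e-12)  # keep atanh finite
        z = math.atanh(r) * math.sqrt(n - 3)     # Fisher z statistic (approximation)
        p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal p-value
        if p < alpha:
            kept.append((name, round(r, 2), p))
    return sorted(kept, key=lambda row: row[1], reverse=True)

rng = np.random.default_rng(2)
coord = rng.normal(size=41)  # hypothetical coordinates on one dimension
variables = {
    "related": coord + rng.normal(scale=0.3, size=41),
    "anti_related": -coord + rng.normal(scale=0.5, size=41),
    "noise": rng.normal(size=41),
}
for name, r, p in describe_dimension(coord, variables):
    print(f"{name:14s} {r:5.2f}  p={p:.2g}")
```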

Description of the dimensions
• By the qualitative variables:
  – perform a one-way analysis of variance of the coordinates of the individuals on the dimension, explained by the qualitative variable
  – an F-test for each variable
  – for each category, a Student's t-test to compare the mean of the category with the overall mean

$Dim.1$quali
            P-value
Competition   0.155

$Dim.1$category
         Estimate P-value
OlympicG   0.4393   0.155
Decastar  -0.4393   0.155

Significance level = 0.2
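The one-way analysis of variance above can be sketched in Python with NumPy. The function name describe_by_category is hypothetical. The sketch assumes the "Estimate" column uses sum-to-zero contrasts, so each estimate is the category mean minus the mean of the category means. This assumption is consistent with the symmetric ±0.4393 values for two categories. P-values are omitted to keep the sketch short.

```python
import numpy as np

def describe_by_category(coord, groups):
    """Hypothetical helper: F statistic of a one-way ANOVA of the coordinates
    on a categorical variable, plus one estimate per category."""
    coord = np.asarray(coord, dtype=float)
    groups = np.asarray(groups)
    cats = np.unique(groups)
    n, k = len(coord), len(cats)
    means = {c: coord[groups == c].mean() for c in cats}
    grand = coord.mean()
    # Between- and within-category sums of squares.
    ss_between = sum((groups == c).sum() * (means[c] - grand) ** 2 for c in cats)
    ss_within = sum(((coord[groups == c] - means[c]) ** 2).sum() for c in cats)
    F = (ss_between / (k - 1)) / (ss_within / (n - k))
    # Sum-to-zero contrasts (assumption): category mean minus mean of category means.
    centre = np.mean([means[c] for c in cats])
    estimates = {str(c): float(means[c] - centre) for c in cats}
    return float(F), estimates

F, estimates = describe_by_category([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                                    ["A", "A", "A", "B", "B", "B"])
print(F, estimates)  # 13.5 {'A': -1.5, 'B': 1.5}
```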

2 – Structure on the data
Different structures on the data are taken into account:
• a partition on the variables: several sets of variables are studied simultaneously: Multiple Factor Analysis, Generalized Procrustes Analysis
• a hierarchy on the variables: variables are grouped and subgrouped (as in questionnaires structured in topics and subtopics): Hierarchical Multiple Factor Analysis
• a partition on the individuals: several sets of individuals described by the same variables: Dual Multiple Factor Analysis

Groups of variables (MFA)
The groups of variables are quantitative and/or qualitative.
Objectives:
• study the link between the sets of variables
• balance the influence of each group of variables
• give the classical graphs, but also specific graphs: groups of variables, partial representation
Examples:
• Genomics: DNA, protein
• Sensory analysis: sensory, physico-chemical
• Comparison of codings (quantitative / qualitative)
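The balancing objective can be sketched in Python with NumPy, under the standard MFA weighting. Each group of standardized continuous variables is divided by the square root of the first eigenvalue of its own PCA, so every group's first eigenvalue becomes 1 before the global PCA. The data and group names are hypothetical.

```python
import numpy as np

def first_eigenvalue(Z):
    """Largest eigenvalue of the PCA of a centred block (1/n weights)."""
    return np.linalg.svd(Z / np.sqrt(Z.shape[0]), compute_uv=False)[0] ** 2

def mfa_weighting(blocks):
    """Standardize each group and rescale it so its first eigenvalue is 1."""
    weighted = []
    for X in blocks:
        Z = (X - X.mean(axis=0)) / X.std(axis=0)
        weighted.append(Z / np.sqrt(first_eigenvalue(Z)))
    return np.hstack(weighted)

rng = np.random.default_rng(3)
latent = rng.normal(size=(30, 1))
sensory = latent + 0.1 * rng.normal(size=(30, 8))  # hypothetical: 8 redundant variables
chemical = rng.normal(size=(30, 3))                 # hypothetical: 3 unrelated variables
W = mfa_weighting([sensory, chemical])
global_eig = np.linalg.svd(W / np.sqrt(W.shape[0]), compute_uv=False) ** 2
print(np.round(global_eig[:3], 3))
```

Without the weighting, the eight nearly redundant variables would have a first eigenvalue close to 8 and would dominate the first axis. With it, the first global eigenvalue lies between 1 and the number of groups.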

Hierarchy on the variables (HMFA)

Two levels for the hierarchy: the first contains $L$ groups, group $l$ contains $J_l$ subgroups, and subgroup $j$ contains $K_j$ variables.

Objective: to balance the groups and the subgroups of variables.
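Balancing at both levels can be sketched by applying the MFA weighting recursively, in Python with NumPy. This assumes the HMFA weighting is MFA-style at each level of the hierarchy. Each subgroup is rescaled so its first eigenvalue is 1. The subgroups are then concatenated into their group, and the group is rescaled the same way. The hierarchy and data below are hypothetical.

```python
import numpy as np

def first_eigenvalue(Z):
    """Largest eigenvalue of the PCA of a centred block (1/n weights)."""
    return np.linalg.svd(Z / np.sqrt(Z.shape[0]), compute_uv=False)[0] ** 2

def balance(node):
    """A leaf is a 2-D array of raw continuous variables; an inner node is a
    list of children. Every node is rescaled to a first eigenvalue of 1."""
    if isinstance(node, np.ndarray):
        Z = (node - node.mean(axis=0)) / node.std(axis=0)
    else:
        Z = np.hstack([balance(child) for child in node])
    return Z / np.sqrt(first_eigenvalue(Z))

rng = np.random.default_rng(4)
n = 25
# Hypothetical hierarchy: group 1 = two subgroups (4 and 2 variables),
# group 2 = one subgroup (3 variables).
hierarchy = [[rng.normal(size=(n, 4)), rng.normal(size=(n, 2))],
             [rng.normal(size=(n, 3))]]
H = balance(hierarchy)
print(H.shape)
```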

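Partition on the individuals (DMFA)

[Diagram: a data table with K variables (columns 1, ..., k, ..., K) whose individuals are split into J groups, from Group 1 (individuals 1, ..., $I_1$) to Group J (individuals 1, ..., $I_J$); $x_{ik}$ is the value of variable k for individual i]

Objective: to compare the covariance matrices of the groups.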
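Here is a sketch of the objects being compared, in Python with NumPy: the covariance matrix of the variables within each group of individuals. This is not DMFA itself. To give one concrete comparison, the sketch substitutes a simple measure: the cosine between two matrices under the Frobenius inner product, an RV-like coefficient. The data and group labels are hypothetical.

```python
import numpy as np

def group_covariances(X, groups):
    """Covariance matrix (1/n_j convention) of the variables within each group,
    each group being centred on its own mean."""
    out = {}
    for g in np.unique(groups):
        Xg = X[groups == g]
        C = Xg - Xg.mean(axis=0)
        out[str(g)] = C.T @ C / len(Xg)
    return out

def matrix_cosine(S1, S2):
    """tr(S1 S2) / sqrt(tr(S1^2) tr(S2^2)); equals 1 for proportional matrices."""
    return np.trace(S1 @ S2) / np.sqrt(np.trace(S1 @ S1) * np.trace(S2 @ S2))

rng = np.random.default_rng(5)
mix = np.array([[1.0, 0.8, 0.0],
                [0.0, 0.6, 0.0],
                [0.0, 0.0, 1.0]])
X = np.vstack([rng.normal(size=(20, 3)),         # hypothetical group "a"
               rng.normal(size=(30, 3)) @ mix])  # hypothetical group "b"
groups = np.array(["a"] * 20 + ["b"] * 30)
covs = group_covariances(X, groups)
print(round(float(matrix_cosine(covs["a"], covs["b"])), 3))
```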

3 – Graphical User Interface

[Screenshots: menu of the FactoMineR GUI; main window of the PCA; graphical options]

4 – Conclusion
• For researchers, practitioners and students: classical and advanced methods
• The FactoMineR package is available on CRAN
• The GUI can be loaded with a single command:
source("http://factominer.free.fr/install-facto.r")
• A website is dedicated to the package: http://factominer.free.fr
• Future work: dynamic graphs
• Perspectives: useR! 2008 (2 tutorials), useR! 2009 in Rennes

MFA example: representation of the individuals

[Figure: individuals factor map (1VAU, 2ING, 3EL, T1, T2, ...), Dim 1 (49.38%) vs Dim 2 (19.49%)]

MFA example: correlation circle representation of the variables

[Figure: correlation circle of the sensory variables (Spice, Bitterness, Acidity, Harmony, Overall.quality, ...), Dim 1 (49.38%) vs Dim 2 (19.49%); legend: odor, visual, odor.after.shaking, taste, overall]

MFA example: representation of the groups

[Figure: groups origin, odor, odor.after.shaking, overall, taste and visual plotted on Dim 1 (49.38%) vs Dim 2 (19.49%), both axes from 0 to 1]

MFA example: representation of the partial points

[Figure: Individual factor map of the partial points, Dim 1 (49.38%) vs Dim 2 (19.49%); partial points for the groups odor, visual, odor.after.shaking, taste]

MFA example: representation of the partial points

[Figure: partial points of the categories Saumur, Bourgueuil, Chinon, Reference, Env1, Env2 and Env4 for the groups olf, vis, olfag, gust; Dim 1 (49.38%) vs Dim 2 (19.49%)]

Unsupervised classification

[Figure: Cluster Dendrogram for Solution HClust.2 (observations of data set don, vertical axis Height), cut into 4 classes]
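The slide does not state which linkage produced the HClust.2 dendrogram. As one common choice, this is a naive sketch of agglomerative clustering with Ward's criterion, in Python with NumPy, stopped at 4 classes. At each step it merges the two clusters whose union increases the within-cluster inertia the least. For clusters a and b, the increase is n_a n_b / (n_a + n_b) times the squared distance between their centres of gravity. The input coordinates are hypothetical. The loop is quadratic in the number of clusters per merge, so it only suits small data sets such as the 21 wines.

```python
import numpy as np

def ward_clusters(F, n_classes):
    """Naive agglomerative clustering with Ward's criterion; returns labels."""
    clusters = [[i] for i in range(len(F))]
    while len(clusters) > n_classes:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ia, ib = clusters[a], clusters[b]
                ga, gb = F[ia].mean(axis=0), F[ib].mean(axis=0)
                # Increase in within-cluster inertia if a and b are merged.
                cost = len(ia) * len(ib) / (len(ia) + len(ib)) * np.sum((ga - gb) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(F), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

rng = np.random.default_rng(6)
centres = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
F = np.vstack([c + 0.1 * rng.normal(size=(5, 2)) for c in centres])  # hypothetical coordinates
labels = ward_clusters(F, 4)
print(labels)
```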

MFA example: representation of the individuals

[Figure: individuals factor map colored by class (classe1 to classe4), Dim 1 (49.38%) vs Dim 2 (19.49%)]

Prefmap-PLS graph between Rank and Points

[Figure: decathlon variables (X100m, Long.jump, Shot.put, High.jump, X400m, X110m.hurdle, Discus, Pole.vault, Javeline, X1500m) plotted against Rank and Points; correlation between Rank and Points: -0.7392]

MFA example: representation of the variables

[Figure: Sepal.Length, Sepal.Width, Petal.Length and Petal.Width, Dimension 1 (61.49%) vs Dimension 2 (16.46%); legend: setosa, versicolor, virginica, var]

Projection of the groups

[Figure: groups setosa, versicolor and virginica plotted on Dim 1 vs Dim 2]

Biplot between axes 1 and 2 for group versicolor

[Figure: Sepal.Length, Sepal.Width, Petal.Length and Petal.Width on Dim.1 vs Dim.2; correlation between Dim.1 and Dim.2: 0.09613]

Individual factor map

[Figure: individuals in the classes A, GBM, O and OA, Dim 1 (20.39%) vs Dim 2 (13.29%)]

Individual factor map

[Figure: classes A, GBM, O and OA with partial points for the groups CGH and expr, Dim 1 (20.39%) vs Dim 2 (13.29%)]

Individual factor map

[Figure: classes A, GBM, O and OA, Dim 1 (20.39%) vs Dim 2 (13.29%)]

[Figure: scatter plot of the decathlon athletes, X100m vs Long.jump]

The FactoMineR team is almost always ready to improve the package.