Extensions - FactoMineR

Extensions

• Different structures on the data
• Mixed data (continuous and categorical variables)
• Missing values
• Graphical User Interface

Structure on the data

Different structures on the data are proposed:
- a partition on the variables (several sets of variables are studied simultaneously): Multiple Factor Analysis
- a hierarchy on the variables (variables are grouped and subgrouped, as in questionnaires structured in topics and subtopics): Hierarchical Multiple Factor Analysis
- a partition on the individuals (several sets of individuals described by the same variables): Dual Multiple Factor Analysis

Groups of variables (MFA)

Groups of variables can be quantitative and/or qualitative.

Objectives:
- study the link between the sets of variables
- balance the influence of each group of variables
- give the classical graphs but also specific graphs: groups of variables, partial representation

Examples:
- Genomics: DNA, protein
- Sensory analysis: sensory, physico-chemical
- Comparison of codings (quantitative / qualitative)
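The objective "balance the influence of each group of variables" can be sketched with base R alone. The sketch below assumes the usual MFA weighting, which the slides do not spell out: each standardised group is divided by its first singular value before a global PCA. The data, the group sizes and the helper name `balance_group` are hypothetical, and this is not the FactoMineR implementation.

```r
# Sketch of MFA-style group balancing with base R only (no FactoMineR call).
set.seed(1)
n  <- 20
# Two hypothetical groups of quantitative variables, same n individuals
g1 <- matrix(rnorm(n * 6), n, 6)   # group 1: 6 variables
g2 <- matrix(rnorm(n * 2), n, 2)   # group 2: 2 variables

# Standardise a group, then divide it by its first singular value so that
# the main axis of inertia of every group has the same size (equal to 1)
balance_group <- function(x) {
  z <- scale(x) / sqrt(nrow(x) - 1)
  z / svd(z)$d[1]
}

b1 <- balance_group(g1)
b2 <- balance_group(g2)

# Global analysis: PCA (via SVD) of the balanced groups placed side by side
balanced <- cbind(b1, b2)
global   <- svd(balanced)
coord    <- balanced %*% global$v   # coordinates of the individuals
eig      <- global$d^2              # eigenvalues of the global analysis
```

With this weighting, the first global eigenvalue lies between 1 and the number of groups, so a group cannot dominate the first axis simply because it contains more variables.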

MFA example: representation of the individuals

[Figure: individual factor map. Axes: Dim 1 (49.38 %) and Dim 2 (19.49 %). Individuals are labelled with codes such as 1VAU, 1CHA, 2ING, T1 and T2.]

MFA example: representation of the variables

[Figure: correlation circle. Axes: Dim 1 (49.38 %) and Dim 2 (19.49 %). Legend (groups of variables): odor, visual, odor.after.shaking, taste, overall. Plotted variables include Spice, Bitterness, Acidity, Astringency, Harmony, Fruity, Typical and Overall.quality.]

MFA example: representation of the groups

[Figure: groups representation. Axes: Dim 1 (49.38 %) and Dim 2 (19.49 %), both ranging from 0 to 1. Groups shown: origin, odor, odor.after.shaking, overall, taste, visual.]

MFA example: representation of the partial points

[Figure: individual factor map with partial points. Axes: Dim 1 (49.38 %) and Dim 2 (19.49 %). Legend (groups): odor, visual, odor.after.shaking, taste.]

Hierarchy on the variables (HMFA)

Two levels for the hierarchy: the first level contains L groups, each group l contains J_l subgroups, and each subgroup j has K_j variables.

Objective: to balance the groups and the subgroups of variables

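Partition on the individuals (DMFA)

[Diagram: data table with the K variables in columns (k = 1, ..., K) and the individuals in rows, partitioned into J groups (Group 1 with I1 individuals, ..., Group J with IJ individuals); xik is the value of variable k for individual i.]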

Objective: to compare the covariance matrices
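As a rough illustration of the objects being compared, the sketch below computes one covariance matrix per group of individuals with base R. It is not DMFA itself, only the raw per-group matrices; the `iris` data set (three groups described by the same four variables) is an assumption chosen because it ships with R.

```r
# One covariance matrix per group of individuals (base R only).
# Assumption: iris stands in for "several sets of individuals described
# by the same variables"; Species defines the partition.
x      <- iris[, 1:4]
groups <- iris$Species

# Split the individuals by group and compute each group's covariance matrix
cov_by_group <- lapply(split(x, groups), cov)

# Example comparison: within-group correlation between two variables
within_cor <- sapply(cov_by_group, function(s) {
  cov2cor(s)["Sepal.Length", "Petal.Length"]
})
```

Printing `cov_by_group` or `within_cor` shows how the link between the same pair of variables differs from one group to the next, which is the kind of difference the method is meant to summarise.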


Mixed data analysis

Objectives:
- study continuous and categorical variables simultaneously
- balance the influence of each variable

Outputs:
- representation of the individuals and the categories
- representation of the correlation circle
- representation of all the variables

Remark: this method gives the same results as PCA for continuous variables and the same results as MCA for categorical variables
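The remark can be made concrete with a minimal base R sketch. It assumes the usual coding for mixed data, which the slides do not detail: continuous variables are standardised as in a normed PCA, and each category indicator is divided by the square root of its proportion and centred, which is the coding that makes a PCA behave like MCA (up to a constant factor on the eigenvalues). The function name `famd_sketch` is hypothetical and this is not the FactoMineR code.

```r
# Sketch of the variable coding behind factor analysis of mixed data.
famd_sketch <- function(df) {
  n <- nrow(df)
  code_one <- function(v) {
    if (is.numeric(v)) {
      # Continuous variable: centre and scale to unit variance (normed PCA)
      z <- v - mean(v)
      matrix(z / sqrt(mean(z^2)), ncol = 1)
    } else {
      # Categorical variable: indicator columns divided by sqrt(proportion),
      # then centred (the column mean of ind / sqrt(p) is sqrt(p))
      f   <- factor(v)
      ind <- sapply(levels(f), function(l) as.numeric(f == l))
      p   <- colMeans(ind)
      sweep(sweep(ind, 2, sqrt(p), "/"), 2, sqrt(p), "-")
    }
  }
  z <- do.call(cbind, lapply(df, code_one))
  s <- svd(z / sqrt(n))
  list(eig = s$d^2, coord = z %*% s$v)
}

res_mixed <- famd_sketch(iris)          # 4 continuous + 1 categorical variable
res_cont  <- famd_sketch(iris[, 1:4])   # continuous variables only
```

With continuous variables only, `res_cont$eig` matches the eigenvalues of the correlation matrix, as in a normed PCA. With this coding each continuous variable contributes an inertia of 1 and a categorical variable with K categories contributes K - 1.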

Mixed data analysis

[Figure: four panels titled "Individual factor map", "Correlation circle", "Variables representation" and "Groups representation". Axes: Dim 1 (35.39 %) and Dim 2 (19.62 %). Labels shown include the individuals (numbered 1 to 21), the categories Saumur, Bourgueuil, Chinon, Reference, Env1, Env2 and Env4, and the variables Typical, Overall.quality, Soil and Label.]

Missing values

- Delete individuals with missing values (not a good idea!)
- Replace missing values by the mean of the variable (for PCA)
- Use an imputation method
- Use specific algorithms: NIPALS, EM algorithm

The principle of the EM algorithm is:
- perform the factorial analysis
- complete the data using the factorial results
- alternate these two steps until convergence
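The alternating steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the algorithm of any particular package: the factorial analysis is an unscaled PCA computed by SVD, the starting point is mean imputation, and the function name `impute_pca`, the number of dimensions `ncp` and the simulated data are all hypothetical.

```r
# Sketch of iterative (EM-type) PCA imputation, base R only.
impute_pca <- function(x, ncp = 2, tol = 1e-6, maxit = 1000) {
  miss <- is.na(x)
  # Initialisation: replace each missing value by the mean of its variable
  xhat <- x
  col_means  <- colMeans(x, na.rm = TRUE)
  xhat[miss] <- col_means[col(x)[miss]]
  k <- seq_len(ncp)
  for (it in seq_len(maxit)) {
    # Step 1: factorial analysis (PCA via SVD) of the completed table
    m   <- colMeans(xhat)
    s   <- svd(sweep(xhat, 2, m))
    fit <- s$u[, k, drop = FALSE] %*% (s$d[k] * t(s$v[, k, drop = FALSE]))
    fit <- sweep(fit, 2, m, "+")
    # Step 2: complete the data, overwriting only the missing cells
    xnew <- xhat
    xnew[miss] <- fit[miss]
    # Stop when the imputed values no longer change
    change <- sum((xnew - xhat)^2)
    xhat   <- xnew
    if (change < tol) break
  }
  xhat
}

# Hypothetical data: rank-2 structure plus a little noise, 10 % of cells missing
set.seed(42)
n      <- 50
scores <- matrix(rnorm(n * 2), n, 2)
loads  <- matrix(rnorm(2 * 5), 2, 5)
full   <- scores %*% loads + matrix(rnorm(n * 5, sd = 0.05), n, 5)
x      <- full
x[sample(length(x), 25)] <- NA

completed <- impute_pca(x, ncp = 2)
```

Observed cells are never modified; only the missing cells are updated at each iteration. In practice the number of dimensions has to be chosen with care, since keeping too many dimensions tends to overfit the observed values.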

Graphical User Interface

The GUI can be loaded simply with: source("http://factominer.free.fr/install-facto.r")

[Figure: menu of the FactoMineR GUI]

[Figure: main window of the PCA]

[Figure: graphical options]

Bibliography

Greenacre M (1984). Theory and Applications of Correspondence Analysis. Academic Press, London, England.
Greenacre M (2007). Correspondence Analysis in Practice. 2nd edition. Chapman & Hall/CRC, Boca Raton, FL.
Greenacre M, Blasius J (2006). Multiple Correspondence Analysis and Related Methods. Chapman & Hall/CRC, Boca Raton, FL.
Escofier B, Pagès J (1988, 1990, 1993, 1998). Analyses factorielles simples et multiples. Dunod.
Escofier B, Pagès J (1994). Multiple Factor Analysis (AFMULT package). Computational Statistics & Data Analysis, 18, 121-140.
Jambu M (1991). Exploratory and Multivariate Data. Elsevier.
Jolliffe I (2002). Principal Component Analysis. 2nd edition. Springer.
Tenenhaus M, Young FW (1985). "An Analysis and Synthesis of Multiple Correspondence Analysis, Optimal Scaling, Dual Scaling, Homogeneity Analysis and Other Methods for Quantifying Categorical Multivariate Data." Psychometrika, 50(1), 91-119.
Bécue M, Pagès J (2003). A principal axes method for comparing contingency tables: MFACT. Computational Statistics and Data Analysis, 45(3), 481-503.

Bibliography: packages

Chessel D, Dufour AB, Thioulouse J (2004). "The ade4 package I: One-table methods." R News, 4/1, 5-10.
de Leeuw J, Mair P (2008). "Homogeneity Analysis in R: The package homals." Journal of Statistical Software.
Dray S, Dufour AB, Chessel D (2007). "The ade4 package II: Two-table and K-table methods." R News, 7/2, 47-52.
Lê S, Josse J, Husson F (2008). "FactoMineR: An R package for multivariate analysis." Journal of Statistical Software, 25(1), 1-18.
Nenadić O, Greenacre M (2006). "Correspondence analysis in R, with two- and three-dimensional graphics: The ca package." Journal of Statistical Software, 20(3), 1-13.