FAST, a novel “Factorial Approach” for Sorting Task ... - Marine Cadoret

Analyzing categorization data

Marine Cadoret, Sébastien Lê, Jérôme Pagès AGROCAMPUS OUEST, France

International Federation of Classification Societies March 15th 2009, Dresden

Introduction

Categorization consists in grouping objects according to their resemblances. After this task, subjects can also be asked to describe their groups in words (“qualified” categorization).

Data

98 consumers carried out a “qualified” categorization on 12 luxury perfumes:

Angel
Shalimar
Lolita Lempicka
L'instant
Cinéma
Aromatics Elixir
Chanel n°5
Coco Mademoiselle
J'adore (ET)
J'adore (EP)
Pure Poison
Pleasures

« gourmand, vanilla, wooded »

« spicy, aldehyde »

« white flower, vanilla, orange »

« oriental, showy, wooded, Patchouli oil »

« flower, floral, green »


Data table (1)

                       1   2   3   4   5   6   7   8   9  10  11  12
 1 Shalimar           98  42  30  21   9  10  13  11   9   6   6   7
 2 Aromatics Elixir   42  98  51  27   6   8  13  12  12  11  12   7
 3 Chanel n°5         30  51  98  15   8   9  10  21  11  14  12  14
 4 Angel              21  27  15  98  36  18  14  10  10  11  11  12
 5 Lolita Lempicka     9   6   8  36  98  42  22  18  21  18  18  18
 6 Cinéma             10   8   9  18  42  98  26  28  30  22  23  24
 7 L'instant          13  13  10  14  22  26  98  25  20  23  28  22
 8 Pure Poison        11  12  21  10  18  28  25  98  33  30  29  28
 9 Coco Mademoiselle   9  12  11  10  21  30  20  33  98  28  28  38
10 Pleasures           6  11  14  11  18  22  23  30  28  98  38  48
11 J'adore (EP)        6  12  12  11  18  23  28  29  28  38  98  56
12 J'adore (ET)        7   7  14  12  18  24  22  28  38  48  56  98

Entry (i, j) is the number of consumers, out of 98, who put perfumes i and j in the same group.

Such data are usually gathered in a co-occurrence (or dissimilarity) matrix and analyzed by non-metric MDS.

Data table (2)

product            judge 12  judge 13  judge 14  judge 15  judge 16
Angel                     1         4         1         5         2
Aromatics Elixir          3         3         5         2         1
Chanel n°5                4         3         4         1         3
Cinéma                    2         5         6         4         2
Coco Mademoiselle         1         5         2         4         3
J'adore (EP)              1         6         2         3         3
J'adore (ET)              2         6         2         3         3
L'instant                 1         4         6         2         4
Lolita Lempicka           1         5         1         5         2
Pleasures                 3         4         6         3         4
Pure Poison               1         1         2         4         4
Shalimar                  2         2         3         2         1

Each consumer can also be considered as a categorical variable: a column gives, for one judge, the number of the group in which each product was placed.
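The group-number table and the co-occurrence matrix of Data table (1) are two layouts of the same categorization data. The slides contain no code, so the following is only a minimal Python sketch of how one layout leads to the other; the helper name `cooccurrence` is hypothetical. It uses the five judges shown in Data table (2), so its counts are out of 5 and cannot reproduce the 98-consumer matrix.

```python
from itertools import combinations

# Group numbers from Data table (2): one list per judge, products in table order.
products = ["Angel", "Aromatics Elixir", "Chanel n°5", "Cinéma",
            "Coco Mademoiselle", "J'adore (EP)", "J'adore (ET)", "L'instant",
            "Lolita Lempicka", "Pleasures", "Pure Poison", "Shalimar"]
partitions = {
    "judge 12": [1, 3, 4, 2, 1, 1, 2, 1, 1, 3, 1, 2],
    "judge 13": [4, 3, 3, 5, 5, 6, 6, 4, 5, 4, 1, 2],
    "judge 14": [1, 5, 4, 6, 2, 2, 2, 6, 1, 6, 2, 3],
    "judge 15": [5, 2, 1, 4, 4, 3, 3, 2, 5, 3, 4, 2],
    "judge 16": [2, 1, 3, 2, 3, 3, 3, 4, 2, 4, 4, 1],
}


def cooccurrence(partitions, n_products):
    """Count, for each pair of products, how many judges grouped them together."""
    counts = [[0] * n_products for _ in range(n_products)]
    for groups in partitions.values():
        for i in range(n_products):
            counts[i][i] += 1  # a product is always grouped with itself
        for i, j in combinations(range(n_products), 2):
            if groups[i] == groups[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    return counts


counts = cooccurrence(partitions, len(products))
angel = products.index("Angel")
lolita = products.index("Lolita Lempicka")
print(counts[angel][lolita])  # 4: judges 12, 14, 15 and 16 grouped them together
```

Going from partitions to counts loses the identity of the judges, which is why the deck keeps the judge-by-judge table for the analysis that follows.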

Data table (2), with each group number replaced by the words the judge used to describe that group:

judge 12: 1 = floral, soft; 2 = floral, artificial, grass; 3 = strong, man; 4 = Gr 4
judge 13: 1 = tangy, deodorant, strong; 2 = lavender, eau de cologne; 3 = heady, grandmother; 4 = fruity, strong; 5 = fruity, medium; 6 = sweet, weak
judge 14: 1 = vanilla, spicy, spirit of the islands; 2 = softness, floral; 3 = musty, aggressive; 6 = sweet; 5 then 4 = harsh, strong, toilets (the source does not show where the first description ends)
judge 15: 1 = soap; 2 = old; 3 = floral; 4 = soft; 5 = edible, sweet
judge 16: 1 = household, wax; 2 = food, spice; 3 = familiar, classic; 4 = floral

Let's run multiple correspondence analysis (MCA) on this data table!
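To make the MCA step concrete, here is a minimal Python sketch that computes only the first MCA eigenvalue of a table whose columns are judges. It is not the FactoMineR implementation cited at the end of the deck. It assumes equal weights for all products and relies on a standard identity: the MCA eigenvalues are those of the matrix A, with A[i][k] = (1/J) × the sum over the J judges of 1/(group size) whenever products i and k share a group, once the trivial eigenvalue 1 (constant vector) is removed. The function name and the use of power iteration are choices made for this illustration.

```python
import random


def first_mca_eigenvalue(columns, n_iter=500, seed=0):
    """First MCA eigenvalue of a table given as J categorical columns.

    `columns` is a list of J lists, one per judge, each giving the group
    label of every product. Uses power iteration, so this is only a sketch.
    """
    n, n_judges = len(columns[0]), len(columns)
    a = [[0.0] * n for _ in range(n)]
    for col in columns:
        size = {}
        for g in col:
            size[g] = size.get(g, 0) + 1
        for i in range(n):
            for k in range(n):
                if col[i] == col[k]:
                    a[i][k] += 1.0 / (n_judges * size[col[i]])
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    lam = 0.0
    for _ in range(n_iter):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the trivial constant direction
        norm = sum(x * x for x in v) ** 0.5
        if norm < 1e-12:  # nothing left besides the trivial direction
            return 0.0
        v = [x / norm for x in v]
        w = [sum(a[i][k] * v[k] for k in range(n)) for i in range(n)]
        lam = sum(w[i] * v[i] for i in range(n))  # Rayleigh quotient
        v = w
    return lam


# The five judges of Data table (2), products in table order.
judges = [
    [1, 3, 4, 2, 1, 1, 2, 1, 1, 3, 1, 2],
    [4, 3, 3, 5, 5, 6, 6, 4, 5, 4, 1, 2],
    [1, 5, 4, 6, 2, 2, 2, 6, 1, 6, 2, 3],
    [5, 2, 1, 4, 4, 3, 3, 2, 5, 3, 4, 2],
    [2, 1, 3, 2, 3, 3, 3, 4, 2, 4, 4, 1],
]
print(first_mca_eigenvalue(judges))
```

With only 5 of the 98 consumers, the value printed here does not correspond to the results shown on the slides. A full MCA would also return the coordinates of the products and of the words, which this sketch leaves out.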

Representation of the perfumes

[Figure: MCA factor map of the 12 perfumes. Axes: Dim 1 (17.8%), Dim 2 (13.64%).]

Co-occurrence matrix

[Same matrix as Data table (1).]

Representation of the words

[Figure: MCA factor map showing the perfumes together with the words used to describe the groups. Axes: Dim 1 (17.8%), Dim 2 (13.64%). Words appearing on the map: strong, fruity, honey, marked, sweet, cold, tobacco, spicy, warm, vanilla, chocolate, toilet, passion, light, soft, cleanliness, chemical, solvent, discreet, medicine, oriental, nauseating, deodorant.]

Confidence ellipses around products

[Figure: superimposed representation of the products and their descriptions; point labels P1 to P4 and F1 to F4.]

P4 is at the barycentre of the words used to describe the groups in which it was placed.

[Figure sequence: the panelists' words are resampled, which gives a resampled position for product P4; further resampled positions of P4 are added around P4; the last slide shows the resulting confidence ellipse around P4.]
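The slides show the resampling only graphically. The following Python sketch is one possible reading of it, under two assumptions that the slides do not spell out: panelists are drawn with replacement, and each resampled position of a product is the plain barycentre of the coordinates of the words that the drawn panelists gave to its group. The coordinates are made-up illustration values, not taken from the perfumes data, and `resampled_positions` is a hypothetical helper.

```python
import random


def resampled_positions(word_coords, n_boot=1000, seed=0):
    """Bootstrap the barycentre of one product's word coordinates.

    `word_coords` holds one (dim1, dim2) pair per panelist: the position of
    the word that panelist used for the group containing the product.
    """
    rng = random.Random(seed)
    n = len(word_coords)
    positions = []
    for _ in range(n_boot):
        # Draw n panelists with replacement, then average their word positions.
        sample = [word_coords[rng.randrange(n)] for _ in range(n)]
        x = sum(p[0] for p in sample) / n
        y = sum(p[1] for p in sample) / n
        positions.append((x, y))
    return positions


# Made-up coordinates for six panelists (illustration only).
words_p4 = [(0.9, 0.2), (1.1, -0.1), (0.7, 0.4), (1.3, 0.1), (0.8, -0.3), (1.0, 0.3)]
cloud = resampled_positions(words_p4)
centre = (sum(p[0] for p in cloud) / len(cloud),
          sum(p[1] for p in cloud) / len(cloud))
print(centre)
```

A confidence ellipse can then be drawn around the cloud of resampled positions; the slides do not state which confidence level is used.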

Confidence ellipses for perfumes data

[Figure: MCA factor map of the 12 perfumes with confidence ellipses. Axes: Dim 1 (17.8%), Dim 2 (13.64%).]

Confidence ellipses for random data

[Figure: MCA factor map of the 12 perfume labels for random data, with confidence ellipses. Axes: Dim 1 (11.03%), Dim 2 (10.14%).]

Explanation

Number of columns (subjects) >> number of rows (products)
Common dimensions are produced automatically, even without any consensus
We therefore look for an indicator of consensus between subjects

Significance of the results

H0: absence of consensus
Indicator: first eigenvalue
Evolution of the indicator under H0:

                              Number of products
Number of subjects        10        20        50       100
                10   0.69566  0.471051  0.306195  0.236413
                20  0.624637  0.389234  0.228844   0.16401
                50  0.557816  0.319656  0.167827  0.109948
               100  0.524184  0.286282   0.14006  0.086396

Bar plots of the eigenvalues (10 products)

[Figure: five bar plots, for 10, 20, 50, 100 and 1000 subjects, each showing the eigenvalues of dim 1 to dim 9. Vertical axis from 0.0 to 0.7.]

Significance of the indicator for a given data table

Calculate the p-value associated with the first eigenvalue of the MCA:
1. Repeat a large number of times:
   a. Permute the rows independently within each column
   b. Calculate the first eigenvalue of the permuted table
2. Build the distribution of the eigenvalues under H0
3. Locate the observed eigenvalue in this distribution to get the p-value
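The permutation procedure above can be sketched in Python as follows. This is an illustration, not the authors' code. The first eigenvalue is computed by power iteration under the assumption of equal product weights, and the p-value uses a "+1" correction in numerator and denominator, which is a common convention that the slide does not specify. Function names are hypothetical, and the input is the five-judge extract of Data table (2).

```python
import random


def first_eigenvalue(columns, n_iter=100):
    """First MCA eigenvalue by power iteration (sketch, equal product weights)."""
    n, n_judges = len(columns[0]), len(columns)
    a = [[0.0] * n for _ in range(n)]
    for col in columns:
        size = {g: col.count(g) for g in set(col)}
        for i in range(n):
            for k in range(n):
                if col[i] == col[k]:
                    a[i][k] += 1.0 / (n_judges * size[col[i]])
    rng = random.Random(0)
    v = [rng.random() for _ in range(n)]
    lam = 0.0
    for _ in range(n_iter):
        mean = sum(v) / n
        v = [x - mean for x in v]  # drop the trivial constant direction
        norm = sum(x * x for x in v) ** 0.5
        if norm < 1e-12:
            return 0.0
        v = [x / norm for x in v]
        w = [sum(a[i][k] * v[k] for k in range(n)) for i in range(n)]
        lam = sum(w[i] * v[i] for i in range(n))
        v = w
    return lam


def permutation_p_value(columns, n_perm=200, seed=0):
    """Steps 1 to 3: permute within each column, collect eigenvalues, compare."""
    rng = random.Random(seed)
    observed = first_eigenvalue(columns)
    null = []
    for _ in range(n_perm):
        # Step 1a: an independent permutation of the rows inside every column.
        shuffled = [rng.sample(col, len(col)) for col in columns]
        # Step 1b: first eigenvalue of the permuted table.
        null.append(first_eigenvalue(shuffled))
    # Step 3, with the assumed "+1" correction.
    p = (1 + sum(e >= observed for e in null)) / (n_perm + 1)
    return observed, null, p


judges = [
    [1, 3, 4, 2, 1, 1, 2, 1, 1, 3, 1, 2],
    [4, 3, 3, 5, 5, 6, 6, 4, 5, 4, 1, 2],
    [1, 5, 4, 6, 2, 2, 2, 6, 1, 6, 2, 3],
    [5, 2, 1, 4, 4, 3, 3, 2, 5, 3, 4, 2],
    [2, 1, 3, 2, 3, 3, 3, 4, 2, 4, 4, 1],
]
observed, null, p = permutation_p_value(judges)
print(observed, p)
```

Permuting within a column keeps each judge's group sizes intact while breaking any agreement between judges, which is what H0 (absence of consensus) requires.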

Significance of the first dimension for perfumes data

[Figure: histogram of the first eigenvalue. Horizontal axis: first eigenvalue, ticks from 0.35 to 0.70; vertical axis: frequency, ticks from 0 to 150.]

Confidence ellipses

[Figure: two panels. Perfume data: Dim 1 (17.8%), Dim 2 (13.64%). Random data: Dim 1 (11.03%), Dim 2 (10.14%).]

Second empirical indicator

Overlap of the ellipses

Total inertia = Between inertia + Within inertia

Calculate the inertia ratio: $0 \le \dfrac{\text{between inertia}}{\text{total inertia}} \le 1$
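As an illustration of the inertia ratio, the Python sketch below computes between inertia over total inertia for clouds of points, one cloud per product. It assumes that the points are resampled product positions on a two-dimensional map and that all points have equal weight; the slides specify neither the weighting nor the number of dimensions. The function name and the toy clouds are made up for this example.

```python
def inertia_ratio(clouds):
    """Between inertia / total inertia for a dict {product: [(x, y), ...]}."""
    all_points = [p for pts in clouds.values() for p in pts]
    n = len(all_points)
    gx = sum(p[0] for p in all_points) / n
    gy = sum(p[1] for p in all_points) / n
    # Total inertia: squared distances of every point to the global centre.
    total = sum((p[0] - gx) ** 2 + (p[1] - gy) ** 2 for p in all_points)
    # Between inertia: squared distances of the cloud centres to the global
    # centre, each weighted by the number of points in the cloud.
    between = 0.0
    for pts in clouds.values():
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        between += len(pts) * ((cx - gx) ** 2 + (cy - gy) ** 2)
    return between / total if total > 0 else 0.0


# Toy clouds (illustration only).
separated = {
    "P1": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)],
    "P2": [(2.0, 2.0), (2.1, 2.0), (2.0, 2.1)],
}
overlapping = {
    "P1": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "P2": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
}
print(inertia_ratio(separated))    # close to 1: the clouds barely overlap
print(inertia_ratio(overlapping))  # close to 0: the clouds coincide
```

A ratio near 1 means that the ellipses are small compared with the distances between products; a ratio near 0 means that they overlap almost completely.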

Significance of the inertia ratio for perfumes data

[Figure: histogram of the inertia ratio. Horizontal axis: ratio, ticks from 0.982 to 0.996; vertical axis: frequency, ticks from 0 to 300.]

Conclusion

MCA is a suitable method for categorization data
Sensory data have few rows and many columns
By construction, such tables produce many relationships
Hence the need for external validation

SensoMineR: a package for sensory data analysis. Journal of Sensory Studies (2008)

FactoMineR: an R package for multivariate analysis. Journal of Statistical Software (2008)

http://www.agrocampus-rennes.fr/math/

The applied mathematics department of Agrocampus organizes the R User Conference 2009, July 8-10, in Rennes, France

http://www.agrocampus-rennes.fr/math/useR-2009/