Discrimination of balsamic vinegar production methods: Evolving Window Zone Selection and Interval-PLS_Cluster on 1H NMR spectra, followed by ICA

Marion Cuny(1,2), Delphine Bouveresse(3), Michèle Lees(1), Douglas Rutledge(2)
1: Eurofins Scientific Analytics, Nantes, FR
2: AgroParisTech, Paris, FR
3: INRA, Paris, FR
www.eurofins.com

28th-29th November 2007

Traditional Balsamic Vinegar (TB) vs. Balsamic Vinegar (Ba)

Two areas, both PDO:
• Modena
• Reggio Emilia

Lyon, 29th November 2007


Production:
• TB: cooked must from Trebbiano grape varieties
• Ba: cooked must diluted with wine and caramel (2%)

Aging:
• TB: at least 12 years in a series of barrels made of different kinds of wood and of decreasing volume
• Ba: 60 days in wooden containers

Large price differences lead to counterfeiting, hence the need for authentication methods.

1H NMR spectroscopy

• Simplified sample preparation
• Rapid measurement
• Screening of all protonated molecules

Presentation outline

1. Samples and data sets
2. Chemometric methods
3. Discrimination following Evolving Window Zone Selection
4. Discrimination following Interval-PLS_Cluster
5. Conclusion

1. Samples and data sets

Samples and spectrum acquisition

Samples:


• 15 TB / 14 Ba
• Ba: 7 from the market and 7 from producers in Emilia; 3 of them are aged
Sample preparation (Dipartimento di Chimica Organica e Industriale, Università di Parma):
• Dilution of the vinegar in D2O + TSP
Data acquisition:
• Pulse sequence: noesypr1d
• NS (number of scans) = 32
• AQ (acquisition time) = 1.4 s
• Experiment duration: 3.5 min

Data pre-treatment


• Phase correction
• Baseline correction
• Warping
• Mean of 7 adjacent points: 32K → 4692 variables
• Suppression of the spectrum edges and of the water and TSP signals: 4692 → 3539 variables
• Log transformation

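The matrix-level pre-treatment steps above (mean of 7 adjacent points, removal of spectral regions, log transformation) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the ppm axis, the removed regions and the offset added before taking the log are all hypothetical, and phase correction, baseline correction and warping are not shown. With exactly 32,768 points, 7-point binning gives 4,681 variables, so the counts quoted above depend on acquisition details not given here.

```python
import numpy as np


def bin_mean(X, width=7):
    """Replace each run of `width` adjacent points by its mean.

    Trailing points that do not fill a complete bin are dropped.
    """
    n, p = X.shape
    n_bins = p // width
    return X[:, : n_bins * width].reshape(n, n_bins, width).mean(axis=2)


def drop_regions(X, ppm, regions):
    """Remove the columns whose chemical shift lies in any (low, high) ppm region."""
    keep = np.ones(ppm.shape, dtype=bool)
    for low, high in regions:
        keep &= ~((ppm >= low) & (ppm <= high))
    return X[:, keep], ppm[keep]


def log_transform(X, offset=1.0):
    """Shift the data so that its minimum equals `offset`, then take the natural log.

    The offset is an assumption: it keeps the log defined for zero or
    negative intensities.
    """
    return np.log(X - X.min() + offset)


# Synthetic example: 4 spectra of 32K points on a hypothetical 12 to -1 ppm axis.
rng = np.random.default_rng(0)
ppm = np.linspace(12.0, -1.0, 32768)
X = rng.random((4, ppm.size))

Xb = bin_mean(X)                          # 32768 -> 4681 variables
ppm_b = bin_mean(ppm[np.newaxis, :])[0]   # bin the ppm axis in the same way

# Hypothetical regions: both spectrum edges, water (~4.8 ppm) and TSP (0 ppm).
regions = [(10.0, 12.0), (4.6, 5.0), (-0.2, 0.2), (-1.0, -0.5)]
Xr, ppm_r = drop_regions(Xb, ppm_b, regions)
Xl = log_transform(Xr)
```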

Data set


X(n,p) is the data matrix containing the values used to plot the spectra: each row is a sample and each column a variable.

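$$
X(n,p) = \begin{pmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n,1} & x_{n,2} & \cdots & x_{n,p}
\end{pmatrix}
$$

(rows: samples 1 to n; columns: variables 1 to p)

[Figure: recorded spectra, intensity (×10^8) vs. chemical shift (10 to 0 ppm)]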

Data set after pre-treatment


[Figure: complete data set after pre-treatment; log-transformed intensities (about 4 to 12) vs. chemical shift (10 to 0 ppm)]

2. Chemometric methods

Why use variable selection?


[Figure: vinegar spectrum, intensity (×10^4) vs. chemical shift (10 to 0 ppm); aromatic compounds, sugars and acids appear in different regions]

The relevant information may lie anywhere in the spectrum, so no part of it should be neglected.

Methodology

[Figure: vinegar spectrum, intensity (×10^4) vs. chemical shift (10 to 0 ppm)]

Vinegar spectrum → variable selection (EWZS or Interval-PLS_Cluster) → ICA → prediction model

Selection of information: EWZS function

Evolving Window Zone Selection (EWZS) function: a window that grows and slides along the spectrum tests the ability of each zone to predict Y.
• Data set = X(n,2500)
• Value to predict = Y(n,1)
• Parameters: step = 500, minimal size = 500, maximal size = 1500

[Figure: spectrum (intensity ×10^6 vs. variable index, 0 to 2500) with the observation window; 1st zone tested: X(n,1:500)]

In press: Cuny, M. et al., Evolving Window Zone Selection method followed by Independent Component Analysis as useful chemometric tools to discriminate between grapefruit juice, orange juice and blends, ACA, 2007.

Selection of information: EWZS function (continued)

[Figure: the same spectrum with the observation window; 2nd zone tested: X(n,1:1000), i.e. the window has grown by one step]

Selection of information: EWZS function (continued)

The window grows by one step at a time up to the maximal size, then slides forward by one step and grows again:
• X(n,1:500), X(n,1:1000), X(n,1:1500)
• X(n,500:1000), X(n,500:1500), X(n,500:2000)
• X(n,1000:1500), X(n,1000:2000), X(n,1000:2500)

[Figure: the observation window drawn on the spectrum for each of these zones]
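The growing and sliding enumeration of EWZS zones can be sketched in a few lines of Python. Indices here are 0-based and end-exclusive, so the zone written X(n,1:500) corresponds to (0, 500) and X(n,500:1000) is read as (500, 1000). The slides only show the first three start positions; continuing the same pattern until the window would pass the end of the spectrum is an assumption of this sketch.

```python
def ewzs_windows(p, step=500, min_size=500, max_size=1500):
    """Enumerate EWZS zones as (start, end) pairs, 0-based and end-exclusive.

    For each start position (0, step, 2*step, ...) the window grows from
    min_size to max_size by `step`, never passing the end of the spectrum.
    """
    windows = []
    for start in range(0, p - min_size + 1, step):
        for size in range(min_size, max_size + 1, step):
            if start + size > p:
                break  # larger windows would run past the last variable
            windows.append((start, start + size))
    return windows


# Zones for a data set of 2500 variables, with the parameters quoted above.
zones = ewzs_windows(2500)
```

With these parameters the first nine zones are the ones listed on the slide; the last start positions only admit the shorter windows.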

Selection of information: EWZS function (continued)

Each zone (X(n,1:500), X(n,1:1000), X(n,1:1500), X(n,500:1000), …, X(n,1000:2500)) is evaluated as follows:
• Data reduction: ICA, PCA or PLS
• Prediction of Y(n,1): PLS, PLS-DA or linear regression
• Criterion: RMSECV and R², e.g. RMSECV(X(n,1:500)) and R²(X(n,1:500)) for the first zone

The RMSECV and R² values of all zones are gathered into two maps, from which the best zone is selected (in this illustration, selected zone = X(n,500:1500) …).
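One way to build the criterion map is sketched below, assuming the simplest of the listed options: PLS regression of Y directly on each zone, scored by leave-one-out RMSECV, with the zone of lowest RMSECV selected. PLS1 (NIPALS) is written out in NumPy so the block is self-contained. The number of latent variables, the ±1 coding of the two vinegar classes and the synthetic data are assumptions, and the R² map is not computed here.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """PLS1 regression (NIPALS). Returns x_mean, y_mean and the coefficients b."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        q = (yk @ t) / tt               # y loading
        Xk = Xk - np.outer(t, p)        # deflation of X and y
        yk = yk - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    b = W @ np.linalg.solve(P.T @ W, Q)  # b = W (P'W)^-1 q
    return x_mean, y_mean, b


def rmsecv_loo(X, y, n_lv=2):
    """Leave-one-out root mean squared error of cross-validation."""
    errors = []
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        x_mean, y_mean, b = pls1_fit(X[train], y[train], n_lv)
        errors.append((X[i] - x_mean) @ b + y_mean - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))


def ewzs_select(X, y, zones, n_lv=2):
    """Score every (start, end) zone; return the lowest-RMSECV zone and the map."""
    rmsecv = {z: rmsecv_loo(X[:, z[0]:z[1]], y, n_lv) for z in zones}
    return min(rmsecv, key=rmsecv.get), rmsecv


# Synthetic data: 20 samples, 60 variables; only variables 20-29 carry the class.
rng = np.random.default_rng(0)
y = np.repeat([1.0, -1.0], 10)          # hypothetical coding: TB = 1, Ba = -1
X = rng.normal(size=(20, 60))
X[:, 20:30] += 2.0 * y[:, np.newaxis]
best, rmsecv_map = ewzs_select(X, y, [(0, 20), (20, 40), (40, 60)])
```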

Interval-PLS_Cluster


1) Apply the generalised PLS_Cluster algorithm within a window moving across the data set
2) A dendrogram is obtained for each window
3) Selecting the “best” dendrograms gives a selection of informative intervals


Generalised PLS_Cluster (1)


• Recursive method based on PLS
• The data set is first separated into C1 groups
• Each group i is then split into C2i sub-groups, etc.
• Iterated until all sub-groups are singletons
• The number of clusters Ci is not fixed: it is determined at each node of the dendrogram, based on the data structure


Generalised PLS_Cluster (2)


Original data are in X(n × p).
1. Set the first y vector (n × 1) to y = u1 (normed PC1 score vector of column-centered X).
2. Membership_function(y)
3. Calculate a 1-LV PLS model between X and y → regression coefficients b (p × 1) → fitted ŷ* = X b
4. If ŷ* has not stabilized → replace y by ŷ* → go to step 3
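Steps 1 to 4 can be sketched in NumPy as below. The sketch follows the slide literally: the membership function is applied once, at step 2, and the loop returns to step 3. The slide does not define Membership_function, so the tanh scaling used here is a hypothetical placeholder, as are the convergence tolerance and the synthetic two-group data.

```python
import numpy as np


def membership_function(y):
    """Hypothetical placeholder: the slide does not define this function."""
    return np.tanh(y / y.std())


def one_lv_pls(Xc, y):
    """1-LV PLS between column-centered X and y: returns b (p,) and the fit X b."""
    yc = y - y.mean()
    w = Xc.T @ yc
    w /= np.linalg.norm(w)
    t = Xc @ w
    b = w * (t @ yc) / (t @ t)
    return b, Xc @ b


def pls_cluster_step(X, max_iter=100, tol=1e-10):
    Xc = X - X.mean(axis=0)                   # column-centered X
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    y = U[:, 0]                               # step 1: normed PC1 score vector u1
    y = membership_function(y)                # step 2
    for _ in range(max_iter):
        b, y_star = one_lv_pls(Xc, y)         # step 3: b (p x 1) and y* = X b
        if np.linalg.norm(y_star - y) <= tol * np.linalg.norm(y):
            break                             # step 4: y* has stabilized
        y = y_star                            # otherwise replace y by y*
    return y_star, b


# Two synthetic groups of 10 samples, separated along the first variable.
rng = np.random.default_rng(0)
shift = np.zeros(20)
shift[0] = 5.0
X = np.vstack([rng.normal(0, 0.5, (10, 20)) + shift,
               rng.normal(0, 0.5, (10, 20)) - shift])
y_hat, b = pls_cluster_step(X)
```

On such data the signs of the converged ŷ already separate the two groups; the ordering of ŷ is what the determination of C works on.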

Determination of C


After convergence (before the membership function is applied), the ŷ elements are sorted by increasing value and the differences between consecutive elements are calculated. A difference larger than a threshold, usually set to 3 × the median difference, separates two clusters.

[Figure: sorted ŷ values (0 to 1) and differences between consecutive values (0 to 0.25), both plotted against rank (0 to 20)]

Depending on the threshold value, two or three clusters are detected here.
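The rule for determining C can be sketched as follows: sort ŷ, take the gaps between consecutive values, and start a new cluster wherever a gap exceeds 3 × the median gap. Returning labels in the original sample order is a convenience added for this sketch, and the example ŷ values are invented for illustration.

```python
import numpy as np


def split_by_gaps(y_hat, factor=3.0):
    """Cluster labels (0, 1, ...) obtained by cutting the sorted y_hat at large gaps.

    A gap larger than factor * median(gap) starts a new cluster, so the
    number of clusters C is the number of such gaps plus one.
    """
    order = np.argsort(y_hat)
    gaps = np.diff(y_hat[order])
    threshold = factor * np.median(gaps)
    labels_sorted = np.concatenate([[0], np.cumsum(gaps > threshold)])
    labels = np.empty_like(labels_sorted)
    labels[order] = labels_sorted            # back to the original sample order
    return labels, threshold


y_hat = np.array([0.10, 0.12, 0.11, 0.13, 0.50, 0.52, 0.51, 0.90, 0.91, 0.92])
labels, threshold = split_by_gaps(y_hat)
n_clusters = int(labels.max()) + 1
```

Raising `factor` merges clusters separated by moderate gaps, which is how the same ŷ can yield two or three clusters.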

Result on a particular interval

[Figure: spectra over the variable index (0 to 4000) with the analysed interval, and the dendrogram obtained on that interval]

Independent Component Analysis (ICA)


Aim: recover the “pure” sources from mixed signals.
Based on the assumption that the underlying (“pure”) sources are statistically independent.
How? By finding a demixing transformation that minimises the dependencies among the estimates of the “pure” sources.
A. Hyvärinen, J. Karhunen and E. Oja, Independent Component Analysis, Wiley, New York, 2001.

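The principle described above can be illustrated with a compact symmetric FastICA (fixed-point iteration with the log-cosh contrast) in NumPy. FastICA is one common ICA algorithm and is not necessarily the implementation used in this work. For spectra, one usual convention is to pass Xᵀ (variables as observations) so that the returned columns estimate the signals S(p,m); here the sketch is exercised on two synthetic sources. Sources are recovered only up to order, sign and scale.

```python
import numpy as np


def _sym_decorrelate(W):
    """Symmetric decorrelation: W <- (W W')^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W


def fastica(X, m, n_iter=500, tol=1e-8, seed=0):
    """Estimate m independent sources from X (n_obs x n_mixtures), log-cosh contrast.

    Returns an (n_obs x m) array; sources are recovered up to order, sign and scale.
    """
    Xc = X - X.mean(axis=0)
    d, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
    top = np.argsort(d)[::-1][:m]
    Z = Xc @ (E[:, top] / np.sqrt(d[top]))             # whitened data (n_obs x m)
    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((m, m)))  # rows are demixing vectors
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)                            # g = derivative of log cosh
        W_new = G.T @ Z / len(Z) - np.diag((1.0 - G**2).mean(axis=0)) @ W
        W_new = _sym_decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T


# Two synthetic sources (square wave and Laplace noise) mixed linearly.
t = np.linspace(0, 20, 2000)
S_true = np.c_[np.sign(np.sin(3 * t)), np.random.default_rng(1).laplace(size=2000)]
X_mix = S_true @ np.array([[1.0, 0.5], [0.7, 1.0]]).T
S_est = fastica(X_mix, 2)
```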

Classification with ICA


1) An ICA model on a matrix X(n−1,p) of n−1 individuals and p variables, using m components, gives:
• the m signals (loadings): S(p,m)
• the new coordinates of the n−1 individuals in the new basis: P(n−1,m)
2) Projection of a new sample J(1,p) on the m signals: B(1,m) = J(1,p) · S(p,m) · (S(p,m)ᵀ · S(p,m))⁻¹
3) Calculation of the Mahalanobis distances to the barycenters of the g groups
4) Classification in the nearest group
5) Rate of correct classification, once each sample has been left out in turn (leave-one-out)
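Steps 1 to 5 can be sketched in NumPy as below. To keep the block self-contained, the ICA model is replaced by an SVD stand-in that returns signals S(p,m); this is an assumption, not the method of this work, and any ICA routine returning S could be substituted. The Mahalanobis distance uses a pooled within-group covariance of the coordinates, also an assumption, since the slide does not say which covariance is used.

```python
import numpy as np


def project(J, S):
    """B = J S (S'S)^-1, computed with a linear solve instead of an explicit inverse."""
    return np.linalg.solve(S.T @ S, (J @ S).T).T


def nearest_group(b, P, labels):
    """Group whose barycenter is nearest to b in Mahalanobis distance."""
    groups = np.unique(labels)
    centres = np.array([P[labels == g].mean(axis=0) for g in groups])
    resid = np.vstack([P[labels == g] - c for g, c in zip(groups, centres)])
    cov = resid.T @ resid / (len(P) - len(groups))   # pooled within-group covariance
    d2 = [(b - c) @ np.linalg.solve(cov, b - c) for c in centres]
    return groups[int(np.argmin(d2))]


def svd_signals(X, m):
    """Stand-in for the ICA model: first m right singular vectors as signals S(p,m)."""
    return np.linalg.svd(X, full_matrices=False)[2][:m].T


def loo_rate(X, labels, m=2):
    """Leave-one-out rate of correct classification."""
    correct = 0
    for i in range(len(X)):
        train = np.arange(len(X)) != i
        S = svd_signals(X[train], m)             # step 1 on the n-1 individuals
        P = project(X[train], S)                 # their coordinates P(n-1,m)
        b = project(X[i], S)                     # step 2: the left-out sample
        correct += nearest_group(b, P, labels[train]) == labels[i]   # steps 3-4
    return correct / len(X)                      # step 5


# Synthetic example: two groups of 6 samples on opposite sides of one direction.
rng = np.random.default_rng(0)
direction = rng.normal(size=30)
labels = np.repeat(np.array(["TB", "Ba"]), 6)
sign = np.where(labels == "TB", 1.0, -1.0)
X = sign[:, np.newaxis] * 3.0 * direction / np.linalg.norm(direction) + rng.normal(0, 0.5, (12, 30))
rate = loo_rate(X, labels)
```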

3. Discrimination following Evolving Window Zone Selection

Results of the EWZS function

[Figure: maximal regression R² (0 to 0.7) as a function of the variable index (0 to 3500), with the seven selected zones]

Zone    1     2     3     4     5     6     7
Start   590   955   1310  1890  1940  2300  2960
End     640   990   1340  1930  1980  2360  3130

Corresponding zones (Ba and TB spectra overlaid): the compounds labelled are formic acid, HMF, α-glucose, glucose-fructose, acetic acid and one unidentified signal (?). Approximate plotted chemical-shift ranges: 8.22–8.34, 7.50–7.58, 6.64–6.70, 5.20–5.28, 5.06–5.16, 3.90–4.00 and 1.95–2.25 ppm.

ICA model on the selected zones (3 components)

[Figure: signals of IC1, IC2 and IC3 plotted over the seven selected zones]

IC1 looks like the TB signal with its sign inverted (−TB).

Discrimination of the samples on IC1

All samples are correctly classified by leave-one-out cross-validation (the 3 components are taken into account).

[Figure: scores of the TB and Ba samples on IC1 (x axis) and IC2 (y axis)]

4. Discrimination following Interval-PLS_Cluster

Reference PLS_Cluster (whole spectrum)

[Figure: top, the spectral zone analysed, here the whole spectrum from 10 to 0 ppm (intensity ×10^4; TB in green, Ba in red); bottom, the resulting dendrogram of the 29 samples]

Interval-PLS_Cluster


Parameters: window size = 80 points, step = 20 points

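The moving windows used by Interval-PLS_Cluster (80 points, step 20) can be enumerated as below. Taking the number of variables as 3539 (the count left after pre-treatment) and dropping a final incomplete window are assumptions of this sketch; with these values the last 19 variables are not covered by a full window.

```python
def interval_windows(p, size=80, step=20):
    """(start, end) pairs, 0-based and end-exclusive, of a window moving across p variables."""
    return [(start, start + size) for start in range(0, p - size + 1, step)]


windows = interval_windows(3539)
# Each window would then be passed to the generalised PLS_Cluster algorithm,
# giving one dendrogram per interval.
```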

Selected zones


17 zones selected; some are contiguous, which leaves 9 zones after merging.

[Figure: the 9 selected zones (Ba in black, TB in red), with signals labelled Tyr, sugar, β-glucose, citric and malic acids, and succinic acid; one zone is the same as in the EWZS selection]
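Merging contiguous selected windows into zones, as done to go from 17 windows to 9 zones, can be sketched as follows. Overlapping or touching windows (end-exclusive indices) are merged; the window positions in the example are hypothetical, not the ones selected in this study.

```python
def merge_zones(zones):
    """Merge overlapping or touching (start, end) zones; indices are end-exclusive."""
    merged = []
    for start, end in sorted(zones):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))  # extend the current zone
        else:
            merged.append((start, end))                            # start a new zone
    return merged


# Hypothetical selected windows of 80 points with a step of 20.
selected = [(100, 180), (120, 200), (140, 220), (600, 680), (1000, 1080), (1020, 1100)]
zones = merge_zones(selected)
```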

Clustering on selected zone 1

[Figure: top, zone 1 (8.05 to 8.35 ppm) showing the formic acid signal and an unidentified signal (?) for TB and Ba; bottom, the dendrogram obtained on this zone, in which groups of samples are annotated “from Emilia”, “aged 6 years” and “from the market”]

Discrimination of the samples on IC1 and IC2

All samples are correctly classified by leave-one-out cross-validation (the 3 components are taken into account).

[Figure: scores of the TB and Ba samples on IC1 (x axis) and IC2 (y axis)]

5. Conclusion

Conclusion


• A procedure combining variable selection (EWZS or Interval-PLS_Cluster) and ICA applied to 1H NMR spectra was developed to discriminate the more expensive traditional balsamic vinegar from industrial balsamic vinegar.
• The zones detected by EWZS correspond to compounds related to the aging process: volatility (acetic acid), degradation (HMF), etc.
• The zones detected by Interval-PLS_Cluster were also related to aging processes, but also to other mechanisms such as sugar modification.
• Due to its unsupervised nature, Interval-PLS_Cluster is able to detect different clustering factors. In the present case, it seems to have detected sub-populations within the two vinegar groups.

Acknowledgments

For their help, advice and time: my colleagues at AgroParisTech and Eurofins.
For financial support: Eurofins, and the French Ministry of Research through the CIFRE convention n° 169-2005.

Thank you for your attention