Food authentication by 1H NMR spectroscopy in combination with different chemometric tools
Application to fruit juices, yoghurts and vinegars
PhD Viva, 7th March 2008
Marion Cuny

www.eurofins.com

Presentation outline
1. Food authentication?
2. Why 1H NMR spectroscopy?
3. General methodology
4. Acquiring spectra
4.1. Sample preparation
4.2. Measurements
5. Data treatment
5.1. Classification method: ICA vs. PCA
5.2. Pre-treatment of the spectra
5.3. Variable selection: CLV / EWZS
(Sections 4 and 5: example of orange juice)
6. Results on other matrices: yoghurts and vinegars
7. Conclusions


What does authenticity mean?
Authenticating food means being sure that the product really is what it is supposed to be.
What kinds of adulteration are we looking for?
• Origin: country, variety
• Production: method (organic / conventional), vintage
• Ingredients: natural / synthesized, undeclared addition
Why is it important to protect the market from adulteration?
• Fair market place
• Consumer safety and consumer trust

Orange juice authentication
Orange juice: most consumed juice worldwide, 500 000 000 litres a day*
Usual economic frauds in fruit juices:
• Addition of water
• Addition of sugar
• Use of co-fruits
• Addition of other compounds, such as flavours or acids, to hide fraud
* Terra Economica

Dairy product authentication
Yoghurt:
• a dairy product produced by bacterial fermentation of milk
• up to 30 % of added compounds
Possible frauds:
• Origin and content of the proteins
• Process: fruit percentage, bacteria
• Addition of undeclared compounds

Balsamic vinegar authentication
Protected Denomination of Origin (PDO): 2 areas, Modena and Reggio Emilia
Traditional Balsamic (TB)
• Production: cooked must from Trebbiano grape varieties
• Aging: at least 12 years in barrels of different kinds of wood and of decreasing volume
Balsamic (Ba)
• Production: cooked must diluted with wine and caramel (2%)
• Aging: 60 days in wooden containers

How can we detect a fraudulent product?
Different sorts of fraud. When you get a sample:
• What is the possible fraud?
• What is the relevant information?
How can I find it?
• Sensory characteristics
• Analytical approaches

Analytical approaches
• Target analyte approach
  • Direct analysis of adulterant or authenticity indicator
  • Comparison with expected values
• Fingerprint / multivariate approach: pattern comparison of authentic and adulterated products
  • Multiple physico-chemical measurements
  • Profiles: 1H NMR spectroscopy


Why 1H NMR spectroscopy?
• Simplified sample preparation
• Rapid measurement
• Screening of all protonated molecules, even for complex mixtures
Fast screening method for industry control
[Figure: spectrum of a strawberry yoghurt; x axis: chemical shift (ppm), 10 to 0; y axis: intensity, scale ×10^8]


General methodology
Steps: sample collection, sample conservation, sample homogenization, sample preparation, tube filling, NMR measurement, spectra
[Figure: example spectrum; x axis: ppm, 9 to -1; y axis: intensity, scale ×10^8]

Methodology
[Figures: example spectrum (x axis: ppm) and vinegar spectrum (x axis: ppm)]
• Spectra homogenisation: warping, log transformation, data reduction (average)
• Variable selection
  • On criteria: variance, Clustering of Variables (CLV)
  • Interval selection: Interval PLS (iPLS), Evolving Window Zone Selection (EWZS), Interval-PLS_Cluster
• Analysis: Independent Component Analysis (ICA), Principal Component Analysis (PCA), Partial Least Squares regression (PLS)
• Diagnosis: authentic / adulterated, quantity



Example of orange and grapefruit juices
A known fraud, with a standard method of detection: IFU* 58, quantification of naringin and hesperidin
Study in collaboration with IFR Norwich:
• 92 samples: 59 orange juices (OJ), 23 grapefruit juices (GJ), 10 blends (OG)
• Variation around the process: long life / chilled, from concentrate / pure juice, origin
• Classification of the samples into 3 groups using information contained in the spectra
• Aim: assess the potential of 1H NMR spectroscopy and chemometric tools to discriminate samples from the industry
* IFU: International Federation of Fruit Juice Producers


What is a good spectrum for authentication?
One that is similar to other spectra of the same matrix.
Needs: same information, same chemical shifts, same shape, same intensity, good resolution
Means:
• Sample preparation: stable experimental conditions, pH adjustment
• Data pre-treatment: warping, phase correction, base-line correction
• Acquisition parameters:
  • Numerous scans (NS): longer time of experiment
  • Numerous data points (TD, around 32000): longer time of experiment

Sample preparation
Each step is given as: when? / why? / how? / for juices
• Storage until experiment / same physico-chemical state / freezing
• Standardisation of the samples / comparable intensities / same viscosity and concentration: Brix adjustment by dilution / no adjustment
• pH adjustment / 4 +/- 0.02
• No particles / stable magnetic field / centrifugation, 25 min at 4500 rpm
• Filling of the tube / same height for shim correction / same amount of sample, solvent and lock / 750 μL, 250 μL D2O

Acquisition parameters
Each parameter is given as: why? / parameter / how? / for juices
Stable magnetic field
• Spectrometer: control of environment / 600 MHz
• Temperature: fixed, influence on viscosity / 27°C
Recorded information
• Pulse sequence: water peak presaturation / noesypr1d
• AQ, acquisition time: enough to have all the information / 2.8 sec
• D1, relaxation delay: enough to have high intensity / 2 sec (not in quantitative condition)
Parameters to increase resolution (digital resolution: DR = 2SW / TD, DR = 1 / AQ)
• TD, time domain: numerous data points / 16384
• SW, spectral width: small spectral window of observation / 11.97 ppm
Parameter to increase signal / noise
• NS, number of scans: numerous scans, S/N ∝ √(NS) / 128
Every parameter that increases the quality of the spectrum also increases the time of experiment!
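The two relations quoted on this slide, DR = 1 / AQ and S/N ∝ √(NS), can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and it uses nothing beyond the AQ and NS values listed for juices.

```python
import math

def digital_resolution_hz(aq_seconds):
    """Digital resolution DR = 1 / AQ, in Hz per point."""
    return 1.0 / aq_seconds

def snr_gain(ns):
    """Relative S/N gain of NS accumulated scans over one scan (S/N grows as sqrt(NS))."""
    return math.sqrt(ns)

# Values listed on the slide for juices: AQ = 2.8 s, NS = 128
print(round(digital_resolution_hz(2.8), 3))
print(round(snr_gain(128), 2))
# Doubling S/N requires four times as many scans, hence four times the experiment time
print(round(snr_gain(4 * 128) / snr_gain(128), 6))
```

The last line is the numerical counterpart of the slide's warning: quality gains are paid for in experiment time.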


Classification method
• Leave-one-out cross-validation.
• Data reduction using Principal (or Independent) Components.
• Projection of the samples onto the different component spaces.
• Left-out sample attributed to the nearest group, based on the distances from group barycenters.
• All the component spaces are taken into account, best results kept.
With 3 components → 7 different spaces:
• 1 dimension: C1, C2, C3
• 2 dimensions: C1&C2, C1&C3, C2&C3
• 3 dimensions: C1&C2&C3

Classification method
In a 2D plane:
• d1: Euclidean distance from the sample to the group 1 barycenter
• d2: Euclidean distance from the sample to the group 2 barycenter
Here d1 > d2, so the sample is attributed to group 2.
[Figure: sample to classify and the barycenters of groups 1 and 2 in a 2D component plane]
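The classification procedure described above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: it uses PCA (computed with an SVD) for the data reduction, it refits the decomposition without the left-out sample at each step (an assumption, the slides do not say), and the function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def pca_scores(X_train, X_test, n_components):
    """Project training and test samples onto the first principal components of X_train."""
    mean = X_train.mean(axis=0)
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    loadings = vt[:n_components].T
    return (X_train - mean) @ loadings, (X_test - mean) @ loadings

def nearest_barycenter(train_scores, train_labels, test_score):
    """Attribute a sample to the group whose barycenter is nearest (Euclidean distance)."""
    groups = np.unique(train_labels)
    centers = np.array([train_scores[train_labels == g].mean(axis=0) for g in groups])
    return groups[np.argmin(np.linalg.norm(centers - test_score, axis=1))]

def loo_best_space(X, y, n_components=3):
    """Leave-one-out classification in every component space; returns the best one."""
    n = len(y)
    spaces = [s for k in range(1, n_components + 1)
              for s in combinations(range(n_components), k)]
    correct = {s: 0 for s in spaces}
    for i in range(n):
        keep = np.arange(n) != i
        train, test = pca_scores(X[keep], X[i:i + 1], n_components)
        for s in spaces:
            cols = list(s)
            predicted = nearest_barycenter(train[:, cols], y[keep], test[0, cols])
            correct[s] += int(predicted == y[i])
    best = max(correct, key=correct.get)
    return best, correct[best], correct

# Toy data: two well separated groups of 10 samples with 6 variables each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (10, 6)), rng.normal(0, 0.1, (10, 6)) + 5])
y = np.array([0] * 10 + [1] * 10)
best, n_correct, table = loo_best_space(X, y)
print(best, n_correct, len(table))
```

With 3 components the `table` dictionary has 7 entries, one per component space, matching the count given on the slide. Replacing `pca_scores` by an ICA decomposition would give the ICA variant.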

PCA vs. ICA
PCA: Principal Component Analysis
• Classical method of unsupervised group detection
• Looks for orthogonal directions maximising the dispersion of samples
ICA: Independent Component Analysis
• Based on the (logical) assumption of statistical independence of underlying ("pure") sources
• Looks for pure signals in a data set of mixed signals
• Calculates a demixing transformation that minimises dependencies between the estimates of the "pure" sources
Two types of information summarised in these components
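To make the idea of a demixing transformation concrete, here is a small sketch of ICA on toy signals (a sine and a square wave, not NMR spectra). It uses a compact FastICA-style fixed-point iteration with a tanh nonlinearity; the slides do not say which ICA algorithm was used in the study, so the algorithm choice and all names below are assumptions made for illustration.

```python
import numpy as np

def sym_decorrelate(W):
    """Make the rows of W orthonormal: W <- (W W^T)^(-1/2) W."""
    s, U = np.linalg.eigh(W @ W.T)
    return U @ np.diag(1.0 / np.sqrt(s)) @ U.T @ W

def fast_ica(X, n_components, n_iter=200, tol=1e-6, seed=0):
    """X has one mixed signal per row; returns the estimated sources, one per row."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=1, keepdims=True)
    n = Xc.shape[1]
    # Whitening: keep the leading eigenvectors and rescale them to unit variance
    d, E = np.linalg.eigh(Xc @ Xc.T / n)
    idx = np.argsort(d)[::-1][:n_components]
    Z = (E[:, idx] / np.sqrt(d[idx])).T @ Xc
    W = sym_decorrelate(rng.normal(size=(n_components, n_components)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        W_new = sym_decorrelate(G @ Z.T / n - np.diag((1 - G ** 2).mean(axis=1)) @ W)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol
        W = W_new
        if converged:
            break
    return W @ Z

# Two "pure" sources, observed only through two linear mixtures
t = np.linspace(0, 8, 2000)
S = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])
A = np.array([[1.0, 1.0], [0.5, 2.0]])
S_est = fast_ica(A @ S, n_components=2)

# Each true source should be strongly correlated (up to sign and order) with one estimate
corr = np.abs(np.corrcoef(np.vstack([S, S_est]))[:2, 2:])
print(corr.round(2))
```

ICA recovers sources only up to order, sign and scale, which is why the comparison uses absolute correlations.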

Aromatic zone of the fruit juices
[Figure: three spectra over 6.2 to 8 ppm, intensity scale ×10^5]
• Grapefruit juice: naringin, GABA*
• Orange juice: hesperidin
• Blend of orange and grapefruit: adenosine?, tyrosine, arginine, phlorin
* GABA: γ-aminobutyric acid

Decomposition into 3 Principal Components
[Figure: Principal Components 1, 2 and 3 over 6.2 to 8 ppm, compared with "GJ mean 1H NMR spectrum - OJ mean 1H NMR spectrum"]
• Principal Component 1: axis of maximum dispersion = difference between the mean spectra of GJ and OJ
• Principal Components 2 and 3: no apparent explanation

Decomposition into 3 Independent Components
[Figures: each independent component over 6.2 to 8 ppm, with the reference signal it is compared to]
• Independent component 1: 1st source = mean signal of OJ (compared with the OJ mean 1H NMR spectrum)
• Independent component 2: 2nd source ~ difference between the mean signals of GJ and OJ, ~ PC 1 (compared with GJ mean 1H NMR spectrum - OJ mean 1H NMR spectrum)
• Independent component 3: 3rd source = difference between GJ and OJ not explained by IC 2 (compared with GJ mean 1H NMR spectrum - OJ mean 1H NMR spectrum - 100000 x IC 2)

Classification of samples
Total: 92 samples

Model   Correctly classified samples   Components used
PCA     82                             PCs 1 to 3
ICA     86                             ICs 2 and 3

(Number of correctly classified samples for the best combination of components.)
Best model of discrimination with ICs 2 and 3, explaining the difference between OG and GJ

Conclusion on ICA vs. PCA
• All of the ICA sources found have a physico-chemical explanation
• Apart from PC1, it is not possible to link PCs to signal information
• Better classification rate for ICA
• We will use only ICA in the rest of the study.


Why pre-treat the data?
To homogenise the data, and to unbias and speed up the subsequent analytical treatments

Problem of scale
[Figure: full spectrum (chemical shift 10 to 0 ppm) with two enlargements: 6 to 10 ppm at scale 10^5, and 3.2 to 4.6 ppm at scale 10^8]
Relevant information can be present at different levels: differences in intensity can reach a factor of 10^3.
This may bias the multivariate results.

LOG transformation
[Figure: spectrum before and after log transformation, full range (10 to 0 ppm) and enlargement of 5 to 9 ppm]
Scale modification and only small transformation of the peak shape
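The effect of the log transformation on the scale problem can be sketched as follows. The base (10 here) and the handling of zero or negative intensities (clipped to a floor here) are assumptions; the slides do not specify them.

```python
import numpy as np

def log_transform(spectra, floor=1.0):
    """Log10 of the intensities; values below `floor` are clipped first (assumed choice)."""
    return np.log10(np.clip(np.asarray(spectra, dtype=float), floor, None))

# Two peaks whose intensities differ by a factor of 10^3, plus a slightly negative baseline point
spectrum = np.array([1e5, 1e8, -20.0])
transformed = log_transform(spectrum)
print(transformed)
```

After the transformation the two peaks differ by 3 units instead of three orders of magnitude, so the small signals are no longer swamped by the large ones in a multivariate analysis.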

Classification of samples
On the whole dataset, model with 4 ICs. Total: 92 samples

Dataset   Correctly classified samples   Components used
No Log    77                             ICs 1 and 4
Log       88                             ICs 2 and 3

(Number of correctly classified samples for the best combination of components.)
Log transformation enhances the prediction by ICA (the same results were found for PCA)

Homogenisation thanks to:
• Phase correction
• Base-line correction
• Warping:
  • "Correlation Optimised Warping" with linear interpolation
  • Adds and subtracts points from each spectrum to maximise its alignment with a reference spectrum
[Figure: citric acid, malic acid and succinic acid signals before and after warping; x axis: variable index, 100 to 700; intensity scale ×10^8. Before warping, the shifts depend on pH.]
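Correlation Optimised Warping stretches and compresses segments of each spectrum, and it is not reproduced here. As a much simpler stand-in, the sketch below aligns a spectrum to a reference with a single rigid shift, chosen to maximise the correlation and applied by linear interpolation. The function name and the `max_shift` parameter are hypothetical.

```python
import numpy as np

def align_by_shift(spectrum, reference, max_shift=10):
    """Return the integer shift (in points) that best aligns `spectrum` to `reference`,
    together with the shifted spectrum."""
    x = np.arange(len(spectrum))

    def shifted(s):
        # The value at position x is taken from position x - s of the original spectrum
        return np.interp(x, x + s, spectrum)

    best = max(range(-max_shift, max_shift + 1),
               key=lambda s: np.corrcoef(shifted(s), reference)[0, 1])
    return best, shifted(best)

# Toy example: the same peak, displaced by 4 points (as a pH-dependent shift would do)
x = np.arange(100)
reference = np.exp(-(x - 50) ** 2 / 18.0)
sample = np.exp(-(x - 54) ** 2 / 18.0)
shift, aligned = align_by_shift(sample, reference)
print(shift)
```

A rigid shift cannot correct peaks that move by different amounts in different regions, which is the situation warping by segments is designed for.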

Effect of warping on signal decomposition by ICA
Before warping
[Figure: independent components of the models with 2, 3 and 4 components; x axis: variable index, 100 to 700]
For all the components, the contribution of the citric acid signal looks like a first derivative.
After warping
[Figure: IC1 and IC2; x axis: variable index, 0 to 800]
Warping enhances the quality of signal decomposition.
The contribution of citric acid is found in IC1. IC2 contains both the malic acid and succinic acid contributions.

Conclusion on signal pretreatments
The analyses are improved by signal pretreatment because it:
• Homogenises the dataset (log and warping)
• Enhances IC decomposition (warping)
In addition, the averaging of 7 points (not presented here):
• Reduces the data analysis time (data size)
These 3 treatments will be applied in the rest of the study.
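The data reduction by averaging can be sketched as below. It assumes that "averaging of 7 points" means replacing each run of 7 consecutive points by its mean (non-overlapping bins); the slides do not detail this, and the function name is hypothetical.

```python
import numpy as np

def bin_average(spectra, width=7):
    """Average consecutive, non-overlapping runs of `width` points along each spectrum."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    n_bins = spectra.shape[1] // width
    trimmed = spectra[:, :n_bins * width]  # leftover points at the end are dropped
    return trimmed.reshape(spectra.shape[0], n_bins, width).mean(axis=2)

reduced = bin_average(np.arange(16), width=7)
print(reduced)
```

Each spectrum becomes 7 times shorter, which is where the gain in analysis time comes from.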


Where is the relevant information in the spectrum?
Examples: aromatics area, sugars area, acids area
[Figure: full spectrum (10 to -2 ppm, intensity scale ×10^8) with an enlargement of the 5 to 8 ppm region (intensity scale ×10^5)]
Relevant information can be spread over the whole spectrum.
Where is the indicator of authenticity located? → Variable selection

Types of variable selection method studied
a. Based on criteria on the variables:
• Variance
• Clustering of Latent Variables (CLV)
b. Based on interval selection:
• Interval PLS (iPLS)
• Evolving Window Zone Selection (EWZS)
• Interval-PLS_Cluster

CLV: Clustering of Latent Variables
• CLV clusters variables around latent variables.
• Iterative procedure: it performs a hierarchical cluster analysis of the variables.
• 1st step: clustering of 2 variables linked either by their correlation or by their covariance (covariance here). The 2 variables are replaced by a latent variable, their 1st PC.
• 2nd step: consolidation. Each former variable is assigned to the latent component it is most correlated with, and the consolidated latent variables are calculated. → back to step 1…
• At each clustering step, the latent variables are clustered by minimising the loss of information (T criterion).
• T is the sum of the first eigenvalues of the matrices of clustered variables (one matrix per cluster).
Advantages:
• No a priori knowledge of the data is needed
• Variables that are not spatially linked can be related
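The T criterion described above can be computed directly: for each cluster, take the covariance matrix of its variables and keep the first (largest) eigenvalue, then sum over clusters. The sketch below is an illustration under assumptions: covariances are divided by n, and the function name and toy data are hypothetical.

```python
import numpy as np

def t_criterion(X, clusters):
    """Sum over clusters of the largest eigenvalue of the cluster's covariance matrix.
    X: samples in rows, variables in columns. clusters: lists of column indices."""
    Xc = X - X.mean(axis=0)
    n = X.shape[0]
    total = 0.0
    for cols in clusters:
        cov = Xc[:, cols].T @ Xc[:, cols] / n
        total += np.linalg.eigvalsh(cov)[-1]  # eigvalsh returns eigenvalues in ascending order
    return total

# Toy data: variables 0 and 1 carry one signal, variables 2 and 3 carry another
rng = np.random.default_rng(0)
a, b = rng.normal(size=50), rng.normal(size=50)
X = np.column_stack([a, a, b, b])

t_two = t_criterion(X, [[0, 1], [2, 3]])  # the two natural clusters
t_one = t_criterion(X, [[0, 1, 2, 3]])    # everything merged into one cluster
print(t_two, t_one, t_two - t_one)        # the difference is the loss of information
```

With the two natural clusters, T equals the total variance of the data; merging them lowers T, which is the kind of drop (∆T) read off when choosing the number of groups.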

CLV selection method
[Figure: dendrogram before consolidation; variance explained by the latent components (T) and ∆T evolution (~ loss of information) against the number of groups]
Clustering the 4 latent variables into 3 leads to a loss of information, so it is better to keep 4 latent variables.

ANOVA on the 4 latent components to predict juice types

Source        Sum of Squares   d.f.   Mean Squares   F       Prob > F
Component 1   0.019            2      0.009          0.85    0.429
Component 2   0.669            2      0.335          90.02   0
Component 3   0.010            2      0.005          0.44    0.645
Component 4   0.014            2      0.007          0.61    0.544

Only the 2nd component is significant for predicting juice types.
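The F values in this table appear to come from a one-way ANOVA of each latent component against the three juice types (3 groups give 2 degrees of freedom). A minimal sketch of that computation, with a hypothetical function name and toy numbers rather than the study's data:

```python
import numpy as np

def one_way_anova_f(values, labels):
    """F statistic of a one-way ANOVA, with its between and within degrees of freedom."""
    values, labels = np.asarray(values, dtype=float), np.asarray(labels)
    groups = [values[labels == g] for g in np.unique(labels)]
    grand_mean = values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Toy scores of one component for three groups of three samples
scores = [1, 2, 3, 2, 3, 4, 5, 6, 7]
juice = ["OJ"] * 3 + ["OG"] * 3 + ["GJ"] * 3
f_value, df_b, df_w = one_way_anova_f(scores, juice)
print(f_value, df_b, df_w)
```

A large F, as for component 2 in the table, means the group means differ far more than the scatter within groups would explain.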

CLV selection
[Figure: spectrum (log scale; x axis: ppm, 9 to 0) showing selected and not selected variables; labelled signals: α-glucose, β-glucose, citric acid, alanine, naringin, histidine, formic acid, niacin, leucine, tyrosine, hesperidin]
878 variables are selected with CLV.

Selected compounds in OJ and GJ

Compound      Content in orange juice                                 Content in grapefruit juice
glucose       20-35 g/l                                               20-50 g/l
fructose      20-35 g/l                                               20-50 g/l
naringin      0.0 mg/g in Navel and Valencia cultivars (USDA)         maximum 1200 mg/l (AIJN), mean of 0.38 mg/g on 9 cultivars (USDA)
alanine       60-205 mg/l                                             62-180 mg/l
citric acid   6.3-17 g/l                                              8-20 g/l
leucine       3-15 mg/l                                               1-10 mg/l

Data from AIJN and USDA
CLV has selected compounds that vary between the different juices. Maybe not all of them are really markers, but they do vary in this dataset.

Classification of samples
Total: 92 samples

Dataset              Correctly classified samples   Components used
Pretreated dataset   88                             IC 3
CLV selection        89                             ICs 1 to 4

(Number of correctly classified samples for the best combination of components.)
Small increase in correct classification rate after CLV selection
Better understanding

EWZS: Evolving Window Zone Selection
• EWZS divides the dataset into subsets of variables by using a window with a varying position and size
• It tests the ability of the zone within each window to predict a value, here the group membership

EWZS function
Dataset = X(n, 2500); value to predict = Y(n, 1)
Evolving Window Zone Selection function: a growing, sliding window tests each zone's ability to predict Y.
[Figure: spectrum with the observation window at each step; x axis: variable index, 0 to 2500; intensity scale ×10^6]
Example settings: step = 500, minimal size = 500, maximal size = 1500
Zones tested (the window grows, then slides):
• X(n, 1:500), X(n, 1:1000), X(n, 1:1500)
• X(n, 500:1000), X(n, 500:1500), X(n, 500:2000)
• X(n, 1000:1500), X(n, 1000:2000), X(n, 1000:2500)
• …
For each zone:
• Data reduction: ICA, PCA or PLS
• Prediction of Y(n, 1): PLS, PLS-DA or linear regression
• Criterion: RMSECV or R²
The RMSECV and R² values of all the zones are collected in a map, from which the zone is selected, e.g. selected zone = X(n, 500:1500).
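The growing, sliding window can be sketched as below. The window enumeration follows the slide's example (step 500, sizes 500 to 1500), with 0-based, stop-exclusive indices instead of the slide's 1-based notation. The scoring step is a simplified stand-in: PCA scores followed by a least-squares regression and a calibration R², without the cross-validation, ICA or PLS options of the actual method. All function names are hypothetical.

```python
import numpy as np

def ewzs_windows(n_vars, step, min_size, max_size):
    """(start, stop) pairs for a window that grows by `step`, then slides by `step`."""
    return [(start, start + size)
            for start in range(0, n_vars - min_size + 1, step)
            for size in range(min_size, max_size + 1, step)
            if start + size <= n_vars]

def zone_r2(X_zone, y, n_components=2):
    """R² of a least-squares fit of y on the first principal component scores of the zone."""
    Xc = X_zone - X_zone.mean(axis=0)
    u, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    centred = y - y.mean()
    return 1.0 - (residual @ residual) / (centred @ centred)

def ewzs(X, y, step, min_size, max_size):
    """Score every window and return the best one with the full map of scores."""
    r2_map = {(a, b): zone_r2(X[:, a:b], y)
              for a, b in ewzs_windows(X.shape[1], step, min_size, max_size)}
    return max(r2_map, key=r2_map.get), r2_map

# Toy data: 30 "spectra" of 2500 points; only points 600 to 699 carry information on y
rng = np.random.default_rng(0)
y = rng.normal(size=30)
X = rng.normal(size=(30, 2500))
X[:, 600:700] += 5 * y[:, None]

windows = ewzs_windows(2500, step=500, min_size=500, max_size=1500)
best, r2_map = ewzs(X, y, step=500, min_size=500, max_size=1500)
print(len(windows), best, round(r2_map[best], 3))
```

The dictionary `r2_map` plays the role of the map on the slide: one criterion value per window position and size, from which the zone with the best value is picked.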

Selection of intervals by EWZS
[Figure: coloured map of the maximum regression R² for each window position and size]
In the coloured map of R² we can select zones that show high values:

R² value   Beginning   End
0.9174     485         735
0.8892     506         632
0.9296     748         915
0.9441     768         831
0.9538     990         1033
0.8904     1314        1357
0.9243     1597        1827
0.9195     1778        1883

The zones are labelled interval 1 to interval 5. Colour code of the table: blue: redundancy; red: best result; green: concatenated.

Selection of intervals by EWZS
[Figure: spectra of GJ, OG and OJ (log scale) in each of the 5 selected intervals; x axis: ppm]
• Interval 1: 6.6 to 7.6 ppm
• Interval 2: 6.05 to 6.3 ppm
• Interval 3: 4.98 to 5.18 ppm
• Intervals 4 and 5: 3.18 to 3.38 ppm and 0.6 to 1.8 ppm
Labelled signals: naringin, tyrosine, arginine, GABA, hesperidin, phlorin, phenylalanine, threonine, leucine, alanine, DMP*, ethanol, glucose
* DMP: dimethylproline

Classification of samples
Total: 92 samples

Dataset              Correctly classified samples   Components used
Pretreated dataset   88                             IC 3
Interval 1           89                             ICs 3 and 4
Interval 2           91                             ICs 2 & 3; ICs 1 to 3
Interval 3           91                             ICs 3 and 4; ICs 1 to 4
Interval 4           85                             ICs 1 to 4
Interval 5           84                             ICs 1 & 3

(Number of correctly classified samples for the best combination of components.)
Intervals 2 and 3 give excellent rates of classification.

Selection of intervals by EWZS
[Figure: spectra of GJ, OG and OJ (log scale) in interval 2 (6.05 to 6.3 ppm) and interval 3 (4.98 to 5.18 ppm); labelled signals: naringin, hesperidin, phlorin]

Conclusion on variable selection
• The variable selection methods locate signal components that vary and/or discriminate the samples.
• The 2 known markers, naringin and hesperidin, were among them.
• Other spectral features were also detected, so the NMR technique combined with variable selection may be useful to discover new authenticity markers in different matrices.
• However, with EWZS (as with iPLS and Interval-PLS_Cluster) some parameters need to be optimised based on some knowledge about the data.


Yoghurt
How much fruit preparation does it contain? Vitamin C?

Experimental setup
Industrial samples kindly provided by Friesland Foods: known quantities of fruit preparation and vitamin C.
Yoghurt + fruit mixtures:
• Apple and Pear (AP): 21 exp
• Apple and red fruits (Mu): 21 exp
• Apple and Mango (AM): 21 exp
• Apple and Pear and Mango (MP): 10 exp, variation on Mango and Pear contents

Experimental design for Apple and Mango yoghurt

Quantity of fruit preparation   Quantity of vit C
0.00                            0.00
2.50                            0.01
5.00                            0.02
7.50                            0.03
10.00                           0.04
12.50                           0.05
15.00                           0.06
17.50                           0.07
20.00                           0.08
22.50                           0.09
25.00                           0.10
27.50                           0.10
30.00                           0.11
32.50                           0.12
35.00                           0.13
37.50                           0.14
40.00                           0.15
42.50                           0.16
45.00                           0.17
47.50                           0.18
50.00                           0.19

Predictions
PLS regression on fruit preparation content and vitamin C content
Separation of the samples into 2 homogeneous datasets:

Yoghurt             Calibration set   Validation set
Apple Mango         11                10
Apple Pear          11                10
Apple red fruits    10                11
Apple Mango Pear    5                 5
Total               37 samples        36 samples

Prediction of fruit preparation in yoghurt
On the whole data set, with logarithmic transformation
[Figure: log-transformed spectrum; x axis: ppm, 9 to -1]

Percent Variance Captured by Regression Model
        -----X-Block-----   -----Y-Block-----
Comp    This     Total      This     Total
1       96.49    96.49      80.41    80.41
2       1.67     98.17      15.65    96.07
3       0.22     98.39      3.40     99.47
4       0.56     98.95      0.11     99.57
5       0.20     99.15      0.20     99.77
6       0.18     99.32      0.11     99.88
7       0.12     99.44      0.04     99.92
8       0.09     99.53      0.02     99.95
9       0.04     99.57      0.03     99.98
10      0.06     99.63      0.01     99.99

[Figure: predicted vs. measured amount of fruit preparation, 0 to 50]
R² = 0.991, 3 latent variables, RMSEC = 0.34568, RMSEP = 2.0227
Average error of prediction = mean(abs(Ypred-Yreal)) = 1.27 g/100g
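The quantities reported on this slide (a PLS model with a few latent variables, RMSEC on the calibration set, RMSEP on the validation set, and the mean absolute error) can be sketched as follows. The PLS1 fit below uses the standard NIPALS algorithm; the slides do not say which implementation was used, and the function names and toy data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS regression for a single response (PLS1, NIPALS). Returns (beta, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)        # weight vector
        t = E @ w                     # scores of the latent variable
        p = E.T @ t / (t @ t)         # X loadings
        c = (f @ t) / (t @ t)         # regression of y on the scores
        E = E - np.outer(t, p)        # deflate X
        f = f - c * t                 # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, x_mean, y_mean

def pls1_predict(model, X):
    beta, x_mean, y_mean = model
    return (X - x_mean) @ beta + y_mean

def rmse(y_true, y_pred):
    """Root mean square error: RMSEC on calibration data, RMSEP on validation data."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mean_abs_error(y_true, y_pred):
    """Average error of prediction, mean(abs(Ypred - Yreal))."""
    return float(np.mean(np.abs(y_pred - y_true)))

# Toy data: y is an exact linear function of 3 variables, so 3 latent variables fit it exactly
rng = np.random.default_rng(1)
coefs = np.array([1.0, 2.0, 3.0])
X_cal, X_val = rng.normal(size=(20, 3)), rng.normal(size=(10, 3))
y_cal, y_val = X_cal @ coefs + 5.0, X_val @ coefs + 5.0

model = pls1_fit(X_cal, y_cal, n_components=3)
rmsec = rmse(y_cal, pls1_predict(model, X_cal))
rmsep = rmse(y_val, pls1_predict(model, X_val))
print(rmsec, rmsep, mean_abs_error(y_val, pls1_predict(model, X_val)))
```

On real spectra RMSEP is normally larger than RMSEC, as on the slide (2.0227 against 0.34568), because the validation samples were not used to build the model.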

Prediction of vitamin C in yoghurt
Spiked sample to determine the signals of vitamin C
[Figure: spectra of the spiked sample, raw data and log data; x axis: ppm, 9 to -1]
On the selected areas, with logarithmic transformation

Percent Variance Captured by Regression Model
        -----X-Block-----   -----Y-Block-----
Comp    This     Total      This     Total
1       99.75    99.75      80.39    80.39
2       0.20     99.95      13.97    94.36
3       0.04     99.99      4.42     98.78
4       0.00     99.99      0.92     99.70
5       0.00     99.99      0.07     99.77
6       0.00     100.00     0.07     99.84
…

[Figure: predicted vs. known concentration of vitamin C (g/100g), 0 to 0.45]
R² = 0.994, 4 latent variables, RMSEC = 0.0063281, RMSEP = 0.014384
Average error of prediction = 0.0075 g/100g

Balsamic vinegar
Common or traditional?

Discrimination of vinegars
Samples:
• 15 Traditional balsamic vinegars (TB) / 14 Balsamic vinegars (Ba)
• Ba: 7 from the market and 7 from producers in Emilia. 3 are aged.
Sample preparation:
• Augusta Caligiani, University of Parma.
• Dilution of vinegar in D2O + TSP

Discrimination of vinegar by Interval-PLS_Cluster
[Figure: spectral zone of the formic acid signal (8.05 to 8.35 ppm) for TBV and Ba, and clustering of the samples; annotated sub-clusters: aged 6 years, from Emilia, from the market]
Interval-PLS_Cluster may detect unexpected sub-clusters.

Discrimination of vinegars by EWZS/ICA (3-IC model)
[Figure: IC 1, IC 2 and IC 3 in the 7 selected zones (zone 1 to zone 7), for balsamic and traditional balsamic vinegars; x axis: ppm]
• IC1 looks like the signal of traditional balsamic vinegar
• The selection contains signals of HMF*, acetic acid, formic acid, glucose and fructose
* HMF: hydroxymethylfurfural

Discrimination of vinegar by EWZS/ICA (3-IC model)
[Figure: scatter plot of the Ba and TB samples in the IC1 / IC2 plane]
All samples correctly classified by leave-one-out cross-validation (the 3 LVs are taken into account)


Conclusions
Chemometric tools applied to 1H NMR spectra:
• Pretreatments (warping and log transformation) give a significant improvement in the results
• Variable selection improves the results and gives meaning to them
• ICA has proved to be more interpretable for the analysis of NMR spectra and should be useful for other signals

Conclusions
Potential of 1H NMR spectroscopy in food authentication: very good results obtained on all 3 matrices:
• Orange juice adulteration
• Quantification in yoghurt
• Discrimination of vinegars
Sample preparation and acquisition procedure are easy.
However, some technical difficulties remain to industrialise this analysis:
• High cost of investment
• Signal treatment needs critical analysis: visual analysis for new types of fraud, optimisation of the chemometric methods for each kind of fraud

Conclusions
Future work
• Develop automated procedures (software) for the most important frauds
• Investigate other liquid or freeze-dried products
• Transfer these chemometric methods to other types of measurements (possibly online)

Acknowledgment
For evaluating this work: all the jury members
For their help, advice and time: my colleagues at AgroParisTech and Eurofins
For financial support: Eurofins; French Ministry of Research, support of the convention CIFRE n° 169-2005
For being here: friends, family and colleagues