Food authentication by 1H NMR spectroscopy in combination with different chemometric tools
Application to fruit juices, yoghurts and vinegars
PhD Viva, 7th March 2008, Marion Cuny
www.eurofins.com
Presentation outline
1. Food authentication?
2. Why 1H NMR spectroscopy?
3. General methodology
4. Acquiring spectra
4.1. Sample preparation
4.2. Measurements
5. Data Treatment (example of orange juice)
5.1. Classification method: ICA vs. PCA
5.2. Pre-treatment of the spectra
5.3. Variable selection: CLV / EWZS
6. Results on other matrices: yoghurts and vinegars
7. Conclusions
1. Food authentication?
What does authenticity mean?
Authenticating food means making sure that the product really is what it is supposed to be.
What kinds of adulteration are we looking for?
Origin:
• Country
• Variety
Production:
• Method: organic / conventional
• Vintage
Ingredients:
• Natural / synthesized
• Undeclared addition
Why is it important to protect the market from adulteration?
• Fair marketplace
• Consumer safety and consumer trust
Orange juice authentication
Orange juice: the most consumed juice worldwide, 500 000 000 litres a day*
Usual economic frauds in fruit juices:
• Addition of water
• Addition of sugar
• Use of co-fruits
• Addition of other compounds, such as flavours or acids, to hide the fraud
* Terra Economica
Dairy product authentication
Yoghurt: a dairy product produced by bacterial fermentation of milk, with up to 30 % of added compounds.
Possible frauds:
• Origin and content of the proteins
• Process: fruit percentage, bacteria
• Addition of undeclared compounds
Balsamic vinegar authentication
Protected Denomination of Origin (PDO): 2 areas, Modena and Reggio Emilia
Traditional Balsamic (TB):
• Production: cooked must from Trebbiano grape varieties
• Aging: at least 12 years in barrels of different kinds of wood and of decreasing volume
Balsamic (Ba):
• Production: cooked must diluted with wine and caramel (2%)
• Aging: 60 days in wooden containers
How can we detect a fraudulent product?
There are different sorts of fraud. When you get a sample:
• What is the possible fraud?
• What is the relevant information?
• How can I find it?
Two routes: sensory characteristics and analytical approaches.
Analytical approaches
Target analyte approach:
• Direct analysis of adulterant or authenticity indicator
• Comparison with expected values
Fingerprint / multivariate approach: pattern comparison of authentic and adulterated products
• Multiple physico-chemical measurements
• Profiles
Analytical approaches
Fingerprint / multivariate approach: pattern comparison of authentic and adulterated products
• Multiple physico-chemical measurements
• Profiles: 1H NMR spectroscopy
Why 1H NMR spectroscopy?
• Simplified sample preparation
• Rapid measurement
• Screening of all protonated molecules, even for complex mixtures
Fast screening method for industry control
[Figure: 1H NMR spectrum of a strawberry yoghurt; x-axis: chemical shift (ppm), 10 to 0; intensity scale ×10^8]
General methodology
• Sample collection
• Sample conservation
• Sample homogenization
• Sample preparation
• Tube filling
• NMR measurement
• Spectra
[Figure: example 1H NMR spectrum; x-axis: ppm, 9 to -1; intensity scale ×10^8]
Methodology
[Figure: example spectra, including a vinegar spectrum; x-axis: ppm]
Spectra homogenisation:
• Warping
• Log transformation
• Data reduction (average)
Variable selection:
• On criteria: Variance, Clustering of Variables (CLV)
• Interval selection: Interval PLS (iPLS), Evolving Window Zone Selection (EWZS), Interval-PLS_Cluster
Analysis:
• Independent Component Analysis (ICA)
• Principal Component Analysis (PCA)
• Partial Least Squares regression (PLS)
Diagnosis:
• Authentic
• Adulterated
• Quantity
In today's presentation
[Workflow diagram repeated: sample preparation, NMR measurement, spectra homogenisation, variable selection, analysis, diagnosis]
Example of orange and grapefruit juices
A known fraud, with a standard method of detection: IFU* 58, quantification of naringin and hesperidin.
Study in collaboration with IFR Norwich: 92 samples
• 59 orange juices (OJ)
• 23 grapefruit juices (GJ)
• 10 blends (OG)
Classification of the samples into 3 groups using information contained in the spectra.
Variation around the process:
• Long life / chilled
• From concentrate / pure juice
• Origin
Aim: assess the potential of 1H NMR spectroscopy and chemometric tools to discriminate samples from the industry.
* IFU: International Federation of Fruit Juice Producers
4. Acquiring spectra
What is a good spectrum for authentication?
One that is similar to the other spectra of the same matrix.
Needs:
• Same information
• Same chemical shifts
• Same shape
• Same intensity
• Good resolution
Means:
• Sample preparation: stable experimental conditions, pH adjustment
• Data pre-treatment: warping, phase correction, base-line correction
• Acquisition parameters: numerous scans (NS) and numerous data points (TD, around 32000), each at the cost of a longer experiment
Sample preparation (when, why, how, and the values used for juices)
Storage until experiment
• Why: same physico-chemical state
• How: freezing
Standardisation of the samples
• Why: comparable intensities, no particles, stable magnetic field
• How: pH adjustment (juices: 4 +/- 0.02)
• How: same viscosity / concentration, by Brix adjustment by dilution (juices: no adjustment)
• How: centrifugation (juices: 25 min at 4500 rpm)
Filling of the tube
• Why: same height for shim correction
• How: same amount of sample, solvent and lock (juices: 750 μL, 250 μL D2O)
Acquisition parameters (why, parameter, how, and the values used for juices)
Stable magnetic field
• Spectrometer: control of environment (600 MHz)
• Temperature: fixed, because of its influence on viscosity (27°C)
Recorded information
• Pulse sequence: water peak presaturation (Noesypr1d)
• AQ, acquisition time: enough to have all the information (2.8 sec)
• D1, relaxation delay: enough to have high intensity (2 sec); not in quantitative condition
Parameters to increase resolution
• Digital resolution: DR = 2SW / TD, DR = 1 / AQ
• TD, time domain: numerous data points (16384)
• SW, spectral width: small spectral window of observation (11.97 ppm)
Parameter to increase signal / noise
• NS, number of scans: numerous scans (128); S/N is proportional to √(NS)
Every parameter that increases the quality of the spectrum also increases the time of experiment!
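The square-root law quoted for the number of scans can be made concrete with a small calculation. The sketch below is only illustrative: the function names are hypothetical, and it assumes that S/N grows as √(NS) while the experiment time grows linearly with NS.

```python
import math

def snr_gain(ns_from: int, ns_to: int) -> float:
    """Relative gain in signal-to-noise when going from ns_from to ns_to scans."""
    return math.sqrt(ns_to / ns_from)

def time_factor(ns_from: int, ns_to: int) -> float:
    """Relative increase in experiment time (assumed proportional to NS)."""
    return ns_to / ns_from

# 128 scans (the value used for juices) compared with a single scan
print(round(snr_gain(1, 128), 2))   # 11.31
print(time_factor(1, 128))          # 128.0
```

Quadrupling the number of scans only doubles S/N, which is why the last remark of the slide matters when choosing NS.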
5. Data Treatment
5.1. Classification method: ICA vs. PCA
Classification method
• Leave-one-out cross-validation.
• Data reduction using Principal (or Independent) Components.
• Projection of the samples onto the different component spaces.
• Left-out sample attributed to the nearest group, based on the distances from the group barycenters.
All the component spaces are taken into account and the best results are kept.
With 3 components → 7 different spaces:
• 1 dimension: C1, C2, C3
• 2 dimensions: C1&C2, C1&C3, C2&C3
• 3 dimensions: C1&C2&C3
Classification method
In a 2D plane:
• d1: Euclidean distance from the sample to the group 1 barycenter
• d2: Euclidean distance from the sample to the group 2 barycenter
Here d1 > d2, so the sample is attributed to group 2.
[Figure: 2D plot showing the sample to classify, the group 1 and group 2 barycenters, and the distances d1 and d2]
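The classification rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in the study: the component scores are taken as already computed (whether the decomposition is redone for each left-out sample is not specified here), and the function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def nearest_barycenter_loo(scores, labels, comps):
    """Leave-one-out count of samples whose nearest group barycenter is their own group."""
    scores = np.asarray(scores, dtype=float)[:, list(comps)]
    labels = np.asarray(labels)
    correct = 0
    for i in range(len(labels)):
        keep = np.arange(len(labels)) != i            # leave sample i out
        groups = np.unique(labels[keep])
        centers = np.array([scores[keep & (labels == g)].mean(axis=0) for g in groups])
        dists = np.linalg.norm(centers - scores[i], axis=1)   # Euclidean distances
        correct += groups[np.argmin(dists)] == labels[i]
    return int(correct)

def best_component_space(scores, labels):
    """Try every non-empty subset of components and keep the best one."""
    n_comp = np.asarray(scores).shape[1]
    spaces = [c for k in range(1, n_comp + 1) for c in combinations(range(n_comp), k)]
    results = {c: nearest_barycenter_loo(scores, labels, c) for c in spaces}
    best = max(results, key=results.get)
    return best, results[best], len(spaces)

# Toy scores on 3 components: only the first one separates the two groups
toy_scores = [[0.0, 5, 1], [0.1, -5, 2], [0.2, 4, -1],
              [1.0, 5, 0], [1.1, -4, 1], [0.9, -5, -2]]
toy_labels = ["a", "a", "a", "b", "b", "b"]
best, n_correct, n_spaces = best_component_space(toy_scores, toy_labels)
print(n_spaces)   # 7 spaces for 3 components
```

With 3 components the loop visits the 7 spaces listed on the previous slide; components are indexed from 0 in the code.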
PCA vs. ICA PCA: Principal Component Analysis
Classical method of unsupervised group detection Looks for orthogonal directions maximising the dispersion of samples
ICA: Independent Component Analysis
Based on the (logical) assumption of statistical independence of underlying ("pure") sources
Looks for pure signals in a data set of mixed signals Calculates a demixing transformation that minimises dependencies between the estimates of the "pure" sources
Two types of information are summarised in these components.
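The contrast between the two decompositions can be tried on toy data. The sketch below uses NumPy only. PCA is done by SVD; for ICA, a symmetric FastICA with a tanh contrast is swapped in purely for illustration, since the ICA algorithm used in the study is not stated on this slide. The toy signals, the mixing proportions and the function names are all made up.

```python
import numpy as np

def pca(X, n_components):
    """PCA by SVD of the column-centred data (rows = samples)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    return scores, Vt[:n_components]            # loadings: orthonormal rows

def fast_ica(X, n_components, n_iter=200, seed=0):
    """Symmetric FastICA, tanh contrast (rows of X = observed mixed signals)."""
    rng = np.random.default_rng(seed)
    n_points = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)      # centre each observed signal
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Vt[:n_components] * np.sqrt(n_points)   # whitened signals
    W = np.linalg.qr(rng.normal(size=(n_components, n_components)))[0]
    for _ in range(n_iter):
        Y = np.tanh(W @ Z)
        W_new = (Y @ Z.T) / n_points - np.diag((1.0 - Y**2).mean(axis=1)) @ W
        u, _, vt = np.linalg.svd(W_new)         # symmetric decorrelation
        W = u @ vt
    sources = W @ Z                             # estimated "pure" signals
    proportions = Xc @ sources.T / n_points     # weight of each source in each sample
    return sources, proportions

# Toy data: 6 "spectra" that are mixtures of two made-up pure signals
t = np.linspace(0, 1, 500)
pure = np.vstack([np.sin(2 * np.pi * 5 * t), 2 * ((3 * t) % 1) - 1])
mixing = np.array([[1.0, 0.2], [0.8, 0.4], [0.6, 0.6],
                   [0.4, 0.8], [0.2, 1.0], [0.5, 0.5]])
X = mixing @ pure
scores, loadings = pca(X, 2)
sources, proportions = fast_ica(X, 2)
```

The PCA loadings are constrained to be orthogonal, whereas the ICA sources are only required to be as independent as possible, which is what allows them to resemble the underlying pure signals.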
Aromatic zone of the fruit juices
[Figure: 1H NMR spectra between 8 and 6.2 ppm (intensity scale ×10^5) in three panels: grapefruit juice, orange juice, blend of orange and grapefruit. Labelled signals: naringin, Gaba*, hesperidin, tyrosine, arginine, phlorin and, tentatively, adenosine.]
* Gaba: γ-aminobutyric acid
Decomposition into 3 Principal Components
[Figure: Principal Components 1, 2 and 3 between 8 and 6.2 ppm, compared with the difference spectrum (GJ mean 1H NMR spectrum - OJ mean 1H NMR spectrum)]
• Principal Component 1: axis of maximum dispersion = difference between the mean spectra of GJ and OJ
• Principal Components 2 and 3: no apparent explanation
Decomposition into 3 Independent Components
Independent component 1: 1st source = mean signal of OJ
[Figure: IC 1 compared with the OJ mean 1H NMR spectrum, 8 to 6.2 ppm]
Decomposition into 3 Independent Components
Independent component 2: 2nd source ~ difference between the mean signals of GJ and OJ (~ PC 1)
[Figure: IC 2 compared with the difference spectrum (GJ mean 1H NMR spectrum - OJ mean 1H NMR spectrum), 8 to 6.2 ppm]
Decomposition into 3 Independent Components
Independent component 3: 3rd source = difference between GJ and OJ not explained by IC 2
[Figure: IC 3 compared with (GJ mean 1H NMR spectrum - OJ mean 1H NMR spectrum - 100000 × IC 2), 8 to 6.2 ppm]
Classification of samples
Total: 92 samples
Number of correctly classified samples for a combination of components:
• PCA: 82 (PCs 1 to 3)
• ICA: 86 (ICs 2 and 3)
Best discrimination model with ICs 2 and 3, explaining the difference between OG and GJ.
Conclusion on ICA vs. PCA
• All of the ICA sources found have a physico-chemical explanation.
• Apart from PC1, it is not possible to link the PCs to signal information.
• Better classification rate for ICA.
We will use only ICA in the rest of the study.
5.2. Pre-treatment of the spectra
Why pre-treat the data?
To homogenise the data, and to unbias and speed up the subsequent analytical treatments.
Problem of scale
Relevant information can be present at different levels: intensities can differ by a factor of 10^3.
This may bias the multivariate results.
[Figure: full spectrum (10 to 0 ppm) with two zooms: 10 to 6 ppm (scale 10^5) and 4.6 to 3.2 ppm (scale 10^8); x-axis: chemical shift (ppm)]
LOG Transformation
[Figure: spectra before and after log transformation, over the full range (10 to 0 ppm) and zoomed on 9 to 5 ppm]
Scale modification and only a small transformation of the peak shape.
Classification of samples
On the whole dataset, model with 4 ICs. Total: 92 samples.
Number of correctly classified samples for a combination of components:
• No log: 77 (ICs 1 and 4)
• Log: 88 (ICs 2 and 3)
Log transformation enhances the prediction by ICA (the same results were found for PCA).
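The effect of the log transformation on the scale problem can be illustrated numerically. The exact transform applied in the study is not given on these slides, so the sketch below makes an assumption: intensities are shifted so that every value is at least 1 before taking log10. The function name and the toy intensities are hypothetical.

```python
import numpy as np

def log_transform(spectrum, offset=None):
    """log10 of the intensities after shifting them to be >= 1 (the shift is an assumption)."""
    spectrum = np.asarray(spectrum, dtype=float)
    if offset is None:
        offset = 1.0 - min(spectrum.min(), 0.0)   # copes with zero or negative baseline points
    return np.log10(spectrum + offset)

# Toy spectrum: one small peak (~10^5) and one large peak (~10^8)
raw = np.array([0.0, 2.0e5, 1.0e5, 0.0, 2.5e8, 1.0e8, 0.0])
logged = log_transform(raw)
print(raw.max() / raw[1])                  # 1250.0: the large peak dominates the raw data
print(round(logged.max() / logged[1], 2))  # 1.58: comparable weights after the log
```

After the transformation the small aromatic-type peak and the large sugar-type peak contribute on a similar scale to any variance-based analysis.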
Homogenisation thanks to:
• Phase correction
• Base-line correction
• Warping: "Correlation Optimised Warping" with linear interpolation. It adds and subtracts points from each spectrum to maximise its alignment with a reference spectrum.
Homogenisation thanks to: Warping
Shifts dependent on pH
[Figure: overlaid spectra before warping and after warping (x-axis: data points 100 to 700; intensity scale ×10^8), showing the malic acid, citric acid and succinic acid signals]
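The idea of aligning each spectrum on a reference can be sketched with a much cruder scheme than Correlation Optimised Warping. The code below is not COW: it only applies a rigid integer shift to each segment, chosen to maximise the correlation with the reference, with no stretching or interpolation. Function names, segment length and toy peaks are hypothetical.

```python
import numpy as np

def best_shift(segment, reference, max_shift):
    """Integer shift (in points) that maximises the correlation with the reference."""
    shifts = range(-max_shift, max_shift + 1)
    corr = [np.corrcoef(np.roll(segment, s), reference)[0, 1] for s in shifts]
    return shifts[int(np.argmax(corr))]

def align_segments(spectrum, reference, seg_len, max_shift):
    """Align segment by segment with rigid shifts (np.roll wraps points around the edges)."""
    aligned = np.array(spectrum, dtype=float)
    reference = np.asarray(reference, dtype=float)
    for start in range(0, len(aligned), seg_len):
        seg = slice(start, start + seg_len)
        if np.std(aligned[seg]) == 0 or np.std(reference[seg]) == 0:
            continue                      # flat segment: nothing to align
        s = best_shift(aligned[seg], reference[seg], max_shift)
        aligned[seg] = np.roll(aligned[seg], s)
    return aligned

# Toy example: two peaks shifted by +3 and -3 points with respect to the reference
x = np.arange(200)
def peak(center):
    return np.exp(-0.5 * ((x - center) / 3.0) ** 2)
reference = peak(50) + peak(150)
sample = peak(53) + peak(147)
aligned = align_segments(sample, reference, seg_len=100, max_shift=5)
print(best_shift(sample[:100], reference[:100], 5))   # -3
```

A real warping method also has to choose segment boundaries and allow each segment to stretch or shrink, which is what the linear interpolation in COW provides.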
Effect of warping on signal decomposition by ICA: before warping
[Figure: ICs obtained with models of 2, 3 and 4 components (x-axis: data points 100 to 700)]
For all the components, the contribution of the citric acid signal looks like a first derivative.
Effect of warping on signal decomposition by ICA: after warping
[Figure: IC1 and IC2 after warping (x-axis: data points 0 to 800)]
Warping enhances the quality of signal decomposition:
• The contribution of citric acid is found in IC1
• IC2 contains both the malic acid and succinic acid contributions
Conclusion on signal pretreatments
The analyses are improved by signal pretreatment because it:
• Homogenises the dataset (log and warping)
• Enhances IC decomposition (warping)
In addition, the averaging of 7 points (not presented here) reduces the data analysis time (data size).
These 3 treatments will be applied in the rest of the study.
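Since the averaging step is not presented in the slides, the following is only a guess at its simplest form: consecutive, non-overlapping blocks of 7 points are replaced by their mean, and any incomplete trailing block is dropped. The function name is hypothetical.

```python
import numpy as np

def average_points(spectra, width=7):
    """Replace each run of `width` consecutive points by its mean (remainder dropped)."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    n_keep = (spectra.shape[1] // width) * width
    return spectra[:, :n_keep].reshape(spectra.shape[0], -1, width).mean(axis=2)

# Two toy spectra of 16384 points each
X = np.arange(2 * 16384, dtype=float).reshape(2, 16384)
print(average_points(X).shape)   # (2, 2340)
```

The number of variables is divided by 7, which shortens every subsequent multivariate computation.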
5.3. Variable selection: CLV / EWZS
Where is the relevant information in the spectrum?
Examples: aromatics area, sugars area, acids area
[Figure: full spectrum (10 to -2 ppm, intensity scale ×10^8) with a zoom on the 8 to 5 ppm region (scale ×10^5)]
Relevant information can be spread over the whole spectrum. Where is the indicator of authenticity located? → Variable selection
Types of variable selection method studied
a. Based on criteria on the variables:
• Variance
• Clustering of Latent Variables (CLV)
b. Based on interval selection:
• Interval PLS (iPLS)
• Evolving Window Zone Selection (EWZS)
• Interval-PLS_Cluster
CLV: Clustering of Latent Variables
CLV clusters variables around latent variables.
Iterative procedure: a hierarchical cluster analysis of the variables.
1st step: clustering of 2 variables linked either by their correlation or by their covariance (covariance here). The 2 variables are replaced by a latent variable = the 1st PC.
2nd step: consolidation. Each former variable is assigned to the most correlated latent component, and the consolidated latent variables are calculated. → back to step 1…
CLV
At each clustering step, the latent variables are clustered by minimising the loss of information (the T criterion).
T is the sum, over the clusters, of the first eigenvalue of the matrix of the clustered variables.
Advantages:
• No a priori knowledge of the data is needed
• Variables that are not spatially linked can be related
CLV selection method
[Figure: dendrogram before consolidation, and evolution of ∆T (~ loss of information) against the number of groups; T = variance explained by the latent components]
Clustering the 4 latent variables into 3 leads to a loss of information, so it is better to keep 4 latent variables.
ANOVA on the 4 latent components to predict juice types
Source        Sum of Squares   d.f.   Mean Squares   F       Prob > F
Component 1   0.019            2      0.009          0.85    0.429
Component 2   0.669            2      0.335          90.02   0
Component 3   0.010            2      0.005          0.44    0.645
Component 4   0.014            2      0.007          0.61    0.544
Only the 2nd component is significant for predicting juice types.
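The quantities in the table follow from a one-way ANOVA of each latent component against the juice type. A minimal NumPy sketch is given below; the function name and the toy values are hypothetical, and the probability column is left out because it requires the F distribution.

```python
import numpy as np

def one_way_anova(values, groups):
    """Return (sum of squares between groups, d.f. between, mean square between, F)."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand_mean = values.mean()
    ss_between = sum(np.sum(groups == g) * (values[groups == g].mean() - grand_mean) ** 2
                     for g in labels)
    ss_within = sum(np.sum((values[groups == g] - values[groups == g].mean()) ** 2)
                    for g in labels)
    df_between = len(labels) - 1               # 3 juice types give 2 d.f.
    df_within = len(values) - len(labels)
    ms_between = ss_between / df_between
    f_stat = ms_between / (ss_within / df_within)
    return float(ss_between), df_between, float(ms_between), float(f_stat)

# Toy scores of one latent component for 3 samples of each juice type
toy_values = [1, 2, 3, 2, 3, 4, 5, 6, 7]
toy_groups = ["OJ"] * 3 + ["OG"] * 3 + ["GJ"] * 3
print(one_way_anova(toy_values, toy_groups))   # (26.0, 2, 13.0, 13.0)
```

A large F, as for Component 2 in the table, means that the between-type variation of the component is large compared with its variation within each juice type.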
CLV selection
[Figure: spectrum (11 to 0 ppm) with the selected and not selected variables marked. Labelled signals: α-glucose, ß-glucose, citric acid, alanine, naringin, histidine, formic acid, niacin, leucine, tyrosine, hesperidin.]
878 variables are selected with CLV.
Selected compounds in OJ and GJ
Compound: content in orange juice / content in grapefruit juice
• glucose: 20-35 g/l / 20-50 g/l
• fructose: 20-35 g/l / 20-50 g/l
• naringin: 0.0 mg/g in Navel and Valencia cultivars (USDA) / maximum 1200 mg/l (AIJN), mean of 0.38 mg/g on 9 cultivars (USDA)
• alanine: 60-205 mg/l / 62-180 mg/l
• citric acid: 6.3-17 g/l / 8-20 g/l
• leucine: 3-15 mg/l / 1-10 mg/l
Data from AIJN and USDA
CLV has selected compounds that vary between the different juices. Maybe not all of them are really markers, but they do vary in this dataset.
Classification of samples
Total: 92 samples
Number of correctly classified samples for a combination of components:
• Pretreated dataset: 88 (IC 3)
• CLV selection: 89 (ICs 1 to 4)
Small increase in the correct classification rate after CLV selection, and better understanding.
EWZS: Evolving Window Zone Selection
EWZS divides the dataset into subsets of variables by using a window with a varying position and size.
It tests the ability of the zone within each window to predict a value, here the group membership.
EWZS function
Dataset = Xn,2500; value to predict = Yn,1
Evolving Window Zone Selection function: a growing, sliding window tests the ability of each zone to predict Y.
Parameters in this illustration: step = 500, minimal size = 500, maximal size = 1500.
Zones tested, in order:
• Growing: Xn,1:500, Xn,1:1000, Xn,1:1500
• Sliding, then growing: Xn,500:1000, Xn,500:1500, Xn,500:2000
• Sliding, then growing: Xn,1000:1500, Xn,1000:2000, Xn,1000:2500
• …
[Figure: spectrum with the observation window drawn for each zone tested]
EWZS function
For each zone (Xn,1:500, Xn,1:1000, Xn,1:1500, Xn,500:1000, …, Xn,1000:2500):
• Data reduction: ICA, PCA or PLS
• Prediction of Yn,1: PLS, PLS-DA or linear regression
• Criterion: RMSECV and R²
The RMSECV and R² values of all the zones are gathered in two maps (RMSECV value, R² value), on which the selection is made.
Selected zone in this illustration = Xn,500:1500
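The growing and sliding window can be enumerated with two nested loops. The sketch below is an illustration under assumptions: zones are given as zero-based, half-open (start, end) index pairs, windows that would run past the last variable are skipped, and the scoring function is a simple stand-in rather than the ICA/PCA plus PLS criteria of the slides. All names are hypothetical.

```python
import numpy as np

def evolving_windows(n_vars, step, min_size, max_size):
    """Enumerate (start, end) zones: the window grows by `step`, then slides by `step`."""
    zones = []
    for start in range(0, n_vars - min_size + 1, step):
        for size in range(min_size, max_size + 1, step):
            if start + size <= n_vars:
                zones.append((start, start + size))
    return zones

def ewzs(X, y, criterion, step, min_size, max_size):
    """Score every zone with criterion(X_zone, y); return the best zone and all scores."""
    zones = evolving_windows(X.shape[1], step, min_size, max_size)
    scores = {z: criterion(X[:, z[0]:z[1]], y) for z in zones}
    return max(scores, key=scores.get), scores

def r2_of_zone_mean(X_zone, y):
    """Stand-in criterion: squared correlation between the zone mean and y."""
    return float(np.corrcoef(X_zone.mean(axis=1), y)[0, 1] ** 2)

zones = evolving_windows(2500, step=500, min_size=500, max_size=1500)
print(len(zones))   # 12
print(zones[:3])    # [(0, 500), (0, 1000), (0, 1500)]

# Toy dataset: group membership only affects variables 500 to 1499
rng = np.random.default_rng(0)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
X = rng.normal(size=(8, 2500))
X[:, 500:1500] += 5 * y[:, None]
best_zone, zone_scores = ewzs(X, y, r2_of_zone_mean, 500, 500, 1500)
```

Replacing `r2_of_zone_mean` by a cross-validated model would give the RMSECV and R² maps described above.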
Selection of intervals by EWZS
[Figure: coloured map of the maximum regression R² (colour scale 0 to 0.9)]
Zones with high R² values:
R² value   Beginning   End
0.9174     485         735
0.8892     506         632
0.9296     748         915
0.9441     768         831
0.9538     990         1033
0.8904     1314        1357
0.9243     1597        1827
0.9195     1778        1883
These zones are labelled interval 1 to interval 5. Colour code: blue: redundancy; red: best result; green: concatenated.
In the coloured map of R² we can select zones that show high values.
Selection of intervals by EWZS
[Figure: GJ, OG and OJ spectra in each selected interval: Interval 1 (7.6 to 6.6 ppm), Interval 2 (6.3 to 6.05 ppm), Interval 3 (5.18 to 4.98 ppm), Interval 4 (3.38 to 3.18 ppm), Interval 5 (1.8 to 0.6 ppm). Labelled signals: naringin, tyrosine, arginine, Gaba, hesperidin, phlorin, phenylalanine, glucose, DMP*, alanine, ethanol, threonine, leucine.]
* DMP: Dimethylproline
Classification of samples
Total: 92 samples
Number of correctly classified samples on a combination of components:
Dataset              Correctly classified   Components used
Pretreated dataset   88                     IC 3
Interval 1           89                     ICs 3 and 4
Interval 2           91                     ICs 2 & 3; ICs 1 to 3
Interval 3           91                     ICs 3 and 4; ICs 1 to 4
Interval 4           85                     ICs 1 to 4
Interval 5           84                     ICs 1 & 3
Intervals 2 and 3 give excellent rates of classification.
Selection of intervals by EWZS
[Figure: GJ, OG and OJ spectra in Interval 2 (6.3 to 6.05 ppm) and Interval 3 (5.18 to 4.98 ppm). Labelled signals: naringin, hesperidin, phlorin.]
Conclusion on variable selection
• The variable selection methods locate signal components that vary and/or discriminate the samples.
• The 2 known markers, naringin and hesperidin, were detected.
• Other spectral features were also detected, so the NMR technique combined with variable selection may be useful to discover new authenticity markers in different matrices.
• However, with EWZS (as with iPLS and Interval-PLS_Cluster) some parameters need to be optimised based on some knowledge about the data.
6. Results on other matrices: yoghurts and vinegars
Yoghurt
How much fruit preparation does it contain? Vitamin C?
Experimental setup
Industrial samples kindly provided by Friesland Foods, with known quantities of preparation and vitamin C.
Yoghurt + fruit mixtures:
• Apple and Pear (AP): 21 exp
• Apple and red fruits (Mu): 21 exp
• Apple and Mango (AM): 21 exp
• Apple and Pear and Mango (MP): 10 exp, variation on Mango and Pear contents
Experimental design for Apple and Mango yoghurt:
Quantity of fruit preparation   Quantity of vit C
0.00                            0.00
2.50                            0.01
5.00                            0.02
7.50                            0.03
10.00                           0.04
12.50                           0.05
15.00                           0.06
17.50                           0.07
20.00                           0.08
22.50                           0.09
25.00                           0.10
27.50                           0.10
30.00                           0.11
32.50                           0.12
35.00                           0.13
37.50                           0.14
40.00                           0.15
42.50                           0.16
45.00                           0.17
47.50                           0.18
50.00                           0.19
Predictions
PLS regression on fruit preparation content and vitamin C content.
Separation of the samples into 2 homogeneous datasets:
Mixture            Calibration set   Validation set
Apple Mango        11                10
Apple Pear         11                10
Apple red fruits   10                11
Apple Mango Pear   5                 5
Total              37 samples        36 samples
Prediction of fruit preparation in yoghurt
On the whole data set, with logarithmic transformation.
[Figure: log-transformed spectra, 9 to -1 ppm]
Percent Variance Captured by Regression Model
        -----X-Block-----   -----Y-Block-----
Comp    This     Total      This     Total
1       96.49    96.49      80.41    80.41
2       1.67     98.17      15.65    96.07
3       0.22     98.39      3.40     99.47
4       0.56     98.95      0.11     99.57
5       0.20     99.15      0.20     99.77
6       0.18     99.32      0.11     99.88
7       0.12     99.44      0.04     99.92
8       0.09     99.53      0.02     99.95
9       0.04     99.57      0.03     99.98
10      0.06     99.63      0.01     99.99
[Figure: predicted vs. measured amount of fruit preparation, 0 to 50]
R² = 0.991, 3 latent variables, RMSEC = 0.34568, RMSEP = 2.0227
Average error of prediction = mean(abs(Ypred-Yreal)) = 1.27 g/100g
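The error measures quoted above are straightforward to compute. The sketch below is illustrative only: the function names and toy values are hypothetical, and the root mean square error is computed without any degrees-of-freedom correction, which some chemometrics software applies for the calibration error. RMSEC would be obtained on the calibration set and RMSEP on the validation set.

```python
import numpy as np

def rmse(y_pred, y_real):
    """Root mean square error between predicted and reference values."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_real, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mean_abs_error(y_pred, y_real):
    """Average error of prediction = mean(abs(Ypred - Yreal))."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_real, dtype=float)
    return float(np.mean(np.abs(diff)))

# Toy validation set (g/100g of fruit preparation)
y_real = [0.0, 10.0, 20.0, 30.0]
y_pred = [1.0, 9.0, 22.0, 30.0]
print(mean_abs_error(y_pred, y_real))   # 1.0
print(round(rmse(y_pred, y_real), 4))   # 1.2247
```

The root mean square error weights large deviations more heavily than the average absolute error, which is why RMSEP (2.0227) is larger than the average error (1.27 g/100g) on the slide.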
Prediction of vitamin C in yoghurt
Spiked sample to determine the signals of vitamin C
Prediction of vitamin C in yoghurt
On the selected areas, with logarithmic transformation.
[Figure: log data (9 to -1 ppm) and raw data (intensity scale ×10^9) on the selected areas]
[Figure: predicted vs. known concentration of vitamin C (g/100g), 0 to 0.45]
R² = 0.994, 4 latent variables, RMSEC = 0.0063281, RMSEP = 0.014384
Average error of prediction = 0.0075 g/100g
Percent Variance Captured by Regression Model
        -----X-Block-----   -----Y-Block-----
Comp    This     Total      This     Total
1       99.75    99.75      80.39    80.39
2       0.20     99.95      13.97    94.36
3       0.04     99.99      4.42     98.78
4       0.00     99.99      0.92     99.70
5       0.00     99.99      0.07     99.77
6       0.00     100.00     0.07     99.84
…
Balsamic vinegar
Common or traditional?
Discrimination of vinegars
Samples:
• 15 Traditional balsamic vinegars (TB) / 14 Balsamic vinegars (Ba)
• Ba: 7 from the market and 7 from producers in Emilia. 3 are aged.
Sample preparation:
• Augusta Caligiani, University of Parma
• Dilution of vinegar in D2O + TSP
Discrimination of vinegar by Interval-PLS_Cluster
[Figure: spectral zone of the formic acid signal (8.35 to 8.05 ppm) for the TBV and Ba samples, and cluster plot of the numbered samples with sub-groups labelled "Aged 6 years", "From Emilia" and "From the market"]
Interval-PLS_Cluster may detect unexpected sub-clusters.
Discrimination of vinegars by EWZS/ICA (3-IC model)
[Figure: IC 1, IC 2 and IC 3, with the Balsamic and Traditional balsamic spectra, in the 7 selected zones (Zone 1 to Zone 7, between 8.34 and 1.95 ppm)]
• IC1 looks like the signal of Traditional balsamic vinegar
• The selection contains signals of HMF*, acetic acid, formic acid, glucose and fructose
* HMF: Hydroxymethylfurfural
Discrimination of vinegar by EWZS/ICA (3-IC model)
[Figure: scores plot of IC2 against IC1 for the Ba and TB samples]
All samples correctly classified by leave-one-out cross-validation (the 3 LVs are taken into account)
Conclusions
Chemometric tools applied to 1H NMR spectra:
• Pretreatments (warping and log transformation) give a significant improvement in the results.
• Variable selection improved the results and gives meaning to them.
• ICA has proved to be more interpretable for the analysis of NMR spectra and should be useful for other signals.
Conclusions
Potential of 1H NMR spectroscopy in food authentication:
Very good results obtained on all 3 matrices:
• Orange juice adulteration
• Quantification in yoghurt
• Discrimination of vinegars
Sample preparation and acquisition procedure are easy.
However, there are some technical difficulties in industrialising this analysis:
• High cost of investment
• Signal treatment needs critical analysis: visual analysis for new types of fraud, and optimisation of the chemometric methods for each kind of fraud
Conclusions
Future work
• Develop automated procedures (software) for the most important frauds
• Investigate other liquid or freeze-dried products
• Transfer these chemometric methods to other types of measurements (possibly online)
Acknowledgment
For evaluating this work: all the jury members
For their help, advice and time: my colleagues at AgroParisTech and Eurofins
For financial support: Eurofins; French Ministry of Research, through the support of the CIFRE convention n° 169-2005
For being here: friends, family and colleagues